We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and image classification accuracy, we show that our best generative model also contains features competitive with top convolutional nets in the unsupervised setting.
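To make the "pixel sequences" framing concrete, here is a minimal sketch (not the paper's implementation) of how an image can be recast as a next-token prediction problem: flatten the image in raster order, then form (prefix, next-pixel) training pairs exactly as a language model forms (prefix, next-token) pairs. The helper names are illustrative only.

```python
import numpy as np

def image_to_sequence(image):
    """Flatten an H x W image into a 1D pixel sequence in raster order."""
    return image.reshape(-1)

def next_pixel_pairs(seq):
    """Build (prefix, target) pairs for the autoregressive objective:
    predict each pixel from all pixels that precede it."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

# A tiny 2x2 "image" with pixel intensities 0..3.
img = np.arange(4).reshape(2, 2)
seq = image_to_sequence(img)   # pixel sequence [0, 1, 2, 3]
pairs = next_pixel_pairs(seq)  # three (prefix, target) pairs
```

Under this framing, the same decoder-only transformer objective used for text applies unchanged: the model is trained to predict each pixel from the ones before it, and sampling the sequence autoregressively yields image completions.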