It’s tough to understand all the numbers on COVID developments. I made a Colab notebook to try to answer the question “are things getting better or worse?”

Reading about how Leonardo da Vinci journaled so prolifically but only published a tiny fraction made me feel both sad and inspired. I was sad that his incredible observations and discoveries lay dormant in his private collection. For example, he was the first person to document all the kinds of…

An exploration of standard sampling techniques and the new nucleus sampling

Humans often choose words that surprise language models (Holtzman et al., 2019)

Causal language models like GPT-2 are trained to predict the probability of the next word given some context. For example, given “I ate a delicious hot ___”, the model may predict “dog” with 80% probability, “pancake” 5% probability, etc. The cool thing about this structure is they can be used…
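To make the idea concrete, here is a minimal sketch of nucleus (top-p) sampling over a toy next-word distribution. Only the "dog" (80%) and "pancake" (5%) probabilities come from the example above; the other words, their probabilities, and the function names `nucleus_tokens` and `nucleus_sample` are assumptions made up for illustration, not output from GPT-2 or any library's API.

```python
import random


def nucleus_tokens(probs, top_p):
    """Return the smallest set of most-likely tokens whose total probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return nucleus


def nucleus_sample(probs, top_p=0.9, rng=None):
    """Sample one token from the nucleus, weighting each token by its probability."""
    rng = rng or random
    tokens, weights = zip(*nucleus_tokens(probs, top_p))
    # random.choices renormalizes the weights, so they need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical distribution for "I ate a delicious hot ___".
# Only "dog" and "pancake" are from the text; the rest are invented.
next_word_probs = {
    "dog": 0.80,
    "pancake": 0.05,
    "meal": 0.05,
    "soup": 0.04,
    "potato": 0.03,
    "rock": 0.03,
}

print(nucleus_tokens(next_word_probs, top_p=0.88))
print(nucleus_sample(next_word_probs, top_p=0.88, rng=random.Random(0)))
```

With a threshold of 0.88, the low-probability tail ("soup", "potato", "rock") is cut off before sampling, so the model can still pick a less likely word without wandering into the very unlikely ones.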

Ben Mann, Yaroslav Bulatov, Darius Lam

TL;DR: we made Transformer-XL train efficiently on 128 GPUs on AWS. The code is available at

We achieved almost linear throughput scaling on AWS p3dn.24xlarge instances, each with 8 V100 32GB GPUs, on a 100 Gbps network


One of the difficulties of researching language models is that you often don’t know if your ideas work until you try them on real-world datasets. …

Ben Mann

