Lewis Tunstall (@_lewtun)'s Twitter Profile
Lewis Tunstall

@_lewtun

🤗 LLM engineering & research @huggingface
📖 Co-author of "NLP with Transformers" book
💥 Ex-particle physicist
🤘 Occasional guitarist
🇦🇺 in 🇨🇭

ID: 1029493180704714753

Link: https://transformersbook.com/ · Joined: 14-08-2018 22:21:16

3.2K Tweets

9.4K Followers

424 Following

Lewis Tunstall (@_lewtun):

What's the best weight quantization algorithm for LLMs that maximally preserves performance on reasoning tasks?

I've been tinkering with AutoAWQ lately and 4-bit quantization leads to a noticeable drop in accuracy, even after using an in-domain dataset for calibration 🤔

Maybe…
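
For context, this is roughly what such a run looks like with the AutoAWQ library. A minimal sketch, assuming AutoAWQ's `quantize` API; the model id and the in-domain calibration texts are placeholders, not from the tweet:

```python
# Hedged sketch: 4-bit AWQ quantization with in-domain calibration data.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "my-org/my-reasoning-llm"  # hypothetical model id
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Standard AWQ settings: 4-bit weights, group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Calibrate on in-domain (e.g. reasoning) samples instead of the default corpus
calib_data = [
    "Q: If x + 3 = 7, what is x? A: Subtracting 3 from both sides gives x = 4.",
    "Q: A train travels 120 km in 2 hours. What is its average speed? A: 60 km/h.",
]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("my-reasoning-llm-awq")
tokenizer.save_pretrained("my-reasoning-llm-awq")
```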

Lewis Tunstall (@_lewtun):

The Open LLM Leaderboard just crossed 10k likes, making it the second most popular Space on the Hub 🔥!

What started as an internal project by Edward Beeching has grown into a large-scale evaluation effort thanks to Clémentine Fourrier, Nathan Habib, and the whole open source…

Lewis Tunstall (@_lewtun):

This is a very nice and thorough analysis of LoRA vs full fine-tuning. The observation that LoRA forgets less also explains why DPO with LoRA often works so well: the low-rank updates act as a regularizer that curbs overfitting.
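
As a hedged illustration of that setup, here is roughly what DPO with a LoRA adapter looks like using TRL and PEFT. The model id is a placeholder, and the exact trainer arguments vary across TRL versions:

```python
# Hedged sketch of DPO with a LoRA adapter (TRL + PEFT); ids are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "my-org/my-sft-model"  # hypothetical SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Low-rank updates confine how far the policy can drift from the SFT model,
# which is the regularization effect described above.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-lora", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=peft_config,  # with PEFT, the adapter-disabled base serves as the reference model
)
trainer.train()
```

A nice side effect of the PEFT route is that no separate frozen reference model needs to be kept in memory: disabling the adapter recovers the reference policy.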

Victor Sanh (@SanhEstPasMoi):

💬🔥Releasing idefics2-8b-chatty, the chat-optimized version of Idefics2!

It is a very efficient (8B parameters) state-of-the-art VLM, has been red-teamed, and comes with a few surprises:
- 📖Paper dissecting a lot of the experimental insights we learned building Idefics2:
-
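
For context, a minimal inference sketch with the released checkpoint, following the Idefics2 usage pattern as I understand it; the image URL is a placeholder:

```python
# Hedged sketch: chatting with idefics2-8b-chatty via transformers.
import requests
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-chatty")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty", torch_dtype=torch.float16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)  # placeholder URL
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```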

Sebastian Ruder (@seb_ruder):

Lewis Tunstall, lmsys.org, Patrick Lewis, Sophia Althammer, Ola Piktus, and others published a paper on arXiv today that advocates for using an ensemble of judges (a panel of LLMs; PoLL). Their evaluation includes an ablation comparing different prompt variants.
arxiv.org/abs/2404.18796
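
A toy sketch of the panel idea: pool verdicts from several smaller judge models instead of relying on one large judge. The judge names and the `ask_judge` helper below are hypothetical stand-ins for real model API calls:

```python
# Illustrative Panel-of-LLM-judges (PoLL) sketch with majority voting.
from collections import Counter

JUDGES = ["judge-model-a", "judge-model-b", "judge-model-c"]  # hypothetical, disjoint model families

def ask_judge(judge: str, question: str, answer: str) -> str:
    """Prompt one judge model and parse its verdict ('correct' / 'incorrect')."""
    raise NotImplementedError("stub: call your judge model's API here")

def poll_verdict(question: str, answer: str) -> str:
    # Pooling votes across the panel reduces the intra-model bias of a single judge
    votes = Counter(ask_judge(j, question, answer) for j in JUDGES)
    return votes.most_common(1)[0][0]
```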

BigCode (@BigCodeProject):

Releasing StarCoder2 Instruct! 🚀

Achieves a 72% HumanEval score using only self-generated content, without any GPT-3.5/4 data. This work demonstrates that self-instruct already works well at the 15B scale without data from proprietary models!

Read more: huggingface.co/blog/sc2-instr…
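
A hedged sketch of prompting the released model with transformers; the instruction/response template below is an assumption, so the model card is the reference for the exact format:

```python
# Hedged sketch: generating with StarCoder2-15B-Instruct via a transformers pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="bigcode/starcoder2-15b-instruct-v0.1",
    device_map="auto",
)

# Assumed instruction format; check the model card for the canonical template.
prompt = (
    "### Instruction\n"
    "Write a Python function that checks whether a string is a palindrome.\n\n"
    "### Response\n"
)
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])
```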

Victor Sanh (@SanhEstPasMoi):

Glad to see Idefics2 making its way into the awesome OpenVLM Leaderboard which ranks VLMs.
2nd in its category (<10B parameters and open weights)!

While InternLM-XComposer2 uses proprietary data, Idefics2 is built solely using openly available data.

Leaderboard:

Lewis Tunstall (@_lewtun):

Yesterday I gave an overview of the LLM alignment landscape at the ZurichAI meetup - thank you Aleks Ficek 🧪 and Florian Caesar for hosting me 🤗!

Here's the slides from the talk: docs.google.com/presentation/d…

Lewis Tunstall (@_lewtun):

Fun prompt!

Interesting to see that even Llama-70B trips up on it (right), but Zephyr ORPO (left) realises the monkeys are jumping on the bed

Lewis Tunstall (@_lewtun):

The best part about making slides on LLM alignment is that I now get to combine my two passions in life: math and memes 😅

(this one is a classic from Tom Goldstein)

Wing Lian (caseus) (@winglian):

If you set Llama-3's rope_theta to 8M, you can get 100% passkey retrieval across all depths up to 40K context. No continued pre-training needed. Scaling up further leads to much lower retrieval accuracy, but it doesn't completely fail.
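
A hedged sketch of what that override looks like at load time with transformers (Llama-3 ships with a RoPE base of 500k, which this raises to 8M):

```python
# Hedged sketch: overriding Llama-3's RoPE base frequency before loading weights.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"
config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 8_000_000  # raise the RoPE base from the default 500_000
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Raising `rope_theta` slows the rotary frequencies, so positions beyond the trained context map onto angles the model has effectively seen, which is why no continued pre-training is needed for the passkey task.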

Lewis Tunstall (@_lewtun):

Once inference for QDoRA matches 16-bit precision speed, I can see this being super handy for online methods like PPO where you use adapters to both generate & rank completions.

(This would also be cool for inference, where you generate/rank N completions and send the best one…
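
A rough sketch of that generate-and-rank loop with two LoRA adapters hot-swapped on one base model via PEFT's `set_adapter`; the model and adapter ids are placeholders, and the reward scoring is left as a labeled stub:

```python
# Hedged best-of-N sketch: one base model, a policy adapter and a reward adapter.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("my-org/base-model")  # hypothetical ids
tokenizer = AutoTokenizer.from_pretrained("my-org/base-model")
model = PeftModel.from_pretrained(base, "my-org/policy-adapter", adapter_name="policy")
model.load_adapter("my-org/reward-adapter", adapter_name="reward")

inputs = tokenizer("Explain RoPE scaling in one sentence.", return_tensors="pt")

model.set_adapter("policy")  # 1) generate N candidates with the policy adapter
candidates = model.generate(**inputs, do_sample=True, num_return_sequences=4, max_new_tokens=64)

model.set_adapter("reward")  # 2) rank the candidates with the reward adapter

def reward_score(ids) -> float:
    """Stub: score one completion with the reward adapter's head."""
    raise NotImplementedError

best = max(candidates, key=reward_score)  # 3) send only the best completion
print(tokenizer.decode(best, skip_special_tokens=True))
```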
