Lewis Tunstall (@_lewtun)'s Twitter Profile
Lewis Tunstall

@_lewtun

🤗 LLM engineering & research @huggingface
📖 Co-author of "NLP with Transformers" book
💥 Ex-particle physicist
🤘 Occasional guitarist
🇦🇺 in 🇨🇭

ID: 1029493180704714753

Link: https://transformersbook.com/ · Joined: 14-08-2018 22:21:16

3.2K Tweets

9.4K Followers

424 Following

Lewis Tunstall (@_lewtun):

What's the best weight quantization algorithm for LLMs that maximally preserves performance on reasoning tasks?

I've been tinkering with AutoAWQ lately and 4-bit quantization leads to a noticeable drop in accuracy, even after using an in-domain dataset for calibration 🤔

Maybe…
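
For context, this is roughly what such a run looks like with the AutoAWQ library. A minimal sketch, assuming AutoAWQ's `quantize` API; the model id and the in-domain calibration texts are placeholders, not from the tweet:

```python
# Hedged sketch: 4-bit AWQ quantization with in-domain calibration data.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "my-org/my-reasoning-llm"  # hypothetical model id
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Standard AWQ settings: 4-bit weights, group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Calibrate on in-domain (e.g. reasoning) samples instead of the default corpus
calib_data = [
    "Q: If x + 3 = 7, what is x? A: Subtracting 3 from both sides gives x = 4.",
    "Q: A train travels 120 km in 2 hours. What is its average speed? A: 60 km/h.",
]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("my-reasoning-llm-awq")
tokenizer.save_pretrained("my-reasoning-llm-awq")
```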

Lewis Tunstall (@_lewtun):

The Open LLM Leaderboard just crossed 10k likes, making it the second most popular Space on the Hub 🔥!

What started as an internal project by Edward Beeching has grown into a large-scale evaluation effort thanks to Clémentine Fourrier, Nathan Habib, and the whole open source…

Lewis Tunstall (@_lewtun):

This is a very nice and thorough analysis of LoRA vs full fine-tuning. The observation that LoRA forgets less also explains why DPO with LoRA often works so well: the low-rank updates act as a regularizer that curbs overfitting.
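
As a hedged illustration of that setup, here is roughly what DPO with a LoRA adapter looks like using TRL and PEFT. The model id is a placeholder, and the exact trainer arguments vary across TRL versions:

```python
# Hedged sketch of DPO with a LoRA adapter (TRL + PEFT); ids are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "my-org/my-sft-model"  # hypothetical SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Low-rank updates confine how far the policy can drift from the SFT model,
# which is the regularization effect described above.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-lora", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=peft_config,  # with PEFT, the adapter-disabled base serves as the reference model
)
trainer.train()
```

A nice side effect of the PEFT route is that no separate frozen reference model needs to be kept in memory: disabling the adapter recovers the reference policy.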

Victor Sanh (@SanhEstPasMoi):

💬🔥Releasing idefics2-8b-chatty, the chat-optimized version of Idefics2!

It is a very efficient (8B parameters) state-of-the-art VLM, has been red-teamed, and comes with a few surprises:
- 📖Paper dissecting a lot of the experimental insights we learned building Idefics2:
-
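
For context, a minimal inference sketch with the released checkpoint, following the Idefics2 usage pattern as I understand it; the image URL is a placeholder:

```python
# Hedged sketch: chatting with idefics2-8b-chatty via transformers.
import requests
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-chatty")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty", torch_dtype=torch.float16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)  # placeholder URL
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```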

Sebastian Ruder (@seb_ruder):

Lewis Tunstall, lmsys.org, Patrick Lewis, Sophia Althammer, Ola Piktus, and others published a paper on arXiv today that advocates for using an ensemble of judges (a panel of LLMs; PoLL). Their evaluation includes an ablation comparing different prompt variants.
arxiv.org/abs/2404.18796
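
A toy sketch of the panel idea: pool verdicts from several smaller judge models instead of relying on one large judge. The judge names and the `ask_judge` helper below are hypothetical stand-ins for real model API calls:

```python
# Illustrative Panel-of-LLM-judges (PoLL) sketch with majority voting.
from collections import Counter

JUDGES = ["judge-model-a", "judge-model-b", "judge-model-c"]  # hypothetical, disjoint model families

def ask_judge(judge: str, question: str, answer: str) -> str:
    """Prompt one judge model and parse its verdict ('correct' / 'incorrect')."""
    raise NotImplementedError("stub: call your judge model's API here")

def poll_verdict(question: str, answer: str) -> str:
    # Pooling votes across the panel reduces the intra-model bias of a single judge
    votes = Counter(ask_judge(j, question, answer) for j in JUDGES)
    return votes.most_common(1)[0][0]
```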

BigCode (@BigCodeProject):

Releasing StarCoder2 Instruct! 🚀

Achieves a 72% HumanEval score using only self-generated content, without any GPT-3.5/4 data. This work demonstrates that self-instruct already works well at the 15B scale without data from proprietary models!

Read more: huggingface.co/blog/sc2-instr…
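
A hedged sketch of prompting the released model with transformers; the instruction/response template below is an assumption, so the model card is the reference for the exact format:

```python
# Hedged sketch: generating with StarCoder2-15B-Instruct via a transformers pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="bigcode/starcoder2-15b-instruct-v0.1",
    device_map="auto",
)

# Assumed instruction format; check the model card for the canonical template.
prompt = (
    "### Instruction\n"
    "Write a Python function that checks whether a string is a palindrome.\n\n"
    "### Response\n"
)
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])
```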

Victor Sanh (@SanhEstPasMoi):

Glad to see Idefics2 making its way into the awesome OpenVLM Leaderboard which ranks VLMs.
2nd in its category (<10B parameters and open weights)!

While InternLM-XComposer2 uses proprietary data, Idefics2 is built solely using openly available data.

Leaderboard:

Lewis Tunstall (@_lewtun):

Yesterday I gave an overview of the LLM alignment landscape at the ZurichAI meetup - thank you Aleks Ficek 🧪 and Florian Caesar for hosting me 🤗!

Here's the slides from the talk: docs.google.com/presentation/d…

Lewis Tunstall (@_lewtun):

Fun prompt!

Interesting to see that even Llama-70B trips up on it (right), but Zephyr ORPO (left) realises the monkeys are jumping on the bed

Lewis Tunstall (@_lewtun):

The best part about making slides on LLM alignment is that I now get to combine my two passions in life: math and memes 😅

(this one is a classic from Tom Goldstein)

Wing Lian (caseus) (@winglian):

If you set Llama-3's rope_theta to 8M, you can get 100% passkey retrieval across all depths up to 40K context. No continued pre-training needed. Scaling up further leads to much lower retrieval accuracy, but it doesn't completely fail.
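
A hedged sketch of what that override looks like at load time with transformers (Llama-3 ships with a RoPE base of 500k, which this raises to 8M):

```python
# Hedged sketch: overriding Llama-3's RoPE base frequency before loading weights.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"
config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 8_000_000  # raise the RoPE base from the default 500_000
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Raising `rope_theta` slows the rotary frequencies, so positions beyond the trained context map onto angles the model has effectively seen, which is why no continued pre-training is needed for the passkey task.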

Lewis Tunstall (@_lewtun):

Once inference for QDoRA matches 16-bit precision speed, I can see this being super handy for online methods like PPO where you use adapters to both generate & rank completions.

(This would also be cool for inference, where you generate/rank N completions and send the best one…
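
A rough sketch of that generate-and-rank loop with two LoRA adapters hot-swapped on one base model via PEFT's `set_adapter`; the model and adapter ids are placeholders, and the reward scoring is left as a labeled stub:

```python
# Hedged best-of-N sketch: one base model, a policy adapter and a reward adapter.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("my-org/base-model")  # hypothetical ids
tokenizer = AutoTokenizer.from_pretrained("my-org/base-model")
model = PeftModel.from_pretrained(base, "my-org/policy-adapter", adapter_name="policy")
model.load_adapter("my-org/reward-adapter", adapter_name="reward")

inputs = tokenizer("Explain RoPE scaling in one sentence.", return_tensors="pt")

model.set_adapter("policy")  # 1) generate N candidates with the policy adapter
candidates = model.generate(**inputs, do_sample=True, num_return_sequences=4, max_new_tokens=64)

model.set_adapter("reward")  # 2) rank the candidates with the reward adapter

def reward_score(ids) -> float:
    """Stub: score one completion with the reward adapter's head."""
    raise NotImplementedError

best = max(candidates, key=reward_score)  # 3) send only the best completion
print(tokenizer.decode(best, skip_special_tokens=True))
```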
