Dominic Cummings (@Dominic2306)'s Twitter Profile
Dominic Cummings

@Dominic2306

peace abroad, regime change at home / maths circles / systems politics / worked at Klute nightclub

ID:1354919270375940099

Link: https://dominiccummings.substack.com/p/tolstoy

Joined: 28-01-2021 22:28:52

8.2K Tweets

295.6K Followers

2 Following

Dominic Cummings (@Dominic2306)

Final update on RV Jones and intelligence during World War II. Some general thoughts & lessons on Whitehall - particularly, how Whitehall immediately vandalised both Alanbrooke's Chiefs of Staff system and RVJ's Scientific Intelligence system, the same pattern as General Groves

Dominic Cummings (@Dominic2306)

Hacks looking for some electoral/'strategy'/campaign logic

There isn't one

Sunak gave up long ago, if MPs put enough letters in for a vote he'd resign immediately and jump on Netjets to California

There is no plan, there's no desire to fight, No10 is tumbleweed as officials.

Alec Stapp (@AlecStapp)

New degrowther idea just dropped in the UK:

Instead of making it legal to build more housing, they're just gonna... checks notes... reallocate the spare bedrooms in 'under-occupied' houses.

Ilan Gur (@ilangur)

🚨 Is it possible to build a practical early warning system for climate tipping points? We're interested in deploying a suite of low cost sensing & advanced modelling systems to find out! Check out the latest ARIA programme thesis to weigh in on our plans & get involved 👇

Chris Olah (@ch402)

I also wanted to highlight one part of the paper that's less flashy and I expect might be missed, but which I think is very scientifically deep: there appears to be a relationship between the size of the dictionary and frequency of rarest concepts learned.

Chris Olah (@ch402)

Beyond safety -- I'm so, so excited for what we're going to learn about the internals of language models.

Some of the features we found are just so delightfully abstract. transformer-circuits.pub/2024/scaling-m…

Chris Olah (@ch402)

Some other things I'm excited about:

- Can monitoring or steering features improve safety in deployment?
- Can features give us a kind of 'test set' for safety, that we can use to tell how well alignment efforts are working?
- Is there a way we can use this to build an

Chris Olah (@ch402)

I'm honestly not sure how these new results should update views on safety yet.

But I'm hopeful they'll allow us to make conversations about many issues more concrete.

(Similar to how scaling laws improved discourse on 'when will there be more powerful AI models?')

Chris Olah (@ch402)

So much of the discourse is confidently polarized: Worrying about AI safety is dumb! No, AI is definitely going to kill us!

I don't know how one could be so confident about these questions on present evidence. x.com/ch402/status/1…

Chris Olah (@ch402)

Why does this matter? One thing I'm really excited about is safety discourse more grounded in empirical understanding.

Chris Olah (@ch402)

Forcing the unsafe code feature on causes Claude to introduce a buffer overflow.

Forcing the backdoor feature on will cause Claude to helpfully suggest code that opens a port and starts dumping data to it.

And so on.

Chris Olah (@ch402)

Now we have traction on a real deployed model, and we're finding features that seem relevant to questions of safety: deception, power seeking, bias, security vulnerability and bioweapon features.

It's wild! x.com/AnthropicAI/st…

Chris Olah (@ch402)

I've been working on interpretability for more than a decade, significantly motivated by concerns about safety. But it's always been this aspirational goal – I could tell a story for how this work might someday help, but it was far off.

Chris Olah (@ch402)

I'm really excited about these results for many reasons, but the most important is that we're starting to connect mechanistic interpretability to questions about the safety of large language models.

Dominic Cummings (@Dominic2306)

1/ Thread on failure of old parties, academia, media, think tanks, government bureaucracies on UKR war

The Cadwalladr of IR, Prof Obrien, has been one of the most extreme and delusional. After every failure he just creates a new fake story, blames a lack of willingness to risk

Dominic Cummings (@Dominic2306)

7/ The Comical Ali Crown for the rotten UK media was with the Telegraph but The Times has taken over. As Russia has advanced daily over the last 10 days here's a snapshot of Times coverage from which you'd think UKR must be closing in on Moscow.
This is why the old media is
