Ansh Radhakrishnan (@anshrad)'s Twitter Profile
Ansh Radhakrishnan

@anshrad

Researcher @AnthropicAI

ID: 1494503784004800517

Joined: 18-02-2022 02:49:44

35 Tweets

311 Followers

2.1K Following

Tristan Hume (@trishume)

Here's Claude 3 Haiku running at >200 tokens/s (>2x as fast as prod)! We've been working on capacity optimizations, but we can have fun testing those as speed optimizations by running at an overly costly low batch size. Come work with me at Anthropic on things like this; more info in the thread 🧵
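For context on why a low batch size buys per-user speed at the expense of cost: decoding is typically memory-bandwidth-bound, and the weight read is shared across every sequence in the batch. Here is a back-of-the-envelope Python sketch; every constant (model size, KV-cache traffic, bandwidth, chip count) is an illustrative assumption, not an Anthropic or Claude figure.

```python
# Toy model of memory-bandwidth-bound decoding. Weights are read once
# per step and amortized across the batch; the KV cache is read once
# per sequence. All constants below are made-up illustrative numbers.

WEIGHT_BYTES = 70e9 * 2    # hypothetical 70B-parameter model in bf16
KV_BYTES_PER_SEQ = 2e9     # hypothetical KV-cache bytes read per sequence per step
CHIP_BW = 3.3e12           # hypothetical per-chip memory bandwidth (bytes/s)
NUM_CHIPS = 8              # hypothetical tensor-parallel shard count

def decode_step_seconds(batch_size: int) -> float:
    """Seconds per decode step for a given batch size."""
    bandwidth = CHIP_BW * NUM_CHIPS
    return (WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ) / bandwidth

for batch in (1, 4, 16, 64):
    t = decode_step_seconds(batch)
    print(f"batch={batch:3d}  per-user={1 / t:7.1f} tok/s  "
          f"aggregate={batch / t:8.1f} tok/s")
```

Under these toy numbers, batch 1 roughly doubles each user's tokens/s relative to batch 64, while aggregate throughput (and so cost-efficiency) drops by more than an order of magnitude.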

Jesse Mu (@jayelmnop)

We're hiring for the adversarial robustness team at @AnthropicAI!

As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you're interested in these areas, let us know! (emails in 🧵)

Ethan Perez (@EthanJPerez)

Come join our team! We're trying to make LLMs unjailbreakable, or clearly demonstrate it's not possible. More in this 🧵 on what we're up to.

Anthropic (@AnthropicAI)

Today, we're announcing Claude 3, our next generation of AI models.

The three state-of-the-art models (Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku) set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

akbir. (@akbirkhan)

How can we check LLM outputs in domains where we are not experts?

We find that non-expert humans answer questions better after reading debates between expert LLMs.
Moreover, human judges are more accurate as experts get more persuasive. 📈
github.com/ucl-dark/llm_d…
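A minimal sketch of the debate protocol described above: two expert models argue for opposing answers over several rounds, and a weaker, non-expert judge reads the transcript and picks a side. The `complete` helper and the model names are hypothetical placeholders, not a real API.

```python
# Sketch of LLM debate with a non-expert judge. `complete` stands in
# for any text-completion client; swap in a real one to use this.

def complete(model: str, prompt: str) -> str:
    # Hypothetical placeholder so the skeleton runs end to end.
    return f"[{model} response to a {len(prompt)}-char prompt]"

def run_debate(question: str, answer_a: str, answer_b: str,
               expert: str = "expert-model", judge: str = "weak-judge",
               rounds: int = 3) -> str:
    transcript = f"Question: {question}"
    for rnd in range(1, rounds + 1):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = complete(expert,
                f"{transcript}\nYou are debater {side}. Argue that the answer "
                f"is '{answer}', quoting evidence where you can.")
            transcript += f"\nDebater {side}, round {rnd}: {argument}"
    verdict = complete(judge,
        f"{transcript}\nWhich answer is better supported? Reply A or B.")
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```

The paper's finding, that judges get *more* accurate as debaters get more persuasive, is the interesting part: persuasion optimized under adversarial scrutiny appears to track truth.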

Buck Shlegeris (@bshlgrs)

New paper! We design and test safety techniques that prevent models from causing bad outcomes even if the models collude to subvert them. We think that this approach is the most promising available strategy for minimizing risk from deceptively aligned models. 🧵

Sam Bowman (@sleepinyourhat)

If you'll be at #NeurIPS2023 and you're interested in chatting with someone at Anthropic about research or roles, there'll be a few of us around.

Expression of interest form here: docs.google.com/forms/d/e/1FAI…

david rein (@idavidrein)

🧵 Announcing GPQA, a graduate-level "Google-proof" Q&A benchmark designed for scalable oversight! w/ Julian Michael (@_julianmichael_) and Sam Bowman (@sleepinyourhat)

GPQA is a dataset of *really hard* questions that PhDs with full access to Google can't answer.

Paper: arxiv.org/abs/2311.12022

Sam Bowman (@sleepinyourhat)

🚨 New dataset for LLM/scalable oversight evaluations! 🚨

This has been one of the big central efforts of my NYU lab over the last year, and I'm really excited to start using it.

Sam Bowman (@sleepinyourhat)

I'm proud to see this come out.

These governance mechanisms commit us to pause scaling whenever we can't show that we're on track to manage the worst-case risks presented by new models. And they do that _without_ assuming that we fully understand those risks now.

Anthropic (@AnthropicAI)

Today, we're publishing our Responsible Scaling Policy (RSP), a series of technical and organizational protocols to help us manage the risks of developing increasingly capable AI systems.

Anthropic (@AnthropicAI)

Large language models have demonstrated a surprising range of skills and behaviors. How can we trace their source? In our new paper, we use influence functions to find training examples that contribute to a given model output.
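Conceptually, an influence function scores each training example by the inner product of its loss gradient with the gradient of the query's loss, weighted by an inverse Hessian (the paper uses an efficient approximation; the toy sketch below substitutes the identity matrix for brevity). A hedged PyTorch sketch, with all names illustrative:

```python
import torch

# Toy influence-function sketch: score a training example by the inner
# product of its loss gradient with the query's loss gradient. Real
# influence functions weight this product by an inverse Hessian, which
# we replace with the identity here for brevity.

def flat_grad(loss: torch.Tensor, params) -> torch.Tensor:
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, train_examples, query_x, query_y):
    params = [p for p in model.parameters() if p.requires_grad]
    query_grad = flat_grad(loss_fn(model(query_x), query_y), params)
    scores = []
    for x, y in train_examples:
        train_grad = flat_grad(loss_fn(model(x), y), params)
        # Higher score: this example pushed the model toward the output.
        scores.append(torch.dot(query_grad, train_grad).item())
    return scores
```

Ranking the training set by these scores surfaces the examples that most contributed to a given output, which is the tracing question the tweet poses.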

Dwarkesh Patel (@dwarkesh_sp)

Here is my conversation with Dario Amodei, CEO of Anthropic.

We discuss:

- why human-level AI is 2-3 years away
- race dynamics with OpenAI & China
- $10 billion training runs, bioterrorism, alignment, cyberattacks, scaling...

Ajeya Cotra (@ajeya_cotra)

Important article: time.com/6300942/ai-pro… The single most important data point suggesting 'progress is unlikely to slow in the next 2-3 years': GPT-4 cost ~$100M to train (probably less), and Alphabet has 1,000x that much money in cash on hand.

Logan Graham (@logangraham)

Hi Twitter -- I've been quiet for a while! Here's something that explains, at a high level, some of what I've been up to. I think it is making, and will make, a meaningful and unique contribution to AI safety. I'm hiring -- join me. anthropic.com/index/frontier…

Anthropic (@AnthropicAI)

In this post, we share high-level findings from a frontier threats red-teaming project we conducted on biological risks: anthropic.com/index/frontier…

Joshua Batson (@thebasepoint)

I've thoroughly enjoyed working with this team since I joined in March... highly collaborative, focused on hard and important problems. If you're interested, please apply. If you want to learn more, email me: [email protected]

Chris Olah (@ch402)

The mechanistic interpretability team at Anthropic is hiring! Come work with us to help solve the mystery of how large models do what they do, with the goal of making them safer.

jobs.lever.co/Anthropic/33dc…
