Ansh Radhakrishnan (@anshrad)'s Twitter Profile
Ansh Radhakrishnan

@anshrad

Researcher @AnthropicAI

ID: 1494503784004800517

Joined: 18-02-2022 02:49:44

35 Tweets

311 Followers

2.1K Following

Tristan Hume (@trishume)

Here's Claude 3 Haiku running at >200 tokens/s (>2x as fast as prod)! We've been working on capacity optimizations, but we can have fun testing those as speed optimizations by running at an overly costly low batch size. Come work with me at Anthropic on things like this; more info in the thread 🧵
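For context on why a low batch size buys per-user speed at the expense of cost: decoding is typically memory-bandwidth-bound, and the weight read is shared across every sequence in the batch. Here is a back-of-the-envelope Python sketch; every constant (model size, KV-cache traffic, bandwidth, chip count) is an illustrative assumption, not an Anthropic or Claude figure.

```python
# Toy model of memory-bandwidth-bound decoding. Weights are read once
# per step and amortized across the batch; the KV cache is read once
# per sequence. All constants below are made-up illustrative numbers.

WEIGHT_BYTES = 70e9 * 2    # hypothetical 70B-parameter model in bf16
KV_BYTES_PER_SEQ = 2e9     # hypothetical KV-cache bytes read per sequence per step
CHIP_BW = 3.3e12           # hypothetical per-chip memory bandwidth (bytes/s)
NUM_CHIPS = 8              # hypothetical tensor-parallel shard count

def decode_step_seconds(batch_size: int) -> float:
    """Seconds per decode step for a given batch size."""
    bandwidth = CHIP_BW * NUM_CHIPS
    return (WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ) / bandwidth

for batch in (1, 4, 16, 64):
    t = decode_step_seconds(batch)
    print(f"batch={batch:3d}  per-user={1 / t:7.1f} tok/s  "
          f"aggregate={batch / t:8.1f} tok/s")
```

Under these toy numbers, batch 1 roughly doubles each user's tokens/s relative to batch 64, while aggregate throughput (and so cost-efficiency) drops by more than an order of magnitude.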

Jesse Mu (@jayelmnop)

We're hiring for the adversarial robustness team at @AnthropicAI!

As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you're interested in these areas, let us know! (emails in 🧵)

Ethan Perez (@EthanJPerez)

Come join our team! We're trying to make LLMs unjailbreakable, or clearly demonstrate it's not possible. More in this 🧵 on what we're up to.

Anthropic (@AnthropicAI)

Today, we're announcing Claude 3, our next generation of AI models.

The three state-of-the-art models (Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku) set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

akbir. (@akbirkhan)

How can we check LLM outputs in domains where we are not experts?

We find that non-expert humans answer questions better after reading debates between expert LLMs.
Moreover, human judges are more accurate as experts get more persuasive. 📈
github.com/ucl-dark/llm_d…
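A minimal sketch of the debate protocol described above: two expert models argue for opposing answers over several rounds, and a weaker, non-expert judge reads the transcript and picks a side. The `complete` helper and the model names are hypothetical placeholders, not a real API.

```python
# Sketch of LLM debate with a non-expert judge. `complete` stands in
# for any text-completion client; swap in a real one to use this.

def complete(model: str, prompt: str) -> str:
    # Hypothetical placeholder so the skeleton runs end to end.
    return f"[{model} response to a {len(prompt)}-char prompt]"

def run_debate(question: str, answer_a: str, answer_b: str,
               expert: str = "expert-model", judge: str = "weak-judge",
               rounds: int = 3) -> str:
    transcript = f"Question: {question}"
    for rnd in range(1, rounds + 1):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = complete(expert,
                f"{transcript}\nYou are debater {side}. Argue that the answer "
                f"is '{answer}', quoting evidence where you can.")
            transcript += f"\nDebater {side}, round {rnd}: {argument}"
    verdict = complete(judge,
        f"{transcript}\nWhich answer is better supported? Reply A or B.")
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```

The paper's finding, that judges get *more* accurate as debaters get more persuasive, is the interesting part: persuasion optimized under adversarial scrutiny appears to track truth.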

Buck Shlegeris (@bshlgrs)

New paper! We design and test safety techniques that prevent models from causing bad outcomes even if the models collude to subvert them. We think that this approach is the most promising available strategy for minimizing risk from deceptively aligned models. 🧵

Sam Bowman (@sleepinyourhat)

If you'll be at #NeurIPS2023 and you're interested in chatting with someone at Anthropic about research or roles, there'll be a few of us around.

Expression of interest form here: docs.google.com/forms/d/e/1FAI…

david rein (@idavidrein)

🧵 Announcing GPQA, a graduate-level "Google-proof" Q&A benchmark designed for scalable oversight! w/ Julian Michael (@_julianmichael_) and Sam Bowman (@sleepinyourhat)

GPQA is a dataset of *really hard* questions that PhDs with full access to Google can't answer.

Paper: arxiv.org/abs/2311.12022

Sam Bowman (@sleepinyourhat)

🚨 New dataset for LLM/scalable oversight evaluations! 🚨

This has been one of the big central efforts of my NYU lab over the last year, and I'm really excited to start using it.

Sam Bowman (@sleepinyourhat)

I'm proud to see this come out.

These governance mechanisms commit us to pause scaling whenever we can't show that we're on track to manage the worst-case risks presented by new models. And they do that _without_ assuming that we fully understand those risks now.

Anthropic (@AnthropicAI)

Today, we're publishing our Responsible Scaling Policy (RSP), a series of technical and organizational protocols to help us manage the risks of developing increasingly capable AI systems.

Anthropic (@AnthropicAI)

Large language models have demonstrated a surprising range of skills and behaviors. How can we trace their source? In our new paper, we use influence functions to find training examples that contribute to a given model output.
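Conceptually, an influence function scores each training example by the inner product of its loss gradient with the gradient of the query's loss, weighted by an inverse Hessian (the paper uses an efficient approximation; the toy sketch below substitutes the identity matrix for brevity). A hedged PyTorch sketch, with all names illustrative:

```python
import torch

# Toy influence-function sketch: score a training example by the inner
# product of its loss gradient with the query's loss gradient. Real
# influence functions weight this product by an inverse Hessian, which
# we replace with the identity here for brevity.

def flat_grad(loss: torch.Tensor, params) -> torch.Tensor:
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, train_examples, query_x, query_y):
    params = [p for p in model.parameters() if p.requires_grad]
    query_grad = flat_grad(loss_fn(model(query_x), query_y), params)
    scores = []
    for x, y in train_examples:
        train_grad = flat_grad(loss_fn(model(x), y), params)
        # Higher score: this example pushed the model toward the output.
        scores.append(torch.dot(query_grad, train_grad).item())
    return scores
```

Ranking the training set by these scores surfaces the examples that most contributed to a given output, which is the tracing question the tweet poses.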

Dwarkesh Patel (@dwarkesh_sp)

Here is my conversation with Dario Amodei, CEO of Anthropic.

We discuss:

- why human-level AI is 2-3 years away
- race dynamics with OpenAI & China
- $10 billion training runs, bioterrorism, alignment, cyberattacks, scaling...

Ajeya Cotra (@ajeya_cotra)

Important article: time.com/6300942/ai-pro… The single most important data point suggesting 'progress is unlikely to slow in the next 2-3 years': GPT-4 cost ~$100M to train (probably less), and Alphabet has 1,000x that much money in cash on hand.

Logan Graham (@logangraham)

Hi Twitter -- I've been quiet for a while! Here's something that explains, at a high level, some of what I've been up to. I think it is making, and will make, a meaningful and unique contribution to AI safety. I'm hiring -- join me. anthropic.com/index/frontier…

Anthropic (@AnthropicAI)

In this post, we share high-level findings from a frontier threats red-teaming project we conducted on biological risks: anthropic.com/index/frontier…

Joshua Batson (@thebasepoint)

I've thoroughly enjoyed working with this team since I joined in March... highly collaborative, focused on hard and important problems. If you're interested, please apply. If you want to learn more, email me: [email protected]

Chris Olah (@ch402)

The mechanistic interpretability team at Anthropic is hiring! Come work with us to help solve the mystery of how large models do what they do, with the goal of making them safer.

jobs.lever.co/Anthropic/33dc…
