Julian Michael
@_julianmichael_
Researching stuff @NYUDataScience. he/him
ID:1019072664600637440
http://julianmichael.org 17-07-2018 04:13:51
304 Tweets
1.1K Followers
122 Following
Is GPQA garbage?
A couple weeks ago, typedfemale pointed out some mistakes in a GPQA question, so I figured this would be a good opportunity to discuss how we interpret benchmark scores, and what our goals should be when creating benchmarks.
We are thrilled to announce Colleen McKenzie as our new Executive Director.
Read about it from Deger Turan: ai.objectives.institute/blog/colleen-m…
🚨📄 Following up on 'LMs Don't Always Say What They Think', Miles Turpin et al. now have an intervention that dramatically reduces the problem! 📄🚨
It's not a perfect solution, but it's a simple method with few assumptions and it generalizes *much* better than I'd expected.
Two new preprints by CDS Jr Research Scientist David Rein and CDS Research Scientist Julian Michael, working with CDS Assoc. Prof. Sam Bowman, aim to enhance the reliability of AI systems through innovative debate methodologies and new benchmarks.
nyudatascience.medium.com/pioneering-ai-…
Looking for something to check out on the last day of #NeurIPS2023? Come hang out with EleutherAI at SoLaR @ NeurIPS2023.
Stella Biderman is speaking on a panel, and Jacob Pfau, Alex Infanger, Abhay Sheshadri, Ayush Panda, Curtis Huebner, and Julian Michael have a poster.
Room R06-R09
We'll be presenting this as a poster at 4pm today!!🕓 Come hear what ambiguity is all about, how well LMs handle it (hint: much room for improvement!), and how it relates to annotator disagreement #EMNLP2023