Alex Reibman 🖇️ (@AlexReibman)

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Ever since OpenInterpreter, we've all been wondering just how effective agents can be if you give them a computer.

Now we have a proper benchmark. Let's take a look (🧵):

special thanks (@specialtha58233)

We need a railway to the ocean. A cheap railway. Now we are begging the Chinese. Why are we begging? We can build it ourselves if we are disciplined and we do not squander money on wages, allowances, benchmarking, and so on - President Museveni

Greater Vancouver Board of Trade (@BoardofTrade)

The inaugural Benchmarking Greater Vancouver Report sheds light on our region's economic performance, infrastructure, and livability.

While our branding is strong, the region’s performance in crucial factors lags behind its peers. 1/6
boardoftrade.com/news/58-news/2…

Rwanda Investigation Bureau (@RIB_Rw)

The RIB Team, led by Director General of Crime Investigation Twagirayezu Jean Marie, received members of the anti-corruption commission of South Sudan who are in Rwanda for a study tour aimed at benchmarking strategies to combat corruption and injustices.

SomePLAOSINT (@someplaosint)

1 | Benchmarking 🇨🇳PLAN's single carrier battle group (CBG) against the entirety of top European navies (🇫🇷&🇮🇹).
->The Liaoning CBG has more VLS cells than either one of the two European navies
->VLS count is a rough (sometimes biased) proxy of firepower
->Caveats apply

NTV UGANDA (@ntvuganda)

We need a railway to the ocean. A cheap railway. Now we are begging the Chinese. Why are we begging? We can build it ourselves if we are disciplined and we do not squander money on wages, allowances, benchmarking and so on - President Museveni

#LabourDay #NTVNews
Ethan Mollick (@emollick)

There is a mysterious new model called gpt2-chatbot accessible from a major LLM benchmarking site. No one knows who made it or what it is, but I have been playing with it a little and it appears to be in the same rough ability level as GPT-4. A mysterious GPT-4 class model? Neat!

Taki Udon (@TakiUdon_)

Switch Lite OLED Screen Update

I just finished benchmarking some mod screen samples. The new screens are much better than my original. I'll try to share more information as I get it

Calibration data from Portrait Displays: portrait.com

McGrumbleSnatcher (@SnatchMickeyG)

Did some benchmarking to get the minimum specs for TNaEI. Definitely at least 2 GB of RAM is required for smooth gameplay. I'll talk more about it on my devlog

Windows Latest (@WindowsLatest)

Surface Laptop 6 with Snapdragon X Elite, 16GB RAM base and Windows 11 24H2 spotted in four new benchmarks: windowslatest.com/2024/04/25/sur…

Geekbench listings reveal that the Surface Laptop 6 ARM edition has undergone benchmarking. The device achieved nearly 14,100 in multi-thread…

BELLA II (@BELLA_Programme)

Ready to give the BELLA II project a boost? Join our call for consultants to develop a benchmarking study of investment models! Apply before May 28 and be part of the change! Details 👉 shorturl.at/iO289
🚀🤝RedCLARA

Lindsay Fernandes (@lindsaydoucet)

Excited to be at British Columbia Institute of Technology (BCIT)’s downtown campus to launch our newest report: Benchmarking Greater Vancouver 2024! Lots of interesting findings to explore in this report led by Greater Vancouver Board of Trade & TheBusinessOfCities 📝

Stay tuned!

intinyadeh (@intinyadeh)

The phrase 'any concrete suggestions?' was, in context, a reply to one netizen about benchmarking.

I realize we can't burden taxpayers with doing the thinking; that is the job of the officials who are paid for it.

The Ministry of Finance (Kemenkeu) has long had 'Kemenkeu Mendengar' to collect public input and follow up on problems.
