Jim Fan (@DrJimFan)'s Twitter Profile
Jim Fan

@DrJimFan

@NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.

ID:1007413134

Link: https://jimfan.me
Joined: 12-12-2012 22:11:27

3.6K Tweets

238.1K Followers

2.9K Following

Fun fact: Transformer was almost named “CargoNet” by Noam. I’m glad he was outvoted and history took a different turn. 😅

Eons ago, the first creatures developed hardware to turn light into sight.
Then their small neural nets turn sight into insight.
Then large neural nets turn insight into foresight, enabling reasoning, planning, and actions.
They become embodied agents that learn to perceive and

Google I/O. Some thoughts: the model seems to be multimodal in, but not multimodal out. Imagen-3 and music gen models are still detached from Gemini as standalone components. Merging all modality I/O natively is the inevitable future:

- enables tasks like 'use a more robotic

The volume of generated pixels will scale exponentially in the near future. They will be in images, video clips, movies, entire TV show seasons, games, VR/AR, and beyond. Entire realities will be synthesized upon prompts.

I stand corrected: GPT-4o does NOT natively process video streams. The blog says it only takes image, text, and audio. That's sad, but the principle I stated still holds: the right way to make a video-native model efficient is to co-develop the streaming codec on the edge device.

The best time to work on a moonshot project like humanoid robotics is when only 20% of the experts believe in it 🦾🦾🦿🦿

Poll at ICRA 2024, the top academic robotics conference in the world.

Here’s my take on the Bitter Lesson. It’s a guiding principle on how to develop good research taste. Strongly prefer simple & elegant ideas to complex ones, even if the latter score a bit higher on the leaderboard. Develop your research with a mental cluster of 1000 GPUs, even if

Does AlphaZero count as training on synthetic data? There’s no human grandmaster data at all. AlphaZero expands its strategies & wisdom indefinitely with self-driven exploration and compute. The input is just a simple Go/Chess simulator that implements the game rules.
