Martin Görner (@martin_gorner) Twitter Tweets • TwiCopy

1 month ago

A nice blog post about Keras 3 from NSF Unidata
unidata.ucar.edu/blogs/news/ent…
The post's conclusion:

'For deep learning training I (Thomas) will be using a Keras 3 API exclusively. It more closely resembles the scikit-learn api and I find it to be easier to explain.'

4 likes, 1 repost

Hassan Hayat 🔥

@TheSeaMouse

1 month ago

Why Google Deepmind's Mixture-of-Depths paper, and more generally dynamic compute methods, matter:

Most of the compute is WASTED because not all tokens are equally hard to predict
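One way to picture dynamic compute is top-k routing: a router scores every token, only the highest-scoring tokens go through the expensive block, and the rest skip it via the residual path. The following is a toy sketch in plain Python, not the paper's implementation. It assumes scalar "tokens", hand-picked router scores, and a hypothetical `block` callable; the name `route_top_k` is illustrative.

```python
from typing import Callable, List, Tuple


def route_top_k(
    tokens: List[float],
    scores: List[float],
    capacity: int,
    block: Callable[[float], float],
) -> Tuple[List[float], List[int]]:
    """Apply `block` only to the `capacity` highest-scoring tokens.

    Routed tokens get `token + score * block(token)` (residual update scaled
    by the router score); all other tokens pass through unchanged.
    """
    # Rank token positions by router score, highest first, and keep the top ones.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:capacity])

    out = []
    for i, tok in enumerate(tokens):
        if i in chosen:
            out.append(tok + scores[i] * block(tok))  # expensive path
        else:
            out.append(tok)  # skip path: no block compute spent here
    return out, sorted(chosen)


# Hypothetical data: 4 tokens, but compute budget for only 2 of them.
tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.125, 0.5, 0.0, 0.25]
routed, chosen = route_top_k(tokens, scores, capacity=2, block=lambda t: 8.0)
print(chosen, routed)
```

With a capacity of 2 out of 4 tokens, the block runs half as often as in a dense layer, which is where the compute saving comes from in this simplified picture.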

Martin Görner

1 month ago

Gemma in Keras: how to build a chatbot and fine-tune it to speak like a pirate 🏴‍☠️🦜. This was a fun demo to make! It runs with Keras on JAX with the new keras.distribution.ModelParallel API.
Colab: bit.ly/gemma-pirate-d…
Video: youtu.be/AzQBFmPDtTI?si…

Martin Görner

2 months ago

Not one but two Gemma competitions are currently live on Kaggle. And we have Keras starter notebooks for both:

kaggle.com/code/awsaf49/p…

kaggle.com/code/awsaf49/k…

Have fun with Gemma! (and check out the prizes: $250,000 in total!)

44 likes, 6 reposts

François Chollet

@fchollet

2 months ago

Keras 3.0.5 and Keras-nlp 0.8.1 now come pre-installed in Kaggle notebooks -- so you can run Gemma without any extra install steps 🚀
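To confirm what a given notebook environment actually ships, the installed versions can be read with the standard library. This is a small sketch; it assumes the PyPI distribution names `keras` and `keras-nlp`, and the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages mentioned in the tweet.
for name in ("keras", "keras-nlp"):
    print(name, installed_version(name) or "not installed")
```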

Jupyter Meowbooks

@untitled01ipynb

2 months ago

François Chollet it's happening!

24 likes, 2 reposts

Martin Görner

2 months ago

This was a lot of fun to demo.
Colab here: bit.ly/gemma-pirate-d…
Fav quote from pirate Gemma: 'It's nice that ye like math!'⚔️💣⚔️

36 likes, 3 reposts

Boris Dayma 🖍️

@borisdayma

3 months ago

My notes reading the Gemma paper:
- arch similar to llama
- 6T tokens for the 7B model!!!
- huge vocab size
- GeGLU for FFN, I wish they ablated the dim used there, people tend to use 4x while I like to be closer to 2.5-3x
- surprised they use Sandwich-Norm, I think Normformer
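For readers unfamiliar with the GeGLU note above, here is a minimal pure-Python sketch of a GeGLU feed-forward layer and how its weight count scales with the expansion factor. It assumes no bias terms and the exact erf-based GELU (a given model may use an approximation); the function names and the `d_model` value are illustrative, not taken from the Gemma paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU using the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(vec: List[float], mat: Matrix) -> List[float]:
    """Row vector (length n) times an n x m matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def geglu_ffn(x: List[float], w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> List[float]:
    """GeGLU FFN: (GELU(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = [gelu(g) for g in matvec(x, w_gate)]
    up = matvec(x, w_up)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(hidden, w_down)


def geglu_ffn_params(d_model: int, expansion: float) -> int:
    """Weight count for the three matrices (gate, up, down), biases ignored."""
    d_hidden = int(d_model * expansion)
    return 3 * d_model * d_hidden


# Compare the expansion factors mentioned in the notes for a hypothetical d_model.
for factor in (2.5, 3.0, 4.0):
    print(factor, geglu_ffn_params(1024, factor))
```

Because a gated FFN has three weight matrices instead of two, the same expansion factor costs 1.5x the parameters of an ungated FFN, which is one reason smaller factors are often paired with gated variants.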

Martin Görner

3 months ago

Keras has added a new distributed training API, supporting full model parallelism for Gemma, and large models in general. It is backed by the XLA compiler in JAX. Code sample here: kaggle.com/code/nilaychau…

23 likes, 6 reposts