Mehdi Ouazza (@mehd_io)'s Twitter Profile
Mehdi Ouazza

@mehd_io

Data Engineer based in Berlin
Writer on Substack, making videos on YouTube

ID:2337086364

https://www.youtube.com/@mehdio
Joined 10-02-2014 18:23:32

600 Tweets

1.3K Followers

550 Following

Mehdi Ouazza (@mehd_io)

Last week Seattle, this week Prague: DuckDB user meetups are happening, with many great talks! I'll be sharing them on YouTube soon.
Thanks to GoodData for hosting this in Prague!

Mehdi Ouazza (@mehd_io)

A common pattern for ensuring data quality is the 'Write-Audit-Publish' method. You essentially serve the final dishes only once a data taster has checked them for any 'poison' and validated them 🧑‍🍳.
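A minimal sketch of the three steps, assuming a SQLite database accessed through Python's built-in `sqlite3` module. The table names (`orders`, `orders_staging`), the columns, and the two audit checks (no NULLs, no negative amounts) are hypothetical and only illustrate the flow: write the new batch to a staging table, audit it there, and publish it to the table consumers read only if every check passes.

```python
import sqlite3


def write_audit_publish(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> bool:
    """Hypothetical WAP flow: stage rows, audit them, publish only if checks pass."""
    cur = conn.cursor()

    # Write: land the new batch in a staging table that consumers never query.
    cur.execute("DROP TABLE IF EXISTS orders_staging")
    cur.execute("CREATE TABLE orders_staging (id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)

    # Audit: the "data taster" checks the staged batch for poison.
    nulls = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE id IS NULL OR amount IS NULL"
    ).fetchone()[0]
    negatives = cur.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"
    ).fetchone()[0]
    if nulls or negatives:
        # Audit failed: discard the staged rows; the published table is untouched.
        conn.rollback()
        return False

    # Publish: promote the audited staging table to the table consumers read.
    cur.execute("DROP TABLE IF EXISTS orders")
    cur.execute("ALTER TABLE orders_staging RENAME TO orders")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
print(write_audit_publish(conn, [(1, 10.0), (2, 5.5)]))  # True: batch published
print(write_audit_publish(conn, [(3, -1.0)]))            # False: negative amount caught
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2: bad batch never served
```

If the audit fails, the staged rows are rolled back and the existing `orders` table keeps its last published contents, so consumers never see the bad batch. In a lakehouse setup the same idea is usually expressed with branches or snapshots of a table format rather than a rename.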

Mehdi Ouazza (@mehd_io)

Often, the data team is the last to be aware of an issue, usually hearing about it from consumers.
If you can't react before they find out, you lose a bit of trust. Probably forever.
Having good data observability is key to preventing this.

Mehdi Ouazza (@mehd_io)

Can LLMs be useful for data processing?
While we can store useful common prompts, how do we handle repeatability and costs at scale? 🤔

Mehdi Ouazza (@mehd_io)

The table format war is still on, but this is a step forward in letting folks easily switch between execution engines depending on their needs.

Mehdi Ouazza (@mehd_io)

It's not news that Stack Overflow traffic has dropped drastically since ChatGPT's release.
If you can't beat them, join them?
stackoverflow.co/company/press/…

Mehdi Ouazza (@mehd_io)

I def want to get my hands on LLMs for web scraping, and I found ScrapeGraphAI. They have pretty decent documentation; let's see 👀 github.com/VinciGit00/Scr…

Mehdi Ouazza (@mehd_io)

SQL vs DataFrame.
I usually use DataFrames (with Python) for all ingestion work, and SQL for most of the transformation.

MotherDuck (@motherduck)

Join Brian Olsen from Tabular & Mehdi Ouazza for a fun, informative session diving into table formats, with a special focus on Iceberg & DuckDB.
Explore the latest advancements & enjoy a hands-on demo showcasing the power of these tools.
Get ready to quack… and code!

Mehdi Ouazza (@mehd_io)

Scare a data engineer during an interview in one line 💬
'Our data sources are primarily CSVs and Excel sheets hosted on an FTP server.'

Mehdi Ouazza (@mehd_io)

Tech vs. traditional companies: IT as a profit center vs. an expense.
Likewise, product teams vs. data teams: revenue drivers vs. cost centers.
