Apache Parquet (@ApacheParquet)'s Twitter Profile
Apache Parquet

@ApacheParquet

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
It provides high performance compression

ID:1342646282

Link: https://parquet.apache.org
Joined: 10-04-2013 19:07:49

383 Tweets

8.7K Followers

26 Following

Rajat (@rajatk95)

Stack Overflow Apache Spark Can someone answer this -> why is the Apache Parquet format faster than other columnar storage like HBase, Kudu, etc.? stackoverflow.com/q/48761227/318…

Lee Blum (@theLeeBlum)

My talk from the DMBI 2018 Conference at Ben-Gurion University of the Negev about our journey at @Verint_Cyber to analytics on Apache Hadoop, Apache Spark, Apache Kafka, and Apache Parquet is available at youtube.com/watch?v=nh-JyY… Thanks everyone for attending!

Lee Blum (@theLeeBlum)

One month from now I'll be speaking on Verint's big data journey with Apache Spark, Apache Kafka, and Apache Parquet at the Conference in London. If you're there, drop by! conferences.oreilly.com/strata/strata-…

lucien fregosi (@lulufrego)

Great benchmark between Apache Parquet on and Apache Kudu blog.clairvoyantsoft.com/guide-to-using…
In short, Kudu is faster than Parquet for random-access queries like CRUD operations but slower for analytics queries.

Julien Le Dem (@J_)

If you’re a company using open source projects and not sure how to contribute, a release engineer would be a tremendous help. It’s hard to do this properly part time. I have a specific project in mind, if you need a hint.

Mustafa Akın 🍉 (@mustafaakin)

You do not need Spark to create Apache Parquet files; you can use plain Java, and it can even fit in AWS Lambda for a serverless solution:
engineering.opsgenie.com/analyzing-aws-…

Lee Blum (@theLeeBlum)

I'll be speaking at Conference this May in London, and share our journey in one of our many adventures with Apache Spark. You're all invited! conferences.oreilly.com/strata/strata-… Apache Hadoop Apache Kafka Apache Parquet O'Reilly Strata Data & AI Conference

Shubham Chaudhary (@ylogx)

Working with 10 GB of CSV data. Pandas read_csv took 16 mins to load the CSV into memory. Converted to Apache Parquet with Apache Arrow. It took 30 secs to read into a pyarrow table and 16 secs to convert to a pandas dataframe.

16mins => 46sec!

tech.blue-yonder.com/efficient-data…

Julien Le Dem (@J_)

I'm proud to have shared the stage with Doug Cutting for this podcast on serialization formats. A great pleasure.
x.com/dataengpodcast…
