Apache Parquet (@ApacheParquet) Twitter Tweets • TwiCopy

Apache Parquet

@ApacheParquet

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
It provides high performance compression

ID:1342646282

Website: https://parquet.apache.org
Joined: 10-04-2013 19:07:49

383 Tweets

8.7K Followers

26 Following

Julien Le Dem

@J_

5 years ago

PSA: If you use the page-level statistics in Apache Parquet please chime in on JIRA: issues.apache.org/jira/browse/PA…


Raniere Silva

@rgaiacs

5 years ago

Last speaker in the #europython scientific room before lunch is Peter Hoffmann, talking about #Pandas and #Dask to work with large datasets in Apache Parquet.


Gyula Fora

@GyulaFora

5 years ago

Gabor Hermann bol Apache Kafka Apache Parquet Apache Flink bol_Techlab Have a look at the Apache Flink bucketing sink rework for the upcoming release and the Parquet writer ;)


Rajat

@rajatk95

5 years ago

Stack Overflow Apache Spark Can someone answer this: why is the Apache Parquet format faster than other columnar storage such as HBase, Kudu, etc.? stackoverflow.com/q/48761227/318…


Lee Blum

@theLeeBlum

5 years ago

My talk from the DMBI 2018 Conference at Ben-Gurion University of the Negev about our journey at @Verint_Cyber to #BigData #Cyber Analytics on Apache Hadoop Apache Spark Apache Kafka Apache Parquet is available at youtube.com/watch?v=nh-JyY… . Thanks everyone for attending!


In one month from now I'll be speaking on Verint big data journey with Apache Spark Apache Kafka and Apache Parquet at the #StrataData Conference in London. If you're there, drop by! conferences.oreilly.com/strata/strata-…


Renee Yao

@ReneeYao1

6 years ago

Join the #GPU accelerated #analytics and #ML revolution. ApacheArrow Apache Parquet and GPU Open Analytics Initiative #GTC18


lucien fregosi

@lulufrego

6 years ago

Great benchmark comparing Apache Parquet on #hdfs and Apache Kudu: blog.clairvoyantsoft.com/guide-to-using…
In short, Kudu is faster than Parquet for random-access queries such as CRUD operations, but slower for analytics queries.


Julien Le Dem

@J_

6 years ago

If you’re a company using open source projects and not sure how to contribute, a release engineer would be a tremendous help. It’s hard to do this properly part time. I have a specific project in mind, if you need a hint.


Mustafa Akın 🍉

@mustafaakin

6 years ago

You do not need Spark to create Apache Parquet files; you can use plain Java, and it can even fit in AWS Lambda for a serverless solution:
engineering.opsgenie.com/analyzing-aws-…


nuvolatech

@nuvola_tech

7 years ago

Learn how to use Hive views for advanced schema evolution #hive blog.nuvola-tech.com/2017/02/schema…


Jeeva

@Jeeva_G

6 years ago

Is there a way to #sqoop from MSSQL to #s3 directly as Parquet? #awsemr Apache Parquet Apache Hadoop #bigdata #datalake


Lee Blum

@theLeeBlum

6 years ago

I'll be speaking at #StrataData Conference this May in London, and share our journey in one of our many #BigData adventures with Apache Spark. You're all invited! conferences.oreilly.com/strata/strata-… Apache Hadoop Apache Kafka Apache Parquet O'Reilly Strata Data & AI Conference


f0nzie@OilGasAnalytics

@fonhzie

6 years ago

I wonder if we have Apache Parquet in #rstats x.com/ylogx/status/9…


Shubham Chaudhary

@ylogx

6 years ago

Apache Parquet ApacheArrow Also the file size went down from 10Gigs to 3Gigs without any compression.
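
The tweet above does not say why the file shrank without compression. One plausible contributor (an assumption, not something the author states) is that Parquet stores each column separately and can dictionary-encode and run-length-encode repeated values. The toy sketch below illustrates that idea in plain Python; the function names and sample column are hypothetical, and Parquet's real on-disk encoding (an RLE/bit-packing hybrid) is more involved.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> position in `dictionary`
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(indices)]


def decode(dictionary, runs):
    """Invert the two steps above to recover the original column."""
    return [dictionary[index] for index, length in runs for _ in range(length)]


# Hypothetical column with many repeated string values.
column = ["London"] * 4 + ["Paris"] * 3 + ["London"] * 2
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

print(dictionary)  # distinct values stored once
print(runs)        # three short pairs instead of nine strings
```

In a CSV file every repeated string is written out in full on every row, so a column like this one costs far more bytes as text than as a small dictionary plus a few runs.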


Shubham Chaudhary

@ylogx

6 years ago

Working with a 10 GB CSV file. Pandas read_csv took 16 mins to load the CSV into memory. Converted it to Apache Parquet with ApacheArrow. It took 30 sec to read into a pyarrow table and 16 sec to convert to a pandas dataframe.

16mins => 46sec!

tech.blue-yonder.com/efficient-data…