06-19-2017 07:44 AM
I am only dealing with Parquet and Avro, luckily, not text. And yes, I was referring to the key partitions in the files. Sorry for going off topic, but I'm still quite new to Spark and the Hadoop ecosystem in general, so I'm still getting a feel for everything. To clarify: the partitions of RDDs/DataFrames are different from the key-based partitions of the files? I had always thought they were the same.
06-19-2017 07:33 AM
I should have mentioned it in the first post, but I need to maintain existing partitions as they are, so I need to compact files within partitions.
06-19-2017 06:46 AM
I'm running into issues with lots of small Avro and Parquet files being created and stored in my HDFS, and I need a way to compact them through Spark and its native libraries. The standard methods for this seem to be coalesce, or the Impala approach of inserting into a new table and then inserting back, but are there any better methods that have come onto the scene, or anything more Spark-centric?
Labels:
- Apache Spark