05-11-2016 09:59 PM
Thanks @mbalakrishnan, I'm currently running a Spark Streaming job locally that writes to the Hive deployed on my cluster. I have added the hive.merge.sparkfiles property. Will this work on files written with the saveAsTable command?
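For reference, a minimal sketch of setting that property per-session (assuming a `hiveContext` is in scope; note that hive.merge.sparkfiles targets jobs run by Hive's own Spark execution engine, so whether it affects files Spark writes directly via saveAsTable is exactly the open question here):

```scala
// Hypothetical sketch: set the merge property for the current Hive session.
// hive.merge.sparkfiles controls small-file merging for Hive-on-Spark jobs;
// it may not apply to files written directly by Spark's saveAsTable.
hiveContext.sql("SET hive.merge.sparkfiles=true")
```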
05-11-2016 07:25 PM
Thanks, I will make sure the Spark version of the property is set. Thanks for the help. I wonder if, instead of rdd.toDF().saveAsTable, I should be writing INSERT statements; this might force the delta files to be created.
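A minimal sketch of that INSERT-based approach (assuming a `hiveContext` and a DStream named `stream`; `col1` and `col2` are placeholder column names, Spark 1.x API). Whether this actually routes through Hive's transactional writer and produces delta files depends on the Spark version:

```scala
stream.foreachRDD { rdd =>
  val df = rdd.toDF()
  // Register the micro-batch so it can be referenced from Hive SQL.
  df.registerTempTable("batch_tmp")
  // Dynamic-partition INSERT via HiveContext instead of saveAsTable.
  // Note: it is not guaranteed that this produces ACID delta files.
  hiveContext.sql(
    """INSERT INTO TABLE table_name PARTITION (dt)
      |SELECT col1, col2, dt FROM batch_tmp""".stripMargin)
}
```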
05-11-2016 06:56 PM
@mbalakrishnan Thanks, yes, those properties are set. I believe it's something to do with how the data is getting written to Hive via Spark Streaming.
05-11-2016 06:55 PM
@Eric Walk Thanks, yes, you are correct: Spark isn't writing deltas, it's just adding files to the existing partition. Any idea how to get Spark to write the deltas?
-rw-r--r-- 3 cmcguire hdfs 0 2016-05-11 16:36 /test_data/test_test_tbl/_SUCCESS
drwxr-xr-x - cmcguire hdfs 0 2016-05-11 16:40 /test_data/test_tbl/dt=11-05-2016
-rwxr-xr-x 3 cmcguire hdfs 3750 2016-05-11 16:37 /test_data/test_tbl/dt=11-05-2016/part-00000
-rwxr-xr-x 3 cmcguire hdfs 5468 2016-05-11 16:37 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_1
-rwxr-xr-x 3 cmcguire hdfs 8264 2016-05-11 16:38 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_2
-rwxr-xr-x 3 cmcguire hdfs 7068 2016-05-11 16:38 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_3
-rwxr-xr-x 3 cmcguire hdfs 5157 2016-05-11 16:39 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_4
-rwxr-xr-x 3 cmcguire hdfs 10684 2016-05-11 16:39 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_5
-rwxr-xr-x 3 cmcguire hdfs 4796 2016-05-11 16:40 /test_data/test_tbl/dt=11-05-2016/part-00000_copy_6
05-11-2016 11:05 AM (1 Kudo)
Hi, I am currently using Spark Streaming to write to an external Hive table every 30 minutes:
rdd.toDF().write.partitionBy("dt").options(options).format("orc").mode(SaveMode.Append).saveAsTable("table_name")
The issue is that this creates lots of small files in HDFS, like part-00000, part-00000_copy_1, and so on. My table was created with transactions enabled, and I have enabled ACID transactions on the Hive instance; however, I can't see any compactions running, nor does one get created when I force compaction with the ALTER TABLE command. I would expect compaction to run and merge these files, as they are very small, around 200 KB in size. Any ideas or help greatly appreciated.
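In the meantime, one way to at least limit the number of files produced per micro-batch is to coalesce before the write. A sketch of the write above with that change (this trades write parallelism for fewer, larger files; it does not address why compaction isn't running):

```scala
rdd.toDF()
  .coalesce(1) // one output file per micro-batch instead of one per RDD partition
  .write
  .partitionBy("dt")
  .options(options)
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("table_name")
```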
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark