Member since
09-15-2017
09-18-2017
05:08 AM
Thank you for your replies. I am already following the methods described above. I was wondering if there is a way to use the property hive.merge.sparkfiles=true, which takes care of combining small files automatically.
09-15-2017
01:38 PM
I am reading a lot of CSV files from S3 via Spark and writing them into a Hive table as ORC. The write produces a lot of small files, and I need to merge them. I tried setting the property with sqlContext.sql("set hive.merge.sparkfiles=true"), but it has no effect. The code is given below. Please help.
sqlContext.sql("set hive.merge.sparkfiles=true")
sqlContext.sql("set hive.merge.smallfiles.avgsize=128000000")
sqlContext.sql("set hive.merge.size.per.task=128000000")
new_df.registerTempTable("new_df")
sqlContext.sql("INSERT OVERWRITE TABLE db.tablename " +
    "PARTITION(`week`='W_1617') " +
    "SELECT col1, col2 FROM new_df")
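For reference, one workaround that is sometimes used when the merge settings have no effect is to reduce the number of partitions before the insert, so that each output file lands near the 128000000-byte target used above. The following is only a minimal sketch in Python: the helper target_num_files and the variable estimated_bytes are hypothetical names, not part of Spark or Hive, and the size estimate is an assumption (ORC output is usually smaller than the CSV input, so the result is a rough upper bound).

```python
# Target file size, taken from hive.merge.size.per.task in the post above.
TARGET_FILE_BYTES = 128000000


def target_num_files(total_bytes, target_file_bytes=TARGET_FILE_BYTES):
    """Return how many files are needed so each holds at most target_file_bytes.

    Hypothetical helper: uses integer ceiling division and never returns
    fewer than one file.
    """
    if total_bytes <= 0:
        return 1
    return (total_bytes + target_file_bytes - 1) // target_file_bytes


# Hypothetical usage with the DataFrame from the post (needs a running
# Spark context, so it is left commented out here):
# n = target_num_files(estimated_bytes)   # estimated_bytes: assumed data size
# new_df.coalesce(n).registerTempTable("new_df")
```

With the partition count reduced this way, the INSERT OVERWRITE statement itself stays unchanged. Whether coalesce or repartition is the better choice depends on how evenly the data is already distributed.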
Labels: Apache Hive, Apache Spark