Support Questions

ravi_bansal · ‎08-04-2016

I am using Spark with Hive in my project. In the Spark job, I am doing an INSERT OVERWRITE into an external table that has partition columns. The Spark job runs fine without any errors, and I can see in the web UI that all tasks for the job have completed.

Now comes the painful part: I can see in the logs that the Spark processing is complete and Hive is now trying to move the HDFS files from the staging area to the actual directory of the Hive table. This is taking forever. Any inputs on how to fix this would be highly appreciated. Please let me know if you want more details.

Note: when I run the same INSERT OVERWRITE logic directly from a Hive script, it completes within a few minutes (the execution engine is Tez).

gopalv · ‎08-04-2016

Can you do a "dfs -ls" on the output of the Spark job? The total number of files might be very different between Spark SQL and Hive-Tez.
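One way to compare the two outputs is to count files per partition directory from a recursive listing. The following is only a minimal sketch: it assumes the listing comes from `hdfs dfs -ls -R <table directory>` in the usual eight-column layout, and the helper name `count_files_per_dir` and the sample paths are made up for illustration.

```python
import posixpath
from collections import Counter


def count_files_per_dir(ls_output):
    """Count regular files per parent directory in `hdfs dfs -ls -R` output.

    Assumes each entry has eight whitespace-separated columns:
    permissions, replication, owner, group, size, date, time, path.
    """
    counts = Counter()
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Skip headers, blank lines, and directories (permissions start with "d").
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        counts[posixpath.dirname(fields[7])] += 1
    return counts


# Hypothetical listing; real input would be captured from `hdfs dfs -ls -R`.
SAMPLE_LISTING = """\
drwxr-xr-x   - hive hdfs          0 2016-08-04 10:00 /tmp/example_table/dt=2016-08-01
-rw-r--r--   3 hive hdfs       1024 2016-08-04 10:01 /tmp/example_table/dt=2016-08-01/part-00000
-rw-r--r--   3 hive hdfs       2048 2016-08-04 10:01 /tmp/example_table/dt=2016-08-01/part-00001
drwxr-xr-x   - hive hdfs          0 2016-08-04 10:00 /tmp/example_table/dt=2016-08-02
-rw-r--r--   3 hive hdfs        512 2016-08-04 10:02 /tmp/example_table/dt=2016-08-02/part-00000
"""

if __name__ == "__main__":
    per_dir = count_files_per_dir(SAMPLE_LISTING)
    for directory, n in sorted(per_dir.items()):
        print(directory, n)
    print("total files:", sum(per_dir.values()))
```

Running the same count on the Spark SQL output and on the Hive-Tez output should show whether one of them writes many more files per partition.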

ravi_bansal · ‎08-04-2016

Thanks Gopal. I don't have the count of files produced by Hive as of now; I am trying to get that. But Spark SQL produced 400-odd files before it got stuck, and had it run further it might have produced more. Do you think the number of files produced by Spark SQL is why it is taking so much time?

ravi_bansal · ‎08-05-2016

Spark SQL is producing around 2200 files, whereas Tez is producing around 60 files.
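If the file count does turn out to be the cause, one commonly used way to reduce it is to add a DISTRIBUTE BY on the partition columns, so that all rows for a given partition value are written by a single task. This is only a sketch of that idea and not a fix confirmed in this thread: the helper `build_insert_overwrite` and the table and column names are hypothetical, and it assumes a dynamic-partition insert (which typically needs `hive.exec.dynamic.partition.mode=nonstrict`).

```python
def build_insert_overwrite(target, source, data_cols, partition_cols):
    """Build a dynamic-partition INSERT OVERWRITE that distributes rows
    by the partition columns. Partition columns go last in the SELECT
    list, as dynamic-partition inserts expect."""
    select_list = ", ".join(list(data_cols) + list(partition_cols))
    part_list = ", ".join(partition_cols)
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({part_list}) "
        f"SELECT {select_list} FROM {source} "
        f"DISTRIBUTE BY {part_list}"
    )


# Hypothetical table and column names, for illustration only.
sql = build_insert_overwrite(
    target="db.events_ext",
    source="db.events_staging",
    data_cols=["id", "payload"],
    partition_cols=["dt"],
)

if __name__ == "__main__":
    print(sql)
    # With an existing HiveContext (assumed here to be named sqlContext),
    # the statement would be submitted as: sqlContext.sql(sql)
```

With this layout the number of output files should be roughly the number of distinct partition values rather than the number of tasks times partitions, at the cost of less write parallelism when a few partitions are very large.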

ravi_bansal · ‎08-05-2016

It looks very similar to these issues which other people have faced: https://issues.apache.org/jira/browse/HIVE-13382

http://mail-archives.apache.org/mod_mbox/hive-user/201507.mbox/%3CCAG97e2E=0DQKPFSz1Gmy9=0te3i4uU0PL...

Is this patch available in HDP 2.3.4?
