Member since: 08-04-2016
Posts: 4
Kudos Received: 0
Solutions: 0
08-05-2016
09:50 PM
Spark SQL is producing around 2,200 files, whereas Tez is producing around 60 files.
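One way to reason about why the file counts could differ this much is the rough sketch below. It assumes that each write task emits one file for every partition value it happens to see, which is how dynamic-partition writes commonly behave; the row counts, task count, and partition count are made-up illustration values, not numbers from this job. The helper names (`count_output_files`, `round_robin`, `by_partition_value`) are hypothetical and not part of any Spark or Hive API.

```python
# Toy model only: no Spark involved. It counts (task, partition value) pairs,
# under the ASSUMPTION that each task writes one file per partition value it sees.

def count_output_files(rows, num_tasks, assign_task):
    """Return the number of distinct (task, partition value) pairs."""
    files = set()
    for i, part_value in enumerate(rows):
        task = assign_task(i, part_value, num_tasks)
        files.add((task, part_value))
    return len(files)

def round_robin(i, part_value, num_tasks):
    # Rows land on tasks regardless of their partition value.
    return i % num_tasks

def by_partition_value(i, part_value, num_tasks):
    # All rows with the same partition value go to the same task.
    return hash(part_value) % num_tasks

# Illustrative numbers (assumptions): 10,000 rows, 30 partition values, 200 tasks.
rows = [i % 30 for i in range(10_000)]
num_tasks = 200

scattered = count_output_files(rows, num_tasks, round_robin)
clustered = count_output_files(rows, num_tasks, by_partition_value)

print("files when rows are scattered across tasks:", scattered)
print("files when rows are grouped by partition value:", clustered)
```

If this model matches what the job is doing, grouping rows by the partition columns before the insert (for example with DISTRIBUTE BY on those columns) would be one thing to try, since fewer files should mean fewer files to move out of the staging area.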
08-05-2016
06:36 PM
It looks very similar to these issues that other people have faced:
https://issues.apache.org/jira/browse/HIVE-13382
http://mail-archives.apache.org/mod_mbox/hive-user/201507.mbox/%3CCAG97e2E=0DQKPFSz1Gmy9=0te3i4uU0PLGmJqMWFp=NoHQEoyA@mail.gmail.com%3E
Is this patch available in HDP 2.3.4?
08-04-2016
08:42 PM
Thanks, Gopal. I don't have a count of the files produced by Hive yet; I am trying to get that. Spark SQL produced 400-odd files before it got stuck, and had it run further it might have produced more. Do you think the number of files produced by Spark SQL is why it is taking so much time?
08-04-2016
08:07 PM
I am using Spark with Hive in my project. In the Spark job, I am doing an insert overwrite into an external table that has partition columns. The Spark job runs fine without any errors, and I can see in the web UI that all tasks for the job have completed.
Now comes the painful part: I can see in the logs that the Spark processing is complete and Hive is now trying to move the HDFS files from the staging area to the actual directory of the Hive table. This is taking forever. Any input on how to fix this would be highly appreciated. Please let me know if you need more details.
Note: when I run the same insert overwrite logic directly from a Hive script, it completes within a few minutes (the execution engine is Tez).
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark