Member since 10-22-2015 | 5 Posts | 1 Kudos Received | 0 Solutions
03-30-2017
09:55 AM
Thanks for the update, Knut. In our environment, we used Parquet files with Hive external tables on top of them, and the queries were issued from Tableau. I have a question: if you set spark.executor.cores = 1, won't the overall ETL batch job be slow? I mean, won't it lose the concurrency that multiple cores per executor provide? Also, if you are using Spark, won't you see its benefit mainly on data sets that are read repeatedly, and not on data sets that are read only once? Any updates will be greatly appreciated, as another team at my workplace is going to try Hive-on-Spark. Thanks, Sumit
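To make the concurrency question concrete, here is a minimal sketch, assuming the number of tasks Spark can run at once is roughly the number of executors times the cores per executor, divided by spark.task.cpus (which defaults to 1). The function name and the 16-core budget are made up for illustration and do not come from any real cluster.

```python
def concurrent_task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Rough upper bound on tasks running at the same time.

    Assumption: each executor runs executor_cores // task_cpus tasks at once.
    """
    return num_executors * (executor_cores // task_cpus)


# Hypothetical budget of 16 cores, split two different ways.
few_fat = concurrent_task_slots(num_executors=4, executor_cores=4)
many_thin = concurrent_task_slots(num_executors=16, executor_cores=1)
print(few_fat, many_thin)  # prints "16 16"
```

Under this assumption, spark.executor.cores = 1 reduces concurrency only if the number of executors stays the same; whether YARN actually grants the extra executors is a separate question.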
03-29-2017
12:10 PM
1 Kudo
Hi Knut N, We attempted to connect Tableau to our Cloudera cluster with Spark as the execution engine for Hive, and we faced similar problems. The Spark jobs/tasks would never finish. To add some other observations: our Tableau queries would sometimes finish quickly, sometimes take a long time, and sometimes never finish. For the never-ending applications that kept holding resources, I used the yarn application -kill command to kill them. Overall, my experience was not impressive. On the other hand, when we switched back to the default MapReduce execution engine for Hive, our queries always finished on time (around 30+ seconds), the results were returned to Tableau properly, and the ApplicationMaster and all the map/reduce tasks finished successfully every time. Did you issue the SQL from a shell or from outside the cluster? Since we were connecting via Tableau, I suspected Tableau might be part of the problem. Thanks for bringing this up; I would be curious to know more about this issue. We are still planning to adopt Hive-on-Spark for our reporting purposes, but this experience has made us wary of it. Thanks, Sumit
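For anyone who wants to repeat the comparison, the engine can be switched per session through the hive.execution.engine property (mr, tez, or spark). Below is a minimal sketch that only builds the SET statement; the helper name is hypothetical, and how the statement reaches Hive (Beeline, a Tableau initial SQL setting, and so on) is left out because it depends on the setup.

```python
VALID_ENGINES = ("mr", "tez", "spark")  # values accepted by hive.execution.engine


def engine_set_statement(engine: str) -> str:
    """Build the session-level statement that selects the Hive execution engine."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {VALID_ENGINES}")
    return f"SET hive.execution.engine={engine};"


# Fall back to MapReduce for the current session.
print(engine_set_statement("mr"))  # prints "SET hive.execution.engine=mr;"
```

Running the same query once with spark and once with mr in otherwise identical sessions is one way to check whether the hang follows the engine or the client.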