Member since 03-27-2016
47 Posts, 1 Kudos Received, 0 Solutions
12-08-2016
03:24 AM
Did you look at jsonRDD? Something like this:
val jsonSchemaRDD = sqlContext.jsonRDD(jsons) // Pass in the RDD directly
jsonSchemaRDD.registerTempTable("testjson")
sqlContext.sql("SELECT * FROM testjson where .... ").collect
06-09-2016
04:23 PM
The Spark job runs fine now. I used:
spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar <file.py>
06-01-2016
06:43 PM
1 Kudo
You can set the runtime parameters within the Hive shell, or pass them through your script, as mentioned by Pranay. Also, if you are using Tez, this article on performance tuning may come in handy: https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html
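In case it helps, here is a minimal sketch of the "pass them through your script" option. Inside the Hive shell the equivalent is a `SET key=value;` statement; from a script, the Hive CLI accepts `--hiveconf key=value` together with `-f <file>`. The helper name `build_hive_command`, the file name `query.hql`, and the example property values are my own assumptions for illustration, not something from this thread.

```python
import shlex


def build_hive_command(script_path, settings):
    """Build a Hive CLI argument list that applies each setting via --hiveconf.

    script_path and settings are illustrative inputs (hypothetical names).
    """
    args = ["hive"]
    for key, value in settings.items():
        # Each runtime parameter becomes its own --hiveconf key=value pair.
        args += ["--hiveconf", f"{key}={value}"]
    # -f tells the Hive CLI to run the statements in the given file.
    args += ["-f", script_path]
    return args


# Example values only; tune them for your own cluster.
cmd = build_hive_command(
    "query.hql",
    {
        "hive.execution.engine": "tez",
        "hive.exec.reducers.bytes.per.reducer": "134217728",
    },
)
print(shlex.join(cmd))  # a shell-ready command line (needs Python 3.8+)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a box that has the Hive client installed.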
05-30-2016
08:55 AM
Thanks Kuldeep, I am now able to run Hive queries by putting them in a file.
06-02-2016
06:37 AM
Does anyone know why my Pig/Hive jobs are not working? I followed the HDP automated install on a single VMware VM with 16 GB, and everything looks fine (all services are green in Ambari). But when I try to use Pig/Hive in the Ambari View, the jobs all fail, stopped at 0% complete. I also tried logging in to the terminal and manually running the Pig example wordcount, and the problem is the same. Has anyone had the same problem? Is one VM not enough for an Ambari HDP cluster? (If so, why does the sandbox work fine?) Thanks
01-10-2018
04:16 PM
I solved the same problem, and the root cause is what @Pradeep Bhadani described. The Hive shell needs access to whichever YARN container is running the Hive session process, and that container could be running anywhere in the cluster (on any node that has a NodeManager). So make sure you have access to all nodes. Also check that the Hive shell client box has DNS resolution for all hostnames, because the container node is returned as a hostname, not an IP.
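A quick way to run the DNS check from the client box is a small script like the sketch below. It only tests name resolution, not network access to the nodes. The function name `check_hostnames` and the hostnames in the usage comment are hypothetical; substitute your own NodeManager hosts.

```python
import socket


def check_hostnames(hostnames, resolve=socket.gethostbyname):
    """Try to resolve each hostname.

    Returns (resolved, unresolved): a dict of hostname -> IP address and
    a list of hostnames that could not be resolved. The resolve argument
    exists only so the lookup can be swapped out (for example in a test).
    """
    resolved = {}
    unresolved = []
    for name in hostnames:
        try:
            resolved[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError.
            unresolved.append(name)
    return resolved, unresolved


# Usage (hypothetical hostnames, replace with your NodeManager nodes):
#   ok, failed = check_hostnames(["nm1.example.com", "nm2.example.com"])
#   print("resolved:", ok)
#   print("NOT resolved:", failed)
```

Any name that lands in the unresolved list needs a DNS entry (or an /etc/hosts entry) on the client box.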