Created on 03-01-2017 06:25 AM - edited 09-16-2022 04:10 AM
Hello Folks!
I'm trying to run Spark SQL with this command: sudo -i -u hdfs spark-sql --conf spark.executor.memory=1g --conf spark.akka.frameSize=1024 --conf spark.driver.memory=10g --conf spark.dynamicAllocation.initialExecutors=10 --conf spark.dynamicAllocation.maxExecutors=10 --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=1024M" --conf spark.kryoserializer.buffer.max=1024m -f hdfs:////uploadFiles/BI/sinapse.sql
But terminal returns: "-bash: spark-sql: command not found"
What am I doing wrong?
I'm using CDH 5.10.0.
Created 03-01-2017 11:29 AM
I have not seen or heard of a spark-sql binary for launching Spark jobs. My best guess is that it is used in conjunction with the Spark Thrift server. That feature of Spark is not included or supported in CDH (that is not to say you can't use it, but the spark-sql binary will not exist by default).
If you have already installed the Spark Thrift server, then you need to add the Spark SQL CLI as well (and add its directory to your $PATH if you want to run it without typing the full path).
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html
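As a rough sketch of the $PATH step, something like the following could be used once a spark-sql binary actually exists on the node. The function name add_to_path and the directory /path/to/spark/bin are placeholders for illustration, not real CDH locations. Since the original command runs through "sudo -i -u hdfs", the change would have to be made in the hdfs user's login environment to take effect there.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                          # already present, nothing to do
    *) PATH="$1:$PATH"; export PATH ;;
  esac
}

# Assumption: replace this placeholder with the directory that really
# contains the spark-sql binary on your node.
add_to_path "/path/to/spark/bin"

# Check whether the shell can now resolve the command.
if command -v spark-sql >/dev/null 2>&1; then
  echo "spark-sql resolves to: $(command -v spark-sql)"
else
  echo "spark-sql is still not on PATH"
fi
```

If the last line reports that spark-sql is still not on PATH, the binary is probably not installed at all, which matches the "command not found" error in the question.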
Created 03-02-2017 04:39 PM
1. Go to Cloudera Manager -> Spark -> Instances and identify the node where the Spark server is installed.
2. Log in to that node from the CLI and go to "/opt/cloudera/parcels/CDH-<version>/lib/spark/bin". It lists binaries such as spark-shell, pyspark, and spark-submit, which are used to start Spark sessions and submit jobs. If spark-sql is there, you can run the command you mentioned. In your case the spark-sql binary is most likely missing, which is why you get this error. You will need to talk to your admin.
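The check in step 2 can be scripted. This is only a sketch: has_binary is a hypothetical helper, and the default SPARK_BIN assumes the usual parcel layout where /opt/cloudera/parcels/CDH is a symlink to the active parcel. Set SPARK_BIN to the versioned path if your cluster differs.

```shell
# Hypothetical helper: succeeds if an executable file named $2 exists in directory $1.
has_binary() {
  [ -x "$1/$2" ]
}

# Assumption: the CDH symlink points at the active parcel; override SPARK_BIN otherwise.
SPARK_BIN="${SPARK_BIN:-/opt/cloudera/parcels/CDH/lib/spark/bin}"

# Report which of the usual Spark launchers are present on this node.
for tool in spark-shell pyspark spark-submit spark-sql; do
  if has_binary "$SPARK_BIN" "$tool"; then
    echo "$tool: found in $SPARK_BIN"
  else
    echo "$tool: missing from $SPARK_BIN"
  fi
done
```

If spark-sql is reported missing while the other three are found, the installation itself is fine and the error comes only from the absent CLI.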