Created 01-10-2017 10:30 PM
I'm using Spark to import data from a PostgreSQL database on my host machine into the HDP sandbox, and I get this error every time.
Here is the spark-submit call:
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0,com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1 \
  --master yarn \
  --deploy-mode client \
  --jars ${L_ALL_JARS} \
  --class ${L_CLASS} \
  bdi.spark.jar ${HDM_CUSTOMER_KEY} yarn-client
And the Scala code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName(args(2))
val sc = new SparkContext(sparkConf)
val hive = new HiveContext(sc)
// opts is a Map of JDBC options (url, dbtable, user, password, driver)
val df = hive.read.format("jdbc").options(opts).load.cache
df.show()
The error doesn't occur until I call an action on the DataFrame, i.e. show().
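In case it matters, opts is just a Map of JDBC options along these lines; the host, table, and credentials below are placeholders, not my real values:

// Hypothetical JDBC options, only to show the shape of the map
val opts = Map(
  "url"      -> "jdbc:postgresql://<host-ip>:5432/<database>",
  "dbtable"  -> "customer",
  "user"     -> "<db-user>",
  "password" -> "<db-password>",
  "driver"   -> "org.postgresql.Driver"
)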
Created 01-11-2017 12:07 AM
Spark uses lazy evaluation, so show() is the first point where it actually tries to connect to the database.
Can you access that database from the sandbox command line? Check for errors there.
Also check the PostgreSQL permissions and the sandbox port mapping / firewalls.
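If it helps, a quick way to check the connection from the sandbox itself is a few lines in spark-shell. This is just a sketch, assuming the PostgreSQL JDBC jar is on the classpath (e.g. passed with --jars), with the URL and credentials replaced by your own:

// Minimal JDBC connectivity check from the sandbox (URL and credentials are placeholders)
import java.sql.DriverManager
Class.forName("org.postgresql.Driver")
val conn = DriverManager.getConnection(
  "jdbc:postgresql://<host-ip>:5432/<database>", "<db-user>", "<db-password>")
println("connected: " + !conn.isClosed)
conn.close()

If that fails with a timeout or a pg_hba.conf error, the problem is on the network / permissions side rather than in Spark.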
Created 01-11-2017 05:31 PM
I changed the code to read from Hive instead and still got the same error.
val sparkConf = new SparkConf().setAppName("test-spark").set("spark.eventLog.enabled", "true") val sc = new SparkContext(sparkConf) val sql = new HiveContext(sc) sql.sql("SELECT * FROM foodmart.customer").show()
I also added these args to the call, but still no luck.
--num-executors 4 \
--executor-memory 1G \
--driver-memory 1G \
Created 01-11-2017 05:36 PM
I changed the call to include the args below and it worked.
--num-executors 4 \
--executor-memory 512M \
--driver-memory 512M \
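For completeness, the executor settings can presumably also go into the SparkConf instead of the spark-submit flags; this is a rough sketch, not verified on the sandbox. Driver memory still has to be passed on the command line in yarn-client mode, since the driver JVM is already running by the time the conf is read.

// Rough programmatic equivalent of the working executor flags (sketch only)
val sparkConf = new SparkConf()
  .setAppName("test-spark")
  .set("spark.executor.instances", "4")  // --num-executors 4
  .set("spark.executor.memory", "512m")  // --executor-memory 512M
val sc = new SparkContext(sparkConf)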