Created 01-10-2017 10:30 PM
I'm using Spark to import data from a PostgreSQL database on my host machine into the HDP sandbox, and I get this error every time.
Here is the spark-submit call:
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0,com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1 \
  --master yarn \
  --deploy-mode client \
  --jars ${L_ALL_JARS} \
  --class ${L_CLASS} \
  bdi.spark.jar ${HDM_CUSTOMER_KEY} yarn-client
And the Scala code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName(args(2))
val sc = new SparkContext(sparkConf)
val hive = new HiveContext(sc)
// opts is a Map of JDBC options (url, dbtable, user, password, driver)
val df = hive.read.format("jdbc").options(opts).load.cache
df.show()
The error doesn't occur until I call an action on the DataFrame, i.e. show().
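In case it matters, opts is just a Map of JDBC options along these lines; the host, table, and credentials below are placeholders, not my real values:

// Hypothetical JDBC options, only to show the shape of the map
val opts = Map(
  "url"      -> "jdbc:postgresql://<host-ip>:5432/<database>",
  "dbtable"  -> "customer",
  "user"     -> "<db-user>",
  "password" -> "<db-password>",
  "driver"   -> "org.postgresql.Driver"
)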
Created 01-11-2017 12:07 AM
Spark uses lazy evaluation, so show() is the first point where it actually tries to connect to the database.
Can you access that database from the sandbox command line? Check for errors there.
Also check the PostgreSQL permissions and the sandbox port mapping / firewalls.
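If it helps, a quick way to check the connection from the sandbox itself is a few lines in spark-shell. This is just a sketch, assuming the PostgreSQL JDBC jar is on the classpath (e.g. passed with --jars), with the URL and credentials replaced by your own:

// Minimal JDBC connectivity check from the sandbox (URL and credentials are placeholders)
import java.sql.DriverManager
Class.forName("org.postgresql.Driver")
val conn = DriverManager.getConnection(
  "jdbc:postgresql://<host-ip>:5432/<database>", "<db-user>", "<db-password>")
println("connected: " + !conn.isClosed)
conn.close()

If that fails with a timeout or a pg_hba.conf error, the problem is on the network / permissions side rather than in Spark.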
Created 01-11-2017 05:31 PM
I changed the code to read from Hive instead and still got the same error.
val sparkConf = new SparkConf().setAppName("test-spark").set("spark.eventLog.enabled", "true") val sc = new SparkContext(sparkConf) val sql = new HiveContext(sc) sql.sql("SELECT * FROM foodmart.customer").show()
I also added these args to the call, but still no luck.
--num-executors 4 \
--executor-memory 1G \
--driver-memory 1G \
Created 01-11-2017 05:36 PM
I changed the call to include the args below and it worked.
--num-executors 4 \
--executor-memory 512M \
--driver-memory 512M \
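For completeness, the executor settings can presumably also go into the SparkConf instead of the spark-submit flags; this is a rough sketch, not verified on the sandbox. Driver memory still has to be passed on the command line in yarn-client mode, since the driver JVM is already running by the time the conf is read.

// Rough programmatic equivalent of the working executor flags (sketch only)
val sparkConf = new SparkConf()
  .setAppName("test-spark")
  .set("spark.executor.instances", "4")  // --num-executors 4
  .set("spark.executor.memory", "512m")  // --executor-memory 512M
val sc = new SparkContext(sparkConf)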