09-06-2017 10:10 AM
I get this warning when reading individual JSON files: "WARN datasources.DataSource: Error while looking for metadata directory".
I don't get the warning when reading whole directories instead of individual files.
I found this article, which states that a config moved from the Hive configuration to the Spark configuration in Spark 2 (https://issues.apache.org/jira/browse/SPARK-15034).
I tried submitting my job as "spark-submit --conf spark.sql.warehouse.dir=/user/hive/warehouse job.py", but the warning still appears. (I also tried hdfs://server-name/user/hive/warehouse.)
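For reference, the same setting can also be applied inside job.py instead of on the command line. This is only a minimal sketch, assuming PySpark 2.x; the helper names and the app name "job" are placeholders, and nothing here is confirmed to make the warning go away:

```python
# Sketch only: helper names and the app name are hypothetical placeholders.
WAREHOUSE_DIR = "/user/hive/warehouse"


def warehouse_options(path=WAREHOUSE_DIR):
    """Return the Spark SQL options to apply, as a plain dict."""
    return {"spark.sql.warehouse.dir": path}


def build_session(app_name="job", options=None):
    """Create a SparkSession with the given options applied.

    spark.sql.warehouse.dir is a static setting, so it has to be supplied
    before the first SparkSession in the application is created.
    """
    # Imported here so warehouse_options() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    opts = warehouse_options() if options is None else options
    builder = SparkSession.builder.appName(app_name)
    for key, value in opts.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In job.py this would be used as `spark = build_session()` before any `spark.read.json(...)` call.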
I verified that /user/hive/warehouse exists in HDFS and is world-writable.
According to the referenced article, the spark.sql.warehouse.dir variable is supposed to be set in spark-defaults.conf.
1. I am not sure how to set that through Cloudera Manager.
2. From what I have read, the "--conf" argument to spark-submit should do essentially the same thing, so I am not sure that setting the variable would work even if I knew how to set it through Cloudera Manager.
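On point 2: the Spark configuration documentation gives the precedence as properties set in code first, then spark-submit flags, then spark-defaults.conf. One way to see which value actually reaches the job is to read it back from the running session. A minimal sketch, assuming `spark` is the job's SparkSession; `effective_setting` is a hypothetical helper name:

```python
def effective_setting(spark, key="spark.sql.warehouse.dir"):
    """Return the value the running session actually uses for `key`."""
    # spark.conf is the session's runtime configuration in PySpark 2.x.
    return spark.conf.get(key)
```

Calling `print(effective_setting(spark))` near the top of job.py shows whether the value passed with "--conf" took effect, independently of whether the warning is still logged.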
Thanks in advance!