Explorer
Posts: 16
Registered: ‎01-14-2017

Error when reading individual JSON files

 

Hi,

 

I get this warning when reading individual JSON files: "WARN datasources.DataSource: Error while looking for metadata directory".

 

I don't get the warning when reading whole directories instead of individual files.

 

I found this article, which states that a setting moved from the Hive configuration to the Spark configuration in Spark 2 (https://issues.apache.org/jira/browse/SPARK-15034).

 

I tried submitting my job as "spark-submit --conf spark.sql.warehouse.dir=/user/hive/warehouse job.py", but this does not make the warning go away. (I also tried hdfs://server-name/user/hive/warehouse.)
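
For comparison, here is a minimal PySpark sketch of applying the same setting in code rather than through "--conf". It is only an illustration under assumptions: the helper names (warehouse_conf, build_session, read_single_json) are hypothetical and not taken from the actual job.py, and I have not confirmed that setting the value this way changes the warning.

```python
# Hypothetical helpers for illustration; none of these names come from the actual job.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def warehouse_conf(path=DEFAULT_WAREHOUSE):
    """Return the same key/value pair that was passed via --conf."""
    return {"spark.sql.warehouse.dir": path}


def build_session(app_name, conf):
    """Create a SparkSession with the given settings applied up front."""
    # Imported here so the module can be loaded on a machine without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def read_single_json(spark, path):
    """Read one JSON file (not a directory) into a DataFrame."""
    return spark.read.json(path)
```

Inside the job this would be used as spark = build_session("job", warehouse_conf()), followed by read_single_json(spark, path) with the path of one JSON file. The setting has to be supplied before the session is created, which is why it goes through the builder.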

 

I verified that /user/hive/warehouse exists in HDFS and is world-writable.

 

According to the referenced article, the spark.sql.warehouse.dir variable is supposed to be set in spark-defaults.conf. 
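
As far as I understand the format, the entry in spark-defaults.conf would be a single line with the property name and value separated by whitespace. This is a sketch based on the path used above, not something I have been able to apply on the cluster:

```
spark.sql.warehouse.dir    /user/hive/warehouse
```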

 

1. I am not sure how to set that through Cloudera Manager.

2. From what I have read, the "--conf" argument to spark-submit should do basically the same thing, so I am not sure setting the variable there would work even if I knew how to set it through Cloudera Manager.

 

Thanks in advance!
