Create Hive table using pyspark: Mkdirs failed to create file
Created on ‎07-30-2022 09:51 AM - edited ‎07-30-2022 09:59 AM
Hello,
We would like to create a Hive table using a PySpark DataFrame on the cluster.
We have the script below, which has run successfully several times in the past on the same cluster. After some configuration changes in the cluster, the same script now fails with the error below.
We were unable to identify which changes to the cluster triggered this error (we rearranged some services on the cluster, etc.).
The simple script is:
```python
# pyspark --master=yarn
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
columns = ["language", "users_count"]  # column names were missing in the original post; these are assumed
rdd = spark.sparkContext.parallelize(data)
dfFromRDD1 = rdd.toDF(columns)
dfFromRDD1.printSchema()
dfFromRDD1.show()

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
dfFromRDD1.registerTempTable("evento_temp")
sqlContext.sql("use default").show()
```
ERROR:
```
Hive Session ID = bd9c459e-1ec8-483e-9543-c1527b33feec
22/07/30 13:55:45 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/30 13:55:45 WARN util.DriverDataSource: Registered driver with driverClassName=org.apache.derby.jdbc.EmbeddedDriver was not found, trying direct instantiation.
22/07/30 13:55:46 WARN util.DriverDataSource: Registered driver with driverClassName=org.apache.derby.jdbc.EmbeddedDriver was not found, trying direct instantiation.
22/07/30 13:55:46 WARN metastore.MetaStoreDirectSql: Self-test query [select "DB_ID" from "DBS"] failed; direct SQL is disabled
javax.jdo.JDODataStoreException: Error executing SQL query "select "DB_ID" from "DBS"".
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
    .....
    at java.base/java.lang.Thread.run(Thread.java:829)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: Table/View 'DBS' does not exist.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
```
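The references to `org.apache.derby.jdbc.EmbeddedDriver` together with `Table/View 'DBS' does not exist` suggest Spark fell back to a fresh, empty embedded Derby metastore instead of the cluster's real Hive metastore, which typically happens when `hive-site.xml` is not on Spark's configuration path. A minimal sketch of a check for this, assuming the standard `hive.metastore.uris` property and an illustrative file path (both are assumptions, not output from this cluster):

```python
# Sketch: check whether a hive-site.xml visible to Spark sets hive.metastore.uris.
# If it does not (or the file is absent), Spark/Hive falls back to embedded Derby.
import xml.etree.ElementTree as ET

def metastore_uris(hive_site_path):
    """Return the value of hive.metastore.uris, or None if it is not set."""
    root = ET.parse(hive_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == "hive.metastore.uris":
            return prop.findtext("value")
    return None

# Example (path assumed): metastore_uris("/etc/spark/conf/hive-site.xml")
```

If this returns `None` for the copy of `hive-site.xml` that Spark sees, an embedded local metastore is the expected fallback.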
```python
sqlContext.sql("CREATE TABLE IF NOT EXISTS evento STORED AS parquet as SELECT * from evento_temp").show()
```
ERROR:
```
22/07/29 17:07:08 WARN Datastore.Schema: The MetaData for "org.apache.hadoop.hive.metastore.model.MStorageDescriptor" is specified with a foreign-key at class level yet no "table" is defined. All foreign-keys at this level must have a table that the FK goes to.
22/07/29 17:07:08 WARN Datastore.Schema: The MetaData for "org.apache.hadoop.hive.metastore.model.MStorageDescriptor" is specified with a foreign-key at class level yet no "table" is defined. All foreign-keys at this level must have a table that the FK goes to.
22/07/29 17:07:08 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/29 17:07:08 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/29 17:07:08 WARN metastore.HiveMetaStore: Location: file:/home/usr_cmteste3/spark-warehouse/evento specified for non-external table:evento
22/07/29 17:07:09 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 3.0 (TID 4, <<HOST>>, executor 2): org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/usr_cmteste3/spark-warehouse/evento/.hive-staging_hive_2022-07-29_17-07-08_935_7404207232723330868-1/-ext-10000/_temporary/0/_temporary/attempt_202207291707093395760670811853018_0003_m_000001_4 (exists=false, cwd=file:/data05/yarn/nm/usercache/usr_cmteste3/appcache/application_1659116901602_0017/container_e67_1659116901602_0017_01_000003)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:282)
```
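Note that the failing path starts with `file:/home/usr_cmteste3/spark-warehouse/...`: without the Hive configuration, the table location resolves to a local `spark-warehouse` directory, so each YARN executor tries to create the staging directory on its own local disk and fails. A small hedged sketch of the distinction (the sample URIs are illustrative, taken from or modeled on the log above):

```python
# Sketch: classify a warehouse/staging location by URI scheme.
# A "file" (or empty) scheme means the path is on the local filesystem of
# whichever machine touches it, not on shared storage such as HDFS.
from urllib.parse import urlparse

def is_local_warehouse(location):
    """True if the location resolves to the local filesystem."""
    return urlparse(location).scheme in ("", "file")
```

For the path in the error, `is_local_warehouse("file:/home/usr_cmteste3/spark-warehouse/evento")` is true, whereas a properly configured cluster would normally write to an `hdfs://...` location.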
Created ‎08-01-2022 03:24 PM
I fixed this issue by copying the Hive client configuration into Spark's conf directory:
```
cp /etc/hive/conf/hive-site.xml /etc/spark/conf
```
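With `hive-site.xml` back on Spark's configuration path, the session picks up the cluster's real metastore and warehouse location instead of falling back to embedded Derby and a local `spark-warehouse` directory. For illustration, the copied file would be expected to contain a property along these lines (host and port are placeholders, not values from this cluster):

```xml
<!-- Illustrative hive-site.xml fragment; metastore-host:9083 is a placeholder -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```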
