Created on 07-30-2022 09:51 AM - edited 07-30-2022 09:59 AM
Hello,
We would like to create a Hive table from a PySpark DataFrame on the cluster.
We have the script below, which has run well several times in the past on the same cluster. After some configuration changes in the cluster, the same script now fails with the errors below.
We have not been able to identify which cluster changes triggered this error (we rearranged some services on the cluster, etc.).
The simple script is:
# pyspark --master=yarn
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
rdd = spark.sparkContext.parallelize(data)
columns = ["language", "users_count"]  # example column names for the DataFrame
dfFromRDD1 = rdd.toDF(columns)
dfFromRDD1.printSchema()
dfFromRDD1.show()
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
dfFromRDD1.registerTempTable("evento_temp")
sqlContext.sql("use default").show()
ERROR:
Hive Session ID = bd9c459e-1ec8-483e-9543-c1527b33feec
22/07/30 13:55:45 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/30 13:55:45 WARN util.DriverDataSource: Registered driver with driverClassName=org.apache.derby.jdbc.EmbeddedDriver was not found, trying direct instantiation.
22/07/30 13:55:46 WARN util.DriverDataSource: Registered driver with driverClassName=org.apache.derby.jdbc.EmbeddedDriver was not found, trying direct instantiation.
22/07/30 13:55:46 WARN metastore.MetaStoreDirectSql: Self-test query [select "DB_ID" from "DBS"] failed; direct SQL is disabled
javax.jdo.JDODataStoreException: Error executing SQL query "select "DB_ID" from "DBS"".
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
.....
at java.base/java.lang.Thread.run(Thread.java:829)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: Table/View 'DBS' does not exist.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
sqlContext.sql("CREATE TABLE IF NOT EXISTS evento STORED AS parquet as SELECT * from evento_temp").show()
ERROR:
22/07/29 17:07:08 WARN Datastore.Schema: The MetaData for "org.apache.hadoop.hive.metastore.model.MStorageDescriptor" is specified with a foreign-key at class level yet no "table" is defined. All foreign-keys at this level must have a table that the FK goes to.
22/07/29 17:07:08 WARN Datastore.Schema: The MetaData for "org.apache.hadoop.hive.metastore.model.MStorageDescriptor" is specified with a foreign-key at class level yet no "table" is defined. All foreign-keys at this level must have a table that the FK goes to.
22/07/29 17:07:08 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/29 17:07:08 WARN metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
22/07/29 17:07:08 WARN metastore.HiveMetaStore: Location: file:/home/usr_cmteste3/spark-warehouse/evento specified for non-external table:evento
22/07/29 17:07:09 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 3.0 (TID 4, <<HOST>>, executor 2): org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/usr_cmteste3/spark-warehouse/evento/.hive-staging_hive_2022-07-29_17-07-08_935_7404207232723330868-1/-ext-10000/_temporary/0/_temporary/attempt_202207291707093395760670811853018_0003_m_000001_4 (exists=false, cwd=file:/data05/yarn/nm/usercache/usr_cmteste3/appcache/application_1659116901602_0017/container_e67_1659116901602_0017_01_000003)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:282)
Created 08-01-2022 03:24 PM
I fixed this issue by copying:
cp /etc/hive/conf/hive-site.xml /etc/spark/conf
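For context, the warnings above (embedded Derby driver, "Table/View 'DBS' does not exist", and the file:/home/.../spark-warehouse location) indicate that Spark could not find the Hive client configuration and fell back to a local Derby metastore and a local warehouse directory. Copying hive-site.xml into /etc/spark/conf points Spark back at the shared Hive metastore. Below is a minimal sketch, assuming the fix has been applied and the metastore is reachable, to verify the connection from a new PySpark session; the appName is arbitrary, and the table/column names are the same examples used in the question.

# pyspark --master=yarn
from pyspark.sql import SparkSession

# Build a session with Hive support; with hive-site.xml on the Spark
# config path, this should use the shared metastore rather than local Derby.
spark = (SparkSession.builder
         .appName("hive-metastore-check")  # arbitrary example name
         .enableHiveSupport()
         .getOrCreate())

# The warehouse directory should now resolve to the Hive warehouse
# (typically an HDFS path), not file:/home/<user>/spark-warehouse.
print(spark.conf.get("spark.sql.warehouse.dir"))

# Listing databases and creating a small table exercises the metastore end to end.
spark.sql("SHOW DATABASES").show()

df = spark.createDataFrame(
    [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")],
    ["language", "users_count"])
df.createOrReplaceTempView("evento_temp")
spark.sql("CREATE TABLE IF NOT EXISTS default.evento STORED AS parquet "
          "AS SELECT * FROM evento_temp")
spark.sql("SELECT * FROM default.evento").show()

If the SHOW DATABASES output lists the databases you expect from Hive (not just a fresh "default"), the session is talking to the real metastore and the CREATE TABLE step should no longer fail with the Mkdirs/staging-directory error.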