Save Spark 2 DataFrame into Hive managed table

Hi, I'm using SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234.

I'm trying to save a Spark DataFrame into a Hive table:

df.write.mode(SaveMode.Overwrite).partitionBy("date").saveAsTable(s"$databaseName.$tableName")
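
For context, here is a minimal sketch of the write path (the session setup and the example DataFrame are simplified, and mydb/mytable stand in for $databaseName/$tableName; the real job reads its data from upstream):

import org.apache.spark.sql.{SaveMode, SparkSession}

// Spark 2.0: Hive support must be enabled for saveAsTable to register
// the table in the Hive metastore.
val spark = SparkSession.builder()
  .appName("save-df-to-hive")
  .enableHiveSupport()
  .getOrCreate()

// Example DataFrame only; the real one has more columns.
val df = spark.range(10).selectExpr("id", "cast(id % 3 as string) as date")

df.write
  .mode(SaveMode.Overwrite)
  .partitionBy("date")
  .saveAsTable("mydb.mytable")   // mydb/mytable are placeholders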

 

I can list the table in the beeline shell, but I cannot read its contents, because the table schema is not what I expected:

+-----------+----------------+--------------------+--+
| col_name  | data_type      | comment            |
+-----------+----------------+--------------------+--+
| col       | array<string>  | from deserializer  |
+-----------+----------------+--------------------+--+

 

I've tried spark1.6.0-cdh5.9.0-hadoop2.6.0, but got the same result.

 

=== update 2016-11-29 14:50 ===

 

I realized that saveAsTable stores the data in a Spark SQL-specific format, which is NOT compatible with Hive. So I changed to:

 

  • [spark 1.6.0] df.write.mode(SaveMode.Overwrite).partitionBy("date").insertInto(s"$databaseName.$tableName")

However, every time I query it with beeline, the Hive metastore server crashes. If I query with Impala, the metastore server works fine.
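
As far as I understand, insertInto writes into an existing table and matches columns by position, so the target table has to be created first. A rough sketch of the Spark 1.6 code (the DDL and the staging source are simplified examples, not the exact job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("insert-into-hive"))
val hiveContext = new HiveContext(sc)

// The target table must already exist; insertInto does not create it.
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS mydb.mytable (id BIGINT, name STRING)
    |PARTITIONED BY (`date` STRING)
    |STORED AS PARQUET""".stripMargin)

// Dynamic partition inserts need these Hive settings.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Example source; the real DataFrame comes from upstream processing.
val df = hiveContext.table("mydb.staging")

df.write.mode(SaveMode.Overwrite).partitionBy("date").insertInto("mydb.mytable")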

 

  • [spark 2.0.0]  df.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")

The write operation sometimes succeeds, and the table can then be queried with Impala but not with beeline.

Other times the write operation fails with ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! and the metastore server crashes.
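
For the Spark 2.0 case, the write path looks roughly like this (again a sketch; mydb/mytable are placeholders, and the dynamic-partition settings are ones I believe are needed when inserting into a partitioned table):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("insert-into-hive")
  .enableHiveSupport()
  .getOrCreate()

// Needed for inserting into a partitioned table without a static partition spec.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Example source; the real DataFrame comes from upstream processing.
val df = spark.table("mydb.staging")

// No partitionBy here: in Spark 2.0, insertInto takes the partitioning
// from the target table's definition.
df.write.mode(SaveMode.Overwrite).insertInto("mydb.mytable")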

 

Thanks.
