

save spark 2 dataframe into hive managed table


Hi, I'm using SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234.


I'm trying to save a Spark DataFrame into a Hive table.
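The write is along these lines — a minimal sketch (saveAsTable is assumed here, since the update below switches away from it to insertInto; df, the database name, and the table name are placeholders for my actual code):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of the write path; databaseName/tableName/input path are placeholders.
val spark = SparkSession.builder()
  .appName("save-df-to-hive")
  .enableHiveSupport()
  .getOrCreate()

val databaseName = "mydb"    // placeholder
val tableName    = "mytable" // placeholder

val df = spark.read.parquet("/path/to/input") // placeholder input

// saveAsTable registers the table in the Hive metastore,
// but with a Spark SQL-specific serde.
df.write
  .mode(SaveMode.Overwrite)
  .saveAsTable(s"$databaseName.$tableName")
```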




I can list the table in the beeline shell. However, I cannot read its content, because the table schema is not what I expected:


| col_name | data_type | comment |
| col | array<string> | from deserializer |


I've tried spark1.6.0-cdh5.9.0-hadoop2.6.0, but got the same result.


=== update 2016-11-29 14:50 ===


I realized that the table was written in a Spark SQL-specific format, which is NOT compatible with Hive. So I changed to:


  • [spark 1.6.0] df.write.mode(SaveMode.Overwrite).partitionBy("date").insertInto(s"$databaseName.$tableName")

However, every time I run a query through beeline, the Hive metastore server crashes. If I query through Impala instead, the metastore server works fine.


  • [spark 2.0.0]  df.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")

The write operation sometimes succeeds; the table can then be queried through Impala but not through beeline.

At other times, the write operation fails with ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!, and the metastore server crashes.
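For diagnosis, what the metastore actually recorded for the table can be inspected with the standard DESCRIBE FORMATTED command — a sketch (the spark session and the database/table names are placeholders):

```scala
// Sketch: inspect the table metadata the metastore recorded.
// A Spark SQL-specific table typically shows a single `col array<string>`
// column with the real schema hidden in table properties, while a
// Hive-compatible table shows the actual columns and a standard serde
// (e.g. ParquetHiveSerDe).
spark
  .sql("DESCRIBE FORMATTED mydb.mytable") // placeholder names
  .show(100, truncate = false)
```

The same check can be run from beeline as `DESCRIBE FORMATTED mydb.mytable;`.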


