Save Spark 2 DataFrame into Hive managed table

Hi, I'm using SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234.

I'm trying to save a Spark DataFrame into a Hive table:

df.write.mode(SaveMode.Overwrite).partitionBy("date").saveAsTable(s"$databaseName.$tableName")
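
For context, here is a minimal sketch of the write path (the session setup and the example DataFrame are simplified, and mydb/mytable stand in for $databaseName/$tableName; the real job reads its data from upstream):

import org.apache.spark.sql.{SaveMode, SparkSession}

// Spark 2.0: Hive support must be enabled for saveAsTable to register
// the table in the Hive metastore.
val spark = SparkSession.builder()
  .appName("save-df-to-hive")
  .enableHiveSupport()
  .getOrCreate()

// Example DataFrame only; the real one has more columns.
val df = spark.range(10).selectExpr("id", "cast(id % 3 as string) as date")

df.write
  .mode(SaveMode.Overwrite)
  .partitionBy("date")
  .saveAsTable("mydb.mytable")   // mydb/mytable are placeholders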

 

I can list the table in the beeline shell, but I cannot read its contents, because the table schema is not what I expected:

+-----------+----------------+--------------------+--+
| col_name  | data_type      | comment            |
+-----------+----------------+--------------------+--+
| col       | array<string>  | from deserializer  |
+-----------+----------------+--------------------+--+

 

I've tried spark1.6.0-cdh5.9.0-hadoop2.6.0, but got the same result.

 

=== update 2016-11-29 14:50 ===

 

I realized that saveAsTable stores the data in a Spark SQL-specific format, which is NOT compatible with Hive. So I changed to:

 

  • [spark 1.6.0] df.write.mode(SaveMode.Overwrite).partitionBy("date").insertInto(s"$databaseName.$tableName")

However, every time I query it with beeline, the Hive metastore server crashes. If I query with Impala, the metastore server works fine.
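
As far as I understand, insertInto writes into an existing table and matches columns by position, so the target table has to be created first. A rough sketch of the Spark 1.6 code (the DDL and the staging source are simplified examples, not the exact job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("insert-into-hive"))
val hiveContext = new HiveContext(sc)

// The target table must already exist; insertInto does not create it.
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS mydb.mytable (id BIGINT, name STRING)
    |PARTITIONED BY (`date` STRING)
    |STORED AS PARQUET""".stripMargin)

// Dynamic partition inserts need these Hive settings.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Example source; the real DataFrame comes from upstream processing.
val df = hiveContext.table("mydb.staging")

df.write.mode(SaveMode.Overwrite).partitionBy("date").insertInto("mydb.mytable")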

 

  • [spark 2.0.0]  df.write.mode(SaveMode.Overwrite).insertInto(s"$databaseName.$tableName")

The write operation sometimes succeeds, and the table can then be queried with Impala but not with beeline.

Other times the write operation fails with ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! and the metastore server crashes.
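
For the Spark 2.0 case, the write path looks roughly like this (again a sketch; mydb/mytable are placeholders, and the dynamic-partition settings are ones I believe are needed when inserting into a partitioned table):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("insert-into-hive")
  .enableHiveSupport()
  .getOrCreate()

// Needed for inserting into a partitioned table without a static partition spec.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Example source; the real DataFrame comes from upstream processing.
val df = spark.table("mydb.staging")

// No partitionBy here: in Spark 2.0, insertInto takes the partitioning
// from the target table's definition.
df.write.mode(SaveMode.Overwrite).insertInto("mydb.mytable")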

 

Thanks.
