Registered: ‎11-04-2016

Saving Spark 2.2 DataFrames in a Hive table

Hello,

I have a problem saving a DataFrame into a Hive table with Spark 2.2 (on the latest CDH, 5.12.0).

Things I can do:

1. I can easily read Hive tables in Spark 2.2.

2. I can call saveAsTable in Spark 1.6 to write a Hive table and then read it from Spark 2.2.

3. I can call write.saveAsTable in Spark 2.2 and see the files and data inside the Hive table.

Things I cannot do in Spark 2.2:

4. When I read a Hive table saved by Spark 2.2 in spark2-shell, it returns no rows. The schema and all the fields are there, but there is no data.

I don't understand what could cause this problem.

Any help would be appreciated.

Example:

scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
| _1| _2|
+---+---+
+---+---+

scala> savedDF.count
res55: Long = 0

Thanks