Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Who agreed with this topic

Saving Spark 2.2 dataframs in Hive table

avatar
Expert Contributor

Hello,

 

I have a problem with Spark 2.2 (latest CDH 5.12.0) and saving DataFrame into Hive table.

 

Things I can do:

 

1. I can easily read tables from Hive tables in Spark 2.2

2. I can do saveAsTable in Spark 1.6 into Hive table and read it from Spark 2.2

3. I can do write.saveAsTable in Spark 2.2 and see the files and data inside Hive table

 

 

Things I cannot do in Spark 2.2:

 

4. When I read Hive table saved by Spark 2.2 in spark2-shell, it shows empty rows. It has all the fields and schema but no data.

 

I don't understand what could cause this problem.

Any help would be appreciate it.

 

example:

 

scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
|_1|_2|
+---+---+
+---+---+
scala> savedDF.count
res55: Long = 0

 

 

Thanks

 

Who agreed with this topic