Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Who agreed with this topic

Saving Spark 2.2 dataframs in Hive table

avatar
Expert Contributor

Hello,

 

I have a problem with Spark 2.2 (latest CDH 5.12.0) and saving DataFrame into Hive table.

 

Things I can do:

 

1. I can easily read tables from Hive tables in Spark 2.2

2. I can do saveAsTable in Spark 1.6 into Hive table and read it from Spark 2.2

3. I can do write.saveAsTable in Spark 2.2 and see the files and data inside Hive table

 

 

Things I cannot do in Spark 2.2:

 

4. When I read Hive table saved by Spark 2.2 in spark2-shell, it shows empty rows. It has all the fields and schema but no data.

 

I don't understand what could cause this problem.

Any help would be appreciate it.

 

example:

 

scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
|_1|_2|
+---+---+
+---+---+
scala> savedDF.count
res55: Long = 0

 

 

Thanks

 

Who agreed with this topic