Created on 08-05-2017 12:57 PM - edited 09-16-2022 05:03 AM
Hello,
I have a problem with Spark 2.2 (latest CDH 5.12.0) when saving a DataFrame into a Hive table.
Things I can do:
1. I can easily read Hive tables in Spark 2.2.
2. I can use saveAsTable in Spark 1.6 to write a Hive table and then read it from Spark 2.2.
3. I can call write.saveAsTable in Spark 2.2 and see the files and data inside the Hive table.
Things I cannot do in Spark 2.2:
4. When I read a Hive table saved by Spark 2.2 in spark2-shell, it returns no rows. It has all the fields and the schema, but no data.
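One way to narrow this down is to write the same data through a different path and compare. The sketch below assumes a Hive-enabled Spark 2.x session: it creates a Hive-format Parquet table with DDL and fills it with `insertInto` instead of `saveAsTable`. The object name, method name, and table name `database.test_hive` are hypothetical. If this table reads back with rows while the `saveAsTable` one does not, the problem is more likely in how the `saveAsTable` table metadata is stored than in the data files. It is a diagnostic experiment, not a confirmed fix.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object InsertIntoComparison {
  // Writes the example rows through a Hive-format table and returns the row count read back
  def roundTripCount(spark: SparkSession, table: String): Long = {
    import spark.implicits._

    // Same shape as the DataFrame in the example: (string, array<double>)
    val df = Seq(
      ("first", Array(2.0, 1.0, 2.1, 5.4)),
      ("test", Array(1.5, 0.5, 0.9, 3.7)),
      ("choose", Array(8.0, 2.9, 9.1, 2.5))
    ).toDF("_1", "_2")

    // Created with Hive DDL, so this is a Hive-format table rather than a data source table
    spark.sql(
      s"""CREATE TABLE IF NOT EXISTS $table (_1 STRING, _2 ARRAY<DOUBLE>)
         |STORED AS PARQUET""".stripMargin)

    // insertInto resolves columns by position, not by name; Overwrite replaces existing rows
    df.write.mode(SaveMode.Overwrite).insertInto(table)

    spark.table(table).count()
  }

  def main(args: Array[String]): Unit = {
    // Assumes access to the Hive metastore, as spark2-shell has on CDH
    val spark = SparkSession.builder()
      .appName("insertInto-comparison")
      .enableHiveSupport()
      .getOrCreate()
    // Hypothetical table name, chosen so it does not clash with database.test
    println(s"rows read back: ${roundTripCount(spark, "database.test_hive")}")
    spark.stop()
  }
}
```

`insertInto` matches columns by position, so the DDL column order has to match the DataFrame's column order.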
I don't understand what could cause this problem.
Any help would be appreciated.
Example:
scala> val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala> df.write.saveAsTable("database.test")

scala> val savedDF = spark.sql("SELECT * FROM database.test")
res45: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

scala> savedDF.show
+---+---+
| _1| _2|
+---+---+
+---+---+

scala> savedDF.count
res55: Long = 0
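To check whether the rows are missing from the files or only from what the metastore points Spark at, one option is to read the table's location from `DESCRIBE FORMATTED` and then read that directory directly, bypassing the metastore. This is a minimal sketch under two assumptions: the table was written in Spark's default data source format, Parquet (controlled by `spark.sql.sources.default`), and `DESCRIBE FORMATTED` reports a row whose name starts with `Location` (the exact label can differ between Spark versions). The object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim}

object SavedTableCheck {
  // Returns (rows seen through the metastore, rows found in the table's files)
  def compareCounts(spark: SparkSession, table: String): (Long, Long) = {
    // Row count as seen through the metastore (0 in the failing case above)
    val viaMetastore = spark.table(table).count()

    // DESCRIBE FORMATTED returns (col_name, data_type, comment) rows, including the location
    // Assumption: a row whose col_name starts with "Location" exists; head() throws otherwise
    val location = spark.sql(s"DESCRIBE FORMATTED $table")
      .filter(trim(col("col_name")).startsWith("Location"))
      .select("data_type")
      .head()
      .getString(0)
      .trim

    // Assumption: the files are Parquet, Spark's default format for saveAsTable
    val inFiles = spark.read.parquet(location).count()
    (viaMetastore, inFiles)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("saved-table-check")
      .enableHiveSupport()
      .getOrCreate()
    val (viaMetastore, inFiles) = compareCounts(spark, "database.test")
    println(s"rows via metastore: $viaMetastore, rows in files: $inFiles")
    spark.stop()
  }
}
```

If the second number is non-zero while the first is 0, the data was written and the issue lies in how the table definition maps to those files.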
Thanks