- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Inserting nulls from dataframe to hive table
- Labels:
-
Apache Hive
Created ‎10-18-2021 03:19 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I have such a problem that I have a csv file that I put in data frame and everything is okay, where field is empty it puts null in data frame, but when I want to insert that into Hive table nulls are getting changed to "?".
Created ‎01-03-2022 12:37 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Anitauzanna
I have tried below use case as per the problem description, but I could not see '?' in hive table
[root@c1448-node1 ~]# hdfs dfs -cat /tmp/data.csv
1,a,b
2,,c
3,d,
[root@c1448-node1 ~]#
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.7.7.1.7.0-551
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
scala> val csv_df = spark.read.csv("/tmp/data.csv");
csv_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
scala> csv_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+
scala> csv_df.write.mode("overwrite").saveAsTable("null_test");
22/01/03 08:22:28 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = 623fe696-c08a-4bad-b15a-12b6c7db1c52
scala> val null_df=spark.sql("select * from null_test");
null_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
scala> null_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+
scala>
[root@c1448-node1 ~]#
[root@c1448-node1 ~]# beeline -e "select * from null_test;"
+----------------+----------------+----------------+
| null_test._c0 | null_test._c1 | null_test._c2 |
+----------------+----------------+----------------+
| 1 | a | b |
| 2 | NULL | c |
| 3 | d | NULL |
+----------------+----------------+----------------+
3 rows selected (1.344 seconds)
I guess there could be some unsupported characters in your text file, we might need to change encoding to get them correctly visible
Please share the sample data along with the Hive table schema & spark commands tried so far to help troubleshoot the issue further
