Support Questions

Find answers, ask questions, and share your expertise

Inserting nulls from dataframe to hive table

New Contributor

Hi, I have such a problem that I have a csv file that I put in data frame and everything is okay, where field is empty it puts null in data frame, but when I want to insert that into Hive table nulls are getting changed to "?".

1 REPLY 1

Cloudera Employee

Hi @Anitauzanna 
I have tried below use case as per the problem description, but I could not see '?' in hive table

[root@c1448-node1 ~]# hdfs dfs -cat /tmp/data.csv
1,a,b
2,,c
3,d,
[root@c1448-node1 ~]#

Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.7.7.1.7.0-551
/_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)

scala> val csv_df = spark.read.csv("/tmp/data.csv");
csv_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]

scala> csv_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+


scala> csv_df.write.mode("overwrite").saveAsTable("null_test");
22/01/03 08:22:28 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = 623fe696-c08a-4bad-b15a-12b6c7db1c52

scala> val null_df=spark.sql("select * from null_test");
null_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]

scala> null_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+


scala>

[root@c1448-node1 ~]#
[root@c1448-node1 ~]# beeline -e "select * from null_test;"
+----------------+----------------+----------------+
| null_test._c0 | null_test._c1 | null_test._c2 |
+----------------+----------------+----------------+
| 1 | a | b |
| 2 | NULL | c |
| 3 | d | NULL |
+----------------+----------------+----------------+
3 rows selected (1.344 seconds)

 

I guess there could be some unsupported characters in your text file, we might need to change encoding to get them correctly visible

Please share the sample data along with the Hive table schema & spark commands tried so far to help troubleshoot the issue further

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.