Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How to handle nulls when exporting from Hive?

avatar
Super Collaborator

I often need to export data from Hive to CSV files so that I can share with folks - usually they will ultimately import the CSV data into some sort of standard DB.

Currently, I use a CLI command like this:

hive -e 'set hive.cli.print.header=true; select * from blah where condition ' | sed 's/[\t]/,/g' > myfile.csv

However, when I do it this way, null values actually get printed as "NULL".

For example, an output row might be:

0|true|NULL|1|0|'my string'|NULL|etc

So, my question: What can I add to my command to replace those NULL entries with just an empty character?

In other words, how do I instead get this:

0|true||1|0|'my string'||etc

?

1 ACCEPTED SOLUTION

avatar
Guru

Using your sed approach, this should replace all NULL with empty character

sed 's/[\t]/,/g; s/NULL//g'  > myfile.csv

If there is a chance that NULL is a substring of a value you will need to do the following where ^ is beginning of line and $ is end of line and , is your field delimiter

sed 's/[\t]/,/g; s/^NULL,/,/g; s/,NULL,/,,/g; s/,NULL$/,/g;'  > myfile.csv

Note that if your resultset is large, it is probably best to use Pig on HDFS and not sed (to leverage the parallel processing of hadoop and save yourself a lot of time.

Note also: To use empty character as nulls in the actual hive table, use the following in the DDL

TBLPROPERTIES('serialization.null.format'='');

View solution in original post

1 REPLY 1

avatar
Guru

Using your sed approach, this should replace all NULL with empty character

sed 's/[\t]/,/g; s/NULL//g'  > myfile.csv

If there is a chance that NULL is a substring of a value you will need to do the following where ^ is beginning of line and $ is end of line and , is your field delimiter

sed 's/[\t]/,/g; s/^NULL,/,/g; s/,NULL,/,,/g; s/,NULL$/,/g;'  > myfile.csv

Note that if your resultset is large, it is probably best to use Pig on HDFS and not sed (to leverage the parallel processing of hadoop and save yourself a lot of time.

Note also: To use empty character as nulls in the actual hive table, use the following in the DDL

TBLPROPERTIES('serialization.null.format'='');