Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

What is the best way to export Hive table containing many null values to CSV ?

avatar

I tried specifying what Hive must use as null values for the export but it keeps writing out the physical values in the CSV file as "NULL" or "\N' where I need to be empty values just separated by ,. E.g. file line is currently written as W,\N,\N,\N where I need format of W,,,

1 ACCEPTED SOLUTION

avatar
Guru

You cannot do this directly from the export command and must do some separate processing.

I feel the best way to do this is to run this pig script on export result

raw = load 'data.csv' using PigStorage(',');
nonull = foreach raw generate 
	REPLACE($0, '\\\\N', ''),
	REPLACE($1, '\\\\N', ''),
	REPLACE($2, '\\\\N', ''),
	REPLACE($3, '\\\\N', '');
store nonull into 'nonull/data.csv' using PigStorage(',');

Keep in mind this will result in output in the m-r format in hdfs

data.csv
data.csv/_SUCCESS
data.csv/part-m-00000
data.csv/part-m-00001
...

If you want to process this file in hadoop, just point to data.csv

If you want to pull this to edge node with command line use hdfs dfs -getmerge <localpath> nonull/data.csv

If you want to download it using Ambari Files View, just double click on nonull/data.csv the click Select All then Concatenate and it will download as a single file

View solution in original post

1 REPLY 1

avatar
Guru

You cannot do this directly from the export command and must do some separate processing.

I feel the best way to do this is to run this pig script on export result

raw = load 'data.csv' using PigStorage(',');
nonull = foreach raw generate 
	REPLACE($0, '\\\\N', ''),
	REPLACE($1, '\\\\N', ''),
	REPLACE($2, '\\\\N', ''),
	REPLACE($3, '\\\\N', '');
store nonull into 'nonull/data.csv' using PigStorage(',');

Keep in mind this will result in output in the m-r format in hdfs

data.csv
data.csv/_SUCCESS
data.csv/part-m-00000
data.csv/part-m-00001
...

If you want to process this file in hadoop, just point to data.csv

If you want to pull this to edge node with command line use hdfs dfs -getmerge <localpath> nonull/data.csv

If you want to download it using Ambari Files View, just double click on nonull/data.csv the click Select All then Concatenate and it will download as a single file