Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

What is the best way to export a Hive table containing many null values to CSV?


I tried specifying what Hive should use as the null value for the export, but it keeps writing the physical values out to the CSV file as "NULL" or "\N", where I need empty values separated only by commas. E.g. a file line is currently written as W,\N,\N,\N where I need the format W,,,

1 ACCEPTED SOLUTION

Guru

You cannot do this directly from the export command; some separate post-processing is needed.

The best way to do this is to run the following Pig script on the export result:

-- Load the exported CSV and blank out each \N null marker field by field.
raw = LOAD 'data.csv' USING PigStorage(',');
nonull = FOREACH raw GENERATE
	REPLACE($0, '\\\\N', ''),
	REPLACE($1, '\\\\N', ''),
	REPLACE($2, '\\\\N', ''),
	REPLACE($3, '\\\\N', '');
STORE nonull INTO 'nonull/data.csv' USING PigStorage(',');

Keep in mind this will produce output in the usual MapReduce layout in HDFS:

nonull/data.csv
nonull/data.csv/_SUCCESS
nonull/data.csv/part-m-00000
nonull/data.csv/part-m-00001
...

If you want to process the result in Hadoop, just point to nonull/data.csv

If you want to pull the result to an edge node from the command line, use hdfs dfs -getmerge nonull/data.csv <localpath> (the HDFS source comes first, the local destination second)

If you want to download it using Ambari Files View, open nonull/data.csv, click Select All, then Concatenate, and it will download as a single file
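If the export is small enough to copy to the edge node first, the same \N-to-empty replacement can be done locally without a MapReduce job. A minimal Python sketch of that field-wise replacement (the function name and filenames are illustrative; it assumes a comma-delimited file with no quoted fields containing commas):

```python
def strip_hive_nulls(line: str, delimiter: str = ",") -> str:
    """Replace each \\N field written by Hive with an empty string."""
    fields = line.rstrip("\n").split(delimiter)
    return delimiter.join("" if f == r"\N" else f for f in fields)

# Example: the line from the question becomes W,,,
print(strip_hive_nulls(r"W,\N,\N,\N"))  # -> W,,,
```

Note this only replaces whole fields equal to \N, so a legitimate value that merely contains the characters \N elsewhere is left untouched, matching what the Pig REPLACE approach intends.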

