Need help on below query.
We have a scenario where for any column defined with datatype as string in Hadoop, NULL value is loaded as blank. However, any column other than string datatype (INT,DOUBLE etc) NULL values are loaded as NULL in Hadoop.
Column name Data Type
Is this a default behavior of Hadoop/hive?
Yes, this is a default behaviour in Hive. When an empty value ( '' ) is passed for STRING datatype, it is taken as it is.
If you would like the null values for STRING to be treated as NULL instead of just empty, you can set this property - serialization.null.format for the table of your choice. For example, you can do this on a new table or to an existing table by running a statement like the below,
CREATE TABLE null_test_1 (service1end string, service1start string, service2end string, service2start string, firstlinemaintcost double) TBLPROPERTIES ( 'serialization.null.format'='');
alter table null_test set tblproperties ('serialization.null.format'='');
Once done, the empty values inserted for the columns with STRING datatype will be shown as NULL.
Hope this helps!