Support Questions

Find answers, ask questions, and share your expertise

Delimited /u0001 not working

avatar
Rising Star

Hi Team,

Iam using delimiter /u0001 on Hive. When i insert data on that table i see just data on the first column. Rest of the column are showing as a NULL. Please help find the details below

TABLE :::

CREATE TABLE `man_def_3`( |

| `cntry_cde` char(3), |

| `cntry_nm` varchar(45), |

| `iso_cntry_nbr` char(3), |

| `intl_tran_ind` char(1), |

| `ofac_prhbt_tran_ind` char(1), |

| `dflt_fin_nwk_cde` char(5), |

| `eff_dt` string, |

| `exp_dt` string, |

| `trnsfrm_dt` string) |

| ROW FORMAT DELIMITED |

| FIELDS TERMINATED BY '\u0001' |

| LINES TERMINATED BY '\n' |

| STORED AS INPUTFORMAT |

| 'org.apache.hadoop.mapred.TextInputFormat' |

| OUTPUTFORMAT |

| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |

| LOCATION |

| 'hdfs://dshdp-dev-cluster/apps/hive/warehouse/neela.db/cntry_def_1' |

| TBLPROPERTIES ( |

| 'COLUMN_STATS_ACCURATE'='true', |

| 'numFiles'='1', |

| 'totalSize'='2654', |

| 'transient_lastDdlTime'='1464346137')

OUTPUT:

select * from man_def_3 ;

+------------------------+-----------------------+----------------------------+----------------------------+----------------------------------+-------------------------------+---------------------+---------------------+-------------------------+--+

| cntry_def_3.cntry_cde | cntry_def_3.cntry_nm | cntry_def_3.iso_cntry_nbr | cntry_def_3.intl_tran_ind | cntry_def_3.ofac_prhbt_tran_ind | cntry_def_3.dflt_fin_nwk_cde | cntry_def_3.eff_dt | cntry_def_3.exp_dt | cntry_def_3.trnsfrm_dt |

+------------------------+-----------------------+----------------------------+----------------------------+----------------------------------+-------------------------------+---------------------+---------------------+-------------------------+--+

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

Part of Data :

abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

8 REPLIES 8

avatar
Master Guru

'\u0001' is a single character Ctrl-A. What do you have in your data file as the delimiter? A single ctrl-A or 6 characters '\u0001'? The delimiter in Hive must be a single character, and actually Ctrl-A is the default. The best way to generate it is programmatically. If you open a file with Ctrl-A's in vi you can see, for example: 1010^Abob^A2016-04-10 05:52:25.0 (here I have 3 fields: id, string and timestamp).

avatar
Rising Star

This is the data file, separated by \u0001

abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

avatar
Rising Star

Is the file is in correct format. Do i have any option to have any REGULAR EXPRESSION to accept this file format. ?

avatar
Master Guru

I suspect it's not in a correct format. Can you open the file in your favorite editor and count characters on the first line between "abc" and "su", just move the cursor to the right. If there are 6 characters then try this, I assume you are on linux (make a copy of your file before):

sed -i 's/\u00001/^A/g' datafilename

To type ^A, keep "Control" key pressed and type "v" followed by "a". Replace your data filename and retry to list your table from Hive.

avatar
Rising Star

I tried ruining the command, i dont see any change onthe file format. and i still getting NULL values...

avatar
Master Guru

Can you upload your file, just first few lines will be enough. Tnx.

avatar
Rising Star

Please find the details

abc\u0001paper\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

avatar
Master Guru

Okay, try to use the attached file.

xy.txt