Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

Delimited /u0001 not working

Explorer

Hi Team,

Iam using delimiter /u0001 on Hive. When i insert data on that table i see just data on the first column. Rest of the column are showing as a NULL. Please help find the details below

TABLE :::

CREATE TABLE `man_def_3`( |

| `cntry_cde` char(3), |

| `cntry_nm` varchar(45), |

| `iso_cntry_nbr` char(3), |

| `intl_tran_ind` char(1), |

| `ofac_prhbt_tran_ind` char(1), |

| `dflt_fin_nwk_cde` char(5), |

| `eff_dt` string, |

| `exp_dt` string, |

| `trnsfrm_dt` string) |

| ROW FORMAT DELIMITED |

| FIELDS TERMINATED BY '\u0001' |

| LINES TERMINATED BY '\n' |

| STORED AS INPUTFORMAT |

| 'org.apache.hadoop.mapred.TextInputFormat' |

| OUTPUTFORMAT |

| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |

| LOCATION |

| 'hdfs://dshdp-dev-cluster/apps/hive/warehouse/neela.db/cntry_def_1' |

| TBLPROPERTIES ( |

| 'COLUMN_STATS_ACCURATE'='true', |

| 'numFiles'='1', |

| 'totalSize'='2654', |

| 'transient_lastDdlTime'='1464346137')

OUTPUT:

select * from man_def_3 ;

+------------------------+-----------------------+----------------------------+----------------------------+----------------------------------+-------------------------------+---------------------+---------------------+-------------------------+--+

| cntry_def_3.cntry_cde | cntry_def_3.cntry_nm | cntry_def_3.iso_cntry_nbr | cntry_def_3.intl_tran_ind | cntry_def_3.ofac_prhbt_tran_ind | cntry_def_3.dflt_fin_nwk_cde | cntry_def_3.eff_dt | cntry_def_3.exp_dt | cntry_def_3.trnsfrm_dt |

+------------------------+-----------------------+----------------------------+----------------------------+----------------------------------+-------------------------------+---------------------+---------------------+-------------------------+--+

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

| abc | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

Part of Data :

abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

8 REPLIES 8

'\u0001' is a single character Ctrl-A. What do you have in your data file as the delimiter? A single ctrl-A or 6 characters '\u0001'? The delimiter in Hive must be a single character, and actually Ctrl-A is the default. The best way to generate it is programmatically. If you open a file with Ctrl-A's in vi you can see, for example: 1010^Abob^A2016-04-10 05:52:25.0 (here I have 3 fields: id, string and timestamp).

Explorer

This is the data file, separated by \u0001

abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

Explorer

Is the file is in correct format. Do i have any option to have any REGULAR EXPRESSION to accept this file format. ?

I suspect it's not in a correct format. Can you open the file in your favorite editor and count characters on the first line between "abc" and "su", just move the cursor to the right. If there are 6 characters then try this, I assume you are on linux (make a copy of your file before):

sed -i 's/\u00001/^A/g' datafilename

To type ^A, keep "Control" key pressed and type "v" followed by "a". Replace your data filename and retry to list your table from Hive.

Explorer

I tried ruining the command, i dont see any change onthe file format. and i still getting NULL values...

Can you upload your file, just first few lines will be enough. Tnx.

Explorer

Please find the details

abc\u0001paper\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30 abc\u0001paper5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30

Okay, try to use the attached file.

xy.txt