Created 05-27-2016 11:29 AM
Hi Team,
I am using the delimiter \u0001 in Hive. When I load data into the table, I see data only in the first column; the rest of the columns show as NULL. Please find the details below.
TABLE:
CREATE TABLE `man_def_3`(
  `cntry_cde` char(3),
  `cntry_nm` varchar(45),
  `iso_cntry_nbr` char(3),
  `intl_tran_ind` char(1),
  `ofac_prhbt_tran_ind` char(1),
  `dflt_fin_nwk_cde` char(5),
  `eff_dt` string,
  `exp_dt` string,
  `trnsfrm_dt` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0001'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://dshdp-dev-cluster/apps/hive/warehouse/neela.db/cntry_def_1'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'totalSize'='2654',
  'transient_lastDdlTime'='1464346137')
OUTPUT:
select * from man_def_3 ;
+-----------------------+----------------------+---------------------------+---------------------------+---------------------------------+------------------------------+--------------------+--------------------+------------------------+
| cntry_def_3.cntry_cde | cntry_def_3.cntry_nm | cntry_def_3.iso_cntry_nbr | cntry_def_3.intl_tran_ind | cntry_def_3.ofac_prhbt_tran_ind | cntry_def_3.dflt_fin_nwk_cde | cntry_def_3.eff_dt | cntry_def_3.exp_dt | cntry_def_3.trnsfrm_dt |
+-----------------------+----------------------+---------------------------+---------------------------+---------------------------------+------------------------------+--------------------+--------------------+------------------------+
| abc                   | NULL                 | NULL                      | NULL                      | NULL                            | NULL                         | NULL               | NULL               | NULL                   |
| abc                   | NULL                 | NULL                      | NULL                      | NULL                            | NULL                         | NULL               | NULL               | NULL                   |
| abc                   | NULL                 | NULL                      | NULL                      | NULL                            | NULL                         | NULL               | NULL               | NULL                   |
| abc                   | NULL                 | NULL                      | NULL                      | NULL                            | NULL                         | NULL               | NULL               | NULL                   |
| abc                   | NULL                 | NULL                      | NULL                      | NULL                            | NULL                         | NULL               | NULL               | NULL                   |
+-----------------------+----------------------+---------------------------+---------------------------+---------------------------------+------------------------------+--------------------+--------------------+------------------------+
Part of the data:
abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
Created 05-27-2016 12:09 PM
'\u0001' is a single character, Ctrl-A. What do you have in your data file as the delimiter: a single Ctrl-A, or the six characters '\u0001'? The delimiter in Hive must be a single character, and Ctrl-A is actually the default. If the file contains the literal six-character text, Hive finds no delimiter at all, treats each whole line as the first field, and returns NULL for the remaining columns; since cntry_cde is char(3), only the leading "abc" survives, which would explain your output. The best way to generate the delimiter is programmatically. If you open a file with Ctrl-A's in vi you can see, for example: 1010^Abob^A2016-04-10 05:52:25.0 (here I have 3 fields: id, string and timestamp).
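For example, here is a minimal shell sketch that writes a row with real Ctrl-A bytes and then makes them visible (the file name sample.txt is just a placeholder; the HDFS path in the last command is the LOCATION from your DDL above):
# printf turns the octal escape \001 into a real Ctrl-A byte
printf 'abc\001su\001abc\001Y\001Y\001Test\0012014-05-30\0012015-05-30\0012015-05-30\n' > sample.txt
# cat -v renders the control character visibly as ^A
cat -v sample.txt
# od -c shows raw bytes: a real delimiter appears as \001, not as \ u 0 0 0 1
od -c sample.txt | head
# the same byte-level check against the file behind your table:
hdfs dfs -cat hdfs://dshdp-dev-cluster/apps/hive/warehouse/neela.db/cntry_def_1/* | od -c | head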
Created 05-27-2016 12:11 PM
This is the data file, separated by \u0001
abc\u0001su\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001resh2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001sush3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001tes6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001test52\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001sh3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001s6\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
Created 05-27-2016 12:29 PM
Is the file in the correct format? Is there any option to use a regular expression to accept this file format?
Created 05-27-2016 12:42 PM
I suspect it's not in the correct format. Can you open the file in your favorite editor and count the characters on the first line between "abc" and "su" by moving the cursor to the right? If there are 6 characters, then try this. I assume you are on Linux (make a copy of your file first):
sed -i 's/\\u0001/^A/g' datafilename
Here the doubled backslash makes sed match the literal six-character text \u0001, and ^A stands for a single literal Ctrl-A character. To type the ^A, keep the "Control" key pressed and type "v" followed by "a" (Ctrl-V inserts the next control character literally). Replace datafilename with your data file's name and retry listing your table from Hive.
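If typing the literal control character doesn't work for you, GNU sed also accepts \xHH escapes, so the same replacement can be written without special keystrokes. A sketch, assuming GNU sed on Linux (again, back up the file first):
# \\u0001 matches the literal six characters \u0001; \x01 emits a real Ctrl-A byte
sed -i 's/\\u0001/\x01/g' datafilename
# verify: the delimiter should now show up as \001 in the byte dump
od -c datafilename | head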
Created 05-27-2016 01:25 PM
I tried running the command, but I don't see any change in the file format, and I am still getting NULL values...
Created 05-27-2016 01:45 PM
Can you upload your file? Just the first few lines will be enough. Thanks.
Created 05-27-2016 01:56 PM
Please find the details:
abc\u0001paper\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001paper1\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001paper2\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001paper3\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001paper4\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
abc\u0001paper5\u0001abc\u0001Y\u0001Y\u0001Test\u00012014-05-30\u00012015-05-30\u00012015-05-30
Created 05-27-2016 02:46 PM
Okay, try to use the attached file.
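Also, since you asked earlier about a regular-expression option: Hive ships a RegexSerDe that can parse the literal six-character delimiter in place, without rewriting the file. A rough sketch, not tested against your data (RegexSerDe only handles string columns, so everything is declared as string here; the table name man_def_regex is made up, and on older Hive versions the class is org.apache.hadoop.hive.contrib.serde2.RegexSerDe instead):
CREATE EXTERNAL TABLE man_def_regex(
  cntry_cde string,
  cntry_nm string,
  iso_cntry_nbr string,
  intl_tran_ind string,
  ofac_prhbt_tran_ind string,
  dflt_fin_nwk_cde string,
  eff_dt string,
  exp_dt string,
  trnsfrm_dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
-- one capture group per column; \\\\u0001 in the DDL string becomes the regex \\u0001,
-- which matches the literal text \u0001 in the file
WITH SERDEPROPERTIES (
  "input.regex" = "(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*?)\\\\u0001(.*)"
)
STORED AS TEXTFILE
LOCATION 'hdfs://dshdp-dev-cluster/apps/hive/warehouse/neela.db/cntry_def_1';
Converting the file to real Ctrl-A bytes, as above, is still the cleaner fix; the regex route just avoids touching the data.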