Contributor
Posts: 49
Registered: ‎04-21-2015

strange column format error while inserting data


Hello all

 

I have two tables: the first is a text table backed by a CSV file and the second is a Parquet table. I am selecting data from the first and inserting it into the second.

 

I am inserting data with the following SQL:

 

 

insert overwrite header partition (p_tar = 2017112702) select FL_SEQ, REC_SEQ, EVNT_TYP, EVNT_SUB_TYP, Z1, Z2, K12, K24, K44, K45, K49, K53, K54, K66, K67, K68, K91, K92, K93, K94 from sgn.scsv where substr(cast (z2 as string), 1,10)="2017112702";

 

 

The SQL statement inserted many records, but I am getting the following error 27 times:

 

 

WARNINGS: Error converting column: 1 TO INT (Data is: ▒'▒@▒▒)
file: hdfs://nahadoop01:8020/grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv
record: @[u▒;▒'▒@▒▒;;;;;;;;;;;;;;;;;;;;;;;;;
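As a rough check (a sketch only; it assumes ';' is the field delimiter, which matches the record shown above), something like this counts how many physical lines in the file have a first field that does not parse as a plain integer:

# hypothetical diagnostic: count lines whose first field is not a plain integer
hdfs dfs -cat /grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv | awk -F';' '$1 !~ /^[0-9]+$/ { bad++ } END { print bad+0 }'

A non-zero count would mean some physical lines really do start with non-numeric data, which is exactly what the "converting column: 1 TO INT" warning complains about.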

The CSV table's first 6 columns look like this:

 

 

Query: describe sgn.scsv
+--------------+-----------+---------+
| name         | type      | comment |
+--------------+-----------+---------+
| fl_dt        | int       |         |
| fl_seq       | int       |         |
| rec_seq      | int       |         |
| evnt_typ     | int       |         |
| evnt_sub_typ | int       |         |
| z1           | bigint    |         |

 

And the Parquet table is similar to the CSV table:

 

Query: describe header
+------------------+-----------+
| name             | type      |
+------------------+-----------+
| fl_seq           | int       |
| rec_seq          | int       |
| evnt_typ         | int       |
| evnt_sub_typ     | int       |
| strt_ts          | bigint    |
| end_ts           | bigint    |

I downloaded the /grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv file to the Linux filesystem and then checked its contents.

 

There is only one line matching the error, and the garbage is not in the first columns:

 

impala@hadoop01:$ grep -n "@\[u" 20171127_02.csv
59404785:20171127;2189;1575;10;220;20171126225552922;20171126225706058;;;;;;;;;;;;;;;;;104;28;31;161;;;;;;;;;;;;;;;;;;;80;47457;;80;60006;;;;;;;;;;;;;;;;158313;610798;2531;339864;802705;966917;4;73075;73136164;7@[u▒;▒'▒@▒▒;;;;;;;;;;;;;;;;;;;;;;;;;32;864;1546;0;;;;Mozilla/5.▒▒oYE=D"▒F▒▒▒
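To see whether the binary garbage in that line hides any control bytes (for example a carriage return or an extra line feed that could split the record during parsing), the suspect line can be dumped with non-printing characters made visible (a sketch; the line number is the one reported by grep above):

# hypothetical check: print the suspect line with non-printing characters shown explicitly
sed -n '59404785p' 20171127_02.csv | cat -A | head -c 2000

cat -A marks control characters (for example ^M for a carriage return), so any embedded line terminators inside the garbage would show up.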

So I am wondering: why am I getting the same error 27 times if there is only one such line in the CSV file? And the more important question: why am I getting this error for "column 1" when column 1 already has valid data?

 

Do you have any ideas?

 

Champion
Posts: 601
Registered: ‎05-16-2016

Re: strange column format error while inserting data

Did you have a chance to run hadoop fsck, to see if you have any corrupt blocks?
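For example, something along these lines (a sketch, using the file path from the original post) would report the block status of that specific file:

hadoop fsck /grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv -files -blocks -locations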

Contributor
Posts: 49
Registered: ‎04-21-2015

Re: strange column format error while inserting data

Hello csguna

 

I just checked again; it's healthy:

...Status: HEALTHY
Total size: 5888983039200 B (Total open files size: 9932111872 B)
Total dirs: 671975
Total files: 1376803
Total symlinks: 0 (Files currently being written: 1)
Total blocks (validated): 765441 (avg. block size 7693581 B) (Total open file blocks (not validated): 74)
Minimally replicated blocks: 765441 (99.99999 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9998524
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Fri Dec 08 14:55:28 EET 2017 in 59696 milliseconds

The filesystem under path '/' is HEALTHY

 

 

Thanks for your reply
