strange column format error while inserting data

Contributor

Hello all

 

I have two tables: the first is backed by a CSV file and the second is a Parquet table. I am selecting data from the first and inserting it into the second.

 

I am inserting the data with the following SQL:

insert overwrite header partition (p_tar = 2017112702)
select FL_SEQ, REC_SEQ, EVNT_TYP, EVNT_SUB_TYP, Z1, Z2, K12, K24, K44, K45, K49, K53, K54, K66, K67, K68, K91, K92, K93, K94
from sgn.scsv
where substr(cast(z2 as string), 1, 10) = "2017112702";
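
As a rough sanity check (just a sketch, run from the shell with impala-shell, assuming it is available on the path), the row count written into the partition can be compared against the row count the source filter selects:

# Compare the number of rows landed in the target partition with the number of rows
# the source filter matches (in Impala a failed text conversion typically becomes NULL
# rather than dropping the row, so the counts are expected to match even with warnings).
impala-shell -q "select count(*) from header where p_tar = 2017112702"
impala-shell -q "select count(*) from sgn.scsv where substr(cast(z2 as string), 1, 10) = '2017112702'"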

 

 

The SQL statement inserted many records, but I am getting the following error 27 times:

WARNINGS: Error converting column: 1 TO INT (Data is: ▒'▒@▒▒)
file: hdfs://nahadoop01:8020/grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv
record: @[u▒;▒'▒@▒▒;;;;;;;;;;;;;;;;;;;;;;;;;
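
A quick way to cross-check that count outside Impala is to test the flagged column directly against the raw file. This is only a sketch and assumes the file is ';'-delimited and that the warning's column index is 0-based (so "column: 1" would be the second field, fl_seq); adjust the field number if the index turns out to be 1-based:

# Hypothetical cross-check: count the lines whose second ';'-separated field is not a plain integer.
awk -F';' '$2 !~ /^[0-9]+$/ { bad++ } END { print bad+0 }' 20171127_02.csv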

The CSV table's first 6 columns are as follows:

Query: describe sgn.scsv
+--------------+-----------+---------+
| name         | type      | comment |
+--------------+-----------+---------+
| fl_dt        | int       |         |
| fl_seq       | int       |         |
| rec_seq      | int       |         |
| evnt_typ     | int       |         |
| evnt_sub_typ | int       |         |
| z1           | bigint    |         |

 

And the Parquet table is similar to the CSV table:

 

Query: describe header
+------------------+-----------+
| name             | type      |
+------------------+-----------+
| fl_seq           | int       |
| rec_seq          | int       |
| evnt_typ         | int       |
| evnt_sub_typ     | int       |
| strt_ts          | bigint    |
| end_ts           | bigint    |

I downloaded the /grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv file to the Linux filesystem and then checked the file contents.

 

There is only one line containing the bad data, and it is not in the first columns:

 

impala@hadoop01:$ grep   -n "@\[u" 20171127_02.csv
59404785:20171127;2189;1575;10;220;20171126225552922;20171126225706058;;;;;;;;;;;;;;;;;104;28;31;161;;;;;;;;;;;;;;;;;;;80;47457;;80;60006;;;;;;;;;;;;;;;;158313;610798;2531;339864;802705;966917;4;73075;73136164;7@[u▒;▒'▒@▒▒;;;;;;;;;;;;;;;;;;;;;;;;;32;864;1546;0;;;;Mozilla/5.▒▒oYE=D"▒F▒▒▒
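
It might also be worth checking whether the binary garbage on that line contains extra ';' bytes or stray newline characters, which could shift later fields into the integer columns or split the one physical line into several records when it is parsed. A rough check (only a sketch; the line number 59404785 is taken from the grep output above, and the expected field count comes from describe sgn.scsv):

# Print how many ';'-separated fields the suspect line produces, to compare against the table's column count.
awk -F';' 'NR==59404785 { print NF; exit }' 20171127_02.csv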

So I am wondering: why am I getting the same error 27 times if there is only one bad line in the CSV file? And, more importantly, why am I getting this error for "column 1" when column 1 already has valid data?

Do you have any idea?

 


Re: strange column format error while inserting data

Champion

Did you have a chance to run hadoop fsck, to see whether you have any corrupted blocks?
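
For example (just a sketch, assuming the standard hdfs fsck options), it could be run against that one file rather than the whole filesystem:

# Check only the suspect CSV file, listing its blocks and their locations.
hdfs fsck /grid1/hive/warehouse/sgn.db/scsv/20171127_02.csv -files -blocks -locations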


Re: strange column format error while inserting data

Contributor

Hello csguna

 

I just checked again; it's healthy:

...Status: HEALTHY
Total size: 5888983039200 B (Total open files size: 9932111872 B)
Total dirs: 671975
Total files: 1376803
Total symlinks: 0 (Files currently being written: 1)
Total blocks (validated): 765441 (avg. block size 7693581 B) (Total open file blocks (not validated): 74)
Minimally replicated blocks: 765441 (99.99999 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9998524
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Fri Dec 08 14:55:28 EET 2017 in 59696 milliseconds

The filesystem under path '/' is HEALTHY

 

 

Thanks for your reply
