About somedatadude

somedatadude · ‎01-28-2020

Thank you very much for your answer, SahilTakiar. Could you tell me what "offset" means and how I can make Impala show me the specific row(s) causing the errors? Your answer is very much appreciated! 🙂 Sorry if the question is simple; I am new to HDFS. Best
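For anyone with the same question, here is a minimal sketch of one way to look at the rows near a reported offset. It assumes that "before offset" is a byte position within the named file and that the file has been copied to a local disk first (for example with hdfs dfs -get). The function name rows_before_offset and the 64 KB look-back window are made up for illustration and are not part of Impala or HDFS.

```python
import os


def rows_before_offset(path, offset, max_rows=5, window=64 * 1024):
    """Return the last `max_rows` rows found just before byte `offset`.

    Assumption: rows are newline-terminated and much shorter than `window`.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        end = min(offset, size)          # never read past the end of the file
        start = max(0, end - window)     # look back a fixed number of bytes
        f.seek(start)
        chunk = f.read(end - start)

    pieces = chunk.split(b"\n")
    if start > 0:
        # The first piece is probably the tail of a row that began earlier.
        pieces = pieces[1:]

    # Drop empty pieces; the final piece may be a row cut off at `offset`.
    rows = [p.rstrip(b"\r") for p in pieces if p]
    return [r.decode("utf-8", errors="replace") for r in rows[-max_rows:]]
```

Called as rows_before_offset("foo_042019.csv", 2432696320), this would print-ready return the few rows that end just before that byte position, which can then be checked by eye for malformed timestamp fields.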

somedatadude · ‎12-30-2019

Dear community,

I created a new table by uploading a CSV file (with a header; the file contains data for one specific month) to HDFS via Hue. Afterwards I cleared the cache and uploaded the other CSV files. All subsequent CSV files have the same column order but no header. Each monthly CSV file is about 2-4 GB on average and has 54 columns.

My typical procedure after uploading a new CSV file to the database:

INVALIDATE METADATA database_xy

When I run a query that displays every column, I get the following error messages in Impala:

Error converting column: 6 to TIMESTAMP
Error converting column: 8 to TIMESTAMP
Error converting column: 23 to TIMESTAMP
Error converting column: 50 to TIMESTAMP
Error converting column: 35 to TIMESTAMP
Error converting column: 43 to TIMESTAMP

Data for these columns only becomes available after 4 months. Until then they contain only NULL values.

Query to reproduce these error messages:

SELECT * FROM database_xy LIMIT 100

For a specific TIMESTAMP column:

SELECT min(exp_date) FROM database_xy

Error (just a sample of the log box in Hue):

Error parsing row: file: hdfs://blabla/foo_042019.csv, before offset: 2432696320
Error converting column: 21 to TIMESTAMP
Error parsing row: file: hdfs://blabla/foo_032019.csv, before offset: 1895825408
Error converting column: 21 to TIMESTAMP
Error converting column: 21 to TIMESTAMP
Error parsing row: file: hdfs://blabla/foo_022019.csv, before offset: 2969567232
Error converting column: 21 to TIMESTAMP
Error converting column: 21 to TIMESTAMP

When I run the queries in Hive, I get no error messages at all. How come? And how do I get rid of those error messages in Impala?
Information about how I created the CSV files locally:

First CSV: Python (pandas) with the options separator: pipe, header=True, index=False (so there is no additional useless index column).
Subsequent CSVs: Python (pandas) with the options separator: pipe, header=False, index=False (so there is no additional useless index column).

When I created the table from the first CSV in Hue, I selected the following options:

Field Separator: Pipe
Record Separator: New line
Quote Character: Double Quote

Afterwards I uploaded all the other CSVs to the database's folder to add the new months and invalidated the metadata.

Thank you in advance for your help! I hope you enjoyed the Christmas holidays and I wish you a happy New Year's Eve!

Best
somedatadude
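One way to narrow down the offending rows before uploading is to scan a local copy of a CSV for values that do not look like timestamps. The sketch below is only that: the accepted formats in TS_FORMATS are an assumption about what a TIMESTAMP column would take, empty fields are treated as NULL by assumption, and the helper name find_bad_timestamps is hypothetical. The column index is zero-based here; whether the column number in the Impala message is zero-based or one-based is something to verify against the table definition.

```python
import csv
from datetime import datetime

# Assumed timestamp layouts; adjust to whatever the table actually expects.
TS_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parses_as_timestamp(value):
    """Return True if `value` matches one of the assumed formats."""
    for fmt in TS_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def find_bad_timestamps(path, column, delimiter="|", limit=20):
    """Return up to `limit` (record_number, value) pairs that fail to parse.

    `column` is a zero-based index. Record numbers count CSV records, which
    differ from physical line numbers if quoted fields contain newlines.
    """
    bad = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter, quotechar='"')
        for record_no, row in enumerate(reader, start=1):
            value = row[column] if column < len(row) else ""
            if value == "":
                continue  # assumption: an empty field stands for NULL
            if not parses_as_timestamp(value):
                bad.append((record_no, value))
                if len(bad) >= limit:
                    break
    return bad
```

Run against the first CSV, a check like this would also flag the header row, since a column name such as exp_date is not a timestamp.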

Last Visited	‎02-04-2020 10:29 AM

Member Since	‎12-30-2019 03:43 AM
Posts	2

Cloudera Community

"Error parsing row: file" Table consists of multip...