Created on 08-07-2014 08:43 AM - edited 09-16-2022 02:04 AM
I have pipe-delimited text files in HDFS (lines delimited by new line), and a parquet table:
CREATE EXTERNAL TABLE IF NOT EXISTS table_parquet(
TRADE_DATE TIMESTAMP,
[removed for brevity]
FILLER STRING
)
PARTITIONED BY(REGION STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS PARQUET
LOCATION 'hdfs://path/to/location/';
ALTER TABLE table_parquet ADD PARTITION(REGION ='euro')
LOCATION 'hdfs://path/to/location/euro';
However, when I try to query the table, I get
Backend 0:File hdfs://path/to/file.txt is invalid. Invalid file footer: |
I have tried inserting spaces between the last pipe and the newline, and I've tried removing the last pipe, but no luck.
Any ideas what I'm doing wrong?
Edit: This is Impala 1.4.0
Created 09-13-2014 01:26 PM
The ROW FORMAT clause only applies to tables using the text format. STORED AS PARQUET means the table expects the data to already be in Parquet format. You will need to make 2 tables: one table with no STORED AS clause but with ROW FORMAT DELIMITED etc. (you will be able to query this table after you move the delimited data files into the table directory and REFRESH the table), and another (empty) table with the same columns and a STORED AS PARQUET clause. Then, to convert the data to Parquet, you do:
insert into parquet_table select * from text_table;
John
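For reference, here is a minimal sketch of the two-table workflow described above (table names, the abbreviated column list, and the use of CREATE TABLE LIKE are assumptions, not taken from the original post):

```sql
-- Text-format staging table; the ROW FORMAT clause applies here.
CREATE TABLE trades_text (
  trade_date TIMESTAMP,
  filler STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

-- Move the delimited data files into the table directory, then:
REFRESH trades_text;

-- Parquet table with the same columns; no ROW FORMAT clause.
CREATE TABLE trades_parquet LIKE trades_text STORED AS PARQUET;

-- Convert: Impala reads the text rows and writes Parquet files.
INSERT INTO trades_parquet SELECT * FROM trades_text;
```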
Created 08-15-2014 11:25 AM
I am seeing the same issue with comma-separated CSV files. One difference: I am still running the older 1.2.4. Could an expert please shed some light on this?
Thanks,
Jason
Created on 10-20-2015 09:33 AM - edited 10-20-2015 09:55 AM
In addition to what John Russell suggests (which works great), ensure you remove any partitions and the source HDFS files from any partitions/tables created incorrectly as parquet.
In my case, I was inserting the data into a table stored as text and then successfully transferring it to a Parquet table. However, my query was still throwing an "invalid file footer" error because I had invalid partitions that hadn't been completely dropped/deleted from HDFS, specifically under:
/user/hive/warehouse/<databasename>/<table_name>/<partition>
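A rough sketch of that cleanup, assuming the table and partition from the original question (the exact names here are illustrative; the HDFS removal is run from a shell, shown as a comment):

```sql
-- Drop the bad partition from the table metadata.
ALTER TABLE table_parquet DROP PARTITION (region = 'euro');

-- Then remove the leftover files from a shell, e.g.:
--   hdfs dfs -rm -r /user/hive/warehouse/<databasename>/<table_name>/<partition>

-- Finally, make Impala pick up the change.
REFRESH table_parquet;
```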
Created on 12-01-2015 11:43 AM - edited 12-01-2015 03:38 PM
Hi
We are also facing the same "invalid file footer" issue. Two tables are created, as follows:
CREATE EXTERNAL TABLE ABC_TEXT (
NAME STRING,
ID INT,
PHONE INT)
PARTITIONED BY (Customer_id INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
STORED AS TEXTFILE
LOCATION '/USER/ABC_TEXT' ;
CREATE EXTERNAL TABLE ABC_PARQUET (
NAME STRING,
ID INT,
PHONE INT )
PARTITIONED BY (Customer_id INT)
STORED AS PARQUET
LOCATION '/USER/ABC_PARQUET' ;
Then we run the insert script, which inserts the data without errors, but when we query the Parquet table we get the following error:
Error:
Caused by: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500312) Error in fetching data rows:
Invalid file footer
Please let me know what I am doing wrong.
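For reference, since the insert script itself isn't shown in the post, a partitioned insert in Impala is usually written with an explicit PARTITION clause along these lines (a sketch using the table names above, not the poster's actual script):

```sql
-- Dynamic-partition insert: the partition column comes last in the SELECT list.
INSERT INTO ABC_PARQUET PARTITION (Customer_id)
SELECT NAME, ID, PHONE, Customer_id FROM ABC_TEXT;

-- Refresh so subsequent queries see the newly written Parquet files.
REFRESH ABC_PARQUET;
```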