Support Questions

Find answers, ask questions, and share your expertise

Impala: Parquet error "Invalid file footer" on pipe-delimited file

avatar
Explorer

I have pipe-delimited text files in HDFS (lines delimited by new line), and a parquet table:

 

CREATE EXTERNAL TABLE IF NOT EXISTS table_parquet(
TRADE_DATE TIMESTAMP,

[removed for brevity]
FILLER STRING
)
PARTITIONED BY(REGION STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS PARQUET

LOCATION 'hdfs://path/to/location/';

 

ALTER TABLE table_parquet ADD PARTITION(REGION ='euro')
LOCATION 'hdfs://path/to/location/euro';

 

However, when I try to query the table, I get

 

Backend 0:File hdfs://path/to/file.txt is invalid.  Invalid file footer: |

I have tried inserting spaces between the last pipe and the newline, and I've tried removing the last pipe, but no luck.

Any ideas what I'm doing wrong?

 

 Edit: This is Impala 1.4.0

1 ACCEPTED SOLUTION

avatar
Contributor

The ROW FORMAT clause only applies to tables using the text format.  STORED AS PARQUET means the table expects the data to already be in Parquet format. You will need to make 2 tables.  One table with no STORED AS clause but with ROW FORMAT DELIMITED etc.  You will be able to query this table after you move the delimited data files into the table directory and REFRESH the table).  Then another (empty) table with the same columns and a STORED AS PARQUET clause.  Then to convert the data to Parquet, you do:

 

insert into parquet_table select * from text_table;

 

John

View solution in original post

4 REPLIES 4

avatar
New Contributor

 

i read also the same issue with comma separaed csv format. a bit different is, I am running old 1.2.4 still. please could expert shed some light on this. thanks 

 

thanks 

Jason

avatar
Contributor

The ROW FORMAT clause only applies to tables using the text format.  STORED AS PARQUET means the table expects the data to already be in Parquet format. You will need to make 2 tables.  One table with no STORED AS clause but with ROW FORMAT DELIMITED etc.  You will be able to query this table after you move the delimited data files into the table directory and REFRESH the table).  Then another (empty) table with the same columns and a STORED AS PARQUET clause.  Then to convert the data to Parquet, you do:

 

insert into parquet_table select * from text_table;

 

John

avatar
Explorer

In addition to what John Russell suggests (which works great), ensure you remove any partitions and the source HDFS files from any partitions/tables created incorrectly as parquet.

 

In my case, I was inserting the data into a table stored as text, then transfering to a parquet type storage table successfully.  However, my query was still throwing an "invalid file footer" error because I had invalid partitions that hadn't been completely dropped/deleted from HDFS,  specfifically:

/user/hive/warehouse/<databasename>/<table_name>/<partition>

 

Gord

avatar
Expert Contributor

Hi 

 

We are also facing the same issue of invalid file footer, the table is created as follows :

 

2 tables created

 

CREATE EXTERNAL TABLE ABC_TEXT (

NAME STRING,

ID INT,

PHONE INT)

PARTITION BY (Customer_id INT)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ';'

STORED AS TEXTFILE

LOCATION '/USER/ABC_TEXT ;

 

 

 

CREATE EXTERNAL TABLE ABC_PARQUET (

NAME STRING,

ID INT,

PHONE INT )

PARTITION BY (Customer_id INT)

STORED AS PARQUET

LOCATION '/USER/ABC_PARQUET' ;

 

Then run the insert script, which inserts data perfectly but when queried on parquet table getting following error 

 

 

Error:

Caused by: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500312) Error in fetching data rows: 

Invalid file footer

 

Please let me know what I am doing wrong.