Created on 08-07-2014 08:43 AM - edited 09-16-2022 02:04 AM
I have pipe-delimited text files in HDFS (lines delimited by new line), and a parquet table:
CREATE EXTERNAL TABLE IF NOT EXISTS table_parquet(
TRADE_DATE TIMESTAMP,
[removed for brevity]
FILLER STRING
)
PARTITIONED BY(REGION STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS PARQUET
LOCATION 'hdfs://path/to/location/';
ALTER TABLE table_parquet ADD PARTITION(REGION ='euro')
LOCATION 'hdfs://path/to/location/euro';
However, when I try to query the table, I get
Backend 0:File hdfs://path/to/file.txt is invalid. Invalid file footer: |
I have tried inserting spaces between the last pipe and the newline, and I've tried removing the last pipe, but no luck.
Any ideas what I'm doing wrong?
Edit: This is Impala 1.4.0
Created 09-13-2014 01:26 PM
The ROW FORMAT clause only applies to tables using the text format. STORED AS PARQUET means the table expects the data to already be in Parquet format. You will need to make 2 tables: one table with no STORED AS clause but with ROW FORMAT DELIMITED etc. (you will be able to query this table after you move the delimited data files into the table directory and REFRESH the table), and another (empty) table with the same columns and a STORED AS PARQUET clause. Then, to convert the data to Parquet, you do:
insert into parquet_table select * from text_table;
John
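For reference, here is a minimal sketch of the two-table workflow described above (table names, the abbreviated column list, and the use of CREATE TABLE LIKE are assumptions, not taken from the original post):

```sql
-- Text-format staging table; the ROW FORMAT clause applies here.
CREATE TABLE trades_text (
  trade_date TIMESTAMP,
  filler STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

-- Move the delimited data files into the table directory, then:
REFRESH trades_text;

-- Parquet table with the same columns; no ROW FORMAT clause.
CREATE TABLE trades_parquet LIKE trades_text STORED AS PARQUET;

-- Convert: Impala reads the text rows and writes Parquet files.
INSERT INTO trades_parquet SELECT * FROM trades_text;
```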
Created 08-15-2014 11:25 AM
I am seeing the same issue with comma-separated CSV files. One difference: I am still running the older 1.2.4. Could an expert please shed some light on this?
Thanks,
Jason
Created on 10-20-2015 09:33 AM - edited 10-20-2015 09:55 AM
In addition to what John Russell suggests (which works great), ensure you remove any partitions and the source HDFS files from any partitions/tables created incorrectly as parquet.
In my case, I was inserting the data into a table stored as text and then successfully transferring it to a Parquet table. However, my query was still throwing an "invalid file footer" error because I had invalid partitions that hadn't been completely dropped/deleted from HDFS, specifically under:
/user/hive/warehouse/<databasename>/<table_name>/<partition>
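A rough sketch of that cleanup, assuming the table and partition from the original question (the exact names here are illustrative; the HDFS removal is run from a shell, shown as a comment):

```sql
-- Drop the bad partition from the table metadata.
ALTER TABLE table_parquet DROP PARTITION (region = 'euro');

-- Then remove the leftover files from a shell, e.g.:
--   hdfs dfs -rm -r /user/hive/warehouse/<databasename>/<table_name>/<partition>

-- Finally, make Impala pick up the change.
REFRESH table_parquet;
```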
Created on 12-01-2015 11:43 AM - edited 12-01-2015 03:38 PM
Hi
We are also facing the same "invalid file footer" issue. Two tables are created, as follows:
CREATE EXTERNAL TABLE ABC_TEXT (
NAME STRING,
ID INT,
PHONE INT)
PARTITIONED BY (Customer_id INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
STORED AS TEXTFILE
LOCATION '/USER/ABC_TEXT' ;
CREATE EXTERNAL TABLE ABC_PARQUET (
NAME STRING,
ID INT,
PHONE INT )
PARTITIONED BY (Customer_id INT)
STORED AS PARQUET
LOCATION '/USER/ABC_PARQUET' ;
Then we run the insert script, which inserts the data without errors, but when we query the Parquet table we get the following error:
Error:
Caused by: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500312) Error in fetching data rows:
Invalid file footer
Please let me know what I am doing wrong.
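For reference, since the insert script itself isn't shown in the post, a partitioned insert in Impala is usually written with an explicit PARTITION clause along these lines (a sketch using the table names above, not the poster's actual script):

```sql
-- Dynamic-partition insert: the partition column comes last in the SELECT list.
INSERT INTO ABC_PARQUET PARTITION (Customer_id)
SELECT NAME, ID, PHONE, Customer_id FROM ABC_TEXT;

-- Refresh so subsequent queries see the newly written Parquet files.
REFRESH ABC_PARQUET;
```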