New Contributor
Posts: 3
Registered: ‎10-09-2017
Accepted Solution

Create Impala table from existing Parquet file

I have a Parquet file that has 5,000 records in it.

I moved it to HDFS and ran the Impala command:

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

Impala creates the table, and I can see the correct schema in Hue.

I know the file has 5,000 records because I put them there when I created it, and because I can query it with Drill and see all 5,000. However, when I run a query in Impala:

SELECT * FROM mytable;

I get back 0 rows.

Why? How do I tell Impala to 'see' those 5,000 records in the file I gave it?

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Sorry for the trouble - this is expected to work.

Can you provide the query profile of the query that returned 0 rows?

Champion
Posts: 768
Registered: ‎05-16-2016

Re: Create Impala table from existing Parquet file

@Nevo It's costly, but could you run the command below and see if that fixes it?

REFRESH [db_name.]table_name
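
Applied to the table from the original post, that might look like the sketch below. It assumes the table is named mytable and lives in the current database; INVALIDATE METADATA is shown only as a heavier fallback and was not suggested in this thread.

```sql
-- Reload file and block metadata for one table after files
-- were added to its directory outside of Impala.
REFRESH mytable;

-- Heavier fallback (an assumption, not something suggested above):
-- discard the cached metadata so it is reloaded on next access.
INVALIDATE METADATA mytable;

-- Check whether the rows are visible now.
SELECT COUNT(*) FROM mytable;
```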

 

New Contributor
Posts: 3
Registered: ‎10-09-2017

Re: Create Impala table from existing Parquet file

Well, even to an idiot like me who's never used Impala, this is suspicious....

 

EXPLAIN SELECT * FROM nrtest2;

Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
WARNING: The following tables are missing relevant table and/or column statistics.
default.nrtest2

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [default.nrtest2]
   partitions=1/1 files=0 size=0B

How do I get Impala to tell me where on disk it's looking for files? DESCRIBE gives me the schema for the table but not the location of the table.

 

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Your steps were perfectly reasonable - this flow should work.

Agreed, the EXPLAIN output definitely looks suspicious.

You can get the location via SHOW CREATE TABLE <table>.
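
A minimal sketch of that check, using the nrtest2 table name from the EXPLAIN output above. DESCRIBE FORMATTED and SHOW FILES are additional statements that were not mentioned in this thread; they are included only as other ways to see the same information.

```sql
-- Prints the full DDL for the table, including the
-- STORED AS format and the LOCATION path.
SHOW CREATE TABLE nrtest2;

-- Alternative: detailed table metadata, including the
-- Location and InputFormat rows.
DESCRIBE FORMATTED nrtest2;

-- Lists the data files Impala has registered for the table.
-- An empty result would match the files=0 shown by EXPLAIN.
SHOW FILES IN nrtest2;
```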

 

Perhaps there is an issue with mixed-case HDFS paths. You might try converting everything to lower case to see if it works. I'd love to know either way so I can file a bug and fix it :)

Thanks!

New Contributor
Posts: 3
Registered: ‎10-09-2017

Re: Create Impala table from existing Parquet file

Huh... the SHOW CREATE TABLE indicated STORED AS TEXTFILE.

 

I'm going to guess this is a Stupid User Trick and that I mistyped something. Dropping and recreating the table appears to have resolved the issue, although I had limited time to look at it today.
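
For anyone landing here later, the drop-and-recreate step could look like the sketch below. It reuses the table name and paths from the first post, which is an assumption about what was actually run. One plausible cause of the TEXTFILE result is that the STORED AS PARQUET clause was left out of the original statement: LIKE PARQUET only copies the column definitions, so the file format still has to be stated explicitly.

```sql
-- Dropping an EXTERNAL table removes only the table metadata;
-- the Parquet file in HDFS is left in place.
DROP TABLE IF EXISTS mytable;

-- LIKE PARQUET takes the schema from the file; STORED AS PARQUET
-- sets the table's file format.
CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

-- Confirm the format and location, then check the row count.
SHOW CREATE TABLE mytable;
SELECT COUNT(*) FROM mytable;
```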

 

Thanks for the help, everyone. Getting the SHOW CREATE TABLE hint was key. I learned something today!

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Thanks for following up! Glad you got it working.