New Contributor
Posts: 3
Registered: 10-09-2017

Create Impala table from existing Parquet file


I have a Parquet file that has 5,000 records in it.

I moved it to HDFS and ran the Impala command:

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

Impala creates the table, and I can see the correct schema in Hue.

I know that this Parquet file has 5,000 records in it. I know this because I put 5,000 records in it when I created it, and because I can query it with Drill and see 5,000 records. However, when I perform a query in Impala:

SELECT * FROM mytable;

I get back 0 rows.

Why? How do I tell Impala to 'see' those 5,000 records in the file I gave it?

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Sorry for the trouble - this is expected to work.

 

Can you provide the query profile of the query that returned 0 rows?

Champion
Posts: 595
Registered: ‎05-16-2016

Re: Create Impala table from existing Parquet file


@Nevo It's costly, but could you run the command below and see if that fixes it?

REFRESH [db_name.]table_name

 

New Contributor
Posts: 3
Registered: ‎10-09-2017

Re: Create Impala table from existing Parquet file

Well, even to an idiot like me who's never used Impala, this is suspicious....

 

EXPLAIN SELECT * FROM nrtest2;

Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
WARNING: The following tables are missing relevant table and/or column statistics.
default.nrtest2

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [default.nrtest2]
   partitions=1/1 files=0 size=0B

How do I get Impala to tell me where on disk it's looking for files? DESCRIBE gives me the schema for the table but not the location of the table.

 

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Your steps were perfectly reasonable - this flow should work.

 

Agreed, the EXPLAIN output definitely looks suspicious: files=0 size=0B means Impala is not seeing any data files for the table.

 

You can get the location via SHOW CREATE TABLE <table>.
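
For anyone following along, here is a minimal sketch of those checks, assuming the table is default.nrtest2 as in the EXPLAIN output above. DESCRIBE FORMATTED and SHOW FILES are not mentioned elsewhere in this thread; they are included as additional ways to see what Impala believes about the table.

```sql
-- Prints the full DDL Impala has stored for the table,
-- including the STORED AS format and the LOCATION path.
SHOW CREATE TABLE default.nrtest2;

-- Also reports Location, InputFormat, and OutputFormat,
-- unlike plain DESCRIBE, which only lists the columns.
DESCRIBE FORMATTED default.nrtest2;

-- Lists the data files Impala currently associates with the table.
SHOW FILES IN default.nrtest2;
```

If the LOCATION or the storage format differs from what you typed in the CREATE statement, that is the first thing to chase.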

 

Perhaps there is an issue with mixed-case HDFS paths. You might try converting everything to lower case to see if it works. I'd love to know either way so I can file a bug and fix it :)

 

Thanks!

 

New Contributor
Posts: 3
Registered: ‎10-09-2017

Re: Create Impala table from existing Parquet file

Huh... the SHOW CREATE TABLE indicated STORED AS TEXTFILE.

 

I'm going to guess this is a Stupid User Trick and that I mistyped something. Dropping and recreating the table appears to have resolved the issue, although I had limited time to look at it today.
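
A sketch of what the drop-and-recreate might look like, reusing the statement and paths from the first post. The table name mytable is taken from that post; the thread does not say which name was actually dropped, so treat it as an assumption.

```sql
-- Check the existing definition first. Dropping an EXTERNAL table removes
-- only the metadata, but if the table was accidentally created without
-- EXTERNAL, DROP TABLE would also delete the data files.
SHOW CREATE TABLE mytable;

DROP TABLE IF EXISTS mytable;

-- Recreate the table, inferring the schema from the Parquet file.
CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
  STORED AS PARQUET
  LOCATION '/user/hive/MyDataFolder';

-- Confirm the definition now says STORED AS PARQUET and that rows are visible.
SHOW CREATE TABLE mytable;
SELECT COUNT(*) FROM mytable;
```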

 

Thanks for the help, everyone. Getting the SHOW CREATE TABLE hint was key. I learned something today!

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Create Impala table from existing Parquet file

Thanks for following up! Glad you got it working.
