Create Impala table from existing Parquet file

New Contributor

I have a Parquet file that has 5,000 records in it.

I moved it to HDFS and ran the Impala command:

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

Impala creates the table, and I can see the correct schema in Hue.

I know this Parquet file has 5,000 records in it: I put 5,000 records in it when I created it, and querying it with Drill returns 5,000 records. However, when I run this query in Impala:

SELECT * FROM mytable;

I get back 0 rows.

Why? How do I tell Impala to 'see' those 5,000 records in the file I gave it?

6 replies

Reply

Sorry for the trouble - this is expected to work.

Can you provide the query profile of the query that returned 0 rows?
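
If you are working in impala-shell, one way to capture the profile is to re-run the failing query and then issue the shell's `PROFILE` command, which prints the runtime profile of the most recent query. This is a minimal sketch that assumes impala-shell and the `mytable` name from the question.

```sql
-- Re-run the query that returned 0 rows
SELECT * FROM mytable;

-- impala-shell command (not SQL): prints the runtime profile of the last query
PROFILE;
```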

New Contributor

Well, even to an idiot like me who's never used Impala, this looks suspicious...

EXPLAIN SELECT * FROM nrtest2;

Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
WARNING: The following tables are missing relevant table and/or column statistics.
default.nrtest2

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [default.nrtest2]
   partitions=1/1 files=0 size=0B

How do I get Impala to tell me where on disk it's looking for files? DESCRIBE gives me the schema for the table but not the location of the table.

Reply

Your steps were perfectly reasonable - this flow should work.

Agreed, the EXPLAIN output definitely looks suspicious.

You can get the location via SHOW CREATE TABLE <table>.
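
As a sketch, assuming the `nrtest2` table from the EXPLAIN output above, these statements show where Impala thinks the table lives and what it finds there. SHOW CREATE TABLE reports the LOCATION and STORED AS clauses. DESCRIBE FORMATTED, unlike plain DESCRIBE, adds Location and InputFormat rows. SHOW FILES lists the data files Impala currently sees under that location.

```sql
-- Full DDL as Impala recorded it: check LOCATION and STORED AS
SHOW CREATE TABLE nrtest2;

-- Detailed metadata: look for the Location and InputFormat rows
DESCRIBE FORMATTED nrtest2;

-- Data files Impala currently knows about for this table
SHOW FILES IN nrtest2;
```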

Perhaps there is an issue with mixed-case HDFS paths. You might try converting everything to lowercase to see if it works. I'd love to know either way so I can file a bug and fix it 🙂

Thanks!

New Contributor

Huh... the SHOW CREATE TABLE output showed STORED AS TEXTFILE.

I'm going to guess this is a Stupid User Trick and that I mistyped something. Dropping and recreating the table appears to have resolved the issue, although I had limited time to look at it today.

Thanks for the help, everyone. Getting the SHOW CREATE TABLE hint was key. I learned something today!
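
Here is a sketch of the drop-and-recreate fix. It reuses the table name and paths from the original question, so substitute your own. By default, dropping an EXTERNAL table removes only Impala's metadata and leaves the Parquet file under LOCATION in place. Check your table properties before relying on that.

```sql
-- For an EXTERNAL table this drops metadata only (by default);
-- the Parquet file in HDFS is not deleted
DROP TABLE IF EXISTS mytable;

-- Recreate, taking the schema from the Parquet file itself
CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

-- Confirm the recorded format is now PARQUET rather than TEXTFILE
SHOW CREATE TABLE mytable;

-- Expected to match the 5,000 records written to the file
SELECT COUNT(*) FROM mytable;
```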

Reply

Thanks for following up! Glad you got it working.

Champion

@Nevo It's costly, but could you run the command below and see if that fixes it?

REFRESH [db_name.]table_name

REFRESH DATABASE_NAME.TABLE_NAME
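
One way to check whether REFRESH made a difference is to re-run the EXPLAIN and compare the `files=` count on the `00:SCAN HDFS` node, which earlier showed `files=0`. This sketch assumes the `default.nrtest2` table from earlier in the thread. SHOW TABLE STATS gives a similar view through its #Files, Size, Format and Location columns.

```sql
-- Reload file and block metadata for this one table
REFRESH default.nrtest2;

-- The 00:SCAN HDFS node should now report a non-zero files= count
EXPLAIN SELECT * FROM default.nrtest2;

-- Per-table view of #Files, Size, Format and Location
SHOW TABLE STATS default.nrtest2;
```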