
Create Impala table from existing Parquet file

Solved


New Contributor

I have a Parquet file that has 5,000 records in it.

I moved it to HDFS and ran the Impala command:

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

Impala creates the table, and I can see the correct schema in Hue.

I know this file contains 5,000 records: I put them there when I created it, and I can query it with Drill and see all 5,000. However, when I run a query in Impala:

SELECT * FROM mytable;

I get back 0 rows.

Why? How do I tell Impala to 'see' those 5,000 records in the file I gave it?


6 REPLIES

Re: Create Impala table from existing Parquet file

Master Collaborator

Sorry for the trouble - this is expected to work.

 

Can you provide the query profile of the query that returned 0 rows?
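
(A sketch of one way to grab that, assuming you're in impala-shell; the table name mytable is taken from the original post. The profile command prints the runtime profile of the most recent statement.)

-- In impala-shell: rerun the query, then dump its runtime profile.
SELECT * FROM mytable;
PROFILE;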

Re: Create Impala table from existing Parquet file

New Contributor

Well, even to an idiot like me who's never used Impala, this is suspicious...

 

EXPLAIN SELECT * FROM nrtest2;

Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
WARNING: The following tables are missing relevant table and/or column statistics.
default.nrtest2

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [default.nrtest2]
   partitions=1/1 files=0 size=0B

How do I get Impala to tell me where on disk it's looking for files? DESCRIBE gives me the schema for the table but not the location of the table.

 

Re: Create Impala table from existing Parquet file

Master Collaborator

Your steps were perfectly reasonable - this flow should work.

 

Agree, the EXPLAIN definitely looks suspicious.

 

You can get the location via SHOW CREATE TABLE <table>.
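
(A quick sketch using the table name from the EXPLAIN above; DESCRIBE FORMATTED is another option that reports the HDFS location.)

SHOW CREATE TABLE nrtest2;
-- DESCRIBE FORMATTED includes a Location: row with the table's HDFS directory.
DESCRIBE FORMATTED nrtest2;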

 

Perhaps there is an issue with mixed-case HDFS paths. You might try converting everything to lower case to see if it works. I'd love to know either way so I can file a bug and fix it :)

 

Thanks!

 

Accepted Solution

Re: Create Impala table from existing Parquet file

New Contributor

Huh... the SHOW CREATE TABLE indicated STORED AS TEXTFILE.

 

I'm going to guess this is a Stupid User Trick and that I mistyped something. Dropping and recreating the table appears to have resolved the issue, although I had limited time to look at it today.

 

Thanks for the help, everyone. Getting the SHOW CREATE TABLE hint was key. I learned something today!
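
(For anyone landing here later, the drop-and-recreate would look roughly like this, reusing the statement from the original question. Dropping an EXTERNAL table does not delete the files in HDFS.)

DROP TABLE IF EXISTS mytable;

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

-- Verify the file format actually stuck this time.
SHOW CREATE TABLE mytable;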

Re: Create Impala table from existing Parquet file

Master Collaborator

Thanks for following up! Glad you got it working.

Re: Create Impala table from existing Parquet file

Champion

@Nevo  It's costly, but could you run the command below and see if that fixes it?

REFRESH [db_name.]table_name

For example:

REFRESH database_name.table_name;
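
(Applied to the table named in the EXPLAIN earlier in the thread, that would be, for example:)

REFRESH default.nrtest2;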