
Create Impala table from existing Parquet file

Solved


New Contributor

I have a Parquet file that has 5,000 records in it.

I moved it to HDFS and ran the Impala command:

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

Impala creates the table, and I can see the correct schema in Hue.

I know this file contains 5,000 records: I put them there when I created it, and I can query it with Drill and see all 5,000. However, when I run a query in Impala:

SELECT * FROM mytable;

I get back 0 rows.

Why? How do I tell Impala to 'see' those 5,000 records in the file I gave it?


6 REPLIES

Re: Create Impala table from existing Parquet file

Master Collaborator

Sorry for the trouble - this is expected to work.

 

Can you provide the query profile of the query that returned 0 rows?
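
(A sketch of one way to grab that, assuming you're in impala-shell; the table name mytable is taken from the original post. The profile command prints the runtime profile of the most recent statement.)

-- In impala-shell: rerun the query, then dump its runtime profile.
SELECT * FROM mytable;
PROFILE;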

Re: Create Impala table from existing Parquet file

New Contributor

Well, even to an idiot like me who's never used Impala, this is suspicious...

 

EXPLAIN SELECT * FROM nrtest2;

Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
WARNING: The following tables are missing relevant table and/or column statistics.
default.nrtest2

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [default.nrtest2]
   partitions=1/1 files=0 size=0B

How do I get Impala to tell me where on disk it's looking for files? DESCRIBE gives me the schema for the table but not the location of the table.

 

Re: Create Impala table from existing Parquet file

Master Collaborator

Your steps were perfectly reasonable - this flow should work.

 

Agree, the EXPLAIN definitely looks suspicious.

 

You can get the location via SHOW CREATE TABLE <table>.
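
(A quick sketch using the table name from the EXPLAIN above; DESCRIBE FORMATTED is another option that reports the HDFS location.)

SHOW CREATE TABLE nrtest2;
-- DESCRIBE FORMATTED includes a Location: row with the table's HDFS directory.
DESCRIBE FORMATTED nrtest2;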

 

Perhaps there is an issue with mixed-case HDFS paths. You might try converting everything to lower case to see if it works. I'd love to know either way so I can file a bug and fix it :)

 

Thanks!

 

Accepted Solution

Re: Create Impala table from existing Parquet file

New Contributor

Huh... the SHOW CREATE TABLE indicated STORED AS TEXTFILE.

 

I'm going to guess this is a Stupid User Trick and that I mistyped something. Dropping and recreating the table appears to have resolved the issue, although I had limited time to look at it today.

 

Thanks for the help, everyone. Getting the SHOW CREATE TABLE hint was key. I learned something today!
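
(For anyone landing here later, the drop-and-recreate would look roughly like this, reusing the statement from the original question. Dropping an EXTERNAL table does not delete the files in HDFS.)

DROP TABLE IF EXISTS mytable;

CREATE EXTERNAL TABLE mytable LIKE PARQUET '/user/hive/MyDataFolder/MyData.Parquet'
STORED AS PARQUET
LOCATION '/user/hive/MyDataFolder';

-- Verify the file format actually stuck this time.
SHOW CREATE TABLE mytable;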

Re: Create Impala table from existing Parquet file

Master Collaborator

Thanks for following up! Glad you got it working.

Re: Create Impala table from existing Parquet file

Champion

@Nevo  It's costly, but could you run the command below and see if that fixes it?

REFRESH [db_name.]table_name

For example:

REFRESH database_name.table_name;
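
(Applied to the table named in the EXPLAIN earlier in the thread, that would be, for example:)

REFRESH default.nrtest2;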