Reply
New Contributor
Posts: 3
Registered: ‎04-17-2018
Accepted Solution

External Table from Parquet folder returns empty result

Hi, I did search for similar issues but didn't find a perfect match for my case:

 

 

CREATE EXTERNAL TABLE processed_data LIKE PARQUET '/user/myuser/spark/proyecto1/date=2015-08-17/hour=21/part-00087-7a6eac40-0db4-4368-894c-72588f80dfd6.c000.snappy.parquet'

STORED AS PARQUET

LOCATION '/user/myuser/spark/proyecto1'

 

The table creation works, but any query to the table will return 0 rows.

 

Query: SHOW CREATE TABLE processed_data
+---------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE josemidb.processed_data (                                                                           |
|   window STRUCT<start:TIMESTAMP,end:TIMESTAMP> COMMENT 'Inferred from Parquet file.',                                     |
|   terminating_cellid BIGINT COMMENT 'Inferred from Parquet file.',                                                        |
|   termination_id BIGINT COMMENT 'Inferred from Parquet file.',                                                            |
|   termination_reason BIGINT COMMENT 'Inferred from Parquet file.',                                                        |
|   numcalls BIGINT COMMENT 'Inferred from Parquet file.'                                                                   |
| )                                                                                                                         |
| STORED AS PARQUET                                                                                                         |
| LOCATION 'hdfs://myhost:8020/user/myuser/spark/proyecto1'                                    |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='18', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='67251') |
+---------------------------------------------------------------------------------------------------------------------------+

This shows numFiles = 18 which is right, but numRows = -1 :(

 

I think my problem is that I'm not handling partitions the right way. Can you help me with this?

 

Thank you.

Highlighted
Expert Contributor
Posts: 115
Registered: ‎07-17-2017

Re: External Table from Parquet folder returns empty result

Hi @messenjah00,

Can you share with us the proyecto1 table schema?
I think you must use PARTITIONED BY in the create statement.

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: External Table from Parquet folder returns empty result

1. You need to add the appropriate PARTITION clause to your CREATE EXTERNAL TABLE. Looks like it should be PARTITIONED BY (`date` STRING)

2. You need to run ALTER TABLE RECOVER PARTITIONS to discover all the partitions for the table

 

After that you should be good to go!

Cloudera Employee
Posts: 4
Registered: ‎08-21-2017

Re: External Table from Parquet folder returns empty result

It looks like the path to the parquet file that you've used to infer the schema of the table has multiple directories that correspond to partitions (e.g., date=2015-08-17/hour=21).

Create the table with the schema and partitioning specified, then use alter partition to add the per-partition files (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html)

The -1 shown is probably from not running compute stats.
New Contributor
Posts: 3
Registered: ‎04-17-2018

Re: External Table from Parquet folder returns empty result

Thank you all,

I added..

PARTITIONED BY (`date` STRING, hour INT)

 ..to the CREATE External TABLE sentence and then..

ALTER TABLE processed_data_p ADD PARTITION (`date`="2015-08-19", hour=20) LOCATION '/user/myuser/spark/proyecto1/date=2015-08-19/hour=20';

One option to avoid adding all partitions manually is:

alter table processed_data_p recover partitions;
Announcements