Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

External Table from Parquet folder returns empty result

Solved Go to solution

External Table from Parquet folder returns empty result

New Contributor

Hi, I did search for similar issues but didn't find a perfect match for my case:

 

 

CREATE EXTERNAL TABLE processed_data LIKE PARQUET '/user/myuser/spark/proyecto1/date=2015-08-17/hour=21/part-00087-7a6eac40-0db4-4368-894c-72588f80dfd6.c000.snappy.parquet'

STORED AS PARQUET

LOCATION '/user/myuser/spark/proyecto1'

 

The table creation works, but any query to the table will return 0 rows.

 

Query: SHOW CREATE TABLE processed_data
+---------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE josemidb.processed_data (                                                                           |
|   window STRUCT<start:TIMESTAMP,end:TIMESTAMP> COMMENT 'Inferred from Parquet file.',                                     |
|   terminating_cellid BIGINT COMMENT 'Inferred from Parquet file.',                                                        |
|   termination_id BIGINT COMMENT 'Inferred from Parquet file.',                                                            |
|   termination_reason BIGINT COMMENT 'Inferred from Parquet file.',                                                        |
|   numcalls BIGINT COMMENT 'Inferred from Parquet file.'                                                                   |
| )                                                                                                                         |
| STORED AS PARQUET                                                                                                         |
| LOCATION 'hdfs://myhost:8020/user/myuser/spark/proyecto1'                                    |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='18', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='67251') |
+---------------------------------------------------------------------------------------------------------------------------+

This shows numFiles = 18 which is right, but numRows = -1 :(

 

I think my problem is that I'm not handling partitions the right way. Can you help me with this?

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: External Table from Parquet folder returns empty result

Master Collaborator

1. You need to add the appropriate PARTITION clause to your CREATE EXTERNAL TABLE. Looks like it should be PARTITIONED BY (`date` STRING)

2. You need to run ALTER TABLE RECOVER PARTITIONS to discover all the partitions for the table

 

After that you should be good to go!

4 REPLIES 4

Re: External Table from Parquet folder returns empty result

Expert Contributor

Hi @messenjah00,

Can you share with us the proyecto1 table schema?
I think you must use PARTITIONED BY in the create statement.

Re: External Table from Parquet folder returns empty result

Master Collaborator

1. You need to add the appropriate PARTITION clause to your CREATE EXTERNAL TABLE. Looks like it should be PARTITIONED BY (`date` STRING)

2. You need to run ALTER TABLE RECOVER PARTITIONS to discover all the partitions for the table

 

After that you should be good to go!

Re: External Table from Parquet folder returns empty result

Cloudera Employee
It looks like the path to the parquet file that you've used to infer the schema of the table has multiple directories that correspond to partitions (e.g., date=2015-08-17/hour=21).

Create the table with the schema and partitioning specified, then use alter partition to add the per-partition files (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html)

The -1 shown is probably from not running compute stats.

Re: External Table from Parquet folder returns empty result

New Contributor

Thank you all,

I added..

PARTITIONED BY (`date` STRING, hour INT)

 ..to the CREATE External TABLE sentence and then..

ALTER TABLE processed_data_p ADD PARTITION (`date`="2015-08-19", hour=20) LOCATION '/user/myuser/spark/proyecto1/date=2015-08-19/hour=20';

One option to avoid adding all partitions manually is:

alter table processed_data_p recover partitions;
Don't have an account?
Coming from Hortonworks? Activate your account here