Support Questions

Manus · ‎06-01-2016

Hello everyone,

I have posted query below in which i have used 3 partition table and all are the external table pointing to some partition ditectory on hdfs.In every table partition is done on "ds" column(i.e. ds=2016-06-01-08).

ds=year-month-day-hour

On hourly basis data is landing on hdfs,creating directory as above timestamp and tables are pointing to all those partition directories.

A question is:

As already i explained all tables are partitioned,but

Where to write where clause in below query? so that data will be processed for that particular partition only.

Please let me know.

Query:

INSERT OVERWRITE TABLE falconexample.Patient_proce PARTITION (${falcon_output_partitions_hive}) select p.id,p.gender, p.Age, p.birthdate, o.component[1].valuequantity.value, o.component[1].valuequantity.unit from (select *, floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) as Age FROM falconexample. patient1) p inner join falconexample.DiagnosticReport1 d on p.id = substr(d.subject.reference,9) inner join falconexample.Observation1 o on p.id = substr(o.subject.reference,9) where p.Age>17 and p.Age<86 and o.component[1].valuequantity.value <140;

sunile_manjee · ‎06-01-2016

@Manoj Dhake

Partition Based Queries

In general, a SELECT query scans the entire table (other than for sampling). If a table created using the PARTITIONED BY clause, a query can do partition pruning and scan only a fraction of the table relevant to the partitions specified by the query. Hive currently does partition pruning if the partition predicates are specified in the WHERE clause or the ON clause in a JOIN. For example, if table page_views is partitioned on column date, the following query retrieves rows for just days between 2008-03-01 and 2008-03-31.

SELECT page_views.* FROM page_views WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'

If a table page_views is joined with another table dim_users, you can specify a range of partitions in the ON clause as follows:

SELECT page_views.* FROM page_views JOIN dim_users ON (page_views.user_id = dim_users.id AND page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31')

See also Group By.
See also Sort By / Cluster By / Distribute By / Order By.

Manus · ‎06-01-2016

Thank you Sunile for giving such a quick response.

According to your answer,

I can write something like

............................................

.....falconexample.DiagnosticReport1 d on (p.id = substr(d.subject.reference,9) and p.ds='2016-06-01-10') inner join falconexample.Observation1 o on (p.id = substr(o.subject.reference,9)and o.ds='2016-06-01-10' and d.ds='2016-06-01-10')

Right?

sunile_manjee · ‎06-01-2016

@Manoj Dhake as long as the those dates are within the partition range then yes it will only query the partition. the CBO will help make that determination.

sunile_manjee · ‎06-02-2016

@Manoj Dhake please advise if I have answered your question.

Arc · ‎09-29-2017

i don't think that hive does pruning on the table at left side during join, even if you provide the predicate in ON clause. That works in sql but not in hive. i am using hive 1.2. you can check this fact by using "explain <query>" . it is better to have the table pruned using sub query , and then join it with another table. let me know your thoughts.

gopalv · ‎06-02-2016

floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25

That unix_timestamp() could turn off a few optimizations in the planner, which might not be related to this issue.

Start using CURRENT_DATE instead of the unix_timestamp() call, for faster queries.

Cloudera Community

Support Questions

How to ask hive query to fetch data for specific partition?

Partition Based Queries

hive Insert to Dynamic Partition query Generating ...

Hive insert query optimization

HIVE - Duplicate table and merge partitions from ...

Running Hive LLAP on specific Nodes using YARN Nod...

Querying Hbase tables through Hive always creates ...

Querying Data via SparkSQL with ODBC Tools

Unable to use a specific yarn queue when submittin...

querying partitions in impala

How to fix ambari Hive View 1.5 /2.0 'Result fetch...

How to use K9s to fetch metrics and logs for Cloud...