Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to identify dynamic partitions created by a batch job for a particular run

Highlighted

How to identify dynamic partitions created by a batch job for a particular run

Rising Star

I have batch job, which run over the night and it create the dynamic partitions. How can i identify the number of partitions created by the batch job.

1 REPLY 1

Re: How to identify dynamic partitions created by a batch job for a particular run

Hi @Reddy.

This data (recent partitions created) is available in the Hive metastore.

Here is a very helpful Hive Metastore ER Diagram: (from @Rajkumar Singh)

https://community.hortonworks.com/questions/57036/is-there-a-updated-hive-121-metastore-er-diagram.h...

Here is an excellent post on querying the metastore:

http://www.openkb.info/2015/04/how-to-list-table-or-partition-location.html

If you have access to the Hive Metastore RDBMS, you can directly query the relational tables to get the info you need. Here is a sample MySQL query to get recent partitions...

select DBS.NAME as DB_NAME, TBLS.TBL_NAME, PARTITIONS.PART_NAME, SDS.LOCATION,
       from_unixtime(PARTITIONS.CREATE_TIME) as CREATE_TIME_FORMATTED, PARTITIONS.CREATE_TIME  
from SDS,TBLS,PARTITIONS,DBS
where PARTITIONS.SD_ID = SDS.SD_ID
  and TBLS.TBL_ID=PARTITIONS.TBL_ID
  and TBLS.DB_ID=DBS.DB_ID
  and TBLS.TBL_NAME='inventory'
order by PARTITIONS.CREATE_TIME DESC, DBS.NAME, TBLS.TBL_NAME
LIMIT 3; 

| DB_NAME                     | TBL_NAME  | PART_NAME           | LOCATION                                                                                     | CREATE_TIME_FORMATTED | CREATE_TIME |
+-----------------------------+-----------+---------------------+-------------------------------
| tpcds_bin_partitioned_orc_3 | inventory | inv_date_sk=2451760 | <FQDN>:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_3.db/inventory/inv_date_sk=2451760 | 2016-12-06 11:39:43   |  1481053183 |
| tpcds_bin_partitioned_orc_3 | inventory | inv_date_sk=2451445 | <FQDN>:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_3.db/inventory/inv_date_sk=2451445 | 2016-12-06 11:39:43   |  1481053183 |
| tpcds_bin_partitioned_orc_3 | inventory | inv_date_sk=2451326 | <FQDN>:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_3.db/inventory/inv_date_sk=2451326 | 2016-12-06 11:39:43   |  1481053183 |

But again - this solution requires you to have access to the MySQL metastore RDBMS.

You can also access the Hive metastore data from within the cluster - if you use the hive metatool. When you query the metastore using the metatool - you have to use ORM query language. It is clunky and not well-documented.

If I can get this kind of query working using the hive metatool (using ORM QL) I will edit & add to this post.

Here is a high-level example of using the Hive metatool. (from @gopal)

HIVE_CONF_DIR=/etc/hive/conf/conf.server/ 
hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase" 

hive --service metatool -executeJDOQL "select database.name + '.' + tableName from org.apache.hadoop.hive.metastore.model.MTable"

You can find the ORM data layouts here:

https://github.com/apache/hive/blob/master/metastore/src/model/package.jdo

I hope this helps.

Don't have an account?
Coming from Hortonworks? Activate your account here