Member since
01-13-2017
3
Posts
0
Kudos Received
0
Solutions
09-07-2018
04:08 PM
Thanks, just what I needed ... Ambari has been constantly having problems with this for last few releases, every time it's something different.
... View more
10-03-2017
08:01 PM
It has been suggested to me that CONCATENATE is not necessarily reliable, I didn't see issues in my testing however. If your table is partitioned and only "recent" partitions are updated you can "reprocess" these partitions periodically to consolidate files:
create temp table INSERT OVERWRITE temp_table AS SELECT ... date='whatever' swap partitions (DROP, RENAME or EXCHANGE as you like to move the new partition to live table) We take this approach for some cases (reprocess recent partitions). An alternative might be ACID tables which support compaction automatically - I have no experience with this option at all.
... View more
09-29-2017
01:45 PM
As of the HDP 2.6.1 release here is a query that I use to find row counts on a specific partitioned table: SELECT
* FROM hive.PARTITION_PARAMS AS A, hive.PARTITIONS AS B WHERE
A.PARAM_KEY='numRows' and A.PART_ID=B.PART_ID and A.PART_ID IN ( SELECT
PART_ID FROM hive.PARTITIONS WHERE TBL_ID=(SELECT A.TBL_ID FROM
hive.TBLS AS A, hive.DBS AS B WHERE A.DB_ID=B.DB_ID AND
B.NAME='DATABASE_NAME' AND A.TBL_NAME='TABLE_NAME')) This specific
query gives me row counts for each partition of a specific table. You
need to replace DATABASE_NAME and TABLE_NAME with the one you are
interested in. MySQL metastore but should work with others.
... View more