Member since: 10-13-2016
Posts: 68
Kudos Received: 9
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 542 | 02-15-2019 11:50 AM
 | 1739 | 10-12-2017 02:03 PM
 | 170 | 10-13-2016 11:52 AM
12-11-2017
01:54 PM
HDP 2.6.0; the Hive Metastore (HMS) has 6GB of memory, and the metastore database itself is MySQL. After a few days, the server hosting the HMS has its CPU 100% used, hive queries are slow, and the GC logs show the HMS constantly having stop-the-world events. Restarting the metastore 'fixes' the problem for a few days. I have found a few JIRAs related to memory leaks: https://issues.apache.org/jira/browse/HIVE-15551 and https://issues.apache.org/jira/browse/HIVE-13749 . Is this a known issue in HDP 2.6.0? Is it known whether the latest HDP version fixes it? Thanks,
11-22-2017
08:48 AM
TL;DR: how do I properly set hive.tez.container.size for a job with wildly different steps?

I have an 8-data-node HDP 2.6 cluster; all data nodes are identical, with 32GB RAM. yarn.scheduler.maximum-allocation-mb is set to the total server RAM minus what is used by other services (OS, nodemanager...), i.e. 20GB in my case, and yarn.scheduler.minimum-allocation-mb is set to 1GB. I am running only one hive MERGE statement, once per day, which has about 100k mappers.

If I set hive.tez.container.size to 1GB, many mappers can run in parallel (faster query), but I will end up with one of these errors:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1510697553800_0993_2_03, diagnostics=[Task failed, taskId=task_1510697553800_0993_2_03_000150, diagnostics=[TaskAttempt 0 failed, info=[Container container_e102_1510697553800_0993_01_000042 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=32295,containerID=container_e102_1510697553800_0993_01_000042] is running beyond physical memory limits. Current usage: 5.4 GB of 5.3 GB physical memory used; 7.4 GB of 11.0 GB virtual memory used. Killing container.

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1511269090751_0011_2_03, diagnostics=[Exception in VertexManager, vertex:vertex_1511269090751_0011_2_03 [Reducer 3],org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist.

If I set hive.tez.container.size to a bigger value, a lot fewer tasks run in parallel (longer query time), but eventually the query succeeds. The thing is that I do not know in advance how big the data will be, so even if I find a good hive.tez.container.size by trial and error, it might not be good enough tomorrow, and maybe eventually my server memory will be too small. Furthermore, sizing for the worst-case scenario feels like a waste of resources. Is there any way to have a sort of dynamic tez container size to get a fast and succeeding query? (Below is a sketch of what I mean by sizing per session.)
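For reference, this is roughly what I set per session today; the values are purely illustrative, not recommendations:

-- illustrative session-level sizing; real values depend on the day's data volume
set hive.tez.container.size=4096;    -- MB per Tez container
set hive.tez.java.opts=-Xmx3276m;    -- heap, roughly 80% of the container size
set tez.runtime.io.sort.mb=1024;     -- sort buffer must fit inside the container

Cheers,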
10-24-2017
04:21 AM
Then I interpret this setting as "if there is too much data let's use it all instead of pruning it" and am very confused 🙂 I suppose it's due to internal hive implementation as you said.
10-23-2017
07:41 PM
I indeed see:

INFO [HiveServer2-Handler-Pool: Thread-107]: optimizer.RemoveDynamicPruningBySize (RemoveDynamicPruningBySize.java:process(61)) - Disabling dynamic pruning for: TS. Expected data size is too big: 1119008712

So if I understand well, this has to do with event size and not data size? I did try setting the value very high to enable pruning; pruning did indeed occur, but locking all partitions timed out. Will post an explain ASAP.
10-23-2017
07:32 PM
I added this explain.
10-23-2017
05:05 AM
hive.tez.dynamic.partition.pruning is already globally true. hive.optimize.ppd is true by default; I explicitly set it to true. hive.optimize.index.filter is false by default; I set it to true. I set hive.tez.bucket.pruning to true as well. I think that my issue is related to https://community.hortonworks.com/questions/142167/why-not-set-hivetezdynamicpartitionpruningmaxdatas.html Thanks for your help!
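For completeness, the exact statements I ran in my session:

set hive.tez.dynamic.partition.pruning=true;
set hive.optimize.ppd=true;
set hive.optimize.index.filter=true;
set hive.tez.bucket.pruning=true;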
10-20-2017
09:55 AM
Context: I have an issue with a MERGE statement, which does not use the partitions of the destination table. Looking for solutions, I stumbled upon this JIRA ticket, which introduced 3 new configuration options (in hive 0.14): hive.tez.dynamic.partition.pruning: default true
hive.tez.dynamic.partition.pruning.max.event.size: default 1*1024*1024L
hive.tez.dynamic.partition.pruning.max.data.size: default 100*1024*1024L

Now I wonder: why should I not just set these variables to the maximum possible value, to make sure that partition pruning always happens? Pruning is disabled if the data size is too big, but I find this counter-intuitive, as not pruning will massively increase the data size. Concretely, what I am tempted to do is below.
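These values are a guess, not something I have validated:

set hive.tez.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning.max.event.size=1073741824;    -- 1GB
set hive.tez.dynamic.partition.pruning.max.data.size=107374182400;   -- 100GB

Cheers,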
10-20-2017
05:53 AM
Thanks @Eugene Koifman. You are of course right; the issue is that I updated the MERGE text without reflecting the edits on the CREATE (now fixed in the question). I am indeed using the partitions in the MERGE: ...
ON dst.license_name = src.license_name AND dst.campaign_id = src.campaign_id
... but as far as I can tell, the pruning does not happen.
10-19-2017
12:27 PM
I have 2 tables, with the same structure: CREATE TABLE IF NOT EXISTS src (
-- DISTINCT field (+ partitions)
id BIGINT
-- other fields
, email STRING
, domain STRING
, lang STRING
, mobile_nr STRING
, custom_fields STRING
, groups array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>
, ts_utc TIMESTAMP
, sys_schema_version INT
, sys_server_ipv4 BIGINT
, sys_server_name STRING
)
PARTITIONED BY (
license_name STRING
, campaign_id INT
)
CLUSTERED BY (id)
INTO 64 BUCKETS
STORED AS ORC;

One is a source table (basically recreated from scratch with new data for each merge, so it needs to be fully reprocessed every time); the other is the destination table, which will grow. Both tables have the same partition and bucket definitions. When I EXPLAIN the MERGE statement, which has a join on the partitions and the bucketed field, I cannot see any partition pruning happening.

set hive.merge.cardinality.check=false;
set hive.tez.exec.print.summary=true;
set tez.user.explain=true;
explain MERGE INTO
-- default.2steps_false_64_1
vault.contact
dst
USING default.2steps_2steps_false_64_1 src
ON
dst.license_name = src.license_name
AND dst.campaign_id = src.campaign_id
AND dst.id = src.id
-- On match: keep latest loaded
WHEN MATCHED
AND dst.updated_on_utc < src.ts_utc
THEN UPDATE SET
-- other fields
email = src.email
, city = src.city
, lang = src.lang
, mobile_nr = src.mobile_nr
, custom_fields = src.custom_fields
, groups = src.groups
, updated_on_utc = src.ts_utc
, sys_schema_version = src.sys_schema_version
, sys_server_ipv4 = src.sys_server_ipv4
, sys_server_name = src.sys_server_name
WHEN NOT MATCHED THEN INSERT VALUES (
src.id
, src.email
, src.city
, src.lang
, src.mobile_nr
, src.custom_fields
, src.groups
, src.ts_utc
, src.ts_utc
, NULL -- deleted_on
, src.sys_schema_version
, src.sys_server_ipv4
, src.sys_server_name
, src.license_name
, src.campaign_id
)
;
+-----------------------------------------------------------------------------------------------------------------------------+--+
| Explain
+-----------------------------------------------------------------------------------------------------------------------------+--+
| Vertex dependency in root stage
| Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE)
| Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
| Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
|
| Stage-5
| Stats-Aggr Operator
| Stage-0
| Move Operator
| partition:{}
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Stage-3
| Dependency Collection{}
| Stage-2
| Reducer 3
| File Output Operator [FS_904]
| compressed:true
| Statistics:Num rows: 496014 Data size: 166660704 Basic stats: COMPLETE Column stats: PARTIAL
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Select Operator [SEL_901]
| | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15"]
| | Statistics:Num rows: 496014 Data size: 166660704 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Reducer 2 [SIMPLE_EDGE]
| Reduce Output Operator [RS_900]
| key expressions:_col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
| Map-reduce partition columns:UDFToInteger(_col0) (type: int)
| sort order:+
| Statistics:Num rows: 496014 Data size: 257927280 Basic stats: COMPLETE Column stats: PARTIAL
| value expressions:_col1 (type: bigint), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>), _col8 (type: timestamp), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: int), _col12 (type: bigint), _col13 (type: string), _col14 (type: string), _col15 (type: bigint)
| Select Operator [SEL_899]
| outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15"]
| Statistics:Num rows: 496014 Data size: 257927280 Basic stats: COMPLETE Column stats: PARTIAL
| Filter Operator [FIL_906]
| predicate:((_col13 = _col29) and (_col14 = _col30) and (_col0 = _col18) and (_col7 < _col25)) (type: boolean)
| Statistics:Num rows: 496014 Data size: 226182384 Basic stats: COMPLETE Column stats: PARTIAL
| Merge Join Operator [MERGEJOIN_916]
| | condition map:[{"":"Right Outer Join0 to 1"}]
| | keys:{"0":"license_name (type: string), campaign_id (type: bigint), id (type: bigint)","1":"license_name (type: string), UDFToLong(campaign_id) (type: bigint), id (type: bigint)"}
| | outputColumnNames:["_col0","_col7","_col8","_col9","_col13","_col14","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25","_col26","_col27","_col28","_col29","_col30"]
| | Statistics:Num rows: 11904348 Data size: 25284835152 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Map 1 [SIMPLE_EDGE]
| | Reduce Output Operator [RS_889]
| | key expressions:license_name (type: string), campaign_id (type: bigint), id (type: bigint)
| | Map-reduce partition columns:license_name (type: string), campaign_id(type: bigint), id (type: bigint)
| | sort order:+++
| | Statistics:Num rows: 129102910 Data size: 16525280556 Basic stats: COMPLETE Column stats: PARTIAL
| | value expressions:updated_on_utc (type: timestamp), created_on_utc (type: timestamp), deleted_on_utc (type: timestamp), ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
| | TableScan [TS_887]
| | ACID table:true
| | alias:dst
| | Statistics:Num rows: 129102910 Data size: 16525280556 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Map 5 [SIMPLE_EDGE]
| Reduce Output Operator [RS_890]
| key expressions:license_name (type: string), UDFToLong(campaign_id) (type: bigint), id (type: bigint)
| Map-reduce partition columns:license_name (type: string), UDFToLong(campaign_id) (type: bigint), id (type: bigint)
| sort order:+++
| Statistics:Num rows: 11904348 Data size: 29935728348 Basic stats: COMPLETE Column stats: PARTIAL
| value expressions:email (type: string), city (type: string), lang (type: string), mobile_nr (type: string), custom_fields (type: string), groups (type: array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>), ts_utc (type: timestamp), sys_schema_version (type: int), sys_server_ipv4 (type: bigint), sys_server_name (type: string), campaign_id (type: int)
| TableScan [TS_888]
| alias:src
| Statistics:Num rows: 11904348 Data size: 29935728348 Basic stats: COMPLETE Column stats: PARTIAL
| Reducer 4
| File Output Operator [FS_897]
| compressed:true
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Select Operator [SEL_895]
| | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
| | Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Reducer 2 [SIMPLE_EDGE]
| Reduce Output Operator [RS_894]
| Map-reduce partition columns:_col0 (type: bigint)
| sort order:
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| value expressions:_col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>), _col7 (type: timestamp), _col10 (type: int), _col11 (type: bigint), _col12 (type: string), _col13 (type: string), _col14 (type: int)
| Select Operator [SEL_893]
| outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col14","_col2","_col3","_col4","_col5","_col6","_col7"]
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| Filter Operator [FIL_907]
| predicate:(_col13 is null and _col14 is null and _col0 is null) (type: boolean)
| Statistics:Num rows: 1 Data size: 456 Basic stats: COMPLETE Column stats: PARTIAL
| Please refer to the previous Merge Join Operator [MERGEJOIN_916]
| Stage-4
| Stats-Aggr Operator
| Stage-1
| Move Operator
| partition:{}
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Please refer to the previous Stage-3
|
Other explain, with hive.explain.user=true:

0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> set hive.merge.cardinality.check=false;
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> -- set hive.tez.dynamic.partition.pruning=true;
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> -- set hive.tez.dynamic.partition.pruning.max.data.size=107374182400; -- 100GB
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> set hive.tez.exec.print.summary=true;
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> set tez.user.explain=true;
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> set hive.explain.user=true;
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> explain MERGE INTO
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> -- default.2steps_false_64_1
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> vault.contact
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> dst
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> USING default.2steps_2steps_false_64_1 src
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> ON
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> dst.license_name = src.license_name
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> AND dst.campaign_id = src.campaign_id
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> AND dst.id = src.id
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> AND dst.license_name = 'baarn'
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> -- On match: keep latest loaded
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> WHEN MATCHED
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> AND dst.updated_on_utc < src.ts_utc
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> THEN UPDATE SET
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> -- other fields
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> email = src.email
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , domain = src.domain
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , lang = src.lang
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , mobile_nr = src.mobile_nr
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , custom_fields = src.custom_fields
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , groups = src.groups
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , updated_on_utc = src.ts_utc
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , sys_schema_version = src.sys_schema_version
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , sys_server_ipv4 = src.sys_server_ipv4
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , sys_server_name = src.sys_server_name
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> WHEN NOT MATCHED THEN INSERT VALUES (
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> src.id
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.email
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.domain
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.lang
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.mobile_nr
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.custom_fields
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.groups
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.ts_utc
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.ts_utc
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , NULL -- deleted_on
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.sys_schema_version
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.sys_server_ipv4
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.sys_server_name
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.license_name
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> , src.campaign_id
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> )
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput> ;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Explain
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| Vertex dependency in root stage
| Map 2 <- Map 1 (BROADCAST_EDGE)
| Reducer 3 <- Map 2 (SIMPLE_EDGE)
| Reducer 4 <- Map 2 (SIMPLE_EDGE)
|
| Stage-5
| Stats-Aggr Operator
| Stage-0
| Move Operator
| partition:{}
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Stage-3
| Dependency Collection{}
| Stage-2
| Reducer 3
| File Output Operator [FS_1461]
| compressed:true
| Statistics:Num rows: 496014 Data size: 123507486 Basic stats: COMPLETE Column stats: PARTIAL
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Select Operator [SEL_1458]
| | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15"]
| | Statistics:Num rows: 496014 Data size: 123507486 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Map 2 [SIMPLE_EDGE]
| Reduce Output Operator [RS_1457]
| key expressions:_col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
| Map-reduce partition columns:UDFToInteger(_col0) (type: int)
| sort order:+
| Statistics:Num rows: 496014 Data size: 79362240 Basic stats: COMPLETE Column stats: PARTIAL
| value expressions:_col1 (type: bigint), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>), _col8 (type: timestamp), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: int), _col12 (type: bigint), _col13 (type: string), _col15 (type: bigint)
| Select Operator [SEL_1456]
| outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col15","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
| Statistics:Num rows: 496014 Data size: 79362240 Basic stats: COMPLETE Column stats: PARTIAL
| Filter Operator [FIL_1463]
| predicate:((_col13 = 'baarn') and (_col13 = _col29) and (_col14 = _col30) and (_col0 = _col18) and (_col7 < _col25)) (type: boolean)
| Statistics:Num rows: 496014 Data size: 179061054 Basic stats: COMPLETE Column stats: PARTIAL
| Map Join Operator [MAPJOIN_1474]
| | condition map:[{"":"Right Outer Join0 to 1"}]
| | HybridGraceHashJoin:true
| | keys:{"Map 2":"license_name (type: string), UDFToLong(campaign_id) (type: bigint), id (type: bigint)","Map 1":"license_name (type: string), campaign_id (type: bigint), id (type: bigint)"}
| | outputColumnNames:["_col0","_col7","_col8","_col9","_col13","_col14","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25","_col26","_col27","_col28","_col29","_col30"]
| | Statistics:Num rows: 11904348 Data size: 24153922092 Basic stats: COMPLETE Column stats: PARTIAL
| |<-Map 1 [BROADCAST_EDGE]
| | Reduce Output Operator [RS_1446]
| | key expressions:license_name (type: string), campaign_id (type: bigint), id (type: bigint)
| | Map-reduce partition columns:license_name (type: string), campaign_id (type: bigint), id (type: bigint)
| | sort order:+++
| | Statistics:Num rows: 621448 Data size: 79546063 Basic stats: COMPLETE Column stats: PARTIAL
| | value expressions:updated_on_utc (type: timestamp), created_on_utc (type: timestamp), deleted_on_utc (type: timestamp), ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
| | TableScan [TS_1443]
| | ACID table:true
| | alias:dst
| | Statistics:Num rows: 621448 Data size: 79546063 Basic stats: COMPLETE Column stats: PARTIAL
| |<-TableScan [TS_1444]
| alias:src
| Statistics:Num rows: 11904348 Data size: 29935728348 Basic stats: COMPLETE Column stats: PARTIAL
| Reduce Output Operator [RS_1451]
| Map-reduce partition columns:_col0 (type: bigint)
| sort order:
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| value expressions:_col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: array<struct<group_id:bigint,campaign_id:bigint,member_since_ts_utc:bigint>>), _col7 (type: timestamp), _col10 (type: int), _col11 (type: bigint), _col12 (type: string), _col13 (type: string), _col14 (type: int)
| Select Operator [SEL_1450]
| outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col14","_col2","_col3","_col4","_col5","_col6","_col7"]
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| Filter Operator [FIL_1464]
| predicate:(_col13 is null and _col14 is null and _col0 is null) (type: boolean)
| Statistics:Num rows: 1 Data size: 361 Basic stats: COMPLETE Column stats: PARTIAL
| Please refer to the previous Map Join Operator [MAPJOIN_1474]
| Reducer 4
| File Output Operator [FS_1454]
| compressed:true
| Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Select Operator [SEL_1452]
| | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
| | Statistics:Num rows: 1 Data size: 188 Basic stats: COMPLETE Column stats: PARTIAL
| |<- Please refer to the previous Map 2 [SIMPLE_EDGE]
| Stage-4
| Stats-Aggr Operator
| Stage-1
| Move Operator
| partition:{}
| table:{"name:":"vault.contact","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
| Please refer to the previous Stage-3
|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://ip-10-0-0-21.eu-west-1.comput>
Could anybody confirm or deny from this explain that the partitions are properly pruned? Context: HDP 2.6, small (4 nodes) AWS cluster.
10-12-2017
02:03 PM
The answer pointed at https://community.hortonworks.com/questions/57795/how-to-fix-under-replicated-blocks-fasly-its-take.html is the right one. Those are undocumented features in hadoop 2.7, but they can be set and used, and I now do see that replication is sped up.
10-12-2017
01:23 PM
1 Kudo
Hi, I had an issue with datanodes, resulting in about 300k under-replicated blocks. The DNs are back and blocks are being replicated, but this is very slow, about 1 per second, and I am trying to find a way to speed replication up. I checked dfs.datanode.balance.bandwidthPerSec, which is set at about 6MB/second. My monitoring shows me that on average the rx/tx of each node is about 200k/second, so I am way below this limit. I followed this link, which did not help (using setrep -w 3 on all under-replicated files): https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html This link is not fully applicable (hadoop 2.7): https://community.hortonworks.com/questions/57795/how-to-fix-under-replicated-blocks-fasly-its-take.html but I set dfs.namenode.replication.work.multiplier.per.iteration to 100 (the default is 2) without visible speed-up. So my question is: what can I do to speed replication up? Context: HDP 2.6, AWS, 3 nodes and replication factor = 3.
10-04-2017
09:04 AM
1) was ok, 2) and 3) were not good, and no, it's not a kerberized cluster. After the fixes you suggested, it all seems to work as expected. Thanks a million!
10-03-2017
01:38 PM
I am using HDP 2.6 and would like to properly use the Tez UI. The Tez view is available; if I go there I see queries, and I can click on a query id and follow it to the dag ID, but I do not have all I expect. DAG Details and DAG Counters look good. Graphical View tells me: "Data not available to display graphical view! No vertex data found in YARN Timeline Server." All Vertices, All Tasks and All Task Attempts tell me: "No records available!" Vertex Swimlane tells me: "Data not available to display swimlane! No vertex data found in YARN Timeline Server." I have seen the documentation relative to the manual install of HDP, saying to download a war file, but I do not believe this is what I should be doing here, as I am using the ambari install on the cluster. tez.tez-ui.history-url.base is http://$ambari_ip:8080/#/main/view/TEZ/tez_cluster_instance which is indeed the URL where I can reach the tez view. Is there anything obvious I could have forgotten?
09-28-2017
11:14 AM
Fair enough about the *.period. As I did get metrics, there is probably a smart default, but it would be nice to have. I indeed found some messages in the service logs, and all looks good. To be honest, it all worked today. I then happily applied the settings to prod and, lo and behold, I only have 2 metrics there. Carrying on thinking, what I understood is that in metrics2.properties I say that I want, for instance, node manager metrics, but I then actually need to restart the node managers to see those metrics. Indeed, the cluster I worked on yesterday had been rebooted (dev cluster, switched off at night). Now all works as expected. Thanks!
09-27-2017
01:59 PM
I have a big hive query, a MERGE statement with a 50GB source table and a 0GB destination table. When it runs, it fails because the partition hosting /hadoop/yarn/local (this dir has its own partition) on one data node is filled up to 90%, i.e. the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage threshold. I could extend this partition, increase the setting or add another datanode, but the issue is likely to crop up again when I need to merge 100GB, and then again for 250GB, and so on. What is the best way to sort this out? I could manually break up the query (adding a WHERE on dates, for instance, as sketched below), but I have the feeling that there should be a cleaner solution. Context: small (3 datanodes) HDP 2.6 cluster on AWS.
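To illustrate the manual split I have in mind; the table and column names here are hypothetical, and the slicing predicate would have to be adapted:

-- hypothetical: run the merge in date slices to bound the intermediate data
MERGE INTO dst
USING (SELECT * FROM src WHERE ts_utc >= '2017-09-01' AND ts_utc < '2017-09-08') s
ON dst.id = s.id
WHEN MATCHED THEN UPDATE SET email = s.email
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.email, s.ts_utc);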
09-27-2017
12:25 PM
I have a graphite server to which I want to send Hadoop metrics2 data. On paper it's easy: just add log4j.logger.org.apache.hadoop.metrics2=DEBUG to the log4j template and update the hadoop-metrics2.properties template with: *.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
*.sink.graphite.server_host=10.x.x.x
*.sink.graphite.server_port=2003
datanode.sink.graphite.metrics_prefix=datanode
namenode.sink.graphite.metrics_prefix=namenode
resourcemanager.sink.graphite.metrics_prefix=resourcemanager
nodemanager.sink.graphite.metrics_prefix=nodemanager
jobhistoryserver.sink.graphite.metrics_prefix=jobhistoryserver
journalnode.sink.graphite.metrics_prefix=journalnode
maptask.sink.graphite.metrics_prefix=maptask
reducetask.sink.graphite.metrics_prefix=reducetask
applicationhistoryserver.sink.graphite.metrics_prefix=applicationhistoryserver
It works very well with one service (e.g. datanode). If I put more than one, I will only get 2 services in graphite, and I cannot confirm that all metrics for those services are present. Not knowing what metrics to expect and wanting to experiment, I do not want to filter on actual metric names to limit their number. On the collectd side I can see one metric dropped (invalid), but one metric only; it does not account for all the rest. Furthermore, setting CollectInternalStats to true shows me that no metrics are dropped. On the Hadoop side... well, I could not find anything telling me whether metrics are actually sent or not, whether it succeeds or fails. Nothing is logged anywhere. So my 2 questions are: How can I debug metrics2? Are there any known reasons why I would be missing metrics? Context: HDP 2.6 on AWS.
09-26-2017
05:46 AM
I agree with your assessment (files cannot be written to HDFS), but my problem is that as far as I know HDFS is in a healthy state: all ambari lights are green, there are no under-replicated blocks, fsck is happy, and I can indeed write even huge files to HDFS... If you are aware of other checks I could perform, I would love to know about them. Thanks,
09-20-2017
01:51 PM
I have a Hive MERGE query, reading avro files to write ORC files. The avro files are input data, and the ORC files will be my main database. The merge query almost completes, but always ends up failing. The relevant log lines (I think) are:

# Just before failing, still good
[Thread-9646]: monitoring.TezJobMonitor$UpdateFunction (TezJobMonitor.java:update(1
37)) - Map 1: 19/19 Map 5: 80/80 Reducer 2: 1009/1009 Reducer 3: 9(+0)/10 Reducer 4: 1(+8,-19)/10
# a few more log.PerfLogger line...
Vertex failed, vertexName=Reducer 4, vertexId=vertex_1502360038800_0027_2_03, diagnostics=[Task failed, taskId=task_1502360038800_0027_2_03_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {}
...
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /dwh/vault/contact/.hive-staging_hive
_2017-09-20_08-56-51_838_2864382824593930489-1/_task_tmp.-ext-10000/name=hsys/id=46/_tmp.000000_0/delta_0000076_0000076_0000/bucket_00000 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

During this query run I could see that yarn memory was quite high (91%). There was nothing else I noticed, except this, repeated in hadoop-hdfs-namenode.log:

WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(385)) - Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

I could not find anything relevant related to this error myself. An fsck (after the query failure) does not give any error, and ambari does not find any under-replicated blocks. Any idea if there is a usual culprit for this error, or where I could look? Thanks. Small (1 ambari, 3 DN) HDP 2.6 cluster on AWS.
07-06-2017
10:44 AM
@Vani I am trying to understand what this memory will be used for. My understanding is that:
- any application will require its own AM
- one AM will use 1 container only
- tez-site/tez.am.resource.memory.mb defines the memory usable by the total of all AMs
So logically:
- all AM memory should never be more than half of the available memory (for the worst-case scenario where every application only uses one container)
- I should allocate in tez-site/tez.am.resource.memory.mb (minimum container size * expected number of applications)
Could you confirm my understanding?
07-04-2017
01:22 PM
@Vani, thanks for your answer. I do not see an immediate change, but I will carry on looking in this direction. What would be a good logical value for this maximum-am-resource-percent? Currently the AM memory (tez-site/tez.am.resource.memory.mb) is set to the min container size (5GB in my case). Does that make sense?
07-03-2017
02:27 PM
I have a small one-node HDP 2.6 cluster (8 CPUs, 32GB RAM), and I cannot run more than 1 query at a time, although I was pretty sure that I had configured the relevant settings to allow more than one container. The relevant configs are: yarn-site/yarn.nodemanager.resource.memory-mb = 27660
yarn-site/yarn.scheduler.minimum-allocation-mb = 5532
yarn-site/yarn.scheduler.maximum-allocation-mb = 27660
mapred-site/mapreduce.map.memory.mb = 5532
mapred-site/mapreduce.reduce.memory.mb = 11064
mapred-site/mapreduce.map.java.opts = -Xmx4425m
mapred-site/mapreduce.reduce.java.opts = -Xmx8851m
mapred-site/yarn.app.mapreduce.am.resource.mb = 11059
mapred-site/yarn.app.mapreduce.am.command-opts = -Xmx8851m -Dhdp.version=${hdp.version}
hive-site/hive.execution.engine = tez
hive-site/hive.tez.container.size = 5532
hive-site/hive.auto.convert.join.noconditionaltask.size = 1546859315
tez-site/tez.runtime.unordered.output.buffer.size-mb = 414
tez-interactive-site/tez.am.resource.memory.mb = 5532
tez-site/tez.am.resource.memory.mb = 5532
tez-site/tez.task.resource.memory.mb = 5532
tez-site/tez.runtime.io.sort.mb = 1351
hive-site/hive.tez.java.opts = -server -Xmx4425m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn-site/yarn.nodemanager.resource.cpu-vcores = 6
yarn-site/yarn.scheduler.maximum-allocation-vcores = 6
mapred-site/mapreduce.map.output.compress = true
hive-site/hive.exec.compress.intermediate = true
hive-site/hive.exec.compress.output = true
hive-interactive-env/enable_hive_interactive = false
Which, if I understand it well, gives 5GB per container. If I run a hive query, it will use 5GB and 1 core, leaving about 15GB and 5 cores for the rest. I do not understand why the next query cannot start at the same time. Any help would be very welcome.
- Tags:
- Data Processing
- Hive
06-15-2017
08:06 AM
I was using hive 1 with hive.server2.enable.doas=true. Now I want to use hive-interactive, but hive.server2.enable.doas apparently has to be false (that is what ambari says). This of course breaks most of my queries because of wrong permissions. I am curious to know why this setting cannot be true, and whether there is a known workaround for this. Context: HDP 2.6 with hive and hive-interactive. Thanks!
06-15-2017
05:34 AM
Thanks, but I am not interested in this surrogate key. The point of defining the PK was to help e.g. reporting tools find joins between tables automatically. This surrogate key would thus not do. Thanks!
06-14-2017
02:10 PM
The example I gave was a trimmed-down version of what I wanted to do, to show the technical problem. My expected PK is actually a compound PK, with a few partition columns and a few non-partition columns. But I am afraid that your answer says it all: no can do :(. Thanks!
06-14-2017
10:54 AM
I want to add primary key constraints to hive tables. The only thing is that my PK is actually a partition column. For instance:

CREATE TABLE pk
(
id INT,
PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING);

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partition column as PK? Context: HDP 2.6, hive 2.1 with llap.
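For comparison, a minimal sketch (table name hypothetical); I believe the same constraint on a regular, non-partition column compiles fine:

-- hypothetical: PK on a non-partition column is accepted
CREATE TABLE pk_ok
(
id INT,
PRIMARY KEY(id) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING);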
- Tags:
- Data Processing
- Hive
06-14-2017
10:52 AM
I want to add primary/foreign key constraints to a hive table. The only thing is that my PK is actually a partition column. For instance:

CREATE TABLE pk
(
id INT,
PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING);

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partition column as PK? Context: HDP 2.6, hive 2.1 with llap.
- Tags:
- Data Processing
- Hive
04-24-2017
12:06 PM
The answer is that it is not possible to set those parameters globally. @Murali Ramasami has the right workaround.
04-21-2017
10:09 AM
Been there, done that, no luck 😞
04-21-2017
10:09 AM
I indeed found this option, but the file still needs to be generated per workflow; there is no way to use a global configuration.
03-17-2017
06:54 AM
While trying to use Oozie properly, I ended up setting a few parameters, namely:
- oozie.launcher.mapreduce.map.memory.mb
- oozie.launcher.mapreduce.map.java.opts
- oozie.launcher.yarn.app.mapreduce.am.resource.mb
- oozie.launcher.mapred.job.queue.name
If I set them in the workflow configuration, they work as expected. Is there a way/a place to set them globally, i.e. not per workflow? I was expecting custom-oozie-site.xml to be the right place, but apparently not (they have no effect if put there). Is the workflow itself the only place where they can be configured? If it is relevant, I am using HDP 2.5.