Member since
02-18-2018
3
Posts
2
Kudos Received
0
Solutions
01-03-2019
01:25 PM
1 Kudo
Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them. We had a system with 2 namenodes with high availability enabled. For some reason, under the data folders of the datanodes, i.e /data0x/hadoop/hdfs/data/current - we had 2 Block Pools folders listed (example of such folder is BP-1722964902-1.10.237.104-1541520732855). There was one folder containing the IP of namenode1 and another containing the IP of namenode 2. All the data was under the BlockPool of namenode 1, but inside the VERSION files of the namenodes (/data0x/hadoop/hdfs/namenode/current/) the BlockPool id and the namespace ID were of namenode 2 - the namenode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having 2 block pools folders, but we did. In order to fix the problem - and get HDFS healthy again - we just needed to update the VERSION file on all the namenode disks (on both NN machines) and on all the journal node disks (on all JN machines), to point to Namenode 1. We then restarted HDFS and made sure all the blocks are
reported and there's no more missing blocks.
... View more
03-02-2017
07:23 AM
1 Kudo
We're working with Hive 1.3.1, and running an INSERT statement to upload data into Hive from an external table. I noticed that the execution plan has changed on the same table from yesterday compared to today. Yesterday the plan resulted in a M/R job with 341 mappers and 359 reducers, while today the plan resulted in a M/R job with only mappers and no reducers This is the query: explain insert OVERWRITE table managed_table PARTITION(col1) select [columns] from external_table
How does hive decide how to execute the query? How does it translate the insert select into map reduce? What would cause a plan to change? This is the first plan (omitting columns lists because the table has over 300 columns) <code>STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: externalevents
Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
Select Operator [columns] outputColumnNames: [columns] Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col394 (type: bigint)
sort order: +
Map-reduce partition columns: _col394 (type: bigint)
Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
value expressions: [columns] Reduce Operator Tree:
Select Operator
expressions: [columns] outputColumnNames: [columns] Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 479391 Data size: 91824480256 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: ceazip.events_test_hive
Stage: Stage-0
Move Operator
tables:
partition:
evtf_first_date_id
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: ceazip.events_test_hive
Stage: Stage-2
Stats-Aggr Operator
<code>And the 2nd plan:
<code>STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: events_2017_02_21_11_13_18
Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: [columns]
outputColumnNames: [columns]
Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 468680 Data size: 89772957696 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.events_test1
Stage: Stage-7
Conditional Operator
Stage: Stage-4
Move Operator
files:
hdfs directory: true
destination: hdfs://...../apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000
Stage: Stage-0
Move Operator
tables:
partition:
evtf_first_date_id
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.events_test1
Stage: Stage-2
Stats-Aggr Operator
Stage: Stage-3
Merge File Operator
Map Operator Tree:
ORC File Merge Operator
merge level: stripe
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
Stage: Stage-5
Merge File Operator
Map Operator Tree:
ORC File Merge Operator
merge level: stripe
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
Stage: Stage-6
Move Operator
files:
hdfs directory: true
destination: hdfs://isr-r0-aps-nam-1.lab.il.nice.com:8020/apps/hive/warehouse/events_test1/.hive-staging_hive_2017-03-01_07-35-18_776_4958999242494325333-1/-ext-10000
Thanks,
Lior
... View more
Labels:
- Labels:
-
Apache Hadoop
-
Apache Hive