Created on 12-26-2015 09:17 PM - edited 09-16-2022 02:54 AM
For ORC files, how does YARN determine the number of mappers? Is it based on the files in HDFS?
Created 12-27-2015 06:25 AM
@Aron,
Initially, the getSplits method split the data based on the blocks in HDFS. It was later changed so that splitting is based on the stripes of the ORC file.
https://issues.apache.org/jira/browse/HIVE-5102
The link above has the details and the source code for OrcInputFormat and its getSplits method.
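To make the stripe-based idea concrete, here is a minimal sketch in Python. It is not the Hive code: the function name group_stripes_into_splits, the stripe sizes, and the 128 MB limit are all made up for illustration, and the real getSplits takes more into account than this. It only shows that stripes are kept whole and packed into splits, and that each resulting split goes to one mapper.

```python
from typing import List

MB = 1024 * 1024


def group_stripes_into_splits(stripe_sizes: List[int],
                              max_split_size: int) -> List[List[int]]:
    """Greedily pack consecutive stripes into splits.

    Hypothetical helper for illustration only. A stripe is never cut:
    a split is closed as soon as adding the next stripe would push it
    past max_split_size, and an oversized stripe gets a split of its own.
    Returns a list of splits, each a list of stripe indexes.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(stripe_sizes):
        # Close the current split if this stripe would not fit in it.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Assumed example: one ORC file with five 64 MB stripes, 128 MB max split size.
stripe_sizes = [64 * MB] * 5
splits = group_stripes_into_splits(stripe_sizes, 128 * MB)

# One mapper per split.
print("splits:", splits)
print("mappers:", len(splits))
```

With these assumed numbers the five stripes end up in three splits, so three mappers would be launched for the file, regardless of how the file happens to be laid out in HDFS blocks.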
Created 12-27-2015 04:31 AM
OrcInputFormat is an implementation of the InputFormat interface, and its getSplits method determines the number of mappers. See https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.html
Created 12-27-2015 08:11 AM