
For ORC File what determines the number of mappers?

New Contributor

For ORC files, how does YARN determine the number of mappers? Is it based on the files in HDFS?

1 ACCEPTED SOLUTION

Expert Contributor

@Aron,

Initially, the getSplits method split the data based on the blocks in HDFS, but it was later changed so that splitting is based on the stripes of the ORC file.

https://issues.apache.org/jira/browse/HIVE-5102

The link above provides the details, including the source code for OrcInputFormat and its getSplits method.
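
To make the stripe-based idea concrete, here is a minimal sketch in Python. It is a simplified model and not Hive's actual code: the names Stripe, Split, stripe_splits and max_split_size are hypothetical, and it assumes stripes are stored back to back and are simply grouped until a target size is reached. The real OrcInputFormat logic takes more factors into account.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stripe:
    offset: int  # byte offset of the stripe within the file
    length: int  # stripe length in bytes


@dataclass
class Split:
    offset: int  # byte offset where the split starts
    length: int  # total bytes covered by the split


def stripe_splits(stripes: List[Stripe], max_split_size: int) -> List[Split]:
    """Group consecutive stripes into splits of roughly max_split_size bytes.

    Hypothetical helper: assumes the stripes are contiguous and sorted by offset.
    """
    splits: List[Split] = []
    start: Optional[int] = None
    size = 0
    for stripe in stripes:
        if start is None:
            start = stripe.offset  # a new split begins at this stripe
        size += stripe.length
        if size >= max_split_size:
            # Close the split on a stripe boundary, never mid-stripe.
            splits.append(Split(start, size))
            start, size = None, 0
    if start is not None:
        splits.append(Split(start, size))  # leftover stripes form the last split
    return splits


MB = 1024 * 1024
# Hypothetical file: five 64 MB stripes stored back to back.
stripes = [Stripe(i * 64 * MB, 64 * MB) for i in range(5)]
splits = stripe_splits(stripes, max_split_size=128 * MB)
print(len(splits))  # each split is handed to one map task
```

In this model the split boundaries always fall on stripe boundaries, so the number of mappers follows from how the stripes are grouped rather than from the number of files or HDFS blocks alone.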


REPLIES

Master Mentor

OrcInputFormat is an implementation of the InputFormat interface, whose getSplits method determines the number of mappers: https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.html
