For ORC files, what determines the number of mappers?
- Labels: Apache YARN, HDFS
Created on ‎12-26-2015 09:17 PM - edited ‎09-16-2022 02:54 AM
For ORC files, how does YARN determine the number of mappers? Is it based on the files in HDFS?
Created ‎12-27-2015 06:25 AM
@Aron,
Initially, the getSplits method split the data based on the HDFS blocks, but it was changed so that splitting is based on the stripes of the ORC file.
https://issues.apache.org/jira/browse/HIVE-5102
The JIRA above has the details and the source code for OrcInputFormat and its getSplits method.
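To make the difference between block-based and stripe-based splitting concrete, here is a rough sketch in Python. It is only a simplified model and not Hive's code: the `Stripe` and `Split` classes, both functions, and the `max_split_size` parameter are made up for illustration, and the real getSplits takes more into account than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stripe:
    offset: int  # byte offset of the stripe within the file
    length: int  # stripe size in bytes


@dataclass
class Split:
    offset: int
    length: int


def block_based_splits(file_length: int, block_size: int) -> List[Split]:
    """Toy model of the older behaviour: one split per HDFS block."""
    return [
        Split(start, min(block_size, file_length - start))
        for start in range(0, file_length, block_size)
    ]


def stripe_based_splits(stripes: List[Stripe], max_split_size: int) -> List[Split]:
    """Toy model of stripe-based splitting (an assumption, not Hive's algorithm).

    Consecutive stripes are grouped into one split; a new split starts when
    adding the next stripe would push the current split past max_split_size.
    """
    splits: List[Split] = []
    current: Optional[Split] = None
    for stripe in stripes:
        if current is not None and current.length + stripe.length > max_split_size:
            splits.append(current)
            current = None
        if current is None:
            current = Split(stripe.offset, stripe.length)
        else:
            current.length = (stripe.offset + stripe.length) - current.offset
    if current is not None:
        splits.append(current)
    return splits


MB = 1024 * 1024
# Hypothetical file: three 100 MB stripes stored back to back.
stripes = [Stripe(i * 100 * MB, 100 * MB) for i in range(3)]
file_length = 300 * MB

by_block = block_based_splits(file_length, block_size=128 * MB)
by_stripe = stripe_based_splits(stripes, max_split_size=256 * MB)

print("block-based splits: ", [(s.offset // MB, s.length // MB) for s in by_block])
print("stripe-based splits:", [(s.offset // MB, s.length // MB) for s in by_stripe])
```

In this made-up file the block-based splits start at 128 MB and 256 MB, which is in the middle of a stripe, while every stripe-based split starts exactly at a stripe offset. Each split becomes one map task, so the number of splits is the number of mappers.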
Created ‎12-27-2015 04:31 AM
OrcInputFormat is an implementation of the InputFormat interface, and its getSplits method determines the number of mappers: one map task is launched per split it returns. See https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.html
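As a rough illustration of that contract (the input format decides the splits, and the framework starts one map task per split), here is a toy model in Python. None of these names are Hadoop or Hive APIs: `ToyInputFormat`, `FixedChunkInputFormat`, and `plan_map_tasks` are hypothetical and only mimic the shape of the idea.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# (path, offset, length): a hypothetical stand-in for an input split.
Split = Tuple[str, int, int]


class ToyInputFormat(ABC):
    """Hypothetical interface: the input format alone decides the splits."""

    @abstractmethod
    def get_splits(self, files: Dict[str, int]) -> List[Split]:
        ...


class FixedChunkInputFormat(ToyInputFormat):
    """Cuts every file into chunks of at most chunk_size bytes."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def get_splits(self, files: Dict[str, int]) -> List[Split]:
        splits: List[Split] = []
        for path, size in files.items():
            for start in range(0, size, self.chunk_size):
                splits.append((path, start, min(self.chunk_size, size - start)))
        return splits


def plan_map_tasks(input_format: ToyInputFormat, files: Dict[str, int]) -> List[str]:
    """The 'framework' side: one map task per split, nothing else."""
    return [f"map-{i}" for i, _ in enumerate(input_format.get_splits(files))]


# Hypothetical input: file name -> size in bytes.
files = {"part-0.orc": 250, "part-1.orc": 100}
tasks = plan_map_tasks(FixedChunkInputFormat(chunk_size=100), files)
print(len(tasks), "map tasks:", tasks)
```

Swapping in a different input format changes the mapper count without touching the planner, which is why the answer to "how many mappers" lives in getSplits rather than in YARN itself.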
Created ‎12-27-2015 08:11 AM
