<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: MapReduceIndexerTool problem in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduceIndexerTool-problem/m-p/66457#M77177</link>
    <description>&lt;P&gt;I use the same command and have no issues.&lt;/P&gt;&lt;P&gt;According to the logs:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1523546159827_0013_r_000000_0/map_0.out&lt;/PRE&gt;&lt;P&gt;So, I would guess that your CSV is too big, and when the reducer tries to load it there is not enough space in the local dirs of the YARN NodeManager.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you try setting more reducers by using:&lt;/P&gt;&lt;PRE&gt;--reducers 4&lt;/PRE&gt;&lt;P&gt;or more (based on your partitions and the CSV size). You can also set more mappers, but based on the log it is the reducer that is suffering.&lt;/P&gt;&lt;P&gt;More details:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/5-13-x/topics/search_mapreduceindexertool.html#concept_pjs_3sd_3v" target="_self"&gt;https://www.cloudera.com/documentation/enterprise/5-13-x/topics/search_mapreduceindexertool.html#concept_pjs_3sd_3v&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 17 Apr 2018 15:27:12 GMT</pubDate>
    <dc:creator>GeKas</dc:creator>
    <dc:date>2018-04-17T15:27:12Z</dc:date>
    <item>
      <title>MapReduceIndexerTool problem</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduceIndexerTool-problem/m-p/66324#M77176</link>
      <description>&lt;P&gt;Hi&amp;nbsp;All,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We’ve installed Solr and now we are trying to index a CSV file with 250 million rows using the tool “MapReduceIndexerTool”.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After 90 minutes of running, we receive the following error:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;18/04/13 13:44:06 INFO mapreduce.Job: map 67% reduce 0%&lt;BR /&gt;18/04/13 13:49:13 INFO mapreduce.Job: map 100% reduce 0%&lt;BR /&gt;18/04/13 13:49:19 INFO mapreduce.Job: Task Id : attempt_1523546159827_0013_r_000000_0, Status : FAILED&lt;BR /&gt;Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10&lt;BR /&gt;at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)&lt;BR /&gt;at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)&lt;BR /&gt;at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)&lt;BR /&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;at javax.security.auth.Subject.doAs(Subject.java:422)&lt;BR /&gt;at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)&lt;BR /&gt;at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)&lt;BR /&gt;Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1523546159827_0013_r_000000_0/map_0.out&lt;BR /&gt;at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)&lt;BR /&gt;at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)&lt;BR /&gt;at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)&lt;BR /&gt;at org.apache.hadoop.mapred.YarnOutputFiles.getInputFileForWrite(YarnOutputFiles.java:213)&lt;BR /&gt;at org.apache.hadoop.mapreduce.task.reduce.OnDiskMapOutput.&amp;lt;init&amp;gt;(OnDiskMapOutput.java:65)&lt;BR /&gt;at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:269)&lt;BR /&gt;at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:539)&lt;BR /&gt;at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)&lt;BR /&gt;at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The command we execute is:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool --morphline-file /tmp/morphlines_pocsim.conf --output-dir hdfs://vitl000361:8020/user/hdfs/pocsims --verbose --go-live --collection pocsims --zk localhost:2181/solr hdfs:///data/pocsims/trim-exported.csv&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The morphline file used:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SOLR_LOCATOR :&lt;BR /&gt;{ collection : POC_Sims&lt;BR /&gt;zkHost : "vitl000367:2181/solr,vitl000368:2181/solr,vitl000369:2181/solr"&lt;BR /&gt;batchSize : 1000 # batchSize&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;morphlines : [&lt;BR /&gt;{ id : morphline1&lt;BR /&gt;importCommands : ["org.kitesdk.**", "org.apache.solr.**"]&lt;BR /&gt;commands : [&lt;BR /&gt;{ readCSV {&lt;BR /&gt;separator : ","&lt;BR /&gt;columns : [ID,DT_CR,DT_FU,DT_AT,HBL,HBT,EC,ST,CUS,CSP,RAG,CSP_N,NGIN,SIP]&lt;BR /&gt;ignoreFirstLine : true&lt;BR /&gt;quoteChar : ""&lt;BR /&gt;trim : false&lt;BR /&gt;charset : UTF-8&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;{ generateUUID { field : id } }&lt;BR /&gt;{ logDebug { format : "output record: {}", args : ["@{}"] }}&lt;BR /&gt;{ sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} }}&lt;BR /&gt;{ loadSolr { solrLocator : ${SOLR_LOCATOR} } }&lt;BR /&gt;]&lt;BR /&gt;}&lt;BR /&gt;]&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please help us with this problem?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Ricardo Matos&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:06:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduceIndexerTool-problem/m-p/66324#M77176</guid>
      <dc:creator>rribeirom</dc:creator>
      <dc:date>2022-09-16T13:06:05Z</dc:date>
    </item>
    <item>
      <title>Re: MapReduceIndexerTool problem</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduceIndexerTool-problem/m-p/66457#M77177</link>
      <description>&lt;P&gt;I use the same command and have no issues.&lt;/P&gt;&lt;P&gt;According to the logs:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1523546159827_0013_r_000000_0/map_0.out&lt;/PRE&gt;&lt;P&gt;So, I would guess that your CSV is too big, and when the reducer tries to load it there is not enough space in the local dirs of the YARN NodeManager.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you try setting more reducers by using:&lt;/P&gt;&lt;PRE&gt;--reducers 4&lt;/PRE&gt;&lt;P&gt;or more (based on your partitions and the CSV size). You can also set more mappers, but based on the log it is the reducer that is suffering.&lt;/P&gt;&lt;P&gt;More details:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/5-13-x/topics/search_mapreduceindexertool.html#concept_pjs_3sd_3v" target="_self"&gt;https://www.cloudera.com/documentation/enterprise/5-13-x/topics/search_mapreduceindexertool.html#concept_pjs_3sd_3v&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Apr 2018 15:27:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/MapReduceIndexerTool-problem/m-p/66457#M77177</guid>
      <dc:creator>GeKas</dc:creator>
      <dc:date>2018-04-17T15:27:12Z</dc:date>
    </item>
  </channel>
</rss>

