04-13-2018 07:54 AM
Hi All,
We’ve installed Solr and are now trying to index a CSV file with 250 million rows using MapReduceIndexerTool.
After about 90 minutes of running, the job fails with the following error:
18/04/13 13:44:06 INFO mapreduce.Job: map 67% reduce 0%
18/04/13 13:49:13 INFO mapreduce.Job: map 100% reduce 0%
18/04/13 13:49:19 INFO mapreduce.Job: Task Id : attempt_1523546159827_0013_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1523546159827_0013_r_000000_0/map_0.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.mapred.YarnOutputFiles.getInputFileForWrite(YarnOutputFiles.java:213)
    at org.apache.hadoop.mapreduce.task.reduce.OnDiskMapOutput.<init>(OnDiskMapOutput.java:65)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:269)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:539)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
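The DiskErrorException at the root of the trace says the reducer could not reserve space in any YARN local directory for the fetched map output, so we suspect the shuffle spill is exhausting the NodeManager local disks. A quick check we can run on each NodeManager (the /yarn/nm path below is only an assumption; substitute the actual value of yarn.nodemanager.local-dirs from yarn-site.xml):

# Free space on the directories the shuffle spills to
# (replace /yarn/nm with your configured yarn.nodemanager.local-dirs)
df -h /yarn/nm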
The command we execute is:
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file /tmp/morphlines_pocsim.conf \
    --output-dir hdfs://vitl000361:8020/user/hdfs/pocsims \
    --verbose \
    --go-live \
    --collection pocsims \
    --zk localhost:2181/solr \
    hdfs:///data/pocsims/trim-exported.csv
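In case it helps with sizing the shuffle, the on-disk size of the input can be checked with (a sketch; the path is the same one passed to the command above):

# Report the HDFS size of the CSV being indexed
hdfs dfs -du -h /data/pocsims/trim-exported.csv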
The morphline file used is:
SOLR_LOCATOR : {
  collection : POC_Sims
  zkHost : "vitl000367:2181/solr,vitl000368:2181/solr,vitl000369:2181/solr"
  batchSize : 1000  # batchSize
}
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : ","
          columns : [ID,DT_CR,DT_FU,DT_AT,HBL,HBT,EC,ST,CUS,CSP,RAG,CSP_N,NGIN,SIP]
          ignoreFirstLine : true
          quoteChar : ""
          trim : false
          charset : UTF-8
        }
      }
      { generateUUID { field : id } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
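If it helps narrow things down, the morphline can also be exercised on its own with the tool's --dry-run mode, which (as we understand it) runs the morphline locally and prints documents to stdout instead of loading them into Solr. A sketch, reusing the flags from the command above but dropping --go-live:

# Dry run: parse the CSV through the morphline without indexing
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file /tmp/morphlines_pocsim.conf \
    --output-dir hdfs://vitl000361:8020/user/hdfs/pocsims \
    --collection pocsims \
    --zk localhost:2181/solr \
    --dry-run \
    hdfs:///data/pocsims/trim-exported.csv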
Could you please help us with this problem?
Regards,
Ricardo Matos
Labels:
- Apache Solr
- HDFS
- MapReduce