Hi team, I have a job that pulls data with Sqoop. During the MapReduce phase, map progress climbs to 45%, logs nothing for about 10 minutes, and then starts again from 0%, which produces duplicate data.
Example from the job output:
23/08/05 04:42:20 INFO mapreduce.Job: map 39% reduce 0%
23/08/05 04:42:29 INFO mapreduce.Job: map 40% reduce 0%
23/08/05 04:42:31 INFO mapreduce.Job: map 41% reduce 0%
23/08/05 04:42:41 INFO mapreduce.Job: map 42% reduce 0%
23/08/05 04:42:48 INFO mapreduce.Job: map 43% reduce 0%
23/08/05 04:42:50 INFO mapreduce.Job: map 44% reduce 0%
23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%
23/08/05 04:52:39 INFO mapreduce.Job: map 0% reduce 0%
23/08/05 04:52:50 INFO mapreduce.Job: map 2% reduce 0%
23/08/05 04:52:59 INFO mapreduce.Job: map 3% reduce 0%
23/08/05 04:53:10 INFO mapreduce.Job: map 5% reduce 0%
23/08/05 04:53:20 INFO mapreduce.Job: map 6% reduce 0%
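To find every point where map progress goes backwards in a longer console log, a small script like the one below can help. It is a minimal Python sketch, not part of Sqoop or Hadoop. The helper name `find_map_restarts` is hypothetical, and the script assumes the `yy/MM/dd HH:mm:ss` timestamp format and the `map N% reduce N%` line shape shown above.

```python
import re
from datetime import datetime

# Matches lines like "23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%".
# \s+ also tolerates the double space Hadoop sometimes prints before "map".
PROGRESS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+map (\d+)% reduce (\d+)%"
)


def find_map_restarts(lines):
    """Hypothetical helper: return one tuple per drop in map progress.

    Each tuple is (prev_time, prev_pct, time, pct, gap_seconds).
    Lines that are not progress lines are ignored.
    """
    restarts = []
    prev = None  # (timestamp, map_pct) of the last progress line seen
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        pct = int(m.group(2))
        if prev is not None and pct < prev[1]:
            gap = int((ts - prev[0]).total_seconds())
            restarts.append((prev[0], prev[1], ts, pct, gap))
        prev = (ts, pct)
    return restarts


if __name__ == "__main__":
    sample = [
        "23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%",
        "23/08/05 04:52:39 INFO mapreduce.Job: map 0% reduce 0%",
        "23/08/05 04:52:50 INFO mapreduce.Job: map 2% reduce 0%",
    ]
    for prev_ts, prev_pct, ts, pct, gap in find_map_restarts(sample):
        print(f"{prev_ts} map {prev_pct}% -> {ts} map {pct}% after {gap}s")
```

Run against the log above, it reports a single drop from 45% to 0% after a 582-second gap (04:42:57 to 04:52:39). That timestamp can then be matched against the ResourceManager and task logs for the same window.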
We use Cloudera Express 5.8.3 with Java 1.7.0_67, on a cluster with 2 NameNode servers and 18 DataNode servers.