Support Questions

Kubik · 08-08-2023

Hi team, I have a Sqoop job that imports data. During the MapReduce phase, the map progress climbs to 45% and then restarts from 0%, which leaves duplicate data in the target.

Example:
23/08/05 04:42:20 INFO mapreduce.Job: map 39% reduce 0%
23/08/05 04:42:29 INFO mapreduce.Job: map 40% reduce 0%
23/08/05 04:42:31 INFO mapreduce.Job: map 41% reduce 0%
23/08/05 04:42:41 INFO mapreduce.Job: map 42% reduce 0%
23/08/05 04:42:48 INFO mapreduce.Job: map 43% reduce 0%
23/08/05 04:42:50 INFO mapreduce.Job: map 44% reduce 0%
23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%
23/08/05 04:52:39 INFO mapreduce.Job: map 0% reduce 0%
23/08/05 04:52:50 INFO mapreduce.Job: map 2% reduce 0%
23/08/05 04:52:59 INFO mapreduce.Job: map 3% reduce 0%
23/08/05 04:53:10 INFO mapreduce.Job: map 5% reduce 0%
23/08/05 04:53:20 INFO mapreduce.Job: map 6% reduce 0%

We use Cloudera Express 5.8.3 and Java version 1.7.0_67, with 2 NameNode servers and 18 DataNode servers.
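
For anyone who wants to locate this kind of reset in a longer job log, here is a minimal Python sketch. It assumes the log lines follow the same `INFO mapreduce.Job: map N% reduce M%` format as the excerpt above; the names `PROGRESS_RE`, `find_progress_resets`, and `SAMPLE_LOG` are made up for illustration and are not part of Sqoop or Hadoop. It only reports where map progress goes backwards and how long the pause before it was. It does not explain why the job restarted.

```python
import re
from datetime import datetime

# Matches lines such as:
# "23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%"
# (the timestamp format yy/MM/dd HH:mm:ss is assumed from the excerpt)
PROGRESS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def find_progress_resets(lines):
    """Return (previous, current) pairs where map progress went backwards.

    Each element of a pair is a (timestamp, map_percent) tuple.
    """
    resets = []
    previous = None
    for line in lines:
        match = PROGRESS_RE.match(line.strip())
        if not match:
            continue  # ignore lines that are not progress reports
        timestamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        current = (timestamp, int(match.group(2)))
        if previous is not None and current[1] < previous[1]:
            resets.append((previous, current))
        previous = current
    return resets


# A few lines copied from the excerpt in the question.
SAMPLE_LOG = """\
23/08/05 04:42:50 INFO mapreduce.Job: map 44% reduce 0%
23/08/05 04:42:57 INFO mapreduce.Job: map 45% reduce 0%
23/08/05 04:52:39 INFO mapreduce.Job: map 0% reduce 0%
23/08/05 04:52:50 INFO mapreduce.Job: map 2% reduce 0%
"""

if __name__ == "__main__":
    for (t0, pct0), (t1, pct1) in find_progress_resets(SAMPLE_LOG.splitlines()):
        gap = (t1 - t0).total_seconds()
        print(f"map progress fell from {pct0}% to {pct1}% after a {gap:.0f} s gap")
```

On the sample lines this reports a single reset from 45% to 0% after a 582-second gap, i.e. the job sat silent for almost ten minutes between 04:42:57 and 04:52:39 before starting over.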

DianaTorres · ‎08-08-2023

@Kubik Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Cloudera Manager experts @paras and @tj2007, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.

Regards,

Diana Torres,
Senior Community Moderator


Problems with MapReduce