Support Questions
Find answers, ask questions, and share your expertise

How to Determine Sqoop Task Memory Requirements?


We want to troubleshoot a Sqoop export action that fails intermittently (roughly once every 10 days) because a mapper task exceeds its memory limit. The job runs successfully 99% of the time with the same configuration. Interestingly, if the failed job is rerun without any changes, it always succeeds.

What we looked at so far:

  1. Container and mapper minimums are set to 3 GB at the cluster level, and the job uses these settings
  2. A rerun of the failed job, without any changes to data or config, always succeeds
  3. Counters for successful jobs show that each mapper (mapper count set to 20 in the job config) processes between 900K and 1M records
  4. Pepperdata shows that the average memory consumption of each mapper task in successful runs is always < 1 GB
  5. The Sqoop action doesn't use any primary key or --split-by flags
  6. The number of rows processed by Sqoop is approximately the same every day (< 1000-row delta)
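For reference, a minimal Sqoop export invocation matching the setup described above might look like the sketch below. The connection string, credentials, table name, and export directory are placeholders; the memory properties are standard Hadoop MapReduce settings, not values taken from the original job:

```shell
# Hypothetical Sqoop export resembling the job described above
# (connection string, table, and paths are placeholders).
# mapreduce.map.memory.mb: YARN container limit per mapper (3 GB, as in the cluster config).
# mapreduce.map.java.opts: JVM heap, conventionally ~80% of the container size.
sqoop export \
  -D mapreduce.map.memory.mb=3072 \
  -D mapreduce.map.java.opts=-Xmx2458m \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table TARGET_TABLE \
  --export-dir /user/etl/output \
  --num-mappers 20
```

Note that the container limit (`mapreduce.map.memory.mb`) and the JVM heap (`-Xmx`) are enforced separately: a mapper can be killed by YARN for exceeding the container limit even without a Java OutOfMemoryError, and the two failure modes show different messages in the task logs.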

We understand that the fix is to increase the mapper memory limit, but:

  1. What is causing a mapper in the failed job to consume > 3 GB, when the same job rerun finishes successfully with < 1 GB?
  2. Is there a way to determine the memory requirements of a Sqoop mapper task (using any profiling tools) so we can avoid such random failures?
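One way to see what a mapper's heap actually holds at the moment of failure is to have the JVM write a heap dump on OutOfMemoryError and analyze it offline (for example with Eclipse MAT). This is a sketch using standard HotSpot flags; the dump path and connection details are placeholders:

```shell
# Sketch: capture a heap dump when a mapper exhausts its Java heap.
# -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath are standard HotSpot flags;
# the dump path below is a placeholder and must be writable on the worker node.
sqoop export \
  -D 'mapreduce.map.java.opts=-Xmx2458m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/sqoop_mapper.hprof' \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table TARGET_TABLE \
  --export-dir /user/etl/output
```

Keep in mind that an average of < 1 GB in successful runs does not rule out short allocation spikes; peak (rather than average) memory metrics from Pepperdata or the task counters (`PHYSICAL_MEMORY_BYTES` peaks) are a better guide to the true requirement.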

Re: How to Determine Sqoop Task Memory Requirements?

@Rahul Reddy

1) Are you doing any ETL processing before loading the data into the target DB?

2) Is it failing in the map stage or the reduce stage?

3) Ideally, 10 mappers should be enough to process your 900K to 1M records. Is there a specific reason you are setting the mapper count explicitly? YARN will assign the mappers appropriately.

4) Can you attach your Sqoop log?

5) Can you attach your Sqoop export script?
