Support Questions
Find answers, ask questions, and share your expertise

import data from Mysql to HDFS by sqoop


Explorer

Hi team,

I dumped data from a production MySQL database (prod) to a staging MySQL server (stg, 128 GB RAM, 32 CPUs). After the dump, the staging server holds 10 databases, each containing a single table with about 1 billion records. I tried to use Sqoop to import the data from the staging databases into HDFS, but some of the imports fail with errors like:

17/03/08 23:26:48 INFO mapreduce.Job:  map 16% reduce 0%
17/03/08 23:27:55 INFO mapreduce.Job:  map 17% reduce 0%
17/03/08 23:31:45 INFO mapreduce.Job:  map 18% reduce 0%
17/03/08 23:32:16 INFO mapreduce.Job:  map 19% reduce 0%
17/03/08 23:36:42 INFO mapreduce.Job:  map 20% reduce 0%
17/03/08 23:43:45 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000027_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000027_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/08 23:44:45 INFO mapreduce.Job:  map 21% reduce 0%
17/03/08 23:44:45 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000060_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000060_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/08 23:44:46 INFO mapreduce.Job:  map 20% reduce 0%
17/03/08 23:46:43 INFO mapreduce.Job:  map 21% reduce 0%
17/03/08 23:58:21 INFO mapreduce.Job:  map 22% reduce 0%
17/03/08 23:59:14 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000059_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000059_0 Timed out after 300 secs
17/03/09 00:08:14 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000026_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000026_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:10:14 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000057_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000057_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:10:44 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000045_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000045_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:11:44 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000058_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000058_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:16:45 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000055_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000055_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:17:15 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000056_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000056_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/03/09 00:18:14 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000046_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000046_0 Timed out after 300 secs
17/03/09 00:23:14 INFO mapreduce.Job: Task Id : attempt_1488958162821_0026_m_000025_0, Status : FAILED
AttemptID:attempt_1488958162821_0026_m_000025_0 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Note: The Hadoop cluster and the MySQL staging server are on the same network. The cluster has 3 nodes, each with 128 GB RAM, 32 CPUs, and 11 TB of HDD.

Here is my Sqoop command:

sqoop import --driver com.mysql.jdbc.Driver --connect "jdbc:mysql://db_server:3306/db_import?zeroDateTimeBehavior=convertToNull" --username export_user --password secret --table Table --as-parquetfile --compression-codec snappy --warehouse-dir /data_imported --fetch-size 1000 --num-mappers 200

Please help me understand this error and how to fix it.

3 Replies

Re: import data from Mysql to HDFS by sqoop

Rising Star

@Tin Huynh Since the DB dump is huge, the MR tasks are timing out (your log shows the default 300-second limit).

Try this option in the Sqoop job to raise the timeout. Note that mapreduce.task.timeout is expressed in milliseconds, so the default of 300000 ms corresponds to the 300 seconds in your log; for example, 1800000 ms gives tasks 30 minutes:

-D mapreduce.task.timeout=1800000
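For reference, the generic -D option must appear immediately after "sqoop import", before any tool-specific arguments. An illustrative sketch of the adjusted command, reusing the connection details from the original post (the timeout value of 1,800,000 ms / 30 minutes is an example, not a recommendation):

```shell
# Illustrative only: same import with a raised MR task timeout.
# -D generic options must come right after "import", before tool flags.
sqoop import \
  -D mapreduce.task.timeout=1800000 \
  --driver com.mysql.jdbc.Driver \
  --connect "jdbc:mysql://db_server:3306/db_import?zeroDateTimeBehavior=convertToNull" \
  --username export_user --password secret \
  --table Table \
  --as-parquetfile --compression-codec snappy \
  --warehouse-dir /data_imported \
  --fetch-size 1000 --num-mappers 200
```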


Re: import data from Mysql to HDFS by sqoop

Contributor

At times, too many mappers can also be overkill; more is not always better. Try to find the optimum number of mappers for your job, and specify --split-by if you know a key that partitions the table evenly. The timeout can also happen when other jobs are blocking the queue. Increasing the timeout is one way around it, but you need to know what is actually causing the timeout to apply a proper fix.
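To see why an evenly distributed --split-by key matters, here is a rough sketch (not Sqoop's actual source) of how an integer split column is divided into per-mapper ranges: Sqoop queries MIN and MAX of the key and hands each mapper one slice. The values below are illustrative (1 billion sequential ids, 4 mappers):

```shell
# Sketch of integer range splitting: each mapper gets a contiguous
# slice of the key space between MIN and MAX of the --split-by column.
min=1; max=1000000000; mappers=4
span=$(( (max - min + 1) / mappers ))
for i in $(seq 0 $(( mappers - 1 ))); do
  lo=$(( min + i * span ))
  if [ "$i" -eq $(( mappers - 1 )) ]; then
    hi=$max                      # last mapper absorbs any remainder
  else
    hi=$(( lo + span - 1 ))
  fi
  echo "mapper $i: id BETWEEN $lo AND $hi"
done
```

If the key has large gaps or hot spots, the slices cover the key space evenly but not the row counts, so some mappers finish quickly while others run long enough to hit the task timeout.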


Re: import data from Mysql to HDFS by sqoop

Explorer

Thank you all.

I have resolved this.

I only increased --num-mappers to 2000.

Sqoop was transferring a huge amount of data in each split, which ran into network I/O problems, so I increased the number of mappers to reduce the amount of data each mapper has to move.
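To put numbers on that, a back-of-the-envelope calculation of the per-mapper row count for a 1-billion-row table at the two mapper settings (illustrative shell arithmetic):

```shell
# Rough per-mapper row counts for a 1-billion-row table.
rows=1000000000
per_200=$(( rows / 200 ))     # original setting
per_2000=$(( rows / 2000 ))   # setting that resolved the issue
echo "200 mappers:  $per_200 rows each"
echo "2000 mappers: $per_2000 rows each"
```

One caveat worth keeping in mind: each mapper opens its own JDBC connection, so a very high mapper count can itself overload the MySQL server or the network; the right number depends on what the source database can sustain.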
