Created on 05-01-2018 06:34 AM - edited 09-16-2022 06:09 AM
sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
--username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table dim_PatientEncounter --hive-table kudu_40m.dim_PatientEncounter --create-hive-table --hive-import --warehouse-dir /user/hive/warehouse/testdb -m 4
The table has 30 million+ rows.
Running the above command, always seems to hang at mapper 3 of 4. Nothing much in the logs besides:
2018-05-01 08:22:53,364 INFO [IPC Server handler 7 on 45249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1525122694968_0001_m_000003_0 is : 0.0
Changes I made:
mapreduce.map.memory.mb = 5 GB
mapreduce.reduce.memory.mb = 5GB
yarn.nodemanager.resource.memory-mb = 20GB (NodeManager Default Group)
yarn.nodemanager.resource.memory-mb = 8GB (NodeManager Group 1)
1 | 0 | 1 | 0 | 3 | 11 GB | 68 GB | 0 B | 3 | 32 | 0 |
4 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 B | 0 B | 0 B | 0 | 0 | 0 |
application_1525122694968_0001 | root | dim_PatientEncounter.jar | MAPREDUCE | root.users.root | Tue May 1 08:11:49 -0500 2018 | N/A | RUNNING | UNDEFINED | 3 | 3 | 11264 | 0 | 0 | ApplicationMaster |
Created 05-04-2018 02:27 AM
You are using sqoop1.
Sqoop1 is not a service, it is a tool, that submits the job to YARN.
So, apart from your stdout and YARN logs, there are no sqoop logs.
Number of mappers (-m 4) means that your job will open 4 connections to your database. If there is no indication in logs about out of memory or illegal value on a column (I mean yarn logs), then you should check that your DB can accept 4 concurrent connections.
Created 05-04-2018 02:27 AM
You are using sqoop1.
Sqoop1 is not a service, it is a tool, that submits the job to YARN.
So, apart from your stdout and YARN logs, there are no sqoop logs.
Number of mappers (-m 4) means that your job will open 4 connections to your database. If there is no indication in logs about out of memory or illegal value on a column (I mean yarn logs), then you should check that your DB can accept 4 concurrent connections.