Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

SQL Server table to Hive table using Sqoop

avatar
Contributor

sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
--username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table dim_PatientEncounter --hive-table kudu_40m.dim_PatientEncounter --create-hive-table --hive-import --warehouse-dir /user/hive/warehouse/testdb -m 4

 

The table has 30 million+ rows.

 

Running the above command, always seems to hang at mapper 3 of 4. Nothing much in the logs besides:

2018-05-01 08:22:53,364 INFO [IPC Server handler 7 on 45249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1525122694968_0001_m_000003_0 is : 0.0

 

Changes I made:

mapreduce.map.memory.mb = 5 GB

mapreduce.reduce.memory.mb = 5GB

yarn.nodemanager.resource.memory-mb = 20GB (NodeManager Default Group)

yarn.nodemanager.resource.memory-mb = 8GB (NodeManager Group 1)

 

Cluster MetricsApps Submitted Apps Pending Apps Running Apps Completed Containers Running Memory Used Memory Total Memory Reserved VCores Used VCores Total VCores Reserved
1 0 1 0 3 11 GB 68 GB 0 B 3 32 0
Cluster Nodes MetricsActive Nodes Decommissioning Nodes Decommissioned Nodes Lost Nodes Unhealthy Nodes Rebooted Nodes
4 0 0 0 0 0
User Metrics for dr.whoApps Submitted Apps Pending Apps Running Apps Completed Containers Running Containers Pending Containers Reserved Memory Used Memory Pending Memory Reserved VCores Used VCores Pending VCores Reserved
0 0 0 0 0 0 0 0 B 0 B 0 B 0 0 0
Show 20406080100 entries
Search: 
 
IDUserNameApplication TypeQueueStartTimeFinishTimeStateFinalStatusRunning ContainersAllocated CPU VCoresAllocated Memory MBReserved CPU VCoresReserved Memory MBProgressTracking UI
application_1525122694968_0001 root dim_PatientEncounter.jar MAPREDUCE root.users.root Tue May 1 08:11:49 -0500 2018 N/A RUNNING UNDEFINED 3 3 11264 0 0   ApplicationMaster
 
Any help is appreciated? Also, this is in CDH 5.14, so Sqoop1 is being used by default, I believe. Where are the Sqoop logs written for this, I cannot seem to find the logs besides the Yarn logs.
 
1 ACCEPTED SOLUTION

avatar
Super Collaborator

You are using sqoop1.

Sqoop1 is not a service, it is a tool, that submits the job to YARN.

So, apart from your stdout and YARN logs, there are no sqoop logs.

 

Number of mappers (-m 4) means that your job will open 4 connections to your database. If there is no indication in logs about out of memory or illegal value on a column (I mean yarn logs), then you should check that your DB can accept 4 concurrent connections.

View solution in original post

1 REPLY 1

avatar
Super Collaborator

You are using sqoop1.

Sqoop1 is not a service, it is a tool, that submits the job to YARN.

So, apart from your stdout and YARN logs, there are no sqoop logs.

 

Number of mappers (-m 4) means that your job will open 4 connections to your database. If there is no indication in logs about out of memory or illegal value on a column (I mean yarn logs), then you should check that your DB can accept 4 concurrent connections.