SQL Server table to Hive table using Sqoop

Contributor

sqoop import --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
--username xxxx --password xxxx --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table dim_PatientEncounter --hive-table kudu_40m.dim_PatientEncounter --create-hive-table --hive-import --warehouse-dir /user/hive/warehouse/testdb -m 4

 

The table has 30 million+ rows.

 

Running the above command, it always seems to hang at mapper 3 of 4. There is not much in the logs besides:

2018-05-01 08:22:53,364 INFO [IPC Server handler 7 on 45249] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1525122694968_0001_m_000003_0 is : 0.0

 

Changes I made (a per-job equivalent is sketched after this list):

mapreduce.map.memory.mb = 5 GB

mapreduce.reduce.memory.mb = 5 GB

yarn.nodemanager.resource.memory-mb = 20 GB (NodeManager Default Group)

yarn.nodemanager.resource.memory-mb = 8 GB (NodeManager Group 1)
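
For reference, a minimal sketch of passing the same limits per job instead of cluster-wide (assuming the standard Hadoop generic -D options, which have to appear right after "import" and before the Sqoop-specific flags; the values are just the ones above expressed in MB):

# Sketch: same memory limits as above, applied only to this Sqoop job
sqoop import \
  -D mapreduce.map.memory.mb=5120 \
  -D mapreduce.reduce.memory.mb=5120 \
  --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' \
  --username xxxx --password xxxx \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --table dim_PatientEncounter \
  --hive-import --hive-table kudu_40m.dim_PatientEncounter --create-hive-table \
  --warehouse-dir /user/hive/warehouse/testdb \
  -m 4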

 

Cluster Metrics:
Apps Submitted: 1, Apps Pending: 0, Apps Running: 1, Apps Completed: 0, Containers Running: 3,
Memory Used: 11 GB, Memory Total: 68 GB, Memory Reserved: 0 B, VCores Used: 3, VCores Total: 32, VCores Reserved: 0

Cluster Nodes Metrics:
Active Nodes: 4, Decommissioning Nodes: 0, Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0

User Metrics for dr.who: all values 0.

Running application:
ID: application_1525122694968_0001, User: root, Name: dim_PatientEncounter.jar, Application Type: MAPREDUCE, Queue: root.users.root,
StartTime: Tue May 1 08:11:49 -0500 2018, FinishTime: N/A, State: RUNNING, FinalStatus: UNDEFINED,
Running Containers: 3, Allocated CPU VCores: 3, Allocated Memory: 11264 MB, Reserved CPU VCores: 0, Reserved Memory: 0 MB
 
Any help is appreciated. Also, this is on CDH 5.14, so Sqoop 1 is being used by default, I believe. Where are the Sqoop logs written for this? I cannot seem to find any logs besides the YARN logs.
 
1 ACCEPTED SOLUTION

Super Collaborator

You are using Sqoop 1.

Sqoop 1 is not a service; it is a client tool that submits the job to YARN.

So, apart from your stdout and the YARN logs, there are no separate Sqoop logs.
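
For example (a sketch, assuming log aggregation is enabled and using the application ID from your RM page above), you can pull the container logs with the standard YARN CLI, and --verbose makes the Sqoop client print more detail to stdout:

# Fetch the aggregated container logs for the application (most complete once the app has finished)
yarn logs -applicationId application_1525122694968_0001

# Re-run the import with verbose client-side logging; it still all goes to stdout
sqoop import --verbose --connect 'jdbc:sqlserver://data-dev.dev.eso.local;database=KUDU_40M' ... -m 4   # rest of your original options unchanged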

 

The number of mappers (-m 4) means that your job will open 4 parallel connections to your database. If there is no indication in the YARN logs of an out-of-memory error or an illegal value in a column, then you should check that your database can accept 4 concurrent connections.
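
If it helps, here is a quick sketch of checking that from the SQL Server side (assuming sqlcmd is available and your account can read the standard DMVs; the host and user placeholders are taken from your --connect string):

# Count user sessions currently open against the instance
sqlcmd -S data-dev.dev.eso.local -U xxxx -P xxxx -Q "SELECT COUNT(*) AS user_sessions FROM sys.dm_exec_sessions WHERE is_user_process = 1;"

# Show the configured connection ceiling (0 for 'user connections' means the engine maximum, 32767)
sqlcmd -S data-dev.dev.eso.local -U xxxx -P xxxx -Q "SELECT @@MAX_CONNECTIONS AS max_connections; SELECT value_in_use FROM sys.configurations WHERE name = 'user connections';"

Re-running the same import with -m 1 would also tell you quickly whether the parallel connections are the problem.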

