Job submitted to MapReduce in YARN is stuck while ingesting data using Sqoop

Explorer

I am ingesting data from MySQL into CDH5 HDFS using Sqoop. The job is submitted to MapReduce, but there is no activity after I get the MapReduce job ID:

 

INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500040023027_0002
INFO impl.YarnClientImpl: Submitted application application_1500040023027_0002
INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500040023027_0002/
 INFO mapreduce.Job: Running job: job_1500040023027_0002

I have set up CDH5 on RHEL as a cluster, but there is only one PC in the cluster. I do see warnings to have at least 3 DataNodes, but I think that should not be an issue since I am not running any heavy workload.

 

Screenshot from 2017-07-14 12-23-41.png

I have also set the NameNode and Secondary NameNode heap size to 4 GB. The HDFS block size is set to 64 MB. The log file sizes are also taken care of, set to a 2 GB minimum.

In the YARN settings, I have set the root and default queues' min and max cores to 1 and 4, and min/max memory to 1 GB and 4 GB.

 

The MapReduce screenshot shows that 0 vcores and 0 memory have been assigned to it.

Screenshot from 2017-07-14 19-47-13.png

Can somebody point me to how to get this working?

13 REPLIES

Champion

Were you able to check the ResourceManager web UI?

Also check the ResourceManager / NodeManager logs.

Finally, perform a simple sqoop list-databases or list-tables, just to see whether Sqoop itself is working properly.

Did you place the MySQL JDBC jar in the Sqoop lib path?
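For example (a minimal sketch; /usr/lib/sqoop/lib is the usual package-install path and an assumption here):

# check that the MySQL JDBC driver jar is visible to Sqoop
ls /usr/lib/sqoop/lib/ | grep -i mysql
# quick connectivity smoke test against the same MySQL instance
sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root -P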

Explorer

Following are the logs as I see them:

 

Resource manager web UI (the overview below is from the NameNode UI):

Security is off.

Safemode is off.

1,423 files and directories, 1,038 blocks = 2,461 total filesystem object(s).

Heap Memory used 246.75 MB of 3.97 GB Heap Memory. Max Heap Memory is 3.97 GB.

Non Heap Memory used 50.22 MB of 50.44 MB Commited Non Heap Memory. Max Non Heap Memory is 130 MB.
Configured Capacity:	44.98 GB
DFS Used:	821.78 MB (1.78%)
Non DFS Used:	11.41 GB
DFS Remaining:	32.64 GB (72.56%)
Block Pool Used:	821.78 MB (1.78%)
DataNodes usages% (Min/Median/Max/stdDev): 	1.78% / 1.78% / 1.78% / 0.00%
Live Nodes	1 (Decommissioned: 0, In Maintenance: 0)
Dead Nodes	0 (Decommissioned: 0, In Maintenance: 0)
Decommissioning Nodes	0
Entering Maintenance Nodes	0
Total Datanode Volume Failures	0 (0 B)
Number of Under-Replicated Blocks	740
Number of Blocks Pending Deletion	0
Block Deletion Start Time	Fri Jul 14 22:12:26 -0500 2017
Last Checkpoint Time	Sat Jul 15 17:13:51 -0500 2017
--------------------
NameNode Storage
Storage Directory	Type	State
/dfs/nn	IMAGE_AND_EDITS	Active
----------------------------
DFS Storage Types
Storage Type	Configured Capacity	Capacity Used	Capacity Remaining	Block Pool Used	Nodes In Service
DISK	44.98 GB	821.78 MB (1.78%)	32.64 GB (72.56%)	821.78 MB	1

 

ResourceManager logs:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1500089331244_0001 from user: hdfs, in queue: root.users.hdfs, currently num of applications: 1
2017-07-15 18:32:18,076 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0001 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2017-07-15 18:32:18,101 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1500089331244_0001_000001
2017-07-15 18:32:18,102 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0001_000001 State change from NEW to SUBMITTED on event = START
2017-07-15 18:32:18,192 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1500089331244_0001_000001 to scheduler from user: hdfs
2017-07-15 18:32:18,195 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0001_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED

terminal logs:

[root@pc1 ~]# sudo -u hdfs sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
17/07/15 18:27:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password: 
17/07/15 18:27:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/15 18:27:44 INFO tool.CodeGenTool: Beginning code generation
Sat Jul 15 18:27:45 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/15 18:27:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/15 18:27:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/15 18:27:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/2799093c014869764a88ff9fe6216aed/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/15 18:29:10 ERROR orm.CompilationManager: Could not make directory: /root/.
17/07/15 18:29:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2799093c014869764a88ff9fe6216aed/cities.jar
17/07/15 18:29:11 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/15 18:29:11 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/15 18:29:11 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/15 18:29:11 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/15 18:29:13 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/15 18:29:27 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/15 18:30:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/15 18:30:39 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Sat Jul 15 18:32:07 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/15 18:32:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/15 18:32:10 INFO mapreduce.JobSubmitter: number of splits:1
17/07/15 18:32:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0001
17/07/15 18:32:18 INFO impl.YarnClientImpl: Submitted application application_1500089331244_0001
17/07/15 18:32:18 INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500089331244_0001/
17/07/15 18:32:18 INFO mapreduce.Job: Running job: job_1500089331244_0001

I have the JDBC connector in place for connecting MySQL to HDFS.

I see there is an error in the terminal logs, but I cannot figure out the reason, or whether it is related to MapReduce at all.

 

Contributor

Can you try without sudo -u hdfs?

 

Champion

Based on the error, I assume you are firing your Sqoop command as the root user:

ERROR orm.CompilationManager: Could not make directory: /root/
Try firing the same Sqoop command as some non-root user, making sure you give that user all the necessary permissions to read and write files in HDFS.

 

Something like:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo usermod -a -G hdfs hduser
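(Note: addgroup and adduser are Debian-style commands; since this cluster is on RHEL, the rough equivalents would be:)

sudo groupadd hadoop
sudo useradd -g hadoop hduser
sudo usermod -a -G hdfs hduser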

Explorer

I added a home directory /user/root for the root user:

sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root

 

I am still stuck at the same place:

[root@pc1 ~]# sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/17 15:06:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password: 
17/07/17 15:07:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/17 15:07:03 INFO tool.CodeGenTool: Beginning code generation
Mon Jul 17 15:07:04 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 15:07:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 15:07:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 15:07:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/bb43452d9a9b052fd0b43509eb87ece3/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/17 15:07:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/bb43452d9a9b052fd0b43509eb87ece3/cities.jar
17/07/17 15:07:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/17 15:07:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/17 15:07:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/17 15:07:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/17 15:07:07 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/17 15:07:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/17 15:07:09 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/17 15:07:09 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Mon Jul 17 15:07:15 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 15:07:15 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/17 15:07:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/17 15:07:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0002

Not sure where I am still going wrong.

 

 

Champion

This is bad - remove the folder from HDFS:

 

sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
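To undo it, something like this should work (run as the hdfs superuser):

sudo -u hdfs hadoop fs -rm -r /user/root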
 

 

 

Could you follow the steps below?

 

Log in as root.

 

Step 1. Create a normal user (while logged in as root in the terminal):

 

1. sudo useradd hduser
2. Change the password for hduser using passwd.

# Check whether the hduser user exists by running id
id hduser
# you should get a result like: uid=493(hdfs) gid=489(hdfs) groups=489(hdfs),492(hadoop)
id mapred

If so, then add the user to the mapred and hdfs groups:

usermod -a -G mapred hduser
usermod -a -G hdfs hduser

Once everything is done, move on to Step 2.

 

 

Step 2. Log in to your OS terminal as hduser:

su - hduser

 

Step 3. Create an HDFS home directory for hduser:

sudo -u hdfs hadoop fs -mkdir /user/hduser
sudo -u hdfs hadoop fs -chown -R hduser /user/hduser
sudo -u hdfs hadoop fs -chmod -R 777 /user/hduser

Note: 777 permissions are bad practice, but since this is a test, let us use them.
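You can verify that the ownership and permissions took effect with:

sudo -u hdfs hadoop fs -ls /user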


Step 4. Test connectivity:

sqoop list-databases \
--connect jdbc:mysql://localhost \
--username name --password YourPassword

 

Step 5. Perform the same for the import, along the lines of the sketch below.
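For example, running the import as hduser with the target directory under the new home (an illustrative sketch based on the original command):

sqoop import \
--connect jdbc:mysql://localhost:3306/world \
--username root -P \
--table cities \
--target-dir /user/hduser/world-cities \
-m 1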

 

 

Let me know if that suffices.

 

Explorer

I changed the terminal user to hduser; however, the MapReduce application is still shown as pending in YARN. Following is the log:

 

[hduser@pc1 ~]$ id hduser
uid=1002(hduser) gid=1002(hduser) groups=1002(hduser),980(hdfs),979(mapred)
[hduser@pc1 ~]$ sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/17 23:40:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password: 
17/07/17 23:40:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/17 23:40:43 INFO tool.CodeGenTool: Beginning code generation
Mon Jul 17 23:40:44 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 23:40:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 23:40:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 23:40:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-hduser/compile/32dfd91e6debe7d017b22f5df50f2199/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/17 23:40:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/32dfd91e6debe7d017b22f5df50f2199/cities.jar
17/07/17 23:40:47 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/17 23:40:47 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/17 23:40:47 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/17 23:40:47 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/17 23:40:47 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/17 23:40:48 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/17 23:40:50 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/17 23:40:50 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Mon Jul 17 23:40:56 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 23:40:56 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/17 23:40:57 INFO mapreduce.JobSubmitter: number of splits:1
17/07/17 23:40:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0005
17/07/17 23:40:57 INFO impl.YarnClientImpl: Submitted application application_1500089331244_0005
17/07/17 23:40:58 INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500089331244_0005/
17/07/17 23:40:58 INFO mapreduce.Job: Running job: job_1500089331244_0005

Champion

The log says to track the job at:

http://pc1.localdomain.com:8088/proxy/application_1500089331244_0005/

What do you see there? In the meantime, check the ResourceManager log and let me know.
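One more quick check (the yarn CLI ships with Hadoop 2.x): verify that a NodeManager has registered with the ResourceManager and is advertising memory/vcores.

# list registered NodeManagers and their states
yarn node -list -all
# the RM "Nodes" page shows the per-node memory/vcore totals:
# http://pc1.localdomain.com:8088/cluster/nodes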

 

Were you able to perform Step 4?

 

Also, what is this? Why set it as root?

 

In the YARN settings, I have set the root and default queues' min and max cores to 1 and 4, and min/max memory to 1 GB and 4 GB.

 

Explorer

Yes, I was able to perform step 4.

 

The application status in Yarn:

Screenshot from 2017-07-18 06-43-29.png

It is still in an unassigned and pending state.

 

Hadoop YARN ResourceManager logs:

2017-07-17 23:40:50,845 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 5
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 5 submitted by user hduser
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser	IP=192.168.1.115	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from NEW to NEW_SAVING on event = START
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1500089331244_0005 from user: hduser, in queue: root.users.hduser, currently num of applications: 1
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1500089331244_0005_000001
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0005_000001 State change from NEW to SUBMITTED on event = START
2017-07-17 23:40:57,952 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1500089331244_0005_000001 to scheduler from user: hduser
2017-07-17 23:40:57,952 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0005_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED

By the YARN settings above, I meant the dynamic resource pool allocation for reserving min and max resources:

Screenshot from 2017-07-18 06-39-24.png

However, in YARN, I do not see them being utilized at all!
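(For anyone hitting the same symptom later: an application stuck at SCHEDULED with 0 vcores and 0 memory usually means no NodeManager is advertising enough resources to start the ApplicationMaster container. A minimal check, assuming the standard CDH client-config path /etc/hadoop/conf:)

# what the NodeManager advertises to the scheduler
grep -A1 'yarn.nodemanager.resource' /etc/hadoop/conf/yarn-site.xml
# the AM container size must fit within the above
grep -A1 'yarn.app.mapreduce.am.resource.mb' /etc/hadoop/conf/mapred-site.xml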