Created on 07-14-2017 10:33 AM - edited 09-16-2022 04:55 AM
I am ingesting data from MySQL into CDH5 HDFS using Sqoop. The job is submitted to MapReduce, but there is no activity after I get the MapReduce job ID:
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500040023027_0002
INFO impl.YarnClientImpl: Submitted application application_1500040023027_0002
INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500040023027_0002/
INFO mapreduce.Job: Running job: job_1500040023027_0002
I have set up CDH5 on RHEL using the cluster setup, but I have only one PC in the cluster. I do see warnings to have at least 3 datanodes, but I think it should not be an issue since I am not running any heavy workload.
I have also set the NameNode and Secondary NameNode heap sizes to 4 GB. The HDFS block size is set to 64 MB. The log file sizes are also taken care of by setting them to a 2 GB minimum.
In the YARN settings, for the root and default pools I have set the min and max cores to 1 and 4, and the min/max memory to 1 GB and 4 GB.
The MapReduce screenshot shows that 0 vcores and 0 memory have been assigned to it.
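For reference, this is roughly what I can run from the shell to see what YARN reports for the node (generic YARN CLI commands; I have left the output out):

yarn node -list          # should show the single NodeManager and its used/available resources
yarn application -list   # shows the submitted job and the state it is stuck in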
Can somebody point me to how to make it work?
Created 07-21-2017 11:51 AM
Following the link below worked for me.
Created 07-15-2017 01:44 AM
Were you able to check the ResourceManager web UI?
Also check the ResourceManager / NodeManager logs.
Finally, perform a simple sqoop list-databases or list-tables just to see if Sqoop is working properly.
Did you place the MySQL JDBC JAR in the Sqoop lib path?
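For example, something along these lines (the connector path and the Sqoop lib directory below are assumptions for a package-based CDH install; adjust them to wherever the jar actually lives on your box):

# copy the MySQL JDBC driver where Sqoop can find it (paths are illustrative)
sudo cp /path/to/mysql-connector-java.jar /usr/lib/sqoop/lib/
# quick sanity check that Sqoop can reach MySQL at all, before worrying about MapReduce
sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root -P
sqoop list-tables --connect jdbc:mysql://localhost:3306/world --username root -P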
Created 07-15-2017 04:51 PM
Following are the logs as I see them:
ResourceManager web UI:
Security is off.
Safemode is off.
1,423 files and directories, 1,038 blocks = 2,461 total filesystem object(s).
Heap Memory used 246.75 MB of 3.97 GB Heap Memory. Max Heap Memory is 3.97 GB.
Non Heap Memory used 50.22 MB of 50.44 MB Commited Non Heap Memory. Max Non Heap Memory is 130 MB.
Configured Capacity: 44.98 GB
DFS Used: 821.78 MB (1.78%)
Non DFS Used: 11.41 GB
DFS Remaining: 32.64 GB (72.56%)
Block Pool Used: 821.78 MB (1.78%)
DataNodes usages% (Min/Median/Max/stdDev): 1.78% / 1.78% / 1.78% / 0.00%
Live Nodes 1 (Decommissioned: 0, In Maintenance: 0)
Dead Nodes 0 (Decommissioned: 0, In Maintenance: 0)
Decommissioning Nodes 0
Entering Maintenance Nodes 0
Total Datanode Volume Failures 0 (0 B)
Number of Under-Replicated Blocks 740
Number of Blocks Pending Deletion 0
Block Deletion Start Time Fri Jul 14 22:12:26 -0500 2017
Last Checkpoint Time Sat Jul 15 17:13:51 -0500 2017
NameNode Storage:
Storage Directory /dfs/nn, Type IMAGE_AND_EDITS, State Active
DFS Storage Types:
DISK - Configured Capacity 44.98 GB, Capacity Used 821.78 MB (1.78%), Capacity Remaining 32.64 GB (72.56%), Block Pool Used 821.78 MB, Nodes In Service 1
ResourceManager logs:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1500089331244_0001 from user: hdfs, in queue: root.users.hdfs, currently num of applications: 1
2017-07-15 18:32:18,076 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0001 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2017-07-15 18:32:18,101 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1500089331244_0001_000001
2017-07-15 18:32:18,102 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0001_000001 State change from NEW to SUBMITTED on event = START
2017-07-15 18:32:18,192 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1500089331244_0001_000001 to scheduler from user: hdfs
2017-07-15 18:32:18,195 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0001_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
Terminal logs:
[root@pc1 ~]# sudo -u hdfs sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
17/07/15 18:27:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password:
17/07/15 18:27:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/15 18:27:44 INFO tool.CodeGenTool: Beginning code generation
Sat Jul 15 18:27:45 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/15 18:27:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/15 18:27:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/15 18:27:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/2799093c014869764a88ff9fe6216aed/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/15 18:29:10 ERROR orm.CompilationManager: Could not make directory: /root/.
17/07/15 18:29:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2799093c014869764a88ff9fe6216aed/cities.jar
17/07/15 18:29:11 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/15 18:29:11 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/15 18:29:11 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/15 18:29:11 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/15 18:29:13 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/15 18:29:27 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/15 18:30:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/15 18:30:39 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Sat Jul 15 18:32:07 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/15 18:32:08 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/15 18:32:10 INFO mapreduce.JobSubmitter: number of splits:1
17/07/15 18:32:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0001
17/07/15 18:32:18 INFO impl.YarnClientImpl: Submitted application application_1500089331244_0001
17/07/15 18:32:18 INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500089331244_0001/
17/07/15 18:32:18 INFO mapreduce.Job: Running job: job_1500089331244_0001
I have the JDBC connector in place for connecting MySQL to HDFS.
I see there is an error in the terminal logs, but I cannot figure out the reason, or whether it is related to MapReduce.
Created 07-16-2017 03:03 AM
Can you try without sudo -u hdfs?
Created 07-16-2017 06:36 PM
Based on the error, I assume you are firing your Sqoop command as the root user.
ERROR orm.CompilationManager: Could not make directory: /root/
Try firing the same Sqoop command as a non-root user, and make sure you give that user all the necessary permissions to read/write files in HDFS.
Something like:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo usermod -a -G hdfs hduser
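Since you mentioned you are on RHEL, addgroup/adduser may not exist there; the rough RHEL-style equivalents would be something like the following (group names are only an example, use whatever your cluster actually has):

sudo groupadd hadoop                 # create the group if it does not already exist
sudo useradd -g hadoop hduser        # create hduser with hadoop as its primary group
sudo usermod -a -G hdfs hduser       # also add hduser to the hdfs group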
Created 07-17-2017 01:11 PM
I created an HDFS home directory for root at /user/root:
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
I am still at the same place:
[root@pc1 ~]# sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/17 15:06:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password:
17/07/17 15:07:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/17 15:07:03 INFO tool.CodeGenTool: Beginning code generation
Mon Jul 17 15:07:04 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 15:07:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 15:07:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 15:07:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/bb43452d9a9b052fd0b43509eb87ece3/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/17 15:07:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/bb43452d9a9b052fd0b43509eb87ece3/cities.jar
17/07/17 15:07:07 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/17 15:07:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/17 15:07:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/17 15:07:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/17 15:07:07 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/17 15:07:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/17 15:07:09 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/17 15:07:09 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Mon Jul 17 15:07:15 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 15:07:15 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/17 15:07:16 INFO mapreduce.JobSubmitter: number of splits:1
17/07/17 15:07:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0002
Not sure where I am still going wrong.
Created on 07-17-2017 06:49 PM - edited 07-17-2017 06:50 PM
This is bad - remove that folder from HDFS:
sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root /user/root
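i.e. something along these lines to undo it (sketch, assuming nothing important has been written under that path yet):

sudo -u hdfs hadoop fs -rm -r /user/root   # removes the directory created above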
Could you follow the steps below?
Log in as root.
Step 1: Create a normal user (as root in the terminal):
1. sudo useradd hduser
2. Change the password for hduser using passwd.
Check that the relevant accounts exist by performing id:
id hduser
id hdfs
You should get a result like uid=493(hdfs) gid=489(hdfs) groups=489(hdfs),492(hadoop)
id mapred
If so, then add the new user to the mapred and hdfs groups:
usermod -a -G mapred hduser
usermod -a -G hdfs hduser
Once everything is done, move on to Step 2.
Step 2
Log in to your OS terminal as hduser using:
su - hduser
Step 3
sudo -u hdfs hadoop fs -mkdir /user/hduser
sudo -u hdfs hadoop fs -chown -R hduser /user/hduser
sudo -u hdfs hadoop fs -chmod -R 777 /user/hduser
Note: 777 permissions are bad practice, but since this is a test, let us use them.
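You can double-check the directory and its ownership afterwards with something like:

sudo -u hdfs hadoop fs -ls /user   # /user/hduser should show hduser as owner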
Step 4
sqoop list-databases \
--connect jdbc:mysql://localhost \
--username yourname --password yourpassword
Step 5: Perform the same for the import.
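For example, re-using your earlier command but writing under the new user's HDFS home (credentials and paths are just carried over from your command, adjust as needed):

sqoop import --connect jdbc:mysql://localhost:3306/world --username root -P --table cities --target-dir /user/hduser/world-cities -m 1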
Let me know if that suffices.
Created 07-17-2017 09:45 PM
I changed the terminal user to hduser; however, the MapReduce application is still shown as pending in YARN. Following is the log:
[hduser@pc1 ~]$ id hduser
uid=1002(hduser) gid=1002(hduser) groups=1002(hduser),980(hdfs),979(mapred)
[hduser@pc1 ~]$ sqoop import --connect jdbc:mysql://localhost:3306/world -username root -P --table cities -target-dir /user/cloudera/world-cities -m1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/07/17 23:40:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.1
Enter password:
17/07/17 23:40:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/07/17 23:40:43 INFO tool.CodeGenTool: Beginning code generation
Mon Jul 17 23:40:44 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 23:40:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 23:40:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cities` AS t LIMIT 1
17/07/17 23:40:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-hduser/compile/32dfd91e6debe7d017b22f5df50f2199/cities.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/07/17 23:40:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/32dfd91e6debe7d017b22f5df50f2199/cities.jar
17/07/17 23:40:47 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/07/17 23:40:47 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/07/17 23:40:47 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/07/17 23:40:47 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/07/17 23:40:47 INFO mapreduce.ImportJobBase: Beginning import of cities
17/07/17 23:40:48 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/07/17 23:40:50 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/07/17 23:40:50 INFO client.RMProxy: Connecting to ResourceManager at pc1.localdomain.com/192.168.1.115:8032
Mon Jul 17 23:40:56 CDT 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/07/17 23:40:56 INFO db.DBInputFormat: Using read commited transaction isolation
17/07/17 23:40:57 INFO mapreduce.JobSubmitter: number of splits:1
17/07/17 23:40:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500089331244_0005
17/07/17 23:40:57 INFO impl.YarnClientImpl: Submitted application application_1500089331244_0005
17/07/17 23:40:58 INFO mapreduce.Job: The url to track the job: http://pc1.localdomain.com:8088/proxy/application_1500089331244_0005/
17/07/17 23:40:58 INFO mapreduce.Job: Running job: job_1500089331244_0005
Created on 07-17-2017 09:54 PM - edited 07-17-2017 10:00 PM
The log says to track the job at:
http://pc1.localdomain.com:8088/proxy/application_1500089331244_0005/
What do you see there? In the meantime, check the ResourceManager log and let me know.
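You can also ask the ResourceManager what resources it actually has registered. A quick check (using your host name from the logs; the endpoint is the standard RM REST API, so verify it responds on your CDH version):

curl http://pc1.localdomain.com:8088/ws/v1/cluster/metrics   # look at totalMB / totalVirtualCores
yarn node -list                                              # shows the NodeManager and its resources

If the totals come back as 0, the NodeManager is not offering any memory or vcores, and the application will sit in the ACCEPTED/pending state forever.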
Were you able to perform Step 4?
Also, what is this? Why set it as root?
"In Yarn settings, I have set root, and default min and max cores to be 1 and 4, and min/max memory to be 1 and 4 GB"
Created 07-18-2017 04:49 AM
Yes, I was able to perform step 4.
The application status in YARN:
It is still in the unassigned and pending state.
Hadoop YARN ResourceManager logs:
2017-07-17 23:40:50,845 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 5
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 5 submitted by user hduser
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser IP=192.168.1.115 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1500089331244_0005
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from NEW to NEW_SAVING on event = START
2017-07-17 23:40:57,950 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1500089331244_0005 from user: hduser, in queue: root.users.hduser, currently num of applications: 1
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500089331244_0005 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1500089331244_0005_000001
2017-07-17 23:40:57,951 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0005_000001 State change from NEW to SUBMITTED on event = START
2017-07-17 23:40:57,952 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1500089331244_0005_000001 to scheduler from user: hduser
2017-07-17 23:40:57,952 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500089331244_0005_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
I meant the dynamic resource pool allocation for reserving min and max resources:
However, in YARN, I do not see them being utilized at all!