
oozie sqoop action heart beat


Hi everyone, I'm a newbie in big data.
I want to import a table into Hive with Oozie and Sqoop.
At first I had a problem with the MySQL driver, which I solved by adding the jar to the Oozie lib path.
The second problem was a Java heap size exception, which I fixed by increasing the heap size for HDFS and YARN.
Now my script doesn't work with Oozie (no exception, just "Heart beat" messages), but when I run Sqoop from the shell it works.

CDH 5.13
Java 8


My workflow definition:

<workflow-app name="oozie-sqoop-test" xmlns="uri:oozie:workflow:0.5">
    <start to="oozie-sqoop-test-job"/>
    <action name="oozie-sqoop-test-job">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>quickstart.cloudera:8032</job-tracker>
            <name-node>hdfs://localhost:8020</name-node>
            <prepare>
                <delete path="hdfs://localhost:8020/user/hive/warehouse/some"/>
            </prepare>
            <command>import --connect jdbc:mysql://1.1.1.1/somedb --table some --username user --password pas --as-parquetfile --warehouse-dir=/user/hive/warehouse -m 1</command>
            <file>hdfs://localhost:8020/user/test/oozie/sqoop/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>


And my job configuration:

oozie.use.system.libpath=true
oozie.wf.application.path=hdfs://quickstart.cloudera:8020/user/test/oozie/sqoop/oozie-sqoop-test18.xml
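For completeness, a workflow like this is normally submitted with the Oozie CLI. A sketch follows; the Oozie server URL and port 11000 are the QuickStart VM defaults and are my assumption, not taken from the post:

```shell
# Submit the workflow using the properties file shown above.
# Port 11000 is the default Oozie server port on the QuickStart VM - adjust as needed.
oozie job -oozie http://quickstart.cloudera:11000/oozie -config job.properties -run

# Check workflow status afterwards; -run prints the workflow job ID to use here.
oozie job -oozie http://quickstart.cloudera:11000/oozie -info <workflow-job-id>
```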


And the launcher log:

Sqoop command arguments :
import
--connect
jdbc:mysql://1.1.1.1/somedb
--table
some
--username
user
--password
********
--as-parquetfile
--warehouse-dir=/user/hive/warehouse
-m
1
Fetching child yarn jobs
tag id : oozie-a23ce0ea22c8363d6679e0331a411931
2019-03-04 07:03:14,478 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
Child yarn jobs are found - application_1551703829290_0012
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
2019-03-04 07:03:14,651 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
Killing job [application_1551703829290_0012] ... 2019-03-04 07:03:14,657 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1551703829290_0012
Done
=================================================================
>>> Invoking Sqoop command line now >>>
2019-03-04 07:03:14,699 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2019-03-04 07:03:14,745 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.13.0
2019-03-04 07:03:14,755 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2019-03-04 07:03:14,763 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2019-03-04 07:03:14,818 [main] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
2019-03-04 07:03:14,818 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
2019-03-04 07:03:14,819 [main] INFO org.apache.sqoop.tool.CodeGenTool - Will generate java class as codegen_some
2019-03-04 07:03:15,176 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:15,210 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:15,226 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce
2019-03-04 07:03:16,994 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/aee94c462138557618da1682b4bac3a2/codegen_some.jar
2019-03-04 07:03:17,005 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
2019-03-04 07:03:17,006 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
2019-03-04 07:03:17,006 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
2019-03-04 07:03:17,006 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
2019-03-04 07:03:17,022 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of some
2019-03-04 07:03:17,023 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-03-04 07:03:17,040 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-03-04 07:03:17,109 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:17,121 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:17,950 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2019-03-04 07:03:17,954 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
2019-03-04 07:03:18,028 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
2019-03-04 07:03:18,536 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
2019-03-04 07:03:18,631 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-03-04 07:03:18,692 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1551703829290_0013
2019-03-04 07:03:18,692 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: mapreduce.job, Service: job_1551703829290_0011, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@2ee83775)
2019-03-04 07:03:18,693 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1551711543861, maxDate=1552316343861, sequenceNumber=29, masterKeyId=2)
2019-03-04 07:03:19,055 [main] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20190302033422/sqoop/mysql-connector-java.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/cloudera/.staging/job_1551703829290_0013/libjars/mysql-connector-java.jar This will be an error in Hadoop 2.0
2019-03-04 07:03:19,056 [main] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20190302033422/sqoop/sqoop.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/cloudera/.staging/job_1551703829290_0013/libjars/sqoop.jar This will be an error in Hadoop 2.0
2019-03-04 07:03:19,093 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1551703829290_0013
2019-03-04 07:03:19,129 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1551703829290_0013/
2019-03-04 07:03:19,130 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1551703829290_0013
2019-03-04 07:03:25,392 [main] INFO org.apache.hadoop.mapreduce.Job - Job job_1551703829290_0013 running in uber mode : false
2019-03-04 07:03:25,394 [main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat



Please help, what am I doing wrong?


Re: oozie sqoop action heart beat

The heart beat messages just signify that the action is waiting for something inside it to complete.

In your log's case, Sqoop is awaiting completion of the job it was able to launch: job_1551703829290_0013.

Please check the status and errors of job job_1551703829290_0013 to see why it is taking so long.
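For example, assuming you have the YARN CLI available on a gateway host, you can inspect the hung job roughly like this (the application ID is taken from the log above):

```shell
# Overall state and diagnostics of the job the Sqoop action launched
yarn application -status application_1551703829290_0013

# Aggregated container logs, once log aggregation has collected them
yarn logs -applicationId application_1551703829290_0013
```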

If this is a small cluster, there's also a good chance that the resources configured for your NodeManagers (memory/CPU) are inadequate to run two or more parallel jobs. An Oozie action is one job, but it submits another and waits for the submitted one to complete, so each action is roughly two concurrent job executions. This can be fixed by adding more NodeManager hosts, raising the resources on existing NodeManager hosts, or configuring the jobs' resource demands to be lower than their current values.
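As a sketch, on a single-node QuickStart-style setup the relevant knobs live in yarn-site.xml and mapred-site.xml. The values below are illustrative assumptions, not recommendations for your hardware; size them so at least two containers (the Oozie launcher plus the Sqoop MapReduce job) can run concurrently:

```xml
<!-- yarn-site.xml: total resources a single NodeManager may hand out.
     Example values only - must fit your host's physical memory/CPU. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

<!-- mapred-site.xml: shrink per-task demand so more containers fit. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
```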