Hi everyone, I'm a newbie in big data.
I want to import a MySQL table into Hive with Oozie and Sqoop.
The first problem I hit was the missing MySQL JDBC driver; I solved it by adding the connector JAR to the Oozie sharelib path.
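In case it matters, I staged the driver roughly like this (just a sketch: the sharelib directory is the lib_20190302033422 one that appears in the log below, and the Oozie URL is the quickstart default, so both may differ on another setup):

# copy the MySQL JDBC driver into the Sqoop sharelib directory on HDFS
hdfs dfs -put mysql-connector-java.jar /user/oozie/share/lib/lib_20190302033422/sqoop/

# make the running Oozie server pick up the new sharelib contents
oozie admin -oozie http://quickstart.cloudera:11000/oozie -sharelibupdate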
The second problem was a Java heap size exception, which I solved by increasing the heap size of the HDFS and YARN services.
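I raised the heap through the service configuration; if it is relevant, I believe the mapper heap for a single Sqoop job can also be overridden with Hadoop -D options, something like this (the memory values below are only example numbers, not what I actually use):

# sketch: per-job memory override passed as generic Hadoop options to Sqoop
sqoop import \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.map.java.opts=-Xmx1638m \
  --connect jdbc:mysql://1.1.1.1/somedb --table some \
  --username user --password pas \
  --as-parquetfile --warehouse-dir=/user/hive/warehouse -m 1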
Now the workflow does not work under Oozie (no exception, the launcher just prints "Heart beat" forever), but if I run the same Sqoop command from the shell it works.
Environment: CDH 5.13, Java 8.
My workflow:
<workflow-app name="oozie-sqoop-test" xmlns="uri:oozie:workflow:0.5">
    <start to="oozie-sqoop-test-job"/>
    <action name="oozie-sqoop-test-job">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>quickstart.cloudera:8032</job-tracker>
            <name-node>hdfs://localhost:8020</name-node>
            <prepare>
                <delete path="hdfs://localhost:8020/user/hive/warehouse/some"/>
            </prepare>
            <command>import --connect jdbc:mysql://1.1.1.1/somebd --table some --username user --password pas --as-parquetfile --warehouse-dir=/user/hive/warehouse -m 1</command>
            <file>hdfs://localhost:8020/user/test/oozie/sqoop/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
and the job configuration:
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs://quickstart.cloudera:8020/user/test/oozie/sqoop/oozie-sqoop-test18.xml
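Those two properties live in a job properties file that gets passed to the Oozie CLI at submission time, e.g. (the Oozie URL is the quickstart default and the file name is just what I call it locally):

# sketch: submit the workflow with the properties above
oozie job -oozie http://quickstart.cloudera:11000/oozie -config job.properties -run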
and the launcher log:
Sqoop command arguments : import --connect jdbc:mysql://1.1.1.1/somedb --table some --username user --password ******** --as-parquetfile --warehouse-dir=/user/hive/warehouse -m 1
Fetching child yarn jobs tag id : oozie-a23ce0ea22c8363d6679e0331a411931
2019-03-04 07:03:14,478 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
Child yarn jobs are found - application_1551703829290_0012
Found [1] Map-Reduce jobs from this launcher
Killing existing jobs and starting over:
2019-03-04 07:03:14,651 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
Killing job [application_1551703829290_0012] ...
2019-03-04 07:03:14,657 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1551703829290_0012
Done
=================================================================
>>> Invoking Sqoop command line now >>>
2019-03-04 07:03:14,699 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2019-03-04 07:03:14,745 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.13.0
2019-03-04 07:03:14,755 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2019-03-04 07:03:14,763 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2019-03-04 07:03:14,818 [main] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
2019-03-04 07:03:14,818 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
2019-03-04 07:03:14,819 [main] INFO org.apache.sqoop.tool.CodeGenTool - Will generate java class as codegen_some
2019-03-04 07:03:15,176 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:15,210 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:15,226 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hadoop-mapreduce
2019-03-04 07:03:16,994 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/aee94c462138557618da1682b4bac3a2/codegen_some.jar
2019-03-04 07:03:17,005 [main] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
2019-03-04 07:03:17,006 [main] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
2019-03-04 07:03:17,006 [main] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
2019-03-04 07:03:17,006 [main] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
2019-03-04 07:03:17,022 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of some
2019-03-04 07:03:17,023 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-03-04 07:03:17,040 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-03-04 07:03:17,109 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:17,121 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `some` AS t LIMIT 1
2019-03-04 07:03:17,950 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2019-03-04 07:03:17,954 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
2019-03-04 07:03:18,028 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
2019-03-04 07:03:18,536 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
2019-03-04 07:03:18,631 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-03-04 07:03:18,692 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1551703829290_0013
2019-03-04 07:03:18,692 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: mapreduce.job, Service: job_1551703829290_0011, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@2ee83775)
2019-03-04 07:03:18,693 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1551711543861, maxDate=1552316343861, sequenceNumber=29, masterKeyId=2)
2019-03-04 07:03:19,055 [main] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20190302033422/sqoop/mysql-connector-java.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/cloudera/.staging/job_1551703829290_0013/libjars/mysql-connector-java.jar This will be an error in Hadoop 2.0
2019-03-04 07:03:19,056 [main] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20190302033422/sqoop/sqoop.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/cloudera/.staging/job_1551703829290_0013/libjars/sqoop.jar This will be an error in Hadoop 2.0
2019-03-04 07:03:19,093 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1551703829290_0013
2019-03-04 07:03:19,129 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1551703829290_0013/
2019-03-04 07:03:19,129 [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1551703829290_0013/
2019-03-04 07:03:19,130 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1551703829290_0013
2019-03-04 07:03:19,130 [main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1551703829290_0013
2019-03-04 07:03:25,392 [main] INFO org.apache.hadoop.mapreduce.Job - Job job_1551703829290_0013 running in uber mode : false
2019-03-04 07:03:25,392 [main] INFO org.apache.hadoop.mapreduce.Job - Job job_1551703829290_0013 running in uber mode : false
2019-03-04 07:03:25,394 [main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
2019-03-04 07:03:25,394 [main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
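If it helps, I can also attach the logs of the child application that keeps heartbeating, pulled with something like:

# application id taken from the launcher output above
yarn logs -applicationId application_1551703829290_0013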
Please help, what am I doing wrong?