Sqoop job for incremental import execution from Oozie

Expert Contributor

I hear that the Sqoop metastore can take care of incremental imports, so that I do not need to keep track of the last updated id/datetime myself.

I am trying to execute this from an Oozie workflow, but my questions are:

1. What goes into the --last-value parameter of the sqoop command in that case (when I have a Sqoop job and the metastore configured)? Do I even need to pass the parameter at all? (A saved-job sketch follows this list.)

2. Also, can I give multiple import statements in a single Sqoop job?

3. If yes, how?

4. Is it a good idea to execute multiple table imports in parallel? (I would really like to know the pros and cons attached to it.)

5. If I plan to have the table imports run in parallel, do I just fork and execute the jobs in Oozie?
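
Regarding question 1, here is a minimal sketch of a metastore-backed saved job; the job name, metastore host/port, password file, and connection string are assumptions, not taken from this thread. With a saved job, --last-value is not passed at all: Sqoop records the last imported value in the metastore after each run and reuses it on the next execution.

# create a saved job in the shared Sqoop metastore (all names/hosts here are hypothetical)
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --create emp_incremental \
  -- import \
  --connect "jdbc:sqlserver://dbhost:1433;database=US_DB" \
  --username sa --password-file /user/oozie/db.password \
  --table employee --target-dir /data/employee \
  --incremental append --check-column empid
# note: no --last-value; the metastore stores and updates it after every successful run

# each execution picks up where the previous one left off
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --exec emp_incremental

From Oozie, the command of the Sqoop action would then just be the job invocation, e.g. job --exec emp_incremental --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop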


Rising Star

Hi @Benjamin Leonhardi

I am trying to import a table from SQL Server into HBase using Sqoop, keep the HBase table up to date through Sqoop incremental imports on empid, and schedule the Sqoop job with an Oozie workflow so that it runs on a particular time basis.

Example:

SQL TABLE

create table employee(empid int primary key,empname varchar(35),designation varchar(30),salary int);

insert into employee values(300,'Azhar','MD',50000);
insert into employee values(301,'vijay','GM',40000);
insert into employee values(302,'rahul','Asst GM',35000);
insert into employee values(303,'khanna','accountant',25000);
insert into employee values(304,'vikram','sales manager',20000);

IMPORTING DATA INTO HBASE USING SQOOP

sqoop import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username sa --password 12345 --table employee --hbase-table hb_emp --column-family empid --hbase-create-table

INCREMENTAL IMPORT IN SQOOP FOR SQL-HBASE TABLE

sqoop import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp --incremental append --check-column empid --last-value 304

SCHEDULING SQOOP INCREMENTAL JOB USING OOZIE FOR HBASE TABLE

Here is my job.properties and workflow.xml configuration

job.properties

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/user/root/share/lib
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/sqoop/

workflow.xml

<workflow-app name="sqoop-hbase-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="sqoop-import"/>
    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/user/root/hbase-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>sqoop import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp --incremental append --check-column empid --last-value 304</command>
            <file>/user/root/sqljdbc4.jar#sqljdbc4.jar</file>
            <file>/user/root/hbase/hbase-client-1.1.2.2.4.0.0-169.jar#hbase-client-1.1.2.2.4.0.0-169.jar</file>
            <file>/user/root/hbase/hbase-common-1.1.2.2.4.0.0-169.jar#hbase-common-1.1.2.2.4.0.0-169.jar</file>
            <file>/user/root/hbase/hbase-protocol-1.1.2.2.4.0.0-169.jar#hbase-protocol-1.1.2.2.4.0.0-169.jar</file>
            <file>/user/root/hbase/htrace-core-3.1.0-incubating.jar#htrace-core-3.1.0-incubating.jar</file>
            <file>/user/root/hbase/hbase-server-1.1.2.2.4.0.0-169.jar#hbase-server-1.1.2.2.4.0.0-169.jar</file>
            <file>/user/root/hbase/hbase-hadoop-compat-1.1.2.2.4.0.0-169.jar#hbase-hadoop-compat-1.1.2.2.4.0.0-169.jar</file>
            <file>/user/root/hbase/high-scale-lib-1.1.1.jar#high-scale-lib-1.1.1.jar</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
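
For completeness, the workflow is then submitted with the Oozie CLI roughly as follows (the Oozie server URL is an assumption for the HDP 2.4 sandbox):

oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -run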

ERROR

I am getting stuck while running the Oozie workflow: it fails with a sqoop-main exception. Please help me solve this issue.

What are the compatible versions needed to complete this task using HDP 2.4?

ENVIRONMENT

Hortonworks HDP 2.4

Thanks

New Contributor

I am really late with this answer!

But since I recently faced this issue myself, I am going to answer to help somebody else out. The fix is to remove the sqoop keyword from the <command> tag in workflow.xml: the Oozie Sqoop action invokes Sqoop itself, so the command should start directly with the tool name (here, import).

Pass the command this way:

<command>import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp --incremental append --check-column empid --last-value 304</command>
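
As a side note, the same action can also be written with individual <arg> elements instead of one <command> string, which avoids quoting and whitespace surprises. This is only a sketch under the same connection assumptions as above; the interactive -P flag is swapped for a password file (hypothetical path) since the workflow runs non-interactively, and the remaining elements would be copied from the workflow earlier in the thread.

<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>/user/root/hbase-site.xml</job-xml>
    <!-- configuration and <file> entries as in the workflow above -->
    <arg>import</arg>
    <arg>--connect</arg>
    <arg>jdbc:sqlserver://localhost:1433;database=US_DB</arg>
    <arg>--username</arg>
    <arg>root</arg>
    <arg>--password-file</arg>
    <arg>/user/root/db.password</arg> <!-- hypothetical HDFS path -->
    <arg>--table</arg>
    <arg>employee</arg>
    <arg>--hbase-table</arg>
    <arg>hb_emp</arg>
    <arg>--column-family</arg>
    <arg>cfemp</arg>
    <arg>--incremental</arg>
    <arg>append</arg>
    <arg>--check-column</arg>
    <arg>empid</arg>
    <arg>--last-value</arg>
    <arg>304</arg>
</sqoop>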

Explorer

@simran kaur Can you tell how you ran multiple table imports using Oozie/Sqoop, apart from using import-all-tables?
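
In case it helps later readers: the usual pattern for several table imports in parallel is an Oozie fork/join with one Sqoop action per table. A minimal sketch follows; the action names, the second table, the target directories, and the password file are hypothetical, and each action would otherwise be configured like the one earlier in this thread.

<fork name="import-fork">
    <path start="import-employee"/>
    <path start="import-department"/>
</fork>

<action name="import-employee">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username root --password-file /user/root/db.password --table employee --target-dir /data/employee</command>
    </sqoop>
    <ok to="import-join"/>
    <error to="fail"/>
</action>

<action name="import-department">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username root --password-file /user/root/db.password --table department --target-dir /data/department</command>
    </sqoop>
    <ok to="import-join"/>
    <error to="fail"/>
</action>

<join name="import-join" to="end"/>

The trade-off: parallel actions finish sooner, but each one launches its own mappers and opens its own connections against the source database, so the database and the YARN queue both need enough headroom; if either is tight, running the imports sequentially is the safer choice.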