<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: sqoop job for incremental import execution from oozie in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137156#M27657</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/6243/simrankauradept.html" nodeid="6243"&gt;@simran kaur&lt;/A&gt; Can you tell how did you ran multiple table imports using oozie/Sqoop ? apart from using import-all tables too.&lt;/P&gt;</description>
    <pubDate>Wed, 19 Jul 2017 06:37:57 GMT</pubDate>
    <dc:creator>gandra</dc:creator>
    <dc:date>2017-07-19T06:37:57Z</dc:date>
    <item>
      <title>sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137153#M27654</link>
      <description>&lt;P&gt;I hear that the Sqoop metastore can take care of incremental imports, so that I do not need to keep track of the last updated id/datetime myself.&lt;/P&gt;&lt;P&gt;I am trying to execute this from an Oozie workflow, and my questions are:&lt;/P&gt;&lt;P&gt;1. What goes into the --last-value parameter of the Sqoop command in that case (when I have a Sqoop job and metastore configured)? Do I even need to pass the parameter?&lt;/P&gt;&lt;P&gt;2. Also, can I give multiple import statements in a single Sqoop job?&lt;/P&gt;&lt;P&gt;3. If yes, how?&lt;/P&gt;&lt;P&gt;4. Is it a good idea to execute multiple table imports in parallel? (I would really like to know the pros and cons.)&lt;/P&gt;&lt;P&gt;5. If I plan to run table imports in parallel, do I just fork and execute jobs in Oozie?&lt;/P&gt;</description>
      <pubDate>Sun, 08 May 2016 23:28:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137153#M27654</guid>
      <dc:creator>sim6</dc:creator>
      <dc:date>2016-05-08T23:28:31Z</dc:date>
    </item>
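    <!-- On question 1, a minimal sketch of how a saved Sqoop job with a shared metastore handles the incremental state itself. The metastore host, port, job name, and credentials below are placeholders, not values from the thread:

```shell
# Create a saved job in a shared (HSQLDB-backed) Sqoop metastore.
# --last-value is only the starting point: after each successful run,
# Sqoop updates the stored value itself, so it is never passed again.
sqoop job \
  --meta-connect "jdbc:hsqldb:hsql://metastore-host:16000/sqoop" \
  --create employee_incr \
  -- import \
  --connect "jdbc:mysql://dbhost/US_DB" \
  --username root -P \
  --table employee \
  --incremental append \
  --check-column empid \
  --last-value 0

# Each scheduled run (e.g. from an Oozie sqoop action) just executes the job:
sqoop job \
  --meta-connect "jdbc:hsqldb:hsql://metastore-host:16000/sqoop" \
  --exec employee_incr
```

    With this setup the workflow never hard-codes a last-value; the metastore is the single source of truth for the incremental checkpoint. -->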
    <item>
      <title>Re: sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137154#M27655</link>
      <description>&lt;P&gt;Before answering the questions:&lt;/P&gt;&lt;P&gt;I prefer to do it a different way: with incremental loads you have a hard time rolling back imports that may or may not have failed. It is easier to associate each import with a partition in Hive and simply delete that partition in case of failure.&lt;/P&gt;&lt;P&gt;I.e. if you want to load data hourly, create an hourly partitioned table, run an hourly Oozie coordinator job, and use coord:formatTime to provide min/max parameters for that hour. This way you can just re-run the Oozie instance after any failure and everything will be consistent. If you do incremental loads in the middle of a time period, you don't have much control over the data entering your tables, and if you re-run a job you get duplicate data.&lt;/P&gt;&lt;P&gt;Apart from that:&lt;/P&gt;&lt;P&gt;1) If you want a central metastore for Sqoop jobs that run in Oozie, you need to set up the metastore and then point to it with the --meta-connect parameter.&lt;/P&gt;&lt;P&gt;This JIRA is helpful:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SQOOP-453" target="_blank"&gt;https://issues.apache.org/jira/browse/SQOOP-453&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) You can use import-all-tables.&lt;/P&gt;&lt;P&gt;4) It depends. If it's the same database, I would say no, since the bottleneck will most likely be the network or the database returning data. In that case it is better to increase the number of mappers and run imports one by one. For small tables, or ones you cannot partition, loading in parallel might be good. However, you will have some overhead in the cluster, since each parallel Oozie job holds three mostly idle containers (Oozie launcher AM, Oozie launcher map, Sqoop AM); this can add up on small clusters.&lt;/P&gt;&lt;P&gt;5) Yes.&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 01:22:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137154#M27655</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-05-09T01:22:34Z</dc:date>
    </item>
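    <!-- To make the first suggestion above concrete, a sketch of an hourly coordinator that passes the hour bounds into the workflow. The app name, dates, and property names are illustrative placeholders, not from the thread:

```xml
<coordinator-app name="hourly-import-coord" frequency="${coord:hours(1)}"
                 start="2016-05-09T00:00Z" end="2017-05-09T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/${user.name}/apps/sqoop</app-path>
      <configuration>
        <!-- Lower/upper bounds for this hour; the import can use them in a
             WHERE clause so a re-run reloads exactly one Hive partition. -->
        <property>
          <name>minBound</name>
          <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd HH')}</value>
        </property>
        <property>
          <name>maxBound</name>
          <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), 1, 'HOUR'), 'yyyy-MM-dd HH')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

    Because each run maps to exactly one partition, recovering from a failed hour is just a coordinator action re-run plus (if needed) a partition drop, with no duplicate rows. -->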
    <item>
      <title>Re: sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137155#M27656</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I am trying to import a table from SQL Server into HBase using Sqoop, keep the HBase table updated via Sqoop incremental imports on empid, and schedule the Sqoop job with an Oozie workflow so that it runs on a fixed schedule.&lt;/P&gt;&lt;P&gt;E.g.:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;SQL TABLE&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;create table employee(empid int primary key,empname varchar(35),designation varchar(30),salary int);&lt;/P&gt;&lt;P&gt;insert into employee values(300,'Azhar','MD',50000);
insert into employee values(301,'vijay','GM',40000);
insert into employee values(302,'rahul','Asst GM',35000);
insert into employee values(303,'khanna','accountant',25000);
insert into employee values(304,'vikram','sales manager',20000);&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;IMPORTING DATA INTO HBASE USING SQOOP&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;sqoop import --connect "jdbc:sqlserver://localhost:1433;database=US_DB" --username sa --password 12345 --table employee --hbase-table hb_emp --column-family empid --hbase-create-table&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;INCREMENTAL IMPORT IN SQOOP FOR SQL-HBASE TABLE&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;sqoop import --connect "jdbc:mysql://localhost;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp --incremental append --check-column empid --last-value 304&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;SCHEDULING SQOOP INCREMENTAL JOB USING OOZIE FOR HBASE TABLE&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Here is my job.properties and workflow.xml configuration&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;job.properties&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# &lt;A href="http://www.apache.org/licenses/LICENSE-2.0"&gt;http://www.apache.org/licenses/LICENSE-2.0&lt;/A&gt;
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/user/root/share/lib
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/sqoop/

&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;workflow.xml&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;workflow-app name="sqoop-hbase-wf" xmlns="uri:oozie:workflow:0.2"&amp;gt;   
  &amp;lt;start to="sqoop-import"/&amp;gt;   
  &amp;lt;action name="sqoop-import"&amp;gt;   
  &amp;lt;sqoop xmlns="uri:oozie:sqoop-action:0.2"&amp;gt;   
  &amp;lt;job-tracker&amp;gt;${jobTracker}&amp;lt;/job-tracker&amp;gt;   
  &amp;lt;name-node&amp;gt;${nameNode}&amp;lt;/name-node&amp;gt;   
  &amp;lt;job-xml&amp;gt;/user/root/hbase-site.xml&amp;lt;/job-xml&amp;gt;   
   
  &amp;lt;configuration&amp;gt;   
  &amp;lt;property&amp;gt;   
  &amp;lt;name&amp;gt;mapred.job.queue.name&amp;lt;/name&amp;gt;   
  &amp;lt;value&amp;gt;${queueName}&amp;lt;/value&amp;gt;   
  &amp;lt;/property&amp;gt;   
  &amp;lt;/configuration&amp;gt;   
   
  &amp;lt;command&amp;gt;sqoop import --connect "jdbc:mysql://localhost;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp
      --incremental append --check-column empid --last-value 304&amp;lt;/command&amp;gt;   
   
          &amp;lt;file&amp;gt;/user/root/sqljdbc4.jar#sqljdbc4.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/hbase-client-1.1.2.2.4.0.0-169.jar#hbase-client-1.1.2.2.4.0.0-169.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/hbase-common-1.1.2.2.4.0.0-169.jar#hbase-common-1.1.2.2.4.0.0-169.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/hbase-protocol-1.1.2.2.4.0.0-169.jar#hbase-protocol-1.1.2.2.4.0.0-169.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/htrace-core-3.1.0-incubating.jar#htrace-core-3.1.0-incubating.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/hbase-server-1.1.2.2.4.0.0-169.jar#hbase-server-1.1.2.2.4.0.0-169.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/hbase-hadoop-compat-1.1.2.2.4.0.0-169.jar#hbase-hadoop-compat-1.1.2.2.4.0.0-169.jar&amp;lt;/file&amp;gt;   
          &amp;lt;file&amp;gt;/user/root/hbase/high-scale-lib-1.1.1.jar#high-scale-lib-1.1.1.jar&amp;lt;/file&amp;gt;   

  &amp;lt;/sqoop&amp;gt;   
  &amp;lt;ok to="end"/&amp;gt;   
  &amp;lt;error to="fail"/&amp;gt;   
  &amp;lt;/action&amp;gt;   
   
  &amp;lt;kill name="fail"&amp;gt;   
  &amp;lt;message&amp;gt;Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]&amp;lt;/message&amp;gt;   
  &amp;lt;/kill&amp;gt;   
  &amp;lt;end name="end"/&amp;gt;   
&amp;lt;/workflow-app&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ERROR&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Getting stuck while running Oozie with a SqoopMain exception. Please help me solve this issue.&lt;/P&gt;&lt;P&gt;What are the compatible versions to complete this task using HDP 2.4?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ENVIRONMENT&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Hortonworks HDP 2.4&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2016 14:05:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137155#M27656</guid>
      <dc:creator>omkarmotoe</dc:creator>
      <dc:date>2016-05-31T14:05:37Z</dc:date>
    </item>
    <item>
      <title>Re: sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137156#M27657</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/6243/simrankauradept.html" nodeid="6243"&gt;@simran kaur&lt;/A&gt; Can you tell how did you ran multiple table imports using oozie/Sqoop ? apart from using import-all tables too.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2017 06:37:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137156#M27657</guid>
      <dc:creator>gandra</dc:creator>
      <dc:date>2017-07-19T06:37:57Z</dc:date>
    </item>
    <item>
      <title>Re: sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137157#M27658</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17809/gandra.html" nodeid="17809"&gt;@gvenkatesh&lt;/A&gt;: Here &lt;A href="http://www.yourtechchick.com/hadoop/hive/step-step-guide-sqoop-incremental-imports/"&gt;http://www.yourtechchick.com/hadoop/hive/step-step-guide-sqoop-incremental-imports/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.yourtechchick.com/sqoop/run-sqoop-jobs-from-oozie/"&gt;http://www.yourtechchick.com/sqoop/run-sqoop-jobs-from-oozie/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2017 12:12:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137157#M27658</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2017-07-19T12:12:41Z</dc:date>
    </item>
    <item>
      <title>Re: sqoop job for incremental import execution from oozie</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137158#M27659</link>
      <description>&lt;P&gt;I am really late on this answer!&lt;/P&gt;&lt;P&gt;But since I recently faced this issue myself, I am going to answer to help somebody else out. The solution is to remove the sqoop keyword when passing the command tag in workflow.xml: the Oozie Sqoop action supplies it, so the command should start with the tool name (import).&lt;/P&gt;&lt;P&gt;Pass the command this way:&lt;/P&gt;&lt;P&gt;&amp;lt;command&amp;gt;import --connect "jdbc:mysql://localhost;database=US_DB" --username root -P --table employee --hbase-table hb_emp --column-family cfemp --incremental append --check-column empid --last-value 304&amp;lt;/command&amp;gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Feb 2018 00:23:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/sqoop-job-for-incremental-import-execution-from-oozie/m-p/137158#M27659</guid>
      <dc:creator>1013vishalsharm</dc:creator>
      <dc:date>2018-02-17T00:23:25Z</dc:date>
    </item>
  </channel>
</rss>

