Created on 11-10-2018 10:58 PM - edited 09-16-2022 06:53 AM
Hi,
I want to move RDBMS data into HDFS on a daily basis using Sqoop import.
Is there any job scheduler available in the Hadoop ecosystem that will run a Sqoop import command daily?
If yes, please provide details on how to set the job scheduling frequency and how to check the logs if a job fails.
Regards
Vivek
Created 11-26-2018 09:32 PM
Hi,
There are two options:
1. You can create a shell script (containing the Sqoop commands) and schedule it with cron at whatever frequency you need; a minimal sketch follows the links below.
Link: https://www.taniarascia.com/setting-up-a-basic-cron-job-in-linux/
2. Or you can create an Oozie workflow (which runs the Sqoop action) and schedule it with a coordinator.
Link: https://oozie.apache.org/docs/4.1.0/DG_SqoopActionExtension.html (to create workflow.xml)
Link: https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html (to create the coordinator)
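For option 1, here is a minimal sketch; the connection string, table name, paths, and schedule below are placeholders, not values from your cluster:
#!/bin/bash
# daily_sqoop_import.sh -- import one RDBMS table into HDFS,
# logging output so failures can be reviewed later
LOG=/var/log/sqoop/daily_import_$(date +%F).log

sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/db.password \
  --table orders \
  --target-dir /data/sales/orders/$(date +%F) \
  >> "$LOG" 2>&1

# Crontab entry (added via crontab -e) to run the script every day at 1 AM;
# the log file above is the first place to look when a run fails:
# 0 1 * * * /opt/scripts/daily_sqoop_import.sh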
Regards
Nitish
Created 12-03-2018 12:47 AM
Hi,
I want to import an RDBMS table into HDFS using Sqoop, with an incremental import happening daily.
How will Sqoop read the maximum incremental value from the previous upload?
Regards
Vivek
Created 12-03-2018 01:33 AM
Hi,
You can create saved Sqoop jobs, which store the last imported value in Sqoop's embedded metastore database.
Link: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_saved_jobs
Kindly go through the above link; it explains how to achieve this.
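As a minimal sketch, with the connection string, table, and check column as placeholders for your own values:
# Create a saved job; the free-form arguments after "--" are a normal import
sqoop job --create daily_orders_import -- import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop/db.password \
  --table orders \
  --target-dir /data/sales/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Each execution imports only rows with order_id greater than the stored
# last value, then updates that value in the metastore automatically
sqoop job --exec daily_orders_import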
Regards
Nitish
Created 12-03-2018 02:46 AM
"If a saved job is configured to perform an incremental import, state regarding the most recently imported rows is updated in the saved job to allow the job to continually import only the newest rows."
But how do I configure a saved job?
Is it possible to view this metastore or the last imported values?
Created 12-03-2018 03:30 AM
Hi,
Whenever you create a Sqoop job, it is stored in a local metastore under the $HOME/.sqoop folder.
I would recommend configuring a global metastore so that all the jobs can be accessed from other nodes as well.
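And yes, the saved jobs and their last imported values can be viewed from the command line (daily_orders_import is the placeholder job name from my earlier reply):
# List all jobs saved in the metastore
sqoop job --list

# Show a job's stored parameters, including incremental.last.value
sqoop job --show daily_orders_import

# Against a shared metastore, point the client at it explicitly
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore1.cloudera.com:16000/sqoop --list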
Use the [Best Practices for Sqoop1 Metastore] instructions below to manage the Sqoop1 Metastore
[Best Practices for Sqoop1 Metastore]
1. Identify a cluster host (e.g. metastore1.cloudera.com) upon which to run the Metastore process
2. Use the [Deploy Sqoop1 Client Gateway] instructions to deploy it on the Metastore host
3. Use the [Configure Sqoop1 Metastore Directory] instructions to create a directory for the Metastore database
4. Use the [Set Sqoop1 Safety Valve sqoop-site.xml ( metastore )] instructions to configure the Metastore
5. Use the [Start Sqoop1 Metastore Java Process] instructions to start the Metastore process
6. Use the [Stop Sqoop1 Metastore Java Process] instructions to gracefully stop the Metastore process
~~~~~
[Deploy Sqoop1 Client Gateway]
1. Login to Cloudera Manager
2. Navigate to: (Home > Cluster)
3. Click button to the right of ClusterName
4. Select "Add a Service"
5. Select "Sqoop 1 Client"
6. Deploy the Gateway on each host that will run Sqoop1 CLI commands
7. Complete wizard
Also see the Cloudera documentation on deploying the Sqoop client gateway.
~~~~~
[Configure Sqoop1 Metastore Directory]
1. Execute [commands] to setup the Sqoop1 Metastore directory
[commands]
ssh root@metastore1.cloudera.com
mkdir /var/lib/sqoop-metastore    # the directory can be anywhere
chown sqoop:sqoop /var/lib/sqoop-metastore
~~~~~
[Set Sqoop1 Safety Valve sqoop-site.xml ( metastore )]
1. Login to Cloudera Manager
2. Navigate to: (Home > Cluster > Sqoop1 Client Gateway > Configuration)
3. Search for "Sqoop 1 Client Client Advanced Configuration Snippet (Safety Valve) for sqoop-conf/sqoop-site.xml"
4. Add the [xml properties] below
5. Save Changes
6. Redeploy Sqoop1 Client Gateway
[xml properties]
<!-- START SQOOP1 CLIENT GATEWAY SQOOP-SITE.XML SAFETY VALVE CONFIGURATION -->
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:hsqldb:hsql://metastore1.cloudera.com:16000/sqoop</value>
<description>THIS TELLS SQOOP1 WHERE TO GO TO CONNECT TO SHARED METASTORE</description>
</property>
<property>
<name>sqoop.metastore.server.location</name>
<value>/var/lib/sqoop-metastore/metastore.db</value>
<description>THIS TELLS SQOOP1 THE LINUX PATH AND METASTORE NAME FOR THE DATABASE</description>
</property>
<property>
<name>sqoop.metastore.server.port</name>
<value>16000</value>
<description>THIS IS THE LISTEN PORT FOR HSQLDB</description>
</property>
<property>
<name>sqoop.metastore.client.record.password</name>
<value>false</value>
<description>IF TRUE, SAVING PASSWORDS IN METASTORE IS ALLOWED</description>
</property>
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>true</value>
<description>IF TRUE, SQOOP WILL USE LOCAL METASTORE WHEN NO OTHER METASTORE ARGUMENTS ARE PROVIDED </description>
</property>
<!-- END SQOOP1 CLIENT GATEWAY SQOOP-SITE.XML SAFETY VALVE CONFIGURATION -->
~~~~~
[Start Sqoop1 Metastore Java Process]
1. Execute [commands] to start the Sqoop1 Metastore process
[commands]
ssh root@metastore1.cloudera.com
sudo -u sqoop nohup sqoop-metastore &
ps -aux | egrep -i metastore
Output:
root 3021 0.1 0.1 169388 3024 pts/1 S 01:49 0:00 sudo -u sqoop sqoop-metastore
sqoop 3022 11.8 3.9 1535656 75732 pts/1 Sl 01:49 0:02 /usr/java/jdk1.7.0_67-cloudera/bin/java -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/lib/native:::/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.sqoop.Sqoop metastore
root 3215 0.0 0.0 101012 860 pts/1 S+ 01:49 0:00 egrep -i metastore
~~~~~
[Stop Sqoop1 Metastore Java Process]
1. Execute [commands] to stop the Sqoop1 Metastore
[commands]
ssh root@metastore1.cloudera.com
sudo -u sqoop sqoop-metastore --shutdown
ps -aux | egrep -i metastore
Hope this helps.
Regards
Nitish
Created 12-03-2018 09:33 PM
Really great help.
Thanks a lot.
Regards
Vivek
Created 12-04-2018 01:42 AM
Hi,
You are welcome.
Regards
Nitish
Created 12-04-2018 01:42 AM
Any practical example of how to schedule an Oozie workflow?
Created 12-04-2018 01:48 AM
Hi,
You can create an Oozie workflow (which runs the Sqoop action) and schedule it with a coordinator; a minimal sketch follows the links below.
Link: https://oozie.apache.org/docs/4.1.0/DG_SqoopActionExtension.html (to create workflow.xml)
Link: https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html (to create the coordinator)
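Assuming the workflow.xml has already been written per the Sqoop action docs above, here is a minimal sketch of the daily scheduling part; the hostnames, HDFS paths, and dates are placeholders:
# Write a minimal coordinator.xml that runs the workflow once per day
cat > coordinator.xml <<'EOF'
<coordinator-app name="daily-sqoop-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/vivek/apps/daily-sqoop</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF

# job.properties supplies the variables referenced above
cat > job.properties <<'EOF'
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8032
oozie.coord.application.path=${nameNode}/user/vivek/apps/daily-sqoop
start=2018-12-05T01:00Z
end=2019-12-05T01:00Z
EOF

# Upload the application to HDFS, then submit and start the coordinator;
# it will materialize one workflow run per day between start and end
hdfs dfs -mkdir -p /user/vivek/apps/daily-sqoop
hdfs dfs -put coordinator.xml workflow.xml /user/vivek/apps/daily-sqoop/
oozie job -oozie http://oozieserver.example.com:11000/oozie -config job.properties -run

# Check status and logs if a run fails
oozie job -oozie http://oozieserver.example.com:11000/oozie -info <coordinator-job-id>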
Regards
Nitish