Created 08-11-2019 11:18 PM
I want to continuously import data from Postgres to Hive through Sqoop Oozie Coordinator jobs. For an incremental import we have to supply the last value so the import can resume from it, but in this case we don't know the last value because the data is growing continuously. Is there any way to automate this process?
Could anyone please help me resolve this?
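From what I have read, a saved job with --incremental append is supposed to let the Sqoop metastore track the last value itself, something along these lines (the job name, check column and host are placeholders, and I have not tested this):
# Create the saved job once; the id column and connection details are
# placeholders for my table. --last-value 0 is only the starting point;
# the Sqoop metastore updates it after every successful run.
sqoop job --create company_incr -- import \
  --connect jdbc:postgresql://<host>:5432/inet \
  --username postgres -P \
  --table company \
  --incremental append \
  --check-column id \
  --last-value 0 \
  --hive-import --hive-table test \
  -m 1
# Each scheduled run then only needs to execute the saved job:
sqoop job --exec company_incr
Is that the recommended way to avoid passing the last value manually?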
Created 08-12-2019 12:11 AM
Created on 08-12-2019 02:40 AM - edited 08-12-2019 02:43 AM
Hi Eric,
Can we write this Sqoop saved-job creation command in the Oozie workflow.xml file?
I set the Coordinator frequency to every 30 minutes. My requirement is that every 30 minutes the Coordinator job runs and imports the data from Postgres to Hive, and we don't know the last value of the previously inserted records because the data arrives continuously.
If we write the saved job in the XML, will it execute and import the data from Postgres to Hive?
I am running the Coordinator workflow from HUE, not through the command line.
Could you please explain in detail?
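To make the requirement concrete, what I am hoping the action can run every 30 minutes is effectively the command below (myjob is the saved job I would create beforehand; the metastore host is a placeholder and 16000 is, as far as I know, the default metastore port):
# Execute an already-created saved job; Sqoop reads the stored last value
# from its metastore and updates it after the run, so nothing is passed manually.
sqoop job --meta-connect jdbc:hsqldb:hsql://<metastore-host>:16000/sqoop --exec myjob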
Created 08-13-2019 04:30 AM
Hi Eric,
I tried to execute the Sqoop saved-job command through an Oozie Coordinator. I submitted the Coordinator job through HUE, and it throws an error:
$SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration
This is my workflow.xml :
<workflow-app name="sqoop-coordwf" xmlns="uri:oozie:workflow:0.4">
    <start to="sqoop-coordaction"/>
    <action name="sqoop-coordaction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="hdfs://HadoopNode3:8020/user/cloudera/student"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <arg>sqoop job</arg>
            <arg>--create</arg>
            <arg>myjob</arg>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:postgresql://172.19.250.152:5432/inet</arg>
            <arg>--username</arg>
            <arg>postgres</arg>
            <arg>--password</arg>
            <arg>postgres</arg>
            <arg>--table</arg>
            <arg>company</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>test</arg>
            <arg>-m</arg>
            <arg>1</arg>
            <file>/user/oozieschedules/sqoop/coordinator/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Failed, Error Message</message>
    </kill>
    <end name="end"/>
</workflow-app>
Could you please tell me whether there are any other alternatives for doing a Sqoop incremental import?
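Alternatively, would something like the following work, with the Coordinator passing the window start as a parameter? (updated_at is a hypothetical timestamp column on my table and this is only an untested sketch.)
# LAST_TS would be supplied by the coordinator/workflow for each 30-minute run.
sqoop import \
  --connect jdbc:postgresql://172.19.250.152:5432/inet \
  --username postgres -P \
  --table company \
  --where "updated_at >= '${LAST_TS}'" \
  --hive-import --hive-table test \
  -m 1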
Created 08-13-2019 04:27 PM
Created 08-16-2019 03:34 AM
Hi Eric,
When we tried to create a Sqoop saved job with this command:
sqoop job --create myjob -- import --connect jdbc:postgresql://ipaddress:5432/inet --username postgres --P --table company --fields-terminated-by '|' --lines-terminated-by '\n' --hive-import --hive-table sampletest
It throws the following error:
19/08/16 15:52:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.1
19/08/16 15:52:04 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Exception creating SQL connection
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:217)
at org.apache.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:112)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:290)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.sql.SQLException: socket creation error
at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
at org.hsqldb.jdbcDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:177)
... 8 more
Could you please help me resolve this issue?
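Would explicitly pointing the job at a running shared metastore help? From the docs it seems possible, something like the following (the metastore host is a placeholder, 16000 is the default metastore port, and I have not tested this):
# Start the shared metastore on one node:
sqoop metastore
# Create the saved job against that metastore from any host:
sqoop job --meta-connect jdbc:hsqldb:hsql://<metastore-host>:16000/sqoop \
  --create myjob -- import \
  --connect jdbc:postgresql://ipaddress:5432/inet \
  --username postgres -P --table company \
  --hive-import --hive-table sampletest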
Created 08-25-2019 09:05 PM
Hi,
Can you share the output of the below commands from the host on which you are running the Sqoop job?
## ls -ltr /var/lib/sqoop
## ll -aR /home/<username>
This will help us check the permissions on these folders.
Regards
Nitish
Created 08-25-2019 10:10 PM
Hi,
The output of ls -ltr is:
total 600
-rw-r--r-- 1 root root 505 Nov 22 2018 NOTICE.txt
-rw-r--r-- 1 root root 15419 Nov 22 2018 LICENSE.txt
drwxr-xr-x 2 root root 4096 Nov 22 2018 cloudera
lrwxrwxrwx 1 root root 30 Nov 22 2018 sqoop-test.jar -> sqoop-test-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 25 Nov 22 2018 sqoop.jar -> sqoop-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 15 Nov 22 2018 conf -> /etc/sqoop/conf
lrwxrwxrwx 1 root root 41 Nov 22 2018 sqoop-test-1.4.6-cdh5.16.1.jar -> ../../jars/sqoop-test-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 36 Nov 22 2018 sqoop-1.4.6-cdh5.16.1.jar -> ../../jars/sqoop-1.4.6-cdh5.16.1.jar
drwxr-xr-x 2 root root 4096 Nov 22 2018 lib
-rw-r--r-- 1 root root 579785 Jul 19 16:48 postgresql-9.2-1002.jdbc4.jar
drwxr-xr-x 3 root root 4096 Aug 23 17:45 bin
Could you please check and help me resolve the issue?
Thanks,
Akhila.
Created 08-25-2019 10:23 PM
Hi Akhila,
Which command is this the output of?
Can you please run the below commands from the host and share the output?
## ls -ltr /var/lib/sqoop
## ls -ltra /home/<username>
Regards
Nitish
Created 08-25-2019 10:33 PM
Hi,
Sqoop is not there in /var/lib/sqoop.
Sqoop is installed in /opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/lib/sqoop in my environment.
So I ran the command:
ls -ltr /opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/lib/sqoop
output:
total 600
-rw-r--r-- 1 root root 505 Nov 22 2018 NOTICE.txt
-rw-r--r-- 1 root root 15419 Nov 22 2018 LICENSE.txt
drwxr-xr-x 2 root root 4096 Nov 22 2018 cloudera
lrwxrwxrwx 1 root root 30 Nov 22 2018 sqoop-test.jar -> sqoop-test-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 25 Nov 22 2018 sqoop.jar -> sqoop-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 15 Nov 22 2018 conf -> /etc/sqoop/conf
lrwxrwxrwx 1 root root 41 Nov 22 2018 sqoop-test-1.4.6-cdh5.16.1.jar -> ../../jars/sqoop-test-1.4.6-cdh5.16.1.jar
lrwxrwxrwx 1 root root 36 Nov 22 2018 sqoop-1.4.6-cdh5.16.1.jar -> ../../jars/sqoop-1.4.6-cdh5.16.1.jar
drwxr-xr-x 2 root root 4096 Nov 22 2018 lib
-rw-r--r-- 1 root root 579785 Jul 19 16:48 postgresql-9.2-1002.jdbc4.jar
drwxr-xr-x 3 root root 4096 Aug 23 17:45 bin
ls -ltra /home/ubuntu
output:
total 931732
-rw-r--r-- 1 ubuntu ubuntu 675 Nov 22 2018 .profile
-rw-r--r-- 1 ubuntu ubuntu 3637 Nov 22 2018 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 220 Nov 22 2018 .bash_logout
drwx------ 2 ubuntu ubuntu 4096 Nov 22 2018 .cache
-rw------- 1 ubuntu ubuntu 28 Nov 22 2018 .bash_history
drwxr-xr-x 3 root root 4096 Dec 4 2018 elasticsearch-hadoop-6.3.2
drwxr-xr-x 5 root root 4096 Feb 18 2019 venv
-rw-r--r-- 1 root root 470003663 May 10 14:35 signalstrength_20190510.backup
drwxr-xr-x 3 root root 4096 Aug 7 14:37 ..
-rw-r--r-- 1 root root 470994315 Aug 20 15:45 inet.sql
-rw-r--r-- 1 root root 13040148 Aug 21 17:42 npa.sql
drwxr-xr-x 2 root root 4096 Aug 23 12:31 python
-rw-r--r-- 1 root root 323 Aug 23 17:46 script.sh
drwxr-xr-x 6 ubuntu ubuntu 4096 Aug 23 17:46 .
Regards,
Akhila.