Created 01-27-2016 06:16 PM
I'm using the HDP 2.3.2 sandbox. To run Sqoop from Oozie, I needed to:
- set up port forwarding for port 10020 in the sandbox to resolve a "connection refused" error (see the sketch after this list)
- copy the Oracle JDBC driver to the HDFS /user/oozie/share/lib/lib_20151027124452/sqoop directory
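For the port forwarding, assuming the sandbox runs in VirtualBox with NAT networking, the rule can be added roughly like this (run it while the VM is powered off, and substitute your own VM name):
# forward host port 10020 to the sandbox's MapReduce JobHistory server port
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "jobhistory,tcp,,10020,,10020"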
Now I can run the sqoop list-tables and list-databases commands; however, sqoop import fails for both HDFS and Hive targets.
Working command:
list-tables --connect jdbc:oracle:thin:@//11.111.11.111:1521/dev11 --password m11 --username m11
Not working commands:
sqoop import --connect jdbc:oracle:thin:@//11.111.11.111:1521/dev11 --password m11 --username m11 --table A_BASE --verbose --target-dir /user/hue/test1 -m 1
sqoop import --connect jdbc:oracle:thin:@//11.111.11.111:1521/dev11 --password m11 --username m11 --hive-table A_BASE --table A_BASE --verbose --hive-import -m 1
All of these commands work from an SSH command line.
The error message is: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
I wish we could see detailed debug information for this error.
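Maybe the Oozie CLI can show a bit more; assuming the default Oozie server URL on the sandbox, something like this might help (the workflow job ID is a placeholder for the failed job's ID):
# dump the Oozie log for the failed workflow job
oozie job -oozie http://localhost:11000/oozie -log <workflow-job-id>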
Created 01-28-2016 10:48 PM
This is my summary:
1. Download the HDP sandbox.
2. Create a workflow that runs list-databases and run it from Hue. We expect it to fail.
3. To see the error log, open localhost:19888/jobhistory/?user.name=hue, click the job ID link, then click the value 1 link for Maps at the bottom of the table. On the new page, click the logs link. You should see: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver.
4. Open an SSH shell as root and copy the Oracle JDBC jar from your local file system to /home/oozie.
5. run su - oozie
6. run hdfs dfs -put ojdbc7-12.1.0.1.0.jar /user/oozie/share/lib/lib_20151027124452/sqoop
7. Check from Ambari whether the Oracle JDBC jar was copied. Open the HDFS Files view by clicking the second icon from the right on the Ambari page and navigate to /user/oozie/share/lib/lib_20151027124452/sqoop.
8. Restart the sandbox (see the sketch after this list for a possible way to avoid the restart).
9. Run the workflow created at step 2.
10. It should work.
11. Create another workflow that does a Hive import and run it from Hue.
12. You will see the warning message: 2016-01-28 21:10:23,398 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. Intercepting System.exit(1)
13. You need the following two things:
(1) Add oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop. To do this from the Hue Oozie workflow page, click the "Edit properties" link and add this property to the Oozie parameters. oozie.use.system.libpath is already set to true by default, so just add the new property.
(2) Copy hive-site.xml to the workflow's lib directory. To do this from Hue, click File Browser, then click oozie, which takes you to /user/hue/oozie. Click workspaces, then the _hue_xxx directory for the current workflow, create a lib directory there, and copy in a hive-site.xml file containing something like the XML below. (You don't need to change the JDBC connection string to your own; it looks like only the hive.metastore.uris value is needed, so you may be able to delete the first property, though I have not tried that yet.)
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox.hortonworks.com:9083</value>
  </property>
</configuration>
Once this file is created, go back to the Oozie workflow editor, click the Add Path button, select hive-site.xml, and save.
14. Run the workflow. It should run.
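For reference, steps 5 and 6 look roughly like this from an SSH shell; the two oozie admin commands at the end are an assumption I have not verified, but on Oozie 4.x they should refresh the sharelib so that the restart in step 8 may not be needed:
su - oozie
# put the Oracle driver into the Sqoop sharelib directory (path from step 6)
hdfs dfs -put /home/oozie/ojdbc7-12.1.0.1.0.jar /user/oozie/share/lib/lib_20151027124452/sqoop/
# ask Oozie to reload the sharelib and confirm the jar is listed
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop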
Created 01-27-2016 09:07 PM
@Shigeru Takehara please make sure you have the following property set
oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop
Here's a link to my explanation. Please post the errors from the Oozie job run.
Created 01-27-2016 09:59 PM
Where should we set oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop?
I'm using Hue's Oozie web interface. There is an "Edit properties" link where I can set Oozie parameters and Hadoop job properties. Since I don't know which is correct, I set both, but the result is still "failed".
Could you tell me where it should be set? Also, the above link does not point to an available web page.
Created 01-28-2016 01:50 AM
In your job.properties file. I am not intimately familiar with the Hue Oozie editor, so I cannot comment on that for sure.
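For a workflow submitted through the Oozie CLI, a minimal job.properties carrying that property might look like the sketch below; the nameNode, jobTracker, and application path values are placeholders for your own setup:
# cluster endpoints (sandbox defaults; adjust to your environment)
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
# pick up the shared libraries, including hive/hcatalog for the sqoop action
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=hive,hcatalog,sqoop
# HDFS path to the directory containing your workflow.xml (placeholder)
oozie.wf.application.path=${nameNode}/user/hue/oozie/workspaces/your_workflow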
Created 01-28-2016 05:47 PM
I used your sqoop shell example and got the following error:
Stdoutput I AM: yarn
Stdoutput Running On: sandbox.hortonworks.com
Stdoutput CWD: /hadoop/yarn/local/usercache/root/appcache/application_1454000721165_0004/container_e21_1454000721165_0004_01_000002
Stdoutput Can I see these clients? Hive = /usr/bin/hive, Sqoop = /usr/bin/sqoop
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
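To dig further, I might try pulling the aggregated container logs for the application ID shown above (assuming log aggregation is enabled on the sandbox):
# prints stdout/stderr of all containers, including the shell action's output
yarn logs -applicationId application_1454000721165_0004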
Created 01-28-2016 05:57 PM
Additional information:
I added the working sqoop command from above to script.sh. I can see that it runs, but it fails at sqoop import.
Created 01-28-2016 06:50 PM
When I ran sqoop import --connect jdbc:oracle:thin:@//11.111.11.111:1521/dev11 --password m11 --username m11 --table A_BASE --verbose --target-dir /user/root/test1 -m 1,
I got an interesting log:
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=WRITE, inode="/user/yarn/.staging":hdfs:hdfs:drwxr-xr-x
Created 01-28-2016 06:52 PM
@Shigeru Takehara are you running Sqoop as user yarn? If not, then it's an expected situation. In my script I su as yarn to get that working.
Created 01-28-2016 06:53 PM
I ran:
sudo -u hdfs hdfs dfs -mkdir /user/yarn
sudo -u hdfs hdfs dfs -chown yarn:hdfs /user/yarn
I got:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=EXECUTE, inode="/user/yarn/.staging/job_1454000721165_0009/libjars/commons-compress-1.4.1.jar":yarn:hdfs:drwx------
What should we do from here?
Created 01-28-2016 06:54 PM
@Shigeru Takehara do a recursive chown: chown -R
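A minimal sketch of that recursive chown, assuming the /user/yarn directory created above is the one whose ownership needs fixing:
# recursively hand the whole directory tree (including .staging) to the yarn user
sudo -u hdfs hdfs dfs -chown -R yarn:hdfs /user/yarn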