New Contributor

sqoop action failing with custom jar file

Hello,

 

I have a requirement to mask PII attributes when they are imported from the source database into HDFS/Hive using Sqoop import.

As part of that effort, I generated the Sqoop record class using the "codegen" tool and then customized the generated class with a masking function.
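
For context, the generate/customize/package steps were roughly as follows (a sketch only: the JDBC URL is masked, and the output paths and compile classpath are placeholders, not the exact values used):

sqoop codegen \
  --connect jdbc:xxxxx --username svc_etl --password xxxxxxx \
  --table Account --class-name Account --outdir ./src

# Add the masking logic to the generated ./src/Account.java, then recompile and repackage it.
# /path/to/sqoop.jar is a placeholder for the Sqoop client jar that provides SqoopRecord:
javac -cp "$(hadoop classpath):/path/to/sqoop.jar" -d ./classes ./src/Account.java
jar cf Account.jar -C ./classes .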

From the command line, I was able to import records successfully using the following options "--jar-file Account.jar --class-name Account" along with the other standard arguments of the Sqoop import tool.
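
The working command-line invocation was roughly the following (the same arguments as listed in the "Sqoop command arguments" section further down; connection details masked):

sqoop import \
  --connect jdbc:xxxxx --username svc_etl --password xxxxxxx \
  --table Account \
  --jar-file Account.jar --class-name Account \
  --hive-import --hive-overwrite --hive-table default.account \
  -m 1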

However, when I tried to run the same command from an Oozie Sqoop action, it failed with the error shown below. For some reason, the MR job submission expanded the job.jar file when it bundled the required jars and uploaded them to the distributed cache on HDFS (the staging folder under the user who runs the job), and the MR2 framework subsequently failed to download the job's jar from HDFS onto the local node where the task process was supposed to run. The root cause is that job.jar was expanded into a directory (instead of being kept in JAR format) and was given only read/write and NO execute permission when it was uploaded to HDFS, so the MR2 framework could not download the contents of that directory.

Please let me know if you have any suggestions to fix this issue. Thanks a lot.


Oozie Sqoop action workflow:

 

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
  <credentials>
    <credential name="hcat" type="hcat">
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://xxxxxxxxxxx:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/xxxxxxxxxxxxx@XXXX.ORG</value>
      </property>
    </credential>
  </credentials>
  <start to="sqoop-ab1c"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="sqoop-ab1c" cred="hcat">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect jdbc:xxxxx --username xxxxxxx --password xxxxxxx --table Account --jar-file Account.jar --class-name Account --hive-import --hive-overwrite --hive-table account -m 1</command>
      <file>/user/svc_etl/jars/hive-site.xml#hive-site.xml</file>
      <archive>/user/svc_etl/jars/sqljdbc4.jar#sqljdbc4.jar</archive>
      <archive>/user/svc_etl/jars/Account.jar#Account.jar</archive>
    </sqoop>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>


Sqoop command arguments:
import
--connect
jdbc:xxxxxxxxxxxxxx
--username
svc_etl
--password
xxxxxxxxxxx
--table
Account
--jar-file
Account.jar
--class-name
Account
--hive-import
--hive-overwrite
--hive-table
default.account
-m
1


[hdfs@xxxxxx root]$ hadoop fs -ls /user/svc_etl/.staging/job_1439972400507_0239
Found 5 items
drw-r--r-- - svc_etl supergroup 0 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/job.jar
-rw-r--r-- 3 svc_etl supergroup 94 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/job.split
-rw-r--r-- 2 svc_etl supergroup 13 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/job.splitmetainfo
-rw-r--r-- 2 svc_etl supergroup 276704 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/job.xml
drwx------ - svc_etl supergroup 0 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/libjars


[hdfs@xxxxxxx root]$ hadoop fs -ls -R /user/svc_etl/.staging/job_1439972400507_0239/job.jar
-rw-r--r-- 2 svc_etl supergroup 14280 2015-09-01 18:41 /user/svc_etl/.staging/job_1439972400507_0239/job.jar/Account.class


YARN job error stack:


Application application_1439972400507_0239 failed 2 times due to AM Container for appattempt_1439972400507_0239_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://xxxxxx:8088/proxy/application_1439972400507_0239/Then, click on links to logs of each attempt.
Diagnostics: Permission denied: user=svc_etl, access=READ_EXECUTE, inode="/user/svc_etl/.staging/job_1439972400507_0239/job.jar":svc_etl:supergroup:drw-r--r--
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6506)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5043)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:5004)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:868)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:334)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:613)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

org.apache.hadoop.security.AccessControlException: Permission denied: user=svc_etl, access=READ_EXECUTE, inode="/user/svc_etl/.staging/job_1439972400507_0239/job.jar":svc_etl:supergroup:drw-r--r--
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6506)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5043)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:5004)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:868)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:334)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:613)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1965)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1946)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:354)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:265)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Cloudera Employee

Re: sqoop action failing with custom jar file

In the logs is the line:

org.apache.hadoop.security.AccessControlException: Permission denied: user=svc_etl, access=READ_EXECUTE, inode="/user/svc_etl/.staging/job_1439972400507_0239/job.jar":svc_etl:supergroup:drw-r--r--

The "d" before "rw-r--r--" means that it is a directory, so somehow you have created an HDFS directory called "job.jar" instead of the actual JAR file. Can you try removing that directory, uploading the JAR file to the same path, and then running it again?
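
A possible way to do that check and re-upload, assuming the JAR in question is the one the workflow references at /user/svc_etl/jars/Account.jar (adjust the paths if they differ):

hadoop fs -ls /user/svc_etl/jars                  # Account.jar should show as a file ("-rw-..."), not a directory ("drw-...")
hadoop fs -rm -r /user/svc_etl/jars/Account.jar   # remove it if it turns out to be a directory
hadoop fs -put Account.jar /user/svc_etl/jars/    # upload the actual JAR file from the local filesystem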