Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Executing Sqoop Job works fine in Edge node but fails through Oozie

Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

Error Message: ERROR org.apache.sqoop.tool.ImportTool - Import failed: Missing Hive MetaStore connection URI

 

Creatg Job command used:
sqoop job --create TestEmployeeJob123 -- import --connect "jdbc:oracle:thin:@a0-10101423.sdi.corp.company.com:49125:Database" --table EMPLOYEE --username Name --password-file /user/user1/pwd --hive-import --hive-database mosaic --hive-table employee123 --incremental append --check-column ID --last-value 0 -m 1 --as-parquetfile

 

Execute Job command used:
sqoop job --exec TestEmployeeJob123

 

Execute Job in Edge node - Successfull

Execute Job through Oozie - Successfull, if there are no new records; Failure, if there are new records to update the Hive database


In oozie job (both sqoop create & sqoop exec), apart from create & execute statements, I've below settings, pls let me know, should i configure anything else.

 

1. Sqoop action- configuredpwd file path

2. Sqoop action - settings gear credentials checked HCAT & Hive2

3. Workflow setting - Configured Variable (oozie.libpath - /user/oozie/libext)


My guess I miss some sqoop/workflow setting in Oozie, appreciate any help.

 

Version info: CDH 5.14.2 & Sqoop 1.4.6

 

Thanks!

7 REPLIES 7

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Master Guru
For sqoop actions where Hive interaction is required, pass a gateway/edge
copy of hive-site.xml into the workflow by placing it in its lib/directory,
or by inlining the configuration contents under the action configuration.

If it's a shell action instead, ensure all NodeManager hosts carry both a
Sqoop Client and Hive gateway roles.

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

@Harsh J

Copied Hive-site.xml from one of the gateway and made it available for sqoop job. Getting below error now...

Appreciate your help

 

2018-08-13 06:26:53,397 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of EMPLOYEE
2018-08-13 06:26:53,399 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2018-08-13 06:26:53,409 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2018-08-13 06:26:53,434 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
2018-08-13 06:26:53,516 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
2018-08-13 06:26:53,516 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM EMPLOYEE t WHERE 1=0
2018-08-13 06:26:53,539 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM EMPLOYEE t WHERE 1=0
2018-08-13 06:26:54,253 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://server123.abcd.amrs.company.com:9083
2018-08-13 06:26:54,297 [main] INFO hive.metastore - Opened a connection to metastore, current connections: 1
2018-08-13 06:26:54,297 [main] INFO hive.metastore - Connected to metastore.
2018-08-13 06:26:54,477 [main] INFO hive.metastore - Closed a connection to metastore, current connections: 0
2018-08-13 06:26:54,477 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://server123.abcd.amrs.company.com:9083
2018-08-13 06:26:54,481 [main] INFO hive.metastore - Opened a connection to metastore, current connections: 1
2018-08-13 06:26:54,481 [main] INFO hive.metastore - Connected to metastore.
2018-08-13 06:26:54,803 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://server123.abcd.amrs.company.com:9083
2018-08-13 06:26:54,806 [main] INFO hive.metastore - Opened a connection to metastore, current connections: 2
2018-08-13 06:26:54,806 [main] INFO hive.metastore - Connected to metastore.
2018-08-13 06:26:54,844 [main] ERROR org.apache.sqoop.mapreduce.ParquetJob - Couldn't fetch delegation token.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.mapreduce.ParquetJob.addHiveDelegationToken(ParquetJob.java:196)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:113)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:267)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:187)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:170)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: MetaException(message:Delegation Token can be issued only with kerberos authentication. Current AuthenticationMethod: TOKEN)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_delegation_token(ThriftHiveMetastore.java:3886)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_delegation_token(ThriftHiveMetastore.java:3872)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1899)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1889)
... 37 more
2018-08-13 06:26:54,945 [main] ERROR org.apache.sqoop.Sqoop - Got exception running Sqoop: java.lang.RuntimeException: Couldn't fetch delegation token.

<<< Invocation of Sqoop command completed <<<

No child hadoop job is executed.
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://nameservice1/user/user123/oozie-oozi/0020326-180726130552582-oozie-oozi-W/sqoop-c7a5--sqoop/action-data.seq
2018-08-13 06:26:55,012 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
2018-08-13 06:26:55,013 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Successfully reset security manager from org.apache.oozie.action.hadoop.LauncherSecurityManager@2f677247 to null

Oozie Launcher ends

2018-08-13 06:26:55,064 [main] INFO org.apache.hadoop.mapred.Task - Task:attempt_1532704598265_58608_m_000000_0 is done. And is in the process of committing
2018-08-13 06:26:55,064 [main] INFO org.apache.hadoop.mapred.Task - Task:attempt_1532704598265_58608_m_000000_0 is done. And is in the process of committing
2018-08-13 06:26:55,275 [main] INFO org.apache.hadoop.mapred.Task - Task 'attempt_1532704598265_58608_m_000000_0' done.
2018-08-13 06:26:55,275 [main] INFO org.apache.hadoop.mapred.Task - Task 'attempt_1532704598265_58608_m_000000_0' done.
2018-08-13 06:26:55,376 [main] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl - MapTask metrics system shutdown complete.

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

Would appreciate some help on this

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

Worth to look at:

 

https://qna.dwh4u.com/index.php?qa=11682&qa_1=couldnt-fetch-delegation-sqoop-import-action-oozie-wor...

 

You might hit this BUG SQOOP-3177

 

If that is your case, it looks you can workaround this by using --target_dir to load the data in HDFS and then load into Hive table from HDFS.

 

Regards,

Nic

Highlighted

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

@nicusan 

 

BUG SQOOP-3177 is reported for version 1.4.5. I'm using sqoop 1.4.6, still applicable?

 

Two step approach is tedious to accommodate data type compatibility and incremental logic.

1. loading to HDFS 

2. then loading into hive 

 

Is there any other simple work around?

 

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Explorer

@Harsh J

Can you pls confirm whether this is bug?

Re: Executing Sqoop Job works fine in Edge node but fails through Oozie

Master Guru
Yes, @nicusan is correct on your subsequent problem here. The ParquetJob class Sqoop uses appears to redundantly fetch a delegation token even if it already has one, which will require a TGT to execute and thus doesn't work with a Sqoop Action in Oozie. SQOOP-3177 tracks this issue.

While it is inconvenient, you can consider using a shell action with a keytab passed into it, do a login via kinit in the script and then run a Sqoop CLI invocation, which could help work around this.

P.s. You may need to 'unset HADOOP_TOKEN_FILE_LOCATION' before doing the kinit inside the script.