Created 09-11-2016 08:34 PM
Hello - I'm having trouble using the Sqoop CLI with HCatalog. The error is always:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:519) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:499) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1598) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:499) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:285) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1556) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1553) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1486) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:515) ... 11 more Caused by: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193) ... 13 more
I added the following to the sqoop-env template within Amabari (advice I found in another post) with no luck.
This job works fine when in an Oozie Sqoop action, but failing with the above when executed via CLI:
$ sqoop import --skip-dist-cache --username xx --password-file /dir/xx.dat --connect jdbc:postgresql://server.x.lan/xx --split-by download_id --hcatalog-table project_xx_0 --hcatalog-database default --query "select [lots of stuff with several joins] AND \$CONDITIONS"
So this is a sqoop free-form query import from a Postgresql database/table to an existing hive-hcatalog table.
Any help, tips are greatly appreciated, Thanks.
Created 09-11-2016 09:56 PM
@Brenden Cobb you're missing hcatalog specific libs when you invoke sqoop CLI. If you're saying it works with Oozie, means the libs are being serviced by your sharelib, in CLI mode, you need to provide them. Can you confirm whether HCat client is installed on the node? You may have to search all nodes. Additionally, you're using --skip-dist-cache parameter, thereby forcing local libs over sharelib, you either need hcatalog on the classpath, in your sqoop lib or passed to the CLI command explicitly.
Created 09-11-2016 09:56 PM
@Brenden Cobb you're missing hcatalog specific libs when you invoke sqoop CLI. If you're saying it works with Oozie, means the libs are being serviced by your sharelib, in CLI mode, you need to provide them. Can you confirm whether HCat client is installed on the node? You may have to search all nodes. Additionally, you're using --skip-dist-cache parameter, thereby forcing local libs over sharelib, you either need hcatalog on the classpath, in your sqoop lib or passed to the CLI command explicitly.
Created 09-12-2016 05:09 PM
Ah, thanks. Feeling very silly I left that param in. 🙂