Can't import data via Sqoop cli with HCatalog

Contributor

Hello - I'm having trouble using the Sqoop CLI with HCatalog. The error is always:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:519)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:499)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1598)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:499)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:285)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1556)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1553)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1486)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:515)
        ... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 13 more

I added the following to the sqoop-env template in Ambari (advice I found in another post), with no luck.

This job works fine as an Oozie Sqoop action, but fails with the above error when executed via the CLI:

$ sqoop import --skip-dist-cache --username xx --password-file /dir/xx.dat --connect jdbc:postgresql://server.x.lan/xx --split-by download_id --hcatalog-table project_xx_0 --hcatalog-database default --query "select [lots of stuff with several joins] AND \$CONDITIONS"

So this is a Sqoop free-form query import from a PostgreSQL database to an existing Hive/HCatalog table.

Any help or tips are greatly appreciated. Thanks.

1 ACCEPTED SOLUTION

Mentor

@Brenden Cobb you're missing the HCatalog-specific libraries when you invoke the Sqoop CLI. If it works with Oozie, the libraries are being supplied by your sharelib; in CLI mode you need to provide them yourself. Can you confirm whether the HCatalog client is installed on the node? You may have to check all nodes. Additionally, you're using the --skip-dist-cache parameter, which forces the job to rely on local libraries instead of the sharelib. You need the HCatalog jars on the classpath, in your Sqoop lib directory, or passed to the CLI command explicitly.
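
To check whether the HCatalog client is present on a node, something like the following could help. It is only a sketch: find_hcat_jar is a hypothetical helper name, and it assumes the class from the stack trace ships in a jar named hive-hcatalog-core*.jar, which is the usual packaging but may differ on your distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): print the first
# hive-hcatalog-core jar found under the given directories.
# Returns non-zero if none of the directories contains one.
find_hcat_jar() {
    for dir in "$@"; do
        # Skip empty or missing directories (e.g. an unset HCAT_HOME).
        [ -d "$dir" ] || continue
        jar=$(find "$dir" -name 'hive-hcatalog-core*.jar' 2>/dev/null | head -n 1)
        if [ -n "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}
```

For example, `find_hcat_jar "$HCAT_HOME" /usr/lib/hive-hcatalog` prints the jar path if the client is installed in either location; both locations are assumptions, so substitute wherever your distribution installs HCatalog. If the jar is there, rerunning the original import without --skip-dist-cache should let Sqoop ship its dependencies with the job instead of expecting them to already be on the worker nodes.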

Contributor

Ah, thanks. Feeling very silly I left that param in. 🙂