
Sqoop import with --hcatalog options with control over destination dir


Explorer

Is there a way to use the --hcatalog options with sqoop import while keeping control over the destination dir, like --warehouse-dir or --target-dir? These options appear to be incompatible, but perhaps there's a trick.

I'd like to import and create table definitions for Hive in a single go, storing as Avro or ORC, while controlling the destination dir, so I'm not forced into Hive's default/managed directory. There are various reasons for not placing data in the default warehouse dir: one is that I don't currently have permissions to do so (hence the error from sqoop import with the --hcatalog options), another is that we have a directory placement strategy that doesn't use the default dir.
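Roughly what I'd like to run, with connection details, paths, and names as placeholders (Sqoop currently rejects combining the two sets of options):

sqoop import --connect "jdbc:..." --username xxx --password-file file:///home/xxx/.pw --table src_table -m 1 \
  --create-hcatalog-table --hcatalog-database sandbox --hcatalog-table my_table \
  --hcatalog-storage-stanza "stored as avro" \
  --warehouse-dir /data/custom/location    # <-- not accepted together with the --hcatalog-* options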

11 REPLIES

Re: Sqoop import with --hcatalog options with control over destination dir

Hello @Henrik Olsen!
Yep, according to the documentation, these "dir" options aren't supported :(
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_unsupported_sqoop_hive_import_options

What about the --hive-home parameter? I've never used it, but it seems to override the default Hive dir.

Or, if you create a database pointing to another location on HDFS, you can take advantage of the --hcatalog-database parameter alongside the HCatalog integration.
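For example, something along these lines (database name, location, and table are just illustrative); new tables created in that database then land under its custom location:

CREATE DATABASE sandbox LOCATION '/data/custom/sandbox.db';

sqoop import --connect "jdbc:..." --table src_table -m 1 \
  --create-hcatalog-table --hcatalog-database sandbox --hcatalog-table my_table \
  --hcatalog-storage-stanza "stored as avro"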

Hope this helps!


Re: Sqoop import with --hcatalog options with control over destination dir

Explorer

I think the --hive-home parameter is for the Hive installation path, not data placement.

I currently have a problem where sqoop import hangs when running:

sqoop import -Dmapreduce.job.user.classpath.first=true --verbose --driver com.sybase.jdbc4.jdbc.SybDriver --connect "jdbc:sybase:Tds:xxxx:4200/xxx" --username sxxx --password-file file:///home/xxx/.pw --table xxx -m 1 --create-hcatalog-table --hcatalog-database sandbox --hcatalog-table my_table --hcatalog-storage-stanza "stored as avro"

where the sandbox database was created through Ambari, in either the default or a custom location.

The source table columns are read fine, and the CREATE DDL is prepared:

INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: create table xxx stored as avro
18/07/17 09:00:20 INFO hcat.SqoopHCatUtilities: Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1531810820345
18/07/17 09:00:21 INFO hcat.SqoopHCatUtilities: 18/07/17 09:00:21 WARN conf.HiveConf: HiveConf of name hive.mapred.supports.subdirectories does not exist

Then it hangs.

If I do Ctrl-C I get

18/07/17 09:07:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 130
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1196)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1145)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:679)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:342)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:848)
        at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:102)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:263)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)

Any tips on how to troubleshoot this?

Re: Sqoop import with --hcatalog options with control over destination dir

Explorer

Letting it take its time, the command finally exited with the attached stack trace.

sqoop.txt

And I now see I get a similar

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

when just trying to start hive from the same shell where I run sqoop, so there must be some setup issue. I don't have admin privileges, so I'll likely have to rely on others to fix it.


Re: Sqoop import with --hcatalog options with control over destination dir

Hi @Henrik Olsen !
Sorry for my delay, I'm trying to answer here on HCC at night. And yes, sorry about the --hive-home!

Okay, regarding your issues.
I saw a GSS exception, do you have Kerberos in the env?
If so, try to kinit -kt <keytab> <principal> before attempting the sqoop job to see if it works.
Btw, you mentioned that you don't have admin privileges; not sure if this is going to help you (I've suffered through the same). But usually, in secured environments, the sysadmin generates a keytab for your user or service, so you can get a valid ticket and get past Kerberos.
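Something like this, where the keytab path, principal, and realm are placeholders for whatever the sysadmin provides:

kinit -kt /path/to/myuser.keytab myuser@EXAMPLE.REALM
klist    # should show a valid ticket before you launch the sqoop job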
Hope this helps!


Re: Sqoop import with --hcatalog options with control over destination dir

Explorer

Thanks. Indeed I forgot the kinit. When I try just kinit (no options) it uses my own user account. That user cannot access the Hive database dir, though (hive:hdfs owns it, as the access rights below show).

18/07/19 09:51:13 INFO hcat.SqoopHCatUtilities: FAILED: SemanticException MetaException(message:java.security.AccessControlException: Permission denied: user=w19993, access=READ, inode="/user/w19993/hivedb":hive:hdfs:drwx------

I wonder how to get around that. Should I use kinit with a service account? I notice on the HDP server I'm running sqoop from that there's a set of keytabs under /etc/security/keytabs/, including hdfs.headless.keytab and hive.service.keytab. I don't know how to use those.
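From what I can tell, something like this would list the principals inside a keytab and get a ticket from one of them, assuming my user can even read the file (the principal below is just an example):

klist -kt /etc/security/keytabs/hive.service.keytab
kinit -kt /etc/security/keytabs/hive.service.keytab hive/myhost.example.com@EXAMPLE.REALM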


Re: Sqoop import with --hcatalog options with control over destination dir

Explorer

Got it to work now by finding a suitable keytab and principal used elsewhere (NiFi) to access HDFS. By copying that keytab from the NiFi/HDF server to our HDP server and running kinit with it, I got the right permissions for sqoop and Hive/HCatalog to do their thing. Nice.


Re: Sqoop import with --hcatalog options with control over destination dir

Good to know @Henrik Olsen!
I missed your reply because I wasn't notified of your last comments, haha.

But are you still having issues with sqoop? Keep us posted :)
Otherwise, if your issue has been solved, I'd kindly ask you to accept the answer; this encourages other HCC users to keep up the good work, and helps others find the solution faster by just looking at the best answer :D


Re: Sqoop import with --hcatalog options with control over destination dir

Explorer

Hi @Vinicius Higa Murakami. I still need to figure out how to control the destination dir with --hcatalog. I guess I can create the database first with a LOCATION parameter. I still think the tables inside will be managed, though, and not external. Any way to get the data to be external?


Re: Sqoop import with --hcatalog options with control over destination dir

Hi @Henrik Olsen!
Indeed the table will be managed; I guess in this case you have 3 options:

- Create the external table in Hive before attempting the sqoop job (see the sketch below this list)
- Try the sqoop codegen tool (I didn't test it, but it's worth taking a look):

http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal

- Or you can create it as a managed table and afterwards change the table properties from managed to external:

alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
msck repair table table_name; 
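
For the first option, a rough sketch (database, columns, and location are just illustrative, and the column layout has to match what the import will write):

CREATE EXTERNAL TABLE sandbox.my_table (id INT, name STRING)
STORED AS AVRO
LOCATION '/data/custom/my_table';

sqoop import --connect "jdbc:..." --table src_table -m 1 \
  --hcatalog-database sandbox --hcatalog-table my_table

Note there's no --create-hcatalog-table here, since the table already exists.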

Hope this helps!
