Created 07-16-2018 11:28 AM
Is there a way to use the --hcatalog options with sqoop import while keeping control over the destination dir, like --warehouse-dir or --target-dir? These options appear to be incompatible, but perhaps there's a trick.
I'd like to import and create the Hive table definitions in a single go, storing as Avro or ORC, while controlling the destination dir, so I'm not forced into Hive's default/managed directory. There are various reasons for not placing data in the default warehouse dir: one is that I don't currently have permissions to do so (hence the error I get from sqoop import with the --hcatalog options), another is that we have a directory placement strategy that doesn't use the default dir.
Created 07-17-2018 12:28 AM
Hello @Henrik Olsen!
Yep, according to the documentation, these "dir" options aren't supported 😞
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_unsupported_sqoop_hive_import_options
What about the --hive-home parameter? I've never used it, but it seems to replace the default Hive dir.
Or, if you create a database pointing to another location on HDFS, you can take advantage of the --hcatalog-database parameter alongside the HCatalog integration.
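Just to illustrate the idea, something like this (the database name, HDFS location, and connection details are only placeholders, I haven't tested this exact combination):
-- in Hive/beeline: create a database whose location is a custom HDFS dir
CREATE DATABASE sandbox LOCATION '/data/custom/sandbox.db';
# then point the HCatalog import at that database, so the new table's data lands under the custom location
sqoop import \
  --connect "jdbc:..." --username someuser --password-file file:///home/someuser/.pw \
  --table src_table -m 1 \
  --create-hcatalog-table \
  --hcatalog-database sandbox \
  --hcatalog-table my_table \
  --hcatalog-storage-stanza "stored as avro"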
Hope this helps!
Created 07-17-2018 07:09 AM
I think the --hive-home parameter is for the Hive installation path, not data placement.
I'm currently encountering a problem where sqoop import hangs when running
sqoop import -Dmapreduce.job.user.classpath.first=true --verbose \
  --driver com.sybase.jdbc4.jdbc.SybDriver \
  --connect "jdbc:sybase:Tds:xxxx:4200/xxx" \
  --username sxxx --password-file file:///home/xxx/.pw \
  --table xxx -m 1 \
  --create-hcatalog-table \
  --hcatalog-database sandbox \
  --hcatalog-table my_table \
  --hcatalog-storage-stanza "stored as avro"
where the sandbox database was created through Ambari, in either the default or a custom location.
The source table columns are read fine, and the CREATE TABLE statement is prepared:
INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: create table xxx stored as avro
18/07/17 09:00:20 INFO hcat.SqoopHCatUtilities: Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1531810820345
18/07/17 09:00:21 INFO hcat.SqoopHCatUtilities:
18/07/17 09:00:21 WARN conf.HiveConf: HiveConf of name hive.mapred.supports.subdirectories does not exist
Then it hangs.
If I do Ctrl-C I get
18/07/17 09:07:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 130
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1196)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1145)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:679)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:342)
    at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:848)
    at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:102)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:263)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Tips on how to troubleshoot this?
Created 07-17-2018 07:44 AM
Letting it take its time, the command finally exited with the attached stack trace.
And I now see I get a similar
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
when just trying to start Hive from the same shell where I run sqoop, so there must be some setup issue. I don't have admin privileges, so I'll likely have to rely on others to fix it.
Created 07-18-2018 05:54 AM
Hi @Henrik Olsen !
Sorry for my delay; I'm trying to answer here on HCC at night. And yes, sorry about the --hive-home suggestion!
Okay, regarding your issues.
I saw a GSS exception; do you have Kerberos in the environment?
If so, try kinit -kt <keytab> <principal> before attempting the sqoop job to see if it works.
Btw, you mentioned that you don't have admin privileges; not sure if this is going to help you (I've suffered through the same thing). But usually, in secured environments, the sysadmin generates a keytab for your user or service, so you can get a valid ticket and get through Kerberos.
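For example, roughly like this (the keytab path and principal are placeholders; your sysadmin would provide the real ones):
# check whether there's already a valid Kerberos ticket
klist
# get a ticket from the keytab (path and principal are examples only)
kinit -kt /home/someuser/someuser.keytab someuser@EXAMPLE.REALM
# confirm the ticket, then re-run the sqoop job
klist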
Hope this helps!
Created 07-19-2018 07:56 AM
Thanks. Indeed I forgot the kinit. When trying just with kinit (no options) it uses my own user account. That user cannot access the Hive database dir, though (hive:hdfs owns it, as the access rights below show).
18/07/19 09:51:13 INFO hcat.SqoopHCatUtilities: FAILED: SemanticException MetaException(message:java.security.AccessControlException: Permission denied: user=w19993, access=READ, inode="/user/w19993/hivedb":hive:hdfs:drwx------
I wonder how to get around that. Should I use kinit with a service account? I notice on the HDP server I'm running sqoop from that there is a set of keytabs under /etc/security/keytabs/, including hdfs.headless.keytab and hive.service.keytab. I don't know how to use those.
Created 07-19-2018 11:44 AM
Got it to work now, by finding a suitable keytab and principal used elsewhere (NiFi) to access HDFS. By copying that keytab from the NiFi/HDF server to our HDP server and running kinit with it, I got the right permissions for sqoop and hive/hcatalog to do their thing. Nice.
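For others hitting the same thing, a rough sketch of that kind of workflow (the keytab path and principal below are placeholders, not the actual ones used here):
# list the principals stored in the copied keytab to see which one to use
klist -kt /path/to/copied.keytab
# authenticate as that principal
kinit -kt /path/to/copied.keytab some-service-principal@EXAMPLE.REALM
# verify the ticket and that the Hive database dir is now accessible
klist
hdfs dfs -ls /user/w19993/hivedb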
Created 07-19-2018 08:29 PM
Good to know @Henrik Olsen!
I missed your reply because I wasn't notified about your last comments, haha.
But, are you still having issues with sqoop? Keep us posted 🙂
Otherwise, if your issue has been solved, please accept the answer. That encourages other HCC users to keep doing good work, and it helps others find the solution faster by just looking for the best answer 😄
Created 07-20-2018 07:56 AM
Hi @Vinicius Higa Murakami. I still need to figure out control of the destination dir with --hcatalog. I guess I can create the database first with a LOCATION parameter. I still think the tables inside will be managed, though, and not external. Any way to get the data to be external?
Created 07-20-2018 07:24 PM
Hi @Henrik Olsen!
Indeed, the table will be managed. I guess in this case you have 3 options:
- Create the external table in Hive before attempting the sqoop job (see the sketch after this list)
- Try the sqoop codegen tool (I didn't test it, but it's worth taking a look):
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal
- Or create it as a managed table and afterwards change the table properties from managed to external:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
msck repair table table_name;
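For the first option, a minimal sketch of pre-creating the external table (table name, columns, and location are placeholders; the real schema has to match your source table):
-- create the external table at the desired location up front
CREATE EXTERNAL TABLE sandbox.my_table (
  id INT,
  name STRING
)
STORED AS AVRO
LOCATION '/data/custom/my_table';
Then run the sqoop import with --hcatalog-database sandbox --hcatalog-table my_table but without --create-hcatalog-table, so it loads into the existing external table.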
Hope this helps!
Created 07-24-2018 08:35 AM
I ended up with a follow-up procedure: doing a "create external table like ...", inserting into that, and dropping the managed table. I might also try your EXTERNAL=TRUE approach.
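Roughly like this, for anyone curious (table and location names are placeholders):
-- copy the managed table's definition into an external table at the desired location
CREATE EXTERNAL TABLE sandbox.my_table_ext LIKE sandbox.my_table LOCATION '/data/custom/my_table_ext';
-- copy the data over, then drop the managed table
INSERT INTO TABLE sandbox.my_table_ext SELECT * FROM sandbox.my_table;
DROP TABLE sandbox.my_table;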
Created 07-27-2018 06:23 PM
Cool, glad to know that 🙂
Let me know if you're still having this kind of issue. If not, I'd kindly ask you to accept an answer; this encourages other HCC users to keep doing a good job, and they'll find the solution faster by just looking for the best answer.