Created 09-13-2018 08:49 AM
Using HDP 2.6.4 and Sqoop 1.4.6 (so not Sqoop 2, which I understand has been dropped).
I'm importing from a PostgreSQL database that uses a simple username and password (no Kerberos) into Hive/HDFS, which does use Kerberos.
Besides doing a kinit first, do I need to tell sqoop somehow that the underlying import work to Hive/HDFS (initiated by the sqoop import command) needs to use Kerberos? If so, how?
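To make the question concrete, the flow is roughly the following (host, database and table names are placeholders, and I've left out the Hive-side options here):
$ kinit XXX@MY.REALM
$ sqoop import --connect jdbc:postgresql://pg-host:5432/sourcedb --username pguser -P --table some_table ...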
Created 09-13-2018 09:05 AM
I'm getting an error like this:
18/09/13 10:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/XXX/.staging/job_1536655741381_0075
18/09/13 10:56:44 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.security.AccessControlException: Permission denied: user=XXX, access=WRITE, inode="/user/XXX/.staging/job_1536655741381_0075/libjars/antlr-runtime-3.4.jar":XXX:hdfs:----------
It's strange, because the .staging directory under my user has permissions like this:
drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging
The database and tables reside on HDFS with Ranger-controlled permissions. They are initially written (by hive/beeline commands) with these perms:
hive:hdfs drwx------ (where my sqoop job works fine)
but then (on a cron basis, to let Ranger take control) changed to
hdfs:hdfs d--------- (and then my sqoop job no longer works).
Is that because sqoop needs to be told to use Kerberos, and if so, how is that done with Sqoop 1(.4.6)?
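If it helps, my guess is that the nightly job does something equivalent to this (warehouse path and database name are placeholders; I haven't seen the actual script):
$ hdfs dfs -chown -R hdfs:hdfs /apps/hive/warehouse/mydb.db
$ hdfs dfs -chmod -R 000 /apps/hive/warehouse/mydb.db
so that access is granted only through Ranger policies rather than POSIX permissions.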
Created 09-14-2018 09:09 AM
Could it be that a Hive service user is trying to write /user/XXX/.staging/job_1536655741381_0075/libjars/antlr-runtime-3.4.jar and can't, because the permissions are set as
drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging
- as in only XXX can write, not 'group' or 'other' (so not Hive or any other service user sqoop might use to do the work)?
Just a thought.
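If it's useful, one way to check might be to look at exactly which inode is denied and who owns it, e.g. (paths taken from the error above):
$ hdfs dfs -ls /user/XXX/.staging/job_1536655741381_0075/libjars
$ hdfs dfs -getfacl /user/XXX/.staging
though the staging area is cleaned up once the job fails, so the first command would have to run while the job is still being submitted.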
Created 09-13-2018 11:10 AM
Once you have run kinit, your ticket is valid for any service on a Kerberized cluster. For an export/import between an RDBMS and HDFS, you can see in my command below the parameter --target-dir, which tells sqoop where to put the exported data. The following is the recommended practice: because sqoop would otherwise send your password in clear text, which is a security issue, it's recommended to encrypt the password using the hadoop credential create API. In the extract below I am storing the encrypted password [mysql.testDB.password.jceks] as an alias in my home directory in HDFS, where I am the only one who can access it. When prompted for the password, provide the database password.
$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/micheal/mysql.testDB.password.jceks
Enter password:
Enter password again:
mysql.testDB.alias has been successfully created. org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
Now, after creating the encrypted password above, the import will look like this (you could also use --password xyzfcd instead of the alias):
sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/micheal/mysql.testDB.password.jceks --driver com.mysql.jdbc.Driver --connect jdbc:mysql://hadoop.com:3306/test --username micheal --password-alias mysql.testDB.alias --table "customer" --target-dir /user/micheal/test
Notice that for the password parameter I used an alias. Please see the Sqoop user guide.
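Since you are importing from PostgreSQL rather than MySQL, the same pattern would look roughly like this (host, database, table and alias names are just examples; the PostgreSQL JDBC driver jar must be available to Sqoop):
sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/XXX/postgres.testDB.password.jceks --connect jdbc:postgresql://pg-host:5432/test --username XXX --password-alias postgres.testDB.alias --table "customer" --target-dir /user/XXX/test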
Created 09-13-2018 03:07 PM
That's an HDFS permissions issue and has nothing to do with sqoop. Assuming the user running the sqoop job is olsen, you should do the following as the root user:
Switch to hadoop super user
# su - hdfs
As hdfs user
$ hdfs dfs -mkdir /user/olsen
Change the hdfs directory permissions
$ hdfs dfs -chown olsen:olsen /user/olsen
Now run your sqoop job as olsen; it should succeed.
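If you want to double-check the ownership first, something like this should now show olsen:olsen on the home directory:
$ hdfs dfs -ls -d /user/olsen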
HTH
Created 09-14-2018 06:31 AM
I already have a user account dir, which seems to have the right permissions (user called XXX here):
drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging
I once saw an admin user look into the Ranger logs, and it seemed strange that I first got an allow on the file in question, then immediately after (within the same second) a deny - on the very same file path.
Created 09-14-2018 06:58 AM
Could you test using HDFS /tmp, which is mainly used as temporary storage during MapReduce operations, as the output directory?
--target-dir /user/tmp/test
Another test: can you change the HDFS directory ownership so that the user and group are olsen (or xxxx, where xxxx is the user launching the sqoop job)?
$ hdfs dfs -chown olsen:olsen /user/olsen
Please let me know
Created 09-14-2018 07:45 AM
I'm using hcatalog, which doesn't support --target-dir, so I cannot try that out.
I'm not allowed to change ownership, and I don't think it would help to also have the group set to me, if I'm already the owner with rwx. If anything, it would reduce the chances of writing, since hdfs could no longer access it unless I put rwx on 'other'.
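For reference, the import is along these lines (connection details and names changed), which is why --target-dir doesn't apply; the data lands in the table's warehouse location managed by Hive/HCatalog, i.e. the Ranger-controlled area described above:
$ kinit XXX@MY.REALM
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/XXX/pg.sourcedb.password.jceks --connect jdbc:postgresql://pg-host:5432/sourcedb --username pguser --password-alias pg.sourcedb.alias --table some_table --hcatalog-database mydb --hcatalog-table some_table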