
Sqoop and kerberos issue (no kerberos on db, but on hive/hdfs)


Using HDP 2.6.4 and Sqoop 1.4.6 (so not Sqoop 2, which I understand has been dropped).

I'm importing from a postgresql with simple username and password (no Kerberos) to Hive/HDFS which does use Kerberos.

Besides doing a kinit first, do I need to tell sqoop somehow that the underlying import work to Hive/HDFS (initiated by the sqoop import command) needs to use Kerberos? If so, how?
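For reference, this is roughly what I'm doing (the realm, host, database, and table names below are placeholders, not my real setup):

```shell
# Get a Kerberos ticket for my user before submitting the job
kinit XXX@EXAMPLE.COM

# Sqoop 1 import from PostgreSQL (plain username/password on the DB side).
# The MapReduce job submitted to the kerberized cluster picks up the
# Kerberos ticket from the ticket cache of the submitting user.
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table mytable \
  --hcatalog-database default \
  --hcatalog-table mytable
```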



I'm getting an error like this:

18/09/13 10:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/XXX/.staging/job_1536655741381_0075

18/09/13 10:56:44 ERROR tool.ImportTool: Encountered IOException running import job: Permission denied: user=XXX, access=WRITE, inode="/user/XXX/.staging/job_1536655741381_0075/libjars/antlr-runtime-3.4.jar":XXX:hdfs:----------

It's strange, because the .staging under my user has permission like this

drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging

The database and tables reside on HDFS with Ranger controlled permissions. They are initially written with these perms (by hive/beeline commands)

hive:hdfs drwx------ (where my sqoop job works fine)

but then (on a cron basis to let Ranger control) changed to

hdfs:hdfs d--------- (and then my sqoop job does not work anymore).

Is that because Sqoop needs to be told to use Kerberos? If so, how is that done with Sqoop 1.4.6?
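For what it's worth, the POSIX-side permissions and ACLs on the warehouse path can be checked like this (the path below is a placeholder for wherever the tables actually live):

```shell
# Show owner, group, mode, and any extended ACL entries on the table directory
hdfs dfs -getfacl /apps/hive/warehouse/mydb.db/mytable

# Note: with Ranger enabled, a d--------- mode here can still be readable or
# writable if a Ranger policy allows it; the mode alone doesn't tell the whole story
```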


Could it be that a Hive service user is trying to write /user/XXX/.staging/job_1536655741381_0075/libjars/antlr-runtime-3.4.jar and can't, because the permissions are set as
drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging

- as in, only XXX can write, not the group or others (so not Hive or any other service user Sqoop might use to do the work).

Just a thought.


@Henrik Olsen

Once you have run kinit, your ticket is valid for any service on a kerberized cluster. In the case of an export/import between an RDBMS and HDFS, you can see in my command below the parameter --target-dir, which tells Sqoop where I want the data exported.

Below is the recommended best practice. Because Sqoop would otherwise send your password in clear text, which is a security issue, it's recommended to encrypt the password using the hadoop credential create API. In the extract below I am storing the encrypted password [mysql.testDB.password.jceks] as an alias in my home directory in HDFS, where I am the only one who can access it. When prompted for the password, provide the database password.

$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/micheal/mysql.testDB.password.jceks
Enter password:
Enter password again:
mysql.testDB.alias has been successfully created.

Now, after creating the encrypted password above, the import will look like this (you could also use --password xyzfcd instead of the alias):

sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql:// --username micheal --password-alias mysql.testDB.alias --table "customer" --target-dir /user/micheal/test 

You will notice that for the password parameter I used an alias. Please see the Sqoop user guide.
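Putting the two steps together, a sketch of the full invocation (dbhost and the port are placeholders; the -D property is how the job locates the JCEKS store holding the alias at runtime):

```shell
# Reference the encrypted alias instead of a clear-text password;
# hadoop.security.credential.provider.path points at the JCEKS store on HDFS
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/micheal/mysql.testDB.password.jceks \
  --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://dbhost:3306/testDB \
  --username micheal \
  --password-alias mysql.testDB.alias \
  --table customer \
  --target-dir /user/micheal/test
```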


@Henrik Olsen

That's an HDFS permission issue and has nothing to do with Sqoop. You should do the following, assuming the user running the Sqoop job is olsen. As the root user:

Switch to hadoop super user

# su - hdfs

As hdfs user

$ hdfs dfs -mkdir /user/olsen

Change the hdfs directory permissions

$ hdfs dfs -chown olsen:olsen  /user/olsen

Now run your Sqoop job as olsen; it should succeed.



@Geoffrey Shelton Okot

I already have a user account dir, which seems to have the right permissions (user called XXX here):

drwx------ - XXX hdfs 0 2018-09-03 10:47 /user/XXX/.staging

I once saw an admin user look into the Ranger logs, and it seemed strange that I first got an allow on the file in question, then immediately after (within the same second) a deny, on the very same file path.


@Henrik Olsen

Could you test using HDFS /tmp as the output directory? It's mainly used as temporary storage during MapReduce operations.

--target-dir /tmp/test 

Another test: can you change the HDFS directory permissions so that the user and group are olsen (or xxxx, where xxxx is the user launching the Sqoop job)?

$ hdfs dfs -chown olsen:olsen  /user/olsen

Please let me know


I'm using HCatalog, which doesn't support --target-dir, so I cannot try that out.

I'm not allowed to change ownership, and I don't think having the group set to me would help either, since I'm already the owner with rwx. It would rather reduce the chances of writing, as hdfs could no longer access it unless I put rwx on 'other'.
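One way I could try to narrow it down (a sketch; XXX stands for my user as above): verify the ticket, then test a direct write into .staging. If the direct write succeeds but the Sqoop job still fails, the deny would seem to come from policy evaluation rather than plain HDFS permissions.

```shell
# Check which principal the current Kerberos ticket belongs to
klist

# Try a direct write into the staging area as that user, then clean up
hdfs dfs -touchz /user/XXX/.staging/perm_test
hdfs dfs -rm -skipTrash /user/XXX/.staging/perm_test
```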
