I have done a kinit username@domainname already before initializing the sqoop command and kinit has generated a valid kerberos ticket as well. Even though I do this and run the command from the location where the kerberos user keytab file is located, it still gives me the above issue. As I can see in the sqoop.properties file, parameter org.apache.sqoop.security.authentication.type=KERBEROS is commented out. Is this the reason for the above issue? How can we resolve the above issue where we need sqoop to export data from HDFS to kerberos authenticated DB?
I am not a Sqoop expert, but the error you getting is not coming from Hadoop services; rather, the login to your Oracle Database is failing. In this case, I believe you should review how you are supplying credentials to authenticate to the Oracle DB and verify they are correct.
The error you are encountering is Oracle related you should validate the username password against the oracle database. If you have an oracle client installed on your laptop do the below there are many variations. But let your DBA give you the username/password for the database you are trying to export.
Having said that Sqoop on the command line will display your password in clear text, that's not secure so below is a way to encrypt your password so that your sqoop jobs are secured from prying eyes, to do that you will use the hadoop package to encrypt your oracle credential.
In the below example I am creating a password for my fictitious testDB and using the name in the alias to easily identify it from the 100's of db's. In the below example I am using MySQL database the alias doesn't matter ie oracle.testDB.alias or db2.testDB.alias, the most important is the password it should match the password of the user of the Oracle/MySQL/dbs user
Encrypting SQOOP password
Generating the jceks file you MUST provide a path to your hdfs home, create one before executing this command
$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/george/mysql.testDB.password.jceks Enter password: [database_password] Enter password again: [database_password] mysql.testDB.alias has been successfully created. org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
Validating the encyrpted password creation
The encrypted password jceks will be MUST be written to your hdfs home $ hdfs dfs -ls /user/george Found 1 items -rwx------ 3 george hdfs 503 2018-09-02 01:40 /user/george/mysql.testDB.password.jceks
Running the sqoop with the jceks alias
Assumption my mysql database test is running on host is pomme.cloudera.com port 3306
Warning: /usr/hdp/188.8.131.52-205/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 18/09/02 02:08:04 INFO sqoop.Sqoop: Running Sqoop version: 184.108.40.206.6.2.0-205 18/09/02 02:08:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 18/09/02 02:08:06 INFO manager.SqlManager: Using default fetchSize of 1000 18/09/02 02:08:06 INFO tool.CodeGenTool: Beginning code generation .......... some text removed here.............. 18/09/02 02:08:18 INFO mapreduce.Job: The url to track the job: http://pomme.cloudera.com:8088/proxy/application_1535835049607_0002/ 18/09/02 02:08:18 INFO mapreduce.Job: Running job: job_1535835049607_0002 18/09/02 02:08:55 INFO mapreduce.Job: Job job_1535835049607_0002 running in uber mode : false Total megabyte-milliseconds taken by all map tasks=917431296 Map-Reduce Framework Map input records=2170 Map output records=2170 Input split bytes=396 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=944 CPU time spent (ms)=11690 Physical memory (bytes) snapshot=669270016 Virtual memory (bytes) snapshot=18275794944 Total committed heap usage (bytes)=331350016 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=243892 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Transferred 238.1758 KB in 218.8164 seconds (1.0885 KB/sec) 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Retrieved 2170 records.
Sqoop import in hdfs
Check the export was successful $ hdfs dfs -ls /user/george/test Found 5 items -rw-r--r-- 3 george hdfs 0 2018-09-02 02:11 /user/george/test/_SUCCESS -rw-r--r-- 3 george hdfs 60298 2018-09-02 02:10 /user/george/test/part-m-00000 -rw-r--r-- 3 george hdfs 60894 2018-09-02 02:10 /user/george/test/part-m-00001 -rw-r--r-- 3 george hdfs 62050 2018-09-02 02:11 /user/george/test/part-m-00002 -rw-r--r-- 3 george hdfs 60650 2018-09-02 02:11 /user/george/test/part-m-00003
Check the values in he splits
$ hdfs dfs -cat /user/george/test/part-m-00000 1,Julian Stuart,firstname.lastname@example.org,Suspendisse Tristique Neque Associates,9230 Metus. Av.,Pemberton,Mexico 2,Ferris Fulton,email@example.com,Nunc Ltd,256-788 At Avenue,Northampton,China 3,Byron Irwin,adipiscing.Mauris@DonecnibhQuisque.edu,Nascetur Ridiculus Foundation,4042 Non, St.,Gattatico,Lithuania ..........................some text removed ................ 18,Peter Middleton,purus.Nullam.firstname.lastname@example.org,Erat In Consectetuer Associates,1618 Donec St.,Grand Island,Thailand
So, I am able to connect to SQL Developer/SQL Plus for the kerberos authenticated DB.
1. kinit -k -t <keytabfile> <user>@<Realm/domain>
2. A ticket is created by the above command. Point the credential file location in SQL Developer to this ticket file location.
3. Provide host,post and DB service name. Click on the checkbox where it mentions Kerberos authentication.
4. Voila, SQL Developer connects to the database (without passing username/password)
But, similar set of steps fails in Sqoop to connect to the db since there is no option in Sqoop to mention that the DB has to connected using kerberos authentication (There was a checkbox in sqldeveloper). Is there any way to do the same in Sqoop?
The above command works fine when I provide a username and password for an on-premise database but for a database with kerberos authentication (where you are not supposed to provide any username/password, it fails)