Member since: 01-19-2017
Posts: 3682
Kudos Received: 633
Solutions: 373
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1731 | 06-04-2025 11:36 PM |
| | 2166 | 03-23-2025 05:23 AM |
| | 1035 | 03-17-2025 10:18 AM |
| | 3964 | 03-05-2025 01:34 PM |
| | 2710 | 03-03-2025 01:09 PM |
02-01-2019
10:10 PM
1 Kudo
@Siva A By default, Sqoop imports NULL values as the string null. Hive, however, uses the string \N to denote NULL values, so predicates dealing with NULL (like IS NULL) will not work correctly. Append the parameters --null-string and --null-non-string for an import job, or --input-null-string and --input-null-non-string for an export job, if you wish to properly preserve NULL values. Because Sqoop uses those parameters in generated code, you need to escape the value \N as \\N. HTH
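A minimal sketch of the import case, assuming a POSIX shell. The helper name sqoop_null_flags is made up for illustration, and the connection string, table, and target directory in the usage comment are hypothetical placeholders, not values from the original question. The function only prints the flags, so it can be tried without a cluster.

```shell
#!/bin/sh
# Print the Sqoop import flags that write NULLs as \N,
# the marker Hive reads as NULL. The doubled backslash is deliberate:
# Sqoop places the value in generated Java code, where \\ becomes \.
sqoop_null_flags() {
    printf '%s\n' --null-string '\\N' --null-non-string '\\N'
}

# Hypothetical usage (all connection details are placeholders):
#   sqoop import --connect jdbc:mysql://db.example.com/sales --table orders \
#       --target-dir /user/siva/orders $(sqoop_null_flags)
sqoop_null_flags
```

For an export job the same idea would apply with --input-null-string and --input-null-non-string instead.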
02-01-2019
03:30 PM
@Sandeep Nemuri I think we responded at almost the same time. When someone clicks submit, there is no logic that checks whether a similar answer has already been given 🙂 Maybe you should have added that he needs to run the script as the Atlas admin user, as illustrated, which he wasn't aware of 🙂
01-30-2019
10:55 AM
@Ali Erdem Any updates on this thread?
01-29-2019
12:02 PM
@Ali Erdem YES, it's possible to connect and run a Sqoop job against a SQL Server without typing a password on the command line.

Hadoop credential provider API
The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets. With Sqoop 1.4.5 or higher, the credential API keystore is supported by Sqoop. The AD user ONLY needs to include -Dhadoop.security.credential.provider.path in the sqoop command.

Here are the steps. The API expects the password .jceks file to be in HDFS and accessible to that user, preferably in his/her home directory.

Assumption: this is the password for the production SQL Server. It's good to standardize the alias names, e.g. sql_prod, sql_dev or ora_prod, ora_dev, etc.
$ hadoop credential create sql_prod.password -provider jceks://hdfs/user/erdem/sql_prod.password.jceks
The above command will prompt for the target database password; see the output below.
Enter password: {the_target_database_password}
Enter password again: {the_target_database_password}
sql_prod.password has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

Now the password file should be in your home directory, and the file should be readable:
$ hdfs dfs -ls /user/erdem
Found 1 items
-rwx------ 3 erdem erdem 502 2019-01-29 11:08 /user/erdem/sql_prod.password.jceks

Now the user erdem can run a sqoop job:
sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/erdem/sql_prod.password.jceks \
-Doraoop.timestamp.string=false -Dmapreduce.job.user.classpath.first=true \
--verbose --connect jdbc:sqlserver://sqlserver-name \
--username erdem \
--password-alias sql_prod.password \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table test \
--target-dir "{some_dir}" \
--split-by NOOBJETRISQUECONTRAT --direct --as-parquetfile

In the above, I adapted the output from my Oracle Sqoop job, especially the driver part, but it should work without issue. You will notice that the user erdem didn't key in a password on the CLI; typing it there is a security loophole. There you go, revert if you need more help.
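Before launching the job, it can help to confirm that the alias is really in the keystore. The sketch below is an illustration under assumptions: provider_uri is a made-up helper that follows the jceks://hdfs/user/<user>/<alias>.jceks layout used in this post, and the listing step only runs when the hadoop CLI is on the PATH.

```shell
#!/bin/sh
# Build the provider URI for a keystore kept in a user's HDFS home directory.
# provider_uri is a hypothetical helper, not part of Hadoop or Sqoop.
provider_uri() {
    user="$1"
    alias_name="$2"
    printf 'jceks://hdfs/user/%s/%s.jceks\n' "$user" "$alias_name"
}

PROVIDER="$(provider_uri erdem sql_prod.password)"
echo "$PROVIDER"

# List the aliases stored in the keystore (skipped when hadoop is not installed).
if command -v hadoop >/dev/null 2>&1; then
    hadoop credential list -provider "$PROVIDER" ||
        echo "could not list aliases in $PROVIDER" >&2
fi
```

The alias printed by the listing should be the same string passed to --password-alias in the sqoop command.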
01-29-2019
12:53 AM
Part 3 of the previous kerberization document
01-29-2019
12:52 AM
@Tom Burke
Set up the server
Install the Kerberos KDC and admin server
$ apt update && apt upgrade -y
$ apt install krb5-kdc krb5-admin-server krb5-config -y
$ krb5_newrealm
Locate and edit the krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = TEST.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
[realms]
TEST.COM = {
kdc = server.test.com
admin_server = server.test.com
}
[domain_realm]
.test.com = TEST.COM
test.com = TEST.COM
KDC configuration
Locate and edit the kdc.conf at /etc/krb5kdc/kdc.conf.
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
TEST.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
Create the Kerberos database
This should pick up your realm from the krb5.conf and kdc.conf. You will be prompted for a master password; keep it safe, as it will be useful for the Ambari Kerberos wizard.
# /usr/sbin/kdb5_util create -s
Output
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'TEST.COM',
master key name 'K/M@TEST.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:
Locate and edit the kadm5.acl
Assign administrator privileges by editing /var/kerberos/krb5kdc/kadm5.acl, replacing EXAMPLE.COM with your realm:
*/admin@TEST.COM *
Restart the KDC and kadmin
Set the 2 daemons to auto-start at boot, otherwise your cluster won't start.
# /etc/rc.d/init.d/krb5kdc start
Starting Kerberos 5 KDC: [ OK ]
# /etc/rc.d/init.d/kadmin start
Starting Kerberos 5 Admin Server:
Create a Kerberos admin
Use the same master password.
# kadmin.local -q "addprinc admin/admin"
Output
Authenticating as principal root/admin@TEST.COM with password.
WARNING: no policy specified for admin/admin@TEST.COM; defaulting to no policy
Enter password for principal "admin/admin@TEST.COM":
Re-enter password for principal "admin/admin@TEST.COM":
Principal "admin/admin@TEST.COM" created.
Check if the root principal was created
Go to Ambari and enable Kerberos
See the attached Kerberos setup for HDP 3.1; the steps are quite similar save for the new UI.
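A common slip in this setup is a default_realm that has no matching entry under [realms]. The sketch below is one possible sanity check, not part of the Kerberos tooling: default_realm and check_realm are hypothetical helper names, and the check is a plain text match, so it does not validate the rest of the file.

```shell
#!/bin/sh
# Print the default_realm value from a krb5.conf-style file.
default_realm() {
    awk -F= '/^[ \t]*default_realm[ \t]*=/ {
        gsub(/[ \t]/, "", $2); print $2; exit
    }' "$1"
}

# Succeed only if the default realm also opens a block such as "TEST.COM = {".
# The realm is used as a regular expression, so its dots match any character;
# that is loose, but good enough for a quick check.
check_realm() {
    conf="$1"
    realm="$(default_realm "$conf")"
    if [ -z "$realm" ]; then
        echo "MISSING: no default_realm found in $conf"
        return 1
    fi
    if grep -Eq "^[[:space:]]*${realm}[[:space:]]*=[[:space:]]*\{" "$conf"; then
        echo "OK: $realm is defined under [realms]"
    else
        echo "MISSING: $realm has no entry under [realms]"
        return 1
    fi
}
```

Running check_realm /etc/krb5.conf after editing the file should report the realm as defined; once the admin principal exists, kadmin.local -q "listprincs" lists the principals in the database.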
01-28-2019
04:07 PM
1 Kudo
@Marcel-Jan Krijgsman So frustrating indeed. Have you tried running the Hive import from /usr/hdp/2.6.5.0-292/atlas/hook-bin? The output should look like below:
# ./import-hive.sh
Using Hive configuration directory [/etc/hive/conf]
Log file for import is /usr/hdp/current/atlas-server/logs/import-hive.log
log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
Enter username for atlas :- admin
Enter password for atlas :-
Hive Meta Data imported successfully!!!
After it runs successfully, you should be able to see your tables in Atlas.
01-28-2019
11:57 AM
1 Kudo
@Michael Bronson If you have exhausted all other avenues, YES.
Step 1: Check and compare the /usr/hdp/current/kafka-broker symlinks.
Step 2: Download both envs as backups from the problematic and the functioning cluster. Upload the functioning cluster's env to the problematic one, since you have a backup, then start Kafka through Ambari.
Step 3:
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
Step 4: Lastly, if the above steps don't remedy the issue, remove and re-install the ambari-agent, and remember to manually point it to the correct Ambari server in ambari-agent.ini.
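Since Step 3 edits a system file in place, a more cautious variant keeps a copy first. This is a minimal sketch, assuming a POSIX shell; disable_cert_verify is a hypothetical helper name, and it reads from the backup rather than using sed -i so it behaves the same with GNU and BSD sed.

```shell
#!/bin/sh
# Switch Python certificate verification from platform_default to disable,
# keeping a .bak copy of the original file next to it.
disable_cert_verify() {
    cfg="$1"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Read from the backup and overwrite the original in place,
    # which keeps the original file's permissions.
    sed 's/verify=platform_default/verify=disable/' "$cfg.bak" > "$cfg"
}

# Usage on the affected host (run as root):
#   disable_cert_verify /etc/python/cert-verification.cfg
```

If the change does not help, copying the .bak file back restores the previous setting.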
01-28-2019
09:01 AM
1 Kudo
@Michael Bronson If you can start your brokers from the CLI, then that means your env is not set properly, as Ambari depends on that env to successfully start or stop a component. What you could do is export the env from the problematic cluster and compare it meticulously against the env from the working cluster, using the procedures I sent above. You should be able to see the difference. Can you also validate that the symlinks are okay?
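Once both envs are exported to plain text files, the comparison can be sketched as below. This is an illustration only: compare_env is a hypothetical helper and the file names in the usage comment are placeholders. Both files are sorted first so that a different line order alone does not show up as a difference.

```shell
#!/bin/sh
# Show the lines that differ between two exported env files.
compare_env() {
    a_sorted="$(mktemp)" || return 2
    b_sorted="$(mktemp)" || return 2
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # Lines prefixed "<" exist only in the first file, ">" only in the second.
    diff "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}

# Hypothetical usage (file names are placeholders):
#   compare_env kafka-env.problem.txt kafka-env.working.txt
```

For the symlinks, running readlink -f /usr/hdp/current/kafka-broker on each cluster (GNU coreutils assumed) prints the resolved target, which should point at the same HDP version directory on both.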
01-28-2019
08:50 AM
@Bhushan Kandalkar Good that it worked out, but you shouldn't have omitted the information about the architecture, i.e. the load balancer. Such info is critical in the analysis :-) Happy hadooping