Created 10-08-2018 02:10 PM
I am trying to import data and create a table in Hive, but I am getting the error below:
sqoop import --connect jdbc:postgresql://<host>/iso --username <username> -P --table poc --hive-import --create-hive-table --hive-table hdp.poc --delete-target-dir -- --schema live;
18/10/08 15:31:51 INFO authenticator.AbstractLogin: Successfully logged in.
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT refresh thread started.
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT valid starting at: Mon Oct 08 15:30:53 CEST 2018
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT expires: Tue Oct 09 01:30:53 CEST 2018
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT refresh sleeping until: Mon Oct 08 23:40:14 CEST 2018
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'key.deserializer' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'value.deserializer' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'hook.group.id' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.connection.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.session.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'enable.auto.commit' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.connect' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.sync.time.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'session.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'auto.offset.reset' was supplied but isn't a known config.
18/10/08 15:31:51 INFO utils.AppInfoParser: Kafka version : 1.0.0.2.6.5.0-292
18/10/08 15:31:51 INFO utils.AppInfoParser: Kafka commitId : 2ff1ddae17fb8503
18/10/08 15:31:51 INFO kafka.KafkaNotification: <== KafkaNotification.createProducer()
18/10/08 15:32:51 ERROR hook.AtlasHook: Failed to send notification - attempt #1; error=java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
18/10/08 15:33:52 ERROR hook.AtlasHook: Failed to send notification - attempt #2; error=java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
18/10/08 15:34:53 ERROR hook.FailedMessagesLogger:
However, the MapReduce job executes successfully and the data is loaded, but the table is not getting created in Hive.
18/10/08 15:31:21 INFO mapreduce.Job: Running job: job_1538735110847_0010
18/10/08 15:31:35 INFO mapreduce.Job: Job job_1538735110847_0010 running in uber mode : false
18/10/08 15:31:35 INFO mapreduce.Job: map 0% reduce 0%
18/10/08 15:31:45 INFO mapreduce.Job: map 25% reduce 0%
18/10/08 15:31:48 INFO mapreduce.Job: map 75% reduce 0%
18/10/08 15:31:49 INFO mapreduce.Job: map 100% reduce 0%
18/10/08 15:31:50 INFO mapreduce.Job: Job job_1538735110847_0010 completed successfully
18/10/08 15:31:50 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=736372
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=470
		HDFS: Number of bytes written=546499
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=40826
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=40826
		Total vcore-milliseconds taken by all map tasks=40826
		Total megabyte-milliseconds taken by all map tasks=167223296
	Map-Reduce Framework
		Map input records=4269
		Map output records=4269
		Input split bytes=470
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=327
		CPU time spent (ms)=7480
		Physical memory (bytes) snapshot=1524961280
		Virtual memory (bytes) snapshot=22446456832
		Total committed heap usage (bytes)=2266497024
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
Created 10-08-2018 03:23 PM
I have just run a Sqoop command with a small variation and it worked fine:
$ sqoop-import --connect jdbc:mysql://localhost:3306/ambari --username ambari --password ambari --table users --hive-table testusers --create-hive-table --hive-import --hive-home /user/hive
Validate the Hive table creation:
hive> show tables;
OK
testusers
Time taken: 2.94 seconds, Fetched: 1 row(s)
hive> describe testusers;
OK
user_id                 int
principal_id            bigint
create_time             string
ldap_user               int
user_type               string
user_name               string
user_password           string
active                  int
active_widget_layouts   string
Time taken: 0.486 seconds, Fetched: 9 row(s)
In your Sqoop command I see a syntax error, or was it intentional? After the --delete-target-dir there is a floating -- and something else: what is this --schema live?
sqoop import --connect jdbc:postgresql://<host>/iso --username <username> -P --table poc --hive-import --create-hive-table --hive-table hdp.poc --delete-target-dir -- --schema live;
Please revert.
Created 10-08-2018 05:14 PM
Looks like there is some issue with Kafka. Is your Kafka running?
It is trying to post the entity to Atlas and it is failing. This doesn't cause the job to fail but the entity won't be seen in Atlas.
If you do not want to use Atlas, then you can disable it under Sqoop: uncheck "Enable Atlas Hook" under the Advanced sqoop-env section. If you do want to use Atlas, make sure Atlas is configured properly to talk to Kafka and that Kafka is up and running.
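If Kafka looks healthy in Ambari but the hook still times out, a quick sanity check is whether the broker is even reachable from the node where you run Sqoop, since the hook runs client-side. This is just a sketch: the broker and ZooKeeper host names and the HDP paths are placeholders, and on a kerberized cluster you would also have to pass the matching security protocol (for example via --producer-property security.protocol=...).

# list topics to confirm basic connectivity (adjust host names to your cluster)
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper <ZOOKEEPER_HOST>:2181 --list

# try to publish a throwaway message to the ATLAS_HOOK topic from the Sqoop client host
echo "connectivity-test" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list <KAFKA_BROKER>:6667 --topic ATLAS_HOOK

If the console producer hangs with the same "Failed to update metadata after 60000 ms" message, the problem is broker reachability or the advertised listener rather than Sqoop or Atlas themselves.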
.
-Aditya
Created 10-09-2018 10:41 AM
Yes Aditya! I had checked along the same lines: Atlas and Kafka are both up and running. I have also tried running the Sqoop job with the Atlas hook disabled, but it still ends with the error "Failed to update metadata after 60000 ms".
Created 10-09-2018 10:50 AM
Can you check if these properties in Atlas are set properly? (A quick way to inspect them is sketched after the list.)
atlas.kafka.bootstrap.servers - Should be in this format ( kafkahost1:6667, kafkahost2:6667,..)
atlas.kafka.zookeeper.connect - Should be in this format ( zkhost1:2181, zkhost2:2181,...)
atlas.kafka.security.protocol - Should match the protocol on which kafka is running (Can be checked in Kafka config - Filter for listeners)
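For reference, these end up in atlas-application.properties. Assuming the usual HDP locations (adjust the paths if your layout differs), you can check the values actually in effect with something like:

# values on the Atlas server host
grep -E 'atlas\.kafka\.(bootstrap\.servers|zookeeper\.connect|security\.protocol)' /etc/atlas/conf/atlas-application.properties

# the hook runs on the client, so also check the copy shipped to the Sqoop client, if present
grep -E 'atlas\.kafka\.' /etc/sqoop/conf/atlas-application.properties 2>/dev/null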
.
If you are using Ranger, make sure the atlas user has proper permissions on the ATLAS_HOOK and ATLAS_ENTITIES topics in Kafka.
.
You can also disable the Atlas Hive hook under the Hive configs, just to confirm whether the error goes away.
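In case it helps narrow things down: the Hive hook is wired in through hive.exec.post.hooks, so (assuming the standard HDP config path) you can see whether it is still active on the client you run the job from:

# show the post-execution hooks configured for Hive on this node
grep -A1 'hive.exec.post.hooks' /etc/hive/conf/hive-site.xml

If org.apache.atlas.hive.hook.HiveHook is listed there, the hook is still enabled; after unchecking "Enable Atlas Hook" in the Hive configs and restarting the clients, it should disappear, and re-running the import will tell you whether the timeout was coming from the hook.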
Created 10-09-2018 01:18 PM
All the mentioned properties are set in the cluster. Also, Ranger is installed, but the plugin is not enabled for Kafka. I have granted permissions using the commands below:
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --add --group '*' --allow-principal User:'*' --operation All \
  --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --add --topic ATLAS_ENTITIES --allow-principal User:'*' --operation All \
  --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh \
  --add --topic ATLAS_HOOK --allow-principal User:'*' --operation All \
  --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
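As a sanity check, the ACLs can be listed back to confirm they were applied (this only shows something meaningful if an authorizer is actually configured on the brokers; same ZooKeeper placeholder as above):

# list the ACLs currently stored for the Atlas notification topics
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --list --topic ATLAS_HOOK \
  --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --list --topic ATLAS_ENTITIES \
  --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"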