Created 08-28-2018 09:46 AM
I'm trying to import a MySQL table into Hive using Sqoop's --hive-import option. The import stalls at the hive.HiveImport step (Connecting to jdbc:hive2...) and stays in that state indefinitely.
All service tests in Ambari are successful.
The same Hive connection string works fine from Superset.
Import options:
sqoop-import --connect jdbc:mysql://mysql_host/zabbix --username *** --password *** \
  --table alerts \
  --hive-import \
  --create-hive-table \
  --hive-table zabbix.alerts \
  --verbose
Output:
18/08/28 12:14:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.0.0-1634
...
18/08/28 12:14:48 INFO mapreduce.ImportJobBase: Beginning import of alerts
...
18/08/28 12:14:58 INFO mapreduce.Job: Running job: job_1535384317331_0003
18/08/28 12:15:06 INFO mapreduce.Job: Job job_1535384317331_0003 running in uber mode : false
18/08/28 12:15:06 INFO mapreduce.Job:  map 0% reduce 0%
18/08/28 12:15:14 INFO mapreduce.Job:  map 100% reduce 0%
18/08/28 12:15:15 INFO mapreduce.Job: Job job_1535384317331_0003 completed successfully
18/08/28 12:15:15 INFO mapreduce.Job: Counters: 32
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=963956
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=450
                HDFS: Number of bytes written=1472709
                HDFS: Number of read operations=24
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=63339
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=21113
                Total vcore-milliseconds taken by all map tasks=21113
                Total megabyte-milliseconds taken by all map tasks=32429568
        Map-Reduce Framework
                Map input records=3152
                Map output records=3152
                Input split bytes=450
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=312
                CPU time spent (ms)=10040
                Physical memory (bytes) snapshot=1009577984
                Virtual memory (bytes) snapshot=13137891328
                Total committed heap usage (bytes)=597688320
                Peak Map Physical memory (bytes)=258195456
                Peak Map Virtual memory (bytes)=3291807744
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=1472709
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Transferred 1.4045 MB in 25.1313 seconds (57.2271 KB/sec)
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Retrieved 3152 records.
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table alerts
...
18/08/28 12:15:17 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE `zabbix.alerts` ( `alertid` BIGINT, `actionid` BIGINT, `eventid` BIGINT, `userid` BIGINT, `clock` INT, `mediatypeid` BIGINT, `sendto` STRING, `subject` STRING, `message` STRING, `status` INT, `retries` INT, `error` STRING, `esc_step` INT, `alerttype` INT, `p_eventid` BIGINT, `acknowledgeid` BIGINT) COMMENT 'Imported by sqoop on 2018/08/28 12:15:17' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
18/08/28 12:15:17 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hdp01.amb.corp:8020/user/hive/alerts' INTO TABLE `zabbix.alerts`
18/08/28 12:15:17 INFO hive.HiveImport: Loading uploaded data into Hive
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
18/08/28 12:15:24 INFO hive.HiveImport: Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
At this last step, the import hangs.
What could be the problem?
Created 08-28-2018 10:51 AM
Can you check the HS2 logs and the ZooKeeper logs for issues? It looks as if the connection to HS2 via ZooKeeper is failing.
Created 08-28-2018 11:42 AM
No problems were found in the HS2 or ZooKeeper logs.
However, when connecting to Hive with Beeline, the server prompts for a username and password.
Could the same thing be happening when Sqoop connects to Hive? And if so, how can it be fixed?
[hive@hdp00]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default: hive
Enter password for jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default: *********
18/08/28 14:35:21 [main]: INFO jdbc.HiveConnection: Connected to hdp02:10000
Connected to: Apache Hive (version 3.1.0.3.0.0.0-1634)
Driver: Hive JDBC (version 3.1.0.3.0.0.0-1634)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.0.0.0-1634 by Apache Hive
0: jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb>
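As a side note, Beeline can also be given the credentials non-interactively through its -u/-n/-p options instead of typing them at the prompt. A sketch, with the ZooKeeper host list copied from the session above and a placeholder password; the command is only assembled and printed here so it can be inspected without a cluster:

```shell
# Sketch: supply Beeline credentials via -n (user) and -p (password)
# instead of answering the interactive prompt. Password is a placeholder.
BEELINE_URL='jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
# Print the full command rather than executing it (no cluster needed here).
printf 'beeline -u "%s" -n hive -p "***"\n' "$BEELINE_URL"
```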
Created 08-29-2018 11:50 AM
The sqoop-create-hive-table command has the same effect!
Should Sqoop's --hive-import work on HDP 3.0?
Or will I have to go back to HDP 2.6?
$ sqoop-create-hive-table --connect jdbc:mysql://zabbix.amb.corp/zabbix --username *** --password *** --table alerts --hive-table t1 --verbose
...
18/08/29 14:40:07 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS `history_uint` ( `itemid` BIGINT, `clock` INT, `value` BIGINT, `ns` INT) COMMENT 'Imported by sqoop on 2018/08/29 14:40:07' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
18/08/29 14:40:08 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hdp01.amb.corp:8020/user/sqoop/history_uint' INTO TABLE `history_uint`
18/08/29 14:40:09 INFO hive.HiveImport: Loading uploaded data into Hive
18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
18/08/29 14:40:14 INFO hive.HiveImport: Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Created on 08-29-2018 04:27 PM - last edited on 12-21-2020 09:16 PM by VidyaSargur
I believe I have the exact same problem. I get more details if I enable debug logging before running Sqoop:
export HADOOP_ROOT_LOGGER=DEBUG,console
Note that my cluster is not Kerberized. Here are the debug messages I see while the job appears to be hung. It never finishes; it just keeps repeating these messages at roughly five-minute intervals.
18/08/29 09:13:23 INFO hive.HiveImport: Connecting to jdbc:hive2://node1:2181,node1:2181,node1:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
18/08/29 09:13:24 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 executed
18/08/29 09:13:27 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 expired
18/08/29 09:13:27 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 exited
18/08/29 09:13:47 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.11:41588 from agileanalytics: closed
18/08/29 09:13:47 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.11:41588 from agileanalytics: stopped, remaining connections 1
18/08/29 09:13:48 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.10:8020 from agileanalytics: closed
18/08/29 09:13:48 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.10:8020 from agileanalytics: stopped, remaining connections 0
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initialize connection to node node3:6667 (id: 1002 rack: null) for sending metadata request
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initiating connection to node node3:6667 (id: 1002 rack: null)
18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.bytes-sent
18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.bytes-received
18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.latency
18/08/29 09:18:18 DEBUG network.Selector: [Producer clientId=producer-1] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 1002
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Completed connection to node 1002. Fetching API versions.
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initiating API versions fetch from node 1002.
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Sending metadata request (type=MetadataRequest, topics=ATLAS_HOOK) to node node3:6667 (id: 1001 rack: null)
18/08/29 09:18:18 DEBUG clients.Metadata: Removing unused topic ATLAS_HOOK from the metadata list, expiryMs 1535559498233 now 1535559498257
18/08/29 09:18:18 DEBUG clients.Metadata: Updated cluster metadata version 3 to Cluster(id = 9eDSa9DhRMOTMZguD63M-w, nodes = [node3:6667 (id: 1002 rack: null), node3:6667 (id: 1003 rack: null), node3:6667 (id: 1001 rack: null)], partitions = [Partition(topic = ATLAS_HOOK, partition = 0, leader = 1001, replicas = [1001], isr = [1001], offlineReplicas = [])])
18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Recorded API versions for node 1002: (Produce(0): 0 to 5 [usable: 5], Fetch(1): 0 to 6 [usable: 6], ListOffsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 5], LeaderAndIsr(4): 0 to 1 [usable: 1], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 4], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 1], ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], AlterReplicaLogDirs(34): 0 [usable: 0], DescribeLogDirs(35): 0 [usable: 0], SaslAuthenticate(36): 0 [usable: 0], CreatePartitions(37): 0 [usable: 0])
18/08/29 09:22:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Node -1 disconnected.
18/08/29 09:22:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Sending metadata request (type=MetadataRequest, topics=) to node node3:6667 (id: 1002 rack: null)
18/08/29 09:22:18 DEBUG clients.Metadata: Updated cluster metadata version 4 to Cluster(id = 9eDSa9DhRMOTMZguD63M-w, nodes = [node3:6667 (id: 1003 rack: null), node3:6667 (id: 1002 rack: null), node3:6667 (id: 1001 rack: null)], partitions = [])
Created 09-06-2018 08:30 PM
I am also having a similar issue with the Hive import on HDP 3. Has anyone been successful yet? My Sqoop command successfully pulls the database into HDFS, but it also hangs on the ZooKeeper connection.
Created 09-13-2018 07:06 PM
Since the Hive CLI client was removed in HDP 3, --hive-import, --create-hive-table, and similar options will wait for credentials to be entered. If you are running Sqoop from the CLI (not via Oozie), supply the Hive credentials just as you would in Beeline, and that should do the trick. If that is not acceptable, change the Sqoop syntax to use the hcatalog options to achieve the same result.
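For the second option, a sketch of the original import rewritten with Sqoop's HCatalog options (flag names from the Sqoop 1.4.x user guide; database/table names and the connection string are copied from the question). The command is assembled as a string and printed rather than executed, since it would need a live cluster:

```shell
# Sketch: --hcatalog-* variant of the original import, which writes through
# HCatalog and so avoids the Hive CLI/Beeline credential prompt entirely.
# Credentials are placeholders; nothing is executed here.
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://mysql_host/zabbix --username '***' --password '***' \
  --table alerts \
  --hcatalog-database zabbix \
  --hcatalog-table alerts \
  --create-hcatalog-table \
  --verbose"
echo "$SQOOP_CMD"
```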
Created 09-13-2018 08:51 PM
Mahesh you nailed it!
To get this to work I created the file "beeline-hs2-connection.xml" in ${HIVE_CONF_DIR}, i.e. /etc/hive/conf, and I included the credentials there.
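For completeness, a minimal sketch of what such a beeline-hs2-connection.xml might contain. The property names (beeline.hs2.connection.user / beeline.hs2.connection.password) are the ones documented on the HiveServer2 Clients wiki page; the values here are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- /etc/hive/conf/beeline-hs2-connection.xml: default Beeline/HS2 credentials.
     Values below are placeholders; substitute your own user and password. -->
<configuration>
  <property>
    <name>beeline.hs2.connection.user</name>
    <value>hive</value>
  </property>
  <property>
    <name>beeline.hs2.connection.password</name>
    <value>***</value>
  </property>
</configuration>
```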
See https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Created 09-13-2018 11:05 PM
@Roger Sliva, Yes that's another way of getting it done.
Created 03-04-2019 07:51 AM
You can also modify the hive.distro script and put the login credentials into the script itself.