Support Questions

Find answers, ask questions, and share your expertise

sqoop import hung (--hive-import) HDP-3.0.0


I'm trying to import a MySQL table into Hive using the Sqoop --hive-import parameter. The import process stops at the step hive.HiveImport: Connecting to jdbc:hive2 and stays in that state forever.

All service tests in Ambari are successful.

The Hive connection string works successfully in Superset.

Import options:

sqoop-import --connect jdbc:mysql://mysql_host/zabbix --username *** --password *** \
--table alerts \
--hive-import \
--create-hive-table \
--hive-table zabbix.alerts \
--verbose

Output:

18/08/28 12:14:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.0.0-1634
...
18/08/28 12:14:48 INFO mapreduce.ImportJobBase: Beginning import of alerts
...

18/08/28 12:14:58 INFO mapreduce.Job: Running job: job_1535384317331_0003
18/08/28 12:15:06 INFO mapreduce.Job: Job job_1535384317331_0003 running in uber mode : false
18/08/28 12:15:06 INFO mapreduce.Job:  map 0% reduce 0%
18/08/28 12:15:14 INFO mapreduce.Job:  map 100% reduce 0%
18/08/28 12:15:15 INFO mapreduce.Job: Job job_1535384317331_0003 completed successfully
18/08/28 12:15:15 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=963956
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=450
		HDFS: Number of bytes written=1472709
		HDFS: Number of read operations=24
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters 
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=63339
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=21113
		Total vcore-milliseconds taken by all map tasks=21113
		Total megabyte-milliseconds taken by all map tasks=32429568
	Map-Reduce Framework
		Map input records=3152
		Map output records=3152
		Input split bytes=450
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=312
		CPU time spent (ms)=10040
		Physical memory (bytes) snapshot=1009577984
		Virtual memory (bytes) snapshot=13137891328
		Total committed heap usage (bytes)=597688320
		Peak Map Physical memory (bytes)=258195456
		Peak Map Virtual memory (bytes)=3291807744
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=1472709
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Transferred 1.4045 MB in 25.1313 seconds (57.2271 KB/sec)
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Retrieved 3152 records.
18/08/28 12:15:15 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table alerts
...
18/08/28 12:15:17 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE `zabbix.alerts` ( `alertid` BIGINT, `actionid` BIGINT, `eventid` BIGINT, `userid` BIGINT, `clock` INT, `mediatypeid` BIGINT, `sendto` STRING, `subject` STRING, `message` STRING, `status` INT, `retries` INT, `error` STRING, `esc_step` INT, `alerttype` INT, `p_eventid` BIGINT, `acknowledgeid` BIGINT) COMMENT 'Imported by sqoop on 2018/08/28 12:15:17' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
18/08/28 12:15:17 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hdp01.amb.corp:8020/user/hive/alerts' INTO TABLE `zabbix.alerts`
18/08/28 12:15:17 INFO hive.HiveImport: Loading uploaded data into Hive
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/08/28 12:15:20 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
18/08/28 12:15:24 INFO hive.HiveImport: Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

At this last step, the import hangs!

What could be the problem?

1 ACCEPTED SOLUTION

New Contributor

Mahesh, you nailed it!

To get this to work, I created the file "beeline-hs2-connection.xml" in ${HIVE_CONF_DIR} (i.e. /etc/hive/conf) and put the credentials there.

See https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
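For reference, a minimal beeline-hs2-connection.xml might look like the sketch below. The user/password values are placeholders; the property names are the ones documented on the Hive wiki page linked above.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Credentials picked up automatically by Beeline/HS2 clients -->
  <property>
    <name>beeline.hs2.connection.user</name>
    <value>hive</value>
  </property>
  <property>
    <name>beeline.hs2.connection.password</name>
    <value>your_password_here</value>
  </property>
</configuration>
```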


9 Replies

@Eugene Mogilevsky

Can you check the HS2 logs and ZooKeeper logs for issues? It seems the connection to HS2 via ZooKeeper is not going through.


No problems were found in the HS2 and ZooKeeper logs.

However, when connecting to Hive with Beeline, the server asks for a username and password.

Could something like this be happening when Sqoop connects to Hive? If so, how can I fix it?

[hive@hdp00]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default: hive
Enter password for jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default: *********
18/08/28 14:35:21 [main]: INFO jdbc.HiveConnection: Connected to hdp02:10000
Connected to: Apache Hive (version 3.1.0.3.0.0.0-1634)
Driver: Hive JDBC (version 3.1.0.3.0.0.0-1634)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.0.0.0-1634 by Apache Hive
0: jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb> 
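As a quick check, the credentials can also be passed to Beeline non-interactively. This is only a sketch: the hostnames are taken from the transcript above, and the password is a placeholder.

```shell
# Connect through the same ZooKeeper discovery URL, supplying credentials up front
beeline -u "jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
        -n hive -p 'your_password' \
        -e "show databases;"
```

If this connects without prompting, then supplying the same credentials to Sqoop's Hive step should unblock the import.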


The sqoop-create-hive-table command has the same effect!
Should --hive-import work in HDP 3.0?

Or will I have to go back to HDP 2.6?

$ sqoop-create-hive-table --connect jdbc:mysql://zabbix.amb.corp/zabbix --username *** --password *** --table alerts --hive-table t1 --verbose
...
18/08/29 14:40:07 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS `history_uint` ( `itemid` BIGINT, `clock` INT, `value` BIGINT, `ns` INT) COMMENT 'Imported by sqoop on 2018/08/29 14:40:07' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE

18/08/29 14:40:08 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hdp01.amb.corp:8020/user/sqoop/history_uint' INTO TABLE `history_uint`

18/08/29 14:40:09 INFO hive.HiveImport: Loading uploaded data into Hive

18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.

18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

18/08/29 14:40:11 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

18/08/29 14:40:14 INFO hive.HiveImport: Connecting to jdbc:hive2://hdp03.amb.corp:2181,hdp02.amb.corp:2181,hdp01.amb.corp:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

New Contributor

I believe I have the exact same problem. I get more details if I enable debug logging before running Sqoop:

export HADOOP_ROOT_LOGGER=DEBUG,console

Note that my cluster is not Kerberized. Here are the debug messages I get while the job appears to be hung. It never finishes; it just keeps repeating these messages at roughly five-minute intervals.

18/08/29 09:13:23 INFO hive.HiveImport: Connecting to jdbc:hive2://node1:2181,node1:2181,node1:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

18/08/29 09:13:24 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 executed

18/08/29 09:13:27 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 expired

18/08/29 09:13:27 DEBUG impl.LeaseRenewer: Lease renewer daemon for [] with renew id 1 exited

18/08/29 09:13:47 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.11:41588 from agileanalytics: closed

18/08/29 09:13:47 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.11:41588 from agileanalytics: stopped, remaining connections 1

18/08/29 09:13:48 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.10:8020 from agileanalytics: closed

18/08/29 09:13:48 DEBUG ipc.Client: IPC Client (505973878) connection to node2/172.21.0.10:8020 from agileanalytics: stopped, remaining connections 0

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initialize connection to node node3:6667 (id: 1002 rack: null) for sending metadata request

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initiating connection to node node3:6667 (id: 1002 rack: null)

18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.bytes-sent

18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.bytes-received

18/08/29 09:18:18 DEBUG metrics.Metrics: Added sensor with name node-1002.latency

18/08/29 09:18:18 DEBUG network.Selector: [Producer clientId=producer-1] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 1002

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Completed connection to node 1002. Fetching API versions.

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Initiating API versions fetch from node 1002.

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Sending metadata request (type=MetadataRequest, topics=ATLAS_HOOK) to node node3:6667 (id: 1001 rack: null)

18/08/29 09:18:18 DEBUG clients.Metadata: Removing unused topic ATLAS_HOOK from the metadata list, expiryMs 1535559498233 now 1535559498257

18/08/29 09:18:18 DEBUG clients.Metadata: Updated cluster metadata version 3 to Cluster(id = 9eDSa9DhRMOTMZguD63M-w, nodes = [node3:6667 (id: 1002 rack: null), node3:6667 (id: 1003 rack: null), node3:6667 (id: 1001 rack: null)], partitions = [Partition(topic = ATLAS_HOOK, partition = 0, leader = 1001, replicas = [1001], isr = [1001], offlineReplicas = [])])

18/08/29 09:18:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Recorded API versions for node 1002: (Produce(0): 0 to 5 [usable: 5], Fetch(1): 0 to 6 [usable: 6], ListOffsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 5], LeaderAndIsr(4): 0 to 1 [usable: 1], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 4], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 1], ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], AlterReplicaLogDirs(34): 0 [usable: 0], DescribeLogDirs(35): 0 [usable: 0], SaslAuthenticate(36): 0 [usable: 0], CreatePartitions(37): 0 [usable: 0])

18/08/29 09:22:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Node -1 disconnected.

18/08/29 09:22:18 DEBUG clients.NetworkClient: [Producer clientId=producer-1] Sending metadata request (type=MetadataRequest, topics=) to node node3:6667 (id: 1002 rack: null)

18/08/29 09:22:18 DEBUG clients.Metadata: Updated cluster metadata version 4 to Cluster(id = 9eDSa9DhRMOTMZguD63M-w, nodes = [node3:6667 (id: 1003 rack: null), node3:6667 (id: 1002 rack: null), node3:6667 (id: 1001 rack: null)], partitions = [])

Rising Star

I am also having a similar issue with Hive import on HDP 3. Has anyone been successful yet? My Sqoop command successfully pulls the database into HDFS but hangs on the ZooKeeper connect as well.

Expert Contributor

Since the Hive CLI client has been removed, options like --hive-import and sqoop-create-hive-table will wait for credentials to be entered. If you are using the Sqoop CLI (not Oozie), supply the Hive credentials just as you do via Beeline, and that should do it. If that is not acceptable, change the Sqoop syntax to use the hcatalog options to achieve the same result.
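A sketch of the HCatalog variant, adapting the asker's original command (connection string and credentials are the same placeholders; the --hcatalog-* flags are standard Sqoop 1.4.x options):

```shell
# Write through HCatalog instead of the removed Hive CLI client
sqoop import --connect jdbc:mysql://mysql_host/zabbix --username '***' --password '***' \
  --table alerts \
  --hcatalog-database zabbix \
  --hcatalog-table alerts \
  --create-hcatalog-table \
  --verbose
```

This path talks to the metastore via HCatalog, so it does not prompt for HiveServer2 credentials the way the Beeline-based --hive-import does.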


Expert Contributor

@Roger Sliva, yes, that's another way of getting it done.

Explorer

You can also modify the hive.distro script so that the login credentials are supplied within the script itself.