Member since: 07-30-2019
Posts: 117
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Views | Posted |
---|---|
1385 | 01-17-2020 01:21 AM |
13120 | 10-24-2019 10:32 PM |
8669 | 10-23-2019 11:39 PM |
8683 | 10-23-2019 04:31 AM |
01-17-2020
01:28 AM
@raghu9raghavend I see that you are using the below connect string:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;transportMode=binary;httpPath=cliservice;
You are setting "transportMode=binary" and also providing "httpPath=cliservice". "httpPath=cliservice" should be provided only when "transportMode=http". Thus, if your HiveServer2 is running in HTTP mode, please set "transportMode=http" in the connect string:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;transportMode=http;httpPath=cliservice;
Alternatively, if you have HiveServer2 running in binary mode, remove both options:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;
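The rule in this answer (send httpPath only together with transportMode=http) can be sketched as a small helper that assembles the connect string. This is only an illustration: the function name `build_hs2_url` and its parameters are hypothetical, and the ZooKeeper hosts are the placeholders from the post.

```python
def build_hs2_url(zk_quorum, namespace="hiveServer2", http_mode=False, http_path="cliservice"):
    """Assemble a ZooKeeper-discovery JDBC URL for HiveServer2 (hypothetical helper)."""
    options = [
        "serviceDiscoveryMode=zooKeeper",
        f"zooKeeperNamespace={namespace}",
    ]
    if http_mode:
        # httpPath is only meaningful together with transportMode=http.
        options.append("transportMode=http")
        options.append(f"httpPath={http_path}")
    # In binary mode both options are left out, as suggested in the post.
    return f"jdbc:hive2://{zk_quorum}/;" + ";".join(options) + ";"


zk = "ZK1:2181,ZK2:2181,ZK3:2181"  # placeholder quorum from the post
print(build_hs2_url(zk, http_mode=True))
print(build_hs2_url(zk))
```

Building the string in one place makes it hard to end up with the mixed "transportMode=binary" plus "httpPath" combination from the question.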
01-17-2020
01:21 AM
1 Kudo
@kentlee406 The issue seems to be on the Yarn side, more specifically with the Resource Manager. We can see the below message:
20/01/16 13:13:12 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
20/01/16 13:13:15 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
It would be great if you can check whether there are any other Yarn applications running in your cluster. I would also suggest a complete restart of your Yarn services.
10-24-2019
10:43 PM
Hi @Atena-Dev-Team, It seems you are trying to connect to Hive using either spark-sql or the Spark Thriftserver. Can you please take Spark out of the picture and confirm whether you face the same issue via Hive? Try accessing the table via Hive CLI, or via beeline when connected to HiveServer2.
10-24-2019
10:32 PM
4 Kudos
Hi @BTKR/Tej, Your subject and your actual question are two completely different things.
If you are concerned about the number of connections going to the Metastore database from the Hive Metastore (HMS) process, you can check it as follows:
1. Find the PID of the HMS process on the server using the below command:
ps -ef | grep -i hivemetastore
2. Once you have the PID, get the output of the below command:
lsof -p PID | grep ESTABLISHED
This will give you the list of all connections being made to and by the Hivemetastore process. It also includes the connections made TO the Hivemetastore process from clients, i.e. from the Hive CLI shells. Please look for the database type in the output to identify the connections FROM Hivemetastore to the HMS DB. For example, in the output on my side, all the lines that have "mysql" in them are connections made from the HiveMetastore process to the HMS DB.
As per the question in your description, if you want to find out how many threads are trying to connect to the database at any given time, you can collect a jstack of the HMS process and then look for the threads referring to mysql calls (mysql is the database type in my case; look for oracle or postgres if one of those is your database type).
Also, I get the feeling that you are concerned about the number of connections being made to the database. You can check the below property via Hive CLI or beeline (this property will not be listed in Ambari, as it is built in):
set datanucleus.connectionPool.maxPoolSize; -- This returns the connection pool size.
10 is the default value; if it is set to something else, please let me know. Also, share the output of the below query:
set datanucleus.connectionPoolingType;
Please also confirm the exact HDP version you are on.
Please note that a connection pool size of 10 does not mean there will be only 10 connections to the HMS DB; there can be more. If this value is increased, the number of connections to the HMS DB increases as well. Sometimes it is suggested to increase the connection pool size to accommodate a heavy load of queries on Hive. So, if you are using your Hive services extensively and the connection pool size is set to a higher value, I would suggest fixing this on the HMS DB side by allowing a larger number of connections. For example, MySQL has max_connections, which you can increase to 1000 or more.
Let me know if the above information was helpful!
Thanks, Rohit Rai Malhotra
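The counting step from this answer can be sketched in a few lines: given the text printed by `lsof -p PID`, split the ESTABLISHED connections into those that mention the database type and the rest. The function name `count_established` and the sample lines (hosts, PID, file descriptors) are hypothetical and only follow the usual lsof layout.

```python
from collections import Counter


def count_established(lsof_output, db_marker="mysql"):
    """Split ESTABLISHED connections into those that mention db_marker and the rest."""
    counts = Counter()
    for line in lsof_output.splitlines():
        if "ESTABLISHED" not in line:
            continue
        # By default lsof prints well-known ports by service name, e.g. ":mysql" for 3306.
        key = "to_db" if db_marker in line else "other"
        counts[key] += 1
    return counts


# Hypothetical sample lines, not taken from a real cluster.
sample = """\
java 4242 hive 510u IPv4 81234 0t0 TCP hms.example.com:45112->db.example.com:mysql (ESTABLISHED)
java 4242 hive 511u IPv4 81235 0t0 TCP hms.example.com:45114->db.example.com:mysql (ESTABLISHED)
java 4242 hive 530u IPv4 81301 0t0 TCP hms.example.com:9083->client.example.com:51544 (ESTABLISHED)
java 4242 hive 12u IPv4 80001 0t0 TCP *:9083 (LISTEN)
"""
print(count_established(sample))
```

If lsof is run with numeric ports (the -P option), the marker would have to be the port number of the database instead of "mysql".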
10-23-2019
11:45 PM
You can try changing the limits from Ambari as well, under Ambari > Yarn Configs > Advanced. Restart Yarn after increasing the limit.
10-23-2019
11:39 PM
Hey @Jesse_s, You can check the below link for documentation: https://www.cloudera.com/downloads.html Also, please accept the answer, if it helped.
10-23-2019
11:34 PM
Hi @soumya, Can you please confirm exactly how the issue was resolved? Please also accept the answer that helped you.
10-23-2019
05:21 AM
Hi @sundar_gampa, I am not sure why the source command is not working. However, rather than running the shell command from within beeline, you can create a shell script that first sources the required file and then runs the beeline command.
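The same idea (source the file first, then start beeline from that shell) can be sketched as a small wrapper. This is an illustration under assumptions: the helper name `run_after_source`, the file paths, and the JDBC URL are hypothetical placeholders, and the environment file is assumed to export the variables it sets.

```python
import subprocess


def run_after_source(env_file, command):
    """Source env_file in a shell, then run command (a list of arguments) from that shell."""
    # "." is the POSIX spelling of "source"; exec replaces the shell with the command,
    # so variables exported by env_file are visible to it.
    script = '. "$1" && shift && exec "$@"'
    return subprocess.run(
        ["sh", "-c", script, "sh", env_file, *command],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Hypothetical paths and URL; replace them with your own values.
    result = run_after_source(
        "/path/to/env.sh",
        ["beeline", "-u", "jdbc:hive2://host:portnumber/", "-f", "/path/to/query.hql"],
    )
    print(result.returncode, result.stderr)
```

Use an absolute path for the environment file, since "." looks up a bare file name on PATH rather than in the current directory.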
10-23-2019
05:01 AM
Hi @soumya, It seems there is some confusion here. As per my understanding, you are trying to connect to the Spark Thrift Server via beeline. Please correct me if I am wrong. To be able to connect to the Spark Thrift Server via beeline, you need to make sure you are providing the correct hostname and port number in the JDBC URL you use in beeline. For example:
jdbc:hive2://host:portnumber/
Here, "host" is the hostname of the server where the Spark Thriftserver is running. Let us say it is running on abc.soumya.com. The default port number for the Spark Thriftserver is 10000, but it can be configured to something else, so you need to find the correct port number. Thus, your connect string would look like below:
jdbc:hive2://abc.soumya.com:10000/
You can refer to the below link for more information on this: https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
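Before trying the connect string in beeline, it can help to confirm that something is actually listening on the host and port you picked. A minimal sketch, assuming plain TCP reachability is a good enough first check; the helper name `port_is_open` is hypothetical, and it does not verify that the listener is really a Thrift server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve.
        return False


# Example with the values from the post:
#   port_is_open("abc.soumya.com", 10000)
```

If this returns False, the hostname or port number in the JDBC URL is the first thing to recheck.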
10-23-2019
04:45 AM
Hi @ssulav, I see the below error in the HS2 logs shared:
ERROR [HiveServer2-Background-Pool: Thread-886]: SessionState (:()) - Vertex failed, vertexName=Map 1, vertexId=vertex_1571760131080_0019_1_00, diagnostics=[Task failed, taskId=task_1571760131080_0019_1_00_000380, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: unable to create new native thread
The above error is reported by the Yarn application. It is not related to any Hadoop, Yarn, or Hive configuration. Rather, it is an error returned by the OS because it was not able to create new threads for the process. You need to check the ulimits of the Yarn user on the NodeManager and ResourceManager nodes, though it is more likely that this error is coming from the NodeManager running the above-mentioned Task ID. You can identify the host where the Task is running by searching for the vertex or Task ID in the Yarn application logs. The job would be running as the Yarn user, so also check for similar errors in the NodeManager logs on the same host. You can try increasing the limit reported by "ulimit -u".
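The limit that "ulimit -u" reports can also be read programmatically, which is handy when checking several nodes. A minimal sketch, assuming a Linux host and that it is run as the same user as the failing process (the Yarn user here); the helper names `nproc_limits` and `fmt` are hypothetical.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC pair; `ulimit -u` shows the soft value."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def fmt(value):
    """Render a limit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = nproc_limits()
print(f"max user processes: soft={fmt(soft)} hard={fmt(hard)}")
```

On Linux, threads count against this per-user limit, which is why a low value can surface as "unable to create new native thread".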