Member since: 07-30-2019
Posts: 49
Kudos Received: 3
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 264 | 01-17-2020 01:21 AM
 | 2189 | 10-24-2019 10:32 PM
 | 2240 | 10-23-2019 11:39 PM
 | 2254 | 10-23-2019 04:31 AM
04-30-2020
07:19 AM
@Djain Hi Deepa, It seems the Hive clients are missing from the NodeManagers, which could be causing the issue. Can you please make sure the Hive clients are installed on all the NodeManager nodes and then confirm whether you are still hitting the issue? If you are still facing the problem, please upload the complete YARN application logs. Thanks, Rohit Rai Malhotra
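A quick way to verify the client bits are present (a minimal sketch, assuming an RPM-based HDP install; package names may differ on your distribution):
# run on each NodeManager node; expect hive / hive-hcatalog client packages to be listed
rpm -qa | grep -i hive
# alternatively, check that the hive command is on the PATH
which hive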
01-17-2020
01:28 AM
@raghu9raghavend I see that you are using the below connect string:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;transportMode=binary;httpPath=cliservice;
You are using "transportMode=binary" but also providing "httpPath=cliservice". "httpPath=cliservice" should be provided only when "transportMode=http". Thus, if your HiveServer2 is running in HTTP mode, please set "transportMode=http" in the connect string:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;transportMode=http;httpPath=cliservice;
Alternatively, if your HiveServer2 is in binary mode, remove both options:
jdbc:hive2://ZK1:2181,ZK2:2181,ZK3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveServer2;
01-17-2020
01:21 AM
1 Kudo
@kentlee406 The issue seems to be on the YARN side, more specifically with the ResourceManager. We can see the below message:
20/01/16 13:13:12 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
20/01/16 13:13:15 WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
It would be great if you could check whether there are any other YARN applications running in your cluster. I would also suggest a complete restart of your YARN services.
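To see what else is occupying the cluster, the standard YARN CLI check from any cluster node:
# list applications currently accepted or running
yarn application -list -appStates RUNNING,ACCEPTED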
10-24-2019
10:43 PM
Hi @Atena-Dev-Team, It seems you are trying to connect to Hive using either spark-sql or the Spark Thrift Server. Can you please take Spark out of the picture and confirm whether you hit the same issue via Hive itself? Try accessing the table via the Hive CLI, or via beeline connected to HiveServer2.
10-24-2019
10:32 PM
2 Kudos
Hi @BTKR/Tej, Your subject and your actual question differ a lot; they are two different things. If you are concerned about the number of connections going to the Metastore database from the Hive Metastore process, you can use the below approach:
1. Find the PID of the HMS process on the server using the below command:
ps -ef | grep -i hivemetastore
2. Once you have the PID, get the output of the below command:
lsof -p PID | grep ESTABLISHED
This will give you the list of all the connections made to and by the HiveMetastore process. It will also include the connections made "TO" the HiveMetastore process from clients, i.e. from Hive CLI shells. Look for your database type in the output to identify the connections FROM HiveMetastore to the HMS DB. For example, in my environment, the entries containing "mysql" are the connections made from the HiveMetastore process to the HMS DB.
As per the question in your description, if you want to find out how many threads are trying to connect to the database at any time, you can collect a jstack of the HMS process and then look for the threads referring to the mysql calls (mysql is the database type in my case; look for oracle or postgres if one of those is your database type).
Also, I get a feeling that you are concerned by the number of connections being made to the database. You can check the below property via the Hive CLI and beeline (this property will not be listed in Ambari as it is built in):
set datanucleus.connectionPool.maxPoolSize; --This will give you the connection pool size. 10 is the default value; if it is set to something else, please let me know.
Also, share the output of the below query:
set datanucleus.connectionPoolingType;
Do confirm the exact HDP version you are on!
Please note that a connection pool size of 10 does not mean there will only be 10 connections to the HMS DB; there can be more. But if this value is increased, the number of connections to the HMS DB can also increase significantly. Sometimes it is suggested to increase the connection pool size to accommodate a heavy query load on Hive. So, if you are using your Hive services extensively and the connection pool size is set to a higher value, I would suggest raising the limit on the HMS DB side to allow a higher number of connections. For example, on MySQL there is max_connections; you can increase it to 1000 or more.
Let me know if the above information was helpful! Thanks, Rohit Rai Malhotra
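A minimal way to count metastore threads referencing the backing database from a thread dump (a sketch; assumes a MySQL-backed HMS, swap the grep pattern for oracle/postgres as needed, and <HMS_PID> is the PID found above):
# take a thread dump of the HMS process and count lines mentioning the MySQL driver classes
jstack <HMS_PID> | grep -ic mysql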
10-23-2019
11:45 PM
You can change the limits from Ambari as well, under Ambari > YARN > Configs > Advanced. Restart YARN after increasing the limit.
10-23-2019
11:39 PM
Hey @Jesse_s, You can check the below link for documentation: https://www.cloudera.com/downloads.html Also, please accept the answer, if it helped.
10-23-2019
11:34 PM
Hi @soumya, Can you please confirm exactly how the issue was resolved? Please do accept the answer which helped you.
10-23-2019
05:28 AM
Hi @Gerva, You need to have the below properties set to be able to use Hive ACID functionality:
hive> set hive.support.concurrency = true;
hive> set hive.enforce.bucketing = true;
hive> set hive.exec.dynamic.partition.mode = nonstrict;
hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on = true;
hive> set hive.compactor.worker.threads = 1; -- any positive number, on at least one instance of the Thrift metastore service
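With those properties in place, a minimal transactional table to verify ACID works (a sketch; the table and column names are just examples):
CREATE TABLE acid_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
INSERT INTO acid_demo VALUES (1, 'test');
UPDATE acid_demo SET name = 'updated' WHERE id = 1;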
10-23-2019
05:21 AM
Hi @sundar_gampa, I am not sure why the source command is not working. But rather than running the shell command from beeline, you can create a shell script that sources the required file and then invokes beeline, as sketched below.
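A minimal sketch of such a wrapper (the file paths, JDBC URL, and query file below are placeholders for your environment):
#!/bin/bash
# source the environment/variables file first
source /path/to/your_env.sh
# then launch beeline against HiveServer2 and run the query file
beeline -u "jdbc:hive2://hs2-host:10000/default" -f /path/to/query.hql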
10-23-2019
05:18 AM
Hello @hiveexport, Can you please share the exact sqoop command you are running at your end? To resolve the issue, import the data into an ORC Hive table rather than a TextFormat Hive table. Example of how to create an ORC table in Hive:
CREATE TABLE TEST_ORC(
Id INT,
Name STRING)
STORED AS ORC;
The ORC table should be able to handle the "\n" characters in your data.
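One way to land the data directly into that ORC table is Sqoop's HCatalog integration (a sketch; the connection string, credentials, source table, and mapper count are placeholders for your environment):
sqoop import \
  --connect jdbc:mysql://source-db-host/source_db \
  --username sqoop_user -P \
  --table SOURCE_TABLE \
  --hcatalog-database default \
  --hcatalog-table TEST_ORC \
  -m 4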
10-23-2019
05:01 AM
Hi @soumya, It seems there is some confusion here. As per my understanding, you are trying to connect to the Spark Thrift Server via beeline. Please correct me if I am wrong. For you to be able to connect to the Spark Thrift Server via beeline, you need to make sure you are providing the correct hostname and port number in the JDBC URL you use in beeline. For example: jdbc:hive2://host:portnumber/ Here, "host" is the hostname of the server where the Spark Thrift Server is running. Let us say it is running on abc.soumya.com. The default port number for the Spark Thrift Server is 10000, but this port can be configured to something else as well, so you need to find the correct port number. Thus, your connect string would look like below: jdbc:hive2://abc.soumya.com:10000/ You can refer to the below link for more information on this: https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
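Putting that together, the beeline invocation would look roughly like this (hostname and port are the example values above; adjust them to your environment):
beeline -u "jdbc:hive2://abc.soumya.com:10000/"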
10-23-2019
04:52 AM
Hello @Rak, Can you please share the exact sqoop command you are running at your end? If your table is in TextFormat and you are getting more rows in Hive than in the source, it means your source data contains newline characters such as "\n", or other special characters, in some of your fields. To resolve the issue, import the data into an ORC Hive table. Example of how to create an ORC table in Hive:
CREATE TABLE TEST_ORC(
Id INT,
Name STRING)
STORED AS ORC;
If you do not want to keep the data in an ORC table, you can move the data from the ORC table to a TextFormat table later within Hive, as shown below.
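The copy back into a TextFormat table is a plain insert-select (a sketch; text_table is a hypothetical target table with the same schema):
INSERT OVERWRITE TABLE text_table SELECT * FROM TEST_ORC;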
10-23-2019
04:45 AM
Hi @ssulav, I see the below error in the HS2 logs shared:
ERROR [HiveServer2-Background-Pool: Thread-886]: SessionState (:()) - Vertex failed, vertexName=Map 1, vertexId=vertex_1571760131080_0019_1_00, diagnostics=[Task failed, taskId=task_1571760131080_0019_1_00_000380, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: unable to create new native thread
The above error is reported by the YARN application. It is not related to any Hadoop, YARN or Hive configuration; rather, it is an error returned by the OS when it cannot create new threads for the process. You need to check the ulimits of the yarn user on the NodeManager and ResourceManager nodes, though it is more likely this error is coming from the NodeManager running the above-mentioned task ID. You can identify the host where the task ran by searching for the vertex or task ID in the YARN application logs. The job runs as the yarn user; check for similar errors in the NodeManager logs on that host. You can try increasing the limit for the "ulimit -u" (max user processes) option.
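A quick way to check and raise that limit (a sketch; 65536 is an arbitrary example value, and the limits.conf entries assume a standard Linux setup):
# check the max-user-processes limit as seen by the yarn user
su - yarn -s /bin/bash -c 'ulimit -u'
# to raise it persistently, add entries like these to /etc/security/limits.conf:
#   yarn soft nproc 65536
#   yarn hard nproc 65536
# then restart the NodeManager so the new limit is picked up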
10-23-2019
04:31 AM
Hi Jesse, There seems to be some encryption enabled at your end that does not allow the other user to access the ODBC driver connection. This could be some Windows-level configuration or a security parameter. I found the below articles, relevant to a similar error: https://stackoverflow.com/questions/30886839/key-not-valid-for-use-in-specified-state-how-to-load-profile-of-user-to-imper https://bytes.com/topic/asp-net/answers/566477-dpapi-decrypt-error-decryption-failed-key-not-valid-use-specified-state
10-23-2019
04:23 AM
Hello, Can you please share the exact sqoop command you are running at your end? Also, I think your Hive table is a TextFormat table (correct me if I am wrong). If the table is in TextFormat, this suggests your source SQL DB data contains newline characters such as "\n" in some of your fields. To resolve the issue, import the data into an ORC Hive table. Example of how to create an ORC table in Hive:
CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT)
STORED AS ORC;
10-23-2019
04:09 AM
The query execution was not working in the first case, i.e. when the used capacity was close to 100%, because there was no capacity left in the queue to start the Query Coordinator app(s). A Query Coordinator app is a must to run a query on LLAP. Without one, you will be able to connect to LLAP, but no query that requires a Tez task will run. There is a property in Hive, "Maximum Total Concurrent Queries", which determines how many of these additional apps run under the LLAP queue. These apps coordinate between the different mapper and reducer tasks running inside the LLAP daemons (in the llap0 app). Thus, if you have LLAP configured with the below details, it will not work with a 100GB "llap" YARN queue capacity; it will require a minimum of 108GB of "llap" YARN queue capacity (assuming the default AM size is 4GB):
Memory per Daemon = 50GB
Number of Daemons = 2
Maximum Total Concurrent Queries = 1
In the above case, the "llap0" application needs approximately 104GB (100GB for the 2 daemons plus its AM, usually 4GB). The "llap0" app always runs with (Number of Daemons + 1) containers, the extra one being the AM. Here, as the Number of Daemons is 2, "llap0" runs with 3 containers (2 containers of 50GB each and one AM container). Apart from "llap0", there will be a Tez application with a name such as "HIVE-b52bb8c2-db60-4a79-9526-e61c9ee48261" (example) within the "llap" YARN queue. This is the Query Coordinator app; as "Maximum Total Concurrent Queries" is set to 1, you will see only one such application. If it is set to 2, you will see 2 such apps, and so on. Thus, your "llap" YARN queue needs another 4GB (default AM size) for the Query Coordinator app to start, so the total requirement for the "llap" YARN queue is 104GB + 4GB = 108GB.
Total YARN queue requirement for LLAP = {(Number of Daemons) * (Memory per Daemon) + Default AM Size} + {(Maximum Total Concurrent Queries) * (Default AM Size)}
In the above formula, the first part is the "llap0" app and the second part is the Query Coordinator apps. Hope I have explained it well! Let me know if you get confused!
10-23-2019
03:53 AM
It would be great if you could share the exact error you are facing. Also, can you please try creating the table as below:
CREATE EXTERNAL TABLE IF NOT EXISTS tsvtab (
name string,
region_code int,
sal int,
add string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
load data inpath 'hdfs://hadoophdinsightigi-2019-10-21t07-33-15-078z@hadohdistorage.blob.core.windows.net/user/HadoopPOCDir/data.tsv' into table tsvtab;
OR
CREATE EXTERNAL TABLE IF NOT EXISTS tsvtab (
name string,
region_code int,
sal int,
add string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
load data inpath '/user/HadoopPOCDir/data.tsv' into table tsvtab;
NOTE: I have changed the path from "wasb://" to "hdfs://" in the first command and removed the unwanted details from the second command.
04-15-2018
07:54 AM
@PJ You can increase or decrease the number of mappers used in a sqoop command using the -m option. This helps you determine the number of parallel connections made to Netezza. Though, please note, this does not always work; sometimes the number of mappers is still decided on the basis of splits. The impact of this is on YARN: more mappers add more load on YARN, as the import runs as a Map-Reduce job.
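For illustration, a sqoop invocation with an explicit mapper count (a sketch; the connection string, credentials, table, and split column are placeholders, and the Netezza JDBC URL format should be checked against your driver documentation):
sqoop import \
  --connect jdbc:netezza://nz-host:5480/sales_db \
  --username nz_user -P \
  --table ORDERS \
  --split-by ORDER_ID \
  -m 8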
04-15-2018
07:54 AM
@Anurag Mishra Can you try the connect command as below:
!connect jdbc:hive2://<hiveserver2_host>:10000/default;principal=hive/_HOST@SOLON.PRD
Please note that I have changed principal=hive/<host>@SOLON.PRD to principal=hive/_HOST@SOLON.PRD. Do replace <hiveserver2_host> with the hostname of your HiveServer2.
04-14-2018
06:22 AM
@Tauseef Ahmad No, Hive Metastore and HiveServer2 are still on version 1.2.x.x in HDP-2.6.x. Having said that, there are many functionalities of Hive 2.1 which are backported to the existing Hive 1.2 in the HDP stack.
04-14-2018
05:56 AM
@Georg Heiler Please confirm if hive.server2.enable.doAs is set to false at your end; if not, can you please set it to false and then test (hive.server2.enable.doAs=true is not supported with LLAP). Also, confirm that "Maximum Total Concurrent Queries" is not configured as 0 for your LLAP.
04-14-2018
05:38 AM
@kumar sonal - Please share the exact error you see while connecting to Hive via ODBC. - Do confirm if you are using the Hortonworks-provided Simba ODBC driver. - Is your cluster Kerberized?
04-14-2018
05:36 AM
@Ans Butt Try using the Hortonworks-provided Simba ODBC driver, which can be downloaded here.
04-14-2018
05:31 AM
@Sirine It could be related to the below issue: https://issues.apache.org/jira/browse/HIVE-16828 Though, I am not sure whether it is a view or a table at your end. As a workaround, can you try recreating the table and then running the query again.
04-14-2018
05:26 AM
@Gaurang Shah Please check for the exact error in the YARN application log corresponding to the query run. The YARN app logs should give us a better understanding of the Vertex failure issue. Attach them here, if you can.
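To pull the logs, the standard YARN CLI works (the application ID below is a placeholder; take the real one from the ResourceManager UI or the query's console output):
yarn logs -applicationId application_1234567890123_0001 > app_logs.txt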
04-14-2018
05:23 AM
@Gerhard Neuhofer Can you share the command used to import the data to Hive? Also, I do not understand "the log directory is empty"; does this mean there are no logs from the Hive Metastore? If yes, it looks like a process crash. Check your /var/log/messages for any errors around that time. Do confirm that the core limit is set to unlimited in the "ulimit -a" output; this will generate a core file for analysis of the crash.
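A quick check and a session-level override (a sketch; for a service such as the metastore the limit usually has to be set in its startup environment or in limits.conf rather than in an interactive shell):
# look at the "core file size" line; it should read "unlimited"
ulimit -a
# allow core dumps for the current shell session
ulimit -c unlimited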
04-14-2018
05:19 AM
@Burhan ud Din Qazi There is no such functionality in Hive, though I think you can achieve this by adding a constraint over all the columns of the table, as sketched below.
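For illustration, an informational constraint spanning all columns (a sketch; the table and column names are hypothetical, this assumes a Hive version that supports table constraints such as Hive 3 / HDP 3, and Hive records such constraints as metadata only, it does not enforce them):
CREATE TABLE person (
  id INT,
  name STRING,
  email STRING,
  CONSTRAINT uq_all_cols UNIQUE (id, name, email) DISABLE NOVALIDATE
);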
04-14-2018
05:11 AM
@pk reddy - Can you try running the command as the hive user and not the root user? - Check whether you have enough space under the /tmp directory, as a hive login creates logs there.
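A quick way to check both points (plain Linux commands, nothing Hive-specific assumed):
# switch to the hive user before launching the Hive CLI
su - hive
# verify there is free space where the CLI writes its logs
df -h /tmp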
10-10-2017
08:54 AM
Hi @regie canada, You can use ? to pass the parameters in place of the usual @. Using @ is specific to Transact-SQL. The driver supports SQL-92 syntax, where parameter markers in the SQL statement are expected to be question marks. Thanks, Rohit Rai Malhotra
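For illustration, the same parameterized statement in both styles (the table and column names are hypothetical):
-- Transact-SQL style, not understood by the driver:
SELECT name FROM customers WHERE region = @region;
-- SQL-92 style parameter marker expected by the driver:
SELECT name FROM customers WHERE region = ?;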