Member since
03-29-2020
110
Posts
10
Kudos Received
16
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 796 | 01-08-2022 07:17 PM |
| | 2522 | 09-22-2021 09:39 AM |
| | 11604 | 09-14-2021 04:21 AM |
| | 2222 | 09-01-2021 10:28 PM |
| | 2763 | 08-31-2021 08:04 PM |
09-17-2021
06:56 AM
Hi @manojamr I am glad to hear your original issue got resolved. Per your last comment, the query took 9.5 hours to complete. In this case, we need to determine whether there is a delay, a hang, or a resource crunch, or whether this runtime is normal. To figure that out we would need the beeline console output, the QueryId, the application log, and all HS2 and HMS logs. It would be great if you could create a case with Cloudera; we would be happy to assist you. If you are happy with the reply, mark it "Accept as Solution".
09-17-2021
06:44 AM
1 Kudo
Hi @Kiddo Could you check whether the links below help with your query?

https://community.cloudera.com/t5/Support-Questions/Hive-Do-we-have-checksum-in-hive/td-p/104490
https://community.cloudera.com/t5/Support-Questions/Hive-Can-t-get-the-md5-value/m-p/117696
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

If you are happy with the reply, mark it "Accept as Solution".
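As a starting point, recent Hive releases (1.3.0/2.0.0 and later) ship built-in `md5()` and `sha2()` UDFs documented on the LanguageManual UDF page linked above. A minimal sketch, assuming a Hive version with these UDFs; the table and column names are hypothetical:

```sql
-- Built-in hash UDFs (Hive 1.3.0 / 2.0.0+)
SELECT md5('ABC');        -- hex MD5 digest of the string
SELECT sha2('ABC', 256);  -- SHA-256 variant

-- Per-row checksum over a hypothetical table, concatenating
-- the columns with a delimiter before hashing
SELECT id,
       md5(concat_ws('|', cast(id AS string), name)) AS row_checksum
FROM customers;
```

On older Hive versions without these UDFs, the community threads above discuss alternatives such as computing checksums outside Hive on the underlying HDFS files.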
09-14-2021
04:21 AM
Hi @manojamr

Step 1: Run the following commands to gather column statistics for every table involved in the query:

analyze table <TABLE-NAME> compute statistics;
analyze table <TABLE-NAME> compute statistics for columns;

Reference: https://cwiki.apache.org/confluence/display/Hive/StatsDev

Step 2: Set the following properties at the session level:

set hive.tez.container.size=10240;
set hive.tez.java.opts=-Xmx8192m;
set tez.runtime.io.sort.mb=4096;
set tez.task.resource.memory.mb=7680;
set tez.am.resource.memory.mb=10240;
set tez.am.launch.cmd-opts=-Xmx8192m;

Step 3: Re-run the job in the same beeline session. If it succeeds after Step 2, you are done. If the job fails again, I would request the following details:

1. The complete query
2. The beeline console output
3. The QueryId of the job
4. The HS2 and HMS logs
5. The application logs
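To confirm that Step 1 actually collected the statistics, you can inspect the table and column metadata afterwards. A minimal HiveQL sketch, assuming a hypothetical table `sales` with a column `amount`:

```sql
-- Gather table- and column-level statistics (table name is hypothetical)
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Verify: COLUMN_STATS_ACCURATE and numRows should appear
-- under Table Parameters in the output
DESCRIBE FORMATTED sales;

-- Verify a single column's stats (min/max/distinct count, etc.)
DESCRIBE FORMATTED sales amount;
```

If the statistics are present, the cost-based optimizer can pick better join orders and reducer counts, which is often what resolves this class of slowness.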
09-01-2021
11:44 PM
@Eric_B Yes, your understanding is correct.
09-01-2021
10:28 PM
1 Kudo
Hi @saikat As I understand it, you are running a merge query and it is failing with a java.lang.OutOfMemoryError.

Step 1: Run a major compaction on all the tables involved in the merge query (if they are ACID tables; otherwise skip this step). Once the major compaction is triggered, make sure it completes by running the "show compactions;" command in beeline. This reduces some of the stats-collection burden on Hive. How to run a major compaction:

Alter table <table name> compact 'MAJOR';

Step 2: Once Step 1 is done, set the following properties at the beeline session level and re-run the merge query:

set hive.tez.container.size=16384;
set hive.tez.java.opts=-Xmx13107m;
set tez.runtime.io.sort.mb=4096;
set tez.task.resource.memory.mb=16384;
set tez.am.resource.memory.mb=16384;
set tez.am.launch.cmd-opts=-Xmx13107m;
set hive.auto.convert.join=false;

The Tez container and AM size are set to 16 GB here. If the query still fails, you can increase them to 20 GB (hive.tez.java.opts and tez.am.launch.cmd-opts should then be set to roughly 80% of the container and AM size, which is 16384 MB). If the query succeeds at 16 GB, you can try decreasing the sizes to 14/12/10 GB to find the threshold between failing and succeeding; that way you can save resources. If you are happy with the comment, mark it "Accept as Solution".
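The compaction step above can be sketched end to end in HiveQL; a minimal example, assuming a hypothetical ACID table `orders`:

```sql
-- Trigger a major compaction on the hypothetical ACID table
ALTER TABLE orders COMPACT 'major';

-- Re-run until the row for `orders` reports a state such as
-- "ready for cleaning" or "succeeded" before proceeding to Step 2
SHOW COMPACTIONS;
```

The 80% heap guideline keeps the JVM heap (-Xmx) safely inside the YARN container: for a 16384 MB container that is roughly 13107 MB, and for a 20480 MB container roughly 16384 MB, leaving headroom for off-heap and JVM overhead.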
08-31-2021
08:04 PM
If I installed a later version of ZooKeeper (for example), would Ambari recognize that later version in its management? Or would it exist in parallel with the version of ZooKeeper packaged with 3.1.5?

> You have to install ZooKeeper, or any component, via Ambari only. If you install it manually (via yum or apt) on the server, Ambari will not recognize or manage it.

Grafana is running v6.4.2, but has a major security issue that was patched in later releases: https://grafana.com/blog/2020/06/03/grafana-6.7.4-and-7.0.2-released-with-important-security-fix/ Infra Solr is running Solr 7.7 and has an RCE vulnerability. This was patched in Solr 8.3, which is not part of Ambari 2.7.5's Infra Solr. The packaged ZooKeeper is 3.4.6, but SSL support was added in 3.5.5.

> As mentioned already, please create a support case with Cloudera along with the CVE number of the vulnerability so we can check with our team and confirm whether our product is affected by the security concern. If it is, we can provide a patch to address it.

If you are happy with the comment, mark it "Accept as Solution".
08-31-2021
01:36 AM
Hi @Eric_B

I saw some questions talking about "Patch Upgrades", but is there a guide to upgrading individual components in a cluster via Ambari or otherwise?

> You cannot upgrade individual components via Ambari. You can either install a component or upgrade to the next available HDP 3.x version, but I can see you are already on the latest version, 3.1.5. If you feel your Hadoop components have a particular vulnerability, please feel free to raise a case with Cloudera and we will check and clarify. If the vulnerability is legitimate and could harm your infrastructure, we can provide a patch for the issue; that way you can overcome it.

If you are happy with the comment, mark it "Accept as Solution".
08-22-2021
12:09 AM
Hi @Nil_kharat If the issue is still not resolved, you may need to check the HS2 logs and the application logs to figure out the slowness. As for how to track the jobs being run by each user:

1. Go to the RM UI > Running/Finished/Killed > check the User column.
2. CM > YARN > Applications > search by user there.

If you are happy with the response, mark it "Accept as Solution".
08-19-2021
12:59 AM
Hi @Nil_kharat Generally, in Hive you may see these kinds of issues: query slowness, query failures, configuration problems, alerts, services going down, vulnerability concerns, and the occasional bug.
08-16-2021
09:06 AM
Hi @Nil_kharat If your jobs are stuck in the ACCEPTED state, it is most likely because the queue does not have enough memory to launch the ApplicationMaster. You can click on a particular ACCEPTED job to see the details. Can you try increasing the Maximum AM Resource (Ambari > tile icon > YARN Queue Manager > the particular queue) to 50%, then re-run the query and check? If you are happy with the response, mark it "Accept as Solution".
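For reference, the Queue Manager change above maps to a capacity-scheduler property. A hedged sketch of the equivalent setting (the queue name `myqueue` is hypothetical, and on Ambari-managed clusters this should normally be changed through YARN Queue Manager rather than edited by hand):

```
# capacity-scheduler.xml fragment: allow up to 50% of the queue's
# capacity to be used for ApplicationMasters
yarn.scheduler.capacity.root.myqueue.maximum-am-resource-percent=0.5
```

Raising this lets more (or larger) AMs launch concurrently in the queue, which is usually what moves jobs out of ACCEPTED.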