About Shifu

Shifu · ‎06-29-2021

Hello @K_K

Once you run the query in Beeline, note its queryID and trace that queryID in the HiveServer2 logs. This shows how much time the query spends in the HTTP handler thread and in the background thread, and whether either stage is slow.

After that stage the job is submitted to YARN, so check the YARN application log for the query to see where it slows down: at the AM level, at container allocation, or at the task level. Together, these checks show where the time goes.

If it is a managed table, you can run a major compaction on the table to merge all the delta files into a single base file, so the query no longer has to scan many delta files in HDFS. You can also run an explain plan against the query to see its flow and how much data it processes, and run an analyze query against the table to collect table and column statistics, which can improve query performance.

Not every job can complete in under 4 seconds.

Reference:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ANALYZETABLE%3Ctable1%3ECACHEMETADATA
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/performance-tuning/content/hive_query_result_cache_ms_cache.html
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_hive_3_tables.html
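As a rough sketch, the explain, analyze, and compaction steps from the reply above could be run from a Beeline session like this. The database, table, and column names (sales_db.orders, region) are hypothetical, and the table is assumed to be an unpartitioned, transactional managed table; a partitioned table would need a PARTITION clause on the ANALYZE and COMPACT statements.

```sql
-- Hypothetical names: sales_db.orders and its column region are placeholders.
-- Assumes an unpartitioned, transactional (ACID) managed table.

-- Show the execution plan to see the stages and the estimated data volume.
EXPLAIN
SELECT region, count(*) FROM sales_db.orders GROUP BY region;

-- Collect table-level statistics (row count, size).
ANALYZE TABLE sales_db.orders COMPUTE STATISTICS;

-- Collect column-level statistics for the optimizer.
ANALYZE TABLE sales_db.orders COMPUTE STATISTICS FOR COLUMNS;

-- Request a major compaction so the delta files are merged into one base file.
ALTER TABLE sales_db.orders COMPACT 'major';

-- Compaction runs asynchronously; check its progress here.
SHOW COMPACTIONS;
```

Comparing the EXPLAIN output before and after collecting statistics is one way to see whether the plan changes.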

Shifu · ‎06-26-2021

Hi @PURUSHOTHAMAN_S I can see there are a lot of alerts (28) in Ambari. If I were you, I would start with the HDFS service and confirm that the NameNodes are up and running, because other services depend on HDFS to come up. Then check YARN, and after that you can concentrate on the rest. Check the Ambari startup logs to see why and where the startup is failing. Hope it helps.

Shifu · ‎06-23-2021

Hello @K_K Hope you are doing great. MapReduce2 and Tez can return output in under 4 seconds, but it depends on many factors, such as query complexity, queue sizing, input data, and resource availability.

Shifu · ‎06-20-2021

@Bryan_zh I believe HDP 3.1.5 supports Spark 2.x only. Please check the link below:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/spark-overview/content/analyzing_data_with_apache_spark.html

How to integrate Hive and Spark:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html

Shifu · ‎06-20-2021

Hello @prasanna06 Could you check the link below and see if it helps?
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/auto_tls.html

Shifu · ‎06-20-2021

Hello @vidanimegh

Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1623850591633_0042 failed 2 times due to AM Container for appattempt_1623850591633_0042_000002 exited with exitCode: -104 Failing this attempt.Diagnostics: [2021-06-18 18:32:52.722]Container [pid=32822,containerID=container_e49_1623850591633_0042_02_000001] is running 34230272B beyond the 'PHYSICAL' memory limit. Current usage: 2.0 GB of 2 GB physical memory used; 3.9 GB of 4.2 GB virtual memory used. Killing container.

As I can see, your jobs are failing with a PHYSICAL memory limit error: the AM container used its full 2 GB allocation. Could you set the properties below at the Beeline session level, re-run the analyze query, and see how it goes?

set hive.tez.container.size=8192;
set hive.tez.java.opts=-Xmx6553m;
set tez.runtime.io.sort.mb=3072;
set tez.task.resource.memory.mb=8192;
set tez.am.resource.memory.mb=8192;
set tez.am.launch.cmd-opts=-Xmx6553m;

Shifu · ‎06-19-2021

Hello @Bryan_zh Hive 3 is the default version in HDP 3.1.5, and you cannot downgrade it to Hive 2.3.7. Downgrading Hive from 3.x to 2.x is also not recommended.

Shifu · ‎05-16-2021

Hello @ryu Could you take a screenshot of the message and share it with us? Which HDP and Ambari versions are you using?

Shifu · ‎05-16-2021

Hello @Enigmat Could you try DISTINCT to remove the duplicate entries?
https://dwgeek.com/identify-and-remove-duplicate-records-from-hive-table.html/
https://stackoverflow.com/questions/43280052/how-to-delete-duplicate-records-from-hive-table
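A minimal sketch of the DISTINCT suggestion above, assuming the duplicates are whole rows that match in every column. The names my_db.events and my_db.events_dedup are hypothetical placeholders, not taken from the original thread.

```sql
-- Hypothetical names: my_db.events is the table with duplicate rows,
-- my_db.events_dedup is a new table that receives the distinct rows.

-- Preview the rows with exact duplicates collapsed.
SELECT DISTINCT * FROM my_db.events;

-- Materialize the de-duplicated rows into a new table (CTAS).
CREATE TABLE my_db.events_dedup AS
SELECT DISTINCT * FROM my_db.events;
```

Writing to a new table leaves the original data untouched, so the result can be checked before anything is replaced. DISTINCT only collapses rows that are identical in every selected column.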

Shifu · ‎05-16-2021

Hello @bsaad

1. Could you check whether you can connect to the internet from the Oracle VM by running a ping test against google.com?
2. Could you cross-check that port 8889 is up and listening by running the following command as the root user?

# netstat -ntpla | grep 8889

Last Visited	‎05-11-2022 05:47 AM

Member Since	‎03-29-2020 10:09 PM
Posts	110
Kudos received	9

Cloudera Community

Re: Hive table in power bi

Re: How do we identify hive metastore performance ...

Re: Halting due to Out Of Memory Error...Exit code...

Re: Error while executing hive merge query

Re: Upgrading Individual Components Post HDP 3.1.5

Re: Can Map Reduce2 or TEZ can provide output less...

Re: Hive View not opening in Ambari UI

Re: Can Map Reduce2 or TEZ can provide output less...

Re: How can I downgrade hive version 3.1 to 2.3.7...

Re: SSL Cerificate

Re: Analyze table commands not working in CDP

Re: How can I downgrade hive version 3.1 to 2.3.7...

Re: ambari warning, unknown version

Re: how to remove duplicates in a cell Hive SQL

Re: Internet not working in Couldera VM