Member since: 11-26-2018
Posts: 103
Kudos Received: 13
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2527 | 06-09-2021 10:46 PM |
 | 1359 | 04-21-2021 01:10 AM |
 | 4748 | 04-18-2021 11:32 PM |
 | 9591 | 01-08-2019 02:20 PM |
08-05-2021
11:00 PM
@Priyanka26, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution; it will make it easier for others to find the answer in the future. If you are still experiencing the issue, could you provide the information @shobikas has requested?
07-06-2021
11:50 PM
Hi @Faizan123, I hope the replies provided by @Shelton or @shobikas have helped you resolve your issue. If so, could you kindly accept the appropriate reply as the solution?
06-17-2021
11:38 PM
@dmharshit, when you run this query, does a YARN application ID get generated, or does the query fail before a YARN application is triggered? If a YARN application is triggered, please collect the logs for that particular application and check for errors:

yarn logs -applicationId your_application_id > your_application_id.log 2>&1

Check whether this log file contains any detailed errors and share them. Thanks, Megh
06-13-2021
10:43 PM
Hi @gael__urbauer, did @shobikas' solution work for you? Have you found a resolution to your issue? If so, can you please mark the appropriate reply as the solution? It will make it easier for others to find the answer in the future.
04-21-2021
02:05 AM
@shobikas Thank you for your reply; you are right about the Ambari DB. I had two requests in "SCHEDULED" status. I just updated their status to "COMPLETED" and then restarted Ambari to get it working. Thank you so much for your help.
04-19-2021
08:59 AM
After I checked the permissions in Ranger, I noticed that there isn't any policy set up for the hive user. I added a policy to allow the hive user access to the folder /user/admin in HDFS. The problem is that I cannot save the new policy due to another security error, as shown below. I wonder if the Hortonworks Sandbox HDP 3.0 that I downloaded is a working version 😞
04-18-2021
11:32 PM
@caisch The temp tables are created as intermediate data while the application runs. These intermediate tables will not be removed if the application fails and cleanup does not happen. Another possible reason: if you are using Beeline to run the query and you disconnect the session abruptly instead of exiting properly with '!q', the files created under '/tmp/hive' during Beeline initialisation will not be cleared. To clean up the /tmp directory automatically, add the properties below to custom-hive-site.xml (a sample snippet is shown after this reply):

hive.start.cleanup.scratchdir - true // Cleans up the Hive scratch directory when HiveServer2 starts.
hive.server2.clear.dangling.scratchdir - true // Starts a thread in HiveServer2 to clear dangling scratch directories from the HDFS location.
hive.server2.clear.dangling.scratchdir.interval - 1800s // Interval at which the dangling scratch-directory check runs.

After adding the properties, kindly restart the Hive service. Reference link: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ScratchDirectoryManagement Alternatively, you can run a cron job to delete the files periodically. Reference link: https://community.cloudera.com/t5/Support-Questions/Do-we-have-any-script-which-we-can-use-to-clean-tmp-hive-dir/m-p/156965 Please 'Accept as Solution' if my answer helped you. Thanks!
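For reference, here is a minimal sketch of how these three properties might look in XML form; the 1800s interval is just the example value from above, so adjust it for your cluster:

```xml
<!-- Sketch: HiveServer2 scratch-directory cleanup properties for custom-hive-site.xml -->
<property>
  <name>hive.start.cleanup.scratchdir</name>
  <value>true</value>
  <description>Clean up the Hive scratch directory when HiveServer2 starts.</description>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir</name>
  <value>true</value>
  <description>Run a HiveServer2 background thread that clears dangling scratch directories in HDFS.</description>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir.interval</name>
  <value>1800s</value>
  <description>How often the dangling scratch-directory check runs.</description>
</property>
```

Note that on Ambari-managed clusters these are typically entered through the Ambari UI under "Custom hive-site" rather than by editing the XML file directly.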
04-09-2021
05:42 AM
Hi, thank you for your response @shobikas. As explained in my previous post, I don't use temporary functions in LLAP; I am trying to use a permanent function. I followed this official article: https://community.cloudera.com/t5/Community-Articles/Creating-custom-udf-and-adding-udf-jar-to-Hive-LLAP/ta-p/246598. In that example, it seems that it is possible to use a permanent UDF with hive.llap.execution.mode=only. Thank you, Simon
02-10-2019
10:43 PM
@Dukool SHarma Any updates?
01-08-2019
02:20 PM
2 Kudos
@Nikhil Raina In Hadoop, MapReduce breaks a job into tasks, and these tasks run in parallel so that the overall execution time is reduced. If one of these tasks takes more time than expected, the overall execution time of the job increases. The cause can be anything: a busy node, network congestion, etc. Such slow tasks limit the total execution time of the job, because the system has to wait for them to complete. It can be difficult to detect the cause, since the tasks still complete successfully, only more slowly than expected.

Hadoop does not try to diagnose and fix slow-running tasks; instead, it tries to detect them and runs backup tasks for them. The backup tasks are preferentially scheduled on faster nodes. This is called "speculative execution" in Hadoop, and the backup tasks are the "speculative tasks". When a task completes successfully, any duplicate tasks that are still running are killed, since they are no longer needed: if the original task finishes first, the speculative task is killed; if the speculative task finishes first, the original one is killed.

In short, speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. To disable it, set "mapred.map.tasks.speculative.execution" and "mapred.reduce.tasks.speculative.execution" to "false" in mapred-site.xml (a sample snippet is shown below). Please accept this answer if you found it helpful.
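A minimal sketch of the corresponding mapred-site.xml entries; these are the legacy property names used above, and newer Hadoop releases expose the same switches as mapreduce.map.speculative and mapreduce.reduce.speculative:

```xml
<!-- Sketch: disable speculative execution for map and reduce tasks in mapred-site.xml -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
  <description>Do not launch speculative (backup) attempts for map tasks.</description>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
  <description>Do not launch speculative (backup) attempts for reduce tasks.</description>
</property>
```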