Member since
11-26-2018
80
Posts
12
Kudos Received
0
Solutions
02-05-2019
03:49 PM
1 Kudo
@Harshali Patel Once a MapReduce program is written, a driver class has to be created and submitted to the cluster. For this we create an object of the JobConf class, and conf.setMapperClass() is used to register your Mapper class with the driver. It tells the framework which class reads the input records and generates the key-value pairs for the map phase. The Mapper class is where you write the code for the map function; the map phase is the first phase of the MapReduce programming model and is responsible for processing the provided input dataset. Mapper is a generic type with four formal type parameters that specify the input key, input value, output key and output value types of the map function. The driver class is what communicates with the Hadoop framework and specifies the configuration elements required to run a MapReduce job: which Mapper and Reducer classes to use, where to find the input data and in what format, and where to place the output data and how to format it. A minimal driver sketch is shown below. Please accept my answer if you found it helpful.
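For illustration, here is a minimal word-count driver sketch using the classic org.apache.hadoop.mapred API with JobConf; the class names, job name and input/output paths are hypothetical placeholders rather than anything from the original question:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountDriver {

    // Mapper: the four type parameters are input key, input value, output key, output value
    public static class WordCountMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);          // emit (word, 1)
            }
        }
    }

    // Reducer: sums the counts emitted by the mapper for each word
    public static class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: tells Hadoop which Mapper/Reducer to use and where the data lives
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(WordCountMapper.class);    // register the Mapper with the driver
        conf.setReducerClass(WordCountReducer.class);  // register the Reducer

        conf.setOutputKeyClass(Text.class);            // key/value types produced by the job
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // where to read the input
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // where to write the output

        JobClient.runJob(conf);
    }
}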
01-30-2019
05:44 PM
2 Kudos
Hi @Dukool SHarma The number of map tasks for a given job is driven by the number of input splits, so the number of map tasks equals the number of input splits. A split is a logical division of the data, used when the data is processed by the MapReduce program. Suppose you have a 200 MB file and the default HDFS block size is 128 MB; the file is stored as two blocks and yields two splits, hence two mappers. But if you specify a larger split size (say 200 MB) in your MapReduce program, both blocks will be treated as a single split and only one mapper will be assigned to the job. If you want roughly n mappers, divide the file size by n and use that as the split size, for example: conf.set("mapred.max.split.size", "41943040"); // maximum split size in bytes conf.set("mapred.min.split.size", "20971520"); // minimum split size in bytes. A small worked example of the split-size arithmetic is given below. Please accept my answer if it is found helpful.
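To make the arithmetic above concrete, here is a small stand-alone sketch of the split-size formula used by FileInputFormat in the newer API, max(minSize, min(maxSize, blockSize)); the file and block sizes simply mirror the 200 MB / 128 MB example, and the class name is made up:

public class SplitSizeExample {

    // Mirrors FileInputFormat.computeSplitSize(): max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static long numSplits(long fileSize, long splitSize) {
        return (long) Math.ceil((double) fileSize / splitSize);
    }

    public static void main(String[] args) {
        long fileSize  = 200L * 1024 * 1024;   // 200 MB file
        long blockSize = 128L * 1024 * 1024;   // default HDFS block size

        // Defaults: split size == block size, so 2 splits and therefore 2 mappers
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default splits = " + numSplits(fileSize, defaultSplit));

        // Minimum split size raised to 200 MB: 1 split, so only 1 mapper
        long bigSplit = computeSplitSize(blockSize, 200L * 1024 * 1024, Long.MAX_VALUE);
        System.out.println("200 MB splits  = " + numSplits(fileSize, bigSplit));
    }
}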
01-23-2019
07:17 AM
1 Kudo
@Michael Bronson Kindly look into the following JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-2125
01-21-2019
04:29 PM
1 Kudo
Hi @Michael Bronson By default, ZooKeeper's network communication is not encrypted. However, each user and service can leverage the SSL feature and/or a custom authentication implementation in order to use ZooKeeper in secure mode. Kindly refer to the link below. LINK: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide This feature was added in ZooKeeper 3.5.1 and later. Unfortunately, the HDP version in question (HDP 2.6.4) ships Apache ZooKeeper 3.4.6, so this feature is not available in HDP yet. A rough sketch of the settings involved is given below. Please accept this answer if you found it helpful.
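For reference, this is roughly what the 3.5.x SSL setup in that guide looks like; the port, keystore/truststore paths and passwords below are placeholders, so treat this as a sketch rather than a drop-in configuration:

# zoo.cfg: dedicated secure client port
secureClientPort=2281

# java.env on each ZooKeeper server: use the Netty connection factory and point it at a keystore/truststore
SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory \
  -Dzookeeper.ssl.keyStore.location=/path/to/server-keystore.jks \
  -Dzookeeper.ssl.keyStore.password=changeit \
  -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
  -Dzookeeper.ssl.trustStore.password=changeit"

# client side: enable secure mode and the Netty client socket
CLIENT_JVMFLAGS="-Dzookeeper.client.secure=true \
  -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
  -Dzookeeper.ssl.keyStore.location=/path/to/client-keystore.jks \
  -Dzookeeper.ssl.keyStore.password=changeit \
  -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
  -Dzookeeper.ssl.trustStore.password=changeit"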
01-21-2019
09:16 AM
3 Kudos
Hi @Vinay 1) Kindly check whether you have mentioned the path correctly. 2) Try running the query on the host where HiveServer2 is running. If you get a permission error or a "No files found" error, add the property "hive.users.in.admin.role=hive" in Custom hiveserver2-site via Ambari and then run the LOAD DATA query as the hive user (an example statement is shown below). I think this will work for you. Please accept this answer if you found it helpful.
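For example, a typical LOAD DATA statement run from the HiveServer2 host; the path, database and table names here are just placeholders:

-- file already in HDFS
LOAD DATA INPATH '/user/hive/staging/sample.csv' INTO TABLE my_db.my_table;

-- file on the local filesystem of the HS2 host
LOAD DATA LOCAL INPATH '/tmp/sample.csv' INTO TABLE my_db.my_table;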
01-09-2019
05:15 AM
2 Kudos
@Abdul M If the cluster is managed by Ambari, the configuration should be changed via Ambari only; otherwise any manual changes made on the backend to "/etc/hadoop/conf/log4j.properties" will be overwritten during a component restart. Hence, if you want to make any change to log4j.properties, change the logging configuration from the Ambari UI.
Log in to Ambari, go to the Configs tab of the HDFS service, and filter for Advanced hdfs-log4j. LINK: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/customizing_log_settings.html A small example of the kind of edit you can make there is shown below. Hope this answer will be helpful to you.
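For example, inside the Advanced hdfs-log4j template you could tweak entries like the following; this is only a sketch, and the exact appender names depend on what is already present in your hdfs-log4j configuration:

# overall HDFS daemon log level and appender (RFA = rolling file appender)
hadoop.root.logger=INFO,RFA

# example: DEBUG logging for a single class only
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem=DEBUG

# example: keep fewer rolled log files (assumes the RFA appender is in use)
log4j.appender.RFA.MaxBackupIndex=10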
01-08-2019
02:20 PM
1 Kudo
@Nikhil Raina In Hadoop, MapReduce breaks a job into tasks and these tasks run in parallel so that the overall execution time is reduced. If one of those tasks takes much longer than the others, the overall execution time of the job increases. The reason can be anything: a busy node, network congestion, etc. The slow task limits the total execution time of the job, because the system has to wait for it to complete. Such causes can be difficult to diagnose, since the slow task still completes successfully, just later than expected. Hadoop doesn't try to diagnose and fix slow-running tasks; instead, it tries to detect them and launches backup tasks for them, which are preferentially scheduled on faster nodes. This is called "speculative execution" in Hadoop, and the backup tasks are the "speculative tasks". When one copy of a task completes successfully, any duplicate copies still running are killed since they are no longer needed: if the original task finishes first, the speculative task is killed; if the speculative task finishes first, the original is killed. In short, speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. To disable it, set "mapred.map.tasks.speculative.execution" to "false" and "mapred.reduce.tasks.speculative.execution" to "false" in "mapred-site.xml", as shown below. Please accept this answer if you found it helpful.
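For example, in mapred-site.xml (these are the classic property names quoted above; on newer Hadoop releases the equivalents are mapreduce.map.speculative and mapreduce.reduce.speculative):

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>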
01-06-2019
05:48 PM
1 Kudo
@Michael Bronson HDFS 2.x provides a "balancer" utility to help balance blocks across the DataNodes of a cluster. From HDFS 3.x onwards there is also a disk-level balancer, which rebalances data across the multiple disks of a single DataNode. It is useful for correcting the skewed data distribution that is often seen after adding or replacing disks. The Disk Balancer can be enabled by setting dfs.disk.balancer.enabled to true in hdfs-site.xml, and it is invoked by running "hdfs diskbalancer". A typical invocation sequence is sketched below.
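A typical run against a single DataNode looks like this; the hostname is a placeholder, and the plan file path is whatever the -plan step prints out:

# generate a rebalancing plan for one DataNode
hdfs diskbalancer -plan datanode1.example.com

# execute the plan produced by the previous step
hdfs diskbalancer -execute /system/diskbalancer/<date>/datanode1.example.com.plan.json

# check the progress of the running plan
hdfs diskbalancer -query datanode1.example.com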
JIRA: https://issues.apache.org/jira/browse/HDFS-1312 For more detail: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html Please accept this answer if you found it helpful.