Member since: 11-26-2018
Posts: 103
Kudos Received: 13
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2108 | 04-21-2021 01:10 AM |
| | 6446 | 04-18-2021 11:32 PM |
| | 13067 | 01-08-2019 02:20 PM |
06-28-2021
06:35 AM
@Faizan123 We do not segregate compute nodes and data nodes. A compute node runs the NodeManager and hosts YARN containers for processing the data, while a DataNode is used for storing the data. When you submit a job, YARN will try to create the task containers on the nodes where the data is located, which is why both roles can run on a single node. Please let me know if you have any queries, and mark "Accept as Solution" if my answer helps you! Thanks, Shobika S
04-27-2021
02:23 AM
@dmharshit It would be difficult to answer the question without the necessary details, such as the query you are running, when you started facing the issue, and the complete error stack. But from the two lines you provided, I suspect there might be an issue with statistics. You can try setting the parameters below and retrying:

set hive.stats.column.autogather=false;
set hive.optimize.sort.dynamic.partition=true;

If that doesn't help, kindly provide the details above so we can figure out the issue. If these parameters resolve the issue, kindly mark "Accept as Solution". Thanks, Shobika S
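A minimal sketch of applying the two workarounds in a Hive session; the table and column names in the insert are placeholders, not from the thread:

```sql
-- Disable column-stats autogather and enable sorted dynamic-partition
-- insert for this session, then re-run the failing statement.
set hive.stats.column.autogather=false;
set hive.optimize.sort.dynamic.partition=true;

-- Hypothetical example of a dynamic-partition insert to retry:
INSERT OVERWRITE TABLE sales PARTITION (dt)
SELECT id, amount, dt FROM staging_sales;
```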
04-21-2021
01:10 AM
@enirys This issue occurs when the request status in the Ambari DB does not match the current status of LLAP; because of this mismatch, Interactive Query cannot be enabled. 1) Kindly check the status in the Ambari DB with: "select status, last_execution_status from requestschedule where status='SCHEDULED' OR status='IN_PROGRESS'" 2) To fix it, change the status in the requestschedule table to 'COMPLETED': update requestschedule set status='COMPLETED' where status='SCHEDULED' Reference article: https://community.cloudera.com/t5/Customer/ERROR-quot-You-cannot-enable-Interactive-Query-now-because/ta-p/272587 Please accept as solution if the above helps you. Thanks!
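Formatted together, the check and the fix from steps 1 and 2, to be run against the Ambari database (the client you use depends on which DB backend Ambari is configured with):

```sql
-- 1) Look for request schedules stuck in a non-terminal state:
SELECT status, last_execution_status
FROM requestschedule
WHERE status = 'SCHEDULED' OR status = 'IN_PROGRESS';

-- 2) Mark the stuck entries completed so Interactive Query can be enabled:
UPDATE requestschedule
SET status = 'COMPLETED'
WHERE status = 'SCHEDULED';
```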
04-18-2021
11:32 PM
@caisch The temp tables are created during the application run as intermediate data, and these intermediate files are not removed if the application fails and cleanup does not happen. Another possible cause: if you run queries through beeline and abruptly disconnect the session instead of exiting properly with '!q', the files created under '/tmp/hive' during beeline initialisation are not cleared. To clean up the /tmp directory automatically, add the properties below to custom-hive-site.xml:

hive.start.cleanup.scratchdir = true // cleans up the Hive scratch directory when HiveServer2 starts
hive.server2.clear.dangling.scratchdir = true // starts a thread in HiveServer2 that clears dangling scratch directories from the HDFS location
hive.server2.clear.dangling.scratchdir.interval = 1800s // how often that cleanup thread runs

After adding the properties, kindly restart the Hive service. Reference link: https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ScratchDirectoryManagement Alternatively, you can run a cron job to delete the files periodically. Reference link: https://community.cloudera.com/t5/Support-Questions/Do-we-have-any-script-which-we-can-use-to-clean-tmp-hive-dir/m-p/156965 Please 'Accept as Solution' if my answer is helpful to you. Thanks!
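In hive-site.xml form, the three properties above would look like this (a sketch of the same settings, using the interval from the post):

```xml
<!-- custom-hive-site.xml: automatic scratch-directory cleanup -->
<property>
  <name>hive.start.cleanup.scratchdir</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.clear.dangling.scratchdir.interval</name>
  <value>1800s</value>
</property>
```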
01-08-2019
02:20 PM
2 Kudos
@Nikhil Raina In Hadoop, MapReduce breaks a job into tasks, and these tasks run in parallel so that the overall execution time is reduced. If one of those tasks takes more time than expected, the overall execution time of the job increases: the cause can be anything (a busy node, network congestion, etc.), and the job has to wait for the slow-running task to complete. It can be difficult to diagnose the cause, because the task still completes successfully, just more slowly than expected.

Hadoop doesn't try to diagnose and fix slow-running tasks; instead, it detects them and runs backup tasks for them, preferentially scheduled on the faster nodes. This is called "speculative execution" in Hadoop, and the backup tasks are the "speculative tasks". When a task completes successfully, any duplicate tasks still running are killed since they are no longer needed: if the original task finishes first, the speculative task is killed; if the speculative task finishes first, the original one is killed.

In short, speculative execution is a MapReduce job optimization technique in Hadoop that is enabled by default. To disable it, set "mapred.map.tasks.speculative.execution" and "mapred.reduce.tasks.speculative.execution" to "false" in "mapred-site.xml". Please accept this answer if you found it helpful.
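As a configuration sketch, disabling speculative execution with the property names from this answer would look like the fragment below (note that newer Hadoop releases renamed these properties to mapreduce.map.speculative and mapreduce.reduce.speculative; check which names your version uses):

```xml
<!-- mapred-site.xml: disable speculative execution for map and reduce tasks -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```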