Member since: 01-24-2021
Posts: 9
Kudos Received: 0
Solutions: 0
01-18-2024
10:04 PM
Many thanks.
01-08-2024
10:33 PM
I am the admin of a CDH cluster. Some queries that are submitted to HiveServer2 stay in the explain stage inside HiveServer2; they are never submitted to the YARN cluster and so have no YARN application ID. These queries break HiveServer2 because the statement is too long, for example select * from aaa where code in ('xxx','xxx1','xxx3',.......'xxx2000000'); A SQL statement with more than a million rows can crash HiveServer2. The web UI on port 10002 does not seem to offer an action button, as the YARN web UI does, to deal with such a query. The CDH version is 6.3.2 and the Hive version is 2.1.1. When this happens, I have to restart HiveServer2. Is there a way to kill a query by Hive query ID or Hive session ID instead of by YARN application ID? This would also be useful when someone queries the metastore from many threads, for example running "use hadoop" with 30 or more active connections; the administrator should be able to kill those queries forcibly.
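This does not answer the question of killing a stuck query, but the oversized statement itself can sometimes be avoided on the client side. The following is a minimal sketch, assuming the caller controls how the SQL is generated: it splits a long value list into several smaller IN queries. The helper name `build_batched_queries` and the batch size of 1000 are illustrative assumptions, not Hive settings; the table `aaa` and column `code` come from the example above.

```python
from typing import Iterator, List


def chunked(values: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values`, each with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_batched_queries(table: str, column: str, values: List[str],
                          batch_size: int = 1000) -> List[str]:
    """Build one SELECT per batch instead of a single huge IN list.

    Hypothetical helper: batch_size=1000 is an arbitrary choice for this sketch.
    Values containing quotes or backslashes are rejected rather than escaped,
    so the sketch makes no claim about Hive's string-escaping rules.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for value in values:
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in value: {value!r}")
    queries = []
    for batch in chunked(values, batch_size):
        literals = ", ".join(f"'{v}'" for v in batch)
        queries.append(f"SELECT * FROM {table} WHERE {column} IN ({literals})")
    return queries


if __name__ == "__main__":
    for query in build_batched_queries("aaa", "code", ["xxx", "xxx1", "xxx3"], 2):
        print(query)
```

Each generated statement stays small, at the cost of running several queries and combining the results on the client.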
Labels: Apache Hive
09-15-2023
02:06 AM
I use CDH 6.3.2 with Hive 2.1 and Hadoop 3.0, running Hive on Spark on a YARN cluster, with these settings: hive.merge.sparkfiles=true; hive.merge.orcfile.stripe.level=true; With this configuration, the 1099 reducer output files are merged into one file when the result is small. The merged file then contains about 1099 stripes, and reading it is very slow. I tried hive.merge.orcfile.stripe.level=false; and the result is what I wanted: one small file with one stripe that reads fast. Can anyone explain the difference between true and false? Why is "hive.merge.orcfile.stripe.level=true" the default?
02-18-2023
12:05 PM
I ran hdfs fsck with the delete option and then found that the DataNode configuration was wrong: two directories were missing from the DataNode storage directory setting. Is there any way to rebuild the lost corrupt blocks? Many thanks.
11-28-2022
11:09 PM
Thanks a lot. This "yarn application -updatePriority 10 -appId application_xxxx_xx" seems to be a YARN-side setting. It does not work for Spark 2.x in CDH 6.3.2 either. Is that for the same reason, meaning that 'Application Priority' requires the YARN version to match the Spark version?
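For scripting the command quoted above, here is a minimal sketch that only assembles the argument list and validates its inputs. The helper name `update_priority_command` is hypothetical, and the sketch assumes application IDs follow the usual `application_<timestamp>_<sequence>` shape. It does not check whether the cluster's scheduler actually honors the priority, which is the open question in this thread.

```python
import re
from typing import List

# Assumed shape of a YARN application ID: application_<digits>_<digits>
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def update_priority_command(app_id: str, priority: int) -> List[str]:
    """Return the argv for `yarn application -updatePriority <n> -appId <id>`.

    Hypothetical helper for illustration: it builds the command but does not run it.
    """
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    if priority < 0:
        raise ValueError("priority must be non-negative")
    return ["yarn", "application", "-updatePriority", str(priority),
            "-appId", app_id]


if __name__ == "__main__":
    # The application ID below is a made-up example value.
    argv = update_priority_command("application_1669000000000_0042", 10)
    print(" ".join(argv))
    # To actually run it on a host with the yarn CLI installed:
    # import subprocess; subprocess.run(argv, check=True)
```

Building the argument list as a Python list, rather than a single string, avoids shell quoting problems if it is later passed to `subprocess.run`.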
11-24-2022
06:11 PM
YARN's application priority is shown in the ResourceManager web UI on port 8088. How does priority work in YARN? I use CDH 6.3.2 with Hadoop 3.0.0 and Hive 2.1.1, running Hive on Spark. Can I use a configuration setting to manage application priority? Some Hive SQL statements should get a high application priority so that, when they are pending, they run ahead of the others instead of running at the same time and sharing compute resources equally. When I set the following configs in Hive, they do not seem to take effect in YARN. MapReduce: "-Dmapreduce.job.priority=xx". Flink: "-yD yarn.application.priority=xx". Spark: "spark.yarn.priority=xx". In Hive SQL: "set spark.yarn.priority=10;". It does not work ...
Labels: Apache Hive, Apache Spark, Apache YARN
01-24-2021
09:59 PM
Hi, this seems to be the same error. Where can I find the logs of the CM YARN usage aggregation job? I also set the pool rules and have an error with a user such as admin, which has no matching group. User admin cannot submit jobs to YARN, but other normal users can.
01-24-2021
07:54 PM
Hi, in the development environment there is a service named CM YARN usage aggregation that runs on YARN every hour. It can be found in the JobHistory Server web UI. In the test environment, however, it does not appear. The difference between the two is that the development environment was started as root, while the test environment was started by a user with sudo privileges. How can I find the startup log of CMyarnusageaggregation to debug the problem? The log aggregation properties are enabled in both environments. Many thanks!
Labels: Apache YARN