Member since: 09-22-2017
Posts: 37
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4621 | 11-16-2019 10:29 AM |
01-30-2022
10:34 PM
The following parameters control the number of mappers for splittable formats with Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

Adjust the above values to suit your data file sizes so that split grouping does not limit the number of mappers. If you still do not see the number of mappers increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may need to adjust the properties below as well:

set mapreduce.input.fileinputformat.split.maxsize=50000;
set mapreduce.input.fileinputformat.split.minsize=50000;

Please note that data locality with respect to the nodes also plays a role in determining the number of mappers. For more information, please refer to the references below.

References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/
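Putting the settings above together, here is a minimal sketch of a tuning session in Hive (the table name my_large_orc_table and the 128 MB cap are illustrative assumptions, not values taken from this thread):

-- Minimal sketch: my_large_orc_table and the 128 MB cap are assumed example values.
set tez.grouping.min-size=16777216;       -- 16 MB lower bound per grouped split
set tez.grouping.max-size=134217728;      -- 128 MB upper bound; a smaller cap generally yields more mappers
select count(*) from my_large_orc_table;  -- hypothetical table; the map-task count shows in the Tez DAG summary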
12-01-2021
08:41 AM
Hi, The reducer processes the output of the mappers. After processing the data, it produces a new set of output, which is finally stored in HDFS. A reducer takes the set of intermediate key-value pairs produced by the mappers as input and runs a reduce function on each of them. The number of mappers and reducers depends on the data being processed. You can manually set the number of reducers with the property below, but I don't think it is recommended:

set mapred.reduce.tasks=xx;

Regards, Chethan YM
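For illustration, here is a minimal sketch of the two approaches (the table my_table, its dept column, and the value 10 are assumptions made up for this example, not taken from the thread):

-- Let Hive derive the reducer count from the data volume:
set hive.exec.reducers.bytes.per.reducer=268435456;  -- roughly 256 MB of input per reducer
set hive.exec.reducers.max=1009;                      -- upper bound when Hive computes the count
-- Or pin the reducer count explicitly (generally not recommended):
set mapred.reduce.tasks=10;                           -- 10 is an arbitrary example value
select dept, count(*) from my_table group by dept;    -- my_table is a hypothetical table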
10-15-2021
08:58 PM
Hi, I have found another community article that addresses your concern. Please check the excerpt below:

"That sounds like all is working as designed/implemented, since Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark is reading Hive tables it really isn't going through the 'front door' of Hive to actually run queries (it is reading the files from HDFS directly). That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place."

Article: https://community.cloudera.com/t5/Support-Questions/Does-Spark-job-honor-Ranger-hive-policies/td-p/147760

Please mark this as resolved if it helps you.

Regards, Chethan YM
11-16-2019
05:02 PM
Thank you for the info. Yes, I have created a backup in another directory, and I was about to restart the NameNode from that image.
05-25-2018
03:27 PM
@Mokkan Mok The NameNode does not write blocks to DataNodes; block data flows only from client to DataNode and from DataNode to DataNode (depending on the replication factor). The protocol used from client to DataNode depends on the client you are using: with WebHDFS you will be using HTTP, for example, while other clients such as the hdfs CLI use the RPC protocol. I think DataNode-to-DataNode replication is always RPC. HTH *** If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.
05-24-2018
02:25 PM
Thank you very much. We are using CentOS; we should be able to install it using yum?
05-25-2018
08:17 AM
Thank you @Geoffrey Shelton Okot, that helps. Actually, I was thinking it would create a user in CentOS only, but it has created a user in HDFS as well.
05-16-2018
04:41 PM
@Mokkan Mok The HDP and Ambari upgrades will only impact the related binaries, but you should also test their compatibility against any bespoke/third-party tools that are plugged into the Hadoop cluster, e.g. Presto, Jupyter, Tableau, etc.