Member since: 09-22-2017
Posts: 37
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4621 | 11-16-2019 10:29 AM |
01-30-2022
10:34 PM
The following parameters control the number of mappers for splittable formats with Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

Adjust the above values to suit your data file sizes so that split grouping does not limit the number of mappers. If you still do not see the number of mappers increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may need to adjust the properties below as well:

set mapreduce.input.fileinputformat.split.maxsize=50000;
set mapreduce.input.fileinputformat.split.minsize=50000;

Please note that data locality with respect to the nodes also plays a role in determining the number of mappers. For more information, please refer to the references below.

References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/
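Putting the settings above together, here is a minimal sketch of a tuning session in Hive (the table name my_large_orc_table and the 128 MB cap are illustrative assumptions, not values taken from this thread):

-- Minimal sketch: my_large_orc_table and the 128 MB cap are assumed example values.
set tez.grouping.min-size=16777216;       -- 16 MB lower bound per grouped split
set tez.grouping.max-size=134217728;      -- 128 MB upper bound; a smaller cap generally yields more mappers
select count(*) from my_large_orc_table;  -- hypothetical table; the map-task count shows in the Tez DAG summary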
12-01-2021
08:41 AM
Hi, The reducer processes the output of the mappers. After processing the data, it produces a new set of output, which is finally stored in HDFS. A reducer takes the set of intermediate key-value pairs produced by the mappers as input and runs a reduce function on each of them. The number of mappers and reducers depends on the data being processed. You can manually set the number of reducers with the property below, but I don't think it is recommended:

set mapred.reduce.tasks=xx;

Regards, Chethan YM
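For illustration, here is a minimal sketch of the two approaches (the table my_table, its dept column, and the value 10 are assumptions made up for this example, not taken from the thread):

-- Let Hive derive the reducer count from the data volume:
set hive.exec.reducers.bytes.per.reducer=268435456;  -- roughly 256 MB of input per reducer
set hive.exec.reducers.max=1009;                      -- upper bound when Hive computes the count
-- Or pin the reducer count explicitly (generally not recommended):
set mapred.reduce.tasks=10;                           -- 10 is an arbitrary example value
select dept, count(*) from my_table group by dept;    -- my_table is a hypothetical table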
10-15-2021
08:58 PM
Hi, I have found another community article that addresses your concern. Please check the excerpt below:

"That sounds like all is working as designed/implemented, since Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark is reading Hive tables it really isn't going through the 'front door' of Hive to actually run queries (it is reading the files from HDFS directly). That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place."

Article: https://community.cloudera.com/t5/Support-Questions/Does-Spark-job-honor-Ranger-hive-policies/td-p/147760

Please mark this as resolved if it helps you.

Regards, Chethan YM
11-16-2019
05:02 PM
Thank you for the info. Yes, I have created a backup in another directory, and I was about to restart the NameNode from that image.
05-25-2018
03:27 PM
@Mokkan Mok The NameNode does not write blocks to DataNodes; block data flows only from client to DataNode and from DataNode to DataNode (depending on the replication factor). The protocol used from client to DataNode depends on the client you are using: with WebHDFS you will be using HTTP, for example, while other clients such as the hdfs CLI use the RPC protocol. I think DataNode-to-DataNode replication is always RPC. HTH *** If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.
05-24-2018
02:25 PM
Thank you very much. We are using CentOS; we should be able to install it using yum?
05-25-2018
08:17 AM
Thank you @Geoffrey Shelton Okot, that helps. Actually, I was thinking it would create a user in CentOS only, but it has created a user in HDFS as well.
05-16-2018
04:41 PM
@Mokkan Mok The HDP and Ambari upgrades will only impact the related binaries, but you should also test their compatibility against any bespoke/third-party tools that are plugged into the Hadoop cluster, e.g. Presto, Jupyter, Tableau, etc.