Member since: 02-25-2016
Posts: 72
Kudos Received: 34
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3653 | 07-28-2017 10:51 AM
 | 2831 | 05-08-2017 03:11 PM
 | 1191 | 04-03-2017 07:38 PM
 | 2896 | 03-21-2017 06:56 PM
 | 1185 | 02-09-2017 08:28 PM
07-14-2017
06:56 PM
1 Kudo
@Viswa According to the official Apache documentation, the number of reducers defaults to 1. You can override this with the following properties:

- For MR1, set mapred.reduce.tasks=N
- For MR2, set mapreduce.job.reduces=N

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes will finish their first round of reduces and launch a second wave, doing a much better job of load balancing. Increasing the number of reduces increases framework overhead, but it improves load balancing and lowers the cost of failures. The scaling factors above are slightly less than whole numbers so that a few reduce slots are reserved in the framework for speculative and failed tasks.

To understand the number of tasks spawned per node, I would point you to this blog. In MR1, the number of tasks launched per node was specified via the settings mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. In MR2, you can determine how many concurrent tasks are launched per node by dividing the resources allocated to YARN by the resources allocated to each MapReduce task, taking the minimum over the two resource types (memory and CPU). Specifically, take the minimum of yarn.nodemanager.resource.memory-mb divided by mapreduce.[map|reduce].memory.mb, and yarn.nodemanager.resource.cpu-vcores divided by mapreduce.[map|reduce].cpu.vcores. This gives you the number of tasks that will be spawned per node.
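The arithmetic above can be sketched in shell. The cluster values below are hypothetical placeholders; substitute the settings from your own YARN and MapReduce configs:

```shell
# Hypothetical example values; read the real ones from your cluster configs.
nodes=10                       # <no. of nodes>
containers_per_node=8          # <no. of maximum containers per node>

# Reducer-count heuristic: 0.95 * nodes * containers (awk for the float math)
reducers=$(awk -v n="$nodes" -v c="$containers_per_node" 'BEGIN { printf "%d", 0.95 * n * c }')
echo "mapreduce.job.reduces=$reducers"

# Concurrent MR2 tasks per node = min(memory limit, vcore limit)
yarn_nm_memory_mb=24576        # yarn.nodemanager.resource.memory-mb
yarn_nm_vcores=8               # yarn.nodemanager.resource.cpu-vcores
map_memory_mb=2048             # mapreduce.map.memory.mb
map_vcores=1                   # mapreduce.map.cpu.vcores

by_memory=$(( yarn_nm_memory_mb / map_memory_mb ))
by_vcores=$(( yarn_nm_vcores / map_vcores ))
tasks_per_node=$(( by_memory < by_vcores ? by_memory : by_vcores ))
echo "concurrent map tasks per node: $tasks_per_node"
```

With these example numbers, memory allows 12 concurrent map tasks but vcores only 8, so CPU is the binding limit on this hypothetical node.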
06-30-2017
08:20 PM
2 Kudos
@Viswa In Tez, the following types of data movement take place between two vertices, each represented by an Edge in the DAG:

- BROADCAST: Output on this edge produced by any source task is available to all destination tasks.
- CUSTOM: Custom routing defined by the user.
- ONE_TO_ONE: Output on this edge produced by the i-th source task is available to the i-th destination task.
- SCATTER_GATHER: The i-th output on this edge produced by all source tasks is available to the same destination task.

To answer your question:

- SIMPLE_EDGE refers to the data movement type SCATTER_GATHER (example: SHUFFLE JOIN).
- BROADCAST_EDGE refers to the data movement type BROADCAST (example: MAP JOIN).

I drew the above inference from createEdgeProperty() in the source code. Hope this helps.
06-05-2017
06:06 PM
Tried creating an RDD, collecting it with collect(), and printing it out with a for loop. It was working fine. I was trying it out in pyspark, though. Thank you.
03-28-2017
06:19 PM
I created a new set of users with the correct hostname and privileges and it worked, thank you.
03-14-2017
10:00 AM
2 Kudos
@Viswa
To check the Namenode safe mode status, log in to the Namenode host and issue the command below:
[user@NNhost1 ~]$ hdfs dfsadmin -safemode get
Safe mode is OFF in NNhost1/10.X.X.X:8020
Safe mode is OFF in NNhost2/10.X.X.X:8020
If safe mode is turned ON, issue the command below to leave safe mode:
[user@NNhost1 ~]$ hdfs dfsadmin -safemode leave
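The check-then-leave sequence above can also be scripted. This is a minimal sketch in which safemode_get is a hypothetical stand-in for `hdfs dfsadmin -safemode get`, so the logic can be tried without a cluster:

```shell
# Hypothetical stand-in for `hdfs dfsadmin -safemode get`; on a real
# cluster, call the actual command instead of this function.
safemode_get() { echo "Safe mode is ON in NNhost1/10.X.X.X:8020"; }

status=$(safemode_get)
case "$status" in
  *"Safe mode is ON"*) action="hdfs dfsadmin -safemode leave" ;;
  *)                   action="" ;;
esac
echo "${action:-safe mode already off, nothing to do}"
```

Because the stub reports safe mode ON, the script selects the leave command; with an OFF status it would do nothing.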
03-14-2017
12:05 AM
1 Kudo
@Viswa - Kindly accept the answer if it has helped you.
03-09-2017
11:43 AM
Tried the same command again later and it worked; I haven't changed anything. Thank you Jay SenSharma.
03-10-2017
09:30 PM
1 Kudo
@Viswa - Can we close this now?