Member since: 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2081 | 06-21-2017 03:53 PM
 | 3160 | 03-14-2017 01:24 PM
 | 1985 | 01-25-2017 03:36 PM
 | 3166 | 12-20-2016 06:19 PM
 | 1583 | 12-14-2016 05:24 PM
04-26-2016
06:02 PM
@Kevin Sievers Hi Kevin, your commands look good to me; somehow it still does not take the number of reduce tasks, though. You are right, Hadoop should be MUCH faster. But the single reduce task, and even weirder the single mapper, seem to be the problem. And I assure you it normally runs with a lot of mappers and 40 reducers, loading and transforming around 300 GB of data in 20 minutes on a 7-datanode cluster. So basically I have NO idea why it uses only one mapper, no idea why it has the second reducer AT ALL, and no idea why it ignores the mapred.reduce.tasks parameter. I think a support ticket might be in order.
set hive.tez.java.opts = "-Xmx3600m";
set hive.tez.container.size = 4096;
set mapred.reduce.tasks=120;
CREATE EXTERNAL TABLE STAGING ...
...
insert into TABLE TARGET partition (day = 20150811) SELECT * FROM STAGING distribute by DT ;
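As an aside, on Hive-on-Tez the reducer count is often derived from the input size rather than taken from mapred.reduce.tasks directly. A hedged sketch of the knobs that commonly influence it (the values here are illustrative assumptions, not a verified fix for the behavior above):

```sql
-- Hedged sketch: settings that commonly influence reducer parallelism in Hive.
-- The values are illustrative, not a verified fix.
SET hive.exec.reducers.bytes.per.reducer=268435456; -- target ~256 MB of input per reducer
SET hive.exec.reducers.max=120;                     -- upper bound on reducers Hive will launch
```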
06-06-2017
04:26 PM
"--create-hcatalog-table" — this tells Sqoop to create the HCatalog (Hive) table as part of the import if it does not already exist.
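For context, a hedged command-line sketch of where this flag fits in a Sqoop import; the JDBC URL, credentials, and table names are placeholders:

```shell
# Hedged sketch of a Sqoop import using --create-hcatalog-table.
# The connection string, user, and table names are placeholder assumptions.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username sqoop_user -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders \
  --create-hcatalog-table
```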
12-05-2015
08:43 PM
As @bsaini explained, this property determines the number of open handlers at a given time for a DataNode. Two factors to consider before changing this property are: 1. your use cases; 2. which HDP services are in use. For example, if you use HBase extensively, then increasing this property (to match the cores or spindles in the DataNode) may help you get better throughput, especially for bulk writes/reads. However, increasing it beyond a point will not help and may even affect performance negatively.
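Assuming the property under discussion is dfs.datanode.max.transfer.threads (historically known as dfs.datanode.max.xcievers), a hedged hdfs-site.xml fragment:

```xml
<!-- Hedged sketch: raise the DataNode transfer-thread ceiling for HBase-heavy workloads. -->
<!-- 4096 is an illustrative value; tune against cores/spindles and observed load. -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
```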
12-02-2015
07:37 PM
1 Kudo
HBase master reads the list of files of the regions of tables in a couple of cases:
(1) CatalogJanitor process. This runs every hbase.catalogjanitor.interval (5 mins by default). This is for garbage-collecting regions that have been split or merged. The catalog janitor checks whether the daughter regions (after a split) still have references to the parent region. Once the references are compacted, the parent can be deleted. Notice that this process should only access recently split or merged regions.
(2) HFile/WAL cleaner. This runs every hbase.master.cleaner.interval (1 min by default). This is for garbage-collecting data files (hfiles) and WAL files. Data files in HBase can be referenced by more than one region or table and shared across snapshots and live tables, and there is also a minimum time (TTL) that the hfile/WAL will be kept around. That is why the master is responsible for doing reference counting and garbage-collecting the data files. This is possibly the most expensive NN operation among the ones in this list.
(3) Region Balancer. The balancer takes locality into account for balancing decisions. That is why the balancer will do file listing to find the locality of blocks of files in the regions. The locality of files is kept in a local cache for (hard-coded, unfortunately) 240 minutes.
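The two intervals mentioned above map to hbase-site.xml properties; a fragment showing their defaults as described (values in milliseconds):

```xml
<!-- Defaults as described above: catalog janitor every 5 minutes, cleaner every 1 minute. -->
<property>
  <name>hbase.catalogjanitor.interval</name>
  <value>300000</value>
</property>
<property>
  <name>hbase.master.cleaner.interval</name>
  <value>60000</value>
</property>
```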
11-18-2015
06:27 PM
1 Kudo
ipc.server.tcpnodelay controls use of Nagle's algorithm on any server component that makes use of Hadoop's common RPC framework. That means that full deployment of a change in this setting would require a restart of any component that uses that common RPC framework. That's a broad set of components, including all HDFS, YARN and MapReduce daemons. It probably also includes other components in the wider ecosystem.
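A hedged core-site.xml fragment showing the setting (keeping in mind the restart caveat above):

```xml
<!-- Disable Nagle's algorithm on the server side of Hadoop's common RPC framework. -->
<!-- Takes effect only after restarting every daemon that uses this framework. -->
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
```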
11-30-2015
07:56 PM
Thanks Steve. In our case, we are looking to set it at the RM level, not necessarily at the app/AM level. So if the AM fails for any reason, just don't retry the AM on the same host; pick something else. Depending on the error, it might be a good option to blacklist at the RM level so that no further AMs are sent there.
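Later Hadoop releases added RM-level AM blacklisting knobs; assuming a recent-enough version (Hadoop 2.8+, an assumption), a hedged yarn-site.xml fragment:

```xml
<!-- Hedged sketch: RM-level node blacklisting for AM placement. -->
<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-enabled</name>
  <value>true</value>
</property>
<!-- Fraction of cluster nodes that may be blacklisted before blacklisting is disabled. -->
<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold</name>
  <value>0.2</value>
</property>
```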
12-14-2016
05:53 PM
How can I access the JIRA?
12-19-2017
10:43 AM
This method of using the yarn command does not cover the use case of running an HDInsight cluster on demand, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser . Is there any option to configure the YARN logger to produce text rather than the TFile binary format?
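For reference, the yarn-command method referred to above looks roughly like this (the application ID is a placeholder). It reads the aggregated TFile logs and prints plain text, which is exactly what is no longer possible once the on-demand cluster is deleted:

```shell
# Hedged sketch: dump aggregated container logs as plain text while the cluster still exists.
# application_1500000000000_0001 is a placeholder application ID.
yarn logs -applicationId application_1500000000000_0001 > app.log
```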
03-13-2017
01:54 AM
HDP service logs are available in Ambari Log Search. The back end is Solr, so you can pull all of the logs, or only the relevant info, based on your requirements. Also, for service-level metrics, Ambari now stores these in Grafana.
11-02-2015
11:58 AM
Thanks @ravi@hortonworks.com