Member since: 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2081 | 06-21-2017 03:53 PM
 | 3160 | 03-14-2017 01:24 PM
 | 1985 | 01-25-2017 03:36 PM
 | 3166 | 12-20-2016 06:19 PM
 | 1583 | 12-14-2016 05:24 PM
04-26-2016
06:02 PM
@Kevin Sievers Hi Kevin, your commands look good to me; somehow it still does not take the number of reduce tasks, though. You are right, Hadoop should be MUCH faster. But the single reduce task, and even weirder the single mapper, seem to be the problem. And I assure you it normally runs with a lot of mappers and 40 reducers, loading and transforming around 300 GB of data in 20 minutes on a 7-datanode cluster. So basically I have NO idea why it uses only one mapper, no idea why it has the second reducer AT ALL, and no idea why it ignores the mapred.reduce.tasks parameter. I think a support ticket might be in order.
set hive.tez.java.opts = "-Xmx3600m";
set hive.tez.container.size = 4096;
set mapred.reduce.tasks=120;
CREATE EXTERNAL TABLE STAGING ...
...
insert into TABLE TARGET partition (day = 20150811) SELECT * FROM STAGING distribute by DT ;
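As an aside, on Hive-on-Tez the reducer count is often derived from the input size rather than taken from mapred.reduce.tasks directly. A hedged sketch of the knobs that commonly influence it (the values here are illustrative assumptions, not a verified fix for the behavior above):

```sql
-- Hedged sketch: settings that commonly influence reducer parallelism in Hive.
-- The values are illustrative, not a verified fix.
SET hive.exec.reducers.bytes.per.reducer=268435456; -- target ~256 MB of input per reducer
SET hive.exec.reducers.max=120;                     -- upper bound on reducers Hive will launch
```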
06-06-2017
04:26 PM
"--create-hcatalog-table" — this tells Sqoop to create the HCatalog (Hive) table as part of the import if it does not already exist.
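For context, a hedged command-line sketch of where this flag fits in a Sqoop import; the JDBC URL, credentials, and table names are placeholders:

```shell
# Hedged sketch of a Sqoop import using --create-hcatalog-table.
# The connection string, user, and table names are placeholder assumptions.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username sqoop_user -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders \
  --create-hcatalog-table
```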
12-05-2015
08:43 PM
As @bsaini explained, this property determines the number of open handlers at a given time for a DataNode. Two factors to consider before changing this property are: 1. your use cases; 2. which HDP services are in use. For example, if you use HBase extensively, then increasing this property (to match the cores or spindles in the DataNode) may help you get better throughput, especially for bulk writes/reads. However, increasing it beyond a point will not help and may even affect performance negatively.
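Assuming the property under discussion is dfs.datanode.max.transfer.threads (historically known as dfs.datanode.max.xcievers), a hedged hdfs-site.xml fragment:

```xml
<!-- Hedged sketch: raise the DataNode transfer-thread ceiling for HBase-heavy workloads. -->
<!-- 4096 is an illustrative value; tune against cores/spindles and observed load. -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
```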
12-02-2015
07:37 PM
1 Kudo
HBase master reads the list of files of the regions of tables in a couple of cases:
(1) CatalogJanitor process. This runs every hbase.catalogjanitor.interval (5 mins by default). This is for garbage-collecting regions that have been split or merged. The catalog janitor checks whether the daughter regions (after a split) still have references to the parent region. Once the references are compacted, the parent can be deleted. Notice that this process should only access recently split or merged regions.
(2) HFile/WAL cleaner. This runs every hbase.master.cleaner.interval (1 min by default). This is for garbage-collecting data files (hfiles) and WAL files. Data files in HBase can be referenced by more than one region or table and shared across snapshots and live tables, and there is also a minimum time (TTL) that the hfile/WAL will be kept around. That is why the master is responsible for doing reference counting and garbage-collecting the data files. This is possibly the most expensive NN operation among the ones in this list.
(3) Region Balancer. The balancer takes locality into account for balancing decisions. That is why the balancer will do file listing to find the locality of blocks of files in the regions. The locality of files is kept in a local cache for (hard-coded, unfortunately) 240 minutes.
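The two intervals mentioned above map to hbase-site.xml properties; a fragment showing their defaults as described (values in milliseconds):

```xml
<!-- Defaults as described above: catalog janitor every 5 minutes, cleaner every 1 minute. -->
<property>
  <name>hbase.catalogjanitor.interval</name>
  <value>300000</value>
</property>
<property>
  <name>hbase.master.cleaner.interval</name>
  <value>60000</value>
</property>
```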
11-18-2015
06:27 PM
1 Kudo
ipc.server.tcpnodelay controls use of Nagle's algorithm on any server component that makes use of Hadoop's common RPC framework. That means that full deployment of a change in this setting would require a restart of any component that uses that common RPC framework. That's a broad set of components, including all HDFS, YARN and MapReduce daemons. It probably also includes other components in the wider ecosystem.
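A hedged core-site.xml fragment showing the setting (keeping in mind the restart caveat above):

```xml
<!-- Disable Nagle's algorithm on the server side of Hadoop's common RPC framework. -->
<!-- Takes effect only after restarting every daemon that uses this framework. -->
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
```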
11-30-2015
07:56 PM
Thanks Steve. In our case, we are looking to set it at the RM level, not necessarily at the app/AM level. So if the AM fails for any reason, just don't retry the AM on the same host; pick something else. Depending on the error, it might be a good option to blacklist at the RM level so that no further AMs are sent there.
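Later Hadoop releases added RM-level AM blacklisting knobs; assuming a recent-enough version (Hadoop 2.8+, an assumption), a hedged yarn-site.xml fragment:

```xml
<!-- Hedged sketch: RM-level node blacklisting for AM placement. -->
<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-enabled</name>
  <value>true</value>
</property>
<!-- Fraction of cluster nodes that may be blacklisted before blacklisting is disabled. -->
<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold</name>
  <value>0.2</value>
</property>
```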
12-14-2016
05:53 PM
How can I access the JIRA?
12-19-2017
10:43 AM
This method of using the yarn command does not cover the use case of running an HDInsight cluster on demand, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser . Is there any option to configure the YARN logger to produce text rather than the TFile binary format?
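For reference, the yarn-command method referred to above looks roughly like this (the application ID is a placeholder). It reads the aggregated TFile logs and prints plain text, which is exactly what is no longer possible once the on-demand cluster is deleted:

```shell
# Hedged sketch: dump aggregated container logs as plain text while the cluster still exists.
# application_1500000000000_0001 is a placeholder application ID.
yarn logs -applicationId application_1500000000000_0001 > app.log
```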
03-13-2017
01:54 AM
HDP service logs are available in Ambari Log Search. The back end is Solr, so you can pull all of the logs, or only the relevant info, based on your requirements. Also, for service-level metrics, Ambari now stores these in Grafana.
11-02-2015
11:58 AM
Thanks @ravi@hortonworks.com