Member since: 10-13-2016
Posts: 68
Kudos Received: 10
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2111 | 02-15-2019 11:50 AM |
| | 4640 | 10-12-2017 02:03 PM |
| | 872 | 10-13-2016 11:52 AM |
09-27-2017
12:25 PM
I have a Graphite server to which I want to send Hadoop metrics2. On paper it is easy: add log4j.logger.org.apache.hadoop.metrics2=DEBUG to the log4j template, and update the hadoop-metrics2.properties template with:
*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
*.sink.graphite.server_host=10.x.x.x
*.sink.graphite.server_port=2003
datanode.sink.graphite.metrics_prefix=datanode
namenode.sink.graphite.metrics_prefix=namenode
resourcemanager.sink.graphite.metrics_prefix=resourcemanager
nodemanager.sink.graphite.metrics_prefix=nodemanager
jobhistoryserver.sink.graphite.metrics_prefix=jobhistoryserver
journalnode.sink.graphite.metrics_prefix=journalnode
maptask.sink.graphite.metrics_prefix=maptask
reducetask.sink.graphite.metrics_prefix=reducetask
applicationhistoryserver.sink.graphite.metrics_prefix=applicationhistoryserver
It works very well with one service (e.g. datanode). If I configure more than one, I only get 2 services in Graphite, and I cannot confirm that all metrics for those services are present. Not knowing what metrics to expect, and wanting to experiment, I do not want to filter on specific metrics to limit their number. On the collectd side I can see one metric dropped (invalid), but only one; it does not account for all the rest. Furthermore, setting CollectInternalStats to true shows me that no metrics are dropped. On the Hadoop side, I could not find anything telling me whether metrics are actually sent, or whether sending succeeds or fails; nothing is logged anywhere. So my 2 questions are: How can I debug metrics2? Are there any known reasons why I am missing metrics? Context: hdp 2.6 on AWS.
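One way to check whether the daemons emit metrics at all, independently of Graphite, is to add a file sink next to the Graphite one in hadoop-metrics2.properties and inspect the output on disk. A minimal sketch, assuming the stock FileSink class that ships with Hadoop (the /tmp filenames are placeholders):
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
namenode.sink.file.filename=/tmp/namenode-metrics.out
datanode.sink.file.filename=/tmp/datanode-metrics.out
If metrics appear in those files but never reach Graphite, the problem is on the sink/network side rather than in metrics production.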
Labels: Apache Hadoop
07-06-2017
10:44 AM
@Vani I am trying to understand what this memory will be used for. My understanding is that:
- any application will require its own AM
- one AM will use 1 container only
- tez-site/tez.am.resource.memory.mb defines the memory usable by the total of all AMs
So logically, all AM memory together should never be more than half of the available memory (for the worst-case scenario where every application uses only one container), and I should allocate in tez-site/tez.am.resource.memory.mb (minimum container size * expected number of applications). Could you confirm my understanding?
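To make that concrete with my own numbers (an illustration of the formula above, not confirmed settings): with a minimum container size of 5532 MB and, say, 2 concurrent applications expected, the formula gives 5532 * 2 = 11064 MB for tez.am.resource.memory.mb.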
07-04-2017
01:22 PM
@Vani, Thanks for your answer. I do not see an immediate change, but I will keep looking in this direction. What would be a good logical value for this maximum-am-resource-percent? Currently the AM memory (tez-site/tez.am.resource.memory.mb) is set to the minimum container size (5GB in my case). Does that make sense?
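For reference, the property in question lives in the capacity scheduler config and defaults to 0.1, i.e. at most 10% of the queue's resources may be used by all ApplicationMasters combined. A sketch of what raising it would look like, where 0.5 is a purely illustrative value and not a recommendation:
capacity-scheduler/yarn.scheduler.capacity.maximum-am-resource-percent = 0.5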
07-03-2017
02:27 PM
I have a small one-node hdp 2.6 cluster (8 CPUs, 32GB RAM), and I cannot run more than 1 query at a time, although I was pretty sure that I configured the relevant settings to allow more than one container. The relevant configs are:
yarn-site/yarn.nodemanager.resource.memory-mb = 27660
yarn-site/yarn.scheduler.minimum-allocation-mb = 5532
yarn-site/yarn.scheduler.maximum-allocation-mb = 27660
mapred-site/mapreduce.map.memory.mb = 5532
mapred-site/mapreduce.reduce.memory.mb = 11064
mapred-site/mapreduce.map.java.opts = -Xmx4425m
mapred-site/mapreduce.reduce.java.opts = -Xmx8851m
mapred-site/yarn.app.mapreduce.am.resource.mb = 11059
mapred-site/yarn.app.mapreduce.am.command-opts = -Xmx8851m -Dhdp.version=${hdp.version}
hive-site/hive.execution.engine = tez
hive-site/hive.tez.container.size = 5532
hive-site/hive.auto.convert.join.noconditionaltask.size = 1546859315
tez-site/tez.runtime.unordered.output.buffer.size-mb = 414
tez-interactive-site/tez.am.resource.memory.mb = 5532
tez-site/tez.am.resource.memory.mb = 5532
tez-site/tez.task.resource.memory.mb = 5532
tez-site/tez.runtime.io.sort.mb = 1351
hive-site/hive.tez.java.opts = -server -Xmx4425m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn-site/yarn.nodemanager.resource.cpu-vcores = 6
yarn-site/yarn.scheduler.maximum-allocation-vcores = 6
mapred-site/mapreduce.map.output.compress = true
hive-site/hive.exec.compress.intermediate = true
hive-site/hive.exec.compress.output = true
hive-interactive-env/enable_hive_interactive = false
Which, if I understand it well, gives roughly 5GB per container. If I run a Hive query, it will use 5GB and 1 core, leaving about 15GB and 5 cores for the rest. I do not understand why the next query cannot start at the same time. Any help would be much welcome.
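A back-of-the-envelope check that may be relevant here (assuming the capacity scheduler default yarn.scheduler.capacity.maximum-am-resource-percent = 0.1, which the config above does not override):
AM headroom = 27660 MB * 0.1 = 2766 MB
one Tez AM = tez.am.resource.memory.mb = 5532 MB
The scheduler always admits at least one AM, but a second 5532 MB AM would exceed the 2766 MB cap, which would limit the cluster to a single concurrent query and match the behaviour described above.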
Labels: Apache Hive
06-15-2017
08:06 AM
I was using Hive 1 with hive.server2.enable.doas=true. Now I want to use hive-interactive, but hive.server2.enable.doas apparently has to be false (that is what Ambari says). This of course makes most of my queries break because of wrong permissions. I am curious to know why this setting cannot be true, and whether there is a known workaround for this. Context: hdp 2.6 with hive and hive-interactive. Thanks!
Labels: Apache Hive
06-15-2017
05:34 AM
Thanks, but I am not interested in this surrogate key. The point of defining the PK was to help e.g. reporting tools automatically discover joins between tables, and a surrogate key would not do that. Thanks!
06-14-2017
02:10 PM
The example I gave was a trimmed-down version of what I want to do, to show the technical problem. My expected PK is actually a compound PK, with a few partition columns and a few non-partition columns. But I am afraid that your answer says it all: no can do :(. Thanks!
06-14-2017
10:54 AM
I want to add primary key constraints to hive tables. The only thing is that my PK is actually a partitioned column. For instance:
CREATE TABLE pk
(
id INT,
PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING)
This fails with the error message: DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part. Is there a way to use a partitioned column as PK? Context: hdp 2.6, hive 2.1 with llap.
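For contrast, a minimal sketch of the constraint syntax Hive 2.1 should accept, with the PK moved onto a regular (non-partition) column; this is just the example above rearranged, not a fix for the partition-column case:
CREATE TABLE pk
(
  id INT,
  PRIMARY KEY (id) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING);
Only the reference to the partition column in the constraint is rejected, not the DISABLE NOVALIDATE constraint mechanism itself.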
Labels: Apache Hive
04-24-2017
12:06 PM
The answer is that it is not possible to set those parameters globally. @Murali Ramasami has the right workaround.