Member since 10-13-2016

Posts: 68
Kudos Received: 10
Solutions: 3

My Accepted Solutions

| Views | Posted |
|---|---|
| 2553 | 02-15-2019 11:50 AM |
| 5136 | 10-12-2017 02:03 PM |
| 1223 | 10-13-2016 11:52 AM |

09-27-2017 12:25 PM

I have a Graphite server to which I want to send Hadoop metrics2 data.

On paper it's easy: add log4j.logger.org.apache.hadoop.metrics2=DEBUG to the log4j template and update the hadoop-metrics2.properties template with:

*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
*.sink.graphite.server_host=10.x.x.x
*.sink.graphite.server_port=2003
datanode.sink.graphite.metrics_prefix=datanode
namenode.sink.graphite.metrics_prefix=namenode
resourcemanager.sink.graphite.metrics_prefix=resourcemanager
nodemanager.sink.graphite.metrics_prefix=nodemanager
jobhistoryserver.sink.graphite.metrics_prefix=jobhistoryserver
journalnode.sink.graphite.metrics_prefix=journalnode
maptask.sink.graphite.metrics_prefix=maptask
reducetask.sink.graphite.metrics_prefix=reducetask
applicationhistoryserver.sink.graphite.metrics_prefix=applicationhistoryserver

It works very well with one service (e.g. datanode). If I configure more than one, I only get 2 services in Graphite, and I cannot confirm that all metrics for those services are present.

Not knowing which metrics to expect and wanting to experiment, I do not want to filter on specific metrics to limit their number.

On the collectd side I can see one metric dropped (invalid), but only one. It does not account for all the rest. Furthermore, setting CollectInternalStats to true shows me that no metrics are dropped.

On the Hadoop side... well, I could not find anything telling me whether metrics are actually sent, or whether sending succeeds or fails. Nothing is logged anywhere.

So my 2 questions are:
- How can I debug metrics2?
- Are there any known reasons why I am missing metrics?

Context: HDP 2.6 on AWS.
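One way to see what, if anything, the sink actually sends is to temporarily point *.sink.graphite.server_host and server_port at a throwaway listener that prints every line it receives and counts metrics per service. Below is a minimal Python sketch of such a listener. It assumes GraphiteSink writes Graphite's plaintext protocol (one "path value timestamp" line per metric over TCP) and that each service's metrics_prefix ends up as the first component of the metric path; the --serve flag and the function names are illustrative choices, not part of Hadoop or Graphite.

```python
import socketserver
import sys
import threading
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumption: one metric per line in Graphite's plaintext format:
#   <dotted.metric.path> <value> <unix_timestamp>
Metric = Tuple[str, float, int]

TOTALS: Counter = Counter()  # per-prefix counts across all connections
LOCK = threading.Lock()


def parse_line(line: str) -> Optional[Metric]:
    """Return (path, value, timestamp), or None if the line is malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    path, value, timestamp = parts
    try:
        return path, float(value), int(float(timestamp))
    except (ValueError, OverflowError):
        return None


def count_by_prefix(lines: Iterable[str]) -> Counter:
    """Count metrics by first path component (e.g. 'datanode'); bad lines go to '<invalid>'."""
    counts: Counter = Counter()
    for line in lines:
        metric = parse_line(line)
        key = "<invalid>" if metric is None else metric[0].split(".", 1)[0]
        counts[key] += 1
    return counts


class DumpHandler(socketserver.StreamRequestHandler):
    """Echo every received line and add it to the running totals."""

    def handle(self) -> None:
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace")
            print(f"{self.client_address[0]}: {line}", end="")
            with LOCK:
                TOTALS.update(count_by_prefix([line]))


class DumpServer(socketserver.ThreadingTCPServer):
    daemon_threads = True  # don't wait for long-lived sink connections on exit
    allow_reuse_address = True


if __name__ == "__main__" and "--serve" in sys.argv:
    # Hypothetical debugging setup: run on a test host and point the sink here.
    server = DumpServer(("0.0.0.0", 2003), DumpHandler)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
        print("metrics per prefix:", dict(TOTALS))
```

Enabling one service at a time and comparing the per-prefix totals printed on Ctrl+C against the configured metrics_prefix values would show whether metrics go missing before they leave Hadoop or after they reach the Graphite side.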

Labels: Apache Hadoop

07-06-2017 10:44 AM

@Vani I am trying to understand what this memory will be used for. My understanding is that:
- any application requires its own AM
- one AM uses only 1 container
- tez-site/tez.am.resource.memory.mb defines the memory usable by all AMs in total

So logically:
- all AM memory should never be more than half of the available memory (for the worst-case scenario where every application uses only one container)
- I should allocate in tez-site/tez.am.resource.memory.mb (minimum container size * expected number of applications)

Could you confirm my understanding?
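The arithmetic behind this reasoning can be sketched as follows. This is a simplified model, not the Capacity Scheduler's actual code: it assumes every running application holds exactly one AM container of tez.am.resource.memory.mb, that yarn.scheduler.capacity.maximum-am-resource-percent caps the memory all AMs may use together, that one application is still allowed to start even if a single AM exceeds that cap, and it ignores any rounding the scheduler may apply. The function names are illustrative.

```python
import math


def am_percent_needed(cluster_mb: int, am_mb: int, concurrent_apps: int) -> float:
    """Fraction of cluster memory needed so that `concurrent_apps` AMs can run at once."""
    return concurrent_apps * am_mb / cluster_mb


def ams_that_fit(cluster_mb: int, am_mb: int, max_am_percent: float) -> int:
    """Number of AMs (hence concurrent applications) allowed under the AM limit.

    Assumption: one application may always start, even if its AM alone
    is larger than the limit.
    """
    am_limit_mb = cluster_mb * max_am_percent
    return max(1, math.floor(am_limit_mb / am_mb))


if __name__ == "__main__":
    # Node memory and AM size from the single-node configuration posted on 07-03-2017.
    cluster_mb, am_mb = 27660, 5532
    print("percent needed for 2 apps:", am_percent_needed(cluster_mb, am_mb, 2))
    for pct in (0.1, 0.2, 0.4):
        print(pct, "->", ams_that_fit(cluster_mb, am_mb, pct), "AM(s)")
```

Under this model, running two queries side by side with a 5532 MB AM on a 27660 MB node requires an AM share of at least 0.4.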

07-04-2017 01:22 PM

@Vani, thanks for your answer. I do not see an immediate change, but I will carry on looking in this direction.

What would be a good, logical value for maximum-am-resource-percent?

Currently the AM memory (tez-site/tez.am.resource.memory.mb) is set to the minimum container size (5GB in my case). Does that make sense?

07-03-2017 02:27 PM

I have a small single-node HDP 2.6 cluster (8 CPUs, 32GB RAM), and I cannot run more than 1 query at a time, although I was fairly sure I had configured the relevant settings to allow more than one container.

The relevant configs are:

yarn-site/yarn.nodemanager.resource.memory-mb = 27660
yarn-site/yarn.scheduler.minimum-allocation-mb = 5532
yarn-site/yarn.scheduler.maximum-allocation-mb = 27660
mapred-site/mapreduce.map.memory.mb = 5532
mapred-site/mapreduce.reduce.memory.mb = 11064
mapred-site/mapreduce.map.java.opts = -Xmx4425m
mapred-site/mapreduce.reduce.java.opts = -Xmx8851m
mapred-site/yarn.app.mapreduce.am.resource.mb = 11059
mapred-site/yarn.app.mapreduce.am.command-opts = -Xmx8851m -Dhdp.version=${hdp.version}
hive-site/hive.execution.engine = tez
hive-site/hive.tez.container.size = 5532
hive-site/hive.auto.convert.join.noconditionaltask.size = 1546859315
tez-site/tez.runtime.unordered.output.buffer.size-mb = 414
tez-interactive-site/tez.am.resource.memory.mb = 5532
tez-site/tez.am.resource.memory.mb = 5532
tez-site/tez.task.resource.memory.mb = 5532
tez-site/tez.runtime.io.sort.mb = 1351
hive-site/hive.tez.java.opts = -server -Xmx4425m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn-site/yarn.nodemanager.resource.cpu-vcores = 6
yarn-site/yarn.scheduler.maximum-allocation-vcores = 6
mapred-site/mapreduce.map.output.compress = true
hive-site/hive.exec.compress.intermediate = true
hive-site/hive.exec.compress.output = true
hive-interactive-env/enable_hive_interactive = false

If I understand it correctly, this gives 5GB per container.

If I run a Hive query, it uses 5GB and 1 core, leaving about 15GB and 5 cores for the rest. I do not understand why the next query cannot start at the same time.

Any help would be much appreciated.
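To separate raw node capacity from scheduler limits, a quick capacity check can help. The sketch below is a minimal model, assuming each Hive-on-Tez query needs one AM container plus at least one task container, that both memory and vcores are enforced (as with DominantResourceCalculator), and that each container requests 1 vcore. It deliberately ignores queue and AM limits, so a result above 1 would suggest the bottleneck is a scheduler setting rather than the node itself. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def containers_that_fit(node: Resource, request: Resource) -> int:
    """Identical containers a node can host; the scarcer dimension wins."""
    return min(node.memory_mb // request.memory_mb, node.vcores // request.vcores)


def queries_that_fit(node: Resource, am: Resource, task: Resource) -> int:
    """Concurrent queries if each needs one AM plus one task container (raw capacity only)."""
    per_query = Resource(am.memory_mb + task.memory_mb, am.vcores + task.vcores)
    return containers_that_fit(node, per_query)


if __name__ == "__main__":
    # yarn.nodemanager.resource.memory-mb and cpu-vcores from the configuration above.
    node = Resource(memory_mb=27660, vcores=6)
    # tez.am.resource.memory.mb and hive.tez.container.size, assuming 1 vcore each.
    am = Resource(memory_mb=5532, vcores=1)
    task = Resource(memory_mb=5532, vcores=1)
    print("containers per node:", containers_that_fit(node, task))
    print("queries by raw capacity:", queries_that_fit(node, am, task))
```

With these values the node has room for two such minimal queries by raw capacity alone, which points toward a limit other than node memory or vcores.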

Labels: Apache Hive

06-15-2017 08:06 AM

I was using Hive 1 with hive.server2.enable.doas=true. Now I want to use hive-interactive, but hive.server2.enable.doas apparently has to be false (that is what Ambari says). This, of course, breaks most of my queries because of wrong permissions.

I am curious to know:
- why this setting cannot be true
- whether there is a known workaround for this

Context: HDP 2.6 with hive and hive-interactive.

Thanks!

Labels: Apache Hive

06-15-2017 05:34 AM

Thanks, but I am not interested in this surrogate key. The point of defining the PK was to help, e.g., reporting tools discover joins between tables automatically. This surrogate key would thus not do.

Thanks!

06-14-2017 02:10 PM

The example I gave was a trimmed-down version of what I wanted to do, to show the technical problem.

My expected PK is actually a compound PK, with a few partitioned columns and a few non-partitioned columns.

But I am afraid that your answer says it all: no can do :(.

Thanks!

06-14-2017 10:54 AM

I want to add primary key constraints to Hive tables. The only thing is that my PK is actually a partitioned column. For instance:

CREATE TABLE pk
(
  id INT,
  PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING)

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partitioned column as a PK?

Context: HDP 2.6, Hive 2.1 with LLAP.

Labels: Apache Hive

06-14-2017 10:52 AM

I want to add primary/foreign key constraints to a Hive table. The only thing is that my PK is actually a partitioned column. For instance:

CREATE TABLE pk
(
  id INT,
  PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING)

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partitioned column as a PK?

Context: HDP 2.6, Hive 2.1 with LLAP.

Labels: Apache Hive

04-24-2017 12:06 PM

The answer is that it is not possible to set those parameters globally. @Murali Ramasami has the right workaround.