Member since 07-17-2019

738 Posts
433 Kudos Received
111 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3464 | 08-06-2019 07:09 PM |
|  | 3653 | 07-19-2019 01:57 PM |
|  | 5167 | 02-25-2019 04:47 PM |
|  | 4656 | 10-11-2018 02:47 PM |
|  | 1754 | 09-26-2018 02:49 PM |

08-24-2016 09:39 AM

At my site this will work:

 ACCUMULO_CONF_DIR=/etc/accumulo/conf/server accumulo init

After init, no further issues were found. Many thanks for your detailed help 🙂 Klaus

08-12-2016 04:27 AM
11 Kudos

Apache ZooKeeper is a "high-performance coordination service for distributed applications." Most users do not use ZooKeeper directly; however, most users are also hard-pressed to deploy a Hadoop-based architecture that doesn't rely on ZooKeeper in some way. Given its prevalence in the data center, resource management within ZooKeeper is paramount to ensure that the various applications and services relying on ZooKeeper can access it in a timely manner. To this end, one of ZooKeeper's protection mechanisms is known as "max client connections", or maxClientCnxns.

maxClientCnxns is a configuration property that can be added to the zoo.cfg configuration file. It limits the number of active connections from a host, identified by IP address, to a single ZooKeeper server. By default, this limit is 60 active connections: one host is not allowed to have more than 60 active connections open to one ZooKeeper server. Changes to this property in the zoo.cfg file require a restart of ZooKeeper. This is a simple way for ZooKeeper to prevent clients from performing a denial-of-service attack against it (maliciously or unwittingly), as well as to limit the amount of memory required by these client connections.
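For example, raising the limit is a one-line entry in zoo.cfg on each ZooKeeper server, followed by a restart (the value 120 here is only illustrative; tune it for your own deployment):

 maxClientCnxns=120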
The reason this property is so important is that it can effectively deny all access from a host inside a cluster to a ZooKeeper server, which can have severe performance and stability impacts on the cluster. For example, if a node running an Apache HBase RegionServer hits the maxClientCnxns limit, all future requests made by that RegionServer to that ZooKeeper server would be dropped until the overall number of connections to the ZooKeeper server is reduced. Perhaps the worst part is that processes other than HBase running on the same node (e.g. YARN containers that are part of a MapReduce job) can also eat into the allowed connections from that host.
On a positive note, it is simple to recognize when this rate limiting is happening, and also simple to determine the problematic clients on the rate-limited host. First, there is a very clear error message in the ZooKeeper server log which identifies the host being rate-limited and the current active connection limit:

 Too many connections from 10.0.0.1 - max is 60

This message states that a client from the host with IP address 10.0.0.1 is trying to connect to this ZooKeeper server, but the limit is 60 connections, so the current connection will be dropped. At this point, we know the host these connections are coming from, but we don't know which applications on that host are making them. We can use a network analysis tool such as netstat to determine the applications on the client host, in this case 10.0.0.1 (let's assume our ZooKeeper server is on 10.0.0.5):

 netstat -nape | awk '{if ($5 == "10.0.0.5:2181") print $4, $9;}'

This command lists the local address and process identifier for each connection whose remote address is our ZooKeeper server on the ZooKeeper service port (2181). Similarly, we can group this data to get a count of outgoing connections to the ZooKeeper server by process identifier:

 netstat -nape | awk '{if ($5 == "10.0.0.5:2181") print $9;}' | sort | uniq -c
This command reports a count of connections to the ZooKeeper server per process, which can be extremely helpful in identifying a misbehaving application causing issues. Additionally, we can use some of ZooKeeper's "four letter word" commands to get more information about the active connections to a ZooKeeper server. Using netcat, either of the following could be used:

 echo "stat" | nc 10.0.0.5 2181
 echo "cons" | nc 10.0.0.5 2181

Each of these commands outputs information about the active connections to the given ZooKeeper server.
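If you only care about the connections from the rate-limited host identified in the log message, the cons output can be filtered with grep (10.0.0.1 and 10.0.0.5 are the example addresses used above; substitute your own):

 echo "cons" | nc 10.0.0.5 2181 | grep 10.0.0.1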
To summarize, the maxClientCnxns property in zoo.cfg is used by the ZooKeeper server to limit incoming connections to ZooKeeper from a single host; by default, this limit is 60. When the limit is reached, new connections to the ZooKeeper server from that host are immediately dropped. This rate limiting can be observed in the ZooKeeper log, and offending applications can be identified using network tools like netstat. Changes to maxClientCnxns must be accompanied by a restart of the ZooKeeper server.

See also:
ZooKeeper configuration property documentation
ZooKeeper four letter words documentation

04-24-2017 07:01 PM

A good work-around idea if you don't have the minimal versions (Phoenix 4.8.0+ and Hive 1.2.1+) needed to use the Phoenix Storage Handler for Hive. The Phoenix-Pig integration worked for me. Thank you @Saurabh Rathi.

08-03-2016 03:06 PM

Thanks for trying it out! It does sound like a Slider bug if it isn't working for multiple containers on the same host. You're welcome to open a ticket at https://issues.apache.org/jira/browse/SLIDER, or I can do that for you.

07-21-2016 05:44 PM

That is not an error message, it's INFO. ZooKeeper is just telling you that a client tried to create the node /brokers/ids, but it already existed in ZooKeeper (you cannot create a node that already exists). Not sure why you can't see the Kafka messages, though.

07-19-2016 02:23 AM

@Michael Sobelman That DNS name is not resolvable from the node you are trying to access it from. You can be fancy on AWS and configure it through routing tables by setting up a proper VPN between the EMR and NiFi nodes. Another option I used is Route 53, which will give you a publicly resolvable DNS name. Lastly, you can put an ELB in front of your EMR HBase master node. You may have to script it up (via boot scripts) to configure your ELB to point to the new internal IP.

06-22-2016 01:31 PM
4 Kudos

One of the most common questions I come across when trying to help debug MapReduce jobs is: "How do I change the Log4j level for my job?" Many times, a user has a JAR with a class that implements Tool which they invoke using the hadoop jar command. The desire is to change the log level without changing any code or global configuration files:

 hadoop jar MyApplication.jar com.myorg.application.ApplicationJob <args ...>

There is a large amount of misinformation on this topic because how to do it has changed drastically since the 0.20.x and 1.x Apache Hadoop days. Most posts will point you to some solution involving environment variables or passing Java opts to the mappers/reducers. In practice, there is a very straightforward solution.

To change the Mapper Log4j level, set mapreduce.map.log.level. To change the Reducer Log4j level, set mapreduce.reduce.log.level. If for some reason you need to change the Log4j level on the MapReduce ApplicationMaster (e.g. to debug InputSplit generation), set yarn.app.mapreduce.am.log.level. This is the proper way for the Apache Hadoop 2.x release line. These options do not allow configuration of a Log4j level on a certain class or package -- that would require a custom logging setup to be provided by your application.

It's important to remember that you can define configuration properties (which will appear in your job via the Hadoop Configuration) using the `hadoop jar` command:

 hadoop jar <jarfile> <classname> [-Dkey=value ...] [arg, ...]

The `-Dkey=value` section can be used to define the Log4j configuration properties when you launch the job.

For example, to set the DEBUG Log4j level on Mappers:

 hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.map.log.level=DEBUG <args ...>

To set the WARN Log4j level on Reducers:

 hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.reduce.log.level=WARN <args ...>

To set the DEBUG Log4j level on the MapReduce ApplicationMaster:

 hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dyarn.app.mapreduce.am.log.level=DEBUG <args ...>

And, of course, each of these options can be used with one another:

 hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.map.log.level=DEBUG -Dmapreduce.reduce.log.level=DEBUG -Dyarn.app.mapreduce.am.log.level=DEBUG <args ...>

09-29-2016 09:54 PM

@Josh Elser Extremely helpful article. Nice work.

06-14-2016 01:58 PM

The phoenix-sqlline command is not using PQS. You want to use /usr/hdp/current/phoenix-client/bin/sqlline-thin.py to interact with PQS.

05-19-2016 08:21 PM

The Manual Install Guides (versions 2.2.9, 2.3.0, 2.3.2, 2.3.4, 2.3.4.7, 2.4.0, and 2.4.2) have been updated with this information. Thank you very much for your comments and assistance.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-2a6efe32-d0e1-4e84-9068-4361b8c36dc8.1.html