Member since 09-21-2015
Posts: 85
Kudos Received: 75
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2484 | 04-21-2016 12:22 PM |
| | 5674 | 03-12-2016 02:19 PM |
| | 2429 | 10-29-2015 07:50 PM |
| | 2806 | 10-02-2015 04:21 PM |
| | 7611 | 09-29-2015 03:08 PM |
06-02-2016 06:03 PM

As far as I know, Kerberos is for authentication, not the encryption of Hive communication.
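For reference, wire encryption for HiveServer2 is configured through separate hive-site.xml properties; a minimal sketch with standard property names and illustrative values:

hive.server2.thrift.sasl.qop=auth-conf
hive.server2.use.SSL=true
hive.server2.keystore.path=/path/to/keystore.jks
hive.server2.keystore.password=<password>

The first property encrypts the SASL/Kerberos Thrift traffic; the SSL properties enable TLS on the HiveServer2 transport.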
06-02-2016 05:07 PM

@Sri Bandaru - No. That's for Ambari HTTPS. I'm referring to SSL of HiveServer2 connections.
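A client connection to an SSL-enabled HiveServer2 looks roughly like this (standard beeline JDBC syntax; host, port, and truststore are placeholders):

beeline -u "jdbc:hive2://<host>:10000/default;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=<password>"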
06-02-2016 02:57 PM
1 Kudo

What configuration is required in the Hive Ambari View for supporting Hive SSL?

Labels:
- Apache Ambari
- Apache Hive
04-21-2016 12:43 PM

@Ali Bajwa A simplified approach. On the Ambari Server:

yum -y install git
git clone https://github.com/seanorama/ambari-bootstrap
cd ambari-bootstrap
export ambari_server_custom_script=${ambari_server_custom_script:-~/ambari-bootstrap/ambari-extras.sh}
export install_ambari_server=true
./ambari-bootstrap.sh

Then deploy the cluster. The "extras" script above takes care of all the tedious stuff automatically (cloning Zeppelin, the blueprint defaults, the role command order, ...):

yum -y install python-argparse
cd deploy
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE SPARK ZEPPELIN"
bash ./deploy-recommended-cluster.bash
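Once the deployment is submitted, one way to watch its progress is the standard Ambari REST API (cluster name and credentials are placeholders):

curl -u admin:admin http://localhost:8080/api/v1/clusters/<cluster>/requests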
04-21-2016 12:22 PM
1 Kudo

The Google Cloud Storage Connector for Hadoop is configured at the cluster level without any knowledge of Kerberos, so the output you showed is what I would expect. Some thoughts:

- In secure environments, ideally a user can never even reach Hadoop without authenticating against Kerberos or the directory. With that assumed, you would never get the chance to run 'hadoop fs -ls ...' anyway. So lock down all access to the environment and network so that only authorized users can even run the commands.
- It couldn't hurt to submit a feature request for a configuration option that disables 'gs' unless the user is authenticated to Hadoop. Personally I see this as a bug report, but technically it's a feature request. You would have to raise it with Google, since the Connector is not currently part of Apache Hadoop; Google maintains it separately.
  - Why it's not a bug: Kerberos governs communication between services, not the execution of commands. Since GS doesn't do Kerberos, it works as intended, since its authentication is handled separately.
- I've not done it, but you could check whether individual users/applications can pass the GCS token. If so, you would remove it from the cluster-wide configuration and users would be required to supply it themselves. It would still not be using Kerberos, but it would be another layer of security. s3a://, swift://, and wasb:// support this method (a sketch of that pattern follows below).
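A sketch of that per-invocation pattern with s3a (standard fs.s3a.* properties; the keys shown are placeholders, and the same idea would apply to a per-user GCS credential if the connector supports it):

hadoop fs -D fs.s3a.access.key=<ACCESS_KEY> -D fs.s3a.secret.key=<SECRET_KEY> -ls s3a://<bucket>/path/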
04-04-2016 03:06 PM
2 Kudos

Prerequisites:
- Launch the Sandbox on Azure
  - VM size: minimum of A4 or A5
- A Twitter App
  - You'll use the API credentials
  - The "Application Details" don't matter

Prepare the Sandbox

Connect to SSH & Ambari
- Connect to the Sandbox using SSH, or the web console: http://<<ip>>:4200/
- Become root: sudo su -
- Reset the Ambari password: ambari-admin-password-reset
- Log in to Ambari: http://<<ip>>:8080 (user: admin)
- Before moving to the next steps, ensure all services on the left are started (green) or in maintenance mode (black).

Install NiFi
- In Ambari, click "Actions" (bottom left) -> Add Service.
- Choose NiFi and continue through the dialogs. You shouldn't need to change anything.
- NiFi should now be accessible at http://<<ip>>:9090/nifi/

Tune the Sandbox
The Sandbox is tuned to run on minimal hardware, so we need to update the Hive, Tez & YARN configuration for our use case. This could take up to 15 minutes to complete:

bash <(curl -sSL https://git.io/vVRPs)

Solr & Banana
Solr enables searching across large corpora of information through specialized indexing techniques. Banana is a dashboard visualization tool for Solr.
- Download the Banana dashboard:
  curl -L https://git.io/vVRP3 -o /opt/hostname-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/default.json
- Update Solr to support Twitter's timestamp format:
  curl -L https://git.io/vVRPz -o /opt/hostname-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
- Start Solr:
  JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64 /opt/hostname-hdpsearch/solr/bin/solr start -c -z localhost:2181
- Create the Solr collection for tweets (a quick check follows below):
  /opt/hostname-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
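A quick way to confirm the collection was created (Solr's standard Collections API, assuming the default port 8983):

curl "http://localhost:8983/solr/admin/collections?action=LIST&wt=json"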
03-12-2016 02:19 PM
17 Kudos

As with many topics, "it depends".

For slave/worker/data hosts which only run distributed services, you can likely disable swap. With distributed services it's preferred to let the process/host be killed rather than swap; the killing of that process or host shouldn't affect cluster availability. Said another way: you want to "fail fast", not "slowly degrade". Just one bad process/host can greatly degrade the performance of the whole cluster. For example, in a 350-host cluster, removing 2 bad nodes improved throughput by ~2x:
- http://www.slideshare.net/t3rmin4t0r/tez8-ui-walkthrough/23
- http://pages.cs.wisc.edu/~thanhdo/pdf/talk-socc-limplock.pdf

For masters, swap is also often disabled, though it's not a set rule from Hortonworks and I assume there will be some discussion/disagreement. Masters can be treated somewhat like you'd treat masters in other, non-Hadoop, environments. The fear with disabling swap on masters is that an OOM (out of memory) event could affect cluster availability. But that will still happen even with swap configured; it will just take slightly longer. Good administrator/operator practice is to monitor RAM availability and fix any issues before running out of memory, thus maintaining availability without affecting performance. No swap is needed then.

Scenarios where you might want swap:
- Playing with or testing functionality, not performance, on hosts with very little RAM, which will likely need to swap.
- You need, or expect to need, more memory than the amount of RAM that has been purchased, and you can accept severe degradation on failure. In this case you would need a lot of swap configured. You're better off buying the right amount of memory.

Extra thoughts (commands below):
- If you want to disable swap but your organization requires there to be a swap partition, set swappiness=0.
- If you choose to have swap, set swappiness=1 to avoid swapping until all physical memory has been used.
- Most cloud/virtualization providers disable swap by default. Don't change that.
- Some advise avoiding swap on SSDs because it reduces their lifespan.
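For reference, setting swappiness on a typical Linux host (standard sysctl usage; choose 0 or 1 per the discussion above):

sysctl -w vm.swappiness=1
echo "vm.swappiness=1" >> /etc/sysctl.conf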
03-11-2016 08:38 PM
3 Kudos

The questions will be:
1. Should there be a swap partition at all (i.e. swappiness=0)?
2. Do recommendations vary between masters, workers, or certain components?
3. If swappiness >= 1, what should the amount be?
03-11-2016 08:36 PM
2 Kudos

David - Thanks for posting. As discussed separately, the 2xRAM recommendation is definitely out of date. I'm working on some consensus with my team on their recommendations, and I look forward to others' comments coming in below.
01-12-2016 06:53 PM
1 Kudo

Mind if we convert this to an Article and update it together, since no answer will be correct for more than a couple of months?