Member since: 09-29-2015

286 Posts | 601 Kudos Received | 60 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 12844 | 03-21-2017 07:34 PM |
|  | 3762 | 11-16-2016 04:18 AM |
|  | 2142 | 10-18-2016 03:57 PM |
|  | 5100 | 09-12-2016 03:36 PM |
|  | 8432 | 08-25-2016 09:01 PM |
Posted 11-11-2016 06:18 PM | 5 Kudos
Here are the requirements:

- Total data size: 13.5 TB uncompressed, 2 TB compressed
- A large virtual fact table: a view containing a UNION ALL of 3 large tables, 11 billion records in total
- Another view that takes the large virtual fact table and applies consecutive LEFT OUTER JOINs against 8 dimension tables, so that the result is always 11 billion records no matter what
- There is timestamp data that you can use to filter rows

Given these requirements, how would you begin configuring Hortonworks for Hive? Would you focus on storage? How can we configure for compute?

Let's assume:

- Platform: AWS
- Data node instance: r3.4xlarge
- Cores: 16
- RAM: 122 GB
- EBS storage: 2 x 1 TB disks

So where do we begin? First, some quick calculations.

Memory per core: 122 GB / 16 = 7.625 GB, approximately 8 GB per CPU core. This means our largest container size per core, per node, is 8 GB.
However, we should not reserve all 16 cores for Hadoop; some cores are needed for the OS and other processes. Let's assume 14 cores are reserved for YARN. Memory allocated for all YARN containers on a node = number of virtual cores x memory per core:
114688 MB = 14 x 8192 MB (8 x 1024)
Note also that at 8 GB we can run 14 tasks (mappers or reducers) in parallel, one per core, without wasting RAM. We can certainly use container sizes smaller than 8 GB if we wish. Since our optimal container size per core is 8 GB, the YARN minimum container size must be a factor of 8 GB to prevent memory wastage, that is: 1, 2, 4, or 8 GB. The Tez container size for Hive, in turn, must be a multiple of the YARN minimum container size.
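To make that concrete, here is a rough sketch of how those numbers might translate into the usual YARN/Hive/Tez memory properties. The values are illustrative, following the 14-core / 8 GB-per-container math above; the 4 GB minimum allocation is just one of the valid factors.

# yarn-site.xml (per data node)
yarn.nodemanager.resource.memory-mb  = 114688    # 14 cores x 8192 MB
yarn.scheduler.minimum-allocation-mb = 4096      # a factor of 8192 (1, 2, 4, or 8 GB)
yarn.scheduler.maximum-allocation-mb = 114688

# hive-site.xml / tez-site.xml
hive.tez.container.size              = 8192      # a multiple of the YARN minimum allocation
tez.am.resource.memory.mb            = 8192
hive.tez.java.opts                   = -Xmx6554m # roughly 80% of the container size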
[Remaining sections of the post: Memory Settings (YARN, Hive, Tez), Running Application, Error]
Posted 10-18-2016 03:57 PM | 2 Kudos
Scenario 1: the Ranger KMS DB is down but the node is up.

The keys are cached for a time, so you can still read the data in the encrypted folder; HDFS has knowledge of the encryption zone key. I assume the Ranger KMS service is still up while its DB (metastore) is down. If you know the database cannot be recovered and you do not have a backup of the keystore, you immediately begin to remove the encryption zone: log in as an authorized user (or as hdfs), copy the files to an unencrypted area, and then remove the encryption zone. I just tested this on my cluster.

Scenario 2: the entire node was down, so BOTH the Ranger KMS DB and the Ranger KMS service are down.

The encryption zone key lives in the Ranger KMS DB (metastore), and you can also export it and save it to a file. You should back up the Ranger KMS DB and make it highly available. Once you export the keys to a keystore file, back that file up too. If the cluster node goes down, you restore the Ranger KMS DB from backup. If you cannot restore the Ranger KMS DB from backup, you create a completely new Ranger KMS DB, take the backed-up keystore file, and as a special user run a script to import the key into the newly created database. You can then associate the encryption zone folder with the key again using HDFS commands. If you have neither the keystore file nor a Ranger KMS DB backup, you have no options left: the files remain encrypted.

See this article for the script to export and import keys: https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster.html
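For reference, a rough sketch of what Scenario 1 (and the re-association step in Scenario 2) looks like from the HDFS side; the paths and the key name are hypothetical:

# Scenario 1: copy the data out while the cached key can still decrypt it
hdfs crypto -listZones                       # confirm which paths are encryption zones
hdfs dfs -mkdir /staging_plain               # unencrypted target directory
hadoop distcp /secure_zone /staging_plain    # data is decrypted on read and written as plaintext
hdfs dfs -rm -r -skipTrash /secure_zone      # deleting the directory removes the zone

# Scenario 2: after the key is re-imported into the rebuilt Ranger KMS DB,
# an empty directory can be associated with that key again
hdfs dfs -mkdir /secure_zone
hdfs crypto -createZone -keyName my_ez_key -path /secure_zone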
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
Posted 09-12-2016 03:36 PM | 1 Kudo
You can use LDAP in ADDITION to Kerberos. LDAP is the authentication authority; Kerberos is the ticketing system.
LDAP is like the DMV giving you your driver's license; Kerberos is your boarding pass to get on the plane.
Kerberos can be enabled with AD or FreeIPA as your LDAP in Hadoop.
Ambari, NiFi, and Ranger will authenticate against those LDAPs.
The only exception is Hive, where enabling Kerberos replaces LDAP authentication.
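On that last point, HiveServer2 takes a single authentication mode, so it is one or the other. A hedged illustration (the LDAP URL is made up):

# hive-site.xml: HiveServer2 accepts one authentication mode at a time
hive.server2.authentication = KERBEROS                             # alternatives: LDAP, NONE, PAM, CUSTOM
hive.server2.authentication.ldap.url = ldap://ad.example.com:389   # only consulted in LDAP mode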
						
					
Posted 09-12-2016 03:05 PM | 3 Kudos
Here is your answer: you can easily spoof your Hadoop cluster by changing a simple environment variable.

See also https://community.hortonworks.com/questions/2982/kerberos-adldap-and-ranger.html
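Presumably this refers to HADOOP_USER_NAME: on a cluster running simple (non-Kerberos) authentication, HDFS trusts whatever identity the client claims. For example:

# With simple authentication, the client-supplied identity is trusted
export HADOOP_USER_NAME=hdfs    # claim to be the hdfs superuser
hdfs dfs -ls /user              # subsequent commands now run as hdfs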
						
					
Posted 09-08-2016 07:34 PM
@Bryan Bende What do you mean by "So I went to nifi-https (https://localhost:8443/nifi) and went to the accounts section and approved the account for mycert.p12 and chose a role of 'NiFi'"?
Does that mean you added the DN associated with mycert.p12 to the authorized_users file and gave it a role of ROLE_NIFI?
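If I remember the legacy (pre-1.0) NiFi authorized-users.xml format correctly, the kind of entry being asked about would look roughly like this (the DN is hypothetical):

<users>
    <!-- DN taken from the client certificate in mycert.p12 -->
    <user dn="CN=mycert, OU=NIFI">
        <role name="ROLE_NIFI"/>
    </user>
</users>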
						
					
Posted 08-25-2016 09:01 PM | 4 Kudos
@Vineet @Pratheesh Nair
OK, here is the solution. Apparently, if you are installing on Amazon AWS EC2 (remember you are only given access to ec2-user, not root), and you decided NOT to set up passwordless SSH with the default key name id_rsa for that ec2-user when installing Ambari and its agents, then when you try to install Hortonworks HDB (HAWQ) via Ambari, Ambari WILL NOT exchange the keys for the gpadmin user for you.
It creates the gpadmin user with the password you give it on the HAWQ config screen during the install, but no keys are exchanged. NOTE: my nodes used an SSH key that was NOT named id_rsa (the default).
I do not know whether this is caused by using a non-root user, or by the fact that ec2-user did not have its own passwordless SSH with a default key named id_rsa.
In any case, the keys for gpadmin exist ONLY on the HAWQ master.
If you try the following to generate keys on the HAWQ master, you still get an error: it will NOT even accept the default gpadmin password you set, even though that password works. That was surprising.

su gpadmin
> source /usr/local/hawq/greenplum_path.sh
> hawq ssh-exkeys -f Hosts
> Enter password for existing user for node <......>
> Enter password for existing user for node <......>
> Enter password for existing user for node <......>

So in essence you have to go to each node manually and copy the authorized_keys file from the HAWQ master into /home/gpadmin/.ssh/ (chmod 600), so that the HAWQ master can at least SSH to each node. Then you run ssh-exkeys manually and it works.
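A rough sketch of that manual step, assuming the EC2 layout described above (the host names and the .pem path are illustrative):

# On the HAWQ master, as ec2-user (the holder of the EC2 .pem key).
# Stage gpadmin's authorized_keys where ec2-user can read it:
sudo cp /home/gpadmin/.ssh/authorized_keys /tmp/gpadmin_keys
sudo chmod 644 /tmp/gpadmin_keys

# Push it to every other HAWQ node:
for host in node2 node3 node4; do
    scp -i ~/mykey.pem /tmp/gpadmin_keys ec2-user@${host}:/tmp/gpadmin_keys
    ssh -t -i ~/mykey.pem ec2-user@${host} '
        sudo mkdir -p /home/gpadmin/.ssh
        sudo cp /tmp/gpadmin_keys /home/gpadmin/.ssh/authorized_keys
        sudo chown -R gpadmin:gpadmin /home/gpadmin/.ssh
        sudo chmod 700 /home/gpadmin/.ssh
        sudo chmod 600 /home/gpadmin/.ssh/authorized_keys'
done

# Then, back on the master, exchange keys as gpadmin:
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
hawq ssh-exkeys -f Hosts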
						
					
Posted 08-25-2016 05:48 PM
@Pratheesh Nair
OK, I had not run it from the HAWQ master, because Ambari was trying to run it from the HAWQ standby node. So I ran it from the HAWQ master myself and got the dreaded error:

gpadmin-[ERROR]:-Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

So I am attempting to run, as gpadmin:

hawq ssh-exkeys -f Hosts

It is asking for a password for each host. I am attempting to set up passwordless SSH for the gpadmin user (since this is AWS, it was only set up for ec2-user). Hopefully that will solve it.
						
					
Posted 08-25-2016 05:42 PM
@Pratheesh Nair
						
					
Posted 08-25-2016 04:06 PM
@Pratheesh Nair
There are no logs in /data/hawq/master on the HAWQ standby master; in fact, nothing is created there, unlike on the HAWQ master node. Yes, the IPs are different for the two masters. When you run the command from the command line it immediately returns, even in verbose mode:

This can be run only on master or standby host

Yes, I can do passwordless SSH if I provide -i with the .pem file (as this is AWS), i.e. ssh -i <.pem> node.
I cannot ssh to a node directly without passing the -i option.
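As an aside, one way to make a plain ssh node work without the -i option is an ~/.ssh/config entry for the connecting user; a sketch, with a hypothetical host pattern and key path:

# ~/.ssh/config (chmod 600) for the user doing the connecting
Host node*
    User ec2-user
    IdentityFile ~/.ssh/mykey.pem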
						
					
Posted 08-25-2016 02:49 PM
Looking at the source code does not help: https://github.com/apache/incubator-hawq/blob/master/tools/bin/hawq
						
					