Member since: 09-24-2015

- 816 Posts
- 488 Kudos Received
- 189 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3104 | 12-25-2018 10:42 PM |
|  | 13984 | 10-09-2018 03:52 AM |
|  | 4683 | 02-23-2018 11:46 PM |
|  | 2401 | 09-02-2017 01:49 AM |
|  | 2822 | 06-21-2017 12:06 AM |
			
    
	
		
		
04-07-2017 01:00 AM (1 Kudo)

For NN data, some fault-tolerant RAID like RAID-1, RAID-5, or RAID-10 is fine. On worker nodes, HDFS data disks should use JBOD or a one-disk RAID-0 per disk (so that you have 3 mount points). RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "caching".
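To illustrate the JBOD layout described above, here is a minimal sketch; the /grid/* mount points and paths are examples rather than values from the post, and dfs.datanode.data.dir is the standard HDFS property that lists one data directory per disk.

```bash
# Each data disk on a worker node gets its own mount point, with no RAID underneath.
df -h /grid/0 /grid/1 /grid/2      # three disks, three independent mount points

# hdfs-site.xml then lists one data directory per mount point (example paths):
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data</value>
#   </property>
```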
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-07-2017 12:53 AM

Thanks for your reply, but your solution would switch all Zeppelin interpreters to py3. I want to have interpreters running both py2 and py3. I was able to set livy.pyspark to work on py3, and I'm looking for a setup that makes the spark.pyspark interpreter work on py3 as well.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-06-2017 10:15 PM (1 Kudo)

If you want to minimize your downtime, you can try to stop nodes one by one, upgrade the RAM, and restart all components on each node after it comes back up. You must do the masters one by one, but workers can be done in sets of 2, or, if you have rack awareness, in sets of 3-4 or even rack by rack. You need an HA configuration of the major services like HDFS, YARN, HBase and Hive for this to work. If you have Kafka, you also need a replication factor of at least 2 for Kafka topics, and you should do the Kafka nodes one by one.
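On an Ambari-managed cluster, the per-node stop/start step can be driven through Ambari's REST API. This is a hedged sketch (the original answer does not mention Ambari); the cluster name, host name, Ambari host, and credentials below are placeholders.

```bash
# Stop all components on one worker before powering it down for the RAM upgrade
# (setting the desired state to INSTALLED stops them).
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop components for RAM upgrade"},
       "Body":{"HostRoles":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/worker01.example.com/host_components

# After the upgrade and reboot, start everything on the node again:
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start components after RAM upgrade"},
       "Body":{"HostRoles":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/worker01.example.com/host_components
```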
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-06-2017 09:54 AM

No, multiple HBase services are not supported right now, not even by "copy"; additional services like HBASE2, HBASE3 and so on would have to be created, the way SPARK2 now runs in addition to SPARK. Instead, you can either add more nodes to your cluster so that a single HBase can handle all your requirements, or create 2 clusters with one HBase in each. I'd suggest working with a single, strengthened HBase cluster. Recently I've been involved with an HBase cluster running on several hundred nodes, and after some tuning it works great. Initially we also considered 2 clusters, but this one is covering all our needs for the time being and is scaling well so far.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-06-2017 09:37 AM (1 Kudo)

Yes, exactly! Data stored on HDFS is not affected in any way, so all files used by a single HBase region are still replicated only 3 times. What is additionally replicated to achieve RS HA are the read-only secondary region replicas held by the respective Region Servers. You can find a good explanation here. What you get in return is faster recovery for reading from HBase. For writes you still need to wait longer (as without RS HA), until the HBase master activates the affected regions on other Region Servers.
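As a hedged illustration of the region replica feature referred to above (the table name, column family, and row key are examples, not from the original thread), read HA is enabled per table and queried with timeline consistency:

```bash
# Create a table whose regions have one read-only secondary replica:
echo "create 'usertable', 'cf', {REGION_REPLICATION => 2}" | hbase shell

# A timeline-consistent read may be served from a secondary replica while the
# primary Region Server for that region is down:
echo "get 'usertable', 'row1', {CONSISTENCY => 'TIMELINE'}" | hbase shell
```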
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-01-2017 12:35 AM (3 Kudos)

Cloudbreak is a popular, easy-to-use HDP component for cluster deployment on various cloud environments, including Azure, AWS, OpenStack and GCP. This article shows how to create an Azure application for Cloudbreak using the Azure CLI.

Note: To do this, you need "Owner" access on your Azure subscription; "Developer" and other roles are not enough.

Download and install the Azure CLI using the instructions provided here: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. CLI versions are available for Windows, macOS and Linux. Type "az" to make sure the CLI is available and in your command path.

Log in to your Azure account in your web browser, and then also log in from the command line:

```
az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code HPBCSXTPJ to authenticate.
```

Follow the instructions on the web page. When done, you will see confirmation on the command line that your login was successful.

Run the following command. You can freely choose the values you enter here, including dummy URIs: the identifier URI and the homepage are never used on Azure, but they are required. Also make sure the identifier URI is unique within your subscription, so instead of "mycbdapp" you may choose a more descriptive name.

```
# URIs are dummy, never used, but required
az ad app create --identifier-uris http://mycbdapp.com --display-name mycbdapp --homepage http://mycbdapp.com
```
Ignore the output of this command, including its appId; that's not the one we need. Choose your password and run the following command:

```
az ad sp create-for-rbac --name "mycbdapp" --password "mytopsecretpassword" --role Owner
{
  "appId": "c19a48f3-492f-a87b-ac4a-b1d8e456f14e",
  "displayName": "mycbdapp",
  "name": "http://mycbdapp",
  "password": "mytopsecretpassword",
  "tenant": "891fd956-21c9-4c40-bfa7-ab88c1d8364c"
}
```

Now log in to your Cloudbreak instance, select "manage credentials", then "+ create credential", and on the "Configure credential" page select Azure and fill in the form as shown on the screenshot. Use the appId, password, and tenant ID from the output above. Add your Azure subscription ID, and paste the public key of the ssh key pair you created before (this will be used to provide ssh access to the cluster machines for the "cloudbreak" user). Then proceed by providing the other settings, and enjoy HDP on Cloudbreak!
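Not part of the original steps, but if you don't have the subscription ID at hand, the Azure CLI session you just logged in with can print it:

```bash
# Print the ID of the currently selected Azure subscription:
az account show --query id --output tsv
```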
						
					
			
    
	
		
		
03-30-2017 11:54 AM (2 Kudos)

Trying to use the Zeppelin pyspark interpreter with python3, I set the "python" parameter in the interpreter to my python3 path and installed python3 on all worker nodes in the cluster at the same path, but I get an error when running simple commands:

```
%pyspark
file = sc.textFile("/data/x1")
file.take(3)

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
```

It works from the command line, using "pyspark" after exporting PYSPARK_PYTHON set to my python3 path. But how do I tell this to Zeppelin? I haven't changed anything else. Actually, as the next step I'd like to create 2 Spark interpreters, one running on python2 and another on python3.
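For reference, this is the command-line workaround mentioned in the post, as a sketch; the python3 path is an assumption and should match the path installed on all nodes.

```bash
# Works outside Zeppelin: point PySpark at python3 before starting the shell.
export PYSPARK_PYTHON=/usr/local/bin/python3   # example path, adjust to your install
pyspark
```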
						
					
Labels: Apache Zeppelin
    
	
		
		
03-29-2017 11:24 PM

You can try WebHCat and its mapreduce command.
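A hedged sketch of what that looks like in practice: WebHCat exposes a REST endpoint for submitting MapReduce jar jobs. The host name, port (50111 is the WebHCat default), user, jar path, class, and arguments below are placeholders, not values from the thread.

```bash
# Submit a MapReduce jar through WebHCat (Templeton); the job status is written
# to the HDFS directory given in statusdir.
curl -s -X POST \
     -d user.name=hdfs \
     -d jar=/user/hdfs/wordcount.jar \
     -d class=org.myorg.WordCount \
     -d arg=/user/hdfs/input \
     -d arg=/user/hdfs/output \
     -d statusdir=/user/hdfs/wordcount.status \
     'http://webhcat-host.example.com:50111/templeton/v1/mapreduce/jar'
```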
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
03-25-2017 05:52 AM

Well, it seems to be a bug, reported but unattended: HIVE-13983. A workaround is to use INSERT INTO ... SELECT, like:

```
insert into test select 'привет' from test limit 1;
```
				
			
			
			
			
			
			
			
			
			
		 
         
					
				













