Member since 06-29-2016

- Posts: 81
- Kudos Received: 43
- Solution: 1

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 1446 | 03-16-2016 08:26 PM |

03-14-2017 03:34 PM
My question on HDFS using SAN as the backend storage has three main parts:

1. Is it feasible to use SAN as the backend storage for HDFS?
2. What are the pros and cons of using SAN or NAS for HDFS?
3. Has this been tested for performance, and possibly other aspects?

Labels: Apache Hadoop

01-06-2017 04:16 AM

1 Kudo
@Tom McCuch Thanks for the clarification. Another related question: in general, what advantages would Mesos bring over YARN? I ask especially because Hortonworks is making efforts to support HDP on Mesos, so why should I care? If HDP runs in the cloud, YARN is still going to be the cluster manager.

01-05-2017 04:41 PM

1 Kudo
Is it possible to deploy HDP Docker containers on Mesos using Marathon? If so, where can I get the Docker images and the Marathon recipes?

If it is not possible with the combination above, what are the options for deploying HDP on Mesos? How would it be better than running on YARN?

Labels: Hortonworks Data Platform (HDP)

12-30-2016 09:58 AM

2 Kudos
My understanding, along with my questions, is as follows.

AWS: HDCloud

Manual scaling is possible using Ambari or the AWS UI.

Auto scaling:

1. Is auto scaling possible with this option (for example, can I set an auto-scaling group while creating the cluster)?
   1.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not benefit from data locality.

----

AWS: HDP on IaaS

Manual scaling is possible using Ambari.

Auto scaling without Cloudbreak:

2. Is auto scaling possible with this option (for example, can I set an auto-scaling group while creating the cluster)?
   2.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not benefit from data locality.

Auto scaling with Cloudbreak:

Auto scaling may be possible, but question 2.1 applies here as well.

----

Azure: HDInsight

Manual scaling is possible using Ambari or the Azure UI.

Auto scaling:

3. Is auto scaling possible with this option (for example, can I set an auto-scaling group while creating the cluster)?
   3.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not benefit from data locality.

----

Azure: HDP in the Marketplace

Manual scaling is possible using Ambari or the Azure UI.

Auto scaling:

4. Is auto scaling possible with this option (for example, can I set an auto-scaling group while creating the cluster)?
   4.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not benefit from data locality.

----

Azure: HDP on IaaS

Same questions as for AWS: HDP on IaaS.

Labels: Hortonworks Cloudbreak

12-30-2016 09:38 AM
@Tom McCuch One last question, which came up after I read your answer again. WASB on Azure is supported both for HDP on Azure IaaS and for HDP in the Azure Marketplace. Does this mean that WASB is natively supported and optimized in Hadoop 2.x? If so, does this also mean that any distribution with Hadoop 2.x deployed on Azure can use WASB for storage?

12-28-2016 03:09 PM
@Tom McCuch To summarize (please correct as appropriate):

1. HDI 3.5: WASB and ADLS
2. Pre-HDI 3.5: WASB only
3. HDP on Azure IaaS: WASB and HDFS on VHD only
4. HDP from the Azure Marketplace: WASB and HDFS on VHD only
5. HDCloud 2.5: S3 only
6. HDP on AWS IaaS: HDFS on ephemeral storage or EBS

12-27-2016 08:29 AM
@Tom McCuch Thanks. Could you also say a little about ADLS? Do you still recommend WASB over ADLS?

Also, I am not clear on the parallelism factor for S3 and WASB. Are you saying that S3 does not offer parallelism and is suited to a large number of smaller files? What is your take on parallelism when it comes to WASB?

And can I use WASB, ADLS, and S3 as the HDFS layer when I install HDP on Azure IaaS (using Cloudbreak)?

12-22-2016 03:51 AM

5 Kudos
What storage options are available when deploying HDP in the cloud? My understanding is as follows.

1. Azure (HDInsight, HDP via Cloudbreak, HDP in the Marketplace)
   - WASB: What about parallelism here? That is, if I store a file here and run a MapReduce job that processes it, would I get the same effect as with HDFS storage?
   - ADLS: Although storage is not co-located with compute, performance can be improved through parallelism.
   - HDFS itself: I can move the data to the edge node and then copy it into HDFS.

   What are my options for moving my data into WASB or ADLS? This thread suggests NiFi, but my requirement is ephemeral, so an investment in NiFi may be hard to justify.

2. AWS (the questions below apply to HDCloud and to HDP deployed on AWS via Cloudbreak)
   - S3: What about parallelism here? That is, if I store a file here and run a MapReduce job that processes it, would I get the same effect as with HDFS storage?
   - HDFS itself: I can move the data to the edge node and then copy it into HDFS.

Of these storage options, which one is better than the others, and for what reason?

Labels: Hortonworks Data Platform (HDP)

12-21-2016 04:42 PM
@Greg Keys Thanks again. Hopefully this is the last set of questions:

1. With HDP in the Azure Marketplace, we cannot use the OS of our choice. With Cloudbreak, can we specify the OS?
2. Storage in Azure: are HDFS, WASB, and ADLS options for all HDP deployment options (IaaS via Cloudbreak, Marketplace) and for HDInsight?
3. With HDC, can I choose the OS?
4. What are the storage options for HDCloud? Are they HDFS and S3 (the same as for HDP on AWS IaaS through Cloudbreak)?
5. Can I deploy HDP via Cloudbreak in an AWS VPC, similar to the way I can deploy it in the AWS public cloud?
6. Can I deploy HDC in an AWS VPC?
7. What are my options for moving data from on-premises to the AWS public cloud (S3, HDFS) and to an AWS VPC (S3, HDFS)? (This may not be strictly an HDP question.)
8. What are my options for moving data from on-premises to the Azure public cloud (WASB, ADLS, HDFS)?
9. Can I spin up HDInsight or HDP (Cloudbreak or Marketplace) in an Azure private cloud? (I assume Azure offers two flavors of private cloud: one hosted on-premises and another similar to a VPC.)

12-21-2016 02:15 PM
@Greg Keys Thanks a lot. A few follow-up questions:

1. Option 2 that I was talking about is what I see in the Azure portal. Please see the attachments hdponazure.png and hdponazure-clustercreation.png.
2. What about Data Lake Store as a storage option for all of these deployment options?
3. With respect to performance, my question was more about the issues caused by compute and storage not being co-located.
4. What is the purpose of HDCloud? Is it similar to Cloudbreak for AWS? Is it for HDP on AWS IaaS?
5. Is the HDC you mentioned above an HDP-as-a-service offering from AWS?