Member since: 06-29-2016
Posts: 81
Kudos Received: 43
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 500 | 03-16-2016 08:26 PM
10-27-2017
11:15 AM
I have come across the DataPlane Service, which was announced recently. A few questions in that regard:
1. As far as I understand, it contains Atlas, Ranger, and Knox. Is this correct?
2. What is the motivation behind this new product? If I have HDP, I get Ranger, Knox, and Atlas anyway.
3. What is the applicability of this product? Can I use it with other Hadoop distributions?
4. Has the sandbox for DataPlane been released? Where can I find the download?
03-14-2017
03:34 PM
My question on using SAN as the backend storage for HDFS has three main parts:
1. Is it feasible to use SAN as the backend storage for HDFS?
2. What are the pros and cons of using SAN or NAS for HDFS?
3. Has it been tested for performance and perhaps other aspects?
Labels:
- Apache Hadoop
01-23-2017
01:52 PM
@Devin Pinkston For some reason the link does not work. But essentially, I would like to know Hortonworks' recommendation on using BlueData for HDP, where on that platform I also get the other services running (multi-tenancy).
01-23-2017
09:46 AM
I came across this and found BlueData's BDaaS interesting. What is Hortonworks' recommendation on running HDP on BlueData?
Labels:
- Hortonworks Data Platform (HDP)
01-06-2017
04:16 AM
1 Kudo
@Tom McCuch Thanks for the clarification. One other related question: in general, what advantages would Mesos bring over YARN, especially given that Hortonworks is making efforts to support HDP on Mesos? I mean, why care? Even with HDP in the cloud, YARN is still going to be the cluster manager.
01-05-2017
04:41 PM
1 Kudo
Is it possible to deploy the HDP Docker container on Mesos using Marathon? If so, where can I get the Docker images and the Marathon recipes? If it is not possible with the combination above, what are the options to deploy HDP on Mesos? How would it be better than running on YARN?
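For reference, the kind of deployment I am imagining is a Marathon app definition POSTed to Marathon's REST API; the sketch below uses a hypothetical hdp-sandbox image name and Marathon endpoint, and is not an official HDP-on-Mesos recipe.
# Hypothetical Marathon app definition for a single HDP sandbox container (sketch only).
cat > hdp-sandbox.json <<'EOF'
{
  "id": "/hdp-sandbox",
  "cpus": 4,
  "mem": 16384,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "example/hdp-sandbox:2.5", "network": "HOST" }
  }
}
EOF
# Submit the app definition to Marathon's REST API.
curl -X POST http://marathon-host:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d @hdp-sandbox.json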
Labels:
- Hortonworks Data Platform (HDP)
12-30-2016
09:58 AM
2 Kudos
My understanding, along with questions, is as follows.

AWS - HDCloud
Manual scaling using Ambari or the AWS UI is possible.
Auto scaling:
1. Is it possible to auto-scale in this option (while creating the cluster, can I set an auto-scaling group)?
1.1. In that case, how is the data rebalanced? i.e. if a new node is added, then compute may not gain data locality.

AWS - HDP on IaaS
Manual scaling using Ambari is possible.
Auto scaling without Cloudbreak:
2. Is it possible to auto-scale in this option (while creating the cluster, can I set an auto-scaling group)?
2.1. In that case, how is the data rebalanced? i.e. if a new node is added, then compute may not gain data locality.
Auto scaling with Cloudbreak:
Auto-scaling may be possible, but question 2.1 applies here as well.

Azure - HDInsight
Manual scaling using Ambari or the Azure UI is possible.
Auto scaling:
3. Is it possible to auto-scale in this option (while creating the cluster, can I set an auto-scaling group)?
3.1. In that case, how is the data rebalanced? i.e. if a new node is added, then compute may not gain data locality.

Azure - HDP in the Marketplace
Manual scaling using Ambari or the Azure UI is possible.
Auto scaling:
4. Is it possible to auto-scale in this option (while creating the cluster, can I set an auto-scaling group)?
4.1. In that case, how is the data rebalanced? i.e. if a new node is added, then compute may not gain data locality.

Azure - HDP on IaaS
Same questions as AWS - HDP on IaaS.
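For the rebalancing part of the question, the manual equivalent I know of is running the HDFS balancer after nodes are added; a minimal sketch (the threshold value is only an illustration):
# Redistribute existing blocks onto newly added DataNodes.
# -threshold 10 allows each DataNode's utilization to deviate up to 10% from the cluster average.
hdfs balancer -threshold 10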
Labels:
- Hortonworks Cloudbreak
12-30-2016
09:38 AM
@Tom McCuch One last question, which I got after reading your answer again: WASB in Azure is supported on both HDP on Azure IaaS and HDP in the Azure Marketplace. Does this mean that WASB is natively optimized in Hadoop 2.x? If so, would this also mean that any distribution with Hadoop 2.x deployed on Azure can use WASB for storage?
12-28-2016
03:09 PM
@Tom McCuch So to summarize, please correct as appropriate:
1. HDI 3.5 - WASB and ADLS
2. Pre-HDI 3.5 - only WASB
3. HDP on Azure IaaS - only WASB and HDFS on VHD
4. HDP from the Azure Marketplace - only WASB and HDFS on VHD
5. HDCloud 2.5 - S3 only
6. HDP on AWS IaaS - HDFS on ephemeral or EBS storage
12-27-2016
08:29 AM
@Tom McCuch Thanks. Can you also please talk a little bit about ADLS? Do you still recommend WASB over ADLS? And I am not clear on the parallelism factor for S3 and WASB. Are you saying that S3 does not offer parallelism and is suitable for a larger number of smaller files? What is your take on parallelism when it comes to WASB? And can I use WASB, ADLS, and S3 as the HDFS layer when I install HDP on Azure's IaaS (using Cloudbreak)?
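For context, my mental model is that once the connectors and credentials are configured, each store is addressed through a filesystem URI; a minimal sketch with placeholder account, container, and bucket names:
# WASB (needs fs.azure.account.key.<account>.blob.core.windows.net in core-site.xml)
hadoop fs -ls wasb://mycontainer@myaccount.blob.core.windows.net/data/
# ADLS (hypothetical store name; needs the ADL connector and OAuth credentials configured)
hadoop fs -ls adl://mystore.azuredatalakestore.net/data/
# S3 via the s3a connector (needs fs.s3a.access.key/fs.s3a.secret.key or an IAM role)
hadoop fs -ls s3a://my-bucket/data/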
12-22-2016
03:51 AM
5 Kudos
What are the storage options when deploying HDP in the cloud? My understanding is as follows.

1. Azure (HDInsight, HDP via Cloudbreak, HDP in the Marketplace)
WASB - What about parallelism here? i.e. if I store a file here and run a MapReduce job processing this file, would I achieve the same effect as I achieve with HDFS storage?
ADLS - Although not co-located, performance can be improved by means of parallelism.
HDFS itself - I can move the data to the edge node and then copy it into HDFS.
What are my options to move my data into WASB and ADLS? This thread suggests NiFi, but my requirement is ephemeral and a NiFi investment may not sell.

2. AWS (the questions below apply to HDCloud and HDP via Cloudbreak on AWS)
S3 - What about parallelism here? i.e. if I store a file here and run a MapReduce job processing this file, would I achieve the same effect as I achieve with HDFS storage?
HDFS itself - I can move the data to the edge node and then copy it into HDFS.

And out of these storage options, which one is better than the other, and for what reason?
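For the data movement question, the option I am aware of besides NiFi is DistCp against the cloud store URIs; a minimal sketch with placeholder paths and account names:
# Copy an HDFS directory into a WASB container (account key configured in core-site.xml)
hadoop distcp hdfs://namenode:8020/data/raw wasb://mycontainer@myaccount.blob.core.windows.net/data/raw
# Same idea for S3 using the s3a connector
hadoop distcp hdfs://namenode:8020/data/raw s3a://my-bucket/data/raw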
Labels:
- Hortonworks Data Platform (HDP)
12-21-2016
04:42 PM
@Greg Keys Thanks again. Hopefully the last set of questions:
1. With HDP in the Azure Marketplace, we cannot use the OS of our choice. With Cloudbreak, can we specify the OS?
2. Storage in Azure - are HDFS, WASB, and ADLS options for all deployment options of HDP: IaaS (Cloudbreak, Marketplace) and HDInsight?
3. With HDC can I choose the OS?
4. What are the storage options for HDCloud? Is it HDFS and S3 (same as for HDP on AWS IaaS through Cloudbreak)?
5. Can I deploy HDP via Cloudbreak in an AWS VPC similar to the way I can deploy in the AWS public cloud?
6. Can I deploy HDC on an AWS VPC?
7. What are my options to move data from on-premise to the AWS public cloud (S3, HDFS) and an AWS VPC (S3, HDFS)? (This may not be strictly an HDP question!)
8. What are my options to move data from on-premise to the Azure public cloud (WASB, ADLS, HDFS)?
9. Can I spin up HDInsight or HDP (Cloudbreak or Marketplace) in an Azure private cloud? (I assume that Azure offers two flavors of private cloud - one on-premise hosted and the other similar to a VPC.)
12-21-2016
02:15 PM
@Greg Keys Thanks a lot. A few follow-up questions:
1. Option 2 that I was talking about is what I see in the Azure portal. Please see the attachments hdponazure.png and hdponazure-clustercreation.png.
2. What about the "Data Lake Store" as a storage option for all of the options?
3. With respect to performance, my question was more around the issues due to compute and storage not being co-located.
4. And what is the purpose of HDCloud? Is it similar to Cloudbreak for AWS? Is it for HDP on AWS IaaS?
5. And the HDC that you mentioned above - is that an HDP-as-a-Service offering from AWS?
12-21-2016
11:26 AM
2 Kudos
I see three different options to deploy HDP on Azure:
1. HDInsight (built on top of HDP)
2. HDP as a Service
3. Deploying HDP on Azure's bare metal
In my understanding, 1 and 2 are managed services where control is limited when it comes to the choice of OS, etc. HDInsight has multiple cluster types (not sure what the rationale behind this is, though).
Questions:
1. What is the rationale behind having multiple cluster types for HDInsight?
2. Why are two services (1 and 2 above) offered? When to use which? (apart from this)
3. Are there any performance benchmarks done on HDInsight or HDP on Azure in a production situation?
4. What are the different storage types possible on the above services? At least on HDInsight I see that Blob storage and Data Lake Store are options, but both are external to the compute nodes. That may hit performance, hence I am curious about question 3, apart from the fact that the cluster runs on virtual machines.
5. What are the options to provision HDP on Azure bare metal nodes (option 3)? Does Cloudbreak help there?
Labels:
- Hortonworks Data Platform (HDP)
08-25-2016
08:50 AM
@Tom McCuch Thanks a lot for the views and inputs. It definitely helps.
08-22-2016
06:14 AM
@Tom McCuch Thanks again. Do you recommend that the data be sorted for the ORC optimization to work, or does it not really matter? And is there any benchmark volume with performance testing done for ad-hoc queries with the optimization mentioned above?
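For concreteness, the kind of sorted load I have in mind is the following Hive sketch (hypothetical table and column names; the data is sorted on the column most often used in predicates):
-- Sort within each partition on the predicate column so ORC min/max statistics can skip stripes.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE claims_orc PARTITION (load_date)
SELECT claim_id, customer_id, amount, load_date
FROM claims_staging
DISTRIBUTE BY load_date
SORT BY customer_id;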
08-18-2016
09:45 AM
@Tom McCuch Thanks for the detailed response. In terms of querying capabilities (from a BI tool, a CLI, or Hue), to achieve the faster query response required for operational reporting, one way is to structure the data (by means of partitioning etc.) for pre-defined queries. But for ad-hoc operational reporting queries, what is your take on an ODS in Hadoop to achieve the desired performance? One way is to restrict the volume of data (in addition to ORC format, Tez, etc.) in the ODS layer, as it is for operational needs anyway (so history may not be required). Please share your thoughts.
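For concreteness, the volume restriction I have in mind is partitioning the ODS tables by load date and dropping partitions that fall outside the operational window; a minimal Hive sketch with hypothetical names:
-- Hypothetical ODS table partitioned by load date, stored as ORC.
CREATE TABLE IF NOT EXISTS ods_policy (
  policy_id STRING,
  status STRING,
  premium DECIMAL(10,2))
PARTITIONED BY (load_date STRING)
STORED AS ORC;
-- Keep only the operational window by dropping older partitions.
ALTER TABLE ods_policy DROP IF EXISTS PARTITION (load_date < '2016-07-01');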
08-16-2016
09:48 AM
2 Kudos
I have been asked to build an ODS (operational data store) in Hadoop for an insurance client. In this regard, a few questions:
First of all, is it recommended to build the ODS in Hadoop? What are the pros and cons of building an ODS in Hadoop? Are there any best practices around this topic? The ODS should facilitate the operational reporting needs and should support ad-hoc queries.
Labels:
- Apache Hadoop
06-29-2016
01:00 PM
@Benjamin Leonhardi Thanks, makes sense
06-28-2016
10:13 PM
Data comes from multiple sources and is exposed in Hive tables for the users. A specific column is sensitive and needs to be given restricted access. If a user wants to join two such tables on the column that he does not have access to, what is the best approach to make it work? One option is to link the sensitive column with a generated key so that the user can join on the generated key. Is this a good idea, or is there a better one?
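To illustrate the generated-key option: one way could be to expose a deterministic hash of the sensitive column through views, so both tables can be joined without revealing the raw value (a sketch with hypothetical names; it assumes a Hive version where sha2() is available, otherwise a custom UDF would be needed):
-- Views hide the raw sensitive column and expose a derived join key instead.
CREATE VIEW customers_safe AS
SELECT customer_name, city, sha2(ssn, 256) AS ssn_key
FROM customers;
CREATE VIEW claims_safe AS
SELECT claim_id, claim_amount, sha2(ssn, 256) AS ssn_key
FROM claims;
-- Users without access to the ssn column can still join on the derived key.
SELECT c.customer_name, cl.claim_amount
FROM customers_safe c
JOIN claims_safe cl ON c.ssn_key = cl.ssn_key;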
Labels:
- Apache Hadoop
- Apache Hive
05-13-2016
01:53 PM
What does it mean for a Hive table in ORC or Avro format to have the field delimiter specified? Does Hive ignore it even if it is specified? For example:
CREATE TABLE IF NOT EXISTS T (
  C1 STRING,
  C2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");
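For reference, one way to see what Hive actually recorded for the table (SerDe, input/output formats, and table properties) is:
-- Shows the SerDe, storage formats, and table properties Hive stored for T.
DESCRIBE FORMATTED T;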
Labels:
- Apache Hive
04-11-2016
02:50 PM
Can a non-numeric column be specified as the --split-by key parameter? What are the potential issues in doing so?
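For concreteness, the kind of import I have in mind looks like the sketch below (the connection string, table, and column names are placeholders; newer Sqoop versions may also require the allow_text_splitter property to accept a text split column):
# Split the import on a string column; Sqoop generates per-mapper range predicates on it.
sqoop import \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by customer_name \
  --num-mappers 4 \
  --target-dir /data/orders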
Labels:
- Apache Sqoop
03-30-2016
01:24 PM
@Benjamin Leonhardi I think I got it. It is still the same number of files, but with more reducers. In my mind, it was always just the buckets, not the partitions. So I thought it was 30 files (30 buckets and 40 partitions), but in fact it is still 1200 files in both cases; the optimized case just uses more reducers.
03-28-2016
01:21 PM
1 Kudo
@Benjamin Leonhardi On 3, by normal load I referred to your slides 14 and 15. As per those, if you have 30 buckets and 40 partitions, you would have 30 reducers in total (one reducer per bucket across all partitions). So it is only 30 files versus 1200 files in the optimized case. That is why I still wonder how it fixes the small-file problem (as per slide 16). At the same time, I understand the point about the performance and memory issues; it really is optimized in those two respects.
03-25-2016
09:47 PM
1 Kudo
@Benjamin Leonhardi, Thanks. Just one last set of questions:
1. Sort by only sorts within a reducer. So if you have 10 reducers, it ends up in 10 different ORC files. If you apply sort by on column C1, it may still happen that the same C1 value appears in 10 files unless you distribute by C1. But within each of those files, sorting may help to skip blocks. Am I right?
2. Does ORC maintain the index at the block level or the stripe level? (As per slide 6 it looks like the block level, but as per slide 4 it is at the stripe level.) If it is at the stripe level, it can skip a stripe, but if a stripe has to be read, does it have to read the entire stripe?
3. And on "Optimized", I understand it in terms of performance, but it still has more reducers than the normal load, so how does it fix the small-file problem?
4. Maybe PPD is only for the ORC format, but do the other concepts of partitioning, bucketing, and optimized loading apply to other formats as well?
03-25-2016
08:01 PM
@Joseph Niemiec You mentioned "Left outerjoin and test for null in the WHERE is probably better for scaling then UNION DISTINCT if you are worried about a reducer problem. Same join syntax as the example below..." How does a left outer join avoid the reducer (unless it is a map join)? Do you recommend a left outer join over union distinct? And on the point "We have found a fun case where if you try to use this to dedupe or clean.....", my understanding is that if a partition has 5 records which are duplicates (the initial master load already had them), there is no way to remove them unless a 6th record which is a duplicate of those 5 comes in with the staging load. Am I right? If so, what is your recommendation to remove duplicates in the initial load itself?
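For my own clarity, the left-outer-join pattern I understand from that comment is roughly the following sketch (hypothetical staging and master tables; only staging rows whose key is absent from the master survive):
-- Keep only staging records whose key does not already exist in the master table.
INSERT INTO TABLE master
SELECT s.*
FROM staging s
LEFT OUTER JOIN master m
  ON s.record_key = m.record_key
WHERE m.record_key IS NULL;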
03-25-2016
07:38 PM
1 Kudo
@Benjamin Leonhardi I went through your slides and have a few questions around them. Since it is related to bucketing and partitioning, I think it makes sense to continue in the same thread.
1. In dynamic partition (DP) loading, you used the term standard load. Does that mean setting the number of reducers to 0, or do you mean something else?
2. You mentioned that a larger number of writers and a large number of partitions lead to small files. The number of mappers is based on the number of blocks, and each mapper writes separate files to every partition. So irrespective of the number of partitions, a large number of mappers by itself can lead to small files. Am I right?
3. What is the default key used for distribution if you don't use the distribute by clause? What is the distribution key when there is more than one partition?
4. Slide 13 - to enable this kind of load, do you recommend just setting the number of reducers to be the same as the number of partitions? And any reducer can get data for any partition, which may lead to small files; is this what you mean by a hash conflict?
5. Slide 14 - one reducer for each bucket across all partitions leads to ORC writer memory issues. Why is this the case?
6. Optimized dynamic sorted partitioning - one reducer for each partition and bucket. From the point above, with 5 partitions and 4 buckets there would be only 4 reducers, but in the optimized case there are 20 reducers. The more reducers there are, the smaller the files are going to be, so how can this solve the small-file problem?
7. Sort by for PPD - the ORC index would anyway help to avoid reading some blocks. But when it comes to reading the block which has the predicate value, sorting helps performance only when the predicate value is reached quickly while reading the file. If the value happens to be at the end of the file, you still end up reading the whole file. So the performance improvement of PPD with sort by really depends, am I right?
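For my own reference, the optimized sorted dynamic partition load discussed in the slides would look roughly like this (hypothetical table and column names; the first setting is the Hive flag for this feature):
-- Let Hive sort and route rows so each partition/bucket is written by a dedicated reducer.
SET hive.optimize.sort.dynamic.partition=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_orc PARTITION (sale_date)
SELECT customer_id, product_id, amount, sale_date
FROM sales_staging;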
03-22-2016
03:00 PM
@Joseph Niemiec It looks like this approach puts a restriction that the columns to be compared for duplication have to be the partition columns. Not all columns may qualify as partition columns. Even if I compute a hash of those columns, partitioning on that hash column may not be viable due to its high cardinality. Is there any other option apart from dynamic partition pruning?
03-21-2016
02:13 AM
5 Kudos
Problem statement: I have a huge history data set in HDFS from which I want to remove duplicates to begin with. In addition, the daily ingested data has to be compared with the history to remove duplicates, and the daily data may have duplicates within itself as well. Duplicates could mean either of the following:
1. If the keys in two records are the same, then they are duplicates.
2. It depends on a few columns: if those columns match, then they are duplicates.
Questions:
What is an optimized solution to remove duplicates in both these situations? Can we avoid the reducer at all? If so, what are the options? How would hashing help here? I see vague solutions around, but they are not very well documented and are hard to understand. I have already looked at this link, but it is not clear. Code samples would help.
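To make the ask concrete, the kind of code sample I am looking for would be something like the Hive sketch below (hypothetical table and column names; it keeps one record per business key using a window function, so it does involve a reduce phase):
-- Deduplicate the history and the daily load together, keeping one row per business key.
INSERT OVERWRITE TABLE history_dedup
SELECT key1, key2, attr1, attr2, load_ts
FROM (
  SELECT u.*,
         row_number() OVER (PARTITION BY key1, key2 ORDER BY load_ts DESC) AS rn
  FROM (
    SELECT * FROM history
    UNION ALL
    SELECT * FROM daily_ingest
  ) u
) ranked
WHERE rn = 1;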
Labels:
- Apache Hadoop
- Apache Hive
03-17-2016
02:48 PM
2 Kudos
@Artem Ervits The HCatOutputFormat class is in fact in the jar /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar. Actually, it is not about this specific class or jar to begin with; it is the command that I used. Changing it to the following works. Note that the program arguments come at the end. The link that you suggested made me try this. Also, the other jars mentioned in the above link need to be added, as it complained about other classes as well, one by one. Those jars may differ based on the distribution and version. In HDP 2.3.2 I did the following:
export HCAT_HOME=/usr/hdp/current/hive-webhcat
export HIVE_HOME=/usr/hdp/current/hive-client
export LIB_JARS=$HCAT_HOME/share/hcatalog/hive-hcatalog-core.jar,$HIVE_HOME/lib/hive-metastore.jar,$HIVE_HOME/lib/libthrift-0.9.2.jar,$HIVE_HOME/lib/hive-exec.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar,$HIVE_HOME/lib/datanucleus-api-jdo-3.2.6.jar
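# Run the job with -libjars so the HCatalog/Hive client jars ship with it; note the program arguments come last.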
hadoop jar mr-hcat.jar <mainclass> -libjars ${LIB_JARS} mr_input_text mr_output_text