Member since
06-29-2016
81
Posts
43
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1108 | 03-16-2016 08:26 PM |
03-14-2017
03:34 PM
My question on using SAN as the backend storage for HDFS has three main parts:
1. Is it feasible to use SAN as the backend storage for HDFS?
2. What are the pros and cons of using SAN or NAS for HDFS?
3. Has it been tested for performance and possibly other aspects?
Labels: Apache Hadoop
01-06-2017
04:16 AM
1 Kudo
@Tom McCuch Thanks for the clarification. One more related question: in general, what advantages would Mesos bring over YARN? I ask especially because Hortonworks is making efforts to support HDP on Mesos. I mean, why care? If HDP is on the cloud, it's still YARN that's going to be the cluster manager.
01-05-2017
04:41 PM
1 Kudo
Is it possible to deploy an HDP Docker container in Mesos using Marathon? If so, where can I get the Docker images and the Marathon recipes? If it's not possible with the combination above, what are the options for deploying HDP on Mesos? How is it going to be better than running on YARN?
Labels: Hortonworks Data Platform (HDP)
12-30-2016
09:58 AM
2 Kudos
My understanding, along with my questions, is below.

AWS-HDCloud
Manual scaling using Ambari or the AWS UI is possible.
Auto Scaling
1. Is it possible to auto-scale with this option (can I set an auto-scaling group while creating the cluster)?
1.1. In that case, how is the data rebalanced? i.e. if a new node is added, compute on that node may not gain data locality.

AWS-HDP on IaaS
Manual scaling using Ambari is possible.
Auto Scaling - Without Cloudbreak
2. Is it possible to auto-scale with this option (can I set an auto-scaling group while creating the cluster)?
2.1. In that case, how is the data rebalanced? i.e. if a new node is added, compute on that node may not gain data locality.
Auto Scaling - With Cloudbreak
Auto-scaling may be possible, but question 2.1 applies here as well.

Azure-HDInsight
Manual scaling using Ambari or the Azure UI is possible.
Auto Scaling
3. Is it possible to auto-scale with this option (can I set an auto-scaling group while creating the cluster)?
3.1. In that case, how is the data rebalanced? i.e. if a new node is added, compute on that node may not gain data locality.

Azure-HDP in Marketplace
Manual scaling using Ambari or the Azure UI is possible.
Auto Scaling
4. Is it possible to auto-scale with this option (can I set an auto-scaling group while creating the cluster)?
4.1. In that case, how is the data rebalanced? i.e. if a new node is added, compute on that node may not gain data locality.

Azure-HDP on IaaS
Same questions as AWS-HDP on IaaS.
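To make the rebalancing question concrete: on HDFS-backed clusters, data is usually redistributed after a node is added by running the HDFS balancer (`hdfs balancer -threshold <percent>`, default 10), which aims to bring each DataNode's utilization within the threshold of the cluster average. The sketch below is only a simplified illustration of that threshold idea, not the balancer's actual algorithm. The node names and byte counts are made up, and `classify_nodes` is a hypothetical helper.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Classify DataNodes against the cluster-average utilization.

    nodes: dict mapping a (hypothetical) node name to (used_bytes, capacity_bytes).
    Returns (average_utilization_pct, {node_name: "over" | "under" | "balanced"}).
    Simplified: the real balancer uses finer-grained categories.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    avg_pct = 100.0 * total_used / total_capacity

    status = {}
    for name, (used, cap) in nodes.items():
        util_pct = 100.0 * used / cap
        if util_pct > avg_pct + threshold_pct:
            status[name] = "over"      # candidate source of block moves
        elif util_pct < avg_pct - threshold_pct:
            status[name] = "under"     # candidate target, e.g. a freshly added node
        else:
            status[name] = "balanced"
    return avg_pct, status


if __name__ == "__main__":
    # Two existing nodes at 80% and one newly added, empty node (made-up numbers).
    cluster = {"dn1": (80, 100), "dn2": (80, 100), "dn-new": (0, 100)}
    print(classify_nodes(cluster))
```

Until blocks are moved onto the new node, tasks scheduled there read their input remotely, which is the data-locality concern raised in questions 1.1 to 4.1.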
Labels: Hortonworks Cloudbreak
12-30-2016
09:38 AM
@Tom McCuch One last question, which came up after reading your answer again. WASB in Azure is supported on both HDP on Azure IaaS and HDP in the Azure Marketplace. Does this mean that WASB support is built natively into Hadoop 2.x? If so, would this also mean that any distribution based on Hadoop 2.x and deployed on Azure can use WASB for storage?
12-28-2016
03:09 PM
@Tom McCuch So, to summarize (please correct as appropriate):
1. HDI 3.5 - WASB and ADLS
2. Pre HDI 3.5 - Only WASB
3. HDP on Azure IaaS - Only WASB and HDFS on VHD
4. HDP from Azure Marketplace - Only WASB and HDFS on VHD
5. HDCloud 2.5 - S3 only
6. HDP on AWS IaaS - HDFS on ephemeral storage or EBS
12-27-2016
08:29 AM
@Tom McCuch Thanks. Can you also please talk a little bit about ADLS? Do you still recommend WASB over ADLS? Also, I am not clear on the parallelism factor for S3 and WASB. Are you saying that S3 does not offer parallelism and is suitable for a larger number of smaller files? What's your take on parallelism when it comes to WASB? And can I use WASB, ADLS and S3 as the HDFS layer when I install HDP on Azure's IaaS (using Cloudbreak)?
12-22-2016
03:51 AM
5 Kudos
What are the possible storage options when deploying HDP in the cloud? My understanding is as follows.

1. Azure (HDInsight, HDP via Cloudbreak, HDP in the Marketplace)
WASB - What about parallelism here? i.e. if I store a file here and run a MapReduce job that processes it, would I achieve the same effect as with HDFS storage?
ADLS - Although not co-located, performance can be improved by means of parallelism.
HDFS itself - I can move the data to the edge node and then copy it into HDFS.
What are my options for moving my data into WASB and ADLS? This thread suggests NiFi, but my requirement is ephemeral and the NiFi investment may not sell.

2. AWS (the questions below apply to HDCloud and to HDP via Cloudbreak on AWS)
S3 - What about parallelism here? i.e. if I store a file here and run a MapReduce job that processes it, would I achieve the same effect as with HDFS storage?
HDFS itself - I can move the data to the edge node and then copy it into HDFS.

And out of these storage options, which one is better than the others, and for what reason?
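For the data-movement part of the question, one commonly used option besides NiFi is `hadoop distcp`, pointed at the object store's Hadoop URI scheme (`wasb://`, `adl://`, `s3a://`). The sketch below only assembles the URIs and the command line and does not run anything. The account, container, bucket and path names are hypothetical placeholders, and it assumes the cluster's connectors for these schemes are already configured with credentials.

```python
def wasb_uri(container, account, path):
    """URI for Azure Blob Storage as addressed by the Hadoop WASB connector."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def adls_uri(account, path):
    """URI for Azure Data Lake Store (Gen1) as addressed by the adl:// connector."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def s3a_uri(bucket, path):
    """URI for Amazon S3 as addressed by the s3a:// connector."""
    return f"s3a://{bucket}/{path.lstrip('/')}"


def distcp_command(src, dst):
    """Argument list for a plain `hadoop distcp <src> <dst>` copy."""
    return ["hadoop", "distcp", src, dst]


if __name__ == "__main__":
    # Hypothetical names: copy an HDFS directory into a WASB container.
    src = "hdfs:///data/landing"
    dst = wasb_uri("mycontainer", "myaccount", "/data/landing")
    print(" ".join(distcp_command(src, dst)))
```

Because distcp runs as a MapReduce job, the copy itself is parallelized across the cluster, which may suit an ephemeral requirement better than standing up a separate ingestion tool.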
Labels: Hortonworks Data Platform (HDP)
12-21-2016
04:42 PM
@Greg Keys Thanks again. Hopefully this is the last set of questions:
1. With HDP in the Azure Marketplace, we cannot use the OS of our choice. With Cloudbreak, can we specify the OS?
2. Storage in Azure: are HDFS, WASB and ADLS available for all deployment options, i.e. HDP on IaaS (Cloudbreak, Marketplace) and HDInsight?
3. With HDC, can I choose the OS?
4. What are the storage options for HDCloud? Is it HDFS and S3 (the same as for HDP on AWS IaaS through Cloudbreak)?
5. Can I deploy HDP via Cloudbreak in an AWS VPC in the same way that I can deploy in the AWS public cloud?
6. Can I deploy HDC in an AWS VPC?
7. What are my options for moving data from on-premise to the AWS public cloud (S3, HDFS) and to an AWS VPC (S3, HDFS)? (This may not be strictly an HDP question!)
8. What are my options for moving data from on-premise to the Azure public cloud (WASB, ADLS, HDFS)?
9. Can I spin up HDInsight or HDP (Cloudbreak or Marketplace) in an Azure private cloud? (I assume that Azure offers two flavors of private cloud: one hosted on-premise and another similar to a VPC.)
12-21-2016
02:15 PM
@Greg Keys Thanks a lot. A few follow-up questions:
1. Option 2 that I was talking about is what I see in the Azure portal. Please see the attachments hdponazure.png and hdponazure-clustercreation.png.
2. What about "Data Lake Store" as a storage option for all of the deployment options?
3. With respect to performance, my question was more about the issues that arise when compute and storage are not co-located.
4. What is the purpose of HDCloud? Is it similar to Cloudbreak for AWS? Is it for HDP on AWS IaaS?
5. And the HDC that you mentioned above: is that an HDP-as-a-service offering from AWS?