Member since: 04-26-2016
Posts: 78
Kudos Received: 32
Solutions: 0
10-04-2018
01:52 PM
@amarnath reddy pappu Thanks for your response. I tried deleting the packages on the host machine and reinstalling the cluster from scratch, but still no luck; the same error recurs.
10-02-2018
12:37 PM
@amarnath reddy pappu We have already populated the repoid field with the repo id information from the Satellite server. On our Satellite, the channel name, repo id and repo name are one and the same. We are experiencing this issue despite all these changes. Any further inputs / alternatives to try? Thanks
10-01-2018
01:05 PM
Hi, We are trying to install HDP 2.6.5 using Ambari 2.6.2.2. As the machines sit behind a firewall, we have configured a Red Hat Satellite server which hosts all the required repos and packages for the HDP installation. However, when we install using the Ambari Installation Wizard, we are facing issues.

In Step 1 of the installation, I provided a custom stack definition file which has the <version> element as 2.6.5.9 and the <build> element as 1. We also chose the "Use Local Repository" option, with the OS being redhat7. For the Base URLs corresponding to the various repos, I provided http://public as the URL and checked the "Skip Repository Base URL validation" checkbox along with the "Use Redhat Satellite Spacewalk" checkbox. In my opinion the Base URLs are not significant in a Satellite server deployment, hence the dummy value; in line with this, upon selecting the Redhat Satellite Spacewalk checkbox the Base URL fields are disabled and become uneditable.

From here we are able to proceed all the way to Step 9 - Install, Start and Test. However, component installation fails at this step. Right now we are facing the issue with the ZooKeeper installation, which is the only component we chose to install just to check that installation works. We are seeing the below error message in the Ambari wizard:

The 'zookeeper-server' component did not advertise a version. This may indicate a problem with the component packaging. However, the stack-select tool was able to report a single version installed (2.6.5.9-1). This is the version that will be reported.

Has anyone experienced a similar issue and been able to resolve it? Any further pointers to help us move forward? Thanks in advance.
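For reference, a quick sanity check on a failing host is to compare what the stack-select tool reports with what actually got installed from the Satellite channel. A rough sketch is below; it assumes hdp-select is on the PATH with the versions/status subcommands, and the package grep pattern is only a guess, so adjust both for your environment.

```python
import subprocess

def run(cmd):
    """Run a shell command and return its stdout without raising on failure."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

# What the stack-select tool thinks is installed (this is what Ambari falls back to
# when the component does not advertise its own version).
print("hdp-select versions:\n", run("hdp-select versions"))
print("hdp-select status zookeeper-server:\n", run("hdp-select status zookeeper-server"))

# What was actually pulled from the Satellite channel; the grep pattern is a guess.
print("Installed ZooKeeper packages:\n", run("rpm -qa | grep -i zookeeper"))
```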
08-21-2018
08:17 PM
@amarnath reddy pappu Our organisation's requirement is to use an enterprise-wide tool instead of Ambari, and hence that option is ruled out altogether.
08-21-2018
04:09 PM
Thanks @amarnath reddy pappu for your response. It provides some food for thought to explore further and improve our understanding. We intend to use the external system mainly for alerting purposes, so that the support team can act quickly whenever action is required. I was thinking of using ambari-alerts.log and the other log files for alerting purposes only. As the present alerting mechanism lives only within Ambari, and our Ops team want to use their existing setup, I am wondering what the best possible alternative is if scanning the log files is not the way to go?
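One possible alternative, if scanning the log files turns out to be too brittle, would be to have the external tool poll Ambari's alerts REST endpoint rather than its log file. A rough sketch against the /api/v1/clusters/<name>/alerts resource in Ambari 2.x is below; the server, cluster name and credentials are placeholders, and the field names should be verified against your Ambari version.

```python
import requests

AMBARI = "http://ambari-host:8080"   # placeholder Ambari server
CLUSTER = "mycluster"                # placeholder cluster name
AUTH = ("admin", "admin")            # placeholder credentials

# Fetch current alert instances, then filter client-side rather than relying
# on Ambari's query-predicate syntax.
resp = requests.get(
    f"{AMBARI}/api/v1/clusters/{CLUSTER}/alerts",
    params={"fields": "Alert/label,Alert/state,Alert/host_name,Alert/text"},
    auth=AUTH,
)
resp.raise_for_status()

for item in resp.json().get("items", []):
    alert = item.get("Alert", {})
    if alert.get("state") in ("WARNING", "CRITICAL"):
        # Hand these off to the org-wide alerting system instead of regex-scanning logs.
        print(f'{alert.get("state")} {alert.get("host_name")}: '
              f'{alert.get("label")} - {alert.get("text")}')
```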
08-21-2018
03:41 PM
Hi, We have an org-wide monitoring and alerting system already in use. We now want to integrate HDP-related metrics into this system, which works by scanning log files with regular expressions and alerting on any threshold breaches. I have the following questions regarding this scenario:
1. Is it sufficient for the monitoring system to scan ambari-alerts.log alone to get all HDP-related alerts, or do the individual component-level log files need to be monitored as well? For example, to monitor Storm, Kafka, Solr etc., should each component's log files be monitored?
2. What additional details do the individual component-level logs provide in comparison to ambari-alerts.log?
3. While Grafana has dashboards for many metrics, Ambari Alerts are very few in comparison. Is there an easy way to configure all the Grafana metrics to be considered for alerting?
Thanks in advance.
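To make the regex-scanning idea in question 1 concrete, below is a minimal sketch of the kind of tailer that would run against ambari-alerts.log. The log path and the line pattern are assumptions for illustration only and need to be checked against the actual log format on the cluster.

```python
import re
import time

LOG_PATH = "/var/log/ambari-server/ambari-alerts.log"   # assumed default location
# Assumed line shape; verify against the actual log format before relying on it.
PATTERN = re.compile(r"(CRITICAL|WARNING)", re.IGNORECASE)

def follow(path):
    """Yield lines appended to the file, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)                  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)
                continue
            yield line.rstrip("\n")

for line in follow(LOG_PATH):
    match = PATTERN.search(line)
    if match:
        # In a real setup this would raise an event in the org-wide alerting system.
        print(f"{match.group(1).upper()}: {line}")
```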
Labels:
- Apache Ambari
07-10-2018
03:47 PM
Hi, Looking at the latest version of the Ambari Metrics documentation, it doesn't look like it supports Solr. In our setup we are planning to use HDP Search, which can be installed through Ambari. But I am not sure whether it's possible to monitor the metrics generated by the SolrCloud setup. Any thoughts? Thanks in advance.
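One possible workaround, assuming the Solr version bundled with HDP Search is 6.4 or later (where the Metrics API exists), would be to pull metrics straight from Solr's own admin endpoint and feed them into whatever sink the monitoring system uses. The host, port and metric group below are placeholders.

```python
import json
import requests

SOLR = "http://solr-node:8983/solr"   # placeholder SolrCloud node URL

# Ask Solr for JVM-level metrics in JSON form. The /admin/metrics handler exists
# in Solr 6.4+; older HDP Search bundles may not have it.
resp = requests.get(
    f"{SOLR}/admin/metrics",
    params={"wt": "json", "group": "jvm"},
    timeout=10,
)
resp.raise_for_status()

# Dump a sample of the payload; a real integration would map selected metrics
# into the external monitoring system instead of printing them.
print(json.dumps(resp.json(), indent=2)[:2000])
```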
Labels:
- Apache Ambari
- Apache Solr
07-10-2018
03:31 PM
Thanks @Geoffrey Shelton Okot for the response. We are planning to use HDP-Search, which is an add-on module for SolrCloud. Using HDP-Search allows us to install and manage Solr through Ambari. So I am wondering whether Ambari should be able to Kerberise the Solr nodes as well, just like the other services it manages. If such basics are not covered for services managed through Ambari, what is the exact benefit of HDP-Search if someone purchases an HDP-Search support license from Hortonworks? Also, I am unable to find any documentation on monitoring support for Solr using the Ambari Metrics Service. Any thoughts? Thanks in advance.
07-05-2018
12:53 PM
Hi, I am looking for a step-by-step procedure to set up Kerberos for Solr installed externally, i.e. HDP Search. I couldn't find any security-related information in the HDP-Search documentation - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_solr-search-installation/content/ch_hdp-search-install-ambari.html However, under the security documentation there is a section on enabling Kerberos for SolrCloud - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html Can we follow the same documentation to set up Kerberos for HDP Search as well? I was also wondering whether the process has been made simpler for HDP Search through Ambari, similar to the Kerberos setup for other HDP components? Thanks
Labels:
- Apache Ambari
- Apache Solr
05-21-2018
10:44 PM
Hi @Gaurav Sharma Thanks for your response. While I understand that there might not be a single best practice around maximum capacity, I wonder why each queue cannot have its maximum value set to 100% if the minimum capacities are configured properly, along with pre-emption etc.? In your example you mentioned 70-80%, but again, with the minimum capacity parameters in place, in what circumstances might there be "bottlenecks" for resources? Further questions:
1. Is the maximum capacity derived from the overall cluster capacity, or only from the parent queue's capacity?
2. Is there anything like a mandatory "default" queue in an HDP setup? (Sorry, I could test this, but wanted to see if there is a ready answer.)
05-21-2018
09:13 PM
Hi,
In our setup we are using the YARN Capacity Scheduler and have many queues set up in a hierarchical fashion with well-configured minimum capacities. However, I am wondering what the best practice is for setting the maximum capacity value, i.e. the parameter yarn.scheduler.capacity.<queue-path>.maximum-capacity.
Is it advisable to configure each queue with a maximum capacity of 100%, or something like 90-95% with some leeway for the default queue? In summary, what are the best practices for leveraging the full cluster capacity while it is available, while still honouring the minimum queue capacities? Thanks
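To make the question concrete, here is a small illustration of how hierarchical capacities compose. It assumes each queue's capacity is a percentage of its parent and, purely for the sake of the example, treats maximum-capacity the same way; the queue names and numbers are made up and are not a recommendation.

```python
# Hypothetical queue hierarchy; values are percentages of the parent queue,
# mirroring yarn.scheduler.capacity.<queue-path>.capacity / .maximum-capacity.
queues = {
    "root":           {"capacity": 100, "maximum-capacity": 100, "parent": None},
    "root.batch":     {"capacity": 60,  "maximum-capacity": 90,  "parent": "root"},
    "root.adhoc":     {"capacity": 40,  "maximum-capacity": 100, "parent": "root"},
    "root.batch.etl": {"capacity": 70,  "maximum-capacity": 100, "parent": "root.batch"},
}

def absolute(path, key):
    """Multiply a per-queue percentage down the hierarchy to get a cluster-wide share."""
    node = queues[path]
    share = node[key] / 100.0
    parent = node["parent"]
    return share if parent is None else share * absolute(parent, key)

for path in queues:
    print(f"{path}: guaranteed {absolute(path, 'capacity'):.0%} of the cluster, "
          f"elastic up to {absolute(path, 'maximum-capacity'):.0%}")
```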
Labels:
- Apache YARN
10-25-2016
11:28 AM
Hi, In the HDFS Admin Guide, for copying data across encryption zones (inter-cluster or intra-cluster), it is recommended to run distcp on /.reserved/raw/source_data_dir instead of source_data_dir directly. I believe the reason behind this is to avoid unnecessary decryption and re-encryption of the copied data on the source and destination respectively. My question: if we copy from the /.reserved/raw directory, the data on the destination will obviously still be in encrypted form, which means the KMS keys also need to be copied over separately, for example as a database dump? Any pointers on the best strategy in this case?
06-22-2016
09:44 AM
Hi, I understand some of the services can be set up in HA mode as documented in the docs. However, I am trying to understand what "High Availability" means for the following HDP services / components:
- Tez
- Spark (presume it's client-only and hence HA won't be applicable, as multiple clients can be installed)
- Slider
- Phoenix (presume it's client-only and hence HA won't be applicable, as multiple clients can be installed)
- Accumulo
- Storm (is it all about setting up Nimbus HA?)
- Falcon
- Atlas
- Sqoop (presume it's client-only and hence HA won't be applicable, as multiple clients can be installed; but wondering about the role of the database behind Sqoop)
- Flume
- Ambari (presume no native HA is available at the moment, but it is planned for the future)
- ZooKeeper (presume ZooKeeper itself is inherently HA due to its ensemble, and that is what provides HA to many other components; but wanted to understand if there is more to this)
- Knox
Labels:
- Hortonworks Data Platform (HDP)
06-17-2016
07:46 PM
Thanks @Arpit Agarwal for your response. So it finally boils down to choosing between the RR and AvailableSpace policies, with Hortonworks recommending the RR policy plus DiskBalancer versus Cloudera's recommendation of the AvailableSpace policy? Am I correct in saying that? 🙂
06-17-2016
02:47 PM
1 Kudo
Hi @Arpit Agarwal I don't know the intricacies of this, but I am trying to understand which is the better option: running the balancer as a recovery mechanism at regular intervals, or using a better placement policy while writing the blocks in the first place. I presume the default block placement policy is RR, so if the placement is round-robin, the smaller disks fill up faster. Instead, if the placement policy took into account the available space as well as the IO throughput of each disk, wouldn't that be a better choice? Also, as documented, these two properties are only applicable when dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy (https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml). But I couldn't find any property named dfs.datanode.fsdataset.volume.choosing.policy. Please let me know where this is set, and correct me if I am wrong in my understanding.
06-01-2016
12:46 PM
Thanks @Benjamin Leonhardi I am looking for a solution around a source cluster feeding into two downstream clusters. In another question on HCC there was a mention of HDF as a good fit, and hence I wanted to understand its merits in comparison with Falcon.
06-01-2016
10:50 AM
In a teeing-based solution where the data is ingested simultaneously into two clusters, can Falcon be used, similar to a Flume multi-sink? Alternatively, is this better done with HDF than with Falcon? What are the benefits?
Labels:
- Apache Falcon
- Cloudera DataFlow (CDF)
05-23-2016
10:57 AM
@Benjamin Leonhardi Let me rephrase my question. Assume I have an HDP cluster and an edge node outside the cluster, and I have installed the Knox service on the edge node. My question is which is the better way of ingesting data into HDFS:
1. Use the edge node as a staging area, ingest the data onto the edge node first (which means storage is needed on the edge node) and then ingest it into HDFS. This would avoid exposing the data nodes to the outside world.
2. Alternatively, configure the Knox service on the edge node so that the WebHDFS API goes through Knox and the NameNode URL / IP address is not exposed beyond Knox. In this case the source streams directly to HDFS, with Knox doing the address translation, and no additional storage is needed on the edge node for staging the data temporarily.
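To make option 2 concrete, below is a rough sketch of streaming a local file into HDFS over the Knox-proxied WebHDFS API, using the usual two-step CREATE flow (request a write location, then upload to the rewritten URL). The gateway URL, topology name, credentials and paths are placeholders, and the redirect behaviour should be verified against your Knox setup.

```python
import requests

KNOX = "https://knox-edge-node:8443/gateway/default/webhdfs/v1"  # placeholder gateway URL
AUTH = ("ingest_user", "password")                               # placeholder credentials
HDFS_PATH = "/data/landing/events.csv"                           # placeholder target path

# Step 1: ask for a CREATE location. WebHDFS replies with a redirect, which Knox
# rewrites so the datanode address is never exposed to the client.
resp = requests.put(
    f"{KNOX}{HDFS_PATH}",
    params={"op": "CREATE", "overwrite": "true"},
    auth=AUTH,
    allow_redirects=False,
    verify=False,   # only because this sketch assumes a self-signed gateway certificate
)
resp.raise_for_status()
upload_url = resp.headers["Location"]

# Step 2: stream the file body to the rewritten location.
with open("events.csv", "rb") as f:
    put = requests.put(upload_url, data=f, auth=AUTH, verify=False)
    put.raise_for_status()

print("Created", HDFS_PATH, "status", put.status_code)
```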
05-23-2016
10:39 AM
@Pradeep Bhadani I have seen solutions where the staging option has been taken, so I am just wondering what advantages the staging approach brings in comparison to streaming directly through the Knox WebHDFS API?
05-23-2016
10:20 AM
Hi, Is it good practice to stream data from the source systems directly into HDFS using the Knox-exposed WebHDFS APIs, or is using the Knox edge node as a staging area before ingesting into HDFS a better approach? Thanks
Labels:
- Apache Hadoop
- Apache Knox
04-28-2016
06:42 PM
@james.jones Thanks James. I shall look into meld. Seems interesting.
04-27-2016
08:52 PM
Thanks @David Schorow for your response. Personally, I feel this would be a great feature to have in Ambari and would help DevOps teams manage things much better. I hope it makes it into scope sooner rather than later. Regards
04-27-2016
08:30 PM
Thanks @Brandon Wilson for the quick response. I am aware of the different hardware profiles within a cluster. However, I was wondering about managing multiple clusters through a single Ambari deployment.
04-27-2016
08:27 PM
Hi, Can Ambari be used to create and manage two different clusters, possibly with different hardware profiles, configurations etc.? In some earlier posts this was mentioned as future scope, but I am not sure whether it has been implemented in the recent 2.x releases? Thanks
Labels:
- Apache Ambari
- Apache Hadoop
04-26-2016
10:56 AM
@Artem Ervits, but would this be for a single cluster rather than a comparison across clusters? I am not sure if there is a better way than getting the cluster configuration from the Ambari REST APIs separately for each cluster and then doing a manual comparison? Thanks
04-26-2016
09:07 AM
1 Kudo
Hi, Just wondering what the best way is to compare configurations between two clusters? Is there any feature available in the Ambari UI to do this? At present I think Ambari cannot manage multiple clusters, so I am not sure whether this is possible at all. Is exporting the cluster configurations as blueprints and then comparing them manually the only option available, or are there better ways to do the same? Thanks in advance.
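To sketch the manual approach, the snippet below exports each cluster's configuration as a blueprint over the Ambari REST API (using the ?format=blueprint export available in Ambari 2.x) and prints a unified diff. The server URLs, cluster names and credentials are placeholders.

```python
import difflib
import json
import requests

# Placeholder Ambari servers and cluster names; adjust to your environment.
CLUSTERS = [
    ("http://ambari-a:8080", "cluster_a"),
    ("http://ambari-b:8080", "cluster_b"),
]
AUTH = ("admin", "admin")   # placeholder credentials

def export_blueprint(base_url, cluster):
    """Export the cluster's current configuration as a blueprint document."""
    r = requests.get(
        f"{base_url}/api/v1/clusters/{cluster}",
        params={"format": "blueprint"},
        auth=AUTH,
    )
    r.raise_for_status()
    # Pretty-print with sorted keys so the diff is stable and readable.
    return json.dumps(r.json(), indent=2, sort_keys=True).splitlines()

a = export_blueprint(*CLUSTERS[0])
b = export_blueprint(*CLUSTERS[1])
for line in difflib.unified_diff(a, b, fromfile=CLUSTERS[0][1], tofile=CLUSTERS[1][1], lineterm=""):
    print(line)
```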
Labels:
- Apache Ambari
04-11-2016
09:58 AM
Following the security article (https://community.hortonworks.com/articles/17336/choosing-kerberos-approach-for-hadoop-cluster-in-a.html#comment-26641), there seem to be three different options for enabling Kerberos for a Hadoop cluster. Just wondering which of the three is the approach recommended by Hortonworks:
1. Use an MIT KDC specific to the Hadoop cluster - automated keytab management using Ambari
2. Use an existing Enterprise Active Directory - manual setup
3. Use an existing Enterprise AD with automated management using Ambari
Option 2 definitely seems less preferable than 1 and 3. However, I am wondering what factors to consider when choosing between 1 and 3.
Labels:
- Apache Hadoop
04-08-2016
02:36 PM
What is the recommended approach out of these three? I believe option 2 is the least preferred. So how do we decide between 1 and 3? Any suggestions / recommendations from Hortonworks?
02-12-2016
12:10 PM
Thanks @Neeraj Sabharwal for validating my understanding 🙂