Member since
09-15-2015
457
Posts
507
Kudos Received
90
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15578 | 11-01-2016 08:16 AM |
| | 10980 | 11-01-2016 07:45 AM |
| | 8370 | 10-25-2016 09:50 AM |
| | 1891 | 10-21-2016 03:50 AM |
| | 3708 | 10-14-2016 03:12 PM |
02-18-2016
07:00 AM
1 Kudo
Please see my comment above. In secure mode you need local user accounts on all NodeManager nodes.
02-18-2016
06:53 AM
2 Kudos
@Sagar Shimpi @ARUNKUMAR RAMASAMY I agree with @Vikas Gadade: if you want to execute jobs with your user account, you have to make sure the user is available on every NodeManager node! Please see this => "YARN containers in a secure cluster use the operating system facilities to offer execution isolation for containers. Secure containers execute under the credentials of the job user. The operating system enforces access restriction for the container. The container must run as the user that submitted the application." More info => https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/SecureContainer.html
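To illustrate the requirement, here is a minimal sketch (my own helper, not part of Hadoop) you could run on each NodeManager host to verify that a local OS account exists; the user names in the example are hypothetical:

```python
# Sketch: check whether a local OS account exists on this host. Secure YARN
# containers run as the submitting user, so this must succeed on every
# NodeManager node. Example user names below are placeholders.
import pwd

def user_exists(username: str) -> bool:
    """Return True if a local OS account with this name exists."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False

if __name__ == "__main__":
    for user in ["yarn", "hdfs", "alice"]:
        print(user, "exists" if user_exists(user) else "MISSING")
```

In practice you would run this (or a simple `id <user>`) on every NodeManager node, for example via pdsh or Ansible.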
02-17-2016
06:41 PM
2 Kudos
Make sure you have configured the right heap size as well as validate the following configurations:
hbase.rootdir=hdfs://ams......
hbase.cluster.distributed=true
Metrics service operation mode=distributed
hbase.zookeeper.property.clientPort=2181
hbase.zookeeper.quorum=<zookeeper quorum, comma separated without port>
zookeeper.znode.parent=/ams-hbase-unsecure or /ams-hbase-secure (depending on whether Kerberos is enabled)
Restart the Metrics Collector and make sure a new znode was created in ZooKeeper. Make sure HBase and the Metrics Collector have started successfully.
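As a quick sanity check before restarting, the properties above can be validated programmatically. This is my own sketch, not an Ambari tool; the property name `timeline.metrics.service.operation.mode` is the AMS key for the operation mode, but treat the exact keys as assumptions for your version:

```python
# Sketch: sanity-check AMS distributed-mode properties before a restart.
# Property keys are assumptions for illustration; verify against your stack.
def validate_ams_config(props: dict) -> list:
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    if props.get("hbase.cluster.distributed", "").lower() != "true":
        problems.append("hbase.cluster.distributed should be 'true'")
    if props.get("timeline.metrics.service.operation.mode") != "distributed":
        problems.append("operation mode should be 'distributed'")
    for host in props.get("hbase.zookeeper.quorum", "").split(","):
        if ":" in host:
            problems.append(f"quorum entry '{host}' must not include a port")
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir should point at HDFS in distributed mode")
    return problems
```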
02-17-2016
12:02 PM
1 Kudo
Thanks !!!
02-17-2016
06:59 AM
5 Kudos
It is definitely possible to do that; however, I would not recommend it, especially in a production environment. The JournalNode processes are lightweight daemons, so you can place them on the same nodes as other master services. Sharing one quorum across multiple clusters increases the risk of affecting the health/stability of all attached clusters. For example, if Cluster A brings down your JournalNode quorum (for whatever reason), the NameNodes of Cluster B can't synchronize their state and will eventually shut down because the quorum is unavailable =>
2016-02-16 22:55:55,550 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [XXXXX:8485, XXXXXX:8485, xXXXX:8485], stream=QuorumOutputStream starting at txid 51260))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
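The failure above is just quorum arithmetic: with n JournalNodes an edit-log write needs a majority of acks, so the quorum only tolerates floor((n-1)/2) lost nodes. A small worked example:

```python
# Worked example of JournalNode quorum arithmetic: an edit-log write needs a
# majority (n//2 + 1) of acks, so (n-1)//2 node failures are survivable.
def quorum_tolerance(n: int) -> tuple:
    """Return (acks required, failures tolerated) for n JournalNodes."""
    required = n // 2 + 1
    tolerated = (n - 1) // 2
    return required, tolerated

for n in (3, 5):
    req, tol = quorum_tolerance(n)
    print(f"{n} JournalNodes: need {req} acks, tolerate {tol} failures")
```

So a shared 3-node quorum gives every attached cluster only one JournalNode failure of headroom, which is why co-locating independent quorums per cluster is the safer layout.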
02-15-2016
08:28 PM
1 Kudo
Yeah, the script is really a starting point for Ambari audits. It sounds like you need something more like an export/import functionality; I have worked on something similar in the past. Or are you looking for a way to export the config deltas from two clusters and compare them? How would the export of configuration deltas work? Export all adjusted configurations, but automatically ignore configurations that contain a hostname, IP, or cluster name? Or do you just export all delta configurations, select the configuration values you want for the new cluster, and import the selected values?
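As a sketch of the delta idea (my own illustration, not part of the audit script): diff two config dicts and drop values that look host-specific. The heuristic here only catches IP addresses and `:port` suffixes, so bare hostnames would slip through; a real version would need a better filter:

```python
# Sketch: compare configs from two clusters and keep only "portable" deltas,
# i.e. values that differ and do not embed an IP address or a :port suffix.
# The heuristic is deliberately simple; bare hostnames are not filtered.
import re

HOST_LIKE = re.compile(r"(\d{1,3}\.){3}\d{1,3}|:\d{2,5}\b")

def portable_deltas(cluster_a: dict, cluster_b: dict) -> dict:
    """Return {key: (value_a, value_b)} for differing, host-agnostic values."""
    deltas = {}
    for key in set(cluster_a) | set(cluster_b):
        a, b = cluster_a.get(key), cluster_b.get(key)
        if a != b and not (HOST_LIKE.search(str(a)) or HOST_LIKE.search(str(b))):
            deltas[key] = (a, b)
    return deltas
```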
02-15-2016
07:43 PM
1 Kudo
@Steven Hirsch The Python script is using the following modules:
requests
json
getpass
logging
sys
getopt
On most systems you only have to install requests; the others ship with Python. Requests is not a plain Python script, it is a complete package that makes it easier to submit API requests, see this page http://docs.python-requests.org/en/master/ (you can install it with "pip install requests"). Let me know if you need any help with the script, I am happy to help and improve it 🙂
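To show how the pieces fit together, here is a hedged sketch of the kind of call the script makes with requests. The host, cluster name, and credentials are placeholders; the URL follows the standard Ambari REST layout (`/api/v1/clusters/<name>`), but verify the exact fields against your Ambari version:

```python
# Sketch (not the actual audit script): fetch a cluster's desired configs
# from the Ambari REST API. Host, cluster, and credentials are placeholders.
def config_url(base: str, cluster: str) -> str:
    """Build the Ambari REST URL for the cluster's desired configurations."""
    return f"{base}/api/v1/clusters/{cluster}?fields=Clusters/desired_configs"

def fetch_configs(base: str, cluster: str, user: str, password: str) -> dict:
    import requests  # deferred so the URL helper works without the package
    resp = requests.get(config_url(base, cluster),
                        auth=(user, password),
                        headers={"X-Requested-By": "ambari"})
    resp.raise_for_status()
    return resp.json()
```

A typical invocation would be `fetch_configs("http://ambari.example.com:8080", "mycluster", "admin", getpass.getpass())`, keeping the password out of the shell history.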
02-14-2016
06:14 PM
Great, thanks for sharing! This might also help https://github.com/mr-jstraub/HDFSQuota/blob/master/HDFSQuota.ipynb
02-13-2016
08:02 AM
1 Kudo
You don't have to remove and reinstall the Ambari Metrics service from Ambari; I am pretty sure this will not solve the problem! Please see my comment above => since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed". If this is a new installation, you can try to remove all Metrics data:
1. Stop Ambari Metrics (Collector + all monitors)
2. Make sure no Metrics process is running (you can kill all processes belonging to user "ams")
3. Remove data from HDFS (hdfs dfs -rmr hdfs://hdp-m.samitsolutions.com:8020/apps/hbase/data)
4. Remove data from ZooKeeper (login: zookeeper-client -server hdp-m.samitsolutions.com:2181; removal: rmr /<hbase znode>)
5. Start the Ambari Metrics Collector (not the monitors!)
6. See if the collector starts; if not, please upload the hbase-master and ambari-metrics-collector logs
Is this a secured (Kerberized) or unsecured (no Kerberos) cluster? There are other steps we can try, but let's try the above first. Thanks
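Since the cleanup steps above are destructive, a dry-run helper (my own sketch, not an Ambari tool) can assemble the commands for review before anything is executed; hostnames and the znode name are taken from the example above:

```python
# Sketch: build the destructive cleanup commands from the steps above as a
# reviewable checklist instead of running them directly. Hostnames and the
# znode name are examples from the thread, not defaults.
def ams_cleanup_commands(namenode: str, zk_host: str, znode: str) -> list:
    """Return the HDFS and ZooKeeper cleanup commands as strings."""
    return [
        f"hdfs dfs -rmr hdfs://{namenode}:8020/apps/hbase/data",
        f"zookeeper-client -server {zk_host}:2181 rmr /{znode}",
    ]

for cmd in ams_cleanup_commands("hdp-m.samitsolutions.com",
                                "hdp-m.samitsolutions.com",
                                "ams-hbase-unsecure"):
    print(cmd)
```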
02-12-2016
05:43 PM
4 Kudos
@wei yang Are you using Spark 1.3.1 or just the content of the tutorial? ORC support was added in Spark 1.4 (http://hortonworks.com/blog/bringing-orc-support-into-apache-spark/). Try using the following command: myDataFrame.write.format("orc").save("some_name")
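A script can guard that call with a simple version check, since `sc.version` is a string and ORC support landed in 1.4. The helper below is my own illustration:

```python
# Worked example: gate ORC usage on the Spark version. ORC support was added
# in Spark 1.4, so parse the major.minor part of the version string.
def supports_orc(spark_version: str) -> bool:
    """True if this Spark version (e.g. '1.3.1') has ORC DataFrame support."""
    major, minor = (int(x) for x in spark_version.split(".")[:2])
    return (major, minor) >= (1, 4)
```

In a PySpark script you might write `if supports_orc(sc.version): df.write.format("orc").save("some_name")` and fall back to Parquet otherwise.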