Member since: 07-30-2019
Posts: 53
Kudos Received: 136
Solutions: 16
My Accepted Solutions
Views | Posted
---|---
8771 | 01-30-2017 05:05 PM
4671 | 01-13-2017 03:46 PM
2299 | 01-09-2017 05:36 PM
1392 | 01-09-2017 05:29 PM
1074 | 10-07-2016 03:34 PM
09-28-2015
09:06 PM
4 Kudos
I'll answer my own question here, since I was able to work through it on a new install. With a fresh installation of HDP 2.3 and Ambari 2.1.1, you'll be prompted during the install to select one or more servers to install the NFS Gateway on. This happens in the same cluster-configuration screens where you designate DataNodes, Region Servers, Phoenix Servers, etc. After the installation has finished, you'll see indications that the NFS Gateway is running on the chosen servers. Now what? If you go to one of those servers and run
df -h
you won't see any new mount points. So how far down the path did Ambari get you? If you refer back to the HDP 2.2 docs on configuring NFS, you'll see that Ambari has started the nfs and rpcbind services for you. But now it's up to you to mount the gateway. Follow the remaining HDP 2.2 docs to complete the process and mount the NFS Gateway. The NFS Gateway startup process runs as the 'hdfs' user, so the earlier documents covering proxy settings are NOT necessary.
# Mount Example (to be run as root)
mkdir /hdfs
mount -t nfs -o vers=3,proto=tcp,nolock localhost:/ /hdfs
User interaction details are covered in the HDP docs referenced above.
/etc/fstab example for NFS Gateway automount:
localhost:/ /hdfs nfs rw,vers=3,proto=tcp,nolock,timeo=600 0 0
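Before relying on the mount, it's worth confirming the gateway is actually exporting the HDFS root. A quick sanity check, assuming the gateway runs on localhost and the standard rpcinfo/showmount utilities are installed:
# Confirm the gateway registered its services with rpcbind
rpcinfo -p localhost
# Confirm the HDFS root is exported
showmount -e localhost
# After mounting, the HDFS namespace shows up as regular directories
ls /hdfs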
If you're using NFS as a quick way to traverse HDFS while avoiding the JVM startup time of the 'hdfs dfs ...' call, try out the hdfs-cli project: https://github.com/dstreev/hdfs-cli
09-26-2015
02:25 AM
3 Kudos
Looking at my current HA implementation, the root zNode for HA is hadoop-ha, followed by the name of the HA instance, i.e. /hadoop-ha/HOME, where HOME is the name of the HA NN instance. So this should support your "multiple" HA systems in theory. HA NN doesn't put much load on ZooKeeper, so you're not going to have a scaling issue in that regard. I think this is an operational issue, though. Customers are starting to see that ZooKeeper is used for more and more things on clusters and could be considered a place that holds "too many eggs". Another point is around upgrades. While rolling upgrade supports an easy transition for ZooKeeper, the clusters you would serve from a single ZooKeeper instance will have limitations from an operational standpoint. Customers with larger clusters are trending toward multiple ZooKeepers per cluster, instead of one ZooKeeper to rule them all, especially if Storm and Kafka are involved; they apply a different (heavier) type of load than NN HA. Note: Resource Manager HA can also put quite a load on ZooKeeper on really large clusters, as it uses ZooKeeper to maintain the state of jobs.
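For reference, you can see the one-child-per-nameservice layout by listing the parent path with the ZooKeeper CLI. A minimal sketch, assuming an HDP-style install and the nameservice HOME from the example above (the znode names shown are illustrative):
# Connect to any ensemble member with the ZooKeeper shell
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181
ls /hadoop-ha
# [HOME]   <- one child znode per HA nameservice
ls /hadoop-ha/HOME
# [ActiveBreadCrumb, ActiveStandbyElectorLock]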
09-25-2015
06:35 PM
For Ambari Server, I currently make adjustments to /etc/ambari-server/conf/log4j.properties. And for the Agent, I have to create a symlink to redirect /var/log/ambari-agent. Is there a better method?
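For context, this is roughly what those two adjustments look like; the target directory is just a placeholder, and the exact log4j property name can vary by Ambari version:
# Ambari Server: change the log directory in /etc/ambari-server/conf/log4j.properties
ambari.log.dir=/data/logs/ambari-server
# Ambari Agent: redirect /var/log/ambari-agent with a symlink
mv /var/log/ambari-agent /data/logs/ambari-agent
ln -s /data/logs/ambari-agent /var/log/ambari-agent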
Labels:
- Apache Ambari
09-25-2015
06:24 PM
2 Kudos
Recently used TDE to encrypt an HBase installation and found some interesting requests for key access by the Region Servers. Out of the box, we locked down the key permissions to allow only the "hbase" user, since this was the user accessing the files by way of the Region Servers. During normal operations, we saw additional requests from the "nn" user and later from "hdfs". Well, "hdfs" is a user, that's fine. But "nn" is not. "nn" was set up as a per-host Kerberos principal (in IPA). We got around this by actually creating an "nn" user in IPA and granting it rights to the key in Ranger KMS. Was that the best way? And I'm a little curious "how" the "nn" principal expressed itself as a user in HDFS operations.
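One way to check how a given principal is translated into a short user name is Hadoop's built-in principal mapping check, which applies the cluster's hadoop.security.auth_to_local rules. A sketch; the host and realm below are made up for illustration:
# Print the short name Hadoop derives from a Kerberos principal
hadoop org.apache.hadoop.security.HadoopKerberosName nn/namenode.example.com@EXAMPLE.COM
# e.g. "Name: nn/namenode.example.com@EXAMPLE.COM to nn" (the result depends on your auth_to_local rules)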
Labels:
- Apache Hadoop
- Apache HBase
09-25-2015
06:14 PM
Before Ambari 2.1, we had to manage the NFS Gateway separately. Now it's "kind of" part of the Ambari process. At least it shows up in Ambari (HDFS Summary page) as installed and running. But I don't see a way to control the bind, etc., and there aren't any such processes running. So what is the process for using NFS with Ambari 2.1+?
Labels:
- Apache Ambari
09-25-2015
06:02 PM
Have you tried creating the component first, before attempting the installation? For example: curl -i -X POST -H "X-Requested-By: ambari" -u admin:admin http://<ambari-host>:8080/api/v1/clusters/<clustername>/hosts/<host_name>/host_components/ZOOKEEPER
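Once the host component exists, the install itself is usually triggered with a follow-up PUT that sets the desired state. A sketch of that second call against the same endpoint (same placeholders as above):
curl -i -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://<ambari-host>:8080/api/v1/clusters/<clustername>/hosts/<host_name>/host_components/ZOOKEEPER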
09-25-2015
05:58 PM
2 Kudos
hadoop distcp -i -log /tmp/ hdfs://xxx:8020/apps/yyyy hdfs://xxx_cid/tmp/
In this case "xxx" is the un-secure cluster, while "xxx_cid" is the secure cluster. We are launching the job from the Kerberos cluster, with the appropriate kinit for the user, and getting the following error:
java.io.IOException: Failed on local exception: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.; Host Details : local host is: "xxx/10.x.x.x"; destination host is: "xxx":8020; ... Caused by: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.
I thought that by launching the job from the secure cluster we could avoid any access issues, but it appears that the processes are kicked off from the "source" cluster. In this case, that's the insecure cluster. Ideas on getting around this?
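For reference, that error message corresponds to the client-side property ipc.client.fallback-to-simple-auth-allowed, which defaults to false on a secure client. A sketch of passing it for a single distcp run rather than cluster-wide (same placeholder cluster names as above):
# Let the secure client fall back to SIMPLE auth when talking to the un-secure cluster
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -i -log /tmp/ hdfs://xxx:8020/apps/yyyy hdfs://xxx_cid/tmp/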
09-25-2015
05:47 PM
1 Kudo
Managing the Capacity Scheduler via text can get messy, and it requires a bit of research by the user to figure out what settings are available. Try using the Capacity Scheduler View in Ambari 2.1+; it will make managing queues much simpler. Note: you'll need to set it up first through "Manage Ambari". [Screenshot: a sample of a complex queue layout] Through this interface you can also manage the newer YARN features that map users to queues via ACLs, as sketched below. If you're using Ranger to secure HDP, the YARN plugin will extend this capability even more!
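For illustration, the user- and group-to-queue mappings that the View manages ultimately land in capacity-scheduler.xml. A minimal sketch, with made-up queue, user, and group names:
<!-- Route user 'etl_user' to the 'etl' queue and members of the 'analysts' group to the 'adhoc' queue; the first matching rule wins -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:etl_user:etl,g:analysts:adhoc</value>
</property>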
09-24-2015
07:59 PM
1 Kudo
Applied this recently as well with a MySQL 5.5 instance with HA (Tungsten). Haven't seen the issue on a basic 5.6 install.
09-24-2015
05:05 PM
7 Kudos
As a general rule we do NOT use the default Ambari databases. Pick one (MySQL, Oracle, or PostgreSQL), have a separate instance stood up for it, and then use it for all of your repositories. It should be this way for any environment beyond a sandbox. I wouldn't even do a POC with the defaults, simply because the defaults are all over the place and POCs can turn into production systems :). Once you've committed to a repository, changing "types" is not really possible; you basically need to start over. I know a lot of people have asked about this in the past, but it's a mess. Take two minutes in the beginning, set up an "independent" MySQL (or other) database, and use it (see the sketch below). If you need to move that MySQL around in the future, that's possible and far more achievable than switching types. NOTE: Ambari won't lay down MySQL until the Hive Metastore is installed, so even if you figure out a way to use that Metastore database for Oozie, Ranger, etc., it will be controlled by the Hive service config, and Ambari WILL restart MySQL if you've allowed Ambari to install it. If you didn't catch me saying it earlier: install a separate and independent RDBMS for your metastores.
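A minimal sketch of pointing Ambari at an existing, independent MySQL instead of the embedded defaults; the driver path is an assumption based on the usual OS package location:
# Register the MySQL JDBC driver with Ambari
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
# Re-run setup and, under the advanced database options, choose the existing MySQL instance,
# supplying the external host, database name, and credentials
ambari-server setup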