Member since
09-26-2016
74
Posts
4
Kudos Received
6
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2241 | 08-06-2018 06:55 PM
 | 1451 | 12-21-2017 04:28 PM
 | 1317 | 11-03-2017 05:07 PM
 | 2369 | 03-20-2017 03:37 PM
 | 6168 | 03-06-2017 03:54 PM
12-21-2017
04:28 PM
Yes, you should be able to SSH without a password to all nodes FROM the Ambari server. That way, the agents can be installed automatically; you don't want to be installing Ambari agents manually. https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/set_up_password-less_ssh.html
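A minimal sketch of the key setup from the Ambari host, assuming root access and a hypothetical agent hostname (agent-node-1); the linked Hortonworks doc covers the full procedure:
# On the Ambari server host, generate a key pair if one does not already exist
ssh-keygen -t rsa
# Copy the public key to each agent host (agent-node-1 is a placeholder hostname)
ssh-copy-id root@agent-node-1
# Verify that login now works without a password prompt
ssh root@agent-node-1 hostname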
11-03-2017
05:07 PM
1 Kudo
You achieve this by limiting access via firewall rules; beyond that, Knox + Kerberos is the built-in method. Some resources: Secure authentication: core Hadoop uses Kerberos and Hadoop delegation tokens for security. WebHDFS also uses Kerberos (SPNEGO) and Hadoop delegation tokens for authentication.
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/configure_webhdfs_for_knox.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_sg_secure_webhdfs_config.html
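As a quick illustration of the SPNEGO flow, here is a sketch of calling WebHDFS with a Kerberos ticket; the principal, the hostname, and the default HDP 2.x WebHDFS port are assumptions, so adjust them to your cluster:
# Obtain a Kerberos ticket first (principal is a placeholder)
kinit user@EXAMPLE.COM
# Call WebHDFS using SPNEGO negotiation; namenode.example.com:50070 is an assumed host:port
curl -i --negotiate -u : "http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS"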
10-20-2017
03:49 PM
Try increasing the polling intervals (run schedule) on some of the processors. This can reduce CPU load and make the UI more responsive.
10-20-2017
02:45 PM
2 Kudos
This info might be more helpful to guide you down the road of DR. With HDP in production, you must combine the different technologies on offer and tailor them together as your own solution. I've read through many solutions, and the info below is the most critical in my opinion. Remember, preventing data loss is better than recovering from it!
Read these slides first:
https://www.slideshare.net/cloudera/hadoop-backup-and-disaster-recovery
https://www.slideshare.net/hortonworks/ops-workshop-asrunon20150112/72
1. VM snapshots
If you're not using VMs, then switch over.
- Ambari nightly VM snapshots
- NameNode VM snapshots
2. Lock down critical directories
fs.protected.directories (under HDFS config in Ambari) protects critical directories from deletion. There can be accidental deletes of critical data sets, and these catastrophic errors should be avoided by adding appropriate protections. For example, the /user directory is the parent of all user-specific sub-directories. Attempting to delete the entire /user directory is very likely to be unintentional. To protect against accidental data loss, mark the /user directory as protected. This prevents attempts to delete it unless the directory is already empty.
3. Backups
Backups can be automated using tools like Apache Falcon (being deprecated in HDP 3.0, switch to the workflow editor + DistCp) and Apache Oozie.
Using snapshots
HDFS snapshots can be combined with DistCp to create the basis for an online backup solution. Because a snapshot is a read-only, point-in-time copy of the data, it can be used to back up files while HDFS is still actively serving application clients. Backups can even be automated using tools like Apache Falcon and Apache Oozie.
Example: "accidentally" remove the important file:
sudo -u hdfs hdfs dfs -rm -r -skipTrash /tmp/important-dir/important-file.txt
Recover the file from the snapshot:
hdfs dfs -cp /tmp/important-dir/.snapshot/first-snapshot/important-file.txt /tmp/important-dir
hdfs dfs -cat /tmp/important-dir/important-file.txt
HDFS snapshots overview
A snapshot is a point-in-time, read-only image of the entire file system or a subtree of the file system. HDFS snapshots are useful for:
- Protection against user error: with snapshots, if a user accidentally deletes a file, the file can be restored from the latest snapshot that contains it.
- Backup: files can be backed up using the snapshot image while the file system continues to serve HDFS clients.
- Test and development: files in an HDFS snapshot can be used to test new programs without affecting the HDFS file system that is concurrently supporting HDFS clients.
- Disaster recovery: snapshots can be replicated to a remote recovery site for disaster recovery.
DistCp overview
Hadoop DistCp (distributed copy) can be used to copy data between Hadoop clusters or within a Hadoop cluster. DistCp can copy just the files in a directory, or it can copy an entire directory hierarchy. It can also copy multiple source directories to a single target directory. DistCp:
- Uses MapReduce to implement its I/O load distribution, error handling, and reporting.
- Has built-in support for multiple file system types: it can work with HDFS, Amazon S3, Cassandra, and others. DistCp also supports copying between different HDFS versions.
- Can generate a significant workload on the cluster if a large volume of data is being transferred.
- Has many command options. Use hadoop distcp -help to get online command help information.
A minimal command sketch tying snapshots and DistCp together follows below.
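Here is that sketch: creating the snapshot and shipping it to a second cluster with DistCp. The directory and snapshot names match the recovery example above; the DR cluster's NameNode address (nn-dr.example.com:8020) and target path are assumptions:
# Allow snapshots on the directory (run once, as the hdfs superuser)
sudo -u hdfs hdfs dfsadmin -allowSnapshot /tmp/important-dir
# Create the point-in-time snapshot referenced in the recovery example above
hdfs dfs -createSnapshot /tmp/important-dir first-snapshot
# Replicate the read-only snapshot to a DR cluster with DistCp (target cluster is assumed)
hadoop distcp /tmp/important-dir/.snapshot/first-snapshot \
    hdfs://nn-dr.example.com:8020/backups/important-dir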
08-29-2017
06:28 PM
If your password has any special characters such as "&", it will break the XML. The fix for this example is to replace the & with the XML entity &amp; (ampersand, then "amp", then a semicolon, with no spaces).
07-12-2017
02:56 PM
I'm trying to achieve a simple count of flowfiles via an attribute. The idea is to count to 100, then send an email. How do I achieve this?
Labels:
- Apache NiFi
04-21-2017
09:14 AM
The answer given is worthless; this issue comes down to not having the Ambari agents set up on the nodes. Run yum install ambari-agent, then configure the agent and start it with ambari-agent start. Any issues at this point, check:
- Agents are up and running
- /etc/hosts files are correct
- SSH is working
There is another potential problem. If this is your SECOND attempt after restarting the entire process, there is a BUG. If this is the second attempt after a successful SSH, then try the alternative option for a manual install. It will work if the agents are running. Edit the config with vi /etc/ambari-agent/conf/ambari-agent.ini:
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
Start the agent on every host in your cluster:
ambari-agent start
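Put together, a rough sketch of the manual agent setup on each node, assuming the Ambari repo is already configured and using ambari.example.com as a stand-in for your Ambari server hostname:
# Install the agent from the configured Ambari repository
yum install -y ambari-agent
# Point the agent at the Ambari server (ambari.example.com is a placeholder hostname)
sed -i 's/^hostname=.*/hostname=ambari.example.com/' /etc/ambari-agent/conf/ambari-agent.ini
# Start the agent
ambari-agent start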
04-20-2017
08:46 PM
There is another potential problem. If this is your SECOND attempt after restarting the entire process, there is a BUG. If this is the second attempt after a successful SSH, then try the alternative option for a manual install. It will work if the agents are running. Edit the config with vi /etc/ambari-agent/conf/ambari-agent.ini:
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
Start the agent on every host in your cluster:
ambari-agent start
04-12-2017
02:35 PM
Try using incognito mode in Chrome; this works for me and prompts me to proceed. Make sure you import the certificate via IE first.
03-23-2017
05:40 PM
@Matt Burgess Is there any way of not using the password in your examples if I have already set up password-less SSH? The Groovy script throws an error if I try to omit the password.