Member since
09-29-2015
286
Posts
601
Kudos Received
60
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 11464 | 03-21-2017 07:34 PM
  | 2885 | 11-16-2016 04:18 AM
  | 1608 | 10-18-2016 03:57 PM
  | 4266 | 09-12-2016 03:36 PM
  | 6215 | 08-25-2016 09:01 PM
02-07-2016
03:22 AM
@keerthana gajarajakumar please post this as a new question on HCC
02-06-2016
04:42 AM
2 Kudos
Created an article on this: https://community.hortonworks.com/articles/14900/demystify-knox-ldap-ssl-ca-cert-integration-1.html
02-06-2016
04:41 AM
@Pardeep feel free to add or edit
02-06-2016
04:40 AM
20 Kudos
This article was inspired by How to configure knox with existing ssl certificate and by a client engagement, to give more clarity into the actual commands you need to run to set up Apache Knox for SSL and LDAP with existing certificates.
Pre-requisites
You need to procure the following items from your security department before you begin:
Your LDAP or AD digital certificate: <ldap>.crt
The company's digital CA cert: <company_ca>.crt
The certificate/key pair for the gateway node: <gateway_node>.crt and <gateway_node>.pem
The passphrase for the above gateway node key
Your Knox Master Secret
--------------------------------------------------------------------
What if?
You don't know the Knox Master Secret? Then you can change it as follows:
cd {GATEWAY_HOME}   #usually /usr/hdp/current/knox-server
bin/knoxcli.sh create-master --force
#Delete the keystores and restart Knox
rm data/security/keystores/gateway.jks
rm data/security/keystores/__gateway-credentials.jceks
You don't have a signed cert from a trusted CA for your gateway node? Follow the steps in the Hortonworks doc to request one from your signing authority. There are also steps available in the Apache doc.
You don't have time to get a trusted cert and you just want a self-signed cert for evaluation? That would be the subject of another article. In the meantime, please check out these great blogs on creating your own self-signed cert or becoming your own CA for evaluation purposes: SSL Between Knox and WebHDFS, Deploying HTTPS in HDFS, or follow the steps in the doc Self signed certificate specific hostname evaluations.
-----------------------------------------------------------------------
LDAP Certificate with Apache Knox Steps
You need to import your LDAP certificate into the Java keystore.
#First do a key list to see if your company's signing authority is already trusted in the Java trust store
keytool -list -keystore ${JAVA_HOME}/jre/lib/security/cacerts | grep <Replace with Your Company's Cert Authority>
Enter keystore password:#the default is 'changeit' unless someone else changed it :-)
>changeit
#If it is not there, or if in doubt, import it.
keytool -importcert -trustcacerts -file <ldap>.crt -storepass changeit -noprompt -alias MyLdapCert -keystore ${JAVA_HOME}/jre/lib/security/cacerts
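To confirm the import worked, you can list just that alias (a quick check; MyLdapCert matches the alias used above):
keytool -list -keystore ${JAVA_HOME}/jre/lib/security/cacerts -storepass changeit -alias MyLdapCert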
Follow the Hortonworks documentation and/or Apache Knox documentation to configure Knox for LDAP. This article is only meant to give insight into the elusive commands concerning digital certificates and how to import them for use in Knox.
---------------------------------------------------------------------------------------------------------
CA Certificate Steps for Dev and Prod SSL with Apache Knox
The Hortonworks documentation for the CA cert steps is here. This article gives you the commands that the doc is asking you to accomplish. This is the heart of this article.
First, some handy Knox specifics:
1. {GATEWAY_HOME}/data/security/keystores/gateway.jks: This is the identity keystore for the Knox Gateway and needs the public and private keys as well as any signing certs (see the Apache docs). The expected alias for the certificate is gateway-identity. (Ancil's Note: {GATEWAY_HOME} is usually /usr/hdp/current/knox-server/)
2. {GATEWAY_HOME}/data/security/keystores/__gateway-credentials.jceks: This is the credential store for the gateway itself. You will want to add a credential to it that protects the private key passphrase used when you import the key pair into the identity store. This is done with knoxcli.sh create-alias gateway-identity-passphrase --value {value}.
3. The master secret for the gateway is used as the keystore password and must also be used to import the key pair. If you choose to make the private key passphrase the same as the master secret, then you can skip #2 above. (Source: @lmccay)
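For reference, a quick sketch of working with that credential store from {GATEWAY_HOME} (the passphrase value is whatever your key pair uses; list-alias simply shows what is already stored):
bin/knoxcli.sh create-alias gateway-identity-passphrase --value <private key passphrase>
bin/knoxcli.sh list-alias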
Here are the steps and commands to run.
-----------------------------------------------------------------------
Step 1. Export PKCS12 key
If the Master Key Pair for your gateway node is in PEM format, you need to convert it into PKCS12 format in order to import it into the Knox Java keystore. "PEM certificates usually have extensions such as .pem, .crt, .cer, and .key. The PKCS12 or PFX format is a binary format for storing the server certificate, any intermediate certificates, and the private key in one encryptable file. PFX files usually have extensions such as .pfx and .p12" (Ref. https://www.sslshopper.com/ssl-converter.html )
#Execute the command substituting for the right files
openssl pkcs12 -export -in <gateway_node>.crt -inkey <gateway_node>.pem -out <gateway_node>.p12 -name gateway-identity -certfile <company_ca>.crt -caname <any friendly name>
> Enter passphrase for crt: <your company should provide this>
> Create an Export Key: <Use Knox master Key>
(Reference https://www.openssl.org/docs/manmaster/apps/pkcs12.html ) -----------------------------------------------------------------------
Step 2. Turn the PKCS12 into JKS (Java KeyStore) format
When Knox was initially set up with Ambari, a JKS was already created with an identity of "gateway-identity" for evaluation purposes. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Knox_Gateway_Admin_Guide/content/self_signed_certificate_specific_hostname_evaluations.html However, we need to take our PKCS12 key and use that instead in the gateway Java keystore.
#In Ambari, shut down Knox
#Create a copy or backup of /usr/hdp/current/knox-server/data/security/keystores/gateway.jks
mv /usr/hdp/current/knox-server/data/security/keystores/gateway.jks /usr/hdp/current/knox-server/data/security/keystores/gateway.jks.old
#Execute the command substituting for the right files. IMPORTANT: The Alias MUST be called gateway-identity
keytool -importkeystore -srckeystore <gateway_node>.p12 -srcstoretype pkcs12 -srcstorepass <Knox Master Key> -destkeystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks -deststoretype jks -deststorepass <You can use Knox Master Key or Other> -alias gateway-identity
#Verify that the key was imported
keytool -list -keystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks
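To double-check that the key really landed under the required alias (an optional sanity check; keytool will prompt for the keystore password you chose above):
keytool -list -v -keystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks -alias gateway-identity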
-----------------------------------------------------------------------
Step 3: Sync Default Gateway Identity
If you did NOT use the Knox Master Key for your destination pass phrase in Step 2 above, you need to let Knox know:
knoxcli.sh create-alias gateway-identity-passphrase --value {value}
If you did use the Knox Master Key for your destination pass phrase in Step 2 above, delete the default credential; when Knox restarts it will automatically recreate the credential with the Knox Master Key:
knoxcli.sh delete-alias gateway-identity-passphrase
NOTE: To change the password for gateway.jks:
keytool -storepasswd -new <master key> -keystore /var/lib/knox/data-2.3.4.0-3485/security/keystores/gateway.jks
-----------------------------------------------------------------------
Step 4: You may need to import your company's digital CA certificate into the Java keystore
keytool -importcert -trustcacerts -file <company_ca>.crt -storepass changeit -noprompt -alias MyCompanyCACert -keystore ${JAVA_HOME}/jre/lib/security/cacerts
Enter keystore password:#the default is 'changeit' unless someone else changed it :-)
>changeit
-----------------------------------------------------------------------
Step 5: Start Knox in Ambari and test with appropriate curl commands. Set debug on in Ambari for Knox if you need to troubleshoot LDAP and SSL connectivity. Example endpoint: https://<host>:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
curl -k -u admin:admin-password 'https://127.0.0.1:8443/gateway/default/webhdfs/v1?op=LISTSTATUS'
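To confirm the gateway is now presenting your CA-signed certificate (a quick check, assuming the default gateway port of 8443):
openssl s_client -connect <gateway_host>:8443 -showcerts < /dev/null | openssl x509 -noout -subject -issuer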
02-05-2016
10:36 PM
2 Kudos
As per a Support note: "You can use the Move NameNode wizard in Ambari.
This will move the NameNode but only according to Ambari. After
this has been successfully completed (with the NameNode down) then you
should move all the files in the old NameNode metadata directory
(dfs.namenode.name.dir) to the directory configured on the new NameNode.
The permissions of these files will be hdfs:hadoop (by default) but the
owner should be the user who runs your NameNode & the group will be
the hadoop primary group. After this is done, the NameNode is ready to start." The most important thing is to ensure that you have a backup of all the images and edits in dfs.namenode.name.dir. Then if anything happens you can revert back to that.
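A minimal sketch of that file move, assuming the common metadata path /hadoop/hdfs/namenode (check dfs.namenode.name.dir in your hdfs-site.xml) and an illustrative <new_namenode_host>:
#On the old NameNode host: back up, then copy the metadata to the new host
tar czf /tmp/nn-metadata-backup.tar.gz /hadoop/hdfs/namenode
scp -r /hadoop/hdfs/namenode root@<new_namenode_host>:/hadoop/hdfs/
#On the new NameNode host: make sure ownership matches the user that runs the NameNode
chown -R hdfs:hadoop /hadoop/hdfs/namenode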
02-05-2016
01:53 AM
1 Kudo
This is the picture I have come up with
02-05-2016
01:33 AM
1 Kudo
See also https://community.hortonworks.com/articles/14612/process-for-moving-journal-nodes-from-one-host-to.html
02-05-2016
01:32 AM
3 Kudos
In case you would like to move the JournalNode service to another host, here are the steps to do so:
1. Put HDFS in safemode
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -safemode enter'
2. Execute a save namespace of the Active NameNode
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -saveNamespace'
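Optionally, confirm that safemode is actually on before proceeding (same placeholder for the active node as above):
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -safemode get'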
3. Stop all services via Ambari
4. Add journal node to the Ambari database through the API call:
export AMBARI_USER=admin
export AMBARI_PASSWORD=admin
export AMBARI_HOST=localhost
export CLUSTER_NAME=cluster
export MOVE_FROM='old-host.abcd.com'
export MOVE_TO='new-host.abcd.com'
# Tell Ambari we want to install this component on the new host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X POST http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components/JOURNALNODE
OR
curl -v -u admin:admin -H "X-Requested-By:ambari" -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"JOURNALNODE"}}] }' http://<ambari-server>:8080/api/v1/clusters/<clustername>/hosts?Hosts/host_name=<new journal node hostname>
5. Within the Ambari web UI, click on the "Hosts" tab to display all the hosts for the cluster. Click on the host to which you just added the JournalNode service in step 4. On this page you should see the JournalNode component in the "Components" section. Next to the JournalNode label is a drop-down button; click it and select the "Install" option. A progress bar should show up displaying the progress of the installation.
OR via Curl
# Trigger installation
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Install JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components
6. Once the installation finishes, update the configuration property below in the Ambari UI within the HDFS section
dfs.namenode.shared.edits.dir
to include the new journal node and exclude the old one.
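For example, if the property was qjournal://jn1.abcd.com:8485;jn2.abcd.com:8485;old-host.abcd.com:8485/mycluster, it would become qjournal://jn1.abcd.com:8485;jn2.abcd.com:8485;new-host.abcd.com:8485/mycluster (the jn1/jn2 hostnames and the mycluster nameservice ID here are only illustrative).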
7. Start the newly installed journal node
or via curl
# Start JournalNode on new host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Start JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components/JOURNALNODE
8. Start the rest of the journal nodes
9. Start the active NameNode to refresh the configuration and then stop it.
10. On the previous active node, run
hdfs namenode -initializeSharedEdits -force
11. Restart the active NameNode
12. On the standby NameNode run
"hdfs namenode -bootstrapStandby"
(it will prompt you to "re-format filesystem in storage directory /hadoop/hdfs/namenode? (Y or N)"; answer Y)
13. Start DataNodes
14. Start standby NameNode
15. Remove the old journal node with the following API calls:
# Stop JournalNode on old host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Stop JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_FROM/host_components/JOURNALNODE
# Remove the old component
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_FROM/host_components/JOURNALNODE
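If you want to confirm the old component is gone (an optional check using the same variables as above), you can query Ambari for the remaining JOURNALNODE components:
curl -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/host_components?HostRoles/component_name=JOURNALNODE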
16. Start all services (ENSURE THIS IS DONE AND ZOOKEEPER IS UP AND RUNNING OR NEITHER NAMENODE WILL BE ACTIVE)
(Reference support note also)
02-04-2016
08:05 PM
1 Kudo
This document is an informal guide to setting
up a test cluster on Amazon AWS, specifically the EC2 service. This is
not a best practice guide nor is it suitable for a full PoC or
production install of HDP. Please refer to the Hortonworks documentation online to get a complete set of documentation.
Create Instances
Created the following RHEL 6.3 64-bit instances:
m1.medium ambarimaster
m1.large hdpmaster1
m1.large hdpmaster2
m1.medium hdpslave1
m1.medium hdpslave2
m1.medium hdpslave3
Note: when instantiating instances, I increased the root partition to 100Gb on each of them. For long term use, you may want to create separate volumes for each of the datanodes to store larger amounts of data. Typical raw storage is 12-24Tb per slave node.
Note: I edit the Name column in the EC2 Instances screen to the names mentioned above so I know which box I'm dealing with.
Configure Security Groups
Used the following security group rules:
Protocol | Port (Service) | Source
---|---|---
ICMP | ALL | sg-79c54511 (hadoop)
TCP | 0 – 65535 | sg-79c54511 (hadoop)
TCP | 22 (SSH) | 0.0.0.0/0
TCP | 80 (HTTP) | 0.0.0.0/0
TCP | 7180 | 0.0.0.0/0
TCP | 8080 (HTTP*) | 0.0.0.0/0
TCP | 50000 – 50100 | 0.0.0.0/0
UDP | 0 – 65535 | sg-79c54511 (hadoop)
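These rules were configured in the EC2 console; if you prefer the AWS CLI, an equivalent rule would look roughly like this (a sketch only, not part of the original walkthrough, reusing the group id and Ambari port from the table above):
#Open the Ambari web port to the world on the hadoop security group; adjust group/port/CIDR to your own setup
aws ec2 authorize-security-group-ingress --group-id sg-79c54511 --protocol tcp --port 8080 --cidr 0.0.0.0/0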
Configure Nodes
On each and every node (using root):
vim /etc/sysconfig/selinux (set SELINUX=disabled)
vim /etc/sysconfig/network (set HOSTNAME=<chosen_name>.hdp.hadoop where <chosen_name> is one of the following: ambarimaster, hdpmaster1, hdpmaster2, hdpslave1, hdpslave2, hdpslave3 – depending on what EC2 instance you are on)
chkconfig iptables off
chkconfig ip6tables off
shutdown -r now #(only after the commands above are completed)
Note: when I do a restart of the node in this manner, my external EC2 names did NOT change. They will change if you actually halt the instance. This is a separate concern from the internal IP addresses, which we will get to further on in these instructions.
Note: SSH on the RHEL instances has a timeout. If your session hangs, just give it a few seconds and you will get a "Write failed: Broken pipe" message; just reconnect to the box and everything will be fine. Change the SSH timeout if you desire.
Key Exchange
Logged onto the ambarimaster ONLY:
ssh-keygen -t rsa
On your local box (assuming a Linux/Mac laptop/workstation; if not, use Cygwin, WinSCP, FileZilla, etc. to accomplish the equivalent secure copy):
scp -i amuisekey.pem root@ec2-54-234-94-128.compute-1.amazonaws.com:/root/.ssh/id_rsa.pub ./
scp -i amuisekey.pem root@ec2-54-234-94-128.compute-1.amazonaws.com:/root/.ssh/id_rsa ./
Once you have your public and private key on your local box, you can distribute the public key to each node. Do this for every host except for the ambarimaster:
scp -i amuisekey.pem ./id_rsa.pub root@ec2-174-129-186-149.compute-1.amazonaws.com:/root/.ssh/
Log on to each host and copy the public key for ambarimaster into each server's authorized_keys file:
cat id_rsa.pub >> authorized_keys
To confirm that passwordless ssh is working:
Pick a host other than ambarimaster, determine the internal IP, and keep it handy: ifconfig -a
Log on to your ambarimaster and test passwordless ssh using the IP of the host you just looked up: ssh root@10.110.35.23
Confirm that you did actually land on the right host by checking the name: hostname
Make sure you exit out of your remote session to your child node from the ambarimaster or things could get confusing very fast.
Setup Hosts
Log on to the ambarimaster and edit the hosts:
On each host, check the internal IP with: ifconfig -a
Edit the hosts file on your ambarimaster: vim /etc/hosts
Edit the hosts file to look like the one below, taking into account your own IP addresses for each host:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.110.35.23 hdpmaster1.hdp.hadoop hdpmaster1
10.191.45.41 hdpmaster2.hdp.hadoop hdpmaster2
10.151.94.30 hdpslave1.hdp.hadoop hdpslave1
10.151.87.239 hdpslave2.hdp.hadoop hdpslave2
10.70.78.233 hdpslave3.hdp.hadoop hdpslave3
10.151.22.30 ambarimaster.hdp.hadoop ambarimaster
Finally, copy the /etc/hosts file from the ambarimaster to every other node:
scp /etc/hosts root@hdpmaster1:/etc/hosts
scp /etc/hosts root@hdpmaster2:/etc/hosts
scp /etc/hosts root@hdpslave1:/etc/hosts
scp /etc/hosts root@hdpslave2:/etc/hosts
scp /etc/hosts root@hdpslave3:/etc/hosts
Note: /etc/hosts is the file you will need to change if you shut down your EC2 instance and get a new internal IP. When you update this file you must make sure that all nodes have the same copy.
YUM Install
On the ambarimaster only, install the HDP yum repository:
cd
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo
cp ambari.repo /etc/yum.repos.d
yum install epel-release
yum repolist
Install and initialize the Ambari server:
yum install ambari-server
ambari-server setup
ambari-server start
Now you can log on to Ambari. Make a note of the external hostname of your ambarimaster EC2 instance in the AWS console and go to http://<ambarimaster external hostname>:8080 using your local host's favorite web browser. Log on to Ambari with admin/admin.
Using Ambari to Install
Going through the Ambari cluster install process:
Name your cluster whatever you want.
Install Options::Target Hosts – on each line enter the fully qualified hostnames as below (do not add the ambarimaster to the list):
hdpmaster1.hdp.hadoop
hdpmaster2.hdp.hadoop
hdpslave1.hdp.hadoop
hdpslave2.hdp.hadoop
hdpslave3.hdp.hadoop
Install Options::Host Registration Information – Find the id_rsa
(private key) file you downloaded from ambarimaster when you were
setting up. Click on Choose File and select this file.
Install Options::Advanced Options – leave these as default. Click Register and Confirm.
Confirm Hosts – Wait for the Ambari agents to be installed and registered on each of your nodes and click Next when all have been marked success. Note that you can always add nodes at a later time, but make sure you have your two masters and at least 1 slave.
Choose Services – By default all services are selected. Note that you cannot go back and reinstall services later in this version of Ambari, so choose what you want now.
Assign Masters – Likely the default is fine, but see below for a good setup. Note that one of the slaves will need to be a ZooKeeper instance to have an odd number for quorum.
hdpmaster1: NameNode, NagiosServer, GangliaCollector, HBaseMaster, ZooKeeper
hdpmaster2: SNameNode, JobTracker, HiveServer2, HiveMetastore, WebHCatServer, OozieServer, ZooKeeper
hdpslave1: ZooKeeper
Assign Slaves and Clients – For a demo cluster it is fine to have all of the boxes run datanode, tasktracker, regionserver, and client libraries. If you want to expand this cluster with many more slave nodes then I would suggest only running the datanode, tasktracker, and regionserver roles on the hdpslave nodes. The clients can be installed where you like, but be sure at least one or two boxes have a client role. Click Next after you are done.
Customize Services – You will note that two services have red markers next to their name: Hive/HCat and Nagios. Select Hive/HCat and choose your password for the hive user on the MySQL database (this stores metadata only); remember the password. Select Nagios and choose your admin password. Set your Hadoop admin email to your email (or the email of someone you don't like very much) and you can experience Hadoop alerts from your cluster! Wow.
Review – Take note of the Print command in the top corner. I usually save this to a PDF. Then click Deploy. Get a coffee.
Note: you may need to refresh the web page if the installer appears stuck (this happens very occasionally depending on various browser/network situations). Verify 100% installed and click Next.
Summary – You should see that all of the Master services were installed successfully and none should have failed. Click Complete.
At this point the Ambari installation and the HDP cluster are complete, so you should see the Ambari Dashboard.
You can leave your cluster running as long as you want but be warned
that the instances and volumes will cost you on AWS. To ensure that you
will not be charged you can terminate (not just stop) your instances and
delete your volumes in AWS. I encourage you to keep them for a week
or so as you decide how to set up your actual Hadoop PoC cluster (be it
on actual hardware, Virtual Machines, or another cloud solution). The
instances you created will be handy for reference as you install your
next cluster and generally are low cost. Consult AWS documentation for
details on management and pricing. Please look into Rackspace as well.
Relevant Links
HDP Documentation:
http://docs.hortonworks.com/
AWS Instructions for using Putty with Linux EC2 instances:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html
AWS Discussion of Static IPs:
https://forums.aws.amazon.com/thread.jspa?threadID=71177
Adding a new drive to RHEL 6 (relevant for adding volumes for Datanode storage):
http://www.techotopia.com/index.php/Adding_a_New_Disk_Drive_to_an_RHEL_6_System
Toronto Hadoop User Group:
http://www.meetup.com/TorontoHUG/
Some more details on this process with Amazon EC2 and Cygwin:
http://pthakkar.com/2013/03/installing-hadoop-apache-ambari-amazon-ec2/
02-04-2016
07:58 PM
6 Kudos
ISSUE: Choosing the appropriate Linux file system for HDFS deployment
SOLUTION: The Hadoop Distributed File System is platform independent and can function on top of any underlying file system and operating system. Linux offers a variety of file system choices, each with caveats that have an impact on HDFS.
As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the 'noatime' option (which disables access-time updates); this speeds up reads for files.
There are three popular Linux file system options to choose from: ext3, ext4, and XFS.
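For example, a data-disk entry in /etc/fstab might look like this (the device and mount point here are illustrative; adjust to your own layout):
#Hypothetical Hadoop data disk formatted as ext4 and mounted with noatime
/dev/sdb1   /grid/0/hadoop/hdfs/data   ext4   defaults,noatime   0 0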
Yahoo uses the ext3 file system for its Hadoop deployments. ext3 is
also the default filesystem choice for many popular Linux OS flavours.
Since HDFS on ext3 has been publicly tested on Yahoo’s cluster it makes
for a safe choice for the underlying file system. ext4 is the successor to ext3. ext4 has better performance with large
files. ext4 also introduced delayed allocation of data, which adds a
bit more risk with unplanned server outages while decreasing
fragmentation and improving performance. XFS offers better disk space utilization than ext3 and has much
quicker disk formatting times than ext3. This means that it is quicker
to get started with a data node using XFS. Most often performance of a Hadoop cluster will not be constrained by
disk speed – I/O and RAM limitations will be more important. ext3 has
been extensively tested with Hadoop and is currently the stable option
to go with. ext4 and xfs can be considered as well and they give some
performance benefits. References:
http://wiki.apache.org/hadoop/DiskSetup
http://hadoop-common.472056.n3.nabble.com/Hadoop-performance-xfs-and-ext4-td742325.html
http://www.quora.com/What-are-the-advantages-and-disadvantages-of-the-filesystems-ext2-ext3-ext4-ReiserFS-and-XFS