Member since
09-29-2015
286
Posts
601
Kudos Received
60
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 11464 | 03-21-2017 07:34 PM
  | 2885 | 11-16-2016 04:18 AM
  | 1608 | 10-18-2016 03:57 PM
  | 4266 | 09-12-2016 03:36 PM
  | 6215 | 08-25-2016 09:01 PM
02-07-2016
03:22 AM
@keerthana gajarajakumar please post this as a new question on HCC
02-06-2016
04:42 AM
2 Kudos
Created an article on this: https://community.hortonworks.com/articles/14900/demystify-knox-ldap-ssl-ca-cert-integration-1.html
02-06-2016
04:41 AM
@Pardeep feel free to add or edit
02-06-2016
04:40 AM
20 Kudos
This article was inspired by How to configure knox with existing ssl certificate and by a client engagement, to give more clarity into the actual commands you need to run to set up Apache Knox for SSL and LDAP with existing certificates.
Pre-requisites
You need to procure the following items from your security department before you begin:
Your LDAP or AD digital certificate: <ldap>.crt
The company's digital CA cert: <company_ca>.crt
The certificate/key pair for the gateway node: <gateway_node>.crt and <gateway_node>.pem
The passphrase for the above gateway node key
Your Knox Master Secret
--------------------------------------------------------------------
What if?
You don't know the Knox Master Secret? Then you can change it as follows:
cd {GATEWAY_HOME}   #usually /usr/hdp/current/knox-server
bin/knoxcli.sh create-master --force
#Delete the keystores and restart Knox
rm data/security/keystores/gateway.jks
rm data/security/keystores/__gateway-credentials.jceks
You don't have a signed cert from a trusted CA for your gateway node? Follow the steps in the Hortonworks doc to request one from your signing authority. There are also steps available in the Apache doc.
You don't have time to get a trusted cert and you just want a self-signed cert for evaluation? That would be the subject of another article. In the meantime, please check out these great blogs on creating your own self-signed cert or becoming your own CA for evaluation purposes: SSL Between Knox and WebHDFS, Deploying HTTPS in HDFS, or follow the steps in the doc Self signed certificate specific hostname evaluations.
-----------------------------------------------------------------------
LDAP Certificate with Apache Knox Steps
You need to import your LDAP certificate into the Java keystore.
#First do a key list to see if your company's signing authority is already trusted in the Java trust store
keytool -list -keystore ${JAVA_HOME}/jre/lib/security/cacerts | grep <Replace with Your Company's Cert Authority>
Enter keystore password:#the default is 'changeit' unless someone else changed it :-)
>changeit
#If it is not there, or if in doubt, import it.
keytool -importcert -trustcacerts -file <ldap>.crt -storepass changeit -noprompt -alias MyLdapCert -keystore ${JAVA_HOME}/jre/lib/security/cacerts
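To confirm the import worked, you can list just that alias (a quick check; MyLdapCert matches the alias used above):
keytool -list -keystore ${JAVA_HOME}/jre/lib/security/cacerts -storepass changeit -alias MyLdapCert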
Follow the Hortonworks documentation and/or Apache Knox documentation to configure Knox for LDAP. This article is only meant to give insight into the elusive commands concerning digital certificates and how to import them for use in Knox.
---------------------------------------------------------------------------------------------------------
CA Certificate Steps for Dev and Prod SSL with Apache Knox
The Hortonworks documentation for the CA cert steps is here. This article gives you the commands that the doc is asking you to accomplish. This is the heart of this article.
First, some handy Knox specifics:
1. {GATEWAY_HOME}/data/security/keystores/gateway.jks: This is the identity keystore for the Knox Gateway and needs the public and private keys as well as any signing certs (see the Apache docs). The expected alias for the certificate is gateway-identity. (Ancil's Note: {GATEWAY_HOME} is usually /usr/hdp/current/knox-server/)
2. {GATEWAY_HOME}/data/security/keystores/__gateway-credentials.jceks: This is the credential store for the gateway itself. You will want to add a credential to it that protects the private key passphrase used when you import the key pair into the identity store. This is done with knoxcli.sh create-alias gateway-identity-passphrase --value {value}.
3. The master secret for the gateway is used as the keystore password and must also be used to import the key pair. If you choose to make the private key passphrase the same as the master secret, then you can skip #2 above. (Source: @lmccay)
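For reference, a quick sketch of working with that credential store from {GATEWAY_HOME} (the passphrase value is whatever your key pair uses; list-alias simply shows what is already stored):
bin/knoxcli.sh create-alias gateway-identity-passphrase --value <private key passphrase>
bin/knoxcli.sh list-alias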
Here are the steps and commands to run.
-----------------------------------------------------------------------
Step 1. Export PKCS12 key
If the Master Key Pair for your gateway node is in PEM format, you need to convert it into PKCS12 format in order to import it into the Knox Java keystore. "PEM certificates usually have extensions such as .pem, .crt, .cer, and .key. The PKCS12 or PFX format is a binary format for storing the server certificate, any intermediate certificates, and the private key in one encryptable file. PFX files usually have extensions such as .pfx and .p12" (Ref. https://www.sslshopper.com/ssl-converter.html )
#Execute the command substituting for the right files
openssl pkcs12 -export -in <gateway_node>.crt -inkey <gateway_node>.pem -out <gateway_node>.p12 -name gateway-identity -certfile <company_ca>.crt -caname <any friendly name>
> Enter passphrase for crt: <your company should provide this>
> Create an Export Key: <Use Knox master Key>
(Reference https://www.openssl.org/docs/manmaster/apps/pkcs12.html ) -----------------------------------------------------------------------
Step 2. Turn the PKCS12 into JKS (Java KeyStore) format
When Knox was initially set up with Ambari, a JKS was already created with an identity of "gateway-identity" for evaluation purposes. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Knox_Gateway_Admin_Guide/content/self_signed_certificate_specific_hostname_evaluations.html However, we need to take our PKCS12 key and use that instead in the gateway Java keystore.
#In Ambari, shut down Knox
#Create a copy or backup of /usr/hdp/current/knox-server/data/security/keystores/gateway.jks
mv /usr/hdp/current/knox-server/data/security/keystores/gateway.jks /usr/hdp/current/knox-server/data/security/keystores/gateway.jks.old
#Execute the command substituting for the right files. IMPORTANT: The Alias MUST be called gateway-identity
keytool -importkeystore -srckeystore <gateway_node>.p12 -srcstoretype pkcs12 -srcstorepass <Knox Master Key> -destkeystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks -deststoretype jks -deststorepass <You can use Knox Master Key or Other> -alias gateway-identity
#Verify that the key was imported
keytool -list -keystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks
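To double-check that the key really landed under the required alias (an optional sanity check; keytool will prompt for the keystore password you chose above):
keytool -list -v -keystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks -alias gateway-identity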
-----------------------------------------------------------------------
Step 3: Sync Default Gateway Identity
If you did NOT use the Knox Master Key for your destination pass phrase in Step 2 above, you need to let Knox know:
knoxcli.sh create-alias gateway-identity-passphrase --value {value}
If you did use the Knox Master Key for your destination pass phrase in Step 2 above, delete the default credential; when Knox restarts it will automatically recreate the credential with the Knox Master Key:
knoxcli.sh delete-alias gateway-identity-passphrase
NOTE: To change the password for gateway.jks:
keytool -storepasswd -new <master key> -keystore /var/lib/knox/data-2.3.4.0-3485/security/keystores/gateway.jks
-----------------------------------------------------------------------
Step 4: You may need to import your company's digital CA certificate into the Java keystore
keytool -importcert -trustcacerts -file <company_ca>.crt -storepass changeit -noprompt -alias MyCompanyCACert -keystore ${JAVA_HOME}/jre/lib/security/cacerts
Enter keystore password:#the default is 'changeit' unless someone else changed it :-)
>changeit
-----------------------------------------------------------------------
Step 5: Start Knox in Ambari and test with appropriate curl commands. Set debug on in Ambari for Knox if you need to troubleshoot LDAP and SSL connectivity. Example endpoint: https://<host>:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
curl -k -u admin:admin-password 'https://127.0.0.1:8443/gateway/default/webhdfs/v1?op=LISTSTATUS'
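To confirm the gateway is now presenting your CA-signed certificate (a quick check, assuming the default gateway port of 8443):
openssl s_client -connect <gateway_host>:8443 -showcerts < /dev/null | openssl x509 -noout -subject -issuer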
02-05-2016
10:36 PM
2 Kudos
As per a Support note: "You can use the Move NameNode wizard in Ambari.
This will move the NameNode but only according to Ambari. After
this has been successfully completed (with the NameNode down) then you
should move all the files in the old NameNode metadata directory
(dfs.namenode.name.dir) to the directory configured on the new NameNode.
The permissions of these files will be hdfs:hadoop (by default) but the
owner should be the user who runs your NameNode & the group will be
the hadoop primary group. After this is done, the NameNode is ready to start." The most important thing is to ensure that you have a backup of all the images and edits in dfs.namenode.name.dir. Then if anything happens you can revert back to that.
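A minimal sketch of that file move, assuming the common metadata path /hadoop/hdfs/namenode (check dfs.namenode.name.dir in your hdfs-site.xml) and an illustrative <new_namenode_host>:
#On the old NameNode host: back up, then copy the metadata to the new host
tar czf /tmp/nn-metadata-backup.tar.gz /hadoop/hdfs/namenode
scp -r /hadoop/hdfs/namenode root@<new_namenode_host>:/hadoop/hdfs/
#On the new NameNode host: make sure ownership matches the user that runs the NameNode
chown -R hdfs:hadoop /hadoop/hdfs/namenode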
02-05-2016
01:53 AM
1 Kudo
This is the picture I have come up with
02-05-2016
01:33 AM
1 Kudo
See also https://community.hortonworks.com/articles/14612/process-for-moving-journal-nodes-from-one-host-to.html
02-05-2016
01:32 AM
3 Kudos
In case you would like to move the JournalNode service to another host, here are the steps to do so:
1. Put HDFS in safemode
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -safemode enter'
2. Execute a save namespace of the Active NameNode
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -saveNamespace'
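Optionally, confirm that safemode is actually on before proceeding (same placeholder for the active node as above):
su - hdfs -c 'hdfs dfsadmin -fs hdfs://<active node>:8020 -safemode get'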
3. Stop all services via Ambari
4. Add journal node to the Ambari database through the API call:
export AMBARI_USER=admin
export AMBARI_PASSWORD=admin
export AMBARI_HOST=localhost
export CLUSTER_NAME=cluster
export MOVE_FROM='old-host.abcd.com'
export MOVE_TO='new-host.abcd.com'
# Tell Ambari we want to install this component on the new host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X POST http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components/JOURNALNODE
OR
curl -v -u admin:admin -H "X-Requested-By:ambari" -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"JOURNALNODE"}}] }' http://<ambari-server>:8080/api/v1/clusters/<clustername>/hosts?Hosts/host_name=<new journal node hostname>
5. Within the Ambari web UI, click on the "Hosts" tab to display all the hosts for the cluster. Click on the host to which you just added the JournalNode service in step 4. On this page you should see the JournalNode component in the "Components" section. Next to the JournalNode label is a drop-down button; click it and select the "Install" option. A progress bar should show up displaying the progress of the installation.
OR via Curl
# Trigger installation
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Install JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components
6. Once the installation finishes, update the configuration property below in the Ambari UI within the HDFS section
dfs.namenode.shared.edits.dir
to include the new journal node and exclude the old one.
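For example, if the property was qjournal://jn1.abcd.com:8485;jn2.abcd.com:8485;old-host.abcd.com:8485/mycluster, it would become qjournal://jn1.abcd.com:8485;jn2.abcd.com:8485;new-host.abcd.com:8485/mycluster (the jn1/jn2 hostnames and the mycluster nameservice ID here are only illustrative).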
7. Start the newly installed journal node
or via curl
# Start JournalNode on new host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Start JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_TO/host_components/JOURNALNODE
8. Start the rest of the journal nodes
9. Start the active NameNode to refresh the configuration and then stop it.
10. On the previous active node, run
hdfs namenode -initializeSharedEdits -force
11. Restart the active NameNode
12. On the standby NameNode run
"hdfs namenode -bootstrapStandby"
(it will prompt you to "re-format filesystem in storage directory /hadoop/hdfs/namenode? (Y or N)"; answer Y)
13. Start DataNodes
14. Start standby NameNode
15. Remove the old journal node with the following API calls:
# Stop JournalNode on old host
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Stop JournalNode","query":"HostRoles/component_name.in('JOURNALNODE')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_FROM/host_components/JOURNALNODE
# Remove the old component
curl -u $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By:ambari" -i -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$MOVE_FROM/host_components/JOURNALNODE
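If you want to confirm the old component is gone (an optional check using the same variables as above), you can query Ambari for the remaining JOURNALNODE components:
curl -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/host_components?HostRoles/component_name=JOURNALNODE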
16. Start all services (ENSURE THIS IS DONE AND ZOOKEEPER IS UP AND RUNNING OR NEITHER NAMENODE WILL BE ACTIVE)
(Reference support note also)
02-04-2016
08:05 PM
1 Kudo
This document is an informal guide to setting
up a test cluster on Amazon AWS, specifically the EC2 service. This is
not a best practice guide nor is it suitable for a full PoC or
production install of HDP. Please refer to the Hortonworks documentation online to get a complete set of documentation.
Create Instances
Created the following RHEL 6.3 64-bit instances:
m1.medium ambarimaster
m1.large hdpmaster1
m1.large hdpmaster2
m1.medium hdpslave1
m1.medium hdpslave2
m1.medium hdpslave3
Note: when instantiating instances, I increased the root partition to 100Gb on each of them. For long term use, you may want to create separate volumes for each of the datanodes to store larger amounts of data. Typical raw storage is 12-24Tb per slave node.
Note: I edit the Name column in the EC2 Instances screen to the names mentioned above so I know which box I'm dealing with.
Configure Security Groups
Used the following security group rules:
Protocol | Port (Service) | Source
---|---|---
ICMP | ALL | sg-79c54511 (hadoop)
TCP | 0 – 65535 | sg-79c54511 (hadoop)
TCP | 22 (SSH) | 0.0.0.0/0
TCP | 80 (HTTP) | 0.0.0.0/0
TCP | 7180 | 0.0.0.0/0
TCP | 8080 (HTTP*) | 0.0.0.0/0
TCP | 50000 – 50100 | 0.0.0.0/0
UDP | 0 – 65535 | sg-79c54511 (hadoop)
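These rules were configured in the EC2 console; if you prefer the AWS CLI, an equivalent rule would look roughly like this (a sketch only, not part of the original walkthrough, reusing the group id and Ambari port from the table above):
#Open the Ambari web port to the world on the hadoop security group; adjust group/port/CIDR to your own setup
aws ec2 authorize-security-group-ingress --group-id sg-79c54511 --protocol tcp --port 8080 --cidr 0.0.0.0/0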
Configure Nodes
On each and every node (using root):
vim /etc/sysconfig/selinux (set SELINUX=disabled)
vim /etc/sysconfig/network (set HOSTNAME=<chosen_name>.hdp.hadoop where <chosen_name> is one of the following: ambarimaster, hdpmaster1, hdpmaster2, hdpslave1, hdpslave2, hdpslave3 – depending on what EC2 instance you are on)
chkconfig iptables off
chkconfig ip6tables off
shutdown -r now #(only after the commands above are completed)
Note: when I do a restart of the node in this manner, my external EC2 names did NOT change. They will change if you actually halt the instance. This is a separate concern from the internal IP addresses, which we will get to further on in these instructions.
Note: SSH on the RHEL instances has a timeout. If your session hangs, just give it a few seconds and you will get a "Write failed: Broken pipe" message; just reconnect to the box and everything will be fine. Change the SSH timeout if you desire.
Key Exchange
Logged onto the ambarimaster ONLY:
ssh-keygen -t rsa
On your local box (assuming a Linux/Mac laptop/workstation; if not, use Cygwin, WinSCP, FileZilla, etc. to accomplish the equivalent secure copy):
scp -i amuisekey.pem root@ec2-54-234-94-128.compute-1.amazonaws.com:/root/.ssh/id_rsa.pub ./
scp -i amuisekey.pem root@ec2-54-234-94-128.compute-1.amazonaws.com:/root/.ssh/id_rsa ./
Once you have your public and private key on your local box, you can distribute the public key to each node. Do this for every host except for the ambarimaster:
scp -i amuisekey.pem ./id_rsa.pub root@ec2-174-129-186-149.compute-1.amazonaws.com:/root/.ssh/
Log on to each host and copy the public key for ambarimaster into each server's authorized_keys file:
cat id_rsa.pub >> authorized_keys
To confirm that passwordless ssh is working:
Pick a host other than ambarimaster, determine the internal IP, and keep it handy: ifconfig -a
Log on to your ambarimaster and test passwordless ssh using the IP of the host you just looked up: ssh root@10.110.35.23
Confirm that you did actually land on the right host by checking the name: hostname
Make sure you exit out of your remote session to your child node from the ambarimaster or things could get confusing very fast.
Setup Hosts
Log on to the ambarimaster and edit the hosts:
On each host, check the internal IP with: ifconfig -a
Edit the hosts file on your ambarimaster: vim /etc/hosts
Edit the hosts file to look like the one below, taking into account your own IP addresses for each host:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.110.35.23 hdpmaster1.hdp.hadoop hdpmaster1
10.191.45.41 hdpmaster2.hdp.hadoop hdpmaster2
10.151.94.30 hdpslave1.hdp.hadoop hdpslave1
10.151.87.239 hdpslave2.hdp.hadoop hdpslave2
10.70.78.233 hdpslave3.hdp.hadoop hdpslave3
10.151.22.30 ambarimaster.hdp.hadoop ambarimaster
Finally, copy the /etc/hosts file from the ambarimaster to every other node:
scp /etc/hosts root@hdpmaster1:/etc/hosts
scp /etc/hosts root@hdpmaster2:/etc/hosts
scp /etc/hosts root@hdpslave1:/etc/hosts
scp /etc/hosts root@hdpslave2:/etc/hosts
scp /etc/hosts root@hdpslave3:/etc/hosts
Note: /etc/hosts is the file you will need to change if you shut down your EC2 instance and get a new internal IP. When you update this file you must make sure that all nodes have the same copy.
YUM Install
On the ambarimaster only, install the HDP yum repository:
cd
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo
cp ambari.repo /etc/yum.repos.d
yum install epel-release
yum repolist
Install and initialize the Ambari server:
yum install ambari-server
ambari-server setup
ambari-server start
Now you can log on to Ambari. Make a note of the external hostname of your ambarimaster EC2 instance in the AWS console and go to http://<ambarimaster external hostname>:8080 using your local host's favorite web browser. Log on to Ambari with admin/admin.
Using Ambari to Install
Going through the Ambari cluster install process:
Name your cluster whatever you want.
Install Options::Target Hosts – on each line enter the fully qualified hostnames as below (do not add the ambarimaster to the list):
hdpmaster1.hdp.hadoop
hdpmaster2.hdp.hadoop
hdpslave1.hdp.hadoop
hdpslave2.hdp.hadoop
hdpslave3.hdp.hadoop
Install Options::Host Registration Information – Find the id_rsa
(private key) file you downloaded from ambarimaster when you were
setting up. Click on Choose File and select this file.
Install Options::Advanced Options – leave these as default. Click Register and Confirm.
Confirm Hosts – Wait for the Ambari agents to be installed and registered on each of your nodes and click Next when all have been marked success. Note that you can always add nodes at a later time, but make sure you have your two masters and at least 1 slave.
Choose Services – By default all services are selected. Note that you cannot go back and reinstall services later in this version of Ambari, so choose what you want now.
Assign Masters – Likely the default is fine, but see below for a good setup. Note that one of the slaves will need to be a ZooKeeper instance to have an odd number for quorum.
hdpmaster1: NameNode, NagiosServer, GangliaCollector, HBaseMaster, ZooKeeper
hdpmaster2: SNameNode, JobTracker, HiveServer2, HiveMetastore, WebHCatServer, OozieServer, ZooKeeper
hdpslave1: ZooKeeper
Assign Slaves and Clients – For a demo cluster it is fine to have all of the boxes run datanode, tasktracker, regionserver, and client libraries. If you want to expand this cluster with many more slave nodes then I would suggest only running the datanode, tasktracker, and regionserver roles on the hdpslave nodes. The clients can be installed where you like, but be sure at least one or two boxes have a client role. Click Next after you are done.
Customize Services – You will note that two services have red markers next to their name: Hive/HCat and Nagios. Select Hive/HCat and choose your password for the hive user on the MySQL database (this stores metadata only); remember the password. Select Nagios and choose your admin password. Set your Hadoop admin email to your email (or the email of someone you don't like very much) and you can experience Hadoop alerts from your cluster! Wow.
Review – Take note of the Print command in the top corner. I usually save this to a PDF. Then click Deploy. Get a coffee.
Note: you may need to refresh the web page if the installer appears stuck (this happens very occasionally depending on various browser/network situations). Verify 100% installed and click Next.
Summary – You should see that all of the Master services were installed successfully and none should have failed. Click Complete.
At this point the Ambari installation and the HDP cluster are complete, so you should see the Ambari Dashboard.
You can leave your cluster running as long as you want but be warned
that the instances and volumes will cost you on AWS. To ensure that you
will not be charged you can terminate (not just stop) your instances and
delete your volumes in AWS. I encourage you to keep them for a week
or so as you decide how to set up your actual Hadoop PoC cluster (be it
on actual hardware, Virtual Machines, or another cloud solution). The
instances you created will be handy for reference as you install your
next cluster and generally are low cost. Consult AWS documentation for
details on management and pricing. Please look into Rackspace as well.
Relevant Links
HDP Documentation:
http://docs.hortonworks.com/
AWS Instructions for using Putty with Linux EC2 instances:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html
AWS Discussion of Static IPs:
https://forums.aws.amazon.com/thread.jspa?threadID=71177
Adding a new drive to RHEL 6 (relevant for adding volumes for Datanode storage):
http://www.techotopia.com/index.php/Adding_a_New_Disk_Drive_to_an_RHEL_6_System
Toronto Hadoop User Group:
http://www.meetup.com/TorontoHUG/
Some more details on this process with Amazon EC2 and Cygwin:
http://pthakkar.com/2013/03/installing-hadoop-apache-ambari-amazon-ec2/
02-04-2016
07:58 PM
6 Kudos
ISSUE: Choosing the appropriate Linux file system for HDFS deployment
SOLUTION: The Hadoop Distributed File System is platform independent and can function on top of any underlying file system and operating system. Linux offers a variety of file system choices, each with caveats that have an impact on HDFS.
As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the 'noatime' option (which disables access-time updates); this speeds up reads for files.
There are three popular Linux file system options to choose from: ext3, ext4, and XFS.
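For example, a data-disk entry in /etc/fstab might look like this (the device and mount point here are illustrative; adjust to your own layout):
#Hypothetical Hadoop data disk formatted as ext4 and mounted with noatime
/dev/sdb1   /grid/0/hadoop/hdfs/data   ext4   defaults,noatime   0 0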
Yahoo uses the ext3 file system for its Hadoop deployments. ext3 is
also the default filesystem choice for many popular Linux OS flavours.
Since HDFS on ext3 has been publicly tested on Yahoo’s cluster it makes
for a safe choice for the underlying file system. ext4 is the successor to ext3. ext4 has better performance with large
files. ext4 also introduced delayed allocation of data, which adds a
bit more risk with unplanned server outages while decreasing
fragmentation and improving performance. XFS offers better disk space utilization than ext3 and has much
quicker disk formatting times than ext3. This means that it is quicker
to get started with a data node using XFS. Most often performance of a Hadoop cluster will not be constrained by
disk speed – I/O and RAM limitations will be more important. ext3 has
been extensively tested with Hadoop and is currently the stable option
to go with. ext4 and xfs can be considered as well and they give some
performance benefits. References:
http://wiki.apache.org/hadoop/DiskSetup
http://hadoop-common.472056.n3.nabble.com/Hadoop-performance-xfs-and-ext4-td742325.html
http://www.quora.com/What-are-the-advantages-and-disadvantages-of-the-filesystems-ext2-ext3-ext4-ReiserFS-and-XFS