Created on 01-13-2016 03:52 PM
Overview
This document gives an overview of HDP Installation on Isilon.
The PDF version of the article with images: installation-guide-emc-isilon-hdp-23.pdf
When installing Hadoop with Isilon, the key difference is that each Isilon node contains a Hadoop-compatible NameNode and DataNode. The compute and the storage live on separate sets of nodes, unlike a common Hadoop architecture.
HDP connects to the EMC Isilon cluster using EMC SmartConnect. SmartConnect allows for transparent failover in case of an Isilon node failure: transactions seamlessly fail over to the remaining nodes. Isilon offers various storage architectures which allow fast recovery of failed nodes, along with file-level data protection settings.
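For illustration, you can watch SmartConnect balance connections by resolving the SmartConnect zone name repeatedly from any client (the hostname below is a hypothetical example, not from this setup):
[root@workstation ~]$ nslookup mycluster1-hdfs.lab.example.com
[root@workstation ~]$ nslookup mycluster1-hdfs.lab.example.com
Depending on the connection policy, successive lookups should return different Isilon node IP addresses from the pool assigned to the zone.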
EMC Isilon's release process has two stages:
1. EMC releases its support for Hortonworks HDP, stating that it is compliance compatible.
2. A few months later, the complete in-depth certification with the Hortonworks certification suite follows.
The releases below are compliance compatible.
Isilon Release | Additional Isilon Patch | Ambari Support | HDP Support
7.2.0.3        | Patch 159065            | 2.1.0*         | HDP 2.3
7.2.1.1        | none                    | 2.1.1          | HDP 2.3
Ranger – All Ranger plugins except the HDFS plugin work. This has been tested internally in HWX labs, and testing continues. The HDFS plugin will be addressed in future releases. Please contact the Isilon product team for more details.
Note – The WebHDFS port for Isilon is 8082, not 50070.
Please read the steps carefully, as there are special deviations that are applied to attach the Isilon nodes to the HDP compute cluster.
The main steps follow below.
Some key Isilon concepts:
Access Zones – An access zone is a virtual partition of the Isilon cluster, with its own authentication providers, HDFS root directory, and IP address pool. Access zones allow multiple Hadoop (or other) clusters to share a single Isilon cluster in isolation.
Prepare the Isilon zone
The following steps are needed:
Create a Zone
hwxisi1-1# isi zone zones list
hwxisi1-1# mkdir -p /ifs/isitest/zonehdp
hwxisi1-1# isi zone zones create --name zonehdp --path /ifs/isitest/zonehdp
Attach a pool of IP addresses to the zone
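A hedged sketch of this step for OneFS 7.2.x (the subnet and pool names below are illustrative; the networking CLI changed in OneFS 8.x, so verify the exact syntax against your CLI guide):
hwxisi1-1# isi networks modify pool --name subnet0:pool0 --access-zone zonehdp
This associates the SmartConnect IP pool subnet0:pool0 with the zonehdp access zone, so HDFS clients connecting to those addresses land in the correct zone.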
Assign a working directory to the zone
hwxisi1-1# mkdir -p /ifs/isitest/zonehdp/hadoop
hwxisi1-1# isi zone zones modify zonehdp --hdfs-root-directory /ifs/isitest/zonehdp/hadoop
hwxisi1-1# touch /ifs/isitest/zonehdp/hadoop/THIS_IS_ISILON_isitest_zonehdp
hwxisi1-1# isi hdfs settings view
  Default Block Size: 128M
Default Checksum Type: none
     Server Log Level: notice
       Server Threads: 256
hwxisi1-1# isi hdfs settings modify --server-threads 256
Note: the latest EMC recommendation is to leave the server threads setting at its automatic default rather than specifying a thread count. Verify using isi hdfs settings view.
hwxisi1-1# isi hdfs settings modify --default-block-size 128M
Create the users and directories
The scripts can be downloaded from Claudio Fahey's GitHub repository below. EMC Engineering officially supports these tools.
https://github.com/claudiofahey/isilon-hadoop-tools/tree/master/onefs
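One way to fetch the tools (the download commands run on a workstation with internet access; the target path matches the script invocations below):
[root@workstation ~]$ curl -L -o isilon-hadoop-tools.tar.gz \
    https://github.com/claudiofahey/isilon-hadoop-tools/archive/master.tar.gz
[root@workstation ~]$ tar -xzf isilon-hadoop-tools.tar.gz
Then copy the extracted isilon-hadoop-tools-master directory to /ifs/isitest/scripts/isilon-hadoop-tools on the Isilon cluster (for example over an NFS or SMB mount of /ifs), so the onefs/ scripts are available at the path used below.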
hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_users.sh --dist hwx --startgid 501 --startuid 501 --zone zonehdp
hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_directories.sh --dist hwx --fixperm --zone zonehdp
Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown (change ownership of) all files.
hwxisi1-1# isi zone zones modify --user-mapping-rules="hdfs=>root" --zone zonehdp

Permissions to root directory
Get the zone ID from the following command:
isi zone zones view zonehdp
Replace the zone ID in the following command and execute it:
isi_run -z <zoneid> "chown -R hdfs /ifs/isitest/zonehdp/hadoop"
The command below restarts the HDFS service on Isilon to ensure that any cached user-mapping rules are flushed. This will temporarily interrupt any HDFS connections coming from other Hadoop clusters.
hwxisi1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable
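To confirm that HDFS is now being served correctly from the zone, you can issue a WebHDFS request against the SmartConnect address (the hostname below is illustrative); recall from the note above that Isilon serves WebHDFS on port 8082:
[root@workstation ~]$ curl -i "http://mycluster1-hdfs.lab.example.com:8082/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
A JSON FileStatuses listing of the HDFS root directory indicates the zone is serving HDFS.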
Now you have completed the steps on Isilon. We will move on to installing Hortonworks HDP 2.3 on the compute nodes. The installation will be performed using Apache Ambari.
Install Ambari Server
Ambari Server makes installation, configuration, management, and monitoring of a Hadoop cluster simpler. Each Isilon access zone has an Ambari Agent running on the Isilon cluster.
Ambari Server will be used to deploy HDP 2.3 and set up the Hadoop cluster. Please follow the Hortonworks installation document to ensure the environment prerequisites are met.
The steps below are for a CentOS 6 environment. Follow the steps from the Ambari installation guide.
1. Complete the environment pre-requisites mentioned in the install guide.
2. Install the Ambari Server packages.
[root@hadoopmanager-server-0 ~]# wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
[root@hadoopmanager-server-0 ~]# yum install ambari-server
3. Setup Ambari Server.
[root@hadoopmanager-server-0 ~]# ambari-server setup
4. Accept all defaults and complete the setup process.
5. Start the server.
[root@hadoopmanager-server-0 ~]# ambari-server start
6. Browse to http://<ambari-host>:8080/.
7. Log in using the following account:
Username: admin
Password: admin
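As a quick sanity check, the Ambari REST API should respond once the server is up (admin/admin are the default credentials from above; at this point the clusters list will simply be empty):
[root@hadoopmanager-server-0 ~]# curl -u admin:admin http://localhost:8080/api/v1/clusters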
Deploy a Hortonworks Hadoop Cluster with Isilon for HDFS
You will deploy Hortonworks HDP Hadoop using the standard process defined by Hortonworks. Ambari Server allows for the immediate use of an Isilon cluster for all HDFS services (NameNode and DataNode); no reconfiguration will be necessary once the HDP install is completed.
1. Configure the Ambari Agent on Isilon.
isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-namenode \
    <SmartConnect IP or an IP from the IP pool>
isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-server \
    <hostname or IP of the Ambari server>
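You can verify that both settings took effect by viewing the zone again (the HDFS Ambari fields should show the addresses you just configured):
isiloncluster1-1# isi zone zones view zonehdp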
2. Log in to Ambari Server.
3. Welcome: Specify the name of your cluster, for example mycluster1.
4. Select Stack: Select the HDP 2.3 stack.
The Ambari Agent is already installed with Isilon OneFS. There are two ways to perform the following step: if you instead install the Ambari Agent on the compute nodes manually, you do not need to go back and register the Isilon host separately. In the steps below you install the agent using the Ambari UI wizard, which is why you go back to register the Isilon agent.
Note
You will register your hosts with Ambari in two steps. First you will deploy the Ambari agent to your Linux hosts that will run HDP. Then you will go back one step and add Isilon.
1. In the Target Hosts text box, specify the Linux hosts (compute nodes) that will run the HDP master and slave components for your cluster installation.
Provide the SSH private key.
Click the Next button to deploy the Ambari Agent to your Linux hosts and register them.
2. Once the Ambari Agent has been deployed and registered on your Linux hosts, click the Back button.
Now you will add the SmartConnect address of the Isilon cluster (mycluster1-hdfs.lab.example.com) to your list of target hosts.
Check the box to "Perform manual registration on hosts and do not use SSH."
Click the Next button. You should see that Ambari agents on all hosts, including your Linux hosts and Isilon, become registered.
If SmartConnect is not available, pick one IP address from the IP address pool.
5. Choose Services:
Select all the services.
6. Assign Masters:
Assign the NameNode and SNameNode (Secondary NameNode) components to the Isilon SmartConnect address.
ZooKeeper should be installed on mycluster1-master-0 and any two workers.
All other master components can be assigned to the master node or the compute nodes.
7. Assign Slaves and Clients:
Assign the DataNode role to the Isilon SmartConnect address.
Assign the rest to the compute nodes.
8. Customize Services:
In YARN, set yarn.timeline-service.store-class to org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore (a verification sketch follows this list).
9. Review: Carefully review your configuration and then click Deploy.
10. After a successful installation, Ambari will start and test all of the selected services. Sometimes this fails the first time around, and you may need to retry a couple of times. Review the Install, Start and Test page for any warnings or errors. It is recommended to correct any warnings or errors before continuing.
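As referenced in step 8, a hedged way to confirm from a compute node that the timeline store setting was applied (assuming the standard HDP client configuration path):
[root@mycluster1-master-0 ~]# grep -A 1 "yarn.timeline-service.store-class" /etc/hadoop/conf/yarn-site.xml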
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The procedure below can be used to add a user named hduser1.
Warning
The steps below will create local user and group accounts on your Isilon cluster. If you are using a directory service such as Active Directory, and you want these users and groups to be defined in your directory service, then DO NOT run these steps. Instead, refer to the OneFS documentation and EMC Isilon Best Practices for Hadoop Data Storage.
1. Add the user to Isilon.
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 \
    --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 \
    --zone zone1 --provider local \
    --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add the user to the Hadoop nodes. Usually, this only needs to be performed on the master-0 node.
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user's home directory on HDFS. In the commands below, you sudo as the hdfs user and then execute the hdfs command.
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1
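To confirm the home directory was created with the expected owner:
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /user
The /user/hduser1 entry should be owned by hduser1:hduser1 with drwxr-xr-x permissions.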
Validation
Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically when you install your cluster with Ambari. To execute them after installation, select the service in Ambari, click the Service Actions button, and select Run Service Check.
Functional Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the order shown.
You must create the Hadoop user hduser1 before proceeding.
HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root   hadoop          0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase  hbase         148 2014-08-05 06:06 /hbase
drwxrwxr-x   - solr   solr            0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs   supergroup    107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs   supergroup    184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root   hadoop          0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase  hbase         148 2014-08-05 06:28 /hbase
drwxrwxr-x   - solr   solr            0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs   supergroup    107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs   supergroup    209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls ...
YARN / MapReduce
[hduser1@mycluster1-master-0 ~]$ hadoop jar \
    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar \
    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r--   1 hduser1 hduser1    0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r--   1 hduser1 hduser1   24 2014-08-05 06:44 out/part-r-00000
-rw-r--r--   1 hduser1 hduser1    0 2014-08-05 06:44 out/part-r-00001
-rw-r--r--   1 hduser1 hduser1    0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost   1
127.0.0.1   1
Browse to the YARN Resource Manager GUI http://mycluster1-master-0.lab.example.com:8088/.
Browse to the MapReduce History Server GUI http://mycluster1-master-0.lab.example.com:19888/. In particular, confirm that you can view the complete logs for task attempts.
Hive
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive> DROP TABLE IF EXISTS tab1;
hive> CREATE EXTERNAL TABLE tab1 (
          id INT,
          col_1 BOOLEAN,
          col_2 DOUBLE,
          col_3 TIMESTAMP
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/hduser1/sample_data/tab1';
hive> DROP TABLE IF EXISTS tab2;
hive> CREATE TABLE tab2 (
          id INT,
          col_1 BOOLEAN,
          col_2 DOUBLE,
          month INT,
          day INT
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> INSERT OVERWRITE TABLE tab2
      SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
      FROM tab1 WHERE YEAR(col_3) = 2012;
OK
Time taken: 28.256 seconds
hive> show tables;
OK
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1   true    123.123     2012-10-24 08:55:00
2   false   1243.5      2012-10-25 13:40:00
3   false   24453.325   2008-08-22 09:33:21.123
4   false   243423.325  2007-05-12 22:32:21.33454
5   true    243.325     1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1   true    123.123   10   24
2   false   1243.5    10   25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1   true    123.123   2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1   true    123.123   10   24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;
Pig
[hduser1@mycluster1-master-0 ~]$ pig
grunt> a = load 'in';
grunt> dump a;
...
Success!
...
grunt> quit;
HBase
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW                  COLUMN+CELL
 row1                column=cf:a, timestamp=1407542488028, value=value1
 row2                column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN               CELL
 cf:a                timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit
Use Case - Searching Wikipedia
One of the many unique features of Isilon is its multi-protocol support. This allows you, for instance, to write a file using SMB (Windows) or NFS (Linux/Unix) and then read it using HDFS to perform Hadoop analytics on it.
In this section, we exercise this capability by downloading the entire Wikipedia database (excluding media) to Isilon using your favorite browser. As soon as the download completes, we'll run a Hadoop grep to search the entire text of Wikipedia using our Hadoop cluster. As this search doesn't rely on a word index, your regular expression can be as complicated as you like.
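As a minimal sketch of the multi-protocol idea (paths assume the zone layout from earlier and the NFS mount created in the steps below), you can write a file over NFS and immediately read the same file back over HDFS:
[root@workstation ~]$ cp /etc/motd /mnt/isiloncluster1/isiloncluster1/zone1/hadoop/tmp/
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -cat /tmp/motd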
1. First, let's connect your client (with your favorite web browser) to your Isilon cluster.
1. If you are using a Windows host or other SMB client:
1. Click Start -> Run.
2. Enter: \\<Isilon Host>\ifs
3. You may authenticate as root with your Isilon root password.
4. Browse to \ifs\isiloncluster1\zone1\hadoop\tmp.
5. Create a directory here called wikidata. This is where you will download the Wikipedia data to.
2. If you are using a Linux host or other NFS client:
1. Mount your NFS export.
[root@workstation ~]$ mkdir /mnt/isiloncluster1
[root@workstation ~]$ echo \
    subnet0-pool0.isiloncluster1.lab.example.com:/ifs \
    /mnt/isiloncluster1 nfs \
    nolock,nfsvers=3,tcp,rw,hard,intr,timeo=600,retrans=2,rsize=131072,wsize=524288 \
    >> /etc/fstab
[root@workstation ~]$ mount -a
[root@workstation ~]$ mkdir -p \
    /mnt/isiloncluster1/isiloncluster1/zone1/hadoop/tmp/wikidata
2. On a Mac, see Apple's guide on how to create an NFS mount:
http://support.apple.com/kb/TA22243
3. Open your favorite web browser and go to http://dumps.wikimedia.org/enwiki/latest.
4. Locate the file enwiki-latest-pages-articles.xml.bz2 and download it directly to the wikidata folder on Isilon. Your web browser will be writing this file to the Isilon file system using SMB or NFS.
Note
This file is approximately 10 GB in size and contains the entire text of the English version of Wikipedia. If this is too large, you may want to download one of the smaller files such as enwiki-latest-all-titles.gz.
5. Now let's run the Hadoop grep job. We'll search for all two-word phrases that begin with EMC.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls /tmp/wikidata
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r /tmp/wikigrep
[hduser1@mycluster1-master-0 ~]$ hadoop jar \
    /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    grep /tmp/wikidata /tmp/wikigrep "EMC [^ ]*"
6. When the job completes, use your favorite text file viewer to view the output file /tmp/wikigrep/part-r-00000. You may open the file in a text editor from the NFS mount.
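For a quick look at the first few matches from the cluster side (illustrative):
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat /tmp/wikigrep/part-r-00000 | head -n 20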
Created on 02-11-2016 12:38 PM
@Shivaji This is wonderful. Thanks!!
Created on 02-29-2016 03:02 PM
Additional white papers from EMC:
http://www.criticism.com/white-papers/white-papers.php
Latest EMC Best Practices January 2015 Version:
https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-best-practices-onefs72.pdf
Created on 04-11-2016 06:48 PM
The best practice to download Isilon patches is to go to support.emc.com and search for the patch number 159065.
Created on 06-08-2016 09:15 PM
@Ancil McBarnett is there an upgrade? HDP 2.4 was just mentioned as supported by EMC.
Created on 06-10-2016 11:48 PM
@Timothy Spann, I think the easiest way to track OneFS support for new versions of Ambari and HDP is to visit the ECN and follow this page, https://community.emc.com/docs/DOC-37101 .
One thing to note is that this guide is written for OneFS 7.2.x. In 8.0.0.0 the command-line structure was changed, so refer to the new CLI guide. Hadoop starts on page 997: http://www.emc.com/collateral/TechnicalDocument/docu65065.pdf
Created on 06-21-2016 09:38 AM
Nice Article and Very useful.
Created on 08-17-2016 08:50 PM
An updated guide was published recently by Isilon, covering OneFS 8.0.0.x. It's titled the EMC Isilon OneFS with Hadoop and Hortonworks Installation Guide: http://www.emc.com/collateral/TechnicalDocument/docu71396.pdf