Member since
09-18-2015
100
Posts
98
Kudos Received
11
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1408 | 03-22-2016 02:05 AM
| 989 | 03-17-2016 06:16 AM
| 1755 | 03-17-2016 06:13 AM
| 1272 | 03-12-2016 04:48 AM
| 4596 | 03-10-2016 08:04 PM
01-14-2016
04:48 AM
1 Kudo
@abhishek shah From the sandbox, run: $ telnet 192.168.137.1 1433 and see if you are able to connect. It looks like you are having a connection error.
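The same reachability check can be scripted; below is a minimal sketch (the host and port are the values from the telnet command above):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds,
    mimicking the telnet reachability test above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Equivalent of: telnet 192.168.137.1 1433
# can_connect("192.168.137.1", 1433)
```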
01-14-2016
03:59 AM
1 Kudo
Your user may not have the privilege. This is what you could do: a) log in to mysql as root, b) change database to forHadoop, c) run your load script. I have a tutorial for Atlas where I create 2 MySQL tables in a newly created DB instance. Take a look: https://github.com/shivajid/atlas/tree/master/tutorial Thanks
01-13-2016
06:07 PM
@bdurai I posted on the Solr community and got the below answer from a committer: It's usually not all that difficult to write a multi-threaded client that uses CloudSolrClient, or even fire up multiple instances of the SolrJ client (assuming they can work on discrete sections of the documents you need to index). That avoids the problem Shawn alludes to, plus other issues. If you do _not_ use CloudSolrClient, then all the docs go to some node in the system (and you really should update in batches, see: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/); the node that receives the packet sub-divides it into groups based on what shard they should be part of and forwards them to the leaders for each shard, very significantly increasing the number of conversations being carried on between Solr nodes. Times the number of threads you're specifying with CUSC (I really regret the renaming from ConcurrentUpdateSolrServer, I liked writing CUSS). With CloudSolrClient, you can scale nearly linearly with the number of shards. Not so with CUSC. FWIW, Erick
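The "update in batches" advice above can be sketched independent of SolrJ. This hypothetical helper (not a Solr API) simply chunks a document stream into fixed-size batches, each of which would then be sent as one update request:

```python
def batches(docs, batch_size=1000):
    """Yield fixed-size lists of documents; the final batch may be smaller."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Each yielded batch maps to a single update request,
# e.g. client.add(batch) in a SolrJ-style client.
```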
01-13-2016
03:52 PM
9 Kudos
Overview
This document gives an overview of HDP installation on Isilon. The PDF version of the article with images: installation-guide-emc-isilon-hdp-23.pdf
Architecture
When installing Hadoop with Isilon, the key difference is that each Isilon node contains a Hadoop-compatible NameNode and DataNode. Compute and storage are on separate sets of nodes, unlike a common Hadoop architecture.
HDP connects to the EMC Isilon cluster using EMC SmartConnect. SmartConnect allows for transparent failover in case of an Isilon node failure; transactions will seamlessly fail over to the remaining nodes. Isilon offers various storage architectures that allow fast recovery of failed nodes, along with file-level data protection settings.
Certification
EMC Isilon has 2 steps in its release process:
Step 1 – Compliance compatible
Step 2 – Certified with the HWX certification suite
EMC will release its support for Hortonworks HDP stating it is compliance compatible. There is a few months' lag to get the complete in-depth certification with the Hortonworks certification suite. The following are compliance compatible:

Isilon Release | Additional Isilon Patch | Ambari Support | HDP Support
---|---|---|---
7.2.0.3 | Patch 159065 | 2.1.0* | HDP 2.3
7.2.1.1 | none | 2.1.1 | HDP 2.3

Ranger – Ranger plugins, except the HDFS plugin, work. This has been tested internally in HWX labs. The HDFS plugin will be supported in future releases; please contact the Isilon product team for more details.
Note – The WebHDFS port for Isilon is 8082, not 50070.
High Level Steps
Please read the steps carefully, as there are special deviations applied to attach the Isilon nodes to the HDP compute cluster.
There are three main steps:
1. Isilon zones and IP pool configuration
2. Ambari install and configuration
3. Post-Ambari-install validation
Some Key Isilon Concepts
Isilon is a cluster of hardware nodes; each node has its own CPU, memory and storage. Each node has 2 network interfaces – backend and frontend.
The backend is connected over an InfiniBand network interface
The front end is a 10 Gb Ethernet interface
The Isilon cluster runs OneFS, which is based on FreeBSD (Unix)
Isilon provides data protection via mirroring or Reed-Solomon FEC
Isilon has its own data management console
Access Zones
Access zones provide a method to logically partition cluster access and allocate resources to self-contained units, thereby providing a shared tenant environment. In other words, they allow Isilon OneFS to segment the cluster configuration and separate the data into multiple self-contained units with their own sets of authentication providers, user mapping rules, and SMB shares.
A Hadoop/HDP cluster will connect to a single Isilon zone. This is a one-to-one mapping.
This is part of Isilon administration; please work with your Isilon administrator to create the needed Isilon zone. Useful video on zones: https://www.youtube.com/watch?v=hF3W8o-n-Oo
By default, OneFS includes a single access zone called System. You should not use the System zone for your cluster creation.
Prepare the Isilon Zone
The following steps are needed:
1. Create an Isilon zone
2. Attach a pool of IP addresses to the zone
3. Assign a working directory to the zone
4. Create the needed users
Create a Zone
Decide on a zone name, and ensure that the zone you want to create does not already exist. For the purposes of this example we will call the zone "zonehdp". You can name it to your organization's liking; replace it with the name you want to assign.
hwxisi1-1# isi zone zones list
/ifs is the default share across the nodes. Create a new directory for your zone under a directory "isitest" (isitest is just another hierarchy level used for documentation purposes).
hwxisi1-1# mkdir -p /ifs/isitest/zonehdp
Create the zone:
hwxisi1-1# isi zone zones create --name zonehdp --path /ifs/isitest/zonehdp
Attach a Pool of IP Addresses to the Zone
Associate an IP address pool with the zone. In this step you are creating the pool; get the details from your Isilon admin, and replace the pool name, IP address range and zone name with appropriate values.
Assign a Working Directory to the Zone
Create the HDFS root directory. This is usually called hadoop and must be within the access zone directory.
Set the HDFS root directory for the access zone, and create an indicator file so that you can easily determine when you are looking at your Isilon cluster via HDFS.
hwxisi1-1# mkdir -p /ifs/isitest/zonehdp/hadoop
hwxisi1-1# isi zone zones modify zonehdp --hdfs-root-directory /ifs/isitest/zonehdp/hadoop
hwxisi1-1# touch /ifs/isitest/zonehdp/hadoop/THIS_IS_ISILON_isitest_zonehdp
Check the HDFS thread settings and block size. If they are not set, set them using the Isilon documentation in the appendix. This is a one-time activity.
hwxisi1-1# isi hdfs settings view
Default Block Size: 128M
Default Checksum Type: none
Server Log Level: notice
Server Threads: 256
hwxisi1-1# isi hdfs settings modify --server-threads 256
*** The latest EMC recommendation is to leave this at auto by default and not specify the number of threads.
*** Verify using isi hdfs settings view
hwxisi1-1# isi hdfs settings modify --default-block-size 128M
Create the Users and Directories
The scripts can be downloaded from Claudio's GitHub (EMC Engineering officially supports this): https://github.com/claudiofahey/isilon-hadoop-tools/tree/master/onefs
Extract the Isilon Hadoop Tools to your Isilon cluster. They can be placed in any directory under /ifs; it is recommended to use /ifs/isitest/scripts.
Execute the scripts:
hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_users.sh --dist hwx --startgid 501 --startuid 501 --zone zonehdp
hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_directories.sh --dist hwx --fixperm --zone zonehdp
Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown (change ownership of) all files.
hwxisi1-1# isi zone zones modify --user-mapping-rules="hdfs=>root" --zone zonehdp
Permissions on the Root Directory
Get the zone ID from:
isi zone zones view zonehdp
Replace the zone ID in the following command and execute it:
isi_run -z <zoneid> "chown -R hdfs /ifs/isitest/zonehdp/hadoop"
Restart Services
The command below will restart the HDFS service on Isilon to ensure that any cached user mapping rules are flushed. This will temporarily interrupt any HDFS connections coming from other Hadoop clusters.
hwxisi1-1# isi services isi_hdfs_d disable; isi services isi_hdfs_d enable
Now you have completed the steps on Isilon. We will now move to installing Hortonworks HDP 2.3 on the compute nodes. The installation will be performed using Apache Ambari.
Install Ambari Server
Ambari Server makes installation, configuration, management and monitoring of a Hadoop cluster simpler. Isilon OneFS ships with an Ambari Agent running on the Isilon cluster.
Ambari Server will be used to deploy HDP 2.3 and set up the Hadoop cluster. Please follow the Hortonworks installation document to ensure the prerequisites for your environment are met:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_Installing_HDP_AMB/content/_download_the_ambari_repo.html
The steps below are for a CentOS 6 environment. Follow the steps from the Ambari installation guide.
1. Complete the environment prerequisites mentioned in the install guide.
2. Install the Ambari Server packages.
[root@hadoopmanager-server-0 ~]# wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
[root@hadoopmanager-server-0 ~]# yum install ambari-server
3. Set up Ambari Server.
[root@hadoopmanager-server-0 ~]# ambari-server setup
4. Accept all defaults and complete the setup process.
5. Start the server.
[root@hadoopmanager-server-0 ~]# ambari-server start
6. Browse to http://<ambari-host>:8080/.
7. Login using the following account:
Username: admin
Password: admin
Deploy a Hortonworks Hadoop Cluster with Isilon for HDFS
You will deploy Hortonworks HDP using the standard process defined by Hortonworks. Ambari Server allows for the immediate use of an Isilon cluster for all HDFS services (NameNode and DataNode); no reconfiguration is necessary once the HDP install is completed.
1. Configure the Ambari Agent on Isilon.
isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-namenode <SmartConnect IP / IP from the IP pool>
isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-server <hostname/IP of the Ambari server>
2. Login to Ambari Server.
3. Welcome: Specify the name of your cluster, e.g. mycluster1.
4. Select Stack: Select the HDP 2.3 stack.
Install Options:
The Ambari Agent is already installed with Isilon OneFS. There are 2 ways of doing the following step: if you install the Ambari Agent on the compute nodes yourself, you do not need to go back and register the Isilon host separately. In the steps below you are installing the agent using the Ambari UI wizard, and that is the reason you go back to register the agent.
Note: You will register your hosts with Ambari in two steps. First you will deploy the Ambari Agent to the Linux hosts that will run HDP. Then you will go back one step and add Isilon.
1. Specify the Linux hosts for the compute nodes that will run HDP master and slave components in the Target Hosts text box. Put in the SSH key, then click the Next button to deploy the Ambari Agent to your Linux hosts and register them.
2. Once the Ambari Agent has been deployed and registered on your Linux hosts, click the Back button. Now you will add the SmartConnect address of the Isilon cluster (mycluster1-hdfs.lab.example.com) to your list of target hosts. Check the box "Perform manual registration on hosts and do not use SSH" and click the Next button. You should see that Ambari agents on all hosts, including your Linux hosts and Isilon, become registered. If SmartConnect is not available, pick one IP address from the IP address pool.
5. Choose Services: Select all the services.
6. Assign Masters: Assign the NameNode and SNameNode components to the Isilon SmartConnect address. ZooKeeper should be installed on mycluster1-master-0 and any two workers. All other master components can be assigned to the master node or the compute nodes.
7. Assign Slaves and Clients: Assign DataNode to the SmartConnect Isilon node and the rest to the compute nodes.
8. Customize Services:
Change the WebHDFS port from 50070 to 8082.
Assign passwords to Hive, Oozie, and any other selected services that require them.
Check that all local data directories are within /data/1, /data/2, etc. The following settings should be checked:
YARN NodeManager log-dirs
YARN NodeManager local-dirs
HBase local directory
ZooKeeper directory
Oozie Data Dir
Storm storm.local.dir
In YARN, set yarn.timeline-service.store-class to org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.
9. Review: Carefully review your configuration and then click Deploy.
10. After a successful installation, Ambari will start and test all of the selected services. Sometimes this may fail the first time around, and you may need to retry a couple of times. Review the Install, Start and Test page for any warnings or errors; it is recommended to correct them before continuing.
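For reference, the timeline store setting above corresponds to this property in yarn-site.xml (a sketch; Ambari normally manages this file for you):

```xml
<property>
  <name>yarn.timeline-service.store-class</name>
  <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
</property>
```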
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The procedure below can be used to add a user named hduser1.
Warning: The steps below will create local user and group accounts on your Isilon cluster. If you are using a directory service such as Active Directory, and you want these users and groups to be defined in your directory service, then DO NOT run these steps. Instead, refer to the OneFS documentation and EMC Isilon Best Practices for Hadoop Data Storage.
1. Add the user to Isilon.
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 --provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add the user to the Hadoop nodes. Usually this only needs to be performed on the master-0 node.
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user's home directory on HDFS. In the commands below you sudo as hdfs and then execute the hdfs command.
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1
Validation
Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically when you install your cluster with Ambari. To execute them after installation, select the service in Ambari, click the Service Actions button, and select Run Service Check.
Functional Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the order shown. You must create the Hadoop user hduser1 before proceeding.
HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop          0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase         148 2014-08-05 06:06 /hbase
drwxrwxr-x   - solr  solr            0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup    107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup    184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop          0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase         148 2014-08-05 06:28 /hbase
drwxrwxr-x   - solr  solr            0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup    107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup    209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls ...
YARN / MapReduce
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r--   1 hduser1 hduser1 24 2014-08-05 06:44 out/part-r-00000
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00001
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost 1
127.0.0.1 1
Browse to the YARN ResourceManager GUI at http://mycluster1-master-0.lab.example.com:8088/.
Browse to the MapReduce History Server GUI at http://mycluster1-master-0.lab.example.com:19888/. In particular, confirm that you can view the complete logs for task attempts.
Hive
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive> DROP TABLE IF EXISTS tab1;
hive> CREATE EXTERNAL TABLE tab1 (id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hduser1/sample_data/tab1';
hive> DROP TABLE IF EXISTS tab2;
hive> CREATE TABLE tab2 (id INT, col_1 BOOLEAN, col_2 DOUBLE, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> INSERT OVERWRITE TABLE tab2 SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3) FROM tab1 WHERE YEAR(col_3) = 2012;
OK
Time taken: 28.256 seconds
hive> show tables;
OK
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1 true 123.123 2012-10-24 08:55:00
2 false 1243.5 2012-10-25 13:40:00
3 false 24453.325 2008-08-22 09:33:21.123
4 false 243423.325 2007-05-12 22:32:21.33454
5 true 243.325 1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1 true 123.123 10 24
2 false 1243.5 10 25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1 true 123.123 2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1 true 123.123 10 24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;
Pig
[hduser1@mycluster1-master-0 ~]$ pig
grunt> a = load 'in';
grunt> dump a;
... Success! ...
grunt> quit;
HBase
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW    COLUMN+CELL
row1   column=cf:a, timestamp=1407542488028, value=value1
row2   column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN  CELL
cf:a    timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit
Use Case - Searching Wikipedia
One of the many unique features of Isilon is its multi-protocol support. This allows you, for instance, to write a file using SMB (Windows) or NFS (Linux/Unix) and then read it using HDFS to perform Hadoop analytics on it.
In this section, we exercise this capability by downloading the entire Wikipedia database (excluding media) to Isilon using your favorite browser. As soon as the download completes, we'll run a Hadoop grep to search the entire text of Wikipedia using our Hadoop cluster. As this search doesn't rely on a word index, your regular expression can be as complicated as you like.
1. First, let's connect your client (with your favorite web browser) to your Isilon cluster.
1. If you are using a Windows host or other SMB client:
1. Click Start -> Run.
2. Enter: \\<Isilon Host>\ifs
3. You may authenticate as root with your Isilon root password.
4. Browse to \ifs\isiloncluster1\zone1\hadoop\tmp.
5. Create a directory here called wikidata. This is where you will download the Wikipedia data to.
2. If you are using a Linux host or other NFS client:
1. Mount your NFS export.
[root@workstation ~]$ mkdir /mnt/isiloncluster1
[root@workstation ~]$ echo subnet0-pool0.isiloncluster1.lab.example.com:/ifs /mnt/isiloncluster1 nfs nolock,nfsvers=3,tcp,rw,hard,intr,timeo=600,retrans=2,rsize=131072,wsize=524288 >> /etc/fstab
[root@workstation ~]$ mount -a
[root@workstation ~]$ mkdir -p /mnt/isiloncluster1/isiloncluster1/zone1/hadoop/tmp/wikidata
2. On a Mac, see how to create an NFS mount: http://support.apple.com/kb/TA22243
3. Open your favorite web browser and go to http://dumps.wikimedia.org/enwiki/latest.
4. Locate the file enwiki-latest-pages-articles.xml.bz2 and download it directly to the wikidata folder on Isilon. Your web browser will be writing this file to the Isilon file system using SMB or NFS.
Note: This file is approximately 10 GB in size and contains the entire text of the English version of Wikipedia. If this is too large, you may want to download one of the smaller files such as enwiki-latest-all-titles.gz.
5. Now let's run the Hadoop grep job. We'll search for all two-word phrases that begin with EMC.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls /tmp/wikidata
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r /tmp/wikigrep
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep /tmp/wikidata /tmp/wikigrep "EMC [^ ]*"
6. When the job completes, use your favorite text file viewer to view the output file /tmp/wikigrep/part-r-00000. You may open the file in a text editor from the NFS mount.
01-13-2016
03:16 PM
2 Kudos
@hrongali
Isilon is designed to handle multi-protocol workloads. The NameNode is installed by default. You would have to assess how much CPU is being used today and what workload you are bringing on. Make sure it is the X410 model of Isilon that they are using; you can use other models, but only for experimental purposes. In the worst case they may need to add more Isilon nodes to the architecture in case there is more workload. Get an Isilon SE involved in the Isilon sizing, as they have an incentive to sell more hardware. A good exercise would be to run some TPC-DS workload with Hive on Tez and see how it behaves. Also make sure that the compute nodes are co-located with the Isilon nodes, as the compute and storage will be pretty chatty.
01-13-2016
03:02 PM
2 Kudos
@Amit Jain - As mentioned by @bsaini, HDFS is not officially part of the current Atlas roadmap. It would be good to raise a JIRA and get votes for it; that is the only way to push this. While this happens, as a community member you can always write your own types. Remember, Atlas has an open API to create your own type system to model anything you want. I have created a small utility based on this, called Atlas CLI: https://github.com/shivajid/atlas/tree/master/codesamples/atlas A good code example that the developers always work with is QuickStart.java: https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStart.java IHTH
01-13-2016
05:19 AM
4 Kudos
Prerequisites:
HDP 2.x
OneFS 7.2
Ambari should report all green for all hosts in the cluster
Forward and reverse hostnames or SmartConnect should be configured
Kerberos Requirements:
All KDCs need to have different realm names
One KDC per zone
Disable AES encryption in the client krb5.conf
Deleting principals from Isilon doesn't remove them from the KDC
Don't use the isi auth krb5 spn fix command
Overview:
Following these steps in the order below will accomplish these tasks:
KDC Setup: install and configure
Hadoop Client Setup: Kerberos configured and tested
Secure Isilon Setup: configure, create principals and set proxyusers
Finish Hadoop Client Setup: create all necessary principals, place keytabs on the correct hosts and start services
Finish Hadoop Client Setup: kerberos_only configuration
KDC Setup:
Configure the KDC: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-1-2.html (below is an overview of the steps in the link provided)
Follow sections 13.1.2 through 13.1.4
Modify kdc.conf for the supported encryption types:
supported_enctypes = RC4-HMAC:normal DES-CBC-MD5:normal DES-CBC-CRC:normal
Modify kdc.conf to have the corrected realm name
Update kadm5.acl to reflect the new realm name
Create the KDC database and start the services:
/usr/sbin/kdb5_util create -s
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
Hadoop Client Setup:
Run this on all Hadoop clients:
yum install krb5-workstation krb5-libs
Modify and copy the krb5.conf from the KDC to all the clients:
Update the realm name, KDC and admin server
Make sure to update the default realm
Update encryption to not include AES:
default_tgs_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
default_tkt_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
permitted_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
preferred_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
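Put together, a client krb5.conf restricted to the enctypes above might look like the following sketch (the realm and KDC hostnames are placeholders; adjust for your environment):

```ini
[libdefaults]
  default_realm = EXAMPLE.COM
  default_tgs_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
  default_tkt_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
  permitted_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }

[domain_realm]
  .example.com = EXAMPLE.COM
  example.com = EXAMPLE.COM
```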
Now you can test using kinit from the clients and it should work:
kinit kadmin/admin
Secure Isilon Setup:
To prevent auto SPN generation in the system zone, set the 'All Auth Providers' setting on the system zone to 'No':
isi zone zones modify --zone=system --all-auth-providers=No
Add the KDC to the Isilon cluster; each KDC needs a unique name:
isi auth krb5 create --realm=EXAMPLE.COM --admin-server=kdc.example.com --kdc=kdc.example.com --user=kadmin/admin --password=isi
To verify the join and list all the auth providers for the cluster:
isi auth status
Modify the zone to use the authentication provider:
isi zone zones modify --zone=zone-example --add-auth-provider=krb5:EXAMPLE.COM
Verify:
isi zone zones view --zone=zone-example
Create the Isilon SPNs for the zone. The format needs to be hdfs/<cluster hostname/SC name>@REALM and HTTP/<cluster hostname/SC name>@REALM:
isi auth krb5 spn create --provider-name=EXAMPLE.COM --spn=hdfs/cluster.example.com@EXAMPLE.COM --user=kadmin/admin --password=isi
isi auth krb5 spn create --provider-name=EXAMPLE.COM --spn=HTTP/cluster.example.com@EXAMPLE.COM --user=kadmin/admin --password=isi
Verify SPN creation:
isi auth krb5 spn list --provider-name=EXAMPLE.COM
Lastly, create proxy users:
isi hdfs proxyusers create oozie --zone=zone-example --add-user=ambari-qa
isi hdfs proxyusers create hive --zone=zone-example --add-user=ambari-qa
Finish Hadoop Client Setup:
Enter the Ambari secure setup wizard: Admin -> Security -> Enable Security. Click through the wizard until you get to the screen that configures the principals.
Note: Isilon does not convert principal names to short names using rules, so don't use aliases (e.g. rm instead of yarn).
Realm name
HDFS -> namenode: hdfs/cluster.example.com@EXAMPLE.COM
HDFS -> secondarynamenode: hdfs/cluster.example.com@EXAMPLE.COM
HDFS -> datanode: hdfs/cluster.example.com@EXAMPLE.COM
Falcon -> namenode: hdfs/cluster.example.com@EXAMPLE.COM
YARN -> resourcemanager: yarn/_HOST
YARN -> nodemanager: yarn/_HOST
MapReduce2 -> history server principal: mapred/_HOST
DFS Web Principal: HTTP/cluster.example.com@EXAMPLE.COM
Now download the CSV and copy it to the Ambari server. On the server, put the file in /var/lib/ambari-server/resources/scripts/. On the Ambari server, go to that path and run the keytabs.sh script:
./keytabs.sh host-principal-keytab-list.csv > generate_keytabs.sh
In the generate_keytabs.sh script that was just generated, comment out all the lines that create principals for hdfs or for the Isilon cluster. This script generates all the principals for the Hadoop services, but the Isilon principals were already created by the Isilon cluster, so there is no need to create them again; doing so will cause the secure cluster to not authenticate properly.
Finally, execute ./generate_keytabs.sh; this will create all the principals for the Hadoop services and export a keytab for every host in the cluster. Copy the keytab tar files to the clients and extract them in the proper location. Finish the wizard install.
Finish Secure Isilon Setup:
After everything has finished installing, configure the Isilon zone to only allow secure connections:
isi zone zones modify --zone=zone-example --hdfs-authentication=kerberos_only
01-13-2016
02:47 AM
Bosco, CloudSolrClient will return an LBHttpSolrClient (which load balances across the nodes), but I do not see that LBHttpSolrClient is multithreaded. So the question remains: which has the higher throughput?
01-13-2016
12:32 AM
1 Kudo
We have a customer that needs to update a few billion documents in SolrCloud. I know the suggested approach is CloudSolrClient, for its load-balancing feature. As per the docs, CloudSolrClient is the SolrJ client class to communicate with SolrCloud: instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections, and then use the LBHttpSolrClient to issue requests. This class assumes the id field for your documents is called 'id'; if this is not the case, you must set the right name with setIdField(String).
As per the docs, ConcurrentUpdateSolrClient buffers all added documents and writes them into open HTTP connections. This class is thread safe. Params from UpdateRequest are converted to HTTP request parameters; when params change between UpdateRequests, a new HTTP request is started. Although any SolrClient request can be made with this implementation, it is only recommended to use ConcurrentUpdateSolrClient with /update requests; the class HttpSolrClient is better suited for the query interface.
Now, since with ConcurrentUpdateSolrClient I am able to use a queue and a pool of threads, it is more attractive to use than CloudSolrClient, which will use an HttpSolrClient once it gets a set of nodes to do the updates. I would love to hear a more in-depth discussion of these 2 APIs. Thanks, Shivaji
Labels:
- Apache Solr