Member since: 09-18-2015

100 Posts | 98 Kudos Received | 11 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 2151 | 03-22-2016 02:05 AM |
| | 1394 | 03-17-2016 06:16 AM |
| | 4977 | 03-17-2016 06:13 AM |
| | 1807 | 03-12-2016 04:48 AM |
| | 5782 | 03-10-2016 08:04 PM |

01-14-2016 04:48 AM (1 Kudo)

@abhishek shah From the sandbox, run:

    $ telnet 192.168.137.1 1433

and see if you are able to connect. It looks like you are having a connection error.
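If you would rather check reachability programmatically from the machine where the job runs, a plain TCP connect does the same thing as the telnet test. This is only a minimal sketch; the host and port are the ones from the command above and should be adjusted for your environment.

    // Minimal TCP reachability check, equivalent to the telnet test above.
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortCheck {
        public static void main(String[] args) {
            String host = "192.168.137.1";
            int port = 1433; // SQL Server default port
            try (Socket socket = new Socket()) {
                // Fail fast if the port is unreachable (5 second timeout).
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println("Connected to " + host + ":" + port);
            } catch (Exception e) {
                System.out.println("Could not connect: " + e.getMessage());
            }
        }
    }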

01-14-2016 03:59 AM (1 Kudo)

Your user may not have the privilege. This is what you could do:

a) Log in to mysql as root
b) Change the database to forHadoop
c) Run your load script

I have a tutorial for Atlas, where I create 2 MySQL tables in a newly created DB instance. Take a look: https://github.com/shivajid/atlas/tree/master/tutorial

Thanks
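If the load still fails with a privilege error, granting the loading user rights on the forHadoop database while connected as root is the usual fix. Below is a minimal, hypothetical JDBC sketch of that grant; the user name 'hadoopuser', the root password, and the connection URL are placeholders, it assumes MySQL Connector/J is on the classpath and that 'hadoopuser' already exists.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class GrantForHadoop {
        public static void main(String[] args) throws Exception {
            // Connect as root (placeholder credentials).
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/", "root", "root-password");
                 Statement stmt = conn.createStatement()) {
                // Hypothetical application user that runs the load script.
                stmt.execute("GRANT ALL PRIVILEGES ON forHadoop.* TO 'hadoopuser'@'localhost'");
                stmt.execute("FLUSH PRIVILEGES");
                System.out.println("Privileges granted on forHadoop");
            }
        }
    }

The same GRANT can of course be run directly from the mysql prompt while logged in as root.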

01-13-2016 06:07 PM

@bdurai I posted on the Solr community and got the answer below from a committer:

"It's usually not all that difficult to write a multi-threaded client that uses CloudSolrClient, or even fire up multiple instances of the SolrJ client (assuming they can work on discrete sections of the documents you need to index). That avoids the problem Shawn alludes to, plus other issues. If you do _not_ use CloudSolrClient, then all the docs go to some node in the system that then sub-divides the list (and you really should update in batches, see: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/). Then the node that receives the packet sub-divides it into groups based on what shard they should be part of and forwards them to the leaders for that shard, very significantly increasing the number of conversations being carried on between Solr nodes. Times the number of threads you're specifying with CUSC (I really regret the renaming from ConcurrentUpdateSolrServer, I liked writing CUSS). With CloudSolrClient, you can scale nearly linearly with the number of shards. Not so with CUSC.

FWIW,
Erick"
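To make Erick's suggestion concrete, here is a minimal sketch of a multi-threaded indexer built on a shared CloudSolrClient, sending batched updates from several threads. It assumes a SolrJ 5.x-style client; the ZooKeeper ensemble address, the collection name 'mycollection', the batch size, and the thread count are all placeholders for illustration.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
        public static void main(String[] args) throws Exception {
            // One CloudSolrClient shared by all threads; it is thread safe and
            // routes each batch to the correct shard leader via ZooKeeper.
            CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
            client.setDefaultCollection("mycollection");

            int numThreads = 8;    // illustrative
            int batchSize = 1000;  // batch the updates, as the blog post recommends
            ExecutorService pool = Executors.newFixedThreadPool(numThreads);

            for (int t = 0; t < numThreads; t++) {
                final int threadId = t;
                pool.submit(() -> {
                    List<SolrInputDocument> batch = new ArrayList<>();
                    // Each thread works on a discrete slice of the documents.
                    for (int i = threadId * 100_000; i < (threadId + 1) * 100_000; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", "doc-" + i);
                        doc.addField("text_t", "sample body " + i);
                        batch.add(doc);
                        if (batch.size() >= batchSize) {
                            client.add(batch);   // routed per shard by CloudSolrClient
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) {
                        client.add(batch);
                    }
                    return null;
                });
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            client.commit();
            client.close();
        }
    }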

01-13-2016 03:52 PM (9 Kudos)

Overview

This document gives an overview of HDP installation on Isilon. The PDF version of the article with images: installation-guide-emc-isilon-hdp-23.pdf

Architecture

When installing Hadoop with Isilon, the key difference is that each Isilon node contains a Hadoop-compatible NameNode and DataNode. The compute and the storage are on separate sets of nodes, unlike a typical Hadoop architecture where they are co-located.

HDP connects to the EMC Isilon cluster using EMC SmartConnect. SmartConnect allows for transparent failover in case of an Isilon node failure: transactions seamlessly fail over to the remaining nodes. Isilon offers various storage architectures that allow fast recovery of failed nodes and file-level data protection settings.
Certification

EMC Isilon has a two-step release process:

- Step 1 – Compliance compatible
- Step 2 – Certified with the HWX certification suite

EMC releases its support for Hortonworks HDP first, stating that it is compliance compatible. There is a lag of a few months before the complete in-depth certification with the Hortonworks certification suite is available. The releases below are compliance compatible.

| Isilon Release | Additional Isilon Patch | Ambari Support | HDP Support |
|---|---|---|---|
| 7.2.0.3 | Patch 159065 | 2.1.0* | HDP 2.3 |
| 7.2.1.1 | none | 2.1.1 | HDP 2.3 |

Ranger

All Ranger plugins except the HDFS plugin work. This has been tested internally in HWX labs and testing is ongoing. The HDFS plugin will be addressed in future releases. Please contact the Isilon product team for more details.

Note – the WebHDFS port for Isilon is 8082, not 50070.
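As a quick illustration of that port difference, here is a minimal client-side sketch (not part of the original guide) that opens the Isilon-backed file system over WebHDFS on port 8082 and lists the root directory. The SmartConnect hostname is a placeholder, and the Hadoop client libraries are assumed to be on the classpath.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WebHdfsOnIsilon {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Isilon serves WebHDFS on 8082 rather than the usual 50070.
            URI uri = URI.create("webhdfs://mycluster1-hdfs.lab.example.com:8082");
            try (FileSystem fs = FileSystem.get(uri, conf)) {
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }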
High Level Steps

Please read the steps carefully, as there are special deviations applied to attach the Isilon nodes to the HDP compute cluster.

There are three main steps:

1. Isilon zones and IP pool configuration
2. Ambari install and configuration
3. Post-Ambari-install validation

Some key Isilon concepts

- Isilon is a cluster of hardware nodes; each node has its own CPU, memory, and storage.
- Each node has two network interfaces – backend and frontend.
  - The backend is an InfiniBand network interface.
  - The frontend is a 10 GbE Ethernet interface.
- The Isilon cluster runs OneFS, which is based on FreeBSD (Unix).
- Isilon provides data protection via mirroring or Reed-Solomon FEC.
- Isilon has its own data management console.

Access Zones

- Access zones provide a method to logically partition cluster access and allocate resources to self-contained units, thereby providing a shared tenant environment.
- In other words, they allow Isilon OneFS to segment the cluster configuration and separate the data into multiple self-contained units, each with its own set of authentication providers, user mapping rules, and SMB shares.
- A Hadoop/HDP cluster connects to a single Isilon zone. This is a one-to-one mapping.
- This is part of Isilon administration. Please work with your Isilon administrator to create the needed Isilon zone.
- Useful video on zones: https://www.youtube.com/watch?v=hF3W8o-n-Oo
- By default, OneFS includes a single access zone called System. You should not use the System zone for your cluster.
Prepare the Isilon zone

The following steps are needed:

1. Create an Isilon zone
2. Attach a pool of IP addresses to the zone
3. Assign a working directory to the zone
4. Create the needed users

Create a Zone

- Decide on a zone name and ensure that the zone you want to create does not already exist. For the purpose of this example we will call the zone "zonehdp". You can name it to your organization's liking; replace it with the name you want to assign.

    hwxisi1-1# isi zone zones list

- /ifs is the default share across the nodes. Create a new directory for your zone under a directory "isitest" (isitest is just another level of hierarchy used for documentation purposes).

    hwxisi1-1# mkdir -p /ifs/isitest/zonehdp

- Create the zone.

    hwxisi1-1# isi zone zones create --name zonehdp --path /ifs/isitest/zonehdp

Attach a pool of IP addresses to the zone

- Associate an IP address pool with the zone. In this step you are creating the pool; get the IP range from your Isilon admin. Replace the pool name, IP address range, and zone name with appropriate values.

Assign a working directory to the zone

- Create the HDFS root directory. This is usually called hadoop and must be within the access zone directory.
- Set the HDFS root directory for the access zone.
- Create an indicator file so that we can easily determine, when looking at your Isilon cluster via HDFS, that it is indeed Isilon.

    hwxisi1-1# mkdir -p /ifs/isitest/zonehdp/hadoop
    hwxisi1-1# isi zone zones modify zonehdp --hdfs-root-directory /ifs/isitest/zonehdp/hadoop
    hwxisi1-1# touch /ifs/isitest/zonehdp/hadoop/THIS_IS_ISILON_isitest_zonehdp

- Check the HDFS thread settings and block size. If they are not set, set them using the Isilon documentation in the appendix. This is a one-time activity.

    hwxisi1-1# isi hdfs settings view
    Default Block Size: 128M
    Default Checksum Type: none
    Server Log Level: notice
    Server Threads: 256
    hwxisi1-1# isi hdfs settings modify --server-threads 256
    *** The latest EMC recommendation is to leave this as auto by default and not specify the number of threads
    *** Verify using isi hdfs settings view
    hwxisi1-1# isi hdfs settings modify --default-block-size 128M
Create the users and directories

- The scripts can be downloaded from Claudio's GitHub URL (EMC Engineering officially supports this): https://github.com/claudiofahey/isilon-hadoop-tools/tree/master/onefs
- Extract the Isilon Hadoop Tools to your Isilon cluster. They can be placed in any directory under /ifs; it is recommended to use /ifs/isitest/scripts.
- Execute the scripts.

    hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_users.sh --dist hwx --startgid 501 --startuid 501 --zone zonehdp
    hwxisi1-1# bash /ifs/isitest/scripts/isilon-hadoop-tools/onefs/isilon_create_directories.sh --dist hwx --fixperm --zone zonehdp

- Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown (change ownership of) all files.

    hwxisi1-1# isi zone zones modify --user-mapping-rules="hdfs=>root" --zone zonehdp
Permissions to the root directory

- Get the zone ID from the following:

    isi zone zones view zonehdp

- Replace the zone ID in the following command and execute it:

    isi_run -z <zoneid> "chown -R hdfs /ifs/isitest/zonehdp/hadoop"

Restart Services

The command below restarts the HDFS service on Isilon to ensure that any cached user mapping rules are flushed. This will temporarily interrupt any HDFS connections coming from other Hadoop clusters.

    hwxisi1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable

You have now completed the steps on Isilon. We will now move to installing Hortonworks HDP 2.3 on the compute nodes. The installation will be performed using Apache Ambari 2.1.
Install Ambari Server

Ambari Server makes installation, configuration, management, and monitoring of a Hadoop cluster simpler. Isilon ships with an Ambari Agent that runs on the Isilon cluster.

Ambari Server will be used to deploy HDP 2.3 and set up the Hadoop cluster. Please follow the Hortonworks installation document to ensure that the environment prerequisites are met:

http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_Installing_HDP_AMB/content/_download_the_ambari_repo.html

The steps below are for a CentOS 6 environment. Follow the steps from the Ambari installation guide.

1. Complete the environment prerequisites mentioned in the install guide.

2. Install the Ambari Server packages.

    [root@hadoopmanager-server-0 ~]# wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
    [root@hadoopmanager-server-0 ~]# yum install ambari-server

3. Set up Ambari Server.

    [root@hadoopmanager-server-0 ~]# ambari-server setup

4. Accept all defaults and complete the setup process.

5. Start the server.

    [root@hadoopmanager-server-0 ~]# ambari-server start

6. Browse to http://<ambari-host>:8080/.

7. Log in using the following account: Username: admin, Password: admin.
Deploy a Hortonworks Hadoop Cluster with Isilon for HDFS

You will deploy Hortonworks HDP using the standard process defined by Hortonworks. Ambari Server allows for the immediate use of an Isilon cluster for all HDFS services (NameNode and DataNode); no reconfiguration is necessary once the HDP install is completed.

1. Configure the Ambari Agent on Isilon.

    isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-namenode <smartconnect ip / ip from ip pool>
    isiloncluster1-1# isi zone zones modify zonehdp --hdfs-ambari-server <hostname/ip of the ambari server>

2. Log in to Ambari Server.

3. Welcome: specify the name of your cluster, mycluster1.

4. Select Stack: select the HDP 2.3 stack.

Install Options:

Ambari Agent is already installed with Isilon OneFS. There are two ways of doing the following step: if you install the Ambari Agent on the compute nodes yourself, you do not need to go back and register the Isilon host separately. In the steps below you install the agent using the Ambari UI wizard, and that is the reason you go back to register the Isilon agent.

Note: You will register your hosts with Ambari in two steps. First you will deploy the Ambari Agent to the Linux hosts that will run HDP. Then you will go back one step and add Isilon.

1. Specify the Linux hosts that will run the HDP master and slave components for your cluster in the Target Hosts text box. Put in the SSH key. Click the Next button to deploy the Ambari Agent to your Linux hosts and register them.

2. Once the Ambari Agent has been deployed and registered on your Linux hosts, click the Back button. Now add the SmartConnect address of the Isilon cluster (mycluster1-hdfs.lab.example.com) to your list of target hosts. Check the box "Perform manual registration on hosts and do not use SSH." Click the Next button. You should see that the Ambari agents on all hosts, including your Linux hosts and Isilon, become registered. If SmartConnect is not available, pick one IP address from the IP address pool.
5. Choose Services: select all the services.

6. Assign Masters: assign the NameNode and SNameNode components to the Isilon SmartConnect address. ZooKeeper should be installed on mycluster1-master-0 and any two workers. All other master components can be assigned to the master node or compute nodes.

7. Assign Slaves and Clients: assign the DataNode to the SmartConnect Isilon node and the rest to the compute nodes.

8. Customize Services:

- Change the WebHDFS port from 50070 to 8082.
- Assign passwords to Hive, Oozie, and any other selected services that require them.
- Check that all local data directories are within /data/1, /data/2, etc. The following settings should be checked:
  - YARN NodeManager log-dirs
  - YARN NodeManager local-dirs
  - HBase local directory
  - ZooKeeper directory
  - Oozie Data Dir
  - Storm storm.local.dir
- In YARN, set yarn.timeline-service.store-class to org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.

9. Review: carefully review your configuration and then click Deploy.

10. After a successful installation, Ambari will start and test all of the selected services. Sometimes this fails the first time around, and you may need to retry a couple of times. Review the Install, Start and Test page for any warnings or errors. It is recommended to correct them before continuing.
Adding a Hadoop User

You must add a user account for each Linux user that will submit MapReduce jobs. The procedure below can be used to add a user named hduser1.

Warning: The steps below will create local user and group accounts on your Isilon cluster. If you are using a directory service such as Active Directory, and you want these users and groups to be defined in your directory service, then DO NOT run these steps. Instead, refer to the OneFS documentation and EMC Isilon Best Practices for Hadoop Data Storage.

1. Add the user to Isilon.

    isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
    isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 --provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1

2. Add the user to the Hadoop nodes. Usually, this only needs to be performed on the master-0 node.

    [root@mycluster1-master-0 ~]# adduser hduser1

3. Create the user's home directory on HDFS. In the commands below you sudo as hdfs and then execute the hdfs command.

    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 /user/hduser1
    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1
Validation

Ambari Service Check

Ambari has built-in functional tests for each component. These are executed automatically when you install your cluster with Ambari. To execute them after installation, select the service in Ambari, click the Service Actions button, and select Run Service Check.

Functional Tests

The tests below should be performed to ensure a proper installation. Perform the tests in the order shown. You must create the Hadoop user hduser1 before proceeding.

HDFS

    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
    Found 5 items
    -rw-r--r--   1 root  hadoop        0 2014-08-05 05:59 /THIS_IS_ISILON
    drwxr-xr-x   - hbase hbase       148 2014-08-05 06:06 /hbase
    drwxrwxr-x   - solr  solr          0 2014-08-05 06:07 /solr
    drwxrwxrwt   - hdfs  supergroup  107 2014-08-05 06:07 /tmp
    drwxr-xr-x   - hdfs  supergroup  184 2014-08-05 06:07 /user

    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
    127.0.0.1 localhost
    [root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts

    [root@mycluster1-master-0 ~]# su - hduser1
    [hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
    Found 5 items
    -rw-r--r--   1 root  hadoop        0 2014-08-05 05:59 /THIS_IS_ISILON
    drwxr-xr-x   - hbase hbase       148 2014-08-05 06:28 /hbase
    drwxrwxr-x   - solr  solr          0 2014-08-05 06:07 /solr
    drwxrwxrwt   - hdfs  supergroup  107 2014-08-05 06:07 /tmp
    drwxr-xr-x   - hdfs  supergroup  209 2014-08-05 06:39 /user
    [hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls ...
YARN / MapReduce

    [hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000
    ...
    Estimated value of Pi is 3.14000000000000000000

    [hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in

You can put any file into the in directory. It will be used as the data source for subsequent tests.

    [hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
    [hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
    ...
    [hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
    [hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount in out
    ...
    [hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
    Found 4 items
    -rw-r--r--   1 hduser1 hduser1   0 2014-08-05 06:44 out/_SUCCESS
    -rw-r--r--   1 hduser1 hduser1  24 2014-08-05 06:44 out/part-r-00000
    -rw-r--r--   1 hduser1 hduser1   0 2014-08-05 06:44 out/part-r-00001
    -rw-r--r--   1 hduser1 hduser1   0 2014-08-05 06:44 out/part-r-00002

    [hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
    localhost  1
    127.0.0.1  1

Browse to the YARN Resource Manager GUI at http://mycluster1-master-0.lab.example.com:8088/.

Browse to the MapReduce History Server GUI at http://mycluster1-master-0.lab.example.com:19888/. In particular, confirm that you can view the complete logs for task attempts.
Hive

    [hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
    [hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
    1,true,123.123,2012-10-24 08:55:00
    2,false,1243.5,2012-10-25 13:40:00
    3,false,24453.325,2008-08-22 09:33:21.123
    4,false,243423.325,2007-05-12 22:32:21.33454
    5,true,243.325,1953-04-22 09:11:33
    [hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
    [hduser1@mycluster1-master-0 ~]$ hive

    hive> DROP TABLE IF EXISTS tab1;
    hive> CREATE EXTERNAL TABLE tab1 (
            id INT,
            col_1 BOOLEAN,
            col_2 DOUBLE,
            col_3 TIMESTAMP
          ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          LOCATION '/user/hduser1/sample_data/tab1';
    hive> DROP TABLE IF EXISTS tab2;
    hive> CREATE TABLE tab2 (
            id INT,
            col_1 BOOLEAN,
            col_2 DOUBLE,
            month INT,
            day INT
          ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    hive> INSERT OVERWRITE TABLE tab2
          SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
          FROM tab1 WHERE YEAR(col_3) = 2012;
    OK
    Time taken: 28.256 seconds

    hive> show tables;
    OK
    tab1
    tab2
    Time taken: 0.889 seconds, Fetched: 2 row(s)

    hive> select * from tab1;
    OK
    1  true   123.123     2012-10-24 08:55:00
    2  false  1243.5      2012-10-25 13:40:00
    3  false  24453.325   2008-08-22 09:33:21.123
    4  false  243423.325  2007-05-12 22:32:21.33454
    5  true   243.325     1953-04-22 09:11:33
    Time taken: 1.083 seconds, Fetched: 5 row(s)

    hive> select * from tab2;
    OK
    1  true   123.123  10  24
    2  false  1243.5   10  25
    Time taken: 0.094 seconds, Fetched: 2 row(s)

    hive> select * from tab1 where id=1;
    OK
    1  true   123.123  2012-10-24 08:55:00
    Time taken: 15.083 seconds, Fetched: 1 row(s)

    hive> select * from tab2 where id=1;
    OK
    1  true   123.123  10  24
    Time taken: 13.094 seconds, Fetched: 1 row(s)

    hive> exit;
Pig

    [hduser1@mycluster1-master-0 ~]$ pig
    grunt> a = load 'in';
    grunt> dump a;
    ...
    Success!
    ...
    grunt> quit;

HBase

    [hduser1@mycluster1-master-0 ~]$ hbase shell
    hbase(main):001:0> create 'test', 'cf'
    0 row(s) in 3.3680 seconds
    => Hbase::Table - test
    hbase(main):002:0> list 'test'
    TABLE
    test
    1 row(s) in 0.0210 seconds
    => ["test"]
    hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 0.1320 seconds
    hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
    0 row(s) in 0.0120 seconds
    hbase(main):005:0> scan 'test'
    ROW   COLUMN+CELL
    row1  column=cf:a, timestamp=1407542488028, value=value1
    row2  column=cf:b, timestamp=1407542499562, value=value2
    2 row(s) in 0.0510 seconds
    hbase(main):006:0> get 'test', 'row1'
    COLUMN  CELL
    cf:a    timestamp=1407542488028, value=value1
    1 row(s) in 0.0240 seconds
    hbase(main):007:0> quit
Use Case - Searching Wikipedia

One of the many unique features of Isilon is its multi-protocol support. This allows you, for instance, to write a file using SMB (Windows) or NFS (Linux/Unix) and then read it using HDFS to perform Hadoop analytics on it.

In this section, we exercise this capability by downloading the entire Wikipedia database (excluding media) to Isilon using your favorite browser. As soon as the download completes, we'll run a Hadoop grep to search the entire text of Wikipedia using our Hadoop cluster. As this search doesn't rely on a word index, your regular expression can be as complicated as you like.

1. First, connect your client (with your favorite web browser) to your Isilon cluster.

   If you are using a Windows host or other SMB client:

   1. Click Start -> Run.
   2. Enter: \\<Isilon Host>\ifs
   3. You may authenticate as root with your Isilon root password.
   4. Browse to \ifs\isiloncluster1\zone1\hadoop\tmp.
   5. Create a directory here called wikidata. This is where you will download the Wikipedia data to.

   If you are using a Linux host or other NFS client, mount your NFS export:

    [root@workstation ~]$ mkdir /mnt/isiloncluster1
    [root@workstation ~]$ echo \
      subnet0-pool0.isiloncluster1.lab.example.com:/ifs /mnt/isiloncluster1 nfs \
      nolock,nfsvers=3,tcp,rw,hard,intr,timeo=600,retrans=2,rsize=131072,wsize=524288 \
      >> /etc/fstab
    [root@workstation ~]$ mount -a
    [root@workstation ~]$ mkdir -p /mnt/isiloncluster1/isiloncluster1/zone1/hadoop/tmp/wikidata

   On a Mac, see how to create an NFS mount: http://support.apple.com/kb/TA22243

2. Open your favorite web browser and go to http://dumps.wikimedia.org/enwiki/latest.

3. Locate the file enwiki-latest-pages-articles.xml.bz2 and download it directly to the wikidata folder on Isilon. Your web browser will be writing this file to the Isilon file system using SMB or NFS.

   Note: This file is approximately 10 GB in size and contains the entire text of the English version of Wikipedia. If this is too large, you may want to download one of the smaller files such as enwiki-latest-all-titles.gz.

4. Now run the Hadoop grep job. We'll search for all two-word phrases that begin with EMC.

    [hduser1@mycluster1-master-0 ~]$ hadoop fs -ls /tmp/wikidata
    [hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r /tmp/wikigrep
    [hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep /tmp/wikidata /tmp/wikigrep "EMC [^ ]*"

5. When the job completes, use your favorite text file viewer to view the output file /tmp/wikigrep/part-r-00000. You may open the file in a text editor from the NFS mount.

01-13-2016 03:16 PM (2 Kudos)

@hrongali Isilon is designed to handle multi-protocol workloads. The NameNode is installed by default. You would have to assess how much CPU is being used today and what workload you are bringing on. Make sure it is the X410 model of Isilon that they are using; you can use other models, but only for experimental purposes. In the worst case they may need to add more Isilon nodes to the architecture if there is more workload. Get an Isilon SE involved in the Isilon sizing, as they have an incentive to sell more hardware. A good exercise would be to run a TPC-DS workload with Hive on Tez and see how it behaves. Also make sure that the compute nodes are co-located with the Isilon nodes, as the compute and storage will be pretty chatty.

01-13-2016 03:02 PM (2 Kudos)

@Amit Jain - As mentioned by @bsaini, HDFS is not officially part of the current Atlas roadmap. It would be good to raise a JIRA and get votes for it; that is the only way to push this. In the meantime, as a community member you can always write your own types. Remember, Atlas has an open API for creating your own type system to model anything you want. I have created a small utility based on this, called Atlas CLI: https://github.com/shivajid/atlas/tree/master/codesamples/atlas

A good code example that the developers always work with is QuickStart.java: https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStart.java

IHTH

01-13-2016 05:19 AM (4 Kudos)

Prerequisites:

- HDP 2.x
- OneFS 7.2
- Ambari should report all green for all hosts in the cluster
- Forward and reverse hostname resolution or SmartConnect should be configured

Kerberos Requirements:

- All KDCs need to have different realm names
- One KDC per zone
- Disable AES encryption in the client krb5.conf
- Deleting principals from Isilon doesn't remove them from the KDC
- Don't use the isi auth krb5 spn fix command

Overview:

Following these steps in the order below will accomplish these tasks:

- KDC Setup: install and configure
- Hadoop Client Setup: Kerberos configured and tested
- Secure Isilon Setup: configure, create principals and set proxy users
- Finish Hadoop Client Setup: create all necessary principals, place keytabs on the correct hosts and start services
- Finish Secure Isilon Setup: kerberos_only configuration

KDC Setup:

Configure the KDC: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-1-2.html (below is an overview of the steps taken in the link provided).

- Follow sections 13.1.2 through 13.1.4
- Modify kdc.conf for the supported encryption types:

    supported_enctypes = RC4-HMAC:normal DES-CBC-MD5:normal DES-CBC-CRC:normal

- Modify kdc.conf to have the correct realm name
- Update kadm5.acl to reflect the new realm name
- Create the KDC database and start the services:

    /usr/sbin/kdb5_util create -s
    /etc/rc.d/init.d/krb5kdc start
    /etc/rc.d/init.d/kadmin start

Hadoop Client Setup:

- Run this on all Hadoop clients:

    yum install krb5-workstation krb5-libs

- Modify and copy the krb5.conf from the KDC to all the clients. Update the realm name, kdc and admin server, and make sure to update the default realm.
- Update the encryption settings to not include AES:

    default_tgs_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
    default_tkt_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
    permitted_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC
    preferred_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC

- Now you can test using kinit from the clients and it should work:

    kinit kadmin/admin

Secure Isilon Setup:

- To prevent automatic SPN generation in the System zone, set the 'All Auth Providers' setting on the System zone to 'No':

    isi zone zones modify --zone=system --all-auth-providers=No

- Add the KDC to the Isilon cluster; each KDC needs a unique name:

    isi auth krb5 create --realm=EXAMPLE.COM --admin-server=kdc.example.com --kdc=kdc.example.com --user=kadmin/admin --password=isi

- To verify the join and list all the auth providers for the cluster:

    isi auth status

- Modify the zone to use the authentication provider:

    isi zone zones modify --zone=zone-example --add-auth-provider=krb5:EXAMPLE.COM

- Verify:

    isi zone zones view --zone=zone-example

- Create the Isilon SPNs for the zone. The format needs to be hdfs/<cluster hostname/SC name>@REALM and HTTP/<cluster hostname/SC name>@REALM:

    isi auth krb5 spn create --provider-name=EXAMPLE.COM --spn=hdfs/cluster.example.com@EXAMPLE.COM --user=kadmin/admin --password=isi
    isi auth krb5 spn create --provider-name=EXAMPLE.COM --spn=HTTP/cluster.example.com@EXAMPLE.COM --user=kadmin/admin --password=isi

- Verify SPN creation:

    isi auth krb5 spn list --provider-name=EXAMPLE.COM

- Lastly, create the proxy users:

    isi hdfs proxyusers create oozie --zone=zone-example --add-user=ambari-qa
    isi hdfs proxyusers create hive --zone=zone-example --add-user=ambari-qa

Finish Hadoop Client Setup:

- Enter the Ambari secure setup wizard: Admin -> Security -> Enable Security.
- Click through the wizard until you get to the screen that configures the principals. Note: Isilon does not convert principal names to short names using rules, so don't use aliases (e.g. rm instead of yarn).
  - Realm name
  - HDFS -> NameNode: hdfs/cluster.example.com@EXAMPLE.COM
  - HDFS -> SecondaryNameNode: hdfs/cluster.example.com@EXAMPLE.COM
  - HDFS -> DataNode: hdfs/cluster.example.com@EXAMPLE.COM
  - Falcon -> NameNode: hdfs/cluster.example.com@EXAMPLE.COM
  - YARN -> ResourceManager: yarn/_HOST
  - YARN -> NodeManager: yarn/_HOST
  - MapReduce2 -> History Server principal: mapred/_HOST
  - DFS Web Principal: HTTP/cluster.example.com@EXAMPLE.COM
- Now download the CSV and copy it to the Ambari server. On the server, put the file in /var/lib/ambari-server/resources/scripts/.
- On the Ambari server, go to that path and run the keytabs.sh script:

    ./keytabs.sh host-principal-keytab-list.csv > generate_keytabs.sh

- In the generate_keytabs.sh script that was just generated, comment out all the lines that create principals for hdfs or for the Isilon cluster. This script generates all the principals for the Hadoop services, but the Isilon principals were already created by the Isilon cluster, so there is no need to create them again. Doing so would cause the secure cluster to not authenticate properly.
- Finally, execute ./generate_keytabs.sh. This will create all the principals for the Hadoop services and export a keytab for every host in the cluster.
- Copy the keytab tar files created to the clients and extract them in the proper location.
- Finish the wizard install.

Finish Secure Isilon Setup:

- After everything has finished installing, configure the Isilon zone to only allow secure connections:

    isi zone zones modify --zone=zone-example --hdfs-authentication=kerberos_only
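Once the zone is locked down to kerberos_only, a quick way to confirm end-to-end authentication from a client is a keytab login followed by a simple HDFS listing. This is a minimal sketch, not part of the original steps: the principal, keytab path, and SmartConnect hostname are placeholders for your environment, and the Hadoop client libraries are assumed to be on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedHdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the Isilon SmartConnect name (placeholder) and enable Kerberos auth.
            conf.set("fs.defaultFS", "hdfs://cluster.example.com:8020");
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Log in with a principal/keytab created during the steps above (placeholders).
            UserGroupInformation.loginUserFromKeytab(
                    "hduser1@EXAMPLE.COM", "/etc/security/keytabs/hduser1.keytab");

            try (FileSystem fs = FileSystem.get(conf)) {
                System.out.println("Authenticated listing of /:");
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }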

01-13-2016 02:47 AM

Bosco, CloudSolrClient uses an LBHttpSolrClient under the hood (which load balances across the nodes), but I do not see that LBHttpSolrClient is multithreaded. So the question remains: which has the higher throughput?

01-13-2016 12:32 AM (1 Kudo)

We have a customer that needs to update a few billion documents in SolrCloud. I know the suggested approach is CloudSolrClient, for its load-balancing feature.

As per the docs, CloudSolrClient is the SolrJ client class to communicate with SolrCloud. Instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections, and then use the LBHttpSolrClient to issue requests. This class assumes the id field for your documents is called 'id'; if this is not the case, you must set the right name with setIdField(String).

As per the docs, ConcurrentUpdateSolrClient buffers all added documents and writes them into open HTTP connections. This class is thread safe. Params from UpdateRequest are converted to HTTP request parameters. When params change between UpdateRequests, a new HTTP request is started. Although any SolrClient request can be made with this implementation, it is only recommended to use ConcurrentUpdateSolrClient with /update requests. The class HttpSolrClient is better suited for the query interface.

Now, since ConcurrentUpdateSolrClient lets me use a queue and a pool of threads, it looks more attractive than CloudSolrClient, which uses an HttpSolrClient once it gets a set of nodes to do the updates. I would love to hear a more in-depth discussion of these two APIs.

Thanks
Shivaji
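For reference, here is a minimal sketch of how the two clients are typically constructed in SolrJ 5.x; the ZooKeeper ensemble, Solr URL, collection name, queue size, and thread count are placeholders. The trade-off is the one described above: ConcurrentUpdateSolrClient gives you an internal queue and thread pool against a single node URL, while CloudSolrClient routes batches to the right shard leaders.

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ClientComparison {
        public static void main(String[] args) throws Exception {
            // Option 1: shard-aware routing via ZooKeeper (placeholder ensemble).
            CloudSolrClient cloud = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
            cloud.setDefaultCollection("mycollection");

            // Option 2: buffered, multi-threaded updates against a single node URL.
            // Queue size and thread count are illustrative.
            ConcurrentUpdateSolrClient cusc =
                    new ConcurrentUpdateSolrClient("http://solr-node1:8983/solr/mycollection", 10000, 4);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-1");

            cloud.add(doc);  // routed to the correct shard leader
            cusc.add(doc);   // queued and written by CUSC's own threads

            cloud.commit();
            cusc.blockUntilFinished();  // drain the CUSC queue before committing
            cusc.commit();

            cloud.close();
            cusc.close();
        }
    }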
Labels: Apache Solr