Member since: 09-17-2015
436 Posts
736 Kudos Received
81 Solutions
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 5082 | 01-14-2017 01:52 AM |
| | 7348 | 12-07-2016 06:41 PM |
| | 8705 | 11-02-2016 06:56 PM |
| | 2806 | 10-19-2016 08:10 PM |
| | 7067 | 10-19-2016 08:05 AM |
			
    
	
		
		
10-11-2015 02:01 AM
I believe it should work for 2.3 as well (dropping the ELASTICSEARCH dir into the 2.3 resources dir). @smishra@hortonworks.com, has it been tested?
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-10-2015 12:08 AM
3 Kudos
I think this would be the numeric Twitter identification number given to each user: https://dev.twitter.com/rest/reference/get/users/lookup
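As a rough illustration (assuming app-only OAuth credentials in a TWITTER_BEARER_TOKEN environment variable; the screen name is just a placeholder), the numeric id can be looked up via that endpoint from the shell:

# Hypothetical example: resolve a screen name to its numeric user id.
# The "id" / "id_str" fields in the JSON response hold the numeric identifier.
curl --get "https://api.twitter.com/1.1/users/lookup.json" \
     --data "screen_name=twitterapi" \
     --header "Authorization: Bearer $TWITTER_BEARER_TOKEN"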
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-09-2015 09:24 PM
							 You could try the Ambari service for ES too:  https://hortonworks-gallery.github.io/index.html?sort=asc&filter=ambari%20extensions 
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-09-2015 01:37 AM
24 Kudos
HBase indexing to Solr with HDP Search in HDP 2.3

Background: The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time searching. The HBase Indexer is included with HDP Search as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.

References:
https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Jobs.html#_hbase-indexer
https://github.com/NGDATA/hbase-indexer/wiki/Tutorial

Steps
Download and start the HDP 2.3 sandbox VM, which comes with Lucidworks HDP Search installed (under /opt/lucidworks-hdpsearch), and run the below to ensure no log files owned by root remain:
chown -R solr:solr /opt/lucidworks-hdpsearch/solr

If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search and set up the user dir in HDFS:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr

Point Solr to ZooKeeper by configuring hbase-indexer-site.xml:
vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml
<?xml version="1.0"?>
<configuration>
   <property>
      <name>hbaseindexer.zookeeper.connectstring</name>
      <value>sandbox.hortonworks.com:2181</value>
   </property>
  <property>
     <name>hbase.zookeeper.quorum</name>
     <value>sandbox.hortonworks.com</value>
   </property>
</configuration>
  
In Ambari > HBase > Configs > Custom hbase-site, add the below properties, but do not restart HBase just yet:
hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
  
Copy Solr's HBase-related libs to $HBASE_HOME/lib:
cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
  
Restart HBase.

Copy hbase-site.xml to hbase-indexer's conf dir:
cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
  
Start Solr in cloud mode (pointing to ZK):
cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181
  
Create a collection:
bin/solr create -c hbaseCollection \
     -d data_driven_schema_configs \
     -n myCollConfigs \
     -s 2 \
     -rf 2 
  
Start the HBase Indexer:
cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server
  
In a second terminal, create the table to be indexed in HBase. Open hbase shell and run the below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the table's column family must be set to 1:
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
exit
  
Now we'll create an indexer that will index the indexdemo-user table as its contents are updated.
vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml
<?xml version="1.0"?>
<indexer table="indexdemo-user">
  <field name="firstname_s" value="info:firstname"/>
  <field name="lastname_s" value="info:lastname"/>
  <field name="age_i" value="info:age" type="int"/>
</indexer>
  
The above file defines the three HBase columns that will be indexed, how to interpret them, and the Solr fields they will be stored in.
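As a rough sanity check for later (assuming the hbaseCollection created above and the "row1" data added in the steps below), each indexed HBase row should appear as a Solr document, with its id taken from the row key by default and the mapped fields above:

# Hypothetical query once the indexer is running and data has been added and committed:
curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select?q=id:row1&wt=json&indent=true"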
Next, register an indexer based on the indexer XML file you just created:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer -n hbaseindexer -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml -cp solr.zk=sandbox.hortonworks.com:2181 -cp solr.collection=hbaseCollection
  
Check that it got created:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
  
Check that the indexer server output shows the below:
INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
  
Log back into the hbase shell and try adding some data to the indexdemo-user table:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
  
Run a commit:
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
  
Open the Solr UI and notice under Statistics that "Num Docs" has increased: http://sandbox.hortonworks.com:8983/solr/#/hbaseCollection_shard1_replica1

Run a query using the Solr REST API: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true

Now try updating the data you've just added in the hbase shell and commit:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
 
 curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
  
Check the content in Solr: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
Note that the document's firstname_s field now contains the string "Jim".

Finally, delete the row from HBase and commit:
hbase> deleteall 'indexdemo-user', 'row1'
 
 curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
  
Check the content in Solr and notice that the document has been removed: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true

You have successfully set up HBase indexing with HDP Search.
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
10-08-2015 07:15 PM
1 Kudo
							 I believe this would currently be through Hive views 
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-08-2015 07:10 PM
9 Kudos
Lab Overview

In this lab, we will learn to:
- Configure Solr to store indexes in HDFS
- Create a Solr cluster of 2 Solr instances running on ports 8983 and 8984
- Index documents in HDFS using the Hadoop connectors
- Use Solr to search documents

Pre-Requisite

The lab is designed for the HDP Sandbox. Download the HDP Sandbox here, import it into VMware Fusion and start the VM.

LAB

Step 1 - Log into Sandbox

After it boots up, find the IP address of the VM and add an entry into your machine's hosts file, e.g.:
192.168.191.241 sandbox.hortonworks.com sandbox
  
Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry:
ssh root@sandbox.hortonworks.com
  
If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
  
If running on the HDP 2.3 sandbox, run the below:
chown -R solr:solr /opt/lucidworks-hdpsearch
  
Run the remaining steps as solr:
su solr

Step 2 - Configure Solr to store index files in HDFS
For the lab, we will use the schemaless configuration that ships with Solr:
- Schemaless configuration is a set of Solr features that allow one to index documents without pre-specifying the schema of the indexed documents.
- A sample schemaless configuration can be found in the directory /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs

Let's create a copy of the sample schemaless configuration and modify it to store indexes in HDFS:
cp -R /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs
 
Open /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf/solrconfig.xml in your favorite editor and make the following changes:

1- Replace the existing <directoryFactory name="DirectoryFactory" ...> ... </directoryFactory> section with:

            <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
                <str name="solr.hdfs.home">hdfs://sandbox.hortonworks.com/user/solr</str>
                <bool name="solr.hdfs.blockcache.enabled">true</bool>
                <int name="solr.hdfs.blockcache.slab.count">1</int>
                <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
                <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
                <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
                <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
                <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
                <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
                <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
            </directoryFactory>
2- Set the lockType to <lockType>hdfs</lockType>

3- Save and exit the file.

Step 3 - Start 2 Solr instances in SolrCloud mode

mkdir -p ~/solr-cores/core1
mkdir -p ~/solr-cores/core2
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core1
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core2
#you may need to set JAVA_HOME
#export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr  start -cloud -p 8983 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core1
/opt/lucidworks-hdpsearch/solr/bin/solr  restart -cloud -p 8984 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core2
Step 4 - Create a Solr Collection named "labs" with 2 shards and a replication factor of 2

/opt/lucidworks-hdpsearch/solr/bin/solr create -c labs -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n labs -s 2 -rf 2
  Step 5 - Validate that the labs collection got created  
Using the browser, visit http://sandbox.hortonworks.com:8983/solr/#/~cloud. You should see the labs collection with 2 shards, each with a replication factor of 2.
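If you prefer to verify from the shell, the Solr Collections API reports the same information (this assumes the first instance on port 8983 started above):

# Optional check: the cluster status should list the "labs" collection with 2 shards,
# each with 2 replicas.
curl "http://sandbox.hortonworks.com:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"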
Step 6 - Load documents to HDFS

Upload the sample csv file to HDFS. We will index the file with Solr using the Solr Hadoop connectors:
hadoop fs -mkdir -p csv
hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv csv/
Step 7 - Index documents with Solr using the Solr Hadoop Connector

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c labs -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk localhost:2181
  Step 8 - Search indexed documents  
Search the indexed documents. Using the browser, visit the URL http://sandbox.hortonworks.com:8984/solr/labs/select?q=*:* and you will see the search results.
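The same query can also be run from the shell (note this hits the second instance on port 8984 started earlier):

# Shell equivalent of the browser query above:
curl "http://sandbox.hortonworks.com:8984/solr/labs/select?q=*:*&wt=json&indent=true"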
Step 9 - Lab Complete

You have successfully completed the lab and learnt how to:
- Store Solr indexes in HDFS
- Create a Solr cluster
- Index documents in HDFS using the Solr Hadoop connectors
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
10-08-2015 06:35 PM
18 Kudos
There have been a number of questions recently on using AD/IPA with HDP 2.3 security:
- How to setup Active Directory/IPA?
- How to setup the cluster OS to recognize users from AD using SSSD?
- How to enable Kerberos for authentication?
- How to install Ranger for authorization/audit, setup plugins for HDFS, Hive, HBase, Kafka, Storm, YARN, Knox, and test these components on a kerberized cluster?
- How to sync Ranger user/group sync with AD/IPA?
- How to integrate Knox with AD/IPA?
- How to setup encryption at rest with Ranger KMS?

To help answer some of these questions, the partner team has prepared cheatsheets on security workshops. These are living materials with sample code snippets which are being updated/enhanced per feedback from the field, so rather than replicate the materials here, the latest materials can be referenced at the GitHub repo linked from: https://community.hortonworks.com/repos/4465/workshops-on-how-to-setup-security-on-hadoop-using.html

To help get started with security, we have also made available secured sandbox and LDAP VMs built by running through the above steps.

Note that these are unofficial, and for the final word on security with HDP, the official docs should be referenced at http://docs.hortonworks.com. For example:
- http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_Ambari_Security_Guide/content/ch_amb_sec_guide.html
- http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/ch_overview_ranger_ambari_install.html
- http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_KMS_Admin_Guide/content/ch_ranger_kms_overview.html

For help with the workshop materials, please use GitHub issues: https://github.com/abajwa-hw/security-workshops/issues
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
10-08-2015 05:21 PM
4 Kudos
There are steps and code for a working Kafka to Storm to HBase example on HDP 2.3 in the 3-part tutorial series here, which may help:
- http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/
- http://hortonworks.com/hadoop-tutorial/ingesting-processing-real-time-events-apache-storm/
- http://hortonworks.com/hadoop-tutorial/real-time-data-ingestion-hbase-hive-using-storm-bolt/

In the sample code provided above, the hbase-site.xml was packaged into the uber jar by adding the below in the pom.xml:

    <resources>
      <resource>
        <directory>/etc/hbase/conf</directory>
        <includes>
          <include>hbase-site.xml</include>
        </includes> 
      </resource>
      <resource>
        <directory>/opt/TruckEvents/Tutorials-master/src/main/resources</directory>
      </resource>      
    </resources>
 
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-08-2015 04:29 PM
Note that you would need to add this user to the list of sudoers first, which the documentation hadn't mentioned. I ran into the same issue while building the Ambari service. See https://issues.apache.org/jira/browse/NIFI-930
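As a minimal sketch (the account name "nifi" below is only an assumption; use whichever user the service runs as), the sudoers entry can be added via a drop-in file:

# Hypothetical example: grant the service account passwordless sudo, then set the
# restrictive permissions sudo expects on files under /etc/sudoers.d.
echo "nifi ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/nifi
chmod 0440 /etc/sudoers.d/nifi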
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-05-2015 05:46 PM
This option (dropping jars in /usr/hdp/current/hive-server2/auxlib) may not work for them because they have about 800 jars, which in turn load shared libs. The way they currently manage this is by using an uber jar whose manifest's Class-Path entry has references to relative paths of our jars. The relative paths work because the uber jar resides in one of their own installation directories, which won't happen when the uber jar is in the cluster's installation directory. Copying so many jars to the cluster installation will be impractical for admins of joint customers. Is there no way to use ADD JAR or set
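For context, a minimal sketch of the uber-jar manifest approach described above (the jar and entry names are made up for illustration):

# The uber jar ships a META-INF/MANIFEST.MF with relative Class-Path entries, e.g.
#   Class-Path: lib/dep-a.jar lib/dep-b.jar
# which resolve relative to the directory the uber jar sits in. To inspect a jar's manifest:
unzip -p uber-app.jar META-INF/MANIFEST.MF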
						
					
				
			
			
			
			
			
			
			
			
			
		 
        













