Member since: 09-17-2015
Posts: 436
Kudos Received: 736
Solutions: 81
My Accepted Solutions
Views | Posted
---|---
3754 | 01-14-2017 01:52 AM
5675 | 12-07-2016 06:41 PM
6512 | 11-02-2016 06:56 PM
2147 | 10-19-2016 08:10 PM
5627 | 10-19-2016 08:05 AM
10-11-2015
02:01 AM
I believe it should work for 2.3 as well (dropping the ELASTICSEARCH dir into the 2.3 resources dir). @smishra@hortonworks.com has it been tested?
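For reference, a minimal sketch of what that looks like; the local ./ELASTICSEARCH path is a placeholder for wherever the service definition was downloaded:
# placeholder path: wherever the ELASTICSEARCH service definition was unpacked
cp -r ./ELASTICSEARCH /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
# restart Ambari so it picks up the new service definition
ambari-server restart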
10-10-2015
12:08 AM
3 Kudos
I think this would be the numeric Twitter user ID assigned to each account: https://dev.twitter.com/rest/reference/get/users/lookup
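For example, a minimal sketch of pulling that numeric id via the users/lookup endpoint above; the bearer token and screen_name are placeholders:
# the "id" field in the JSON response is the numeric user ID
curl -s -H "Authorization: Bearer $BEARER_TOKEN" \
  "https://api.twitter.com/1.1/users/lookup.json?screen_name=twitterapi"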
10-09-2015
09:24 PM
You could try the Ambari service for ES too: https://hortonworks-gallery.github.io/index.html?sort=asc&filter=ambari%20extensions
10-09-2015
01:37 AM
24 Kudos
HBase indexing to Solr with HDP Search in HDP 2.3
Background: The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time searching. The HBase Indexer is included with HDP Search as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.
References:
https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Jobs.html#_hbase-indexer
https://github.com/NGDATA/hbase-indexer/wiki/Tutorial
Steps
Download and start the HDP 2.3 sandbox VM, which comes with LW HDP Search installed (under /opt/lucidworks-hdpsearch), and run the below to ensure no log files owned by root remain:
chown -R solr:solr /opt/lucidworks-hdpsearch/solr
If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search and set up the solr user dir in HDFS:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
Point the HBase Indexer to ZooKeeper by configuring hbase-indexer-site.xml:
vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>hbaseindexer.zookeeper.connectstring</name>
<value>sandbox.hortonworks.com:2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>sandbox.hortonworks.com</value>
</property>
</configuration>
In Ambari > HBase > Configs > Custom hbase-site, add the below properties, but do not restart HBase just yet (a command-line alternative is sketched after this list):
hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
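If you prefer setting these from the command line rather than the Ambari UI, below is a sketch using Ambari's bundled configs.sh script; the Ambari host, admin credentials, and cluster name "Sandbox" are assumptions, so adjust them for your cluster:
cd /var/lib/ambari-server/resources/scripts
# each call writes one property into hbase-site via the Ambari API
./configs.sh -u admin -p admin set sandbox.hortonworks.com Sandbox hbase-site hbase.replication true
./configs.sh -u admin -p admin set sandbox.hortonworks.com Sandbox hbase-site replication.source.ratio 1.0
./configs.sh -u admin -p admin set sandbox.hortonworks.com Sandbox hbase-site replication.source.nb.capacity 1000
./configs.sh -u admin -p admin set sandbox.hortonworks.com Sandbox hbase-site replication.replicationsource.implementation com.ngdata.sep.impl.SepReplicationSource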
Copy the HBase Indexer's HBase-related libs (hbase-sep*) to $HBASE_HOME/lib:
cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
Restart HBase. Copy hbase-site.xml to the hbase-indexer conf dir:
cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
Start Solr in cloud mode (pointing to ZooKeeper):
cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181
Create a collection:
bin/solr create -c hbaseCollection \
-d data_driven_schema_configs \
-n myCollConfigs \
-s 2 \
-rf 2
Start the HBase Indexer:
cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server
In a second terminal, create the table to be indexed in HBase. Open the hbase shell and run the below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the column family must be set to 1:
hbase shell
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
quit
Now we'll create an indexer that will index the indexdemo-user table as its contents are updated:
vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml
<?xml version="1.0"?>
<indexer table="indexdemo-user">
<field name="firstname_s" value="info:firstname"/>
<field name="lastname_s" value="info:lastname"/>
<field name="age_i" value="info:age" type="int"/>
</indexer>
The above file defines the three HBase columns that will be indexed, how to interpret their values, and the Solr fields they will be stored in (the _s and _i field-name suffixes map to the string and int dynamic fields in the schemaless config).
Next, register an indexer based on the indexer XML file created above:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer \
  -n hbaseindexer \
  -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml \
  -cp solr.zk=sandbox.hortonworks.com:2181 \
  -cp solr.collection=hbaseCollection
Check that it got created:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
Check that the indexer server output shows the below:
INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
Log back into the hbase shell and try adding some data to the indexdemo-user table:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
Run a commit:
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Open the Solr UI and notice under Statistics that "Num Docs" has increased: http://sandbox.hortonworks.com:8983/solr/#/hbaseCollection_shard1_replica1
Run a query using the Solr REST API: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
Now try updating the data you've just added in the hbase shell and commit:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Check the content in Solr: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
Note that the document's firstname_s field now contains the string "Jim". Finally, delete the row from HBase and commit:
hbase> deleteall 'indexdemo-user', 'row1'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Check the content in Solr and notice that the document has been removed: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
You have successfully set up HBase indexing with HDP Search.
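If you want to tear the demo back down later, the indexer registration can be removed again; a minimal sketch, assuming the same name used in add-indexer above:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer delete-indexer -n hbaseindexer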
10-08-2015
07:15 PM
1 Kudo
I believe this would currently be done through Hive views.
10-08-2015
07:10 PM
9 Kudos
Lab Overview
In this lab, we will learn to:
- Configure Solr to store indexes in HDFS
- Create a Solr cluster of 2 Solr instances running on ports 8983 and 8984
- Index documents in HDFS using the Hadoop connectors
- Use Solr to search documents
Pre-Requisite
The lab is designed for the HDP Sandbox. Download the HDP Sandbox, import it into VMware Fusion, and start the VM.
LAB
Step 1 - Log into the Sandbox
After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g.:
192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry:
ssh root@sandbox.hortonworks.com
If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
If running on the HDP 2.3 sandbox, run the below:
chown -R solr:solr /opt/lucidworks-hdpsearch
Run the remaining steps as the solr user:
su solr
Step 2 - Configure Solr to store index files in HDFS
For the lab, we will use the schemaless configuration that ships with Solr.
Schemaless configuration is a set of Solr features that allow one to index documents without pre-specifying their schema. A sample schemaless configuration can be found in the directory /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs. Let's create a copy of the sample schemaless configuration and modify it to store indexes in HDFS:
cp -R /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs
Open /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf/solrconfig.xml in your favorite editor and make the following changes:
1- Replace the section:
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
</directoryFactory>
with:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://sandbox.hortonworks.com/user/solr</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
2- Set the lockType to: <lockType>hdfs</lockType>
3- Save and exit the file
Step 3 - Start 2 Solr instances in SolrCloud mode
mkdir -p ~/solr-cores/core1
mkdir -p ~/solr-cores/core2
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core1
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core2
#you may need to set JAVA_HOME
#export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr start -cloud -p 8983 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core1
/opt/lucidworks-hdpsearch/solr/bin/solr restart -cloud -p 8984 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core2
Step 4 - Create a Solr collection named "labs" with 2 shards and a replication factor of 2
/opt/lucidworks-hdpsearch/solr/bin/solr create -c labs -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n labs -s 2 -rf 2
Step 5 - Validate that the labs collection got created
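In addition to the browser check below, the collection can also be verified from the command line; a quick sketch using the standard Solr Collections API against the node started above:
# CLUSTERSTATUS lists the shards and replicas of the labs collection
curl "http://sandbox.hortonworks.com:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=labs&wt=json"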
Using the browser, visit http://sandbox.hortonworks.com:8983/solr/#/~cloud. You should see the labs collection with 2 shards, each with a replication factor of 2.
Step 6 - Load documents to HDFS
Upload the sample CSV file to HDFS. We will index the file with Solr using the Solr Hadoop connectors:
hadoop fs -mkdir -p csv
hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv csv/
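Optionally, confirm the upload and peek at the CSV's column order, which the field mapping in the next step refers to (a quick check using the same paths as above):
hadoop fs -ls csv/
hadoop fs -cat csv/books.csv | head -5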
Step 7 - Index documents with Solr using the Solr Hadoop connector
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob \
  -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author \
  -DcsvFirstLineComment \
  -DidField=id \
  -DcsvDelimiter="," \
  -Dlww.commit.on.close=true \
  -cls com.lucidworks.hadoop.ingest.CSVIngestMapper \
  -c labs \
  -i csv/* \
  -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
  -zk localhost:2181
Step 8 - Search indexed documents
Search the indexed documents. Using the browser, visit the URL http://sandbox.hortonworks.com:8984/solr/labs/select?q=*:*
You will see the indexed documents in the search results.
Step 9 - Lab Complete
You have successfully completed the lab and learned how to:
- Store Solr indexes in HDFS
- Create a Solr cluster
- Index documents in HDFS using the Solr Hadoop connectors
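To reset the sandbox afterwards, a minimal cleanup sketch, assuming the same collection name and install path as above:
/opt/lucidworks-hdpsearch/solr/bin/solr delete -c labs
/opt/lucidworks-hdpsearch/solr/bin/solr stop -all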
10-08-2015
06:35 PM
18 Kudos
There have been a number of questions recently on using AD/IPA with HDP 2.3 security:
- How to set up Active Directory/IPA?
- How to set up the cluster OS to recognize users from AD using SSSD?
- How to enable Kerberos for authentication?
- How to install Ranger for authorization/audit, set up plugins for HDFS, Hive, HBase, Kafka, Storm, YARN and Knox, and test these components on a Kerberized cluster?
- How to sync Ranger users/groups with AD/IPA?
- How to integrate Knox with AD/IPA?
- How to set up encryption at rest with Ranger KMS?
To help answer some of these questions, the partner team has prepared cheatsheets on security workshops. These are living materials with sample code snippets that are being updated/enhanced based on feedback from the field, so rather than replicate the materials here, the latest materials can be referenced at the GitHub repo linked from: https://community.hortonworks.com/repos/4465/workshops-on-how-to-setup-security-on-hadoop-using.html
To help get started with security, we have also made available secured sandbox and LDAP VMs built by running through the above steps. Note that these are unofficial; for the final word on security with HDP, the official docs should be referenced at http://docs.hortonworks.com. For example:
- http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_Ambari_Security_Guide/content/ch_amb_sec_guide.html
- http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/ch_overview_ranger_ambari_install.html
- http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_KMS_Admin_Guide/content/ch_ranger_kms_overview.html
For help with the workshop materials, please use GitHub issues: https://github.com/abajwa-hw/security-workshops/issues
10-08-2015
05:21 PM
4 Kudos
There are steps and code for a working Kafka-to-Storm-to-HBase example on HDP 2.3 in the 3-part tutorial series here, which may help:
http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/
http://hortonworks.com/hadoop-tutorial/ingesting-processing-real-time-events-apache-storm/
http://hortonworks.com/hadoop-tutorial/real-time-data-ingestion-hbase-hive-using-storm-bolt/
In the sample code provided above, hbase-site.xml was packaged into the uber jar by adding the below to the pom.xml:
<resources>
<resource>
<directory>/etc/hbase/conf</directory>
<includes>
<include>hbase-site.xml</include>
</includes>
</resource>
<resource>
<directory>/opt/TruckEvents/Tutorials-master/src/main/resources</directory>
</resource>
</resources>
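To confirm hbase-site.xml actually made it into the uber jar, something like the below can be run after the build; the jar name under target/ is an assumption, so use whatever your build produces:
# list the jar contents and check for the bundled hbase-site.xml
jar tf target/Tutorials-1.0-SNAPSHOT.jar | grep hbase-site.xml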
10-08-2015
04:29 PM
Note that you would need to add this user to the list of sudoers first, which the documentation hadn't mentioned. I ran into the same issue while building the Ambari service. See https://issues.apache.org/jira/browse/NIFI-930
10-05-2015
05:46 PM
This option (dropping jars in /usr/hdp/current/hive-server2/auxlib) may not work for them because they have about 800 jars, which in turn load shared libs. The way they currently manage this is by using an uber jar whose manifest's Class-Path entry has references to relative paths of their jars. The relative paths work because the uber jar resides in one of their own installation directories, which won't happen when the uber jar is in the cluster's installation directory. Copying so many jars into the cluster installation would be impractical for admins of joint customers. Is there no way to use ADD JAR or set