Member since: 09-17-2015
Posts: 436
Kudos Received: 736
Solutions: 81
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3754 | 01-14-2017 01:52 AM |
| | 5675 | 12-07-2016 06:41 PM |
| | 6514 | 11-02-2016 06:56 PM |
| | 2147 | 10-19-2016 08:10 PM |
| | 5629 | 10-19-2016 08:05 AM |
10-11-2015 02:01 AM
I believe it should work for 2.3 as well (dropping the ELASTICSEARCH dir into the 2.3 resources dir). @smishra@hortonworks.com has it been tested?
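For reference, a minimal sketch of what dropping the service definition into the resources dir typically looks like for an Ambari custom service (paths assumed from the standard Ambari stack layout; the location of the ELASTICSEARCH dir below is hypothetical):
# hypothetical source location for the ELASTICSEARCH service definition
cp -r ELASTICSEARCH /var/lib/ambari-server/resources/stacks/HDP/2.3/services/
ambari-server restart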
10-10-2015 12:08 AM
3 Kudos
I think this would be the numeric Twitter identification number assigned to each user: https://dev.twitter.com/rest/reference/get/users/lookup
10-09-2015 09:24 PM
You could try the Ambari service for ES too: https://hortonworks-gallery.github.io/index.html?sort=asc&filter=ambari%20extensions
10-09-2015 01:37 AM
24 Kudos
HBase indexing to Solr with HDP Search in HDP 2.3
Background: The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time searching. The HBase Indexer is included with HDP Search as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.
References:
https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Jobs.html#_hbase-indexer
https://github.com/NGDATA/hbase-indexer/wiki/Tutorial
Steps:
Download and start the HDP 2.3 sandbox VM, which comes with LW HDP Search installed (under /opt/lucidworks-hdpsearch), and run the below to ensure no log files owned by root remain:
chown -R solr:solr /opt/lucidworks-hdpsearch/solr
If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search and set up the user dir in HDFS:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
Point the HBase Indexer to ZooKeeper by configuring hbase-indexer-site.xml:
vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>hbaseindexer.zookeeper.connectstring</name>
<value>sandbox.hortonworks.com:2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>sandbox.hortonworks.com</value>
</property>
</configuration>
In Ambari > HBase > Configs > Custom hbase-site, add the below properties, but do not restart HBase just yet:
hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
Copy the HBase Indexer's hbase-sep* libs to $HBASE_HOME/lib:
cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
Restart HBase.
Copy hbase-site.xml to the HBase Indexer's conf dir:
cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
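Optionally, a quick sanity check that the replication properties made it into the copied config (a sketch; this assumes Ambari has written the new properties to /etc/hbase/conf as part of the restart):
grep -A 1 replication /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml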
Start Solr in cloud mode (pointing to ZK):
cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181
Create a collection:
bin/solr create -c hbaseCollection \
-d data_driven_schema_configs \
-n myCollConfigs \
-s 2 \
-rf 2
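Optionally, confirm the collection exists via the Solr Collections API (a sketch; the JSON response lists the collection names):
curl "http://sandbox.hortonworks.com:8983/solr/admin/collections?action=LIST&wt=json"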
Start the HBase Indexer:
cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server
In a second terminal, create the table to be indexed in HBase. Open the hbase shell and run the below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the column family must be set to 1:
hbase shell
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
quit
Now we'll create an indexer that will index the indexdemo-user table as its contents are updated:
vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml
<?xml version="1.0"?>
<indexer table="indexdemo-user">
<field name="firstname_s" value="info:firstname"/>
<field name="lastname_s" value="info:lastname"/>
<field name="age_i" value="info:age" type="int"/>
</indexer>
The above file defines the three HBase columns that will be indexed, how to interpret them (age as an int), and the Solr field names they will be stored under.
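As a hedged illustration (not part of the original steps): with the default schemaless configs, the _s and _i suffixes map to Solr's string and int dynamic fields, so once rows have been added and committed in the later steps, a field-level query like the below should return the indexed document:
curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select?q=firstname_s:John&wt=json&indent=true"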
Next, create an indexer based on the indexer xml file just created:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer -n hbaseindexer -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml -cp solr.zk=sandbox.hortonworks.com:2181 -cp solr.collection=hbaseCollection
Check it got created:
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
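Since the HBase Indexer registers itself as an HBase replication peer, it should also show up in the hbase shell's replication peer list; a quick check (a sketch; the peer id shown is generated by the indexer):
echo "list_peers" | hbase shell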
Check that the indexer server output shows the below:
INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
Log back into the hbase shell and try adding some data to the indexdemo-user table:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
Run a commit:
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Open the Solr UI and notice under Statistics that "Num Docs" has increased: http://sandbox.hortonworks.com:8983/solr/#/hbaseCollection_shard1_replica1
Run a query using the Solr REST API: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
Now try updating the data you've just added in the hbase shell and commit:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Check the content in Solr: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
Note that the document's firstname_s field now contains the string "Jim".
Finally, delete the row from HBase and commit:
hbase> deleteall 'indexdemo-user', 'row1'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
Check the content in Solr and notice that the document has been removed: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
You have successfully set up HBase indexing with HDP Search.
10-08-2015 07:15 PM
1 Kudo
I believe this would currently be done through Hive views.
10-08-2015 07:10 PM
9 Kudos
Lab Overview
In this lab, we will learn to:
Configure Solr to store indexes in HDFS
Create a Solr cluster of 2 Solr instances running on ports 8983 and 8984
Index documents in HDFS using the Hadoop connectors
Use Solr to search documents
Pre-Requisite
The lab is designed for the HDP Sandbox. Download the HDP Sandbox here, import it into VMware Fusion and start the VM.
LAB
Step 1 - Log into Sandbox
After it boots up, find the IP address of the VM and add an entry into your machine's hosts file, e.g.:
192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry:
ssh root@sandbox.hortonworks.com
If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the below to install HDP Search:
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
If running on the HDP 2.3 sandbox, run the below:
chown -R solr:solr /opt/lucidworks-hdpsearch
Run the remaining steps as the solr user:
su solr
Step 2 - Configure Solr to store index files in HDFS
For the lab, we will use the schemaless configuration that ships with Solr.
Schemaless configuration is a set of Solr features that allow one to index documents without pre-specifying their schema. A sample schemaless configuration can be found in the directory /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs
Let's create a copy of the sample schemaless configuration and modify it to store indexes in HDFS:
cp -R /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs
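To see the schemaless behavior in action later (once the "labs" collection is created in Step 4), you can post a document with fields Solr has never seen and let it guess the field types. A sketch with a made-up document (the field names and values below are purely illustrative):
curl "http://sandbox.hortonworks.com:8983/solr/labs/update/json/docs?commit=true" \
  -H 'Content-type:application/json' \
  -d '{"id":"schemaless-test-1","title":"hello schemaless","qty":3}'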
Open /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf/solrconfig.xml in your favorite editor and make the following changes:
1- Replace the section:
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
</directoryFactory>
with:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://sandbox.hortonworks.com/user/solr</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
2- Set the lockType to:
<lockType>hdfs</lockType>
3- Save and exit the file
Step 3 - Start 2 Solr instances in solrcloud mode
mkdir -p ~/solr-cores/core1
mkdir -p ~/solr-cores/core2
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core1
cp /opt/lucidworks-hdpsearch/solr/server/solr/solr.xml ~/solr-cores/core2
#you may need to set JAVA_HOME
#export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr start -cloud -p 8983 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core1
/opt/lucidworks-hdpsearch/solr/bin/solr start -cloud -p 8984 -z sandbox.hortonworks.com:2181 -s ~/solr-cores/core2
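To confirm both instances came up before moving on, the solr script's status command can be used (a sketch; it lists the running Solr nodes and their cloud state):
/opt/lucidworks-hdpsearch/solr/bin/solr status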
Step 4 - Create a Solr collection named "labs" with 2 shards and a replication factor of 2
/opt/lucidworks-hdpsearch/solr/bin/solr create -c labs -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n labs -s 2 -rf 2
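Because the collection uses the HDFS-backed configset, its index directories should now appear under solr.hdfs.home; a quick check (a sketch, run as the solr user; the exact subdirectory names include the generated core names):
hadoop fs -ls /user/solr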
Step 5 - Validate that the labs collection got created
Using the browser, visit http://sandbox.hortonworks.com:8983/solr/#/~cloud. You should see the labs collection with 2 shards, each with a replication factor of 2.
Step 6 - Load documents to HDFS
Upload the sample csv file to HDFS. We will index the file with Solr using the Solr Hadoop connectors:
hadoop fs -mkdir -p csv
hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv csv/
Step 7 - Index documents with Solr using the Solr Hadoop Connector
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c labs -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk localhost:2181
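Once the job completes, a quick command-line check of the indexed document count (a sketch; numFound in the JSON response should match the number of rows in books.csv):
curl "http://sandbox.hortonworks.com:8983/solr/labs/select?q=*:*&rows=0&wt=json&indent=true"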
Step 8 - Search indexed documents
Search the indexed documents. Using the browser, visit the url http://sandbox.hortonworks.com:8984/solr/labs/select?q=*:* and you will see the search results.
Step 9 - Lab Complete
You have successfully completed the lab and learnt how to:
Store Solr indexes in HDFS
Create a Solr cluster
Index documents in HDFS using the Solr Hadoop connectors
10-08-2015 06:35 PM
18 Kudos
There have been a number of questions recently on using AD/IPA with HDP 2.3 security:
How to setup Active Directory/IPA?
How to setup the cluster OS to recognize users from AD using SSSD?
How to enable Kerberos for authentication?
How to install Ranger for authorization/audit, setup plugins for HDFS, Hive, HBase, Kafka, Storm, Yarn and Knox, and test these components on a kerberized cluster?
How to sync Ranger users/groups with AD/IPA?
How to integrate Knox with AD/IPA?
How to setup encryption at rest with Ranger KMS?
To help answer some of these questions, the partner team has prepared cheatsheets on security workshops. These are living materials with sample code snippets which are being updated/enhanced per feedback from the field, so rather than replicate the materials here, the latest materials can be referenced at the GitHub repo linked from here: https://community.hortonworks.com/repos/4465/workshops-on-how-to-setup-security-on-hadoop-using.html
To help get started with security, we have also made available secured sandbox and LDAP VMs built by running through the above steps. Note that these are unofficial; for the final word on security with HDP, the official docs should be referenced at http://docs.hortonworks.com. For example:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_Ambari_Security_Guide/content/ch_amb_sec_guide.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/ch_overview_ranger_ambari_install.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_KMS_Admin_Guide/content/ch_ranger_kms_overview.html
For help with the workshop materials, please use GitHub issues: https://github.com/abajwa-hw/security-workshops/issues
10-08-2015 05:21 PM
4 Kudos
There are steps and code for a working Kafka to Storm to HBase example on HDP 2.3 in the 3-part tutorial series here, which may help:
http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/
http://hortonworks.com/hadoop-tutorial/ingesting-processing-real-time-events-apache-storm/
http://hortonworks.com/hadoop-tutorial/real-time-data-ingestion-hbase-hive-using-storm-bolt/
In the sample code provided above, the hbase-site.xml was packaged into the uber jar by adding the below to the pom.xml:
<resources>
<resource>
<directory>/etc/hbase/conf</directory>
<includes>
<include>hbase-site.xml</include>
</includes>
</resource>
<resource>
<directory>/opt/TruckEvents/Tutorials-master/src/main/resources</directory>
</resource>
</resources>
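A quick way to verify the packaging worked is to list the uber jar's contents and confirm hbase-site.xml made it in (a sketch; the jar name below is hypothetical and depends on your build):
# hypothetical artifact name - substitute whatever your build produces
jar tf target/Tutorials-1.0-SNAPSHOT.jar | grep hbase-site.xml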
10-08-2015 04:29 PM
Note that you would need to add this user to the list of sudoers first, which the documentation hadn't mentioned. I ran into the same issue while building the Ambari service. See https://issues.apache.org/jira/browse/NIFI-930
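For reference, a minimal sketch of granting a service user sudo rights (the user name below is hypothetical; adjust it to whatever user the service runs as, and prefer editing via visudo):
# hypothetical user name "nifi"
echo "nifi ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/nifi
chmod 440 /etc/sudoers.d/nifi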
10-05-2015 05:46 PM
This option (dropping jars in /usr/hdp/current/hive-server2/auxlib) may not work for them because they have about 800 jars, which in turn load shared libs. The way they currently manage this is by using an uber jar whose manifest's Class-Path entry has references to relative paths of our jars. The relative paths work because the uber jar resides in one of their own installation directories, which won't happen when the uber jar is in the cluster's installation directory. Copying so many jars to the cluster installation will be impractical for admins of joint customers. Is there no way to use ADD JAR or set
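For context, a sketch of how such a manifest-based classpath can be inspected (the jar name is hypothetical; Class-Path entries in MANIFEST.MF resolve relative to the directory containing the uber jar, which is why they break once the jar is copied into the cluster's installation directory):
# hypothetical jar name
unzip -p their-uber.jar META-INF/MANIFEST.MF | grep -i -A 2 Class-Path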