Member since 10-21-2015 · 26 Posts · 77 Kudos Received · 3 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 626 | 06-29-2018 10:07 PM
 | 685 | 10-12-2017 08:22 PM
 | 10608 | 06-29-2017 08:26 PM
10-23-2018
06:50 AM
Please see the latest doc on HDP Search 4.0 at https://docs.hortonworks.com/HDPDocuments/HDPS/HDPS-4.0.0/bk_solr-search-installation/content/ch_hdp-search.html . In case you are interested, here is an article with details on how to add Solr to an HDP-3 cluster: https://community.hortonworks.com/articles/224593/hdp-search-40-deployment-and-basic-connector-usage.html. Hope this helps.
10-19-2018
08:55 PM
6 Kudos
HDP Search provides the tools to index data from your HDP cluster into Solr. You can use the connectors shipped with HDP Search to index data from HDFS, Hive tables, and Spark DataFrames into Solr. Once your data is in Solr, searching and querying become simpler. You can find the official Hortonworks documentation for HDP Search 4.0 here: https://docs.hortonworks.com/HDPDocuments/HDPS/HDPS-4.0.0/bk_solr-search-installation/content/ch_hdp-search.html
This article shows how to set up your HDP cluster with HDP Search using the Solr management pack shipped by Hortonworks. It also covers how to use the Hive, Spark, and HDFS connectors to index data into Solr and query it. (This document assumes you already have an Ambari-2.7.0 + HDP-3.0 cluster up and running.)
Setup
Install Management Pack:
Download Solr service mpack on Ambari server node:
wget http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz
Install mpack:
ambari-server install-mpack --mpack=solr-service-mpack-4.0.0.tar.gz
Restart ambari-server:
ambari-server restart
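If you want to double-check that the mpack was picked up before restarting, one optional check (a sketch, assuming a default Ambari server installation layout) is to list the directory where management packs are unpacked:
# installed management packs are unpacked under the Ambari server resources directory (default path assumed)
ls /var/lib/ambari-server/resources/mpacks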
Add Solr to HDP Cluster:
Navigate to and log in to the Ambari UI
Choose Services > Add Service from the left navigation panel
Choose Solr on the Choose Services page and click Next
On the Assign Masters page, choose the number of Solr servers you need for your cluster (choose 2 or more for SolrCloud) and click Next. No changes are required on the Customize Services page. Next you land on the Review page, where you can verify the hosts and the Solr package.
If this looks good, click Deploy to get Solr added to your cluster.
You can open the Solr UI directly from Quick Links. If Kerberos is enabled, you have to enable SPNEGO authentication to access the Solr UI (instructions are in the HDP Search doc linked above). Now you have your HDP Search cluster up and running!
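On a Kerberized cluster, a quick sanity check (a sketch, assuming you already have a valid ticket for a user allowed to reach Solr) is to list the collections over SPNEGO with curl:
kinit <your_principal>
curl -i --negotiate -u : "http://<solr_host>:8983/solr/admin/collections?action=LIST&wt=json"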
Connector Usage
The following shows how you can use the connectors shipped with HDP Search to index data into Solr and query it via the Solr UI. Hive, Spark, and HDFS connector usage is covered below.
Hive Connector:
For the Hive connector to work in this version (HDP Search 4.0), the SerDe jar must be on Hive's classpath. You can do this as follows:
Create an 'auxlib' directory in /usr/hdp/current/hive-server2:
mkdir /usr/hdp/current/hive-server2/auxlib
Copy the SerDe jar to auxlib:
cp /opt/lucidworks-hdpsearch/hive/solr-hive-serde-4.0.0.jar /usr/hdp/current/hive-server2/auxlib/
Restart Hive
Create a Collection:
You can use the command below to create a new 'hivecollection' in Solr with 2 shards and a replication factor of 1:
curl -X GET "http://<solr_host>:8983/solr/admin/collections?action=CREATE&name=hivecollection&numShards=2&replicationFactor=1"
Index Data
Since this is an example, we will first create a table in Hive whose data we want indexed in Solr (you can skip this step for your real data). Then we create an external table backed by Solr and proceed with indexing.
Create Table in Hive and insert data:
As the hive user, connect to Beeline (run kinit before connecting if it is a secure cluster).
CREATE TABLE books (id STRING, cat STRING, title STRING, price FLOAT, in_stock BOOLEAN, author STRING, series STRING, seq INT, genre STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Load data from books.csv in the example directory (/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv)
LOAD DATA LOCAL INPATH '/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv' OVERWRITE INTO TABLE books;
Update the books table (if needed) so that the header line with column names is skipped: ALTER TABLE books SET TBLPROPERTIES ("skip.header.line.count"="1");
Create External Table for Solr and index data to Solr:
CREATE EXTERNAL TABLE solr_sec (id STRING, cat_s STRING, title_s STRING, price_f STRING, in_stock_b STRING, author_s STRING, series_s STRING, seq_i INT, genre_s STRING) STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler' LOCATION '/tmp/solr' TBLPROPERTIES('solr.zkhost' = '<zk_connection_string>', 'solr.collection' = 'hivecollection', 'solr.query' = '*:*');
If this is a secure cluster, you have to provide the path to a jaas-client.conf file that contains the service principal and keytab of a user with permission to read from and write to Solr and Hive.
In that case, append 'lww.jaas.file' = '/tmp/jaas-client.conf' to the TBLPROPERTIES of the CREATE EXTERNAL TABLE command above, as in the sketch below.
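For reference, the secured variant of the statement above would look like the following (a sketch; adjust <zk_connection_string> and the jaas path for your cluster, and drop the earlier solr_sec table first if you already created it):
CREATE EXTERNAL TABLE solr_sec (id STRING, cat_s STRING, title_s STRING, price_f STRING, in_stock_b STRING, author_s STRING, series_s STRING, seq_i INT, genre_s STRING) STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler' LOCATION '/tmp/solr' TBLPROPERTIES('solr.zkhost' = '<zk_connection_string>', 'solr.collection' = 'hivecollection', 'solr.query' = '*:*', 'lww.jaas.file' = '/tmp/jaas-client.conf');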
A sample jaas-client.conf looks like:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/security/keytabs/smokeuser.headless.keytab"
storeKey=true
useTicketCache=false
debug=true
principal="ambari-qa@EXAMPLE.COM";
};
The file should be owned by solr:hadoop and needs to be copied to all nodes where a NodeManager is running.
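A minimal way to push the file out and set the ownership could look like the loop below (a sketch; the host names are placeholders for your NodeManager nodes, and passwordless ssh as root is assumed):
for host in <nm_host_1> <nm_host_2>; do
  scp /tmp/jaas-client.conf ${host}:/tmp/jaas-client.conf   # copy the jaas config to each NodeManager node
  ssh ${host} "chown solr:hadoop /tmp/jaas-client.conf"     # set the expected ownership
done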
Insert the data you want indexed from the books table into the Solr external table:
INSERT OVERWRITE TABLE solr_sec SELECT b.* FROM books b;
You can issue a SELECT query to verify that the data was inserted.
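For instance, a quick sanity check from Beeline (the column list and row limit are arbitrary):
SELECT id, title_s, author_s FROM solr_sec LIMIT 5;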
Now you should be able to see your data in the Solr 'hivecollection' and search for whatever you are looking for using the Solr UI or API calls.
Querying Data via Solr UI:
You can also issue the call below from the cluster command line:
curl -v -i --negotiate -u : "http://<solr_host>:8983/solr/hivecollection/select?q=*:*&wt=json&indent=true"
Change query ‘q’ based on what you want to look for.
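For example, to search the indexed books by author (the author_s field comes from the external table definition above; the author value is a placeholder):
curl -v -i --negotiate -u : "http://<solr_host>:8983/solr/hivecollection/select?q=author_s:<author_name>&wt=json&indent=true"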
Spark Connector:
To use the Spark connector, you first need to build the spark-solr jar from source.
From /opt/lucidworks-hdpsearch/spark/spark-solr:
mvn clean package -DskipTests
This will create the spark-solr jars in the target directory.
Now start the Spark shell with the shaded jar:
/usr/hdp/current/spark2-thriftserver/bin/spark-shell --jars ./spark-solr-3.5.6-shaded.jar
If this is a secure cluster, also pass jaas-client.conf to the driver and executors with --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/tmp/jaas-client.conf' --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/tmp/jaas-client.conf' (see the Hive section for details on jaas-client.conf). A complete launch command is sketched below.
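Putting that together, a secure-cluster launch could look like the following sketch (it assumes your current directory contains the shaded jar built above):
/usr/hdp/current/spark2-thriftserver/bin/spark-shell --jars ./spark-solr-3.5.6-shaded.jar --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/tmp/jaas-client.conf' --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/tmp/jaas-client.conf'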
Create Collection:
Just as we created the hivecollection, create a 'sparkcollection':
curl -X GET "http://<solr_host>:8983/solr/admin/collections?action=CREATE&name=sparkcollection&numShards=2&replicationFactor=1"
Index data to Solr:
(Below is the same example you will find in the Lucidworks documentation.)
The CSV file used in this sample is linked from the Lucidworks documentation. Move the CSV file to the /tmp HDFS directory and read it into a Spark DataFrame (the original post showed this step as a screenshot; see the sketch below).
After writing the DataFrame to Solr, commit the updates with: curl -X GET "http://<solr_host>:8983/solr/sparkcollection/update?commit=true"
Run a query in the Solr UI to validate the setup: *:* returns all 999 indexed docs, and a query for a particular pickup location returns one document.
Reading data from Solr
The original post showed two read-backs from Spark as screenshots: one selecting tip and fare amounts, the other selecting total amount and toll amount (see the sketch below).
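A minimal spark-shell sketch of the write and read steps follows; the CSV path, the gen_uniq_key option, and the column names (tip_amount, fare_amount) are assumptions based on the Lucidworks example and should be adjusted to your file:
// read the sample CSV from HDFS into a DataFrame (a header row is assumed)
val csvDF = spark.read.option("header", "true").option("inferSchema", "true").csv("hdfs:///tmp/<taxi_sample>.csv")
// index the DataFrame into 'sparkcollection' through the spark-solr data source
csvDF.write.format("solr").option("zkhost", "<zk_connection_string>").option("collection", "sparkcollection").option("gen_uniq_key", "true").mode("overwrite").save()
// read the collection back and inspect a couple of fields
val solrDF = spark.read.format("solr").option("zkhost", "<zk_connection_string>").option("collection", "sparkcollection").load()
solrDF.select("tip_amount", "fare_amount").show(5)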
HDFS Connector:
The HDFS connector provides the ability to index files of the following formats or contents:
CSV
Zip
War
Sequence
XML
SolrXML
Directories
Regex (lets you define a regular expression over the incoming data and filter content)
Grok (indexes incoming data based on a grok configuration)
The job jar (which is the HDFS connector) ships different IngestMappers to handle these file types and formats. As an example, let's index the same books.csv we used in the Hive example with the CSVIngestMapper. This assumes you have already created a collection, as explained in the examples above.
Suppose books.csv resides in /user/solr/csvDir in HDFS. The first command below indexes this data into Solr; the remaining commands show the other IngestMapper types. On a secure cluster, add -Dlww.jaas.file=/tmp/jaas-client.conf to the commands and run kinit as needed.
CSVIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -DcsvDelimiter=@ -DcsvFirstLineComment=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c csvCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/csvDir/* -zk <zk_connection_string>
RegexIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex=".([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])." -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields="0=data,1=ip1,2=ip2,3=ip3,4=ip4" -cls com.lucidworks.hadoop.ingest.RegexIngestMapper -c regexCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/regexDir/* -zk <zk_connection_string>
GrokIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -Dgrok.uri=/tmp/grok.conf -cls com.lucidworks.hadoop.ingest.GrokIngestMapper -c grokCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/grokDir/*8.log -zk <zk_connection_string>
Sample grok.conf:
input { stdin { type => example } }
filter {
  grok {
    match => [ "message", "%{IP:ip} %{WORD:log_code} %{GREEDYDATA:log_message}" ]
    add_field => [ "received_from_field", "%{ip}" ]
    add_field => [ "message_code", "%{log_code}" ]
    add_field => [ "message_field", "%{log_message}" ]
  }
}
output { stdout { codec => rubydebug } }
SequenceFileIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SequenceFileIngestMapper -c seqCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/seqDir/*.seq -zk <zk_connection_string>
SolrXMLIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c solrxmlCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/solrXmlDir/* -zk <zk_connection_string>
WarcIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.WarcIngestMapper -c warcCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/warcDir/* -zk <zk_connection_string>
ZipIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.ZipIngestMapper -c zipCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/test-zip/* -zk <zk_connection_string>
DirectoryIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c dirCollection -i /user/solr/test-documents/hadoop-dir/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk <zk_connection_string>
XMLIngestMapper
hadoop jar solr-hadoop-job-4.0.0.jar com.lucidworks.hadoop.ingest.IngestJob -Dlww.xml.start=root -Dlww.xml.end=root -Dlww.jaas.file=/tmp/jaas-client.conf -Dlww.xml.docXPathExpr=//doc -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.XMLIngestMapper -c xmlCollection -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -i /user/solr/xmlDir/* -zk <zk_connection_string>
You should be able to query these collections via the Solr UI or the APIs, just as in the previous examples.
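For example, to spot-check the CSV ingest above (the collection name matches the -c flag used in that command; drop --negotiate on a non-secure cluster):
curl -i --negotiate -u : "http://<solr_host>:8983/solr/csvCollection/select?q=*:*&wt=json&indent=true"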
Labels: Data Processing, hdpsearch, How-ToTutorial, solr, solrcloud
08-27-2018
08:54 PM
6 Kudos
@Anurag Mishra In case you haven't found a solution: did you install the mpack and restart Ambari afterwards? Once the mpack is installed and ambari-server is restarted, you should be able to see Solr on the Choose Services page of the Ambari Install or Add Service wizard. Please see if this helps.
06-29-2018
10:07 PM
7 Kudos
Could you try escaping it with \u002f? So if you want a/b, try using a\\u002fb.
10-12-2017
08:22 PM
6 Kudos
Could you please try /api/v1/clusters/clusterName/hosts/hostName/host_components/componentName, adding the service and host name in the request body: {"{SERVICE}", serviceName, "{HOST}", hostName}. This article has sample calls that walk you through the process: https://cwiki.apache.org/confluence/display/AMBARI/Adding+a+New+Service+to+an+Existing+Cluster. Hope this helps.
09-22-2017
02:44 AM
2 Kudos
Thanks @kramakrishnan. This works!
09-21-2017
10:19 PM
9 Kudos
I am looking for an API call with which I can get the property differences between two config versions of a service. It would be great if it could also be extended to config versions in a config group.
Labels: Apache Ambari
06-29-2017
08:37 PM
6 Kudos
I did an Ambari setup with AD and synced a few users. I could see them synced correctly via Ambari. I followed https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/config-secure-prod-ad.html to configure Zeppelin with this AD. But after this is done, I am not able to log in to the Zeppelin UI, even with the default admin/admin username/password. The following error is seen in the Zeppelin logs. Any clue what could have gone wrong? Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903C8, comment: AcceptSecurityContext error, data 52e, v2580]
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3136)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3082)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2883)
at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2797)
at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:192)
at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:210)
at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:153)
at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:83)
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
at javax.naming.InitialContext.init(InitialContext.java:244)
at javax.naming.ldap.InitialLdapContext.<init>(InitialLdapContext.java:154)
at org.apache.shiro.realm.ldap.JndiLdapContextFactory.createLdapContext(JndiLdapContextFactory.java:508)
at org.apache.shiro.realm.ldap.JndiLdapContextFactory.getLdapContext(JndiLdapContextFactory.java:495)
at org.apache.shiro.realm.ldap.JndiLdapRealm.queryForAuthenticationInfo(JndiLdapRealm.java:375)
at org.apache.shiro.realm.ldap.JndiLdapRealm.doGetAuthenticationInfo(JndiLdapRealm.java:295)
Labels: Apache Zeppelin
06-29-2017
08:26 PM
5 Kudos
@aswathy, by default you should be able to log in with admin/admin. Could you please try
05-02-2017
06:29 PM
1 Kudo
@cduby, which version of HDCloud are you using?
03-27-2017
07:57 PM
2 Kudos
Thanks @Ram Venkatesh. After registering the metastore I can see the named entry in my JSON. Thank you!
03-27-2017
07:56 PM
1 Kudo
@Dominika Bialek, yeah, as @Ram Venkatesh mentioned below, after I register the metastore I can see the named entry. Thank you!
03-27-2017
04:29 PM
2 Kudos
My CLI JSON doesn't have the name of the RDS either.
03-27-2017
05:06 AM
7 Kudos
My aim is to save a template from the HDC UI and reuse it to create a cluster via the CLI. So I chose the 'Create Cluster' option in the HDC UI, entered all the required fields, chose 'Register new Hive Metastore', and added a name, JDBC connection string, username, and password for an existing RDS instance. I clicked Create Cluster and chose 'SHOW CLI JSON' (I also tried saving the template). But I am not able to see any parameters for this Hive Metastore. Could you please let me know if that is expected or if I am missing something?
Labels: Hortonworks Cloudbreak
03-25-2017
06:20 PM
Yes, hdc list-cluster-types works, @Tamas Bihari. Thank you!
03-24-2017
06:31 PM
3 Kudos
I brought up a cloud controller using the AWS CLI. After ssh-ing to the controller instance, I downloaded the hdc CLI and configured hdc with the server address, username, and password. Now the attempt to create a cluster fails with a "Blueprint not found" error. If I just log in to the HDC UI once and then retry the same command via the hdc CLI, it goes through fine. I checked the ~/.hdc/config file; server, user, and password are all correct. Has anyone faced a similar issue, or any clue what I may be doing wrong? Which log files can I check to get more info on what is happening when a command is issued on the CLI? ./hdc create-cluster -cli-input-json /tmp/hdc-cli-deploy.json --wait true
ERROR: status code: 404, message: Blueprint 'EDW-ETL: Apache Hive 1.2.1, Apache Spark 2.1' not found.
Labels: Apache Hive, Apache Spark
03-22-2017
04:10 PM
Thank you, @khorvath!
03-22-2017
05:10 AM
4 Kudos
I could download the HDC CLI tar/tgz file from the HDC UI. Is there a corresponding API call or command to do the same once the controller is up?
Labels: Hortonworks Cloudbreak
03-16-2017
11:36 PM
1 Kudo
Thanks @bbihari. This works!
03-16-2017
09:27 PM
1 Kudo
I can see it from the HDC UI, but I wanted to get the same from the Cloudbreak shell too. I want to select a stack from the shell and see which blueprint is used by that stack. I will bring up a cluster, try the steps mentioned by @bbihari below, and respond.
03-13-2017
07:43 PM
1 Kudo
Thanks @Ayub Khan. This helps me get the list of all blueprints and choose a blueprint for deployment. But once a deployment is completed, how can I see the blueprint name that was used to deploy that stack?
03-13-2017
07:24 PM
1 Kudo
I installed the cloud controller and created a stack by choosing a cluster type from the HDC UI. I do see the blueprint name I selected on the HDC UI. How can I find the same info via the Cloudbreak shell?
Labels: Hortonworks Cloudbreak
03-12-2017
05:20 AM
1 Kudo
@Matt Foley Thanks for your reply. I have not seen any issues yet, but I was wondering how I can make sure it has indeed downloaded all the client configs. One way would be to understand which config files belong to a service's client; hence the question. Yeah, it was out of curiosity 🙂
03-09-2017
06:56 PM
1 Kudo
@Xiaoyu Yao, thanks for the reply. What I was looking for is how to identify client-specific files in a conf dir. For example, in /etc/hadoop/conf we have multiple config files. How can I find which ones are server-specific and which ones are client-specific?
03-09-2017
03:42 PM
4 Kudos
On a cluster I am downloading the client configs via the Ambari UI/API. How can I make sure all the client configs are actually downloaded? In other words, how do I know which config files belong to a client component?
Labels: Apache Ambari