New Contributor
Posts: 4
Registered: 04-09-2017

Indexing multiple tables and declaring multiple collections in morphline.conf

I have a separate collection for each HBase table. Each collection is indexed correctly when it is the one defined in morphline.conf, but I am not sure how to define multiple collections in that file: Cloudera Manager uses a single morphline.conf, so I can only declare one collection in it. Below is an example of my morphline.conf.

I have tried a couple of things: adding a second locator (SOLR_LOCATOR2) at the end of the file, and declaring multiple collections in a single collection entry separated by commas, e.g. collection : demoTable1_collection,demoTable2_collection. Neither worked.


SOLR_LOCATOR : {
  # Name of solr collection
  collection : demoTable2_collection

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

Cloudera Employee
Posts: 162
Registered: 01-09-2014



The 'morphlineId' property of the morphlineSolrSink lets you select one of several morphlines defined in the same morphline.conf, so each collection can be served by its own morphline. Are you using a separate sink for each collection?

To clarify, the above applies to Flume. What are you using to index the data?

-pd

New Contributor
Posts: 4
Registered: 04-09-2017


I am processing the data with Spark, inserting it into HBase, and indexing it with Solr, so I have several HBase tables to index. I create one collection per table, and indexing works when I put that collection's name and fields in morphline.conf, but I don't see how to add multiple collections. Do I have to add different morphlineIds to the same morphline.conf file?

Below is my morphline.conf:

SOLR_LOCATOR2 : {
  # Name of solr collection
  collection : demoTable2_collection

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    id : morphline
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings :

Cloudera Employee
Posts: 162
Registered: 01-09-2014


It sounds like you are using the Key-Value Store Indexer (Lily HBase Indexer) to get data from HBase into Solr, correct?

If so, you can create multiple HBase indexer instances, and the hbase-mapper.xml used by each instance can reference a different morphline ID.

As described in the documentation [1], specify the morphline ID in the XML file and use a relative path for morphlines.conf, so that the version managed by Cloudera Manager (CM) is used:

<?xml version="1.0"?>
<indexer table="record"
         mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

  <!-- The relative or absolute path on the local file system to the
       morphline configuration file. -->
  <!-- Use relative path "morphlines.conf" for morphlines managed by
       Cloudera Manager -->
  <param name="morphlineFile" value="morphlines.conf"/>

  <!-- Selects which morphline in morphlines.conf this indexer runs -->
  <param name="morphlineId" value="morphline1"/>

</indexer>

[1] https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_hbase_batch_indexer.html#conce...

Then, when you create each indexer instance with the hbase-indexer add-indexer command, reference the XML file for that collection and specify the collection to connect to.
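A minimal sketch of a single morphlines.conf that defines one morphline per table might look like the following. The collection names come from the question; the locator names SOLR_LOCATOR_1 and SOLR_LOCATOR_2, the column families cf1 and cf2, the output field data, and the second ID morphline2 are hypothetical. Each indexer (or, with Flume, each morphlineSolrSink) would pick its morphline through morphlineId. In this sketch each locator is only referenced by its morphline's sanitizeUnknownSolrFields command; the assumption is that, with the HBase indexer, the collection documents are actually written to is the one specified when running hbase-indexer add-indexer.

```hocon
# Sketch only: one morphlines.conf holding one morphline per HBase table.
# Locator names, column families and output fields are hypothetical.

SOLR_LOCATOR_1 : {
  # Collection for the first table
  collection : demoTable1_collection
  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

SOLR_LOCATOR_2 : {
  # Collection for the second table
  collection : demoTable2_collection
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    # Selected by <param name="morphlineId" value="morphline1"/>
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              # Hypothetical column family "cf1" of the first table
              inputColumn : "cf1:*"
              outputField : "data"
              type : string
              source : value
            }
          ]
        }
      }
      # Drop fields that the first collection's schema does not define
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR_1} } }
    ]
  }
  {
    # Hypothetical second morphline, selected with morphlineId = morphline2
    id : morphline2
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              # Hypothetical column family "cf2" of the second table
              inputColumn : "cf2:*"
              outputField : "data"
              type : string
              source : value
            }
          ]
        }
      }
      # Drop fields that the second collection's schema does not define
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR_2} } }
    ]
  }
]
```

With this layout, the second table's indexer XML would be the same as the one above except for its table attribute and a morphlineId value of morphline2.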
