Indexing multiple tables and declaring multiple collections in morphline.conf

New Contributor

I have a separate Solr collection for each of several HBase tables. Each collection is indexed correctly when it is the one defined in morphline.conf, but I am not sure how to define multiple collections there. Cloudera Manager uses a single morphline.conf file, so I am able to declare only one collection in it. My morphline.conf is shown below.

 

I have tried a couple of things, but neither worked:

Adding another locator, SOLR_LOCATOR2, at the end of the file.

Declaring multiple collections in a single collection entry, separated by commas, e.g. collection : demoTable1_collection,demoTable2_collection

 

SOLR_LOCATOR : {
  # Name of solr collection
  collection : demoTable2_collection

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

Re: Indexing multiple tables and declaring multiple collections in morphline.conf

Super Collaborator

The 'morphlineId' property for the morphlineSolrSink would allow you to specify different morphlines within the same morphline.conf to be used for multiple collections.  Are you using separate sinks for the respective collections?

 

To clarify, the above entry would be for Flume. What are you using to index the data?

-pd

Re: Indexing multiple tables and declaring multiple collections in morphline.conf

New Contributor

I am processing the data through Spark, inserting it into HBase, and indexing it with Solr, so I have several HBase tables to index. I create one collection per table, and indexing works when I put a single collection name and its fields in morphline.conf, but I don't understand how to add multiple collections. Do I have to add different morphline IDs to the same morphline.conf file?

Below is the morphline.conf:

 

SOLR_LOCATOR2 : {
  # Name of solr collection
  collection : demoTable2_collection

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

morphlines : [
{
id : morphline
importCommands : ["org.kitesdk.**", "com.ngdata.**"]

commands : [
{
extractHBaseCells {
mappings : 

Re: Indexing multiple tables and declaring multiple collections in morphline.conf

Super Collaborator
It sounds like you are using the Key-Value Store Indexer to get data from HBase into Solr, correct?

If so, you can create multiple HBase indexer instances, each with its own hbase-mapper.xml that references a different morphline ID.

Per the documentation [1], you specify the morphline ID in the XML file and use a relative path for morphlines.conf so that it uses the version managed by Cloudera Manager:

<?xml version="1.0"?>
<indexer table="record"
mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

<!-- The relative or absolute path on the local file system to the
morphline configuration file. -->
<!-- Use relative path "morphlines.conf" for morphlines managed by
Cloudera Manager -->
<param name="morphlineFile" value="morphlines.conf"/>


<param name="morphlineId" value="morphline1"/>

</indexer>

[1] https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_hbase_batch_indexer.html#conce...

Then, when you create the separate instances with the hbase-indexer add-indexer command, you reference the XML file for each indexer and specify the collection it should connect to.
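
To make the suggestion above concrete, here is a minimal sketch of what a single morphlines.conf holding two morphlines might look like. The ids morphline1 and morphline2 are chosen to match the morphlineId value in the XML above; the column families (cf1, cf2) and output fields are hypothetical placeholders, not taken from the original poster's setup, so adjust them to your own tables and Solr schemas.

```hocon
# Sketch only: column families and field names below are hypothetical.
morphlines : [
  {
    # Selected by the first indexer's mapper XML via morphlineId=morphline1
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf1:*"        # hypothetical column family of table 1
              outputField : "table1_data"  # hypothetical field in the first collection
              type : string
              source : value
            }
          ]
        }
      }
    ]
  }
  {
    # Selected by the second indexer's mapper XML via morphlineId=morphline2
    id : morphline2
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf2:*"        # hypothetical column family of table 2
              outputField : "table2_data"  # hypothetical field in the second collection
              type : string
              source : value
            }
          ]
        }
      }
    ]
  }
]
```

With a layout like this, each table would get its own mapper XML whose table attribute and morphlineId parameter point at the matching entry, and the target collection would be supplied per indexer when it is created rather than by listing several collections in one locator.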