Support Questions

index nested structure

I have an Avro input source that goes through a morphline into Solr. Take, for example, the following structure:

{
    "username" : "alex",
    "date" : "21-08-2014",
    "attachments" : [
        {
            "documents" : [
                {
                    "title" : "test",
                    "tags" : [ "a", "b", "c" ]
                },
                {
                    "optional1" : "test2",
                    "title" : "test2"
                }
            ],
            "source" : "school"
        }
    ]
}

I can extract the fields with extractAvroPaths, like so:

...
{
  extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username       # this works fine
      /my_attachments : "/attachments[]"
      /my_documents : "/attachments[]/documents[]"
    }
  }
}
.....

The problem is that /my_attachments and /my_documents now contain raw JSON/Avro structures instead of single field values. How would I go about 'unwrapping' these fields so that they all become part of one Solr document, while still retaining the context of the document they belong to?
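
To make the "retaining context" problem concrete, here is a minimal plain-Python sketch (not a morphline, and not part of the original post) using the record shape above. It contrasts two ways of flattening: collecting everything into parallel multi-valued fields, which lets a record match `title:test2 AND tags:a` even though no single document has both, and denormalizing into one flat Solr-style document per nested document with the parent fields copied onto each. The field names `id`, `parent_id`, `doc_title`, `doc_tags` and the id scheme are hypothetical.

```python
from typing import Any

# Sample record shaped like the Avro structure above (hypothetical Python form).
RECORD: dict[str, Any] = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def flatten_multivalued(record: dict[str, Any]) -> dict[str, list[Any]]:
    """Collapse all nested documents into parallel multi-valued fields (loses context)."""
    titles: list[str] = []
    tags: list[str] = []
    for attachment in record["attachments"]:
        for doc in attachment["documents"]:
            titles.append(doc["title"])
            tags.extend(doc.get("tags", []))
    return {"username": [record["username"]], "title": titles, "tags": tags}


def denormalize(record: dict[str, Any], record_id: str) -> list[dict[str, Any]]:
    """Emit one flat Solr-style document per nested document, copying parent fields."""
    solr_docs = []
    for a_idx, attachment in enumerate(record["attachments"]):
        for d_idx, doc in enumerate(attachment["documents"]):
            solr_docs.append({
                "id": f"{record_id}-{a_idx}-{d_idx}",  # hypothetical id scheme
                "parent_id": record_id,                # hypothetical link back to the record
                "username": record["username"],
                "date": record["date"],
                "source": attachment["source"],
                "doc_title": doc["title"],
                "doc_tags": doc.get("tags", []),
            })
    return solr_docs


flat = flatten_multivalued(RECORD)
# False positive: "test2" and "a" come from two different nested documents.
print("test2" in flat["title"] and "a" in flat["tags"])  # True

docs = denormalize(RECORD, "rec1")
# Per-document matching keeps each title with its own tags, so no false positive.
print(any(d["doc_title"] == "test2" and "a" in d["doc_tags"] for d in docs))  # False
print(len(docs))  # 2
```

The trade-off of denormalizing is that a search returns one hit per nested document, so results would need to be grouped by the hypothetical `parent_id` field (for example with Solr's result grouping) to get back to one hit per record.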

 

Re: index nested structure

To answer my own question: no, this is not possible at this time. Solr only gained support for nested documents in version 4.5, and CDH 5.1 ships Solr 4.4. Even if this becomes available in a future release, the question remains whether it can easily be integrated with Kite Morphlines.
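
For reference, on a Solr release that does support nested documents, the record would be indexed as one parent document with child documents, which block-join queries can then search. The sketch below only builds the JSON update payload in Python; it assumes Solr's JSON update format with the `_childDocuments_` key, plus a hypothetical `doc_type` field used to tell parents from children (block-join queries need such a parent filter). The exact syntax should be checked against the Solr version in use; none of this applies to the Solr 4.4 shipped with CDH 5.1.

```python
import json
from typing import Any

# Sample record shaped like the Avro structure in the question (hypothetical Python form).
RECORD: dict[str, Any] = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def to_nested_solr(record: dict[str, Any], record_id: str) -> dict[str, Any]:
    """Build one parent document whose nested documents become anonymous children."""
    children = []
    for a_idx, attachment in enumerate(record["attachments"]):
        for d_idx, doc in enumerate(attachment["documents"]):
            child = {
                "id": f"{record_id}-{a_idx}-{d_idx}",  # hypothetical id scheme
                "doc_type": "document",                # hypothetical child marker
                "source": attachment["source"],
                "title": doc["title"],
            }
            # Copy optional fields only when this nested document has them.
            for key in ("tags", "optional1"):
                if key in doc:
                    child[key] = doc[key]
            children.append(child)
    return {
        "id": record_id,
        "doc_type": "record",  # hypothetical parent marker, used as the block-join parent filter
        "username": record["username"],
        "date": record["date"],
        "_childDocuments_": children,
    }


# Body for a JSON update request (a list of documents).
print(json.dumps([to_nested_solr(RECORD, "rec1")], indent=2))

# A block-join query over this layout could then look like:
#   q={!parent which="doc_type:record"}+title:test2 +tags:a
```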

 

To get the job done, I switched to Elasticsearch, which does support nested documents, and used Flume's ElasticSearchSink. Flume's official documentation on Elasticsearch and Avro is lacking, and I had to patch the Flume code to get it working with the UTF-8 charset and JSON, but it works nonetheless. I hope to move this dataflow to the better-integrated SolrCloud in the future.
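
In Elasticsearch, keeping each document's fields together comes from mapping the arrays as `nested` and querying them with `nested` queries. The sketch below only builds the request bodies as Python dicts; the field names follow the record in the question, leaf fields are assumed to be handled by dynamic mapping, and the envelope the `properties` section sits in (for example whether a mapping type name wraps it) depends on the Elasticsearch version, so treat the layout as an assumption.

```python
import json
from typing import Any

# Declares both levels of the record as nested so each document's fields stay together.
# Leaf fields (title, tags, source, ...) are left to dynamic mapping here, an assumption.
NESTED_PROPERTIES: dict[str, Any] = {
    "attachments": {
        "type": "nested",
        "properties": {
            "documents": {"type": "nested"},
        },
    },
}


def build_nested_query(title: str, tag: str) -> dict[str, Any]:
    """Match records that have ONE nested document with both this title and this tag."""
    inner = {
        "bool": {
            "must": [
                {"match": {"attachments.documents.title": title}},
                {"match": {"attachments.documents.tags": tag}},
            ]
        }
    }
    # Multi-level nesting: an outer nested query per attachment, an inner one per document.
    return {
        "query": {
            "nested": {
                "path": "attachments",
                "query": {
                    "nested": {"path": "attachments.documents", "query": inner}
                },
            }
        }
    }


print(json.dumps({"properties": NESTED_PROPERTIES}, indent=2))
print(json.dumps(build_nested_query("test2", "a"), indent=2))
```

Against the sample record, the query for `test2` and `a` should not match, because no single document carries both values, while a query for `test` and `a` should.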

 
