Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

index nested structure


I have an Avro input source that goes through a morphline into Solr. For example, take the following structure:

 

{
    "username" : "alex",
    "date" : "21-08-2014",
    "attachments" : [
        {
            "documents" : [
                {
                    "title" : "test",
                    "tags" : [ "a", "b", "c" ]
                },
                {
                    "optional1" : "test2",
                    "title" : "test2"
                }
            ],
            "source" : "school"
        }
    ]
}

 

I can extract fields with extractAvroPaths, like so:

 

...
{
  extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username       # this works fine
      /my_attachments : "/attachments[]"
      /my_documents : "/attachments[]/documents[]"
    }
  }
}
.....

 

The problem is that /my_attachments and /my_documents now contain raw JSON/Avro structures instead of single field values. How would I go about 'unwrapping' these fields so that they all become part of one Solr document, while still retaining the context of the document they belong to?

 

1 ACCEPTED SOLUTION


To answer my own question: no, this is not possible at this time. Solr has only supported nested documents since 4.5, and CDH 5.1 currently ships Solr 4.4. Even if this becomes available in a future release, the question will be whether it can be easily integrated and used with Kite's morphlines.
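
To illustrate why a single flat document falls short here, below is a minimal sketch in plain Python. It is not morphline or Solr code; the flatten helper and the underscore-joined field names are made up for the illustration. It collapses the example record into multi-valued fields, roughly what any flat schema has to do:

```python
# Illustration only: plain Python, not morphline or Solr code.
# The record mirrors the example structure from the question.
record = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def flatten(value, prefix="", out=None):
    """Collect every leaf value under a path-like field name (hypothetical helper).

    Lists are walked without recording the index, so values from
    different array elements end up in the same multi-valued field.
    """
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}_{key}" if prefix else key, out)
    elif isinstance(value, list):
        for child in value:
            flatten(child, prefix, out)
    else:
        out.setdefault(prefix, []).append(value)
    return out


flat = flatten(record)
print(flat["attachments_documents_title"])  # ['test', 'test2']
print(flat["attachments_documents_tags"])   # ['a', 'b', 'c']
```

After flattening, nothing records that the tags a, b and c belong to the document titled "test" and not to "test2". That association is the context that nested documents are meant to preserve.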

 

To get the job done I had to switch to Elasticsearch, which does support nested documents, using Flume's ElasticSearchSink. Flume's official documentation on Elasticsearch and Avro is lacking, and I had to patch the Flume code to get it working with the UTF-8 charset and JSON, but it works nonetheless. I hope I can move this dataflow to the better-integrated SolrCloud in the future.
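
For reference, here is a rough sketch of the Elasticsearch side, again in plain Python. This is not the Flume patch mentioned above, and how the mapping gets installed is left out. It assumes the field names from the example record, and to_utf8_json is a hypothetical helper; the only Elasticsearch feature relied on is the "nested" field type:

```python
import json

# Hypothetical mapping fragment. Declaring "attachments" and "documents" as
# "nested" asks Elasticsearch to index each array element as its own hidden
# sub-document, so a title stays associated with its own tags.
# All other fields are left to dynamic mapping in this sketch.
mapping_properties = {
    "attachments": {
        "type": "nested",
        "properties": {
            "documents": {"type": "nested"},
        },
    }
}


def to_utf8_json(obj):
    """Serialize to JSON bytes in UTF-8, keeping non-ASCII characters as-is."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


body = to_utf8_json(mapping_properties)
print(body.decode("utf-8"))
```

Passing ensure_ascii=False and encoding explicitly keeps the payload as real UTF-8 text instead of \u escapes. Either form is valid JSON, so this is a readability choice.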

 
