index nested structure

RobV — Fri, 16 Sep 2022 09:05:39 GMT

I have an Avro input source going through a morphline into Solr. For example, take the following structure:

{
  "username" : "alex",
  "date" : "21-08-2014",
  "attachments" : [
    {
      "documents" : [
        {
          "title" : "test",
          "tags" : [ "a", "b", "c" ]
        },
        {
          "optional1" : "test2",
          "title" : "test2"
        }
      ],
      "source" : "school"
    }
  ]
}

I can extract fields with extractAvroPaths, like so:

...
{ extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username # this works fine
      /my_attachments : "/attachments[]"
      /my_documents : "/attachments[]/documents[]"
    }
.....

The problem is that /my_attachments and /my_documents now contain raw JSON/Avro structures instead of single field values. How would I go about 'unwrapping' these fields so that they are all part of one Solr document, while still retaining the context of the document they belong to?
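To make the goal concrete, here is a minimal sketch in plain Python (not morphline syntax) of the kind of 'unwrapping' meant above: one flat record per nested document, with the parent fields copied in so that each record keeps its context. The helper name flatten_record and the output field names (attachment_source, the document_ prefix) are made up for illustration and are not part of Kite or Solr.

```python
# Hypothetical helper, for illustration only: denormalize the nested record
# into one flat dict per nested document, copying the parent fields along.
def flatten_record(record):
    rows = []
    for attachment in record.get("attachments", []):
        for doc in attachment.get("documents", []):
            row = {
                "my_user": record.get("username"),
                "date": record.get("date"),
                "attachment_source": attachment.get("source"),
            }
            # Prefix document fields so they cannot collide with parent fields.
            for key, value in doc.items():
                row["document_" + key] = value
            rows.append(row)
    return rows


# The example record from this post.
record = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}

for row in flatten_record(record):
    print(row)
```

Each resulting record is flat, so it could be indexed as an ordinary document, at the cost of repeating the parent fields once per nested document.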

Re: index nested structure

RobV — Wed, 27 Aug 2014 18:57:19 GMT

To answer my own question: no, this is not possible at this time. Solr only started supporting nested documents in 4.5, and CDH 5.1 is at 4.4 right now. Even if this becomes available in a future release, the question will be whether it can be easily integrated and used with Kite's morphlines.

To get the job done I had to switch to ElasticSearch, which does support nested documents, using Flume's ElasticSearchSink. Flume's official documentation on ElasticSearch and Avro is lacking, and I had to patch the Flume code to get it working with the UTF-8 charset and JSON, but it's working nonetheless. I hope I can move this dataflow to the better-integrated SolrCloud in the future.
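For anyone wondering what 'nested' means on the ElasticSearch side, a rough sketch follows. It is not the mapping actually used here, only an illustration built from the field names in the example record. It declares attachments and documents as nested fields and leaves everything else to dynamic mapping. Where this fragment goes in the mapping request depends on the ElasticSearch version, so treat the surrounding structure as an assumption.

```python
import json

# Illustrative mapping fragment only; field names come from the example
# record in the question, not from a real production mapping.
nested_mapping = {
    "properties": {
        "attachments": {
            "type": "nested",  # each attachment is indexed as its own hidden sub-document
            "properties": {
                "documents": {"type": "nested"},  # each document keeps its own title/tags together
            },
        }
    }
}

# Print the JSON body that would be embedded in a mapping request.
print(json.dumps(nested_mapping, indent=2))
```

With nested fields, a query can match on values that belong to the same inner document, instead of matching values scattered across different documents of the same parent.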
