Created on 08-21-2014 05:45 AM - edited 09-16-2022 02:05 AM
I have an Avro input source going through a morphline into Solr. Take, for example, the following structure:
{
  "username" : "alex",
  "date" : "21-08-2014",
  "attachments" : [ {
    "documents" : [
      {
        "title" : "test",
        "tags" : [ "a", "b", "c" ]
      },
      {
        "optional1" : "test2",
        "title" : "test2"
      } ],
    "source" : "school"
  } ]
}
I can extract these fields with extractAvroPaths, like so:
...
{ extractAvroPaths {
    flatten : true
    paths : {
      /my_user : /username   # this works fine
      /my_attachments : "/attachments[]"
      /my_documents : "/attachments[]/documents[]"
    }
  }
}
.....
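A rough Python emulation of the path semantics shows what each of these paths selects. It assumes `[]` means "descend into this field, then take every element of the array", and it ignores `flatten`. `extract_path` is a hypothetical helper for illustration, not Kite's implementation.

```python
from typing import Any, List

# Example record from the question (attachments assumed to be an array of records).
RECORD = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def extract_path(record: Any, path: str) -> List[Any]:
    """Hypothetical helper: collect every value a simplified Avro path reaches.

    Only two step kinds are handled: "field" (descend into a record field)
    and "field[]" (descend into the field, then fan out over each array element).
    """
    current = [record]
    for step in path.strip("/").split("/"):
        fan_out = step.endswith("[]")
        name = step[:-2] if fan_out else step
        found: List[Any] = []
        for value in current:
            if not isinstance(value, dict) or name not in value:
                continue  # missing or non-record value: contributes nothing
            child = value[name]
            if fan_out and isinstance(child, list):
                found.extend(child)  # one result per array element
            else:
                found.append(child)
        current = found
    return current


for path in ("/username", "/attachments[]", "/attachments[]/documents[]"):
    print(path, "->", extract_path(RECORD, path))
```

In this emulation, `/username` yields a plain string, while the two attachment paths stop at records, so whole sub-structures come back. A leaf path such as `/attachments[]/documents[]/title` yields plain strings, but each leaf path then becomes its own multi-valued field.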
The problem is that /my_attachments and /my_documents now contain raw JSON/Avro structures instead of single field values. How would I go about "unwrapping" these fields so that they all become part of one Solr document, while still retaining the context of the document they belong to?
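Unwrapping every leaf into one flat document is straightforward. The hard part is keeping the context. The sketch below collapses the record into multi-valued fields keyed by dotted path (a naming scheme assumed here for illustration; `flatten_leaves` is hypothetical) and makes the loss of context visible.

```python
from typing import Any, Dict, List, Optional

# Example record from the question (attachments assumed to be an array of records).
RECORD = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def flatten_leaves(value: Any, prefix: str = "",
                   out: Optional[Dict[str, List[Any]]] = None) -> Dict[str, List[Any]]:
    """Collapse a nested record into multi-valued fields keyed by dotted path."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten_leaves(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        for item in value:
            # Every array element writes to the same field name.
            flatten_leaves(item, prefix, out)
    else:
        out.setdefault(prefix, []).append(value)
    return out


flat = flatten_leaves(RECORD)
for field, values in flat.items():
    print(field, values)
```

Here `attachments.documents.title` holds `test` and `test2`, and `attachments.documents.tags` holds `a`, `b` and `c`. Nothing records that the tags belong to "test" rather than "test2", so a query for title `test2` AND tag `a` would match this document even though no single attached document has both.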
Created 08-27-2014 11:57 AM
To answer my own question: no, this is not possible at this time. Solr only started supporting nested documents in version 4.5, and CDH 5.1 currently ships Solr 4.4. Even if support arrives in a future release, the open question will be whether nested documents can be easily integrated with and used from Kite Morphlines.
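With nested-document support, each entry of `attachments[].documents[]` can become its own child document, so its title and tags stay together. The sketch below builds such a parent/child block in Python. It assumes the `_childDocuments_` key that Solr's JSON update format uses for nested documents (which Solr release first accepts it in JSON should be checked against your version). The field names `id`, `doc_type` and the id scheme are hypothetical choices.

```python
import json
from typing import Any, Dict, List

# Example record from the question (attachments assumed to be an array of records).
RECORD = {
    "username": "alex",
    "date": "21-08-2014",
    "attachments": [
        {
            "documents": [
                {"title": "test", "tags": ["a", "b", "c"]},
                {"optional1": "test2", "title": "test2"},
            ],
            "source": "school",
        }
    ],
}


def to_block(record: Dict[str, Any], doc_id: str) -> Dict[str, Any]:
    """Turn one record into a parent document plus one child per attached document."""
    children: List[Dict[str, Any]] = []
    for a_idx, attachment in enumerate(record.get("attachments", [])):
        for d_idx, document in enumerate(attachment.get("documents", [])):
            child = {
                "id": f"{doc_id}-{a_idx}-{d_idx}",  # hypothetical id scheme
                "doc_type": "document",
                "source": attachment.get("source"),  # copy attachment-level context down
            }
            child.update(document)  # title, tags, optional1 stay together
            children.append(child)
    return {
        "id": doc_id,
        "doc_type": "user",  # block join queries need a field that marks parents
        "username": record["username"],
        "date": record["date"],
        "_childDocuments_": children,
    }


print(json.dumps([to_block(RECORD, "alex-1")], indent=2))
```

With such an index, a block join query like `{!parent which="doc_type:user"}tags:a` would return the parent of any child document tagged `a`, so conditions on title and tag can be evaluated against the same child.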
To get the job done, I switched to Elasticsearch, which does support nested documents, and used Flume's ElasticSearchSink. Flume's official documentation on Elasticsearch and Avro is lacking, and I had to patch the Flume code to get it working with the UTF-8 charset and JSON, but it works nonetheless. I hope to move this dataflow to the better-integrated SolrCloud in the future.
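In Elasticsearch, the context is preserved by mapping the arrays as `nested`, which indexes each array element as its own hidden object. The Python sketch below builds such a mapping as a dict. It is written for current Elasticsearch field types (`keyword`, `text`). The 1.x-era clusters that Flume's sink targeted used `string` fields and placed properties under a mapping type name, so treat the exact shape as an assumption.

```python
import json


def nested_mapping() -> dict:
    """Mapping that keeps each attachments/documents entry as its own nested object."""
    return {
        "mappings": {
            "properties": {
                "username": {"type": "keyword"},
                "date": {"type": "date", "format": "dd-MM-yyyy"},
                "attachments": {
                    "type": "nested",  # each attachment indexed as a separate object
                    "properties": {
                        "source": {"type": "keyword"},
                        "documents": {
                            "type": "nested",  # title and tags stay paired per document
                            "properties": {
                                "title": {"type": "text"},
                                "tags": {"type": "keyword"},
                                "optional1": {"type": "text"},
                            },
                        },
                    },
                },
            }
        }
    }


print(json.dumps(nested_mapping(), indent=2))
```

Because each document entry is stored as its own nested object, a `nested` query can require a title and a tag to match within the same entry, which the flattened Solr layout cannot express.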