Trouble indexing data to Elasticsearch using NiFi

New Contributor

Hey,

I'm using NiFi 1.22 and preparing my flow for an upgrade to version 2.0.0.
Currently I use PutElasticsearchHttpRecord to index flowfiles that each contain an array of nested JSON objects. This processor sends each flowfile as a whole and does not require splitting it into single JSON objects or defining a schema.

This processor is deprecated, and the suggested replacement is PutElasticsearchRecord.

My data is complex and highly variable, so I tried to avoid defining a schema by using the Infer Schema strategy of JsonTreeReader in PutElasticsearchRecord.
However, the processor fails on a type conversion of the first field in the first JSON object of the array. It seems that it cannot handle nested JSON objects in an array (splitting the array into single JSON objects worked).

In addition, my implementation needs to deal with the following constraints:
1. The dataset is large (TBs), so processing needs to be fast.
2. I want to avoid splitting the array into one JSON object per flowfile, as that causes content claim issues (too many large files and small files in the flow).
3. I want to avoid managing a schema, as the data varies a lot.

I'd love to hear the best way to index this data into Elasticsearch.

2 REPLIES

Community Manager

@eylon, welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts @SAMSAL @cotopaul @MattWho @TimothySpann, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



New Contributor

Thanks for the reply.
Adding more details about my flow configuration and an example of the input files.

Attachments: putElasticsearchRecord_error, putElasticsearchRecord_config, jsonTreeReader_config, inputFile sample
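
One way to narrow down the type-conversion failure described in the question is to check, outside NiFi, whether any field has a different JSON type in different elements of the array. The following is a minimal Python sketch for that, assuming the flowfile content is a top-level JSON array of objects. The function names and the sample data are hypothetical, and the script does not reproduce how JsonTreeReader infers a schema; it only reports candidate fields worth inspecting.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return the JSON type name of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unexpected value type: {type(value)!r}")


def mixed_type_fields(records):
    """Map each field path to its JSON types when more than one non-null type occurs."""
    seen = defaultdict(set)

    def walk(value, path):
        seen[path].add(json_type(value))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for child in value:
                walk(child, f"{path}[]")

    for record in records:
        walk(record, "$")

    # Nulls are ignored here on the assumption that a nullable field is not a conflict.
    return {
        path: sorted(types - {"null"})
        for path, types in seen.items()
        if len(types - {"null"}) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample; replace with json.load() on a real input file.
    sample = json.loads(
        '[{"id": 1, "meta": {"tag": "a"}}, {"id": "2", "meta": {"tag": ["a", "b"]}}]'
    )
    for path, types in mixed_type_fields(sample).items():
        print(path, types)
```

Fields reported by this check are the ones where an inferred schema and the actual values are most likely to disagree, so they are a reasonable starting point when reading the processor's error message.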