Created on 01-28-2024 11:59 AM - edited 01-28-2024 12:05 PM
Hey,
I'm using NiFi 1.22 and preparing my NiFi flow for an upgrade to version 2.0.0.
Currently, I'm using PutElasticsearchHttpRecord to index flowfiles that each contain an array of nested JSON objects. This processor sends each flowfile as a whole and does not require splitting it into single JSON objects or defining a schema.
This processor is deprecated, and the suggested replacement is PutElasticsearchRecord.
My data is very complex and varied, so I tried to avoid defining a schema and used the Infer Schema strategy of JsonTreeReader in PutElasticsearchRecord.
But the processor fails on type conversion of the first field in the first JSON object of the array. It seems that it cannot handle nested JSON objects in an array (while splitting the array into single JSON objects worked).
In addition, my implementation needs to deal with the following limitations:
1. Processing a large dataset (TBs), so it needs to be fast.
2. Avoiding splitting the array of JSON objects into a single JSON object per flowfile, as it causes content claim issues (too many large files and small files in the flow).
3. Avoiding managing a schema, as the data varies a lot.
I'd love to hear the best way to index this data into Elasticsearch.
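To make the requirement concrete: indexing a whole array without a schema and without one flowfile per document roughly means turning each array into a single Elasticsearch `_bulk` request body (one action line plus one source line per document). Below is a minimal sketch of that idea in plain Python, outside NiFi. It is not how any NiFi processor is implemented; the function name `to_bulk_ndjson`, the index name `my-index`, and the sample documents are all made up for illustration.

```python
import json
from typing import Any, Dict, Iterable


def to_bulk_ndjson(docs: Iterable[Dict[str, Any]], index: str) -> str:
    """Build an Elasticsearch _bulk request body from parsed JSON documents.

    Hypothetical helper for illustration only; no schema is declared, each
    document is passed through as-is, nested objects included.
    """
    lines = []
    for doc in docs:
        # Action line: index the document on the following line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc, separators=(",", ":")))
    if not lines:
        return ""
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


# Invented sample: one flowfile holding an array of differently shaped documents.
flowfile_content = (
    '[{"id": 1, "meta": {"tags": ["a", "b"]}},'
    ' {"id": "2", "extra": {"depth": {"n": 1.5}}}]'
)
body = to_bulk_ndjson(json.loads(flowfile_content), index="my-index")
print(body)
```

The resulting string is what would be sent as the body of a POST to the `_bulk` endpoint with content type `application/x-ndjson`. Note that field types still have to be consistent on the Elasticsearch side (the index mapping), so a field like `id` that is a number in one document and a string in another can still be rejected there.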
Created 01-29-2024 02:48 AM
@eylon, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @SAMSAL @cotopaul @MattWho @TimothySpann who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,
Created 01-29-2024 11:18 PM
Thanks for the reply.
Adding more details about my flow configuration and an example of the input files