Created 05-03-2016 01:44 PM
Hello All,
I want to write to Elasticsearch (ES) using NiFi, so I chose the PutElasticsearch processor.
Is it possible to do a bulk insert to ES (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) using the PutElasticsearch processor?
My understanding: PutElasticsearch writes to ES by reading the content of the input flowfile, i.e. just one record in JSON format, for example: {"priority": "DEBUG", "classname": "ServiceImpl", "message": "Getting node by name XXX", "creationTime": "2016-04-20T15:38:43.000000Z"}
However, I want to perform a bulk insert, so my flowfile contents will be in the format specified by the ES bulk API:
action_and_meta_data\n optional_source\n
action_and_meta_data\n optional_source\n ....
For example:
{"index": {"_type": "xxxxxlogs", "_id": "2016-04-20-15:13:57-945", "_index": "xxxxlogs-2016-04-20"}} \n
{"priority": "DEBUG", "classname": "ServiceImpl", "message": "Getting node by name XXXX", "creationTime": "2016-04-20T15:13:57.000000Z"} \n
{"index": {"_type": "xxxxlogs-", "_id": "2016-04-20-15:13:57-941", "_index": "xxxxlogs-2016-04-20"}} \n
{"priority": "DEBUG", "classname": "ServiceImpl", "message": "Got node idx XXX", "creationTime": "2016-04-20T15:13:57.000000Z"} \n
.......
Thousands of such entries: (action_and_meta_data\n optional_source\n)
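To make the format above concrete, here is a minimal Python sketch of how such a bulk body could be assembled outside NiFi. The function name build_bulk_body and the tuple layout of its input are hypothetical, chosen only for illustration; the sketch shows the action line / source line pairing and the trailing newline that the bulk format expects.

```python
import json


def build_bulk_body(entries):
    """Build a newline-delimited bulk body.

    `entries` is an iterable of (index, doc_type, doc_id, source) tuples.
    This layout is an assumption made for this sketch only.
    """
    lines = []
    for index, doc_type, doc_id, source in entries:
        # Action and metadata line, matching the "index" action in the example above.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        (
            "xxxxlogs-2016-04-20",
            "xxxxlogs",
            "2016-04-20-15:13:57-945",
            {"priority": "DEBUG", "classname": "ServiceImpl"},
        ),
    ])
    print(body, end="")
```

json.dumps never emits raw newlines with its default settings, so each action and each document stays on exactly one line.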
Please let me know whether this can be achieved with the PutElasticsearch processor, or point me to the specific format my flowfile content needs to have.
Otherwise, please let me know how to achieve a bulk insert to ES using NiFi.
Many thanks.
Regards,
Amarnath
Created 05-03-2016 02:05 PM
Hi @Amar ch, the PutElasticsearch processor, as you have identified, is designed to write individual flowfiles, or indeed batches of flowfiles.
Those batches are controlled via the "Batch Size" property. I guess it really depends on what you mean by bulk insert. I don't see any limitations on the "Batch Size", so it should be possible to increase it until you get the insert size you require.
For more information on the properties, please take a look at:
From reviewing the JIRA associated with the processor, it does look as if the PutElasticsearch processor makes use of the bulk load API.
Created on 05-09-2016 03:37 PM - edited 08-19-2019 02:51 AM
Here is the flow, with sensitive info removed (sorry for the bad quality!!).
Here are the processors:
AddHeader
{"index":{"_index":"${name:toLower()}","_type":"${observableProperty}"}}
{"observableProperty"
MERGE
ADJUST FORMAT IN BULK
POST HTTP TO ES
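For readers who cannot see the screenshots, the four steps listed above (add header, merge, adjust format, POST) might be sketched in Python roughly as follows. This is only an illustration under assumptions, not the actual NiFi configuration: it treats name and observableProperty as fields of each record (in the flow they are referenced through expression language), and the function names, the es_url parameter, and the Content-Type header are hypothetical choices made for this sketch. The request is only constructed here, not sent.

```python
import json
import urllib.request


def add_header(record):
    # AddHeader step: put an action line in front of the record.
    # Index comes from the lower-cased "name", type from "observableProperty"
    # (assumed here to be fields of the record).
    action = {
        "index": {
            "_index": record["name"].lower(),
            "_type": record["observableProperty"],
        }
    }
    return json.dumps(action) + "\n" + json.dumps(record)


def merge(chunks):
    # MERGE + ADJUST FORMAT steps: join all header/record pairs with newlines
    # and make sure the merged body ends with a newline.
    return "\n".join(chunks) + "\n"


def build_bulk_request(es_url, records):
    # POST HTTP TO ES step: build (but do not send) a POST to the _bulk endpoint.
    body = merge(add_header(r) for r in records)
    return urllib.request.Request(
        url=es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        # Newer Elasticsearch versions expect an explicit content type for bulk bodies.
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_bulk_request(
        "http://localhost:9200",  # hypothetical address
        [{"name": "SensorA", "observableProperty": "temperature", "value": 21.5}],
    )
    print(req.full_url)
    print(req.data.decode("utf-8"), end="")
```

Sending the request would then be a matter of passing it to urllib.request.urlopen against a reachable cluster.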
Created 05-12-2016 04:16 PM
Thanks @Massimiliano Nigrelli for the information, it is helpful.
Created 05-12-2016 01:33 PM
@Amar ch please accept one of the answers to close the thread.