Member since: 04-23-2018
Posts: 16
Kudos Received: 2
Solutions: 0
06-04-2018
06:32 PM
Thanks for this, this is great!
06-04-2018
03:30 PM
I agree with you, but one problem is that I am ingesting and querying millions of files, so I doubt MonitorActivity will be able to keep track. I also don't find the "Run Schedule" option in the properties of any ES/HBase processor to be accurate: changing the run schedule for HBase doesn't actually slow down the ingest much at all, probably due to batch sizes.
06-01-2018
03:05 PM
I'm not sure I understand the MonitorActivity part. I would still need the flowfile content attached, so I can't use other flowfiles or split flowfiles. I think I am literally stuck now waiting for 1.7: FetchElasticsearchHttp has no "Target" property, and QueryElasticsearchHttp just seems to make the flowfile disappear when no hits return. I have no way of keeping the binary flowfile content while I query and update its metadata. The processor doesn't pass the flowfile through with empty attributes, so I don't get to keep it. Perhaps I need a way to pull out x files and confirm that x files were recorded; if fewer than x were recorded, redo that batch of files.
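The batch check described above could be done outside the stock processors (e.g. in an ExecuteScript step or a side job). A minimal sketch in Python; the ID lists are hypothetical inputs standing in for "files sent" and "files confirmed in the store":

```python
def verify_batch(sent_ids, recorded_ids):
    """Compare a batch of sent file IDs against the IDs confirmed recorded.

    Returns the IDs that were not recorded and therefore need to be redone.
    """
    missing = set(sent_ids) - set(recorded_ids)
    return sorted(missing)

# Example: 3 files sent, only 2 confirmed -> redo the missing one.
pending = verify_batch(["a", "b", "c"], ["a", "c"])
print(pending)  # ['b']
```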
05-31-2018
08:12 PM
I have a situation where I query Elasticsearch using QueryElasticsearchHttp to match a query, then use the results to populate flowfile attributes. However, if the query returns empty results, the flowfile doesn't go to the failure queue. I need the results, but they are inconsistent; they may not be ready to be queried yet. I want the query to fail so that I can retry at a later time using ControlRate. However, whenever QueryElasticsearchHttp fails to find the document, the flowfile just disappears: no failure, no success queue entry. I CANNOT use FetchElasticsearchHttp because it has no "Target: flowfile-attributes" property (I need to keep the flowfile content that is being queried). I am really stuck now, because my Elasticsearch results are populated by a different NiFi flow, so a document may not be ready to be queried. That's a timing issue; I want to try querying again later.
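As a workaround while the processor swallows empty results, the routing decision could be made in a script step (e.g. InvokeHTTP followed by ExecuteScript). A hedged sketch of just the decision logic; the response shape follows the standard Elasticsearch search API, and the route names are assumptions:

```python
import json

def route_es_response(response_body):
    """Decide how to route a flowfile based on an Elasticsearch search response.

    Empty hits are routed to 'retry' (the document may not be indexed yet)
    instead of silently dropping the flowfile.
    """
    body = json.loads(response_body)
    total = body.get("hits", {}).get("total", 0)
    # Elasticsearch 7+ wraps total in an object: {"value": N, "relation": "eq"}
    if isinstance(total, dict):
        total = total.get("value", 0)
    return "success" if total > 0 else "retry"

print(route_es_response('{"hits": {"total": 0, "hits": []}}'))  # retry
```

The 'retry' route can then feed a ControlRate-throttled loop back into the query.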
Labels: Apache NiFi
05-31-2018
05:52 AM
Right, but my goal was to be able to write binary cells and other string cells using PutHBaseJson. My issue is that instead of a GetFile, I'm receiving a flowfile with a binary "bytes" field. I'm having trouble parsing it so that I can put the binary into the flowfile content while keeping the other fields as attributes. Then I could send the attributes to PutHBaseJson and the flowfile content to PutHBaseCell. For that I think I need to split all the Avros into individual flowfiles, and the tricky part is converting the binary field into flowfile content. How exactly would you convert one field of an Avro record into flowfile content?
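The separation itself is straightforward once a record is in hand (e.g. inside an ExecuteScript step). A minimal sketch with a plain dict standing in for the Avro record; the field names are assumptions:

```python
def split_record(record):
    """Split one Avro-style record (as a dict) into flowfile content and attributes.

    The binary 'bytes' field becomes the flowfile content; every other field
    becomes a string attribute, suitable for PutHBaseJson.
    """
    content = record.pop("bytes")                    # raw binary payload
    attributes = {k: str(v) for k, v in record.items()}
    return content, attributes

content, attrs = split_record({"bytes": b"%PDF-1.4", "filename": "doc.pdf", "pages": 3})
print(attrs)  # {'filename': 'doc.pdf', 'pages': '3'}
```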
05-30-2018
06:50 PM
Hi, I figure that with NiFi 1.3 the best way to send PDFs between two systems is to put them inside a binary ("bytes") field in Avro. The data comes in as binary, then with something like UpdateRecord I can modify some fields, and finally I want to store it in HBase. Here is where the problem comes in: the binary field in Avro works, but HBase only accepts JSON (same with Solr or Elasticsearch), and when you convert the Avro binary to JSON, JSON cannot store raw binary (it may convert it to an array of integers). What is the standard way of storing binary in a JSON NoSQL database? Would it be smarter to convert it to something like Base64 and store that as the JSON field? Additionally, whenever I work with Avros containing binary, I have trouble converting them; I get a lot of ArrayIndexOutOfBoundsException errors when I use UpdateRecord, for example. Is the only way to SplitAvro or SplitJson and store the binary in the flowfile content? That would slow down the process a lot.
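Base64 is indeed the conventional way to carry binary inside JSON. A minimal stdlib sketch of the round trip (the field names are illustrative):

```python
import base64
import json

# Encode the PDF bytes as Base64 text so they survive JSON serialization.
pdf_bytes = b"%PDF-1.4 minimal example"
doc = {
    "filename": "report.pdf",
    "content": base64.b64encode(pdf_bytes).decode("ascii"),
}
payload = json.dumps(doc)  # safe to store as a JSON document/cell

# Decode on the way out; the round trip is lossless.
restored = base64.b64decode(json.loads(payload)["content"])
assert restored == pdf_bytes
```

Note that Base64 inflates the payload by roughly a third, which is one reason keeping large binaries in the flowfile content (and only metadata in JSON) is often preferred.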
Labels: Apache NiFi
05-03-2018
06:49 PM
1 Kudo
This is pretty amazing. It deserves to be on a blog of some kind 🙂
05-02-2018
03:24 PM
@Shu I am not seeing the advantage of having the sample files in JSON/CSV format; I can manipulate those myself. The problem I have is reducing Avros, or manipulating them with Record processors, without converting to JSON first. I'm trying to avoid any Split/SplitJson and EvaluateJsonPath, which is otherwise a nice way to create attributes or manipulate flowfile content. My main problems are:
- Converting Avro fields to attributes when record.count > 1, without splitting.
- Removing fields from an Avro flowfile content.
- Renaming fields in an Avro flowfile content (not too important).
- Converting an embedded Avro schema to a different AvroSchemaRegistry schema, essentially by reducing fields (the topic of this post).
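The field reduction itself is simple once the records are in hand. A hedged sketch of projecting records onto a reduced schema, with plain dicts standing in for deserialized Avro records (mimicking what a field-reducing ConvertRecord would do):

```python
def project_records(records, keep_fields):
    """Project each record onto a reduced set of fields.

    Fields absent from a record are simply skipped rather than
    raising an 'incompatible schemas' style error.
    """
    return [{k: r[k] for k in keep_fields if k in r} for r in records]

full = [
    {"id": 1, "name": "a", "blob": b"\x00"},
    {"id": 2, "name": "b", "blob": b"\x01"},
]
print(project_records(full, ["id", "name"]))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```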
05-01-2018
04:56 PM
OK, sure, I replied with a sample file. It has more schema fields embedded than the schema above.
05-01-2018
04:56 PM
Just an example Avro file that can be used to test a schema. It looks like you can't simply ConvertRecord if your schema is missing fields: ConvertRecord doesn't reduce the number of fields as expected; instead it fails with what is essentially an "incompatible schemas" error. test-avro-file.txt