Member since: 09-18-2018
Posts: 92
Kudos Received: 5
Solutions: 0
07-24-2019
07:29 PM
I've verified that if the key is set to a field at the root level of my JSON, then I am able to update/upsert properly. For example, I tried this with the evt_type field and it worked fine. So my only question is how to handle a field that is one level down in the JSON, as is the case with stdds_id: {"evt_data": {"stdds_id": "stdds_id_value", ...}}
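One avenue I'm considering is a sketch along these lines — assuming PutMongo exposes an Update Query property in this NiFi version and that it evaluates Expression Language against flowfile attributes (the STDDS_ID attribute would come from EvaluateJsonPath, as described in my earlier post below):

    Mode:              update
    Upsert:            true
    Update Query Key:  (leave empty)
    Update Query:      { "evt_data.stdds_id": "${STDDS_ID}" }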
07-24-2019
07:16 PM
By the way, if I need to, I am able to use EvaluateJsonPath to extract the stdds_id and place it as an attribute on the flowfile (see STDDS_ID). I'd use STDDS_ID as the key into Mongo, but I'm not sure how to add it to the flowfile content so that I can update/upsert using this key.
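For reference, the EvaluateJsonPath setup is roughly the following (a sketch only; STDDS_ID is the user-defined/dynamic property holding the JsonPath):

    Destination:   flowfile-attribute
    Return Type:   auto-detect
    STDDS_ID:      $.evt_data.stdds_id    (user-defined property)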
07-24-2019
05:57 PM
I have a sequence of flowfiles that I need to put to Mongo; a sample is at the bottom of this question. The flowfile contains JSON with a field "evt_data": {"stdds_id": "stdds_id_value", ...}. I need that stdds_id_value to be the key for the update/upsert into Mongo, and I'm looking for help with the flow configuration that will make this work. My current PutMongo configuration (a non-working attempt) is this: I'm trying to put a record that is keyed on the evt_data.stdds_id value, as seen in the following flowfile JSON; in other words, the key for the document in Mongo would be "TARGET-STDDS-KCLT-157481920517". This doesn't work: I end up with multiple documents in Mongo for the same key, when I should only see one per distinct key. What is the proper way to set the update key?
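In mongo shell terms, the behavior I'm after is roughly this (a sketch; the collection name "events" is a placeholder):

    // Upsert keyed on the nested stdds_id; "events" is a placeholder collection name
    db.events.update(
      { "evt_data.stdds_id": "TARGET-STDDS-KCLT-157481920517" },  // match on the nested id
      { $set: { /* fields from the flowfile JSON */ } },          // fields to write
      { upsert: true }                                            // insert if no match exists
    )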
Labels:
- Apache NiFi
07-23-2019
06:51 PM
PutMongo does not like my $or
07-23-2019
05:38 PM
I need to do an update/upsert into Mongo. Essentially, the command I need to run is the following (this works in the mongo command-line client): Notice that in the first update I search for the document that matches the specified STDDS_DST_ID. In the second update, I match any of several IDs, including the one that was already matched. In this simple example I have a set of linked IDs: TFM_ID, FVIEW_ID, STDDS_DST_ID. The set of linked IDs is unique; that per-set distinction guarantees that you won't find STDDS_DST_ID 100 associated with another FVIEW_ID or TFM_ID. You'll only find it with FVIEW_ID 3000 and TFM_ID 300000. So, assuming that I have a flowfile that contains some number of fields (e.g. fld1, fld2, fld3) and one or more of the IDs TFM_ID, STDDS_DST_ID, FVIEW_ID, how can I configure PutMongo so that it will update/upsert the appropriate document (the one that matches one or more of these IDs)? Again, in my mind, PutMongo simply needs to be configured consistent with the update you see in the image above; I just don't have much experience with PutMongo. Looking at the documentation, I believe I must do the following:
- Set Mode to update
- Set Upsert to true
- Leave Update Query Key empty
- Set the Update Query to something such as the first argument in the sample command: { $or: [{"TFM_ID": "300000"}, {"FVIEW_ID": "3000"}, {"STDDS_DST_ID": "100"}]}
- Set the flowfile content to the data to place in the document (including the $set): {$set: {"fld2": "fld2_val", "fld3": "fld3_val", "TFM_ID": "300000", "FVIEW_ID": "3000", "STDDS_DST_ID": "100"}}
Are my assumptions on the mark?
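For reference, a mongo shell sketch of the upsert described above, reconstructed from the pieces in this post (the collection name "tracks" is a placeholder):

    // Reconstructed sketch of the described update/upsert
    db.tracks.update(
      // match a document holding any of the linked IDs
      { $or: [ { "TFM_ID": "300000" }, { "FVIEW_ID": "3000" }, { "STDDS_DST_ID": "100" } ] },
      // set the fields carried by the flowfile
      { $set: { "fld2": "fld2_val", "fld3": "fld3_val",
                "TFM_ID": "300000", "FVIEW_ID": "3000", "STDDS_DST_ID": "100" } },
      { upsert: true }
    )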
Labels:
- Apache NiFi
07-16-2019
04:23 PM
Thanks @Nico Verwer. I'll give that a shot, and will be sure to accept your response as an answer, once I verify.
07-04-2019
10:29 AM
Hi @Shu. Could you please explain what sysdate, current_date, etc. would do for me with the Spark job? I don't fully understand how to use them or the benefits that this technique would offer.
07-03-2019
01:00 PM
@Shu I like your idea of creating daily archives (Option 3 above). How do I ensure that the Spark jobs I create to process those daily files run on the datanode they are stored on? Does YARN do this by default? I've not yet used YARN; I've only used HDFS. I am hoping to eventually use Kubernetes (k8s).
07-03-2019
12:56 PM
Hi @Shu. Thank you very much for your thoughts; this is the kind of feedback that I was hoping for. I'll absolutely do my best to understand your recommendation. It sounds like I am not completely off-base in the way that I hope to use HDFS, and that you are confirming I must figure out how to accumulate large files prior to driving them into HDFS. I will look at the tools and methods that you suggest. Thanks for your insights.
07-02-2019
06:29 PM
I've used the PutHDFS processor as I've started to understand how to deal with big data environments. Up until now I've been putting very small files into HDFS, which seems to be architecturally bad practice: the HDFS block size defaults to 128 MB, and the Hadoop community recommendation seems to be that applications writing to HDFS should produce files that are gigabytes, or even terabytes, in size. I'm trying to understand how to do this with NiFi. Part of my concern is for the data analysts: what is the best way to logically structure files so that they are appropriate for HDFS? Currently the files that I am writing contain small JSON objects or lists. I use MergeRecord to intentionally make the files I write larger; however, my JSON objects accumulate fast, potentially thousands of JSON records per second. For the big data/NiFi experts, I'd appreciate any thoughts on the best way to use NiFi to support streaming large data objects into HDFS.
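A sketch of the kind of MergeRecord settings I'm referring to (the values and the reader/writer controller services here are illustrative placeholders, not my actual configuration):

    MergeRecord (illustrative values only)
      Record Reader:             JsonTreeReader         (assumed controller service)
      Record Writer:             JsonRecordSetWriter    (assumed controller service)
      Merge Strategy:            Bin-Packing Algorithm
      Minimum Number of Records: 100000
      Maximum Number of Records: 1000000
      Minimum Bin Size:          256 MB
      Max Bin Age:               5 min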
Labels:
- Apache NiFi