Member since 04-03-2023
12 Posts
2 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 1429 | 07-31-2023 05:34 AM
01-30-2024 03:24 AM
1 Kudo
UPDATE: I'm working in an enclave, so this initial test was at jolt-demo.appspot.com, but moving it over to NiFi, I had to add one additional level in the JOLT transformation. What appears below is now correct. That was quick. This JOLT transformation . . .

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "*": {
            "*": {
              "@": ""
            }
          }
        }
      }
    }
  }
]

. . . transforms the JSON to this . . .

[ {
  "category" : "reference",
  "author" : "Nigel Rees",
  "title" : "Sayings of the Century",
  "price" : 8.95
}, {
  "category" : "fiction",
  "author" : "Herman Melville",
  "title" : "Moby Dick",
  "isbn" : "0-553-21311-3",
  "price" : 8.99
}, {
  "category" : "fiction",
  "author" : "J.R.R. Tolkien",
  "title" : "The Lord of the Rings",
  "isbn" : "0-395-19395-8",
  "price" : 22.99
} ]

And with an all-defaults JsonTreeReader and CSVRecordSetWriter, "select category" returns exactly what I need. I had been thinking about JOLT but hadn't done much with it and was fearful of the complexity. So thanks again, @SAMSAL, for pushing me in the right direction.
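For comparison, a sketch of what the spec looked like at jolt-demo.appspot.com before I added the extra level for NiFi; this is my reconstruction from the description above (one "*" wildcard shallower), not a paste from the demo site:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "*": {
            "@": ""
          }
        }
      }
    }
  }
]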
01-30-2024 02:50 AM
1 Kudo
Wow, I was excited when I saw this, as it looked like the kind of simple elegance I was looking for, and I wondered why I hadn't noticed the Starting Field Strategy property, because in trying to work this out, I had previously turned to JsonTreeReader. But in implementing it, I see why. We're on version 11.9.0, and the JsonTreeReader is version 1.14.0, meaning I don't have those capabilities. Because moving to a more recent version is not possible, I will go down the JOLT pathway and see what I can work out. Even though I couldn't test it, I will accept your solution because I believe that if I had the latest and greatest, it would be the one. Plus, your description of the JSON hierarchy in play here was helpful.
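For anyone on a newer release, my understanding of the suggested reader setup, hedged since I couldn't test it on our version and the exact values are my reading of the suggestion:

JsonTreeReader
  Starting Field Strategy = Nested Field
  Starting Field Name     = book

That should start each record at the book array instead of the document root, so "select category" would work without any JOLT.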
01-29-2024 07:24 AM
I have this JSON:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J.R.R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ]
  }
}

I want a list/record set of categories. CSV, plain text, doesn't matter, but let's say CSV:

category
"reference"
"fiction"
"fiction"

I have tried many things, too many to repeat them all here. But basically, I have a GenerateFlowFile where the JSON is hard-coded, then a QueryRecord whose reader is a JsonPathReader in which I have properties for all fields in the JSON:

store $.store
book $.store.book[*]
category $.store.book[*].category
etc.

Just to see what's being returned, I currently have the writer set to an all-defaults JsonRecordSetWriter. With this in mind, in the QueryRecord, "select *" returns the JSON unaltered. "select store" returns the JSON unaltered. "select book" returns "no column named 'book'". I can use an EvaluateJsonPath with $.store.book[*].category as the property value, and it returns this:

["reference", "fiction", "fiction"]

If I switch over to an all-defaults CSVRecordSetWriter and do "select store", I get this:

store
MapRecord[{book=[Ljava.lang.Object;@24bda2f0}]

I know there are other ways to configure EvaluateJsonPath so it does parse the data correctly, but in doing so, it creates a FlowFile for each record. I don't want that; I want a single record set in one FlowFile. This is just a proof of concept; with the real data, I'm looking at tens of thousands of records. I also know I could take this to Groovy and get it done. I'd like to avoid that and use only the bare minimum of native NiFi processors. I've also tried some things with a ForkRecord, but as I said, I've kind of lost the bubble on everything I've tried. I believe this is possible, but I'm running out of energy and ideas, and I think I've exhausted the wisdom of the web. Is it really this difficult? Let me know what I'm doing wrong.
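For reference, the EvaluateJsonPath configuration behind the single-array result above was roughly this; the Destination and Return Type values are my assumption, since only the path itself is spelled out in this post, and the property name "categories" is arbitrary:

EvaluateJsonPath
  Destination = flowfile-content
  Return Type = json
  categories  = $.store.book[*].category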
Labels:
- Apache NiFi
11-03-2023 02:45 AM
Thanks, Matt! We've been pressing hard for a year now to get some migration work done from an outmoded ETL tool, and as I've moved along, there's a lot I haven't stopped to truly understand. I had seen the "variable registry only" notice before but didn't truly appreciate what it meant. Now I do! And by the way, I solved the problem by calling the "update" API directly from an InvokeHTTP processor, where there's no restriction on using attributes. Works like a charm!
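For anyone finding this later, a minimal sketch of that workaround; the URL shape is illustrative, the attribute name is the one from my flow described below, and the URL property is called Remote URL on some NiFi versions:

InvokeHTTP
  HTTP Method = POST
  HTTP URL    = #{solr_url}/${solr_index_name_a}/update

The point is that InvokeHTTP's URL property evaluates flowfile attributes, so the year-suffixed index name can live in an attribute.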
11-01-2023 10:15 AM
I moved a working flow that populates Solr indexes from one process group to another. In the original, the SolrLocation property of the PutSolrContentStream processor is populated using two parameters: #{solr_url}/#{solr_index_name_a}. It's done this way because a QueryRecord processor is used to split the record set into two groups, and one path uses the "a" index, and the other, the "b" index.

However, in the new flow, I have to append a year value (i.e., "2023") to the name of the index depending on earlier processing. To accomplish this, I am holding the name of the index in an attribute instead of a parameter. At the appropriate time in the flow, I use an UpdateAttribute processor to append the correct year to the index name. Then further down, I have the PutSolrContentStream processor, and I populate the SolrLocation property like this: #{solr_url}/${solr_index_name_a}. This fails with an "HTTP ERROR 404 NOT FOUND".

It took a lot of trial and error, but I have discovered that if I hard-code the index name in a parameter and set the SolrLocation using two parameters (as in the original flow) instead of a parameter and an attribute, like this: #{solr_url}/#{solr_index_name}, it works. I move back to the attribute, and I get the 404. In testing, I inserted in the middle an UpdateAttribute where I create an attribute called coreURL, set it to the value of the parameter + attribute, and use that attribute instead as the SolrLocation. No dice. I then copy and paste the value of coreURL into SolrLocation (i.e., a hard-coded URL), and it works.

It looks to me that, despite the documentation saying SolrLocation supports Expression Language, it doesn't, because I've tried many variations, and any time I introduce an attribute into SolrLocation, the processor fails with a 404. Version is 11.6.3.
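To recap the variants described above and what each did:

#{solr_url}/#{solr_index_name}      ->  works (parameters only)
#{solr_url}/${solr_index_name_a}    ->  HTTP 404 (parameter + attribute)
${coreURL}                          ->  HTTP 404 (attribute holding the assembled URL)
hard-coded URL pasted in            ->  works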
Labels:
- Apache NiFi
- Apache Solr
09-22-2023 07:47 AM
Same problem, but this did not help me. The solution for me was found on Stack Overflow: change the Result RecordPath in the LookupRecord processor to a single forward slash. https://stackoverflow.com/questions/49674048/apache-nifi-hbase-lookup
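In other words, the single property change in LookupRecord:

Result RecordPath = /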
07-31-2023 05:34 AM
Thanks for the suggestions. I ended up moving everything to stored procedures in the database, which are run under the existing context (service), so no need for a sensitive parameter.
07-18-2023 04:16 AM
I've developed a flow that depends on importing an Oracle database dump via Data Pump (i.e., the impdp command). To run this command, you have to pass in the username and password as part of the parameters of the impdp command to connect to the Oracle database. In early development, I simply put the password in a non-sensitive parameter, and at the appropriate time, I use a ReplaceText processor to wipe out the current contents of the flowfile and write in the contents of the shell script used to run the impdp command. Of course, the text I'm writing into the flowfile includes #{oracle_pw}. I then PutFile to write the script to disk and ExecuteStreamCommand to run the script. It works like a charm.

Now that development is over and we're making plans to move this flow into production, I need a way to make this password parameter sensitive and still be able to pass it in to the script. I've given this a lot of thought and done some research, but it seems like I might need to reach beyond the confines of NiFi to make this happen. One thought was to ensure the OS account NiFi is using already has all the Oracle stuff in its path, and then, instead of creating and running a script, pass the impdp command directly via ExecuteStreamCommand; but again, non-sensitive properties (i.e., command.argument.1, command.argument.2, etc.) cannot contain sensitive parameters. I would appreciate any help in brainstorming how to accomplish this. Here are the basic contents of the script:

#!/bin/bash
eval "$(grep 'ORACLE_BASE=' /home/oracle/.bash_profile)"
eval "$(grep 'ORACLE_HOME=' /home/oracle/.bash_profile)"
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_BASE ORACLE_HOME PATH
impdp oracleuser/#{oracle_pw}@//oracleserver/oracledatabase (more parameters, etc.)
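One avenue I've been mulling but have not tested (so treat this as a sketch): Oracle's Secure External Password Store keeps the credentials in a wallet, letting impdp authenticate with the /@alias syntax so the password never appears in the script, the flowfile, or any NiFi property. Assuming a wallet were already configured for a hypothetical TNS alias "oracledb" (the directory and dumpfile parameters below are placeholders):

#!/bin/bash
eval "$(grep 'ORACLE_HOME=' /home/oracle/.bash_profile)"
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_HOME PATH
# The bare slash tells impdp to pull credentials from the wallet for this alias,
# so no username or password appears on the command line.
impdp /@oracledb directory=DATA_PUMP_DIR dumpfile=export.dmp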
Labels:
- Apache NiFi
06-15-2023 03:33 AM
Matt, wow. Thank you for the tremendous detail about how this processor works internally. This fills a gap here on the web that I know will help many in days to come. Yes, I wonder about myself sometimes. Max Bin Age was the one property I did not tinker with, but it makes perfect sense that this is what got me over the goal line—which also shows I'm still learning in my overall understanding of how NiFi works. I set it down to 15 secs, and so far so good. Many, many thanks!
06-14-2023 08:48 AM
I have a flow that fetches all files from a given directory; they could be gzip, zip, or csv, with the gzip and zip files each holding a single csv file. I then route on MIME type, decompress the gzip files, unpack the zip files, and then bring what are now all csv files back together. This is working.

I then want to create a zip archive of all of these csv files, and MergeContent seemed like a good candidate (outside of using ExecuteStreamCommand to run zip from the OS). But no matter what I do, the results are inconsistent. The first time I ran it with default properties in MergeContent (setting only Merge Format to ZIP), it created a single zip file, and I thought I was done! But the second time, it created six zip files, so I realized I wasn't done and started playing with properties:

- Changed Maximum number of Bins to 1 - created five zip files
- Changed Correlation Attribute Name to absolute.path, which is the same for all flow files - created four zip files
- Changed Maximum Group Size to 500 MB (all csv files zipped up are ~500 KB) - created two zip files
- Changed Minimum Number of Entries to 1000, Maximum to 5000 (62 csv files) - created forty zip files
- Changed load balancing on the queue to single node - created twelve zip files

Clearly, I'm throwing darts hoping something will stick. Documentation (and the general wisdom of the web) hasn't been particularly helpful in understanding how this works, what a "bin" is, what a "bundle" is, and most of it seems geared toward breaking apart a single flow file, doing some processing, then bringing it back together. That's not what I'm doing. I'm starting with multiple flow files and want to bring all of them, always, every time, into a single flow file. If the answer is that this can't be done with MergeContent, then I'll just run zip through the OS, but that would mean writing the files to disk and then zipping, and I wanted to try to keep this native NiFi. Again, I started with default properties, except changing Merge Format to ZIP, and then made my modifications from there. And, yes, I am using the "merged" relationship.
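In case it helps the next person: per my follow-up above (06-15-2023), the missing piece turned out to be Max Bin Age. A sketch of the kind of configuration that forces everything queued into a single bundle; the entry count is illustrative, and the 15 seconds is what I actually set:

MergeContent
  Merge Format              = ZIP
  Merge Strategy            = Bin-Packing Algorithm   (the default)
  Minimum Number of Entries = 10000                   (more than will ever queue at once)
  Max Bin Age               = 15 sec                  (flushes the bin as one zip once input goes quiet)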
Labels:
- Apache NiFi