Member since: 12-11-2017
Posts: 21
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4236 | 12-21-2020 02:44 AM
 | 1922 | 07-10-2020 08:42 AM
12-21-2020
02:44 AM
1 Kudo
Hello @Lokeswar The QueryRecord processor does exactly what you want, but you need to move your job to a record-oriented approach, using a JSON reader and JSON writer. You then no longer work with flow-file attributes, but directly with the flow-file content. So, starting from your CSV:
- use a ConvertRecord processor to turn it into a record flow file, with CSVReader as the reader and JsonRecordSetWriter as the writer, since you want JSON
- add a QueryRecord processor, with a query that looks like this: SELECT * FROM FLOWFILE WHERE resourceState='INSTALLED' OR resourceState='RETIRED'
Warning: don't use double quotes around the values, only single quotes. After this, the output of QueryRecord gives you a relationship for that condition which you can plug into your next processor.
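For illustration, the query goes into a dynamic property of QueryRecord, and the property name becomes the output relationship. The property name and the extra field below are made-up examples, to be adapted to your actual CSV columns:

installed_or_retired = SELECT * FROM FLOWFILE WHERE resourceState='INSTALLED' OR resourceState='RETIRED'

With records such as {"resourceId": "r-001", "resourceState": "INSTALLED"} and {"resourceId": "r-002", "resourceState": "DECOMMISSIONED"}, only the first one would end up in the flow file routed to the 'installed_or_retired' relationship.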
12-17-2020
11:06 PM
1 Kudo
Hello @spa I've been looking for this too, but it doesn't exist. As a workaround, you can use a script (Python, Groovy, ...). In case you hit performance issues with the scripting processors, you can improve the situation using the trick described here: InvokeScriptedProcessor template (a faster ExecuteScript)
12-17-2020
01:24 AM
Hi @justenji Same for me, I've tried to use an Avro schema generator, including the schema inference from NiFi, but no luck.
12-16-2020
11:36 PM
Hello @Anurag007 Your description is a little bit 'dry'. Anyway, you can probably do what you want with the following processors:
- GetFile (or better, ListFile + FetchFile) to get the content of your files
- RouteOnContent, which lets you define routing rules based on file content using regular expressions (see the sketch below)
You will easily find many examples of how to use these processors, for instance with the search feature of this site.
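As a purely hypothetical illustration (I don't know what you actually need to match, so the property name and regular expression are made up), RouteOnContent takes one dynamic property per route, for example:

contains_error = .*ERROR.*

Depending on the Match Requirement property, the content must either match the expression entirely or simply contain a match; flow files that match are routed to a relationship named after the property ('contains_error' here), and everything else goes to 'unmatched'.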
12-16-2020
03:45 AM
Hello @justenji Thanks a lot for the time you spent on my issue, I really appreciate it. Yes, as I mentioned at the beginning of my post, it works with a basic JoltTransformJSON on a single JSON entry, and this is what I'm doing now: split my records and then use that processor. But I want to keep the record-oriented approach, which is really more efficient performance-wise. I wanted to test a few different things regarding the schema, as suggested by @TimothySpann . I guess we need to tell Jolt that the output will be an array of records. I've tried various attempts with Avro schemas but no luck. Actually, I've even tried to use schema inference to create a schema, but the AvroSchemaRegistry doesn't want to take it, and the error message I get is "Not a named Type". Here is the basic Avro schema:
{
  "type": "array",
  "namespace": "nothing",
  "items": {
    "type": "record",
    "name": "steps",
    "fields": [
      {
        "name": "index",
        "type": "string",
        "doc": "Type inferred from index"
      }
    ]
  }
}
Do we have an Avro guru around the corner? Thanks Stéphane
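For what it's worth, and completely untested: the "Not a named Type" message suggests the registry expects a named type (a record) at the top level rather than a bare array, so a wrapped variant along these lines (the wrapper and field names are just placeholders) would at least be a named type:

{
  "type": "record",
  "name": "stepsList",
  "namespace": "nothing",
  "fields": [
    {
      "name": "steps",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "step",
          "fields": [
            { "name": "index", "type": "string" }
          ]
        }
      }
    }
  ]
}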
12-15-2020
07:57 AM
Hello @TimothySpann Thanks for your reply. I use a basic JsonTreeReader with no schema, just infer schema.
12-14-2020
11:58 PM
Hello, I'm facing a weird issue with Jolt. I have a flow file which is record-oriented, one JSON object per line, with the following structure:
{"aleas": [{object1}, {object2}, {object3}]}
and what I basically want to do is get rid of this "aleas" root key and have something like this:
[{object1}, {object2}, {object3}]
I've tested this spec on the Jolt demo site:
[
  {
    "operation": "shift",
    "spec": {
      "aleas": {
        "*": []
      }
    }
  }
]
But when I run it on NiFi (latest release) using a JoltTransformRecord processor, I get the following error message:
2020-12-15 07:50:17,415 ERROR [Timer-Driven Process Thread-8] o.a.n.p.jolt.record.JoltTransformRecord JoltTransformRecord[id=654dabc3-0176-1000-0c3a-067d307c6f07] Unable to transform StandardFlowFileRecord[uuid=b818aa99-b538-48bb-942e-c39d70854c53,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1608018617233-9, container=default, section=9], offset=570949, length=1329453],offset=0,name=60dcc444-f06a-4c65-b667-8309583eb782_Feuil1.csv,size=1329453] due to org.apache.nifi.processor.exception.ProcessException: Error transforming the first record: org.apache.nifi.processor.exception.ProcessException: Error transforming the first record
org.apache.nifi.processor.exception.ProcessException: Error transforming the first record
    at org.apache.nifi.processors.jolt.record.JoltTransformRecord.onTrigger(JoltTransformRecord.java:335)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1174)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
I use a basic JsonTreeReader as the record reader, with all the options set to default. The funny part is that if I put a SplitRecord processor in front and process each JSON flow file with JoltTransformJSON, it works nicely. Nevertheless, I'd like to avoid this solution, which is really bad for performance and breaks my whole record-oriented flow. Any idea? Thanks for your support Stéphane
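To make the expected behaviour concrete (the field names below are made up), a single input record such as:

{"aleas": [{"id": 1, "type": "a"}, {"id": 2, "type": "b"}]}

should come out of the transform as:

[{"id": 1, "type": "a"}, {"id": 2, "type": "b"}]

which matches what I get on the Jolt demo site with the spec above; the failure only shows up inside JoltTransformRecord.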
Labels:
- Apache NiFi
12-14-2020
07:23 AM
Hello @justenji Thanks for the detailed explanation, this is also what I have tested on my side, and in my opinion it does the job while staying record-oriented 🙂 Have a nice day
12-09-2020
02:12 AM
I need to test this, but it actually seems that QueryRecord is exactly what I need. It makes it possible to build complex conditions on the record content using a SQL-like language, and then route records based on that.
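For instance (field names made up for the sake of the example), a condition combining several criteria could look like:

SELECT * FROM FLOWFILE WHERE status = 'ACTIVE' AND retryCount > 3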
12-09-2020
01:25 AM
Hi @TimothySpann Thanks so much for pointing me to this processor, it looks so great!!