Member since: 11-16-2015
Posts: 892
Kudos Received: 650
Solutions: 245
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5668 | 02-22-2024 12:38 PM |
 | 1389 | 02-02-2023 07:07 AM |
 | 3087 | 12-07-2021 09:19 AM |
 | 4205 | 03-20-2020 12:34 PM |
 | 14163 | 01-27-2020 07:57 AM |
06-13-2018
02:41 PM
Thanks for the response. A small correction: from the custom code, the message is published to a Kafka queue, and from there I pick up the JSON message and pass it to the EvaluateJsonPath processor. EvaluateJsonPath now extracts two values, a source path and a destination path. As you said, I can use FetchS3Object to get the file from S3, but how should I pass the source path to the FetchS3Object processor, and then how should I pass the destination path to the PutFile processor? Could you explain briefly? Right now my flow looks like the attached screenshot. PFA...
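For illustration only (these attribute names are placeholders, not from my actual flow): if EvaluateJsonPath is set to Destination = flowfile-attribute and writes the two values to attributes named, say, source.path and destination.path, I picture the downstream properties looking roughly like this:
FetchS3Object -> Object Key: ${source.path}
PutFile -> Directory: ${destination.path}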
05-29-2018
09:09 PM
Hi guys, thanks so much for the fast support, and thanks to the Matts team, @Matt Burgess and @Matt Clarke. I finally understood how the processor works: it emits a flow file with no payload, and the file details, such as path and filename, are carried in the flow file attributes. FetchHDFS then uses those attributes to fetch the corresponding files. Kind regards, Paul
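As a small illustration of the pattern (this is just the standard List/Fetch wiring, nothing specific to my flow), FetchHDFS reads those attributes through Expression Language, e.g. its "HDFS Filename" property can be set to:
${path}/${filename}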
05-31-2018
10:19 AM
@Mike Wong Does ListFile exhibit the same behavior, or does it list your file correctly?
The fact that the logs show the processor yielding tells me it found no work to do (meaning no files to list). It yields so that it does not consume CPU non-stop looking for work that does not exist.
Did you check your properties for leading or trailing whitespace? Did you try removing the "\" from your file filter?
Thanks, Matt
12-06-2018
03:01 PM
Is there a way (without using ExecuteScript) to modify a process group variable from within a processor?
05-27-2018
06:51 AM
It works fine now. I changed the ReplaceText replacement value to: ${hive.ddl} location '${absolute.hdfs.path}'
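For illustration, with a made-up table and path: the hive.ddl attribute produced upstream (e.g. by ConvertAvroToORC) holds a partial statement along the lines of CREATE EXTERNAL TABLE IF NOT EXISTS my_table (id INT, name STRING) STORED AS ORC, so after the ReplaceText substitution the flow file content becomes something like:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (id INT, name STRING) STORED AS ORC location '/data/orc/my_table'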
05-21-2018
02:16 PM
For approach #1, you could use the FlattenJson processor; you'll likely want to set the Separator property to "_" rather than the default "." since Hive adds the table name to each column in a ResultSet. For approach #2, you could have a single-column table (a column of type String) and then query it with get_json_object (example here). Alternatively, if you can map all the types (including complex types like array, list, struct, etc.) to a Hive table definition, you could use a JSON SerDe to write the data (example here).
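As a rough sketch of approach #2 (table and column names are made up), the single-column table and a get_json_object query could look something like:
CREATE TABLE raw_json (json_str STRING);
SELECT get_json_object(json_str, '$.customer.name') AS customer_name FROM raw_json;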
10-22-2018
02:44 PM
You have to go to the path where NiFi is installed; that will be the HDF folder, which will not be found under root or under your user. Follow these steps to find it:
Open a terminal and run this command: sudo docker exec -it sandbox-hdf /bin/bash
Then go to this path: /usr/hdf/3.1.2.0-7/nifi
You will see the NiFi-related folders. Now create your input and output directories:
mkdir inputdir
mkdir outputdir
That's it! Use these directories in your GetFile and PutFile processors.
Note: the folder permissions should be set for the nifi user, and the same goes for the input data. Happy learning! Let me know if you run into any issues.
04-17-2018
02:01 PM
I am working on NIFI-4456, which will allow the JSON reader/writer to support the "one JSON per line" format as well as the "JSON array" format for input and output, so you will be able to read in one JSON per line and output a JSON array using ConvertRecord (or any other record-aware processor). In the meantime, you can use the following crude script in an ExecuteGroovyScript processor to process your entire file (avoiding the Split/Merge pattern); it should get you what you want:
def flowFile = session.get()
if(!flowFile) return
flowFile = session.write(flowFile, {inStream, outStream ->
    // open the JSON array
    outStream.write('['.bytes)
    inStream.eachLine { line, i ->
        // eachLine counts from 1, so prepend a comma before every line except the first
        if(i > 1) outStream.write(','.bytes)
        outStream.write(line.bytes)
    }
    // close the JSON array
    outStream.write(']'.bytes)
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
The script just adds array brackets around the whole document and separates the lines with commas. I went with the crude version because it doesn't need to load the entire input content into memory. If you need more control over the JSON objects, you could iterate over the lines (still with eachLine), use JsonSlurper to deserialize each string into a JSON object, add each object to an array, and then use JsonOutput to serialize the whole thing back to a string. However, that involves having the entire content in memory and could get unwieldy for large input flow files.
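For completeness, a rough (untested) sketch of that in-memory JsonSlurper/JsonOutput alternative might look like this:
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
def flowFile = session.get()
if(!flowFile) return
def slurper = new JsonSlurper()
def objects = []
flowFile = session.write(flowFile, {inStream, outStream ->
    // parse each line into a JSON object and collect it (the entire content ends up in memory)
    inStream.eachLine { line, i ->
        objects << slurper.parseText(line)
    }
    // serialize the collected objects back out as a single JSON array
    outStream.write(JsonOutput.toJson(objects).bytes)
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
This gives you real JSON objects to manipulate before re-serializing, at the cost of holding the whole array in memory.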
04-26-2018
02:58 PM
You only need one session per execution of the script. Using that session, you can get, create, remove, and transfer as many flow files as you want. If you get or create a flow file from the session, then you must transfer or remove it before the end of the script, or else you will get a "Transfer relationship not specified" error. Also, you can only transfer each flow file once; if you attempt to transfer the same flow file more than once, you will get the error you describe above.
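As a minimal sketch (not from this thread, just to illustrate the lifecycle in an ExecuteScript/ExecuteGroovyScript body):
// one session per execution; it can handle several flow files
def incoming = session.get()      // may be null if nothing is queued
def extra = session.create()      // a brand-new flow file
if (incoming != null) {
    // anything obtained from the session must be transferred or removed exactly once
    session.transfer(incoming, REL_SUCCESS)
}
session.transfer(extra, REL_SUCCESS)
// transferring 'extra' or 'incoming' a second time at this point would raise the
// "already marked for transfer" style error mentioned above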
04-02-2018
12:34 PM
1 Kudo
I'm not sure if there is a function in JSONPath to retain the escaped quotes or not, but you could use UpdateAttribute (between EvaluateJsonPath and ReplaceText) along with the escapeJson function to "re-introduce" the quotes, by setting the "observation" attribute to the following value: ${observation:escapeJson()}
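For example (made-up value): if the observation attribute currently holds He said "hello", then ${observation:escapeJson()} evaluates to He said \"hello\", so the quotes stay escaped when ReplaceText substitutes the value back into the JSON.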