Member since: 07-08-2016
Posts: 260
Kudos Received: 44
Solutions: 10
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2491 | 05-02-2018 06:03 PM |
 | 5080 | 10-18-2017 04:02 PM |
 | 1618 | 08-25-2017 08:59 PM |
 | 2222 | 07-21-2017 08:13 PM |
 | 8889 | 04-06-2017 09:54 PM |
06-19-2018
05:36 PM
I recommend using MergeRecord before JoltTransformJSON, since the Jolt transform can then be applied to the whole JSON array (after your smaller JSON objects have been merged). You'll want to use a JsonTreeReader and provide an Avro schema that matches your input data above. mergerecord-example.xml is an example template where I generate data similar to yours, use MergeRecord to bundle the records 20 at a time, then run the Jolt spec on the result. It includes the associated Avro schema and hopefully all the configuration you need to get up and running.
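For reference, a record schema for the JsonTreeReader might look like the sketch below. The record and field names here are hypothetical, since they have to match your actual input objects (the template above contains the real one):

{
  "type": "record",
  "name": "myRecord",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "value", "type": "string" }
  ]
}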
01-15-2019
11:24 AM
First of all, double-check all configuration (including the password), just to make sure you aren't heading in the wrong direction. Secondly, confirm that you do not need TLS enabled. If neither helps, the following might help with troubleshooting: 1. Become the nifi user on the node where NiFi is running. 2. Send the message via Python. 3. Share the Python command here. Note: please explicitly specify in the Python call everything that you configure in NiFi, even the settings that seem unnecessary because of sensible defaults.
05-29-2018
09:09 PM
Hi guys, thanks so much for the fast support, and thanks to the Matts team @Matt Burgess and @Matt Clarke. I finally understood how the processor works: it emits a flow file with no payload, and the file details such as path and filename are carried in the flow file attributes. Those are used by FetchHDFS to fetch the corresponding files. Kind regards, Paul
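For anyone finding this later: FetchHDFS builds the target path from those attributes using NiFi Expression Language. If I remember correctly, the relevant property default looks like this (double-check against your processor's docs):

HDFS Filename = ${path}/${filename}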
01-31-2019
02:22 PM
@Shu, can you upload the .xml template file for the recent solution/flow here?
04-17-2018
02:01 PM
I am working on NIFI-4456, which will allow the JSON reader/writer to support the "one JSON per line" format as well as the "JSON array" format for both input and output, so you will be able to read in one JSON object per line and output a JSON array using ConvertRecord (or any other record-aware processor). In the meantime, you can use the following crude script in an ExecuteGroovyScript processor to process your entire file (avoiding the Split/Merge pattern); it should get you what you want:

import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inStream, outStream ->
    // Open the JSON array
    outStream.write('['.bytes)
    inStream.eachLine { line, i ->
        // eachLine counts from 1, so every line after the first gets a leading comma
        if (i > 1) outStream.write(','.bytes)
        outStream.write(line.bytes)
    }
    // Close the JSON array
    outStream.write(']'.bytes)
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)

The script just adds array brackets around the whole document and separates the lines with commas. I went with the crude version because it doesn't need to load the entire input content into memory. If you need more control over the JSON objects, you could iterate over the lines (still with eachLine), use JsonSlurper to deserialize each line into a JSON object, add each object to a list, and then use JsonOutput to serialize the whole thing back to a string. However, that keeps the entire content in memory and could get unwieldy for large input flow files.
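For completeness, here is a minimal sketch of that JsonSlurper variant (same ExecuteGroovyScript setup; note it buffers the whole content in memory, as mentioned above):

import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return
def slurper = new JsonSlurper()
flowFile = session.write(flowFile, { inStream, outStream ->
    def objects = []
    // Deserialize each "one JSON per line" entry into an object
    inStream.eachLine { line ->
        objects << slurper.parseText(line)
    }
    // Serialize the collected objects back out as a single JSON array
    outStream.write(JsonOutput.toJson(objects).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)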
03-21-2018
06:38 PM
@Saikrishna Tarapareddy
The FlattenJson processor doesn't work if you have arrays or nested arrays in the JSON content; the flow file will route to failure in that case, so we still need to use SplitJson (or JoltTransformJSON) to split the array first. The processor joins the keys at each level with the user-defined separator specified in the processor configuration.

Input JSON:

{
  "id": 17,
  "name": "John",
  "child": {
    "id": "1"
  }
}

Output JSON:

{"id":17,"name":"John","child.id":"1"}

As you can see, the nested JSON message has been joined with a . (period) in the output JSON content.
03-01-2018
08:56 PM
@Bryan Bende, looks like I have the option to use SegmentContent and MergeContent, with ReplaceText applied to each segment in between. I tried it with a 10 MB segment size on a 120 MB file and it worked. Now I will try it on the bigger file.
01-09-2018
04:24 PM
1 Kudo
@Saikrishna Tarapareddy Those properties refer only to the archive, not to the content repository in general. The content repository only cleans out a claim (the unit in which flow file content is stored in the content repository) once all flow files associated with that claim are out of the flow graph. Here is a link to an article that describes how the content repository archiving works: Understanding how NiFi's Content Repository Archiving works. Depending on the version of HDF/NiFi you're using, it will clean out the archived files if you change nifi.content.repository.archive.enabled to false.
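For reference, these are the archive-related entries in nifi.properties; the values shown are the usual defaults, so double-check your own file:

# Content repository archive settings in nifi.properties
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true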
11-29-2017
09:10 PM
@Matt Andruff, that's true; I already came up with a process. I did it using a couple of tables: one to record all successful files with names and dates, and another for missing files and dates. I populate these in the file-ingestion NiFi flow, then join them to find out whether the missing files ever came back and landed in the success table. Thanks anyway for your time on this. Regards, Sai
11-17-2017
02:55 AM
Precisely what I needed. Thanks!