
How to read json from S3 then edit json using Nifi

Explorer

I am using NiFi 1.6.0. I am trying to copy an S3 file into Redshift. The JSON file on S3 looks like this:
[
{"a":1,"b":2},
{"a":3,"b":4}
]
However, this gives an error because of the '[' and ']' in the file (https://stackoverflow.com/a/45348425).

I need to convert the JSON from the format above to a format that looks like this:

{"a":1,"b":2}
{"a":3,"b":4}

So basically what I am trying to do is remove '[' and ']' and replace '},' with '}'.

The file has over 14 million rows (30 MB in size).

How can I achieve this using NiFi?

1 ACCEPTED SOLUTION

Super Guru

@baymax277 ,

 

Try this:

[screenshot: ReplaceText processor configuration]

This is the regex:

(?s)(^\[|\]$)

It matches a '[' at the very start of the content or a ']' at the very end, so ReplaceText (with Evaluation Mode set to "Entire text") can replace both brackets with nothing.

You may have to increase the processor's Maximum Buffer Size to accommodate your file.
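Outside NiFi, the same transformation can be sketched in Python. The comma cleanup at the end is an assumption based on what the question asked for; the screenshot's full configuration is not visible here.

```python
import re

# Flow file content: a JSON array with one object per line.
content = '[\n{"a":1,"b":2},\n{"a":3,"b":4}\n]'

# (^\[|\]$) matches a '[' at the very start or a ']' at the very end;
# replacing both with '' strips the array brackets.
no_brackets = re.sub(r'(?s)(^\[|\]$)', '', content).strip()

# The question also asks to turn '},' line endings into '}' so each line
# becomes a standalone JSON object (simple replace; an assumption, not
# shown in the accepted answer's screenshot).
lines = re.sub(r'},\s*\n', '}\n', no_brackets)
```

The result is newline-delimited JSON, one object per line, which is the shape Redshift's COPY with the 'auto' option expects.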

 

Cheers,

André

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.


11 REPLIES

Super Guru

@baymax277 ,

 

The problem with your file is not the square brackets ([ ]). The issue is that single quotes are not valid JSON string delimiters. The correct JSON would be:

[
{"a":1,"b":2},
{"a":3,"b":4}
]

 

The content above is processed correctly by NiFi if you use record-based processors (such as QueryRecord).

 

Ideally, your source data should be fixed to produce valid JSON. For example, if those files are being generated by a Python script like this:

my_object = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(my_object)  # prints the Python repr, which uses single quotes and is not valid JSON

The Python script should be updated to do this instead:

import json
my_object = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(json.dumps(my_object))  # emits valid JSON with double-quoted strings

 

If you cannot have the source data fixed, you can try to replace the single quotes with double quotes using a ReplaceText processor with the following configuration.

 

[screenshot: ReplaceText processor configuration for the quote replacement]

 

This is a simplistic approach, though, and can cause problems with more complex JSON objects if any string values contain unexpected sequences of characters.
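A rough Python equivalent of that quote swap, and the failure mode mentioned above (this is a sketch of the idea, not the exact processor configuration, which is only in the screenshot):

```python
import json

# Naive swap: replace every single quote with a double quote.
bad = "[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]"
fixed = bad.replace("'", '"')
records = json.loads(fixed)  # parses fine for this simple input

# Failure mode: an apostrophe inside a string value also gets swapped,
# so the output is no longer valid JSON.
tricky = "[{'name': 'O'Brien'}]"
broken = tricky.replace("'", '"')  # json.loads(broken) would raise
```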

 

Cheers,

André

 


Explorer

@araujo 
Sorry, my bad. I made a mistake in the JSON representation in the question. The records actually use double quotes ("). I have updated the question.

Super Guru

@baymax277 ,

 

Could you share a screenshot of the part of your flow that reads and processes the JSON content?

 

André


Explorer

@araujo 
The JSON files are compressed and placed in an S3 bucket. Here's the flow I have created to fetch the S3 file and decompress it.

[screenshot: flow that fetches the S3 file and decompresses it]

 

Super Guru

@baymax277 ,

 

Could you also share the screenshot of the part of the flow where the error is happening? I think that's after the Decompress File step, right?

 

André

 


Explorer

@araujo , Right now there is no error in the NiFi flow. But when I tried to execute a COPY command (from a SQL query editor) on the file, it gave me the error "Invalid JSONPath format: Member is not an object.". When I searched for this error, I found this: https://stackoverflow.com/a/45348425

 

This is why I wanted to convert the file.

Super Guru

@baymax277 

 

Ok. Can you share a screenshot of the part of the flow that inserts the data into your database?

 

Cheers,

André

 


Explorer

I have not created that part of the flow yet, but it will be an ExecuteStreamCommand processor that runs a psql command with the COPY SQL query. It will look something like this:

[screenshot: planned ExecuteStreamCommand configuration for the psql COPY command]

 

Super Guru

@baymax277 ,

 

You should look into using PutDatabaseRecord instead of ExecuteStreamCommand. It would be more efficient, and it would eliminate the need to manually remove those brackets.
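To illustrate why (a plain Python sketch, not NiFi code): a record-based reader parses the whole array as valid JSON and hands each element to the writer as one row, so the brackets never need stripping.

```python
import json

# A valid JSON array parses as-is, brackets and all.
content = '[{"a":1,"b":2},{"a":3,"b":4}]'
rows = [(rec["a"], rec["b"]) for rec in json.loads(content)]

# A record-aware writer would insert these rows directly, e.g. with
# executemany("INSERT INTO t (a, b) VALUES (%s, %s)", rows) against a
# hypothetical table t (illustration only; PutDatabaseRecord does the
# equivalent inside NiFi).
```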

 

Cheers,

André

 
