Created on 06-22-2022 06:14 AM - edited 06-22-2022 05:48 PM
I am using Nifi 1.6.0. I am trying to copy an S3 file into redshift. The json file on S3 looks like this:
[
{"a":1,"b":2},
{"a":3,"b":4}
]
However this gives an error because of '[' and ']' in the file (https://stackoverflow.com/a/45348425).
I need to convert the json from above format to a format which looks like this:
{"a":1,"b":2}
{"a":3,"b":4}
So basically what I am trying to do is to remove '[' and ']' and replace '},' with '}'.
The file has over 14 million rows (30 MB in size)
How can I achieve this using Nifi?
Created on 06-22-2022 07:31 PM - edited 06-22-2022 07:32 PM
Try this:
This is the regex:
(?s)(^\[|\]$)
You may have to increase the buffer size to accommodate your file.
Cheers,
André
Created 06-22-2022 04:46 PM
The problem with your file is not the square brackets ([ ]). The issue is that single quotes are not a valid JSON quote. The correct JSON would be:
[
{"a":1,"b":2},
{"a":3,"b":4}
]
The content above is processed correctly by NiFi if you use record-based processors (like, for example QueryRecord).
Ideally your source data should be fixed to provide a valid JSON. For example, if those files are being generated by a Python script like this:
my_object = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(my_object)
The Python script should be updated to do this instead:
import json
my_object = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(json.dumps(my_object))
If you cannot have the source data fixed, you can try to replace the single quotes with double quotes using a ReplaceText processor with the following configuration.
This is a simplistic approach, though, and can cause problems in some complex JSON objects if you have strings with unexpected sequences of characters in them.
Cheers,
André
Created 06-22-2022 05:52 PM
@araujo
Sorry, my bad. I made a mistake in the json representation mentioned in the question. The records actually have double quotes ("). I have updated the question.
Created 06-22-2022 05:51 PM
Could you share a screenshot of the part of you flow that reads and processes the JSON content?
André
Created 06-22-2022 05:57 PM
@araujo
The jsons files are compressed and are placed in an S3 bucket. Here's the flow I have created to fetch s3 file and decompress it.
Created 06-22-2022 06:01 PM
Could you also share the screenshot of the part of the flow where the error is happening? I think that's after the Decompress File step, right?
André
Created 06-22-2022 06:16 PM
@araujo , Right now there is no error in Nifi flow. But when I tried to execute a copy command (from an SQL query editor) on the file, it gave me an error "Invalid JSONPath format: Member is not an object. ". When I searched about this error, I found this - https://stackoverflow.com/a/45348425
This is why I wanted to convert the file.
Created 06-22-2022 06:18 PM
Ok. Can you share a screenshot of the part of the flow that insert the data into your database?
Cheers,
André
Created 06-22-2022 06:45 PM
I have not created that part of the flow yet, but that will be an ExecuteStreamCommand processor which will run a psql command with the copy sql query:
It will look something like this:
Created 06-22-2022 06:49 PM
You should look into using PutDatabaseRecord instead of ExecuteStreamCommand. This would be more efficient and it would eliminate the need to manually remove those brackets.
Cheers,
André