Master Guru

Apache NiFi 1.0.0 has a lot of cool features, but no JSON-to-CSV converter yet.

For my use case, I wanted to split one JSON file containing 5,000 records into 5,000 CSV files. You could also throw a MergeContent processor in there and produce a single file. My data came from the excellent test data source RandomUser, which provides a free API for pulling down data: https://api.randomuser.me/?results=5000&format=pretty. I chose format=pretty to get JSON formatted across multiple lines, which is easier to parse.

This is an example of JSON returned from that service:

{"results":[
{"gender":"male",
"name":{"title":"monsieur","first":"lohan","last":"marchand"},
"location":{"street":"6684 rue jean-baldassini","city":"auboranges","state":"schwyz","postcode":9591},
"email":"lohan.marchand@example.com",
"login":{"username":"biggoose202","password":"esther","salt":"QIU1HBsr","md5":"9e60da6d4490cd6d102e8010ac98f283","sha1":"3de3ea419da1afe5c83518f8b46f157895266d17","sha256":"c6750c1a5bd18cac01c63d9e58a57d75520861733666ddb7ea6e767a7460479b"},
"dob":"1965-01-28 03:56:58",
"registered":"2014-07-26 11:06:46",
"phone":"(849)-890-5523",
"cell":"(395)-127-9369",
"id":{"name":"AVS","value":"756.OUVK.GFAB.51"},
"picture":{"large":"https://randomuser.me/api/portraits/men/69.jpg","medium":"https://randomuser.me/api/portraits/med/men/69.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/men/69.jpg"},
"nat":"CH"}
]}

Step 1: GetFile: Read the JSON file from a directory (I can process any JSON file in there)

Step 2: SplitJson: Split on $.results so that each element of the results array becomes its own flow file.

8982-splitjson.png
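To make the split step concrete outside of NiFi, here is a minimal Python sketch of the same idea: take the array under "results" and emit each element as its own JSON document. The function name split_results and the two-record sample are illustrative assumptions, not part of the flow above.

```python
import json


def split_results(document: str) -> list[str]:
    """Return each element of the top-level "results" array as its own JSON string."""
    payload = json.loads(document)
    return [json.dumps(record) for record in payload["results"]]


# Tiny stand-in for the RandomUser payload (assumed sample data).
sample = '{"results": [{"cell": "(395)-127-9369"}, {"cell": "(111)-222-3333"}]}'
pieces = split_results(sample)
```

Each entry in pieces plays the role of one flow file coming out of the split.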

Step 3: EvaluateJsonPath: Pull values out of each JSON record into flow file attributes. Example: $.cell

8981-pullkeyattributes.png
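As a rough illustration of what pulling attributes out of a record means, the Python sketch below resolves simple dotted paths such as $.cell or $.login.md5 against one record. It handles only plain dotted paths, not full JsonPath, and the helper evaluate_path and the attribute names in paths are assumptions made for this example (it needs Python 3.9+ for str.removeprefix).

```python
import json


def evaluate_path(record: dict, path: str):
    """Resolve a simple dotted path like "$.login.md5"; return None if any key is missing."""
    node = record
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


record = json.loads(
    '{"cell": "(395)-127-9369", "login": {"md5": "9e60da6d4490cd6d102e8010ac98f283"}}'
)

# Hypothetical attribute-name -> path mapping, similar in spirit to the processor's properties.
paths = {"cell": "$.cell", "md5": "$.login.md5"}
attributes = {name: evaluate_path(record, path) for name, path in paths.items()}
```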

Step 4: UpdateAttribute: Set the filename to ${md5}.csv so that each file name is unique.

8940-updateattribute.png

Step 5: ReplaceText: Format the extracted attributes into a line of comma-separated values.

8983-replacetextcsv.png
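The formatting step can be sketched in Python as follows. This is not the ReplaceText configuration from the screenshot; it is an assumed equivalent that writes the chosen attributes as one CSV line and derives the file name from the md5 attribute. The column list is hypothetical. One difference worth noting: the csv module quotes values that contain commas, which plain text substitution would not do.

```python
import csv
import io


def to_csv_line(attributes: dict, columns: list[str]) -> str:
    """Render the selected attributes as one CSV line, quoting values where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([attributes.get(column, "") for column in columns])
    return buffer.getvalue()


# Attribute values taken from the sample record; the column choice is an assumption.
attributes = {
    "first": "lohan",
    "last": "marchand",
    "street": "6684 rue jean-baldassini",
    "md5": "9e60da6d4490cd6d102e8010ac98f283",
}
columns = ["first", "last", "street"]

line = to_csv_line(attributes, columns)
filename = f"{attributes['md5']}.csv"  # mirrors the ${md5}.csv naming from Step 4
```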

Step 6: PutFile: Store the resulting file in a directory. (This could also be PutHDFS or many other sink processors).

8939-flow.png

Comments
Expert Contributor

@Timothy Spann I have a similar use case, but my flow fails at the ReplaceText processor. Can you give some advice on configuring the processor?

InvokeHTTP --> EvaluateJsonPath --> ReplaceText --> MergeContent --> UpdateAttribute --> PutHDFS

My flow makes several HTTP calls with InvokeHTTP (each call with a different ID), extracts attributes from each JSON that is returned (each JSON is unique), and then creates the CSVs as in your example. However, after the MergeContent processor, the merged CSV contains a lot of duplicate data, even though all incoming JSONs contain unique data.

ReplaceText conf:

45400-repltext.png

MergeContent conf:

45401-capture.png

Master Guru

Can you post the logs and the error?

The better way now is to get the JSON schema (you can use InferAvroSchema if you don't have one), then just go from SplitJson to ConvertRecord.

No manual coding.

I have lots of new articles on this.

Master Guru

With record-aware processors like ConvertRecord, you don't need SplitJson either; it can work on a whole JSON array.
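For readers who want to see the whole-array idea outside NiFi, here is a minimal Python sketch that converts every record in the array into one CSV with a header row, with no per-record split. It only illustrates the concept and says nothing about how ConvertRecord is implemented; the function name, the sample payload, and the column list are assumptions.

```python
import csv
import io
import json


def json_array_to_csv(document: str, columns: list[str]) -> str:
    """Write every record under "results" as a CSV row, keeping only the given columns."""
    records = json.loads(document)["results"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Assumed sample payload with two records.
sample = (
    '{"results": ['
    '{"gender": "male", "email": "a@example.com", "nat": "CH"},'
    '{"gender": "female", "email": "b@example.com", "nat": "FR"}]}'
)
csv_text = json_array_to_csv(sample, ["gender", "email"])
```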

Rising Star

Hi @Timothy Spann

I want to parse a CSV file with JSON-format data into CSV using NiFi.

My CSV table contains data like:

name : surendra,

age : 25

address : {city:chennai,state:TN,zipcode:600234}

Now I want to get the output in the format below:

Name : surendra

Age : 25

Address_city : chennai, Address_state : TN, Address_zipcode : 600234

Can this be done? Please let me know.

Master Guru

Now use the record processors.
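As a sketch of the flattening asked about above, the Python below turns a nested record into flat column names joined with an underscore. It is only an illustration of the target shape, not a NiFi configuration; the flatten helper is an assumption, and it keeps the original key case (address_city rather than Address_city).

```python
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into a single level, joining key names with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Record taken from the question above.
row = {
    "name": "surendra",
    "age": 25,
    "address": {"city": "chennai", "state": "TN", "zipcode": 600234},
}
flat_row = flatten(row)
```

The keys of flat_row can then serve as CSV column headers.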