
CSV with embedded JSON to JSON - NiFi

New Contributor

Hello,

 

I have a CSV input similar to this, which I need in JSON.

 

Name der Abteilung (DepartmentName),Name des Kontos (AccountName),ID des Kontobesitzers (AccountOwnerId),Zusätzliche Informationen (AdditionalInfo)
ExampleDepartment,,dnajas93477ajdn,"{  ""UsageType"": ""ComputeHR"",  ""ImageType"": ""Windows Server BYOL""}"

In the CSV there is a nice piece of JSON embedded. I use an Avro schema and the CSVReader controller service to get rid of the human-readable localized header and to read the data.

 

The Avro schema currently looks like this:

{
   "type" : "record",
   "name" : "DailyCSV",
   "fields" : [
      { "name" : "DepartmentName" , "type" : ["string", "null"] },
      { "name" : "AccountName" , "type" : ["string", "null"] },
      { "name" : "AccountOwnerId", "type" : ["string", "null"] },
      { "name" : "AdditionalInfo", "type" : ["null", "string"] }
   ]
}

 

 

This reads the CSV nicely; however, the embedded JSON is left as a string, resulting in the following output:

[ {
  "DepartmentName" : "ExampleDepartment",
  "AccountName" : null,
  "AccountOwnerId" : "dnajas93477ajdn",
  "AdditionalInfo" : "{  \"UsageType\": \"ComputeHR\",  \"ImageType\": \"Windows Server BYOL\"}"
} ]

 

 

What I would like to have is:

 

 

[ {
  "DepartmentName" : "ExampleDepartment",
  "AccountName" : null,
  "AccountOwnerId" : "dnajas93477ajdn",
  "AdditionalInfo" : {
           "UsageType": "ComputeHR",
           "ImageType": "Windows Server BYOL"
  }
} ]
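For reference, the intended transformation can be sketched outside NiFi in a few lines of Python (illustrative only; it uses the sample row above and the field names from the Avro schema):

```python
import csv
import io
import json

# The sample row from above, with the localized header still in place.
data = '''Name der Abteilung (DepartmentName),Name des Kontos (AccountName),ID des Kontobesitzers (AccountOwnerId),Zusätzliche Informationen (AdditionalInfo)
ExampleDepartment,,dnajas93477ajdn,"{  ""UsageType"": ""ComputeHR"",  ""ImageType"": ""Windows Server BYOL""}"
'''

# Target field names, mirroring the Avro schema.
fields = ["DepartmentName", "AccountName", "AccountOwnerId", "AdditionalInfo"]

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the localized header
rows = []
for row in reader:
    record = dict(zip(fields, row))
    # Parse the embedded JSON into a nested object instead of keeping it
    # as a string; AdditionalInfo can be empty, so guard against that.
    record["AdditionalInfo"] = (
        json.loads(record["AdditionalInfo"]) if record["AdditionalInfo"] else None
    )
    rows.append(record)

print(json.dumps(rows, indent=2))
```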

 

 

Additional information:

 

I do know the exact format of the JSON that gets embedded in this case. In the future, however, I will have to handle user-generated tags, which are also stored as JSON arrays embedded in the CSV. There I will have no knowledge of the JSON field names (not even how many there are).

 

AdditionalInfo in the CSV can be empty.

 

Any ideas?

1 ACCEPTED SOLUTION

Super Guru

@gbukovszki The behavior you are describing is just how NiFi escapes the string representation of the JSON inside the schema. It is required in order to pass the record on to other Avro-based processors.

 

Assuming you have the JSON in an attribute named JSONAttribute, use the expression language below in UpdateAttribute when you need to unescape it:

${JSONAttribute:unescapeJson()}

 

You can do the same when the escaped values are in a FlowFile's content, using ReplaceText with the following Replacement Value:

${'$1':unescapeJson()}
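As a rough Python stand-in for what unescapeJson() does (not NiFi code; just to show the escaping involved):

```python
import json

# AdditionalInfo as it appears after the CSV-to-JSON conversion:
# a JSON-escaped string rather than a nested object.
escaped = '{  \\"UsageType\\": \\"ComputeHR\\",  \\"ImageType\\": \\"Windows Server BYOL\\"}'

# Roughly what unescapeJson() does: interpret the JSON string escapes.
unescaped = json.loads(f'"{escaped}"')

# The unescaped text is itself valid JSON, so it can be parsed into an object.
obj = json.loads(unescaped)
```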

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

 

Thanks,


Steven @ DFHZ


2 REPLIES


New Contributor

Great answer, many thanks. For the record, I had the data in a FlowFile's content; I needed a bit of regex magic to remove the quotation marks after unescaping.
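A rough Python sketch of that last step (the content and regex here are hypothetical, assuming the embedded object is still wrapped in quotation marks after unescaping):

```python
import json
import re

# FlowFile content after unescaping: the inner quotes are no longer escaped,
# but the embedded object is still wrapped in quotation marks.
content = '{"AccountOwnerId": "dnajas93477ajdn", "AdditionalInfo": "{"UsageType": "ComputeHR"}"}'

# Remove the quotation marks wrapping the embedded object so the content
# becomes valid nested JSON (in NiFi this would be another ReplaceText regex).
fixed = re.sub(r'"(\{.*\})"', r'\1', content)

obj = json.loads(fixed)
```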