Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

CSV with embedded JSON to JSON - NiFi

Solved Go to solution

CSV with embedded JSON to JSON - NiFi

New Contributor

Hello,

 

I have a CSV input similar to this, which I need in JSON.

 

Name der Abteilung (DepartmentName),Name des Kontos (AccountName),ID des Kontobesitzers (AccountOwnerId),Zusätzliche Informationen (AdditionalInfo)
ExampleDepartment,,dnajas93477ajdn,"{  ""UsageType"": ""ComputeHR"",  ""ImageType"": ""Windows Server BYOL""}"

 

 

In the CSV there is a nice JSON embedded. I use an Avro Schema and the CSV Reader controller service to get rid of the human readable localized header and to read the the data.

 

The Avro Schema currently looks like this:

 

 

{
   "type" : "record",
   "name" : "DailyCSV",
   "fields" : [
      { "name" : "DepartmentName" , "type" : ["string", "null"] },
	  { "name" : "AccountName" , "type" : ["string", "null"] },
	  { "name" : "AccountOwnerId", "type" : ["string", "null"] },
      { "name" : "AdditionalInfo", "type" : [ "null", "string" ] }
]
}

 

 

This does reads the CSV nicely, however, the embedded JSON will be left as string, resulting following JSON:

 

 

[ {
  "DepartmentName" : ExampleDepartment,
  "AccountName" : null,
  "AccountOwnerId" : "dnajas93477ajdn",
  "AdditionalInfo" : "{  \"UsageType\": \"ComputeHR\",  \"ImageType\": \"Windows Server BYOL\"}
]

 

 

What I would like to have is:

 

 

[ {
  "DepartmentName" : ExampleDepartment,
  "AccountName" : null,
  "AccountOwnerId" : "dnajas93477ajdn",
  "AdditionalInfo" : {
           "UsageType": "ComputeHR",
           "ImageType": "Windows Server BYOL"
           }
]

 

 

Additional information:

 

I do know the exact format of the JSON which gets embedded in this case. However, in the future I will have to tackle with user-generated tags which gets also stored as json arrays embedded in the CSV. There I will have no knowledge of the JSON field names (not even the number of them).

 

Additional Info in the CSV can be empty.

 

Any ideas?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: CSV with embedded JSON to JSON - NiFi

Master Collaborator

@gbukovszki The behavior you are describing is just how nifi escapes the string representation of the JSON inside of the schema.  It is required in order to send to different avro processors.

 

Assuming you have the schema in an attribute JSONAttribute, when you need to unescape, use the expression language below in UpdateAttribute :

${JSONAttribute:unescapeJson()}

 

You can also do similar action if the escaped values are in a FlowFiles content with ReplaceText in Replacement Value:

${'$1':unescapeJson()}

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.  

 

Thanks,


Steven @ DFHZ

 


 


If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.  


 


Thanks,



Steven

View solution in original post

2 REPLIES 2
Highlighted

Re: CSV with embedded JSON to JSON - NiFi

Master Collaborator

@gbukovszki The behavior you are describing is just how nifi escapes the string representation of the JSON inside of the schema.  It is required in order to send to different avro processors.

 

Assuming you have the schema in an attribute JSONAttribute, when you need to unescape, use the expression language below in UpdateAttribute :

${JSONAttribute:unescapeJson()}

 

You can also do similar action if the escaped values are in a FlowFiles content with ReplaceText in Replacement Value:

${'$1':unescapeJson()}

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.  

 

Thanks,


Steven @ DFHZ

 


 


If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create separate topic and feel free to tag me in your post.  


 


Thanks,



Steven

View solution in original post

Highlighted

Re: CSV with embedded JSON to JSON - NiFi

New Contributor

Great answer many thanks. For the record, I had the data in a FlowFile. I needed a bit of regex magic to remove the quotation marks after unescaping.

Don't have an account?
Coming from Hortonworks? Activate your account here