Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How can I convert a fixed width file into Json using controller services?

avatar
New Contributor

I have an input fixed width file as below
John 32 M New York 100
I want to convert it in to a Json like below
{
"Name": "John",
"Age": 32,
"Gender": "M",
"City": "New York",
"Country": 100
}


1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
5 REPLIES 5

avatar
Community Manager

@Chai09, Welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts @cotopaul @SAMSAL @MattWho @steven-matison  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
New Contributor

@VidyaSargur  Thank you for the warm welcome! I appreciate being a part of this community.

avatar

Hi @Chai09 ,

There are different options to do this:

1- You can do ExtractText -> AttributeToJson. In the ExtractText you specify the attribute (Name, Age...) and the regex to capture each attribute value. The AttributeToJson processor you can list the attribute that you want to convert into Json in the AttributeList property and set the Destination to "flowfile-content" to get the Json you need.  To learn how you to use the ExtractText  please refer to:

https://community.cloudera.com/t5/Support-Questions/How-to-ExtractText-from-flow-file-using-Nifi-Pro...

However, You might find the Age and Country values are given to you as string values vs integer, if that is OK with you then you dont need to do anything, If you need them to be integer then you have to use another processor like QueryRecord where you can cast the string into integer or JoltTransformationJson processor where you can do conversion using the following jolt spec:

 

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "Age": "=toInteger(@(1,Age))",
      "Country": "=toInteger(@(1,Country))"
    }
  }
]

 

 

2- The easiest way I found if you dont like to use Regex is using QueryRecord with RecordReader set as GrokReader and RecordWrite set as JsonRecordSetWriter as follows

SAMSAL_0-1699635705346.png

The JsonRecord is dynamic property that will define the relationship that will produce the Json output you are looking for and it has the following value:

 

select Name
,CAST(AgeN as SMALLINT) as Age
,Gender
,City
,CAST(CountryN as SMALLINT) as Country
 from FLOWFILE

 

The GrokReader service is used to help you read unstructured data such as log files and its configured as the following:

SAMSAL_2-1699635881126.png

The Grok Expressions Property is set to the following:

 

%{WORD:Name} %{NUMBER:AgeN} %{WORD:Gender} %{DATA:City} %{NUMBER:CountryN}

 

The Grok Expression uses predefined Regex for the given types: WORD, NUMBER, DATA...etc. For more info: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-reaord-serialization-services...

The JsonRecordSetWriter service is configured as follows :

SAMSAL_3-1699636047849.png

This will produce the Json you are looking for the correct data types. Notice how I needed to still cast the Age,Country to INT despite they are being defined as Number in the Grok Expression and that is because JsonRecordSetWriter will still convert everything to string unless you provide an Avro Schema.

If that helps please accept solution.

Thanks

 

 

avatar
New Contributor

Dear @SAMSAL ,

I extend my sincere appreciation for your invaluable insights and guidance on the query. Your expertise has been instrumental in implementing the first method, and I'm grateful for your prompt and effective assistance.

Regarding the second method you suggested, it's working seamlessly for our current requirements. However, I'm curious about its applicability to scenarios where a file contains data that requires splitting and segregation.

For Instance, consider the input
JohnCena32 Male New York USA813668

And the desired JSON output:
{
"firstname": "John",
"lastname": "Cena",
"age": 32,
"gender": "Male",
"city": "New York",
"country": "USA",
"mobile": 813668
}

Could you please shed light on whether the second method remains effective in handling data structures like this?

Thank you once again for your time and expertise.

 

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login