How to process text files with semi structured data and convert to JSON

Explorer

Attachments: sample-data.txt, updateattributes-properties.png

I am trying to build a flow that takes syslog input from a Proofpoint (email) source and converts the messages to JSON. I am using the ListenSyslog processor to get the data in and the UpdateAttribute processor to get the regular syslog properties. The problem is that UpdateAttribute doesn't "recognize" any of the regular syslog properties except syslog.port, syslog.protocol and syslog.sender. The messages are not uniformly constructed: the first 3 fields (delimited by white space) always have the same format, while the remaining fields vary in format and length. The number of fields can also vary from line to line.

Is there a way to dynamically build JSON object without knowing incoming format and number of attributes?
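
For illustration, here is a minimal Python sketch of building a JSON object from a line whose field count is not known in advance. It is not a NiFi processor configuration. It assumes that the variable part of each line consists mostly of key=value tokens, which the question does not confirm, and the names line_to_json, head_names and "extra" are hypothetical.

```python
import json


def line_to_json(line, head_names=("field1", "field2", "field3")):
    """Turn one semi-structured line into a JSON string.

    Assumption: the first len(head_names) whitespace-delimited fields are
    fixed, and the rest are mostly key=value tokens of varying number.
    """
    n = len(head_names)
    # Split off the fixed leading fields; keep the remainder as one string.
    parts = line.strip().split(None, n)
    record = dict(zip(head_names, parts[:n]))
    rest = parts[n] if len(parts) > n else ""

    extras = []
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep and key:
            # A later key with the same name overwrites an earlier one.
            record[key] = value
        else:
            # Tokens that are not key=value are kept so nothing is lost.
            extras.append(token)
    if extras:
        record["extra"] = extras
    return json.dumps(record)


if __name__ == "__main__":
    print(line_to_json("Nov 17 11:08:50 host1 mod=session cmd=connect hello"))
```

Because the keys are discovered per line, two lines with different fields simply produce JSON objects with different keys. Values that contain spaces or quoted strings would need a more careful tokenizer than str.split.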

Re: How to process text files with semi structured data and convert to JSON

Explorer

Yes, I did try it, with the same attributes I put into UpdateAttribute. Same result.

Re: How to process text files with semi structured data and convert to JSON

@Alex M I am not sure I fully understand your use case. As I understand it, you are receiving syslog messages that you want to convert to JSON. If so, you can use ParseSyslog to extract the attributes and then AttributesToJSON to convert them.

I have attached a test I did. Can you import the template, test it, and tell me if this is what you are looking for?

testhcc.xml

Re: How to process text files with semi structured data and convert to JSON

Explorer

I could not upload your template. I get a "The specified template is not in a valid format" error.

Re: How to process text files with semi structured data and convert to JSON

Maybe that is because I am on NiFi 1.4. Can you try ListenSyslog -> ParseSyslog -> AttributesToJSON?
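
To make the ParseSyslog -> AttributesToJSON idea concrete outside NiFi, here is a rough Python sketch of the same two steps: pull the standard fields out of one syslog message, then serialize them as JSON. The regular expression only approximates the RFC 3164 layout and is not the pattern NiFi uses. The attribute names mirror the syslog.* names ParseSyslog writes, and syslog_to_json is a hypothetical helper.

```python
import json
import re

# Approximate RFC 3164 shape: <PRI>Mmm dd hh:mm:ss host message
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<hostname>\S+)\s"
    r"(?P<body>.*)$",
    re.DOTALL,
)


def syslog_to_json(message):
    """Return the parsed fields as a JSON string, or None if no match."""
    m = RFC3164.match(message.strip())
    if m is None:
        return None  # comparable to routing to "failure"
    pri = int(m.group("pri"))
    attrs = {
        "syslog.priority": str(pri),
        # PRI encodes facility * 8 + severity.
        "syslog.facility": str(pri // 8),
        "syslog.severity": str(pri % 8),
        "syslog.timestamp": m.group("timestamp"),
        "syslog.hostname": m.group("hostname"),
        "syslog.body": m.group("body"),
    }
    return json.dumps(attrs)


if __name__ == "__main__":
    print(syslog_to_json("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))
```

A message that lacks the leading <PRI> field or the timestamp returns None here, which corresponds to a message the parser cannot recognize.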

Re: How to process text files with semi structured data and convert to JSON

Explorer

The processor doesn't recognize the incoming messages as valid syslog.

Re: How to process text files with semi structured data and convert to JSON

@Alex M do you mean ParseSyslog? I tested with the data you provided in your question and everything works fine for me. Can you describe what you are doing, with a few screenshots? I am not sure I understand what you need to achieve.

Re: How to process text files with semi structured data and convert to JSON

Explorer

Yes. The file I attached contains only a very small subset of what's coming in. Here is a picture of my test flow, with the error message below:

2017-11-17 11:08:50,535 ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.ParseSyslog ParseSyslog[id=cae90699-015f-1000-5758-7ddd330c061d] Failed to parse StandardFlowFileRecord[uuid=3ed5f084-c4c2-4fcc-9682-f6203195f5b1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1510938527608-3484, container=default, section=412], offset=941379, length=141207],offset=0,name=1472246469105863,size=141207] as a Syslog message: it does not conform to any of the RFC formats supported; routing to failure

flow.png
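
One way to narrow down an error like the one above is to inspect the failing content outside NiFi. The FlowFile in the log is 141207 bytes, so it may hold many lines rather than a single message, though that is only a guess from the size. The sketch below assumes newline-separated messages and checks only for the leading <PRI> field, which is far looser than a real RFC 3164 or RFC 5424 parser. check_lines is a hypothetical helper.

```python
import re

# Both RFC 3164 and RFC 5424 messages begin with a <PRI> field.
PRI_PREFIX = re.compile(r"^<\d{1,3}>")


def check_lines(content):
    """Return (number of non-empty lines, 1-based indexes lacking <PRI>)."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    bad = [i for i, ln in enumerate(lines, start=1)
           if not PRI_PREFIX.match(ln)]
    return len(lines), bad


if __name__ == "__main__":
    sample = "<13>Nov 17 11:08:50 host a=1\nNov 17 11:08:51 host b=2\n"
    total, bad = check_lines(sample)
    print(f"{total} lines, missing <PRI> on lines: {bad}")
```

If the count is greater than one, or some lines lack the prefix, the content is not a single well-formed syslog message, which would be consistent with the parse failure.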
