Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 430 | 09-30-2025 05:23 AM |
| | 765 | 06-26-2025 01:21 PM |
| | 659 | 06-19-2025 02:48 PM |
| | 847 | 05-30-2025 01:53 PM |
| | 11385 | 02-22-2024 12:38 PM |
04-09-2019
02:54 PM
1 Kudo
You can use UpdateRecord for this, but make sure you have the additional fields in your writer's schema. Alternatively you can use JoltTransformJSON with the following spec:

[
{
"operation": "default",
"spec": {
"attributes": {
"id": "12233",
"map": "Y"
}
}
}
]
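For illustration (this input is made up, not from the original question), with a hypothetical input like:
{
  "name": "example",
  "attributes": { "id": "999" }
}
the default operation only fills in keys that are missing, so the existing "id" is kept and "map" is added:
{
  "name": "example",
  "attributes": { "id": "999", "map": "Y" }
}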
04-09-2019
01:46 PM
You can use UpdateRecord for this: add a user-defined property called "/year" with a Replacement Strategy of "Literal Value" and a value of 2019. Note that your Record Writer's schema should have the "year" field in it.
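As a rough sketch of what the writer's schema needs (the record name and the "id" field are hypothetical, only the "year" field is the point), the Avro schema would include something like:
{
  "type": "record",
  "name": "myRecord",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "year", "type": "int" }
  ]
}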
04-08-2019
06:40 PM
You can also try JoltTransformRecord; using the JOLT DSL you can choose which fields you want from the input (and where to put them in the output). Since it's a record-based processor, you can use an XMLReader and a JSONRecordSetWriter and it will do the conversion for you.
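As a minimal sketch (the field names here are hypothetical, not from the question), a shift spec that keeps two fields, renames one and nests the other, and drops everything else might look like:
[
  {
    "operation": "shift",
    "spec": {
      "firstName": "name",
      "zipCode": "address.zip"
    }
  }
]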
04-08-2019
06:06 PM
As of NiFi 1.9.0 (HDF 3.4), the XMLReader can be configured to infer the schema. If you can't upgrade, you could download NiFi 1.9.0 and run it once to infer the schema and write it to an attribute, then inspect the flow file and copy off the schema for use in your operational NiFi instance. There may also be libraries and/or websites that will infer the Avro schema from the XML file for you.
04-08-2019
01:50 PM
Since you want to change the array of values into key/value pairs, you'll need to put them in an object inside the "variables" array, so I'm guessing you want a single-element array "variables" containing an object with the key/value pairs. If that's correct, you can use JoltTransformJSON with the following spec, which adds a key for each value in the array based on its position:

[
{
"operation": "shift",
"spec": {
"variables": {
"0": "variables[0].username",
"1": "variables[0].active",
"2": "variables[0].temperature",
"3": "variables[0].age"
},
"*": "&"
}
}
]

This gave me the following output:

{
"id" : 123456,
"ip" : "*",
"t" : -12.9,
"T" : -23.8,
"variables" : [ {
"username" : "user1",
"active" : 0,
"temperature" : 12.97,
"age" : 23
} ]
}
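For reference, the input that produces that output (reconstructed from the spec and the output above, not copied from the original question) would look roughly like:
{
  "id": 123456,
  "ip": "*",
  "t": -12.9,
  "T": -23.8,
  "variables": [ "user1", 0, 12.97, 23 ]
}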
04-03-2019
06:27 PM
1 Kudo
I believe the flow file is entering the processor and is just taking a very long time to process. In the meantime it will show up in the connection on the UI (although if you try to remove it while it's being processed, you will get a message that zero flow files were removed). The indicator that the flow file is being processed is the grid of light/dark dots on the right side of the processor; while that is shown, the processor is executing, ostensibly on one or more flow files from the incoming queue.

For your script, I think the reason for the long processing (which I would expect to be followed by errors on the processor and in the log) is that you're reading the entire file into a String and then calling PDDocument.load() on the String, when there is no method for that (it needs a byte[] or InputStream). The very unfortunate part is that Groovy will try to print out the value of your String, and for some unknown reason calling toString() on a PDDocument gives the entire content, which for large PDFs you can imagine is quite cumbersome. Luckily you can skip the String representation altogether, since the ProcessSession API gives you an InputStream and/or OutputStream that you can pass to the load() and save() methods on a PDDocument.

I took the liberty of refactoring your script above; mine's not super sophisticated (especially in terms of error handling) but should give you the gist of the approach:

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.multipdf.Splitter
// NiFi API classes used below, imported explicitly so the script is self-contained
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback
flowFile = session.get()
if(!flowFile) return
def flowFiles = [] as List<FlowFile>
try {
    def document
    // Load the PDF directly from the flow file's input stream (no String representation needed)
    session.read(flowFile, {inputStream ->
        document = PDDocument.load(inputStream)
    } as InputStreamCallback)
    def splitter = new Splitter()
    splitter.setSplitAtPage(2)
    try {
        def forms = splitter.split(document)
        forms.each { form ->
            // Write each split PDF to its own new flow file
            def newFlowFile = session.write(session.create(flowFile), {outputStream ->
                form.save(outputStream)
            } as OutputStreamCallback)
            flowFiles << newFlowFile
            form.close()
        }
    } catch(e) {
        log.error('Error writing splits', e)
        throw e
    } finally {
        document?.close()
    }
    session.transfer(flowFiles, REL_SUCCESS)
} catch(Exception e) {
    log.error('Error processing incoming PDF', e)
    // Remove any flow files created in this session before it commits
    session.remove(flowFiles)
}
// The original flow file has been replaced by its splits (or the error was logged), so remove it
session.remove(flowFile)
04-02-2019
01:29 PM
You have specified the SQL Statement property but haven't supplied any values. I recommend replacing PutSQL with PutDatabaseRecord with a Statement Type of INSERT; this should do what you are trying to do.
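A rough outline of the PutDatabaseRecord configuration (the table name is just a placeholder for yours):
Record Reader: a reader matching your incoming data (JsonTreeReader, CSVReader, etc.)
Statement Type: INSERT
Database Connection Pooling Service: your DBCPConnectionPool
Table Name: my_table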
04-02-2019
01:27 PM
1 Kudo
You can use MergeContent or MergeRecord for this: either can take flow files that each contain a single record and combine them into a flow file containing many Avro records. Then you can use ConvertAvroToParquet or PutParquet.
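As a rough sketch of the MergeRecord side (the record counts are placeholders to tune for your data volumes):
Record Reader: AvroReader
Record Writer: AvroRecordSetWriter
Merge Strategy: Bin-Packing Algorithm
Minimum Number of Records: 1000
Maximum Number of Records: 10000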
03-29-2019
07:20 PM
That's a great idea, thanks! I've been meaning to update it, hopefully sooner than later 🙂
03-26-2019
01:58 PM
Once a flow file has been created in a session, it must be removed or transferred before the session is committed (which happens at the end of ExecuteScript). Since your try is outside the loop that creates new flow files, you'll want to remove all the created ones, namely the flowFiles list. You can do that simply with session.remove(flowFiles) rather than the loop you have in your catch statement.
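A minimal sketch of the pattern (the loop body is hypothetical; the key part is the single remove call in the catch):
flowFile = session.get()
if(!flowFile) return
def flowFiles = []
try {
    3.times {
        // create (and presumably write content to) a new flow file per iteration
        flowFiles << session.create(flowFile)
    }
    session.transfer(flowFiles, REL_SUCCESS)
    session.remove(flowFile)
} catch(Exception e) {
    log.error('Error creating flow files', e)
    // one call removes every flow file created above
    session.remove(flowFiles)
    session.transfer(flowFile, REL_FAILURE)
}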