Can someone please help me with the AVRO spec for this XML

Rising Star

The XML that I have is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<dataService>   <dataOutput>      <fltdMessage acid="QXE2828" airline="QXE" arrArpt="KSJC" cdmPart="false" depArpt="KPAE" fdTrigger="HCS_TRACK_MSG" flightRef="97606500" msgType="trackInformation" sensitivity="A" sourceFacility="KZSE" sourceTimeStamp="2019-04-05T16:44:30Z">         <trackInformation>            <qualifiedAircraftId>               <aircraftId>QXE2828</aircraftId>               <computerId>                  <facilityIdentifier>KZSE</facilityIdentifier>                  <idNumber>657</idNumber>               </computerId>               <gufi>KS49063601</gufi>               <igtd>2019-04-05T15:30:00Z</igtd>               <departurePoint>                  <airport>KPAE</airport>               </departurePoint>               <arrivalPoint>                  <airport>KSJC</airport>               </arrivalPoint>            </qualifiedAircraftId>            <speed>378</speed>            <reportedAltitude>               <assignedAltitude>                  <simpleAltitude>330C</simpleAltitude>               </assignedAltitude>            </reportedAltitude>            <position>               <latitude>                  <latitudeDMS degrees="41" direction="NORTH" minutes="14" seconds="14" />               </latitude>               <longitude>                  <longitudeDMS degrees="122" direction="WEST" minutes="49" seconds="03" />               </longitude>            </position>            <timeAtPosition>2019-04-05T16:44:30Z</timeAtPosition>            <ncsmTrackData>               <eta etaType="ESTIMATED" timeValue="2019-04-05T17:29:51Z" />               <rvsmData currentCompliance="true" equipped="true" futureCompliance="true" />               <arrivalFixAndTime arrTime="2019-04-05T17:13:34Z" fixName="ZINNN" />               <nextEvent latitudeDecimal="39.125619878405274" longitudeDecimal="-122.59562140202789" />            </ncsmTrackData>         </trackInformation>      </fltdMessage>   </dataOutput>
</dataService>


I'd like to generate the AVRO spec for this so that I can use ConvertRecord with an XMLReader. Is there an easy way to do this?





107721-1554746021711.png

The InferAvroSchema processor does not seem to support this structure, as shown by the "unsupported content" error in the screenshot below. I believe this processor only supports CSV and JSON.


107673-1554746279645.png


So I think I need to find another means to generate the AVRO schema.


I think this may be close. I converted the XML to JSON and then inferred the Avro schema from the JSON. However, the resulting schema does not work when I use it to convert the XML directly.


{
"type" : "record",
"name" : "nice",
"fields" : [ {
"name" : "dataService",
"type" : {
"type" : "record",
"name" : "dataService",
"fields" : [ {
"name" : "dataOutput",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "dataOutput",
"fields" : [ {
"name" : "fltdMessage",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "fltdMessage.dataOutput",
"fields" : [ {
"name" : "acid",
"type" : "string",
"doc" : "Type inferred from '\"QXE2828\"'"
}, {
"name" : "airline",
"type" : "string",
"doc" : "Type inferred from '\"QXE\"'"
}, {
"name" : "arrArpt",
"type" : "string",
"doc" : "Type inferred from '\"KSJC\"'"
}, {
"name" : "cdmPart",
"type" : "boolean",
"doc" : "Type inferred from 'false'"
}, {
"name" : "depArpt",
"type" : "string",
"doc" : "Type inferred from '\"KPAE\"'"
}, {
"name" : "fdTrigger",
"type" : "string",
"doc" : "Type inferred from '\"HCS_TRACK_MSG\"'"
}, {
"name" : "flightRef",
"type" : "int",
"doc" : "Type inferred from '97606500'"
}, {
"name" : "msgType",
"type" : "string",
"doc" : "Type inferred from '\"trackInformation\"'"
}, {
"name" : "sensitivity",
"type" : "string",
"doc" : "Type inferred from '\"A\"'"
}, {
"name" : "sourceFacility",
"type" : "string",
"doc" : "Type inferred from '\"KZSE\"'"
}, {
"name" : "sourceTimeStamp",
"type" : "string",
"doc" : "Type inferred from '\"2019-04-05T16:44:30Z\"'"
}, {
"name" : "trackInformation",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "qualifiedAircraftId",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "qualifiedAircraftId.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "aircraftId",
"type" : "string",
"doc" : "Type inferred from '\"QXE2828\"'"
}, {
"name" : "computerId",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "computerId.qualifiedAircraftId.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "facilityIdentifier",
"type" : "string",
"doc" : "Type inferred from '\"KZSE\"'"
}, {
"name" : "idNumber",
"type" : "int",
"doc" : "Type inferred from '657'"
} ]
},
"doc" : "Type inferred from '{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657}'"
}, {
"name" : "gufi",
"type" : "string",
"doc" : "Type inferred from '\"KS49063601\"'"
}, {
"name" : "igtd",
"type" : "string",
"doc" : "Type inferred from '\"2019-04-05T15:30:00Z\"'"
}, {
"name" : "departurePoint",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "departurePoint.qualifiedAircraftId.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "airport",
"type" : "string",
"doc" : "Type inferred from '\"KPAE\"'"
} ]
},
"doc" : "Type inferred from '{\"airport\":\"KPAE\"}'"
}, {
"name" : "arrivalPoint",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "arrivalPoint.qualifiedAircraftId.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "airport",
"type" : "string",
"doc" : "Type inferred from '\"KSJC\"'"
} ]
},
"doc" : "Type inferred from '{\"airport\":\"KSJC\"}'"
} ]
},
"doc" : "Type inferred from '{\"aircraftId\":\"QXE2828\",\"computerId\":{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657},\"gufi\":\"KS49063601\",\"igtd\":\"2019-04-05T15:30:00Z\",\"departurePoint\":{\"airport\":\"KPAE\"},\"arrivalPoint\":{\"airport\":\"KSJC\"}}'"
}, {
"name" : "speed",
"type" : "int",
"doc" : "Type inferred from '378'"
}, {
"name" : "reportedAltitude",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "reportedAltitude.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "assignedAltitude",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "assignedAltitude.reportedAltitude.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "simpleAltitude",
"type" : "string",
"doc" : "Type inferred from '\"330C\"'"
} ]
},
"doc" : "Type inferred from '{\"simpleAltitude\":\"330C\"}'"
} ]
},
"doc" : "Type inferred from '{\"assignedAltitude\":{\"simpleAltitude\":\"330C\"}}'"
}, {
"name" : "position",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "position.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "latitude",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "latitude.position.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "latitudeDMS",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "latitudeDMS.latitude.position.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "degrees",
"type" : "int",
"doc" : "Type inferred from '41'"
}, {
"name" : "direction",
"type" : "string",
"doc" : "Type inferred from '\"NORTH\"'"
}, {
"name" : "minutes",
"type" : "int",
"doc" : "Type inferred from '14'"
}, {
"name" : "seconds",
"type" : "int",
"doc" : "Type inferred from '14'"
} ]
},
"doc" : "Type inferred from '{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}'"
} ]
},
"doc" : "Type inferred from '{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}}'"
}, {
"name" : "longitude",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "longitude.position.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "longitudeDMS",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "longitudeDMS.longitude.position.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "degrees",
"type" : "int",
"doc" : "Type inferred from '122'"
}, {
"name" : "direction",
"type" : "string",
"doc" : "Type inferred from '\"WEST\"'"
}, {
"name" : "minutes",
"type" : "int",
"doc" : "Type inferred from '49'"
}, {
"name" : "seconds",
"type" : "string",
"doc" : "Type inferred from '\"03\"'"
} ]
},
"doc" : "Type inferred from '{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}'"
} ]
},
"doc" : "Type inferred from '{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}'"
} ]
},
"doc" : "Type inferred from '{\"latitude\":{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}},\"longitude\":{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}}'"
}, {
"name" : "timeAtPosition",
"type" : "string",
"doc" : "Type inferred from '\"2019-04-05T16:44:30Z\"'"
}, {
"name" : "ncsmTrackData",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "ncsmTrackData.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "eta",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "eta.ncsmTrackData.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "etaType",
"type" : "string",
"doc" : "Type inferred from '\"ESTIMATED\"'"
}, {
"name" : "timeValue",
"type" : "string",
"doc" : "Type inferred from '\"2019-04-05T17:29:51Z\"'"
} ]
},
"doc" : "Type inferred from '{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"}'"
}, {
"name" : "rvsmData",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "rvsmData.ncsmTrackData.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "currentCompliance",
"type" : "boolean",
"doc" : "Type inferred from 'true'"
}, {
"name" : "equipped",
"type" : "boolean",
"doc" : "Type inferred from 'true'"
}, {
"name" : "futureCompliance",
"type" : "boolean",
"doc" : "Type inferred from 'true'"
} ]
},
"doc" : "Type inferred from '{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true}'"
}, {
"name" : "arrivalFixAndTime",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "arrivalFixAndTime.ncsmTrackData.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "arrTime",
"type" : "string",
"doc" : "Type inferred from '\"2019-04-05T17:13:34Z\"'"
}, {
"name" : "fixName",
"type" : "string",
"doc" : "Type inferred from '\"ZINNN\"'"
} ]
},
"doc" : "Type inferred from '{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"}'"
}, {
"name" : "nextEvent",
"type" : {
"type" : "record",
"name" : "dataService",
"namespace" : "nextEvent.ncsmTrackData.trackInformation.fltdMessage.dataOutput",
"fields" : [ {
"name" : "latitudeDecimal",
"type" : "double",
"doc" : "Type inferred from '39.125619878405274'"
}, {
"name" : "longitudeDecimal",
"type" : "double",
"doc" : "Type inferred from '-122.59562140202789'"
} ]
},
"doc" : "Type inferred from '{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}'"
} ]
},
"doc" : "Type inferred from '{\"eta\":{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"},\"rvsmData\":{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true},\"arrivalFixAndTime\":{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"},\"nextEvent\":{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}}'"
} ]
},
"doc" : "Type inferred from '{\"qualifiedAircraftId\":{\"aircraftId\":\"QXE2828\",\"computerId\":{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657},\"gufi\":\"KS49063601\",\"igtd\":\"2019-04-05T15:30:00Z\",\"departurePoint\":{\"airport\":\"KPAE\"},\"arrivalPoint\":{\"airport\":\"KSJC\"}},\"speed\":378,\"reportedAltitude\":{\"assignedAltitude\":{\"simpleAltitude\":\"330C\"}},\"position\":{\"latitude\":{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}},\"longitude\":{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}},\"timeAtPosition\":\"2019-04-05T16:44:30Z\",\"ncsmTrackData\":{\"eta\":{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"},\"rvsmData\":{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true},\"arrivalFixAndTime\":{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"},\"nextEvent\":{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}}}'"
} ]
},
"doc" : "Type inferred from '{\"acid\":\"QXE2828\",\"airline\":\"QXE\",\"arrArpt\":\"KSJC\",\"cdmPart\":false,\"depArpt\":\"KPAE\",\"fdTrigger\":\"HCS_TRACK_MSG\",\"flightRef\":97606500,\"msgType\":\"trackInformation\",\"sensitivity\":\"A\",\"sourceFacility\":\"KZSE\",\"sourceTimeStamp\":\"2019-04-05T16:44:30Z\",\"trackInformation\":{\"qualifiedAircraftId\":{\"aircraftId\":\"QXE2828\",\"computerId\":{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657},\"gufi\":\"KS49063601\",\"igtd\":\"2019-04-05T15:30:00Z\",\"departurePoint\":{\"airport\":\"KPAE\"},\"arrivalPoint\":{\"airport\":\"KSJC\"}},\"speed\":378,\"reportedAltitude\":{\"assignedAltitude\":{\"simpleAltitude\":\"330C\"}},\"position\":{\"latitude\":{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}},\"longitude\":{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}},\"timeAtPosition\":\"2019-04-05T16:44:30Z\",\"ncsmTrackData\":{\"eta\":{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"},\"rvsmData\":{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true},\"arrivalFixAndTime\":{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"},\"nextEvent\":{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}}}}'"
} ]
},
"doc" : "Type inferred from '{\"fltdMessage\":{\"acid\":\"QXE2828\",\"airline\":\"QXE\",\"arrArpt\":\"KSJC\",\"cdmPart\":false,\"depArpt\":\"KPAE\",\"fdTrigger\":\"HCS_TRACK_MSG\",\"flightRef\":97606500,\"msgType\":\"trackInformation\",\"sensitivity\":\"A\",\"sourceFacility\":\"KZSE\",\"sourceTimeStamp\":\"2019-04-05T16:44:30Z\",\"trackInformation\":{\"qualifiedAircraftId\":{\"aircraftId\":\"QXE2828\",\"computerId\":{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657},\"gufi\":\"KS49063601\",\"igtd\":\"2019-04-05T15:30:00Z\",\"departurePoint\":{\"airport\":\"KPAE\"},\"arrivalPoint\":{\"airport\":\"KSJC\"}},\"speed\":378,\"reportedAltitude\":{\"assignedAltitude\":{\"simpleAltitude\":\"330C\"}},\"position\":{\"latitude\":{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}},\"longitude\":{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}},\"timeAtPosition\":\"2019-04-05T16:44:30Z\",\"ncsmTrackData\":{\"eta\":{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"},\"rvsmData\":{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true},\"arrivalFixAndTime\":{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"},\"nextEvent\":{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}}}}}'"
} ]
},
"doc" : "Type inferred from '{\"dataOutput\":{\"fltdMessage\":{\"acid\":\"QXE2828\",\"airline\":\"QXE\",\"arrArpt\":\"KSJC\",\"cdmPart\":false,\"depArpt\":\"KPAE\",\"fdTrigger\":\"HCS_TRACK_MSG\",\"flightRef\":97606500,\"msgType\":\"trackInformation\",\"sensitivity\":\"A\",\"sourceFacility\":\"KZSE\",\"sourceTimeStamp\":\"2019-04-05T16:44:30Z\",\"trackInformation\":{\"qualifiedAircraftId\":{\"aircraftId\":\"QXE2828\",\"computerId\":{\"facilityIdentifier\":\"KZSE\",\"idNumber\":657},\"gufi\":\"KS49063601\",\"igtd\":\"2019-04-05T15:30:00Z\",\"departurePoint\":{\"airport\":\"KPAE\"},\"arrivalPoint\":{\"airport\":\"KSJC\"}},\"speed\":378,\"reportedAltitude\":{\"assignedAltitude\":{\"simpleAltitude\":\"330C\"}},\"position\":{\"latitude\":{\"latitudeDMS\":{\"degrees\":41,\"direction\":\"NORTH\",\"minutes\":14,\"seconds\":14}},\"longitude\":{\"longitudeDMS\":{\"degrees\":122,\"direction\":\"WEST\",\"minutes\":49,\"seconds\":\"03\"}}},\"timeAtPosition\":\"2019-04-05T16:44:30Z\",\"ncsmTrackData\":{\"eta\":{\"etaType\":\"ESTIMATED\",\"timeValue\":\"2019-04-05T17:29:51Z\"},\"rvsmData\":{\"currentCompliance\":true,\"equipped\":true,\"futureCompliance\":true},\"arrivalFixAndTime\":{\"arrTime\":\"2019-04-05T17:13:34Z\",\"fixName\":\"ZINNN\"},\"nextEvent\":{\"latitudeDecimal\":39.125619878405274,\"longitudeDecimal\":-122.59562140202789}}}}}}'"
} ]
}


1 ACCEPTED SOLUTION

Master Guru

As of NiFi 1.9.0 (HDF 3.4), the XMLReader can be configured to infer the schema. If you can't upgrade, you could download NiFi 1.9.0 and run it once to infer the schema and write it to an attribute, then inspect the flow file and copy off the schema for use in your operational NiFi instance. There may also be libraries and/or websites that will infer the Avro schema from the XML file for you.
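
If neither upgrading nor a throwaway NiFi 1.9.0 instance is an option, a small script can produce a first-draft schema to edit by hand. The following is a minimal Python sketch using only the standard library. It is not NiFi's inference algorithm, and the helper names (`infer_type`, `infer_record`) and the path-based record naming are assumptions made for illustration. It treats attributes and child elements as fields and does not handle repeated elements (arrays), XML namespaces, or optional fields.

```python
import json
import xml.etree.ElementTree as ET


def infer_type(text):
    """Guess an Avro primitive type name from a string value."""
    if text in ("true", "false"):
        return "boolean"
    try:
        value = int(text)
    except ValueError:
        value = None
    if value is not None:
        digits = text.lstrip("+-")
        # Zero-padded values such as "03" stay strings so they round-trip unchanged.
        if len(digits) > 1 and digits.startswith("0"):
            return "string"
        return "int" if -2**31 <= value < 2**31 else "long"
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def infer_record(element, name):
    """Build an Avro record schema for one element; attributes and children become fields."""
    fields = [{"name": k, "type": infer_type(v)} for k, v in element.attrib.items()]
    for child in element:
        if len(child) or child.attrib:
            # Nested structure: name the record by its path so names stay unique.
            ftype = infer_record(child, f"{name}_{child.tag}")
        else:
            ftype = infer_type((child.text or "").strip())
        fields.append({"name": child.tag, "type": ftype})
    return {"type": "record", "name": name, "fields": fields}


# Trimmed-down sample shaped like the message in the question.
sample = (
    '<fltdMessage acid="QXE2828" cdmPart="false" flightRef="97606500">'
    "<trackInformation><speed>378</speed>"
    '<longitudeDMS degrees="122" seconds="03" /></trackInformation>'
    "</fltdMessage>"
)
schema = infer_record(ET.fromstring(sample), "fltdMessage")
print(json.dumps(schema, indent=2))
```

Whether the root element or an inner element should be treated as the record depends on how the reader is configured, so the element passed to `infer_record` may need adjusting.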

Rising Star

Thanks, @Matt Burgess. I can try that. I have a continuous, large stream of XML to handle. I've been assuming that using ConvertRecord to convert the XML into JSON is faster than using TransformXML to do the same. Is this your experience? Also, I am hoping that ConvertRecord (and Avro output schemas) will give me more flexibility in trimming down the output. I don't need all the input data in my output JSON; I only need some of it.

Master Guru

You can also try JoltTransformRecord. Using the JOLT DSL, you can choose which fields you want from the input (and where to put them in the output). Because it is a record-based processor, you can configure it with the XMLReader and JsonRecordSetWriter, and it will do the format conversion for you.
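
The idea of choosing input fields and placing them in the output can be pictured as a mapping from input paths to output paths. The sketch below is plain Python, not JOLT and not NiFi code; `get_path`, `set_path`, and `project` are hypothetical helpers, and the sample record is a trimmed-down dict shaped like the flight message earlier in the thread.

```python
def get_path(record, dotted):
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(out, dotted, value):
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    keys = dotted.split(".")
    for key in keys[:-1]:
        out = out.setdefault(key, {})
    out[keys[-1]] = value


def project(record, mapping):
    """Copy only the mapped input paths into a new record at their output paths."""
    out = {}
    for source, target in mapping.items():
        value = get_path(record, source)
        if value is not None:
            set_path(out, target, value)
    return out


record = {
    "acid": "QXE2828",
    "airline": "QXE",
    "trackInformation": {"speed": 378, "timeAtPosition": "2019-04-05T16:44:30Z"},
}
mapping = {
    "acid": "flight.id",
    "trackInformation.speed": "flight.speed",
    "trackInformation.missing": "flight.unused",  # absent in the input, so skipped
}
trimmed = project(record, mapping)
print(trimmed)
```

Fields that are not listed in the mapping (here `airline` and `timeAtPosition`) simply do not appear in the output, which is the trimming behaviour asked about above.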

Rising Star

Hi @Matt Burgess

I've tried 1.9.2. Sure enough, the XMLReader works with schema inference. Using it as part of ConvertRecord allowed me to easily convert from XML to JSON (avoiding the use of TransformXML with an XSLT stylesheet).


Four questions for you.

1] Would you expect ConvertRecord or TransformXML to be more flexible?

2] Would you expect ConvertRecord or TransformXML to provide better performance? Performance is key for me; I have a lot of data.

3] How can I inspect the generated flowfiles to see the actual AVRO format? I thought it might be placed on an attribute within the generated flowfile, however I did not see this.

4] You also suggested trying JoltTransformRecord. Would this be instead of ConvertRecord and TransformXML (or in addition to)?

Master Guru

1) I believe TransformXML is more flexible in terms of structural transformation as it leverages the full power of XSLT. However the XML-to-JSON XSLTs I've seen sometimes have limitations (inline comments can be a problem, e.g.).

2) I'm not sure which would be faster per se, probably depends on how much data, what kind of transformation(s) are performed, etc. Also I think TransformXML reads the entire XML input into memory, so for large XML files you may risk running out of memory. ConvertRecord's record readers read in a single record at a time IIRC.

3) It doesn't seem like you want to convert anything to Avro, are you asking how to see the record schema? Internally we have our own RecordSchema representation, but when we write out to a flow file attribute (for example), we use Avro's schema format (even if the data is in CSV, JSON, XML, etc.). To see the schema, set your RecordSetWriter to write the schema to the avro.schema attribute, then you can inspect the flow file's attributes from the UI and see the Avro schema.

4) ConvertRecord only changes the format of the input (CSV to JSON, e.g.), it doesn't really do any transformation of the records (although technically you can configure it to add or remove fields). If you're doing any actual transformation of data (uppercasing field names, changing "F" to "Female", etc.) then you can use JoltTransformRecord, UpdateRecord, etc. The key is that all record-based processors will do format conversion for you, so you only need ConvertRecord if all you want is to change the format of the data. Otherwise the other record processors do their thing (like PartitionRecord groups records by value) but will also convert the format, depending on which Reader and Writer you configure. Does that make sense?
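
The memory point in 2) is the difference between loading a whole document and reading one record at a time. The sketch below illustrates the record-at-a-time style with Python's standard library; it is not how NiFi's readers are implemented, and `iter_records` is a hypothetical helper that only extracts element attributes.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(stream, record_tag):
    """Yield the attributes of each record element as it is parsed, then release it."""
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == record_tag:
            # Copy the attributes before clearing so the caller keeps its own data.
            yield dict(element.attrib)
            # Drop the element's children and attributes to keep memory use flat.
            element.clear()


data = io.BytesIO(
    b"<dataOutput>"
    b'<fltdMessage acid="QXE2828" airline="QXE" />'
    b'<fltdMessage acid="ABC123" airline="ABC" />'
    b"</dataOutput>"
)
acids = [rec["acid"] for rec in iter_records(data, "fltdMessage")]
print(acids)
```

With this style, only the record currently being handled needs to be held in full, which is why it tends to scale better for very large inputs than building the entire tree first.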

Rising Star

Hi @Matt Burgess

All your points make sense. In terms of 3 above, I only thought that I might extract the inferred schema for performance reasons. As it stands the XMLReader will be forced to infer the schema with each received record. The nature of my incoming XML data is that it contains data with optional fields, and hence the inferred schema will change. My thought was that if I can determine the "superset schema" and apply that then I'd eliminate the need for inference (buying back the associated performance). In terms of 4 above, it sounds like what I will want to do is JoltTransformRecord (instead of ConvertRecord). The end result will be that I both convert to JSON and change the structure to a more suitable form.
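
One way to picture the "superset schema" idea is to merge the schemas inferred from several sample messages, marking any field that is missing from one of them as nullable. The following is a hedged Python sketch, not a NiFi or Avro library feature: `merge_records` and its helpers are hypothetical, only record and primitive types are handled, and falling back to `string` when two samples disagree on a primitive type is an assumption.

```python
import json


def is_record(ftype):
    """True if the Avro type is an inline record definition."""
    return isinstance(ftype, dict) and ftype.get("type") == "record"


def nullable(ftype):
    """Return a union with null first, so that a null default is valid."""
    branches = ftype if isinstance(ftype, list) else [ftype]
    return ["null"] + [b for b in branches if b != "null"]


def optional(field):
    """Copy a field, making it nullable with a null default."""
    return {"name": field["name"], "type": nullable(field["type"]), "default": None}


def merge_records(a, b):
    """Merge two record schemas into one that accepts the fields of either."""
    b_fields = {f["name"]: f for f in b["fields"]}
    a_names = {f["name"] for f in a["fields"]}
    merged = []
    for f in a["fields"]:
        other = b_fields.get(f["name"])
        if other is None:
            merged.append(optional(f))  # only present in a
        elif is_record(f["type"]) and is_record(other["type"]):
            merged.append({"name": f["name"], "type": merge_records(f["type"], other["type"])})
        elif f["type"] == other["type"]:
            merged.append({"name": f["name"], "type": f["type"]})
        else:
            # Assumption: conflicting types (e.g. int vs. string) widen to string.
            merged.append({"name": f["name"], "type": "string"})
    merged.extend(optional(f) for f in b["fields"] if f["name"] not in a_names)
    return {"type": "record", "name": a["name"], "fields": merged}


first = {"type": "record", "name": "dms", "fields": [
    {"name": "degrees", "type": "int"},
    {"name": "seconds", "type": "int"},
]}
second = {"type": "record", "name": "dms", "fields": [
    {"name": "degrees", "type": "int"},
    {"name": "seconds", "type": "string"},
    {"name": "direction", "type": "string"},
]}
superset = merge_records(first, second)
print(json.dumps(superset, indent=2))
```

Running the merge over a representative set of messages and then pasting the result into the reader as an explicit schema would avoid per-flow-file inference, at the cost of having to refresh the schema when new optional fields appear.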


Thanks very much for your detailed replies. They are hugely useful. With feedback from folks such as you, I am becoming more comfortable with NiFi. My current role is to architect our future solutions, and in that context I am looking for technologies that we might integrate to solve problems in new and better ways. I'm very bullish about NiFi, especially when I see folks from Hortonworks support it so well. I'm keen to try some of the other components that Hortonworks also supports. I do have a separate posted question about streaming messages into buffered queues that I'd love your architectural perspective on.

Rising Star

Hi @Matt Burgess.


I've tried JoltTransformRecord. It's not behaving as I'd expect. In the following you'll see that I generate a single XML record and convert it using JoltTransformRecord (that fails). I also convert it with ConvertRecord, using the same XMLReader and JsonRecordSetWriter. I then pipe the converted JSON to a separate JoltTransformJSON that uses the same JOLT transform as the original JoltTransformRecord. The JoltTransformJSON succeeds. The configuration of the JoltTransformRecord is as follows:


107803-1554901402591.png


The overall flow is as follows:


107754-1554901441459.png

What am I missing with the use of the JoltTransformRecord?

Rising Star

The specific error I see is as follows:

107743-1554903840961.png


Here is the associated template for the flow above. You can use it to verify the problem I am having:

Test_Jolt_Transform_Record.xml

Rising Star

I have seen a similar problem reported here. It is worth noting that I think the problem is data-specific: some XML records seem to work fine in the test flow I've provided. This does not make sense to me, however, since, as I said, JoltTransformJSON works fine on the generated JSON (even for the problem data).