Created on 04-14-2020 08:31 AM - edited 04-14-2020 12:34 PM
I need to process some XML files that contain duplicated element names, as shown in the sample below (Latitude, Longitude and Height). I need to rename the 1st occurrence to Latitude1, Longitude1, Height1 and the 2nd to Latitude2, etc., so that I can do a JSON conversion afterwards.
What would be the best approach to rename these fields?
<?xml version="1.0" encoding="UTF-8"?><vwConnectionLog><ID>18500380</ID><JobID>1336467</JobID><Active>0</Active><StartTimeConnection>2019-12-01 12:09:12</StartTimeConnection><StartTimeSending>2019-12-01 12:09:17</StartTimeSending><EndTimeConnection>2019-12-01 12:25:23</EndTimeConnection><FirstSiteCode>HORE</FirstSiteCode><FirstRefStatID>84</FirstRefStatID><FirstDistanceToStat>41913</FirstDistanceToStat><LastSiteCode>VLC2</LastSiteCode><LastRefStatID>29</LastRefStatID><LastDistanceToStat>35429</LastDistanceToStat><Latitude>0.7820122964688779</Latitude><Longitude>0.4227112816015234</Longitude><Height>234.8100</Height><JobGUID>f787b88a-38bc-4158-ab44-0081a52f5295</JobGUID><Time>2019-12-01 12:17:17</Time><ActRefStationID>84</ActRefStationID><ActRefStationCode>HORE</ActRefStationCode><ActNMEARefStationID>806</ActNMEARefStationID><Satellites>-1</Satellites><SatellitesUsed>22</SatellitesUsed><PositionFix>4</PositionFix><HDOP>1.7999999523162842</HDOP><Event>Rover state changed</Event><Latitude>0.7820122964688779</Latitude><Longitude>0.4227112816015234</Longitude><Height>234.81</Height><Auxiliaries/><SatellitesGPS>-1</SatellitesGPS><SatellitesGlo>-1</SatellitesGlo><FixedSatellites>-1</FixedSatellites><FixedSatellitesGPS>-1</FixedSatellitesGPS><FixedSatellitesGLO>-1</FixedSatellitesGLO><UsedSatellitesFKP>-1</UsedSatellitesFKP><UsedSatellitesFKPGps>-1</UsedSatellitesFKPGps><UsedSatellitesFKPGlo>-1</UsedSatellitesFKPGlo><SatellitesBDS>-1</SatellitesBDS><FixedSatellitesBDS>-1</FixedSatellitesBDS><UsedSatellitesFKPBds>-1</UsedSatellitesFKPBds><SatellitesGAL>-1</SatellitesGAL><FixedSatellitesGAL>-1</FixedSatellitesGAL><SatellitesQZSS>-1</SatellitesQZSS><FixedSatellitesQZSS>-1</FixedSatellitesQZSS><RoverUserName>pop15064</RoverUserName><RoverUserCompany>template</RoverUserCompany><RoverUserDetail>pop15064 
pop15064</RoverUserDetail><RoverUserClientHost>14693</RoverUserClientHost><HeartbeatDisconnectTime>300</HeartbeatDisconnectTime><SubscriptionId>19202</SubscriptionId><RTProductName>RO_VRS_3.1</RTProductName><MessageType>Virtual RS RTCM 3.x (Extended)</MessageType><RTCMVersion>4</RTCMVersion><EndOfMessage>Nothing</EndOfMessage><RefStationID>-1</RefStationID><Connection>NTRIP-Client</Connection><HostName>Proxy</HostName><PortNr>2101</PortNr><NtripMntp>RO_VRS_3.1</NtripMntp><FilePath/><FilePathActive>0</FilePathActive><Authentication>Ntrip</Authentication><CellsSitesType>Automatic cells</CellsSitesType><AutoSelectCellSite>1</AutoSelectCellSite><DistanceForChanging>1000</DistanceForChanging><SatSystem>3</SatSystem><SendNullAntenna>Yes</SendNullAntenna><ErrString/><VerboseInfoString/><MaxDistProvCorr>100000</MaxDistProvCorr><FallbackDistance>3000</FallbackDistance><FallbackOnNWOff>8</FallbackOnNWOff><FallbackOnDist>16</FallbackOnDist><Fallforward>128</Fallforward><UseMaxDistProvCorr>4</UseMaxDistProvCorr><RoverCredentialName>pop15064</RoverCredentialName></vwConnectionLog>
Created 04-16-2020 01:27 AM
Here is the process up to the point where I got stuck. I start with the original XML files and split them to separate the child XML documents that are the target of my analysis, but these contain duplicate fields. I didn't find a way to make ReplaceText change only the 1st matching element, so that I could append an ID and make it unique. Another approach I found was the Python script attached below, which did the job, but I wonder whether there is a way to do it in one single process.
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self, flowfile):
        self.ff = flowfile

    def process(self, inputStream, outputStream):
        # Read the whole flowfile content as text
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # Rename only the first Latitude element; the closing tag must be
        # renamed as well, otherwise the XML is no longer well-formed
        text = text.replace("<Latitude>", "<Latitude1>", 1)
        text = text.replace("</Latitude>", "</Latitude1>", 1)
        outputStream.write(bytearray(text.encode('utf-8')))

# session and REL_SUCCESS are bindings provided by the NiFi scripting processor
flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback(flowFile))
    session.transfer(flowFile, REL_SUCCESS)
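The script above only touches the first Latitude. A possible generalization, given here only as a sketch, numbers every occurrence of each duplicated element (Latitude1, Latitude2, ...) in one pass per name. The function name number_duplicate_tags is hypothetical, and the sketch assumes the elements are simple <Name>value</Name> pairs with no attributes, no nesting of the same name, and no self-closing form, which matches the sample above but may not hold for every file.

```python
import re

def number_duplicate_tags(xml_text, tag_names):
    """Append 1, 2, ... to each occurrence of the given element names.

    Assumes plain <Name>value</Name> elements (no attributes, no
    self-closing tags, no nesting of the same name).
    """
    for name in tag_names:
        counter = {"n": 0}  # mutable counter shared with the callback
        pattern = re.compile(
            r"<%s>(.*?)</%s>" % (re.escape(name), re.escape(name)),
            re.DOTALL,
        )

        def rename(match, name=name, counter=counter):
            # Called once per match, in document order
            counter["n"] += 1
            return "<%s%d>%s</%s%d>" % (
                name, counter["n"], match.group(1), name, counter["n"]
            )

        xml_text = pattern.sub(rename, xml_text)
    return xml_text
```

Inside process(), the two text.replace(...) calls could then be swapped for text = number_duplicate_tags(text, ["Latitude", "Longitude", "Height"]). It uses only the re module, so it should also run under Jython, though I have only reasoned about it on the sample structure shown above.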