<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NiFi ExecuteScript - extracting and storing a value from a single line in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/NiFi-ExecuteScript-extracting-and-storing-a-value-from-a/m-p/358599#M237891</link>
    <description>&lt;P&gt;For anyone having a similar problem, here's how it was resolved.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The issue with #1 was fixed by switching from an&amp;nbsp;&lt;SPAN&gt;ExtractText processor to a RouteOnContent processor. RouteOnContent is much simpler to use: just add a property whose name becomes the routing relationship and whose value is the regex (with Match Requirement set to "content must contain match" if the regex only needs to appear somewhere in the flowfile).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#4 was fixed by:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Declaring the unique_id_index and unique_id variables outside of the callback so they can still be used after it returns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Routing the flowfile only after session.read() has returned (which closes the InputStream), instead of calling session.transfer() inside the callback.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Full (working) script is below:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");&lt;BR /&gt;var IOUtils = Java.type("org.apache.commons.io.IOUtils");&lt;BR /&gt;var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");&lt;BR /&gt;var ObjectArrayType = Java.type("java.lang.Object[]");&lt;BR /&gt;&lt;BR /&gt;var flowFile = session.get();&lt;BR /&gt;&lt;BR /&gt;if (flowFile != null) {&lt;BR /&gt;   // Get the name of the table&lt;BR /&gt;   var table = flowFile.getAttribute('table.name');&lt;BR /&gt;&lt;BR /&gt;   // Get the index of the unique_id via the attribute (attribute values are strings)&lt;BR /&gt;   var unique_id_index = parseInt(flowFile.getAttribute('unique.id.index'), 10);&lt;BR /&gt;&lt;BR /&gt;   // Declared outside the callback so they are still visible after session.read() returns&lt;BR /&gt;   var line;&lt;BR /&gt;   var unique_id;&lt;BR /&gt;&lt;BR /&gt;   // Create a new InputStreamCallback, passing in a function to define the interface method&lt;BR /&gt;   session.read(flowFile,&lt;BR /&gt;   new InputStreamCallback(function(inputStream) {&lt;BR /&gt;      try {&lt;BR /&gt;         // Decode the single line of our flowfile as a UTF-8 string&lt;BR /&gt;         line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);&lt;BR /&gt;      }&lt;BR /&gt;      catch(e) {&lt;BR /&gt;         log.error('Error on toString', e);&lt;BR /&gt;         return;&lt;BR /&gt;      }&lt;BR /&gt;      // Split the delimited data into an array&lt;BR /&gt;      var data = line.split('\t');&lt;BR /&gt;      // Get the value of the unique_id, using the index&lt;BR /&gt;      unique_id = data[unique_id_index];&lt;BR /&gt;   }));&lt;BR /&gt;&lt;BR /&gt;   // Route only after session.read() has returned and closed the InputStream&lt;BR /&gt;   if (typeof unique_id === 'undefined') {&lt;BR /&gt;      // One array slot per {} placeholder in the log message&lt;BR /&gt;      var objArray = new ObjectArrayType(2);&lt;BR /&gt;      objArray[0] = table;&lt;BR /&gt;      objArray[1] = line;&lt;BR /&gt;      log.error('Error: could not find unique_id value for table {} and line {}', objArray);&lt;BR /&gt;      session.transfer(flowFile, REL_FAILURE);&lt;BR /&gt;   }&lt;BR /&gt;   else {&lt;BR /&gt;      flowFile = session.putAttribute(flowFile, 'unique.id.value', unique_id);&lt;BR /&gt;      session.transfer(flowFile, REL_SUCCESS);&lt;BR /&gt;   }&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
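    <!--
A minimal sketch of the kind of regex that fix #1 relies on. The helper functions are plain ES5, so they could also be pasted into an ExecuteScript body; the console.log demo is meant for running outside NiFi (for example in Node). The helper names, the sample rows and the second, stricter pattern are assumptions for illustration, not part of the original flow. The character class [^\x00-\x7F] matches any character outside 7-bit ASCII. Java regex accepts the same \xhh escapes, so the same class should also work as a RouteOnContent property value. Note that the sample row in the question appears to contain U+001F, which is an ASCII control character and would not be caught by the first pattern.

```javascript
// Hypothetical helpers; pattern names are illustrative only.
// Matches any character outside 7-bit ASCII (U+0000 to U+007F).
var NON_ASCII = /[^\x00-\x7F]/;

// Stricter, assumed variant: also matches ASCII control characters such as
// U+001F, while still allowing tab, carriage return and line feed.
var NON_PRINTABLE = /[^\x20-\x7E\t\r\n]/;

function containsNonAscii(text) {
  return NON_ASCII.test(String(text));
}

function containsNonPrintable(text) {
  return NON_PRINTABLE.test(String(text));
}

// Tab-delimited sample rows modeled on the question's example data.
var rows = [
  'somedata\tbad_\u00e9_character\tAA11BB22\tsomeotherdata',
  'otherdata\tno_bad_character\tCC33DD44\tsomeotherdata',
  'somedata\tbad_\u001f_character\tEE55FF66\tsomeotherdata'
];

rows.forEach(function (row, i) {
  console.log('row ' + i + ': nonAscii=' + containsNonAscii(row) +
    ' nonPrintable=' + containsNonPrintable(row));
});
```

With these sample rows, the first pattern flags only the row containing U+00E9, while the stricter pattern also flags the row containing U+001F. Tab, carriage return and line feed are deliberately allowed so that delimiters and line endings are not reported.
    -->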
    <pubDate>Tue, 29 Nov 2022 22:04:07 GMT</pubDate>
    <dc:creator>DANiFi</dc:creator>
    <dc:date>2022-11-29T22:04:07Z</dc:date>
    <item>
      <title>NiFi ExecuteScript - extracting and storing a value from a single line</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NiFi-ExecuteScript-extracting-and-storing-a-value-from-a/m-p/358510#M237865</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm having problems getting the JavaScript I wrote for ExecuteScript to work properly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I'm trying to do:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;My flowfiles contain rows of data from a SELECT statement. Each flowfile is searched line by line for non-ASCII characters, and any non-ASCII characters found are replaced with empty strings before the data is loaded into a target database. I'd like to grab the value of the table's unique identifier column and write it to a new table, say "&lt;FONT face="courier new,courier"&gt;nifi_replaced_data&lt;/FONT&gt;", so that I can easily look in the source database and find the row whose non-ASCII character NiFi replaced in the target database.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, I might be processing a table with the columns:&lt;/P&gt;&lt;PRE&gt;column_1, column_2, unique_id, column_4&lt;/PRE&gt;&lt;P&gt;And the flowfile, which is tab-delimited, might look like:&lt;/P&gt;&lt;PRE&gt;somedata&amp;nbsp; &amp;nbsp; bad_&amp;#31;_character&amp;nbsp; &amp;nbsp;AA11BB22&amp;nbsp; &amp;nbsp;someotherdata&lt;BR /&gt;otherdata   no_bad_character  CC33DD44   someotherdata&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The process:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Each flowfile is checked in its entirety for non-ASCII characters using a regex. Only files that have one or more bad characters are routed to success / matched.&lt;/LI&gt;&lt;LI&gt;An ExecuteScript processor looks at the &lt;FONT face="courier new,courier"&gt;columns&lt;/FONT&gt; attribute and determines the index of the &lt;FONT face="courier new,courier"&gt;unique_id&lt;/FONT&gt; column. The &lt;FONT face="courier new,courier"&gt;unique_id&lt;/FONT&gt; column is guaranteed to exist in all tables, but its index position varies depending on the table being processed. 
The index is stored in the attribute &lt;FONT face="courier new,courier"&gt;unique.id.index&lt;/FONT&gt;.&lt;/LI&gt;&lt;LI&gt;The flowfile is split line by line using SplitText, and RouteText then checks whether each line contains a non-ASCII character. If it does, the flowfile for that line is routed to the second ExecuteScript processor.&lt;/LI&gt;&lt;LI&gt;This ExecuteScript processor reads the flowfile via an InputStream and splits it on the tab delimiter. It then uses the &lt;FONT face="courier new,courier"&gt;unique.id.index&lt;/FONT&gt; attribute to locate the &lt;FONT face="courier new,courier"&gt;unique_id&lt;/FONT&gt; value in the resulting array and writes that value to a new attribute, &lt;FONT face="courier new,courier"&gt;unique.id.value&lt;/FONT&gt;.&lt;/LI&gt;&lt;LI&gt;ExecuteSQL writes a row to the &lt;FONT face="courier new,courier"&gt;nifi_replaced_data&lt;/FONT&gt; table in the target database, using the unique_id value that was added as an attribute in step 4. It also writes the source table name, which is held in a separate attribute.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;You may be wondering where the bad character is replaced with an empty string. This happens later in the flow and does not depend on this processor group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Issues:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I currently have &lt;STRONG&gt;two problems&lt;/STRONG&gt;: one with the first step in the process and one with the fourth.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Step #1: When I run NiFi on a table that I&amp;nbsp;&lt;STRONG&gt;know&lt;/STRONG&gt; has one or more rows with a non-ASCII character, the ExtractText processor I'm using does not find any matches when looking at the entire flowfile, so nothing is routed to matched. 
I have the following configuration, but I'm wondering whether I should be using a RouteText processor for this instead:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DANiFi_0-1669675267163.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/36339iC9457388A9BB173F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DANiFi_0-1669675267163.png" alt="DANiFi_0-1669675267163.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Step #4: If I bypass the processor for step #1 and just split every flowfile by line without checking the whole file first, I can find and replace non-ASCII characters, so I know for a fact that the file contains bad characters. However, my ExecuteScript processor errors on every line when attempting to get the value of &lt;FONT face="courier new,courier"&gt;unique_id&lt;/FONT&gt;. I'm not sure whether the problem is with reading the flowfile via InputStream, converting the line via &lt;FONT face="courier new,courier"&gt;IOUtils.toString&lt;/FONT&gt;, or splitting the line on the tab delimiter. 
My hunch is that the IOUtils.toString method is throwing an exception when it encounters a character that is not valid UTF-8, and so my delimiter split and index lookup do nothing:&lt;/P&gt;&lt;PRE&gt;var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");&lt;BR /&gt;var IOUtils = Java.type("org.apache.commons.io.IOUtils");&lt;BR /&gt;var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");&lt;BR /&gt;&lt;BR /&gt;var flowFile = session.get();&lt;BR /&gt;if(flowFile != null) {&lt;BR /&gt;&lt;BR /&gt;   // Get the name of the table&lt;BR /&gt;   var table = flowFile.getAttribute('table.name');&lt;BR /&gt;&lt;BR /&gt;   // Get the index of the unique_id via the attribute&lt;BR /&gt;   var unique_id_index = flowFile.getAttribute('unique.id.index');&lt;BR /&gt;&lt;BR /&gt;   // Create a new InputStreamCallback, passing in a function to define the interface method&lt;BR /&gt;   session.read(flowFile,&lt;BR /&gt;   new InputStreamCallback(function(inputStream) {&lt;BR /&gt;      try {&lt;BR /&gt;         // Convert the single line of our flowfile to a UTF_8 encoded string&lt;BR /&gt;         var line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);&lt;BR /&gt;      }&lt;BR /&gt;      catch(e) {&lt;BR /&gt;         log.error('Error on toString', e)&lt;BR /&gt;      }&lt;BR /&gt;      // Split the delimited data into an array&lt;BR /&gt;      var data = line.split('\t');&lt;BR /&gt;      // Get the value of the unique_id, using the index&lt;BR /&gt;      var unique_id = data[unique_id_index];&lt;BR /&gt;&lt;BR /&gt;      if (typeof unique_id === 'undefined') {&lt;BR /&gt;         var ObjectArrayType = Java.type("java.lang.Object[]");&lt;BR /&gt;         var objArray = new ObjectArrayType(1);&lt;BR /&gt;         objArray[0] = table;&lt;BR /&gt;         objArray[1] = line;&lt;BR /&gt;         log.error('Error: could not find unique_id value for table {} and line {}', objArray);&lt;BR /&gt;         session.transfer(flowFile, REL_FAILURE)&lt;BR /&gt;      }&lt;BR /&gt;      else {&lt;BR /&gt;         flowFile = session.putAttribute(flowFile, 'unique.id.value', unique_id)&lt;BR /&gt;         session.transfer(flowFile, REL_SUCCESS)&lt;BR /&gt;      }&lt;BR /&gt;      inputStream.close()&lt;BR /&gt;   }));&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm fairly new to NiFi, and while I'm a developer, JavaScript is not my specialty. If someone could point me in the right direction, I'd really appreciate it! I've been struggling with this for far too long...&lt;/P&gt;</description>
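      <!--
A minimal sketch of what the first ExecuteScript processor in step 2 might do. It assumes that the columns attribute is a comma-separated list in the same form as the example header (column_1, column_2, unique_id, column_4); the question does not show this script, and the function name findColumnIndex is hypothetical. The function is ES5, so it could be used inside ExecuteScript as-is.

```javascript
// Hypothetical helper: zero-based position of columnName in a
// comma-separated list such as "column_1, column_2, unique_id, column_4".
// Returns -1 when the attribute is missing or the column is not present.
function findColumnIndex(columnsAttr, columnName) {
  if (columnsAttr == null) {
    return -1;
  }
  var names = String(columnsAttr).split(',');
  for (var i = 0; i < names.length; i++) {
    // Trim so that "a, b" and "a,b" behave the same.
    if (names[i].trim() === columnName) {
      return i;
    }
  }
  return -1;
}

console.log(findColumnIndex('column_1, column_2, unique_id, column_4', 'unique_id')); // 2

// Possible use inside ExecuteScript (attribute values must be strings):
// var idx = findColumnIndex(flowFile.getAttribute('columns'), 'unique_id');
// flowFile = session.putAttribute(flowFile, 'unique.id.index', String(idx));
```

Returning -1 for a missing column keeps the later lookup predictable: indexing a JavaScript array with -1 yields undefined, so the second script would route such a flowfile to failure rather than pick the wrong field.
      -->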
      <pubDate>Mon, 28 Nov 2022 23:00:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NiFi-ExecuteScript-extracting-and-storing-a-value-from-a/m-p/358510#M237865</guid>
      <dc:creator>DANiFi</dc:creator>
      <dc:date>2022-11-28T23:00:01Z</dc:date>
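      <!--
Step 5 in the question has ExecuteSQL write the source table name and the unique.id.value attribute to nifi_replaced_data. One option, sketched here under assumptions, is a parameterized statement: ExecuteSQL can read the values for ? placeholders from attributes named sql.args.N.type and sql.args.N.value, where the type is a JDBC type code (12 is java.sql.Types.VARCHAR). The helper buildSqlArgs and the column names source_table and unique_id in the example statement are hypothetical.

```javascript
// Hypothetical helper: attributes for a two-parameter statement such as
//   INSERT INTO nifi_replaced_data (source_table, unique_id) VALUES (?, ?)
// (column names are assumptions). 12 is the JDBC type code for VARCHAR.
var JDBC_VARCHAR = '12';

function buildSqlArgs(tableName, uniqueIdValue) {
  return {
    'sql.args.1.type': JDBC_VARCHAR,
    'sql.args.1.value': String(tableName),
    'sql.args.2.type': JDBC_VARCHAR,
    'sql.args.2.value': String(uniqueIdValue)
  };
}

var args = buildSqlArgs('my_source_table', 'AA11BB22');
Object.keys(args).forEach(function (name) {
  console.log(name + ' = ' + args[name]);
});
```

Inside ExecuteScript, each pair could be written with session.putAttribute(flowFile, name, value). Passing the values as parameters rather than splicing ${unique.id.value} directly into the SQL text avoids quoting problems if an ID ever contains a single quote.
      -->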
    </item>
    <item>
      <title>Re: NiFi ExecuteScript - extracting and storing a value from a single line</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NiFi-ExecuteScript-extracting-and-storing-a-value-from-a/m-p/358599#M237891</link>
      <description>&lt;P&gt;For anyone having a similar problem, here's how it was resolved.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The issue with #1 was fixed by switching from an&amp;nbsp;&lt;SPAN&gt;ExtractText processor to a RouteOnContent processor. RouteOnContent is much simpler to use: just add a property whose name becomes the routing relationship and whose value is the regex (with Match Requirement set to "content must contain match" if the regex only needs to appear somewhere in the flowfile).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#4 was fixed by:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Declaring the unique_id_index and unique_id variables outside of the callback so they can still be used after it returns.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Routing the flowfile only after session.read() has returned (which closes the InputStream), instead of calling session.transfer() inside the callback.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Full (working) script is below:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");&lt;BR /&gt;var IOUtils = Java.type("org.apache.commons.io.IOUtils");&lt;BR /&gt;var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");&lt;BR /&gt;var ObjectArrayType = Java.type("java.lang.Object[]");&lt;BR /&gt;&lt;BR /&gt;var flowFile = session.get();&lt;BR /&gt;&lt;BR /&gt;if (flowFile != null) {&lt;BR /&gt;   // Get the name of the table&lt;BR /&gt;   var table = flowFile.getAttribute('table.name');&lt;BR /&gt;&lt;BR /&gt;   // Get the index of the unique_id via the attribute (attribute values are strings)&lt;BR /&gt;   var unique_id_index = parseInt(flowFile.getAttribute('unique.id.index'), 10);&lt;BR /&gt;&lt;BR /&gt;   // Declared outside the callback so they are still visible after session.read() returns&lt;BR /&gt;   var line;&lt;BR /&gt;   var unique_id;&lt;BR /&gt;&lt;BR /&gt;   // Create a new InputStreamCallback, passing in a function to define the interface method&lt;BR /&gt;   session.read(flowFile,&lt;BR /&gt;   new InputStreamCallback(function(inputStream) {&lt;BR /&gt;      try {&lt;BR /&gt;         // Decode the single line of our flowfile as a UTF-8 string&lt;BR /&gt;         line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);&lt;BR /&gt;      }&lt;BR /&gt;      catch(e) {&lt;BR /&gt;         log.error('Error on toString', e);&lt;BR /&gt;         return;&lt;BR /&gt;      }&lt;BR /&gt;      // Split the delimited data into an array&lt;BR /&gt;      var data = line.split('\t');&lt;BR /&gt;      // Get the value of the unique_id, using the index&lt;BR /&gt;      unique_id = data[unique_id_index];&lt;BR /&gt;   }));&lt;BR /&gt;&lt;BR /&gt;   // Route only after session.read() has returned and closed the InputStream&lt;BR /&gt;   if (typeof unique_id === 'undefined') {&lt;BR /&gt;      // One array slot per {} placeholder in the log message&lt;BR /&gt;      var objArray = new ObjectArrayType(2);&lt;BR /&gt;      objArray[0] = table;&lt;BR /&gt;      objArray[1] = line;&lt;BR /&gt;      log.error('Error: could not find unique_id value for table {} and line {}', objArray);&lt;BR /&gt;      session.transfer(flowFile, REL_FAILURE);&lt;BR /&gt;   }&lt;BR /&gt;   else {&lt;BR /&gt;      flowFile = session.putAttribute(flowFile, 'unique.id.value', unique_id);&lt;BR /&gt;      session.transfer(flowFile, REL_SUCCESS);&lt;BR /&gt;   }&lt;BR /&gt;}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
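      <!--
The split and index lookup inside the working script's read callback can be pulled out into a plain function, which makes the failure cases easy to check without NiFi. This is a sketch, not the author's code. The name extractUniqueId is hypothetical, and stripping a trailing line ending is an assumption about how SplitText was configured. The function returns undefined whenever the value cannot be found, which corresponds to the REL_FAILURE branch of the script.

```javascript
// Hypothetical pure version of the callback's split + index lookup.
// line: one tab-delimited row; indexAttr: the unique.id.index attribute (a string).
function extractUniqueId(line, indexAttr) {
  if (line == null) {
    return undefined;
  }
  var index = parseInt(indexAttr, 10);
  if (isNaN(index) || index < 0) {
    return undefined;
  }
  // Drop a trailing CRLF or LF in case the split kept the line ending.
  var fields = String(line).replace(/\r?\n$/, '').split('\t');
  return index < fields.length ? fields[index] : undefined;
}

var row = 'somedata\tbad_\u00e9_character\tAA11BB22\tsomeotherdata\n';
console.log(extractUniqueId(row, '2')); // AA11BB22
console.log(extractUniqueId(row, '9')); // undefined
```

Inside the callback, unique_id = extractUniqueId(line, unique_id_index) could replace the split and lookup lines, while unique_id itself stays declared outside the callback as in the fix above.
      -->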
      <pubDate>Tue, 29 Nov 2022 22:04:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NiFi-ExecuteScript-extracting-and-storing-a-value-from-a/m-p/358599#M237891</guid>
      <dc:creator>DANiFi</dc:creator>
      <dc:date>2022-11-29T22:04:07Z</dc:date>
    </item>
  </channel>
</rss>

