<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Encrypted NiFi Content Repository in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218386#M66621</link>
    <description>&lt;P&gt;Hi Alvin,&lt;/P&gt;&lt;P&gt;As stated above, I cannot indicate prioritization or scheduling of feature delivery. I am eager to develop this feature, as I am sure many users would like it to be available as well. You can always monitor activity on the Apache NiFi Jira and the mailing lists. &lt;/P&gt;</description>
    <pubDate>Fri, 19 Jan 2018 12:03:08 GMT</pubDate>
    <dc:creator>alopresto</dc:creator>
    <dc:date>2018-01-19T12:03:08Z</dc:date>
    <item>
      <title>Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218378#M66613</link>
      <description>&lt;P&gt;Hi Guys,&lt;/P&gt;&lt;P&gt;I noticed NiFi added an encrypted provenance repository in v1.3.&lt;/P&gt;&lt;P&gt;May I ask about the timeline for releasing the encrypted content repository feature?&lt;/P&gt;&lt;P&gt;We fetch encrypted financial data into NiFi, then decrypt it to transform some fields before encrypting it again with another algorithm.&lt;/P&gt;&lt;P&gt;Based on my understanding, the decryption processor will leave a copy of the unencrypted data on disk, which is not acceptable for our compliance requirements.&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 00:54:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218378#M66613</guid>
      <dc:creator>alvinuw</dc:creator>
      <dc:date>2017-08-15T00:54:09Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218379#M66614</link>
      <description>&lt;P&gt;Alvin,&lt;/P&gt;&lt;P&gt;The encrypted content repository feature is &lt;A target="_blank" href="https://issues.apache.org/jira/browse/NIFI-3834"&gt;actively being worked on&lt;/A&gt;. As a rule, we cannot make claims about the delivery dates or versions of features in active development. Hope this helps. &lt;/P&gt;&lt;P&gt;If you have compliance requirements that sensitive data (such as PII, PCI/payment details, EPHI, etc.) is never stored on disk in plaintext, you can explore using the &lt;A target="_blank" href="https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository"&gt;volatile content repository&lt;/A&gt;, but be aware that there is a risk of data loss in the event of a power failure, and this applies to all content objects, not just the sensitive records. &lt;/P&gt;&lt;P&gt;With Apache NiFi 1.3.0, you can also use the &lt;A target="_blank" href="https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi"&gt;RecordReader and RecordSetWriter approach&lt;/A&gt;. While the &lt;A target="_blank" href="https://issues.apache.org/jira/browse/NIFI-4132"&gt;EncryptedRecordReader and EncryptedRecordSetWriter controller services&lt;/A&gt; are not yet available, you could use a custom ScriptedRecordReader and ScriptedRecordSetWriter to decrypt and re-encrypt on the fly. The intermediate "record" object is never persisted to disk, so at no point would the plaintext data be written outside of volatile memory, regardless of the content repository implementation.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 01:26:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218379#M66614</guid>
      <dc:creator>alopresto</dc:creator>
      <dc:date>2017-08-15T01:26:11Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218380#M66615</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/595/alopresto.html" nodeid="595"&gt;@Andy LoPresto&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Thank you for your work on the encrypted repositories tickets.&lt;/P&gt;&lt;P&gt;We considered about "volatile content repository", but it affects all the workflows with data loss risk. &lt;/P&gt;&lt;P&gt;"Decrypt and Re-encrypt on the fly" sounds like a better one for us. We can extract the non-sensitive fields as attributes, while leave the sensitive data in payload.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 02:07:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218380#M66615</guid>
      <dc:creator>alvinuw</dc:creator>
      <dc:date>2017-08-15T02:07:32Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218381#M66616</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/595/alopresto.html" nodeid="595"&gt;@Andy LoPresto&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I am curious to know how much risks to use the volatile content repository?&lt;/P&gt;&lt;P&gt;My understanding is:&lt;/P&gt;&lt;P&gt;If there is a node failure/restart,&lt;/P&gt;&lt;P&gt;For data has already been processed/persisted through the flow, no impact on our business or downstreams.&lt;/P&gt;&lt;P&gt;But users cannot view and/or replay content via the provenance UI, since the content are gone due to restart.&lt;/P&gt;&lt;P&gt;For the content of flowfiles are still in the middle of flow during node failure/restart, we can't replay them from where it fails, when the node is back to normal. Instead, we have to fetch the same files from source again, and reprocess them end to end through the flow.&lt;/P&gt;&lt;P&gt;If above is correct, I would say as long as we have source data permanently persisted in somewhere out of NiFi, we can always reprocess it when data in volatile content repository is lost. The only loss is the ability to view/replay them via Provenance UI. &lt;/P&gt;&lt;P&gt;BTW, what happens when content exceeds the maximum size of repository?&lt;/P&gt;&lt;P&gt;Out of memory exception? Auto purged from memory? auto archived in disk?&lt;/P&gt;&lt;P&gt;If I set nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository&lt;/P&gt;&lt;P&gt;Does that mean below properties are auto-disabled?&lt;/P&gt;&lt;PRE&gt;nifi.content.claim
nifi.content.repository.archive
nifi.content.viewer.url
&lt;/PRE&gt;&lt;P&gt;Any comments are appreciated.&lt;/P&gt;&lt;P&gt;Thanks. &lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2017 23:04:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218381#M66616</guid>
      <dc:creator>alvinuw</dc:creator>
      <dc:date>2017-08-17T23:04:48Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218382#M66617</link>
      <description>&lt;P&gt;Hello &lt;A rel="user" href="https://community.cloudera.com/users/595/alopresto.html" nodeid="595"&gt;@Andy LoPresto&lt;/A&gt;, &lt;/P&gt;&lt;P&gt;After reading here (and on other forum posts), it appears that there's a group of us who are seeking encryption of their "data at rest" (i.e. encryption of the flowfile, content, and provenance repositories). You all at Hortonworks have recently provided an encrypted provenance repository (thank you!), but if we also want the other two to be encrypted, it seems that we currently have five options:&lt;/P&gt;&lt;P&gt;1) Create our own encrypted versions of the pluggable writers/readers for the Content and Flowfile repositories.&lt;/P&gt;&lt;P&gt;2) Wait for you/Hortonworks to finish implementing encrypted versions of the pluggable writers/readers for the Content and Flowfile repositories.&lt;/P&gt;&lt;P&gt;3) Use an encrypted volume. (The OS / disk-driver handles the encryption and decryption transparently)&lt;/P&gt;&lt;P&gt;4) Use a VolatileContentRepository (at our own risk of data loss)&lt;/P&gt;&lt;P&gt;5) Create a ScriptedRecordReader and ScriptedRecordSetWriter (in lieu of the pending EncryptedRecordReader and EncryptedRecordSetWriter controller services).&lt;/P&gt;&lt;P&gt;You alluded to #5 in your response to Alvin, above. My question is: How does #5 differ from #1? More specifically, &lt;EM&gt;when&lt;/EM&gt; does NiFi write content and flowfile data to disk? When queues get full? Would I be correct to assume that #5 only works if we're processing record-oriented data exclusively? For example, if our flow were to assemble record-oriented from one or more relations (i.e. from sources other than an EncryptedRecordReader), then wouldn't there be a possibility of plaintext data written to disk &lt;EM&gt;before&lt;/EM&gt; it becomes associated/identified as a "record"?&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 22:13:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218382#M66617</guid>
      <dc:creator>arenger</dc:creator>
      <dc:date>2017-10-23T22:13:32Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218383#M66618</link>
      <description>&lt;P&gt;
	Alex, your question regarding the difference between option 1 and option 5 is a good one. Option 1 discusses the readers and writers that support the provenance repository implementation. These classes serialize and deserialize provenance events from Java objects to byte streams which can be written to the repository files on disk. Option 5 references the record readers and writers which are used on the NiFi canvas to support the abstract "record" concept of a collection of individual units of data within a single flowfile. In this case, the reader and writer classes convert data between external formats (JSON, CSV, arbitrary via 
	&lt;CODE&gt;ScriptedRecord*&lt;/CODE&gt;) and the NiFi internal record format. I understand these concepts seem related, but these classes are completely separate and do not overlap at all. The as-yet-undeveloped &lt;CODE&gt;EncryptedRecordReader&lt;/CODE&gt; and &lt;CODE&gt;EncryptedRecordSetWriter&lt;/CODE&gt; classes you mention would allow you to operate on encrypted flowfile content, e.g. a flowfile containing 100 lines of customer data where some of the column values are PII/PCI/PHI and therefore encrypted. If you needed to update these records (say, to add a new property to each record containing the last four digits of a credit card number whose full value is encrypted), you could configure an &lt;CODE&gt;UpdateRecord&lt;/CODE&gt; processor as follows:&lt;/P&gt;&lt;OL&gt;
	
&lt;LI&gt;&lt;CODE&gt;EncryptedRecordReader&lt;/CODE&gt; to decrypt the records ephemerally&lt;/LI&gt;	
&lt;LI&gt;Add a property "lastFourDigits" which reads the &lt;CODE&gt;/PAN&lt;/CODE&gt; field and slices the last four digits&lt;/LI&gt;	
&lt;LI&gt;&lt;CODE&gt;EncryptedRecordSetWriter&lt;/CODE&gt; to re-encrypt the sensitive fields&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;
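	As a rough illustration of those three steps, the following standalone Java sketch performs them inside a single method call, so the plaintext PAN only ever exists in local variables in memory. It is not NiFi code. The &lt;CODE&gt;CardRecord&lt;/CODE&gt; class and the &lt;CODE&gt;update()&lt;/CODE&gt; method are hypothetical, and AES-GCM with separate inbound and outbound keys is an assumption made purely for illustration.&lt;/P&gt;&lt;PRE&gt;
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical, standalone sketch (not a NiFi API): decrypt a field, derive
// lastFourDigits, and re-encrypt, all within one method call.
public class UpdateEncryptedRecordSketch {

    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    // Simplified stand-in for a record whose /PAN field is stored encrypted.
    static final class CardRecord {
        byte[] encryptedPan;    // layout: [12-byte IV | ciphertext + GCM tag]
        String lastFourDigits;  // non-sensitive derived field
    }

    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per encryption, required for GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(IV_BYTES + ciphertext.length).put(iv).put(ciphertext).array();
    }

    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
        return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
    }

    // Steps 1-3 from the list: the plaintext PAN only lives in local variables.
    static CardRecord update(SecretKey readKey, SecretKey writeKey, CardRecord in) throws Exception {
        byte[] pan = decrypt(readKey, in.encryptedPan);                    // 1. decrypt ephemerally
        try {
            String panText = new String(pan, StandardCharsets.UTF_8);
            CardRecord out = new CardRecord();
            out.lastFourDigits = panText.substring(panText.length() - 4);  // 2. slice last four digits
            out.encryptedPan = encrypt(writeKey, pan);                     // 3. re-encrypt the field
            return out;
        } finally {
            // Best-effort wipe of the byte[] copy; the String copy is not wiped.
            Arrays.fill(pan, (byte) 0);
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        SecretKey inboundKey = generator.generateKey();
        SecretKey outboundKey = generator.generateKey();

        CardRecord in = new CardRecord();
        in.encryptedPan = encrypt(inboundKey, "4111111111111111".getBytes(StandardCharsets.UTF_8));

        CardRecord out = update(inboundKey, outboundKey, in);
        System.out.println(out.lastFourDigits); // prints 1111
    }
}
```
&lt;/PRE&gt;&lt;P&gt;Running &lt;CODE&gt;main&lt;/CODE&gt; prints &lt;CODE&gt;1111&lt;/CODE&gt;, the last four digits of the sample PAN. The re-encrypted field can only be read back with the outbound key.&lt;/P&gt;&lt;P&gt;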
	All of these actions happen within the lifecycle of a single 
	&lt;CODE&gt;onTrigger()&lt;/CODE&gt; invocation of the &lt;CODE&gt;UpdateRecord&lt;/CODE&gt; processor (even though some logic is performed by the controller services), so none of the plaintext "record" data is persisted anywhere on the system (it exists only in RAM). To be clear, in this situation the actual implementation of the record reader and writer would need to combine the crypto capabilities with the format, so it would actually be something like &lt;CODE&gt;EncryptedJsonRecordReader&lt;/CODE&gt; and &lt;CODE&gt;EncryptedJsonRecordSetWriter&lt;/CODE&gt;. I haven't done the full architecture work here yet, but it's clearly not ideal to have &lt;EM&gt;2*n&lt;/EM&gt; implementations just to provide the crypto capabilities. This would likely require architecture changes, either to allow multiple sequentially stacked readers and writers in the processor, or to have the &lt;CODE&gt;Encrypted*&lt;/CODE&gt; implementations accept a "type-specific" record reader/writer in their definitions and perform this task via composition. That way you would maintain the type-conversion flexibility that currently exists (e.g. the &lt;CODE&gt;EncryptedRecordReader&lt;/CODE&gt; &lt;EM&gt;has a&lt;/EM&gt; &lt;CODE&gt;JsonRecordReader&lt;/CODE&gt; and the &lt;CODE&gt;EncryptedRecordSetWriter&lt;/CODE&gt; &lt;EM&gt;has a&lt;/EM&gt; &lt;CODE&gt;CsvRecordSetWriter&lt;/CODE&gt;, etc.). &lt;/P&gt;&lt;P&gt;
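	To make the composition idea more concrete, here is a hedged, standalone Java sketch. The &lt;CODE&gt;SimpleRecordReader&lt;/CODE&gt; and &lt;CODE&gt;SimpleRecordSetWriter&lt;/CODE&gt; interfaces and the &lt;CODE&gt;DecryptingReader&lt;/CODE&gt;/&lt;CODE&gt;EncryptingWriter&lt;/CODE&gt; wrappers are hypothetical, simplified stand-ins rather than NiFi's record API. A newline-delimited format stands in for JSON/CSV, and the sketch encrypts the whole content with AES-GCM. That is an assumption: field-level encryption would follow the same wrapping pattern.&lt;/P&gt;&lt;PRE&gt;
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of the "has a" design; these interfaces are simplified
// stand-ins, NOT NiFi's RecordReader / RecordSetWriter API.
public class ComposedCryptoSketch {

    interface SimpleRecordReader { String[] read(byte[] content) throws Exception; }
    interface SimpleRecordSetWriter { byte[] write(String[] records) throws Exception; }

    // Format-specific pair: one record per line (stands in for JSON/CSV).
    static final class LineRecordReader implements SimpleRecordReader {
        public String[] read(byte[] content) {
            return new String(content, StandardCharsets.UTF_8).split("\n");
        }
    }

    static final class LineRecordSetWriter implements SimpleRecordSetWriter {
        public byte[] write(String[] records) {
            return String.join("\n", records).getBytes(StandardCharsets.UTF_8);
        }
    }

    // Crypto wrappers *have a* format-specific delegate, so crypto support costs
    // two wrapper classes instead of an encrypted variant of every format.
    static final class DecryptingReader implements SimpleRecordReader {
        private final SecretKey key;
        private final SimpleRecordReader delegate;

        DecryptingReader(SecretKey key, SimpleRecordReader delegate) {
            this.key = key;
            this.delegate = delegate;
        }

        public String[] read(byte[] ivAndCiphertext) throws Exception {
            // Decrypted bytes are handed straight to the delegate, in memory only.
            return delegate.read(decrypt(key, ivAndCiphertext));
        }
    }

    static final class EncryptingWriter implements SimpleRecordSetWriter {
        private final SecretKey key;
        private final SimpleRecordSetWriter delegate;

        EncryptingWriter(SecretKey key, SimpleRecordSetWriter delegate) {
            this.key = key;
            this.delegate = delegate;
        }

        public byte[] write(String[] records) throws Exception {
            // Serialize with the delegate's format, then encrypt before returning.
            return encrypt(key, delegate.write(records));
        }
    }

    private static final SecureRandom RANDOM = new SecureRandom();

    // AES-GCM helpers; output layout is [12-byte IV | ciphertext + tag].
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, ivAndCiphertext, 0, 12));
        return cipher.doFinal(ivAndCiphertext, 12, ivAndCiphertext.length - 12);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        SimpleRecordSetWriter writer = new EncryptingWriter(key, new LineRecordSetWriter());
        SimpleRecordReader reader = new DecryptingReader(key, new LineRecordReader());

        byte[] content = writer.write(new String[] {"alice,4111111111111111", "bob,5500000000000004"});
        String[] records = reader.read(content);
        System.out.println(records.length + " " + records[1]); // prints 2 bob,5500000000000004
    }
}
```
&lt;/PRE&gt;&lt;P&gt;Running &lt;CODE&gt;main&lt;/CODE&gt; prints &lt;CODE&gt;2 bob,5500000000000004&lt;/CODE&gt;. The wrappers only handle bytes and delegate all format handling, so supporting a new format would not require a new encrypted class.&lt;/P&gt;&lt;P&gt;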
	As for when NiFi persists data to disk, this usually happens during or after the completion of the 
	&lt;CODE&gt;onTrigger()&lt;/CODE&gt; phase of the data lifecycle. Processors contain code such as &lt;CODE&gt;flowFile = processSession.putAttribute(flowFile, JMS_SOURCE_DESTINATION_NAME, destinationName);&lt;/CODE&gt; or&lt;/P&gt;
&lt;PRE&gt;flowFile = processSession.write(flowFile, new OutputStreamCallback() {&lt;BR /&gt;    @Override&lt;BR /&gt;    public void process(final OutputStream out) throws IOException {&lt;BR /&gt;        out.write(response.getMessageBody());&lt;BR /&gt;    }&lt;BR /&gt;});
&lt;/PRE&gt;&lt;P&gt;
	These calls populate the flowfile attributes and content, respectively; then, on 
	&lt;CODE&gt;processSession.commit()&lt;/CODE&gt;, that data is persisted to whichever content/flowfile repository implementation is configured (e.g. written to disk or held in volatile memory). There is a &lt;A target="_blank" href="https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html"&gt;good document which goes into depth&lt;/A&gt; on the write-ahead log implementation, and I wrote extensively about the provenance repository serialization &lt;A target="_blank" href="https://alopresto.github.io/ewapr/"&gt;here&lt;/A&gt;. I hope this clarifies how the system works. Please follow up if you have further questions. &lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2017 07:40:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218383#M66618</guid>
      <dc:creator>alopresto</dc:creator>
      <dc:date>2017-10-26T07:40:13Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218384#M66619</link>
      <description>&lt;P&gt;Thanks for your in-depth response, Andy! This is great; it certainly clarified the gray areas for me. It sounds like Option 5, then, is simply passing around encrypted data as the FlowFile content. If a given processor needs to &lt;EM&gt;access&lt;/EM&gt; the content, it would employ an EncryptedRecordReader (or however you choose to build it; your layered approach sounds good, and I agree that it'd be better than the 2*n implementations). Since the content is decrypted and re-encrypted as part of the onTrigger() phase, plaintext data would never be written to the Content Repository. It sounds like the &lt;EM&gt;attributes&lt;/EM&gt; would still be unencrypted, so processors that deal with them (e.g. RouteOnAttribute) could still function.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Oct 2017 22:22:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218384#M66619</guid>
      <dc:creator>arenger</dc:creator>
      <dc:date>2017-10-27T22:22:22Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218385#M66620</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/595/alopresto.html" nodeid="595"&gt;@Andy LoPresto&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Just want to follow up on ticket &lt;A href="https://issues.apache.org/jira/browse/NIFI-3834" target="_blank"&gt;https://issues.apache.org/jira/browse/NIFI-3834&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Is it prioritized for this year?&lt;/P&gt;&lt;P&gt;Thanks. &lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2018 03:36:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218385#M66620</guid>
      <dc:creator>alvinuw</dc:creator>
      <dc:date>2018-01-19T03:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Encrypted NiFi Content Repository</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218386#M66621</link>
      <description>&lt;P&gt;Hi Alvin,&lt;/P&gt;&lt;P&gt;As stated above, I cannot indicate prioritization or scheduling of feature delivery. I am eager to develop this feature, as I am sure many users would like it to be available as well. You can always monitor activity on the Apache NiFi Jira and the mailing lists. &lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2018 12:03:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Encrypted-NiFi-Content-Repository/m-p/218386#M66621</guid>
      <dc:creator>alopresto</dc:creator>
      <dc:date>2018-01-19T12:03:08Z</dc:date>
    </item>
  </channel>
</rss>

