Member since: 03-27-2017 | Posts: 11 | Kudos Received: 1 | Solutions: 0
06-03-2018
04:37 PM
Hello Nikita. I have a very similar use case: I want to use a public key (.asc) and a private key (.gpg). You mentioned in the first post that you can encrypt and decrypt the content, but it doesn't work in my case. Can you share more details on the NiFi EncryptContent configuration and on how you created the keys? I get this exception:
2018-06-04 00:30:41,891 ERROR [Timer-Driven Process Thread-39] o.a.n.processors.standard.EncryptContent EncryptContent[id=104812d1-1833-14cd-e94b-2ada6cb69b98] Cannot encrypt StandardFlowFileRecord[uuid=a2c4feeb-35dc-407d-8a59-044345403950,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1528041593413-3659, container=default, section=587], offset=430110, length=1062],offset=0,name=data.json,size=1062] - : org.apache.nifi.processor.exception.ProcessException: Invalid public keyring - invalid header encountered
org.apache.nifi.processor.exception.ProcessException: Invalid public keyring - invalid header encountered
    at org.apache.nifi.security.util.crypto.OpenPGPKeyBasedEncryptor$OpenPGPEncryptCallback.process(OpenPGPKeyBasedEncryptor.java:338)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2826)
    at org.apache.nifi.processors.standard.EncryptContent.onTrigger(EncryptContent.java:506)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
encryptcontent.png
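For reference, a minimal standalone Groovy sketch that tries to load the public keyring with BouncyCastle (the OpenPGP library behind NiFi's OpenPGPKeyBasedEncryptor) can help confirm whether the key file itself is parseable before pointing EncryptContent at it. The file path and the BouncyCastle version below are placeholders, not values from this thread.
<code>
// Hedged sanity check: can BouncyCastle parse this keyring at all?
@Grab('org.bouncycastle:bcpg-jdk15on:1.60')
import org.bouncycastle.openpgp.PGPPublicKeyRingCollection
import org.bouncycastle.openpgp.PGPUtil
import org.bouncycastle.openpgp.operator.jcajce.JcaKeyFingerprintCalculator

// Placeholder path to the exported public key file
new File('/tmp/public-key.asc').withInputStream { input ->
    // getDecoderStream accepts both ASCII-armored (.asc) and binary (.gpg) keyrings;
    // if the file is not a valid keyring, the constructor below throws.
    def keyRings = new PGPPublicKeyRingCollection(
            PGPUtil.getDecoderStream(input),
            new JcaKeyFingerprintCalculator())
    keyRings.keyRings.each { ring ->
        ring.publicKeys.each { key ->
            println String.format('keyID=%016X encryptionKey=%s', key.keyID, key.isEncryptionKey())
        }
    }
}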
02-14-2018
10:04 AM
1 Kudo
Hello, I used this link for securing NiFi: https://community.hortonworks.com/articles/58233/using-the-tls-toolkit-to-simplify-security.html and it works fine for a POC. Following that article I used the tls-toolkit, which generated: CN=team_OU=NIFI.p12, CN=team_OU=NIFI.password, nifi-cert.pem, nifi-key.key, keystore.jks, nifi.properties, truststore.jks. As the next step, for the production environment our IT registered a domain name and generated <domain_name>.crt and <domain_name>.key. Question: how do I switch NiFi to use these files/certificates (<domain_name>.crt and <domain_name>.key)? Thanks, Oleg.
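Not an official recipe, but one common way to get a CA-issued certificate and key into the keystore/truststore layout NiFi expects is to convert them with openssl and keytool. A hedged Groovy sketch that drives those commands is below; every file name, the alias and the passwords are placeholders, and the nifi.properties keys in the trailing comment are the standard NiFi security properties.
<code>
// Hedged sketch: convert a CA-issued cert/key pair into keystore.jks and
// truststore.jks by shelling out to openssl and keytool. All file names,
// the alias and the passwords are placeholders.
def run = { List<String> cmd ->
    def proc = cmd.execute()
    proc.waitForProcessOutput(System.out, System.err)
    assert proc.exitValue() == 0 : "command failed: ${cmd.join(' ')}"
}

def pass = 'changeit'

// 1) Bundle the domain cert + private key into a PKCS12 keystore
run(['openssl', 'pkcs12', '-export',
     '-in', 'domain_name.crt', '-inkey', 'domain_name.key',
     '-name', 'nifi', '-out', 'keystore.p12', '-password', "pass:${pass}".toString()])

// 2) Convert the PKCS12 keystore to JKS (or point NiFi at the .p12 directly)
run(['keytool', '-importkeystore', '-noprompt',
     '-srckeystore', 'keystore.p12', '-srcstoretype', 'PKCS12', '-srcstorepass', pass,
     '-destkeystore', 'keystore.jks', '-deststoretype', 'JKS', '-deststorepass', pass])

// 3) Put the issuing CA certificate into a truststore
run(['keytool', '-importcert', '-noprompt', '-alias', 'ca',
     '-file', 'ca.crt', '-keystore', 'truststore.jks', '-storepass', pass])

// Then update nifi.properties to reference the new files:
//   nifi.security.keystore / nifi.security.keystoreType / nifi.security.keystorePasswd
//   nifi.security.truststore / nifi.security.truststoreType / nifi.security.truststorePasswd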
Labels: Apache NiFi
12-09-2017
01:52 PM
Hi Bryan. The end-to-end flow is: read from a DB (NiFi returns Avro format) -> MD5 selected columns -> convert to CSV -> put to S3. I wanted to do the MD5 step with a Groovy script via ConvertRecord, but I don't know how to get started with the script, and the unit tests didn't help me. If possible, please share an example of how Groovy reads Avro data using ScriptedReader. If my approach is wrong, please suggest a better way. Thanks, Oleg.
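This is not the NiFi-specific ScriptedReader wiring, but a small, self-contained Groovy sketch of the record logic itself: read Avro with the plain Avro Java API, MD5 a couple of selected columns, and emit CSV. The input file name and the column names to hash are hypothetical; inside NiFi the same per-record loop would sit in a ScriptedReader/ScriptedRecordSetWriter pair or an ExecuteScript body, with the Avro classes already on the classpath.
<code>
// Hedged, standalone sketch: Avro in -> MD5 selected columns -> CSV out.
// 'input.avro' and the column names below are placeholders for illustration.
@Grab('org.apache.avro:avro:1.8.2')
import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericRecord
import java.security.MessageDigest

def md5 = { String s ->
    MessageDigest.getInstance('MD5').digest(s.getBytes('UTF-8')).encodeHex().toString()
}

def columnsToHash = ['email', 'ssn'] as Set   // hypothetical column names

new File('input.avro').withInputStream { input ->
    def reader = new DataFileStream<GenericRecord>(input, new GenericDatumReader<GenericRecord>())
    def fields = reader.schema.fields*.name()
    println fields.join(',')                  // CSV header (no quoting/escaping in this sketch)
    reader.each { GenericRecord rec ->
        def row = fields.collect { name ->
            def value = rec.get(name)?.toString() ?: ''
            columnsToHash.contains(name) ? md5(value) : value
        }
        println row.join(',')
    }
    reader.close()
}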
12-08-2017
10:38 AM
Thanks Bryan for the explanation, it is much clearer now. Just to get started: I need to read the Avro content coming from the QueryDatabase processor, make changes in each record, and then convert it to CSV. The RecordReader would read the output of the QueryDatabase processor and make the changes. Is there a simple example of how to read Avro with a Groovy script using ScriptedReader, with CSV as the output? Is it possible to convert Avro to CSV using ConvertRecord, or does that also require writing code? If it requires code, can you give me an example of how to convert Avro to CSV with Groovy (in the context of ScriptedRecordSetWriter)? Thanks, Oleg
12-07-2017
06:00 AM
Hi All, I have a big JSON file (about 1M records). I need to replace a couple of fields in each JSON record using custom logic. I used the ExecuteScript processor with a Groovy script but got an out-of-memory exception. I want to try the ConvertRecord processor instead. I don't have a schema for the JSON, which is why I want to use ScriptedReader and ScriptedRecordSetWriter. Questions: What is the best practice for a use case like this, processing a big JSON file and making changes per record? Is ConvertRecord a good idea, or should I take a different approach? Can you point me to an example of ScriptedReader and ScriptedRecordSetWriter using Groovy? (I found a lot of ExecuteScript examples, but no record-based Groovy ones.) Also, is it possible to share docs/blog links on the internals of the record-based approach? I want to understand at a deeper technical level why it is more robust than ExecuteScript (the file-based approach). Thanks, Oleg.
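Whichever processor ends up hosting the script, the out-of-memory problem usually comes from parsing the whole file at once, so one option is to stream the JSON record by record. A hedged Groovy sketch using Jackson's streaming parser is below; the file names, the top-level-array assumption, and the rewritten field are all placeholders, and inside ExecuteScript the same loop would read from the flow file's InputStream and write to its OutputStream.
<code>
// Hedged sketch: rewrite one field per record in a large JSON array without
// loading the whole file into memory. Assumes the input is a top-level JSON
// array of objects; file names and the modified field are placeholders.
@Grab('com.fasterxml.jackson.core:jackson-databind:2.9.5')
import com.fasterxml.jackson.core.JsonToken
import com.fasterxml.jackson.databind.ObjectMapper

def mapper = new ObjectMapper()
def factory = mapper.factory

new File('big-input.json').withInputStream { input ->
    new File('big-output.json').withOutputStream { output ->
        def parser = factory.createParser(input)
        def generator = factory.createGenerator(output)

        assert parser.nextToken() == JsonToken.START_ARRAY : 'expected a top-level JSON array'
        generator.writeStartArray()

        // Only one record is held in memory at a time
        while (parser.nextToken() == JsonToken.START_OBJECT) {
            def record = mapper.readTree(parser)      // ObjectNode for this element
            record.put('status', 'PROCESSED')         // placeholder for the custom logic
            mapper.writeTree(generator, record)
        }

        generator.writeEndArray()
        generator.close()
        parser.close()
    }
}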
Labels: Apache NiFi
10-27-2017
04:01 AM
Hi, I've done some research: the ORC file is generated by the NiFi ConvertAvroToOrc processor (NiFi 1.4). I ran the same tests with EMR 5.7, which has Hive 2.1, and there Hive can successfully query the ORC external table. I checked, and it looks like EMR 5.9 with Hive 2.3 uses an upgraded version of ORC: [HIVE-15841] - Upgrade Hive to ORC 1.3.3. Question: what should be done so that NiFi ConvertAvroToOrc works properly with Hive 2.3? I checked the writer version of the ORC files: ORC generated with NiFi has "writerVersion": "HIVE_8732", while ORC generated with Hive 2.3 has "writerVersion": "ORC_135". Should I create a separate topic, since it looks like the problem is related to the NiFi ORC component? Thanks, Oleg.
10-25-2017
05:30 PM
I am creating a Hive external table over ORC (the ORC file is located on S3; environment: AWS EMR 5.9, Hive 2.3.0). Command:
<code>CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC LOCATION 's3://bucket_name'
After running the query:
<code>Select * from Table1;
Result is:
<code>+-------------------------------------+---------------------------------------+
| Table1.id | Table1.name |
+-------------------------------------+---------------------------------------+
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
| NULL | NULL |
+-------------------------------------+---------------------------------------+
Interestingly, the number of returned records is 10, which is correct, but all the values are NULL. What is wrong? Why does the query return only NULLs? I am using EMR instances on AWS. Should I configure or check anything to support the ORC format in Hive? I tested with the file located on S3 and on HDFS, and queried from both Hive and Beeline; the behavior is the same: select count(*) returns 10, select * returns NULLs. ORC file attachment: https://drive.google.com/file/d/0B3MYgurAigDMdm1ESkZYWm9Zdms/view I know guys it is not a Hortonworks distribution, but I would really appreciate your help 🙂 Thanks, Oleg.
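One commonly reported cause of all-NULL results over ORC is a mismatch between the column names/types stored in the file and the ones in the Hive DDL, so a quick check is to open the file with the ORC reader API and compare. A hedged Groovy sketch is below; the local path and the dependency versions are assumptions, and the same metadata can also be printed with `hive --orcfiledump <path>`.
<code>
// Hedged sketch: dump the schema, row count and writer version of an ORC file
// so the field names can be compared against the Hive table definition.
// The file path and dependency versions are placeholders/assumptions.
@Grab('org.apache.orc:orc-core:1.4.1')
@Grab('org.apache.hadoop:hadoop-common:2.7.3')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile

def reader = OrcFile.createReader(new Path('/tmp/data.orc'),
                                  OrcFile.readerOptions(new Configuration()))

println "rows           : ${reader.numberOfRows}"
println "writer version : ${reader.writerVersion}"
println "schema         : ${reader.schema}"   // field names here should line up with the Hive DDL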
Labels: Apache Hive