Member since 11-07-2016
70 Posts
40 Kudos Received
16 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3893 | 02-22-2018 09:20 PM |
 | 6850 | 01-30-2018 03:41 PM |
 | 1202 | 10-25-2017 06:09 PM |
 | 10910 | 08-15-2017 10:54 PM |
 | 3322 | 06-26-2017 05:05 PM |
01-16-2024
03:22 AM
Hi @eberezitsky, is there a way to extend ListSFTP with the same functionality, but allowing an incoming connection and Expression Language for configuration?
09-15-2023
02:06 AM
I use CDH 6.3.2 (Hive 2.1, Hadoop 3.0), Hive on Spark, YARN cluster, with hive.merge.sparkfiles=true and hive.merge.orcfile.stripe.level=true. This configuration merges the 1099 reducer output files into one file when the result is small, but the merged file then contains about 1099 stripes, and reading it is very slow. I tried hive.merge.orcfile.stripe.level=false and the result is what I want: one small file with one stripe that reads fast. Can anyone explain the difference between true and false, and why hive.merge.orcfile.stripe.level=true is the default?
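For what it's worth, one way to confirm how many stripes the merge produced is to dump the merged file's metadata. A rough sketch only, assuming the hive --orcfiledump utility is on the PATH and using a placeholder file path (neither comes from the post above):

import subprocess

# dump the ORC metadata of the merged file and count its "Stripe:" entries
out = subprocess.check_output(
    ["hive", "--orcfiledump", "/warehouse/path/to/merged_file.orc"]).decode("utf-8")
stripes = [line for line in out.splitlines() if line.strip().startswith("Stripe:")]
print("stripe count:", len(stripes))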
11-13-2019
06:48 AM
After we load over 100 million notes into HBase, I will be using NiFi to listen to a live HL7 feed to keep the data current. Some of these HL7 messages are delete messages, and the corresponding rows need to be removed from HBase.
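As a side note, a minimal sketch of the row removal itself, outside NiFi, assuming the happybase Python client and a hypothetical 'notes' table keyed by note ID (none of these names come from the post):

import happybase

# connect through the HBase Thrift gateway (placeholder host)
connection = happybase.Connection('hbase-thrift-host')
table = connection.table('notes')

# when an HL7 delete message arrives, remove the matching row by its key
table.delete('note-12345')

connection.close()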
11-11-2019
12:58 PM
Hi, even though we set the stripe size to a custom value ("orc.stripe.size"="248435456"), there are many files that are still only 5 MB or 9 MB. Any reason for this behavior?
04-11-2018
05:46 AM
Thanks for the solution, but since I am not familiar with the REST API, the solution by Matt looks easier to me. I will surely try yours too.
10-24-2017
05:41 PM
@xav webmaster Straight answer:

flowFile = session.get()
if flowFile != None:
    flowFile = session.putAttribute(flowFile, 'myAttr', 'myValue')
# implicit return at the end

More info on the ExecuteScript processor: https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html

In your particular case, you can scan the content in the callback function where you read from the input stream:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        self.topic_name = ''

    def get_topic_name(self):
        return self.topic_name

    def process(self, inputStream, outputStream):
        # read the flowfile content as UTF-8 text and split the comma-separated fields
        Log = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        Log2 = str(Log).split(',')
        Brand = Log2[0]
        Model = Log2[1]
        Color = Log2[5]
        # pick the Kafka topic based on brand/color and write the matching record back out
        if Brand == 'ford' and Color == 'gray':
            NewLog = str(Log2)
            self.topic_name = 'ford'
            outputStream.write(bytearray(NewLog.encode('utf-8')))
        if Brand == 'audi' and Color == 'black':
            NewLog = str(Log2)
            self.topic_name = 'audi'
            outputStream.write(bytearray(NewLog.encode('utf-8')))
        if Brand == 'bmw' and Color == 'white':
            NewLog = str(Log2)
            self.topic_name = 'bmw'
            outputStream.write(bytearray(NewLog.encode('utf-8')))
        # add exception handling if needed for empty flowfile content, etc.

flowFile = session.get()
if flowFile != None:
    caller = PyStreamCallback()
    flowFile = session.write(flowFile, caller)
    topic_name = caller.get_topic_name()
    flowFile = session.putAttribute(flowFile, 'kafka_topic', topic_name)
    session.transfer(flowFile, REL_SUCCESS)

Hope that will help.
06-26-2017
05:05 PM
1 Kudo
Alright, so I ended up with a simple script and one processor in NiFi. The modules to reload should be provided in the "modules_list" property of the processor (comma-delimited). Script body:

import sys, json

def class_reloader(modules_to_reload):
    reload_msg = ""
    all_module_names = sys.modules.keys()
    all_module_names.sort()
    for mn in all_module_names:
        m = sys.modules[mn]
        # -- find full match of names with given modules
        if mn in modules_to_reload:
            try:
                reload(m)
                reload_msg = reload_msg + mn + "|"
            except:
                return 1, reload_msg
            continue
        # -- find if mn is a submodule of any given one
        for mtr in modules_to_reload:
            if mn.startswith(mtr + '.'):
                try:
                    reload(m)
                    reload_msg = reload_msg + mn + "|"
                    break
                except:
                    return 1, reload_msg
    return 0, reload_msg

#-------------------------------#

flowFile = session.create()
if flowFile != None:
    modules_prop = modules_list.getValue()
    ml = []
    if modules_prop:
        ml = modules_prop.split(',')
    cr = class_reloader(ml)
    flowFile = session.putAttribute(flowFile, 'class_reload_result', str(cr[0]))
    flowFile = session.putAttribute(flowFile, 'class_reload_found', str(cr[1]))
    session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end

The code can be improved to route to the FAILURE relationship when the method returns a non-zero result code. It's not a perfect solution, but it will work in most cases.
If you have a better one, please share! 🙂
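For illustration, a minimal sketch of that FAILURE-routing improvement (relying on the standard REL_FAILURE relationship that ExecuteScript exposes; this part is not in the original script) would change the tail of the script to:

    cr = class_reloader(ml)
    flowFile = session.putAttribute(flowFile, 'class_reload_result', str(cr[0]))
    flowFile = session.putAttribute(flowFile, 'class_reload_found', str(cr[1]))
    # a non-zero result code means one of the reload() calls raised an exception
    if cr[0] == 0:
        session.transfer(flowFile, REL_SUCCESS)
    else:
        session.transfer(flowFile, REL_FAILURE)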
10-17-2018
06:30 PM
To me, this is not the simple approach; it's very limiting and no simpler than using the Jolt transformation suggested above.
04-28-2019
02:56 PM
1 Kudo
We can use the rank approach, which is faster than a max-based approach because max scans the table twice. Here the partition column is load_date:

select ld_dt.txnno,
       ld_dt.txndate,
       ld_dt.custno,
       ld_dt.amount,
       ld_dt.productno,
       ld_dt.spendby,
       ld_dt.load_date
from (select *,
             dense_rank() over (order by load_date desc) dt_rnk
      from datastore_s2.transactions) ld_dt
where ld_dt.dt_rnk = 1