Member since: 06-26-2015
Posts: 515
Kudos Received: 139
Solutions: 114
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2346 | 09-20-2022 03:33 PM |
| | 6283 | 09-19-2022 04:47 PM |
| | 3366 | 09-11-2022 05:01 PM |
| | 3885 | 09-06-2022 02:23 PM |
| | 6003 | 09-06-2022 04:30 AM |
07-05-2022
05:17 AM
@dida The following connector configuration worked for me. My schema was stored in Schema Registry and the connector fetched it from there:

```json
{
  "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
  "hdfs.output": "/tmp/topics_output/",
  "hdfs.uri": "hdfs://nn1:8020",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "name": "asd",
  "output.avro.passthrough.enabled": "true",
  "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
  "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
  "tasks.max": "1",
  "topics": "avro-topic",
  "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
  "value.converter.passthrough.enabled": "false",
  "value.converter.schema.registry.url": "http://sr-1:7788/api/v1"
}
```
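If you want to submit this configuration programmatically, the standard Kafka Connect REST API accepts it with a POST to /connectors. A minimal sketch, assuming a Connect worker is reachable at connect-1:28083 (the hostname and port are placeholders, and a secured cluster would need TLS and authentication on top of this):

```python
# Sketch: registering the connector above via the Kafka Connect REST API.
# The worker address is a placeholder; adjust for your cluster.
import json
import urllib.request

connector = {
    "name": "asd",
    "config": {
        "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
        "hdfs.output": "/tmp/topics_output/",
        "hdfs.uri": "hdfs://nn1:8020",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "output.avro.passthrough.enabled": "true",
        "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
        "output.writer": "com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter",
        "tasks.max": "1",
        "topics": "avro-topic",
        "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
        "value.converter.passthrough.enabled": "false",
        "value.converter.schema.registry.url": "http://sr-1:7788/api/v1",
    },
}

req = urllib.request.Request(
    "http://connect-1:28083/connectors",  # placeholder worker address
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8"))
```

Cheers,
André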
07-01-2022
10:02 AM
@roshanbi Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
06-30-2022
03:58 PM
@snm1523, Sorry, I don't remember either, and unfortunately I don't have a cluster handy right now to confirm this. Cheers, André
06-29-2022
01:35 AM
1 Kudo
@harvey, You are probably running into the problem described in this Technical Service Bulletin. Please check the bulletin for the solution/workaround. Cheers, André
06-28-2022
05:14 PM
1 Kudo
@samaan_filho, You can have two Docker containers running NiFi, each listening on the same internal port 8443. You cannot expose both containers to your local machine on the same port number, though; you'd have to map them to different host ports when exposing them, as in the sketch below. Can you share more details? Alternatively, please take a look at this article: https://community.cloudera.com/t5/Community-Articles/NiFi-cluster-sandbox-on-Docker/ta-p/346271
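A minimal sketch (the apache/nifi image, the container names, and the 9443 host port are illustrative; a real secured NiFi setup needs additional configuration):

```bash
# Both containers listen on 8443 internally; each is published on a
# different port of the host machine.
docker run -d --name nifi-1 -p 8443:8443 apache/nifi
docker run -d --name nifi-2 -p 9443:8443 apache/nifi
```

Cheers, André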
06-27-2022
06:35 AM
Thanks, I have filed a JIRA bug: https://issues.apache.org/jira/browse/NIFI-10171
06-26-2022
02:17 PM
1 Kudo
@Luwi, please note that NiFi currently only supports Java 8 and 11. Cheers, André
06-24-2022
10:44 PM
@HTalha, Another way to do this is to use the ExecuteScript processor with the following Python (Jython) script:

```python
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import json
import re

class SplitAndConvertToJson(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the flowfile content and strip the surrounding braces
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        text = re.sub(r'(^\s*\{\s*|\s*\}\s*)', '', text)
        # Split on commas and build a val_<index> -> value map,
        # trimming whitespace around each value
        fields = text.split(',')
        obj = dict([('val_%s' % (i,), v.strip()) for i, v in enumerate(fields)])
        outputStream.write(bytearray(json.dumps(obj).encode('utf-8')))

flowfile = session.get()
if flowfile is not None:
    flowfile = session.write(flowfile, SplitAndConvertToJson())
    session.transfer(flowfile, REL_SUCCESS)
```
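For example, a flowfile whose content is `{a,b,c}` would leave the processor with the content below (key order may vary):

```json
{"val_0": "a", "val_1": "b", "val_2": "c"}
```

Cheers, André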
06-24-2022
10:27 PM
@rookie-xxd, Could you provide the following information:
- Are you using TLS?
- A screenshot of the "Hosts" page in Cloudera Manager showing the registered hosts
- The content of the /etc/hosts file on the machine that is failing to heartbeat

Cheers, André
06-24-2022
10:15 PM
@Brenigan, The issue you are seeing happens because you are instantiating PyStreamCallback twice. You should instantiate it once and reference that object in the subsequent calls to the session functions. The code below works as you'd expect:

```python
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import json

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        self.length = 0

    def process(self, inputStream, outputStream):
        jsn = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        array = json.loads(jsn)  # type: dict
        # Descend two levels into the JSON structure
        i = 0
        while i <= 1:
            root_key = list(array.keys())[0]
            array = array[root_key]
            i += 1
        self.length = str(len(array))
        # Write the original content back out so the flowfile is not emptied
        outputStream.write(bytearray(jsn.encode('utf-8')))

    def get_length_of_array(self):
        return self.length
# end class

flowfile = session.get()
if flowfile is not None:
    reader = PyStreamCallback()
    flowfile = session.write(flowfile, reader)
    flowfile = session.putAttribute(flowfile, "length", reader.get_length_of_array())
    session.transfer(flowfile, REL_SUCCESS)
```

There is a simpler way to do what you're trying to do, though. For example, say you have the following JSON object in the incoming flowfile:

```json
{
  "root": {
    "items": [1, 2]
  }
}
```

If you want to set the flowfile "length" attribute to the length of the "items" array, you can simply use the EvaluateJsonPath processor with the following configuration:
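In essence, you add a dynamic property named `length` whose value is a JsonPath expression evaluating the array's length, and set Destination to flowfile-attribute. A sketch of the configuration (the `length()` function is supported by the Jayway JsonPath library that EvaluateJsonPath uses):

```
Destination:  flowfile-attribute
Return Type:  auto-detect
length:       $.root.items.length()
```

Cheers, André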