Member since: 06-08-2022
Posts: 4
Kudos Received: 0
Solutions: 0
03-30-2023 02:27 AM
Hello @TimothySpann, at the time this was a requirement from our client, which we have since convinced them to drop, so the issue has "fixed" itself. Thank you anyway for the contribution. Best regards, Miguel
02-17-2023 08:32 AM
Hello all, We already have a couple of flows writing values into Kafka with the value in Avro format, via an AvroRecordSetWriter that also uses a Confluent Schema Registry. We now also need to write the Kafka key in Avro format. Is there an easy way to do this? Otherwise, how would we isolate the key, write it in Avro with an Avro writer, and then merge it back into the original flow file as an attribute to be used as the key by the PublishKafka processor? Thank you. Best regards, Miguel
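To frame what "the key in Avro format" means on the wire: a Confluent-compatible Avro key follows the schema-registry wire format (magic byte 0x00, a 4-byte big-endian schema ID, then the Avro binary encoding of the record). Here is a minimal pure-Python sketch of that layout, assuming a hypothetical key schema with a long `id` field and a string `region` field and a made-up schema ID of 42:

```python
import struct

def zigzag_varint(n: int) -> bytes:
    """Avro long: zigzag-encode, then emit as a little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps signed ints to unsigned
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def avro_string(s: str) -> bytes:
    """Avro string: length (encoded as an Avro long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

def confluent_avro_key(schema_id: int, record_id: int, region: str) -> bytes:
    """Confluent wire format: magic byte 0x00 + 4-byte schema ID + Avro body.
    The key schema (id: long, region: string) is purely illustrative."""
    body = zigzag_varint(record_id) + avro_string(region)  # fields in schema order
    return b"\x00" + struct.pack(">I", schema_id) + body
```

These bytes are what a writer configured against the Confluent Schema Registry would produce for such a key. Whether you can carry them in a flow-file attribute depends on your NiFi version, since attribute values are strings; a hex or Base64 encoding plus a decoding step, or a record-aware publish processor, may be needed. The schema, field names, and schema ID above are assumptions for illustration only.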
Labels:
- Apache Kafka
- Apache NiFi
- Schema Registry
11-15-2022 03:51 AM
Hello, We're looking for advice on identifying possible improvements to our current NiFi flow, which is a copy of what we had implemented in Flume; both run on AWS with CDP Private Cloud. Our use case begins with an HTTPS listener that, in its simplest form, receives GET requests, processes them, and writes events into Kafka. Between those two ends we perform some validations and transformations:
- converting the request parameters into JSON format (using 5 processors with AttributesToJSON and ReplaceText)
- renaming JSON properties using a JoltTransformJSON
- validations using a couple of RouteOnAttribute + ReplaceText processors to confirm mandatory fields and transform null values
- mapping numeric values into strings using reference data (using UpdateAttribute, RouteOnAttribute, and 2 LookupAttribute processors)
- converting JSON into Avro format with ConvertRecord
- returning the HTTP response and publishing each individual record to Kafka
In total, we have around 20-25 processors on the critical path and about the same number to handle any errors along the way, and we reach about 3,000 records/second using 3 nodes with 16 cores and a 16 GB heap each (sizing recommended by Cloudera). Previously we reached 9,000 records/second in Flume, with 3 identical nodes to process HTTP and validations and 3 more to receive the Avro messages and publish them to Kafka. We've already tried switching our gp2 disks (3000 IOPS, 125 MiB/s) to gp3 (3000 IOPS, 500 MiB/s) and putting the repositories on different volumes, but neither translated into performance gains. Any ideas on how to identify improvements in CPU (which only reaches 60% usage even with 128 threads), heap (currently 16 GB), disks (as reported above, though monitoring doesn't show them being stressed), custom processor development for our custom validations and transformations, etc.? Thank you. Best regards, Miguel
06-08-2022 07:51 AM
Hello, In our use case, we receive data in JSON format that must be converted into Avro using a schema in which X and Y are mandatory fields and everything else (Z and W) should go into a Custom map. Here is our Avro schema:
"fields": [
  {"name": "X", "type": "long"},
  {"name": "Y", "type": "string"},
  {"name": "Custom", "type": {"type": "map", "values": "string"}}
]
And an example of incoming data:
{ "X": 123, "Y": "ABC", "Z": "zzz", "W": "www" }
We have implemented a JoltTransform to do the needed transformation to get Z and W into a Custom array. The issue is now in the conversion from JSON into Avro, since we get an error stating it cannot convert a java.lang.Object into a Map. How can we properly process this JSON and put that array into a map so it can be properly stored using that Avro schema? Thank you. Best regards, Miguel
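One thing worth noting: for the record reader to populate an Avro map, the JSON fed into ConvertRecord should have Custom as a JSON object (string keys mapped to string values), not an array of objects; the Object-to-Map error is typically what appears when the Jolt output produced a list instead. A small Python sketch of the target shape, assuming X and Y are the mandatory fields:

```python
def to_custom_map(record: dict, mandatory=("X", "Y")) -> dict:
    """Keep mandatory fields at the top level; fold everything else into a
    Custom object whose values are strings, matching a map<string> schema."""
    out = {k: record[k] for k in mandatory}
    out["Custom"] = {k: str(v) for k, v in record.items() if k not in mandatory}
    return out
```

For the sample input, this yields `{"X": 123, "Y": "ABC", "Custom": {"Z": "zzz", "W": "www"}}`, which lines up field-for-field with the Avro schema above. In Jolt, the equivalent would be a shift spec that matches X and Y explicitly and routes the remaining keys with a wildcard, something like `"*": "Custom.&"`, so that Custom comes out as an object rather than an array (the exact spec depends on your current transform).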
Labels:
- Apache NiFi