Member since: 10-03-2017
Posts: 5
Kudos Received: 0
Solutions: 0
11-02-2017
01:08 PM
I am using ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter to read JSON data from Kafka and save it as ORC files (ConsumeKafkaRecord_0_10 -> MergeContent -> ConvertAvroToORC). I can't add fields with a dash in the name to the Avro schema, because Avro does not allow dash characters in field names. I need to save http_headers, which contains fields like Accept-Encoding, User-Agent, Accept-Language ... How can I do that? Is there a way to escape the dash character in the schema? I don't want to use the ConsumeKafka_0_10 processor because it's 30x slower than ConsumeKafkaRecord_0_10.
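For illustration only, here is a minimal sketch of one possible workaround: renaming the JSON keys so that they satisfy the Avro name rule (a letter or underscore first, then letters, digits, or underscores) before the records reach the Avro schema. This is plain Python, not a NiFi API. The helper names to_avro_name and sanitize_keys are hypothetical, and where such a step would run (for example upstream of Kafka or in a scripted processor) is an assumption.

```python
import re


def to_avro_name(key: str) -> str:
    """Map an arbitrary JSON key to a string that is a legal Avro name.

    Avro names must start with [A-Za-z_] and continue with [A-Za-z0-9_].
    Every other character (such as '-') is replaced with '_'.
    """
    name = re.sub(r"[^A-Za-z0-9_]", "_", key)
    # A legal name may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def sanitize_keys(value):
    """Recursively rename the keys of nested dicts and lists of dicts."""
    if isinstance(value, dict):
        return {to_avro_name(k): sanitize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_keys(v) for v in value]
    return value


if __name__ == "__main__":
    record = {"http_headers": {"Accept-Encoding": "gzip", "User-Agent": "curl"}}
    print(sanitize_keys(record))
```

Note that this renaming is lossy: Accept-Encoding and Accept_Encoding would collide on the same name. Another option that may be worth checking is declaring http_headers as an Avro map, since map keys are plain strings and are not subject to the name rule; whether every processor in the flow handles a map as desired is not verified here.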
10-04-2017
01:57 PM
The ConvertAvroToORC processor was using only 2 Concurrent Tasks although it was configured to use 4. After restarting the cluster, ConvertAvroToORC started using 4 Concurrent Tasks, and the throughput is now 14600 msg/sec on the cluster (7300 msg/sec on each machine).
10-04-2017
11:31 AM
By using ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter, as Bryan suggested, I now get a throughput of 9600 msg/sec on the cluster (4800 msg/sec on each machine). I could not remove the MergeContent processor: without it I get very small files of about 0.5 MB. Thank you.
10-03-2017
01:43 PM
Thank you for the suggestion. This looks very promising. I just need to figure out how the suggested components work. I will let you know how it goes. Thank you.
10-03-2017
10:18 AM
I am using a NiFi cluster of 2 x c4.2xlarge machines (8 cores and 15 GB memory each). NiFi is set up to use 12 GB of memory:

# JVM memory settings
java.arg.2=-Xms12g
java.arg.3=-Xmx12g

The jsonToAvro processor is running with 7 Concurrent Tasks and I get a throughput of 450 messages per second. Message size is about 3 KB. The only slow part is the jsonToAvro processor. When running the workflow, all cores are above 90%.

If I save the data from Kafka to a file and use orc-tools to convert it to an ORC file, I get a throughput of 5000 msg/sec on one machine.

I configured NiFi as instructed in the best practices article: https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html

What am I doing wrong? Thank you.
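To put the two figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes a flat 3 KB per message and 1 MB = 1000 KB, and it compares the numbers exactly as stated, without accounting for whether the 450 msg/sec applies to one machine or to the whole cluster. The function name data_rate_mb_per_s is only illustrative.

```python
def data_rate_mb_per_s(msgs_per_s: float, msg_size_kb: float) -> float:
    """Convert a message rate into an approximate data rate in MB/s.

    Assumes 1 MB = 1000 KB and a constant message size.
    """
    return msgs_per_s * msg_size_kb / 1000.0


# Figures taken from the post: 450 msg/sec through jsonToAvro,
# 5000 msg/sec with orc-tools on one machine, about 3 KB per message.
nifi_rate = data_rate_mb_per_s(450, 3)
orc_tools_rate = data_rate_mb_per_s(5000, 3)
ratio = 5000 / 450

if __name__ == "__main__":
    print(f"jsonToAvro flow: {nifi_rate:.2f} MB/s")
    print(f"orc-tools:       {orc_tools_rate:.2f} MB/s")
    print(f"ratio:           {ratio:.1f}x")
```

Under these assumptions the jsonToAvro flow moves a little over 1 MB/s while orc-tools moves about 15 MB/s, roughly an 11x gap.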