This is a simple example of reading CSV data, using its schema to convert it to Avro, and sending it via Kafka to SAM. SAM then reads it and stores it in HDFS. The flow itself is simple, but it is a starting point for building flows of any complexity.
A number of questions have come up on how to set up this basic flow.
Gotchas: you must send Avro, beware of null values, set your schema, store the schema name somewhere the flow can read it, create your Kafka topic ahead of time, and make sure you have permissions on it.
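For the last two gotchas, the topic can be created up front with the Kafka CLI tools that ship with the platform. A sketch, using the same host as the rest of this example; the partition and replication counts, and the `nifi` principal, are placeholder assumptions you would adjust for your cluster:

```shell
# Create the "simple" topic before NiFi publishes to it
# (partitions and replication factor are assumptions -- size them for your cluster)
./kafka-topics.sh --zookeeper princeton10.field.hortonworks.com:2181 \
  --create --topic simple --partitions 1 --replication-factor 1

# If ACLs are enforced, the publishing user also needs rights on the topic
# (User:nifi is a hypothetical principal)
./kafka-acls.sh --authorizer-properties zookeeper.connect=princeton10.field.hortonworks.com:2181 \
  --add --allow-principal User:nifi --operation All --topic simple
```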
./kafka-console-consumer.sh --zookeeper princeton10.field.hortonworks.com:2181 --topic simple
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
{metadata.broker.list=princeton10.field.hortonworks.com:6667, request.timeout.ms=30000, client.id=console-consumer-7447, security.protocol=PLAINTEXT}
aabaabb
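The deprecation warning above suggests the newer consumer. The equivalent call goes straight to the broker; port 6667 is the broker port shown in the consumer's own config dump:

```shell
./kafka-console-consumer.sh --bootstrap-server princeton10.field.hortonworks.com:6667 \
  --topic simple --from-beginning
```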
Apache NiFi 1.x Flow For CSV Ingest
Set the Schema Name via Update Properties in NiFi
Publish Kafka Record Settings (need CSV Reader and AVRO Record Set Writer)
CSV Reader (Make sure you use the same Schema Registry, Access Strategy, and Schema Name)
Avro Writer (Make sure you use the same Schema Registry, Write Strategy and Schema Name)
Add Your Simple Schema to the HSR
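As a sketch, the "simple" schema might look like the file below. The field names are made up for illustration; the point is that every CSV column that can be empty is declared as a union with "null" (the nulls gotcha above). The final one-liner just confirms the file is well-formed JSON before you paste it into the HSR UI:

```shell
# Hypothetical Avro schema for the "simple" topic -- field names are assumptions
cat > simple.avsc <<'EOF'
{
  "type": "record",
  "name": "simple",
  "fields": [
    {"name": "id",    "type": "int"},
    {"name": "name",  "type": ["null", "string"], "default": null},
    {"name": "value", "type": ["null", "double"], "default": null}
  ]
}
EOF

# Sanity-check: the schema must at least be valid JSON before registering it
python3 -c 'import json; json.load(open("simple.avsc")); print("schema ok")'
```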
Simple Streaming Analytics Manager Store to HDFS
SAM Kafka Source:
SAM HDFS Sink: Set the output directory and the fields you want to write