I have data in Hive tables that I would like to push to tables in Postgres. How can I do this using NiFi processors? What is the sequence of processors I can use for this use case? Please advise.
Before that, I would like to know whether NiFi is efficient for this when I have millions of records to write to Postgres.
@John Carter, depending on your actual use case you have a couple of options to choose from in NiFi. In the simplest form, you can read Hive records using the "SelectHiveQL" processor, which can output records in either CSV or Avro format. You can pass those records to the "PutDatabaseRecord" processor, which can read data in several formats, including Avro and CSV.
For this to work, you need to configure the following controller services: a Hive Database Connection Pooling Service (HiveConnectionPool) for SelectHiveQL, a Record Reader (e.g. AvroReader) for PutDatabaseRecord, and a DBCPConnectionPool pointing at Postgres, with the PostgreSQL JDBC driver available on the classpath.
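The two-processor flow can be sketched as below. The processor, service, and property names are real NiFi components, but the property values (query, table, host) are placeholders for illustration, not from this thread:

```
SelectHiveQL
  HiveQL Select Query                  : SELECT * FROM source_table   -- placeholder query
  Hive Database Connection Pooling Svc : HiveConnectionPool
  Output Format                        : Avro
        |
        v  (success relationship)
PutDatabaseRecord
  Record Reader                        : AvroReader
  Database Connection Pooling Service  : DBCPConnectionPool
      Database Connection URL : jdbc:postgresql://pg-host:5432/mydb   -- placeholder
      Database Driver Class   : org.postgresql.Driver
  Statement Type                       : INSERT
  Table Name                           : target_table                 -- placeholder
```

Because PutDatabaseRecord batches all records in a FlowFile into one statement execution, keeping the Avro output of SelectHiveQL as a few large FlowFiles (rather than splitting per row) is what makes this efficient at the millions-of-records scale you mention.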
This is one simple example. You can build more complex flows (which may involve filtering, joining, splitting, or aggregation) based on your actual requirements.
Thanks Ajay. Would this be equally good compared to having a sqoop export in a bash script and calling that from the ExecuteStreamCommand processor? I have millions of records to push to Postgres.
@John Carter, it will depend on the latency, processing, and data volume you will be handling. They are different approaches: Sqoop, as you know, runs MapReduce jobs, while the NiFi flow works on the streaming side. Given the right resources, both will work.
Using ExecuteStreamCommand will also work. Alternatively, if you want to use Sqoop for the whole transfer, you can wrap the sqoop command in a shell script and call it with ExecuteProcess. You can decide after weighing the pros and cons of the various approaches. With the actual processing inside NiFi, you get built-in fault tolerance and monitoring.
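The Sqoop-in-a-script alternative can be sketched as follows. This is only a minimal sketch: the connection URL, table names, export directory, and field delimiter are placeholders you would replace with your own values, and the script echoes the command instead of running it so it can be dry-run without a Hadoop cluster (in the real script called from ExecuteProcess you would execute it instead):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `sqoop export` for NiFi's ExecuteProcess
# (or ExecuteStreamCommand) to call. All values below are placeholders.
set -euo pipefail

PG_URL="jdbc:postgresql://pg-host:5432/mydb"          # placeholder JDBC URL
PG_TABLE="target_table"                               # placeholder Postgres table
HIVE_EXPORT_DIR="/user/hive/warehouse/source_table"   # HDFS dir backing the Hive table

# Build the sqoop export command. -m sets the number of parallel mappers,
# which is what you tune at the millions-of-rows scale discussed above.
SQOOP_CMD="sqoop export \
  --connect ${PG_URL} \
  --username nifi_user \
  --password-file /user/nifi/pg.password \
  --table ${PG_TABLE} \
  --export-dir ${HIVE_EXPORT_DIR} \
  --input-fields-terminated-by '\001' \
  -m 4"

# Echo rather than execute so the script can be dry-run here;
# in the real flow you would run: eval "${SQOOP_CMD}"
printf '%s\n' "${SQOOP_CMD}"
```

Note that `--password-file` keeps the Postgres credentials out of the NiFi processor configuration and out of `ps` output, which matters once the script is scheduled from a flow.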