Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Move Data from Hive to Postgres using Nifi

Move Data from Hive to Postgres using Nifi

New Contributor

I have data in hive tables which I would like to push to tables in postgres. How can I do this using Nifi processors? What are the sequence of processors that I can use for this usecase. Please advise.

Before that, I would like to know if Nifi is efficient to achieve this when I have millions of records to be written to Postgres?

Thanks,

John

4 REPLIES 4
Highlighted

Re: Move Data from Hive to Postgres using Nifi

Rising Star

@John Carter, depending on actual use case you have couple of options to choose in Nifi. In simplest form we can read hive records using "SelectHiveQL" which can output records in either csv or avro format. You can pass those records to "PutDatabaseRecord" processor which can read data in several formats including avro, csv.

For this to work we need to configure below services:

  1. HiveConnectionPool (for "SelectHiveQL")
  2. Record Reader (Avro,CSV)
  3. DBCPConnectionPool

34789-screen-shot-2017-09-13-at-61605-pm.png

This is one simple example. You can build more complex flows(which may involve filter, join, split or aggregation) based on actual requirements.

Re: Move Data from Hive to Postgres using Nifi

New Contributor

Thanks Ajay. Would this be equally good when compared to having a sqoop export in a bash script and call that from executeStreamCommand processor? I have millions of records to push to postgres.

Re: Move Data from Hive to Postgres using Nifi

Rising Star

@John Carter, It will depend on kind of latency, processing, and data volume you will be handling. Both are different approaches. Sqoop as you know will run mapreduce jobs while Nifi use case will be on streaming side. Given right resources both will work.

Re: Move Data from Hive to Postgres using Nifi

Rising Star
@John Carter

Using executeStreamCommand will also work. Alternatively if you want to use sqoop for all the transfer you can wrap the sqoop command in shell script and use ExecuteProcess. You can decide after weighting pros and cons of various approaches. With actual processing inside nifi you will get inbuilt fault tolerance and monitoring.

Don't have an account?
Coming from Hortonworks? Activate your account here