Archiving NiFi Provenance data

Rising Star


We're working on archiving NiFi Provenance data, using the S2S Provenance reporting task, to write it to HDP.

I'm looking for suggestions (or best practices) on where to store the provenance data in HDP. Would it be better to store it in HDFS, HBase, Hive, or something else, so that users and applications can later query and access it easily and efficiently?

Thanks in advance.


Cloudera Employee

Hi @Raj B,

As far as storage options go, HDFS or HBase could both work for you. HDFS is a distributed filesystem, so running ad hoc queries against it is not simple without an additional code or framework layer, but there are options for that. HBase is a NoSQL database with real-time querying capabilities.

Hive can sit on top of HDFS or HBase. Basically, you would map HDFS files or HBase tables to Hive tables, and then you can query those data stores through Hive, using its SQL-like query language, HiveQL. If you have users and developers comfortable with SQL, that might be an option worth looking at.
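
To make the HDFS-to-Hive mapping a bit more concrete, here is a rough Python sketch that only builds the HiveQL text for an external table over an HDFS directory. Everything specific in it is an assumption for illustration: the table name nifi_provenance, the path /data/nifi/provenance, and the column names (which you would need to check against the JSON your reporting task actually emits). It also assumes your flow writes one JSON object per line, since that is what the Hive JSON SerDe expects.

```python
def build_external_table_ddl(table, location, columns):
    """Return HiveQL that maps an HDFS directory to an external Hive table.

    table    -- Hive table name (hypothetical in this example)
    location -- HDFS directory holding the files (hypothetical)
    columns  -- list of (column_name, hive_type) pairs (assumed field names)
    """
    # One "`name` TYPE" entry per column, joined with commas.
    column_lines = ",\n".join(
        f"  `{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        f")\n"
        # JSON SerDe from hive-hcatalog-core; expects one JSON object per line.
        f"ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Assumed subset of provenance fields; verify against your own events.
ddl = build_external_table_ddl(
    table="nifi_provenance",
    location="/data/nifi/provenance",
    columns=[
        ("eventId", "STRING"),
        ("eventType", "STRING"),
        ("timestampMillis", "BIGINT"),
        ("componentId", "STRING"),
    ],
)
print(ddl)
```

You would run the generated statement through whatever Hive client you already use. Because the table is external, dropping it in Hive leaves the files in HDFS untouched, which tends to suit an archive.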

An additional consideration is what you are using for other data in your HDP cluster, i.e., do you already have a lot of HBase experts? Many combinations of tools can be made to work for what you want to do, so it may be a good idea to stick with what you are already using.

I hope this helps narrow things down for you. Good luck!
