
A - Key components needed to make use of time series data provided by the OSIsoft Pi System

  • For real time processing
    • NiFi to read the data from PI and push it to the HDF cluster
    • Kafka for queuing data samples for stream processing
    • Storm / Spark Streaming
  • For batch processing
    • NiFi to read the data from PI and push it to the HDP cluster
    • Hive - for data storage, pre-processing and analytical processing
  • HBase - for real time use cases and iterative processing of windows of data

B - Ways HDF can be used to access data from the Pi System and move it into the data lake -

  • File based ingestion - This is the easiest setup and does not require any additional license from OSIsoft. PI System admins typically know how to export the data files (by tag and/or by window of time, raw or interpolated). The file export process involves setting up, then automating, an extract script (typically a .bat script that runs the command line utility to extract data from the PI Archive). These text files are then placed in a location that Apache NiFi can access. There are two patterns for NiFi at this point:
    • Edge MiNiFi that takes the extracted file and delivers the content to a central NiFi for further processing
    • An additional step in the extract script that leverages FTP / SCP / NDM to push the file to a landing directory on a central NiFi server or cluster.
  • ADO / OLEDB / ODBC (and PI SDK) - PI Servers are built on Microsoft technologies and hence support these data access methods (almost) out of the box. These options do require the connection to originate from a Windows host, either local or remote to the PI Server, and a bridge to connect the Microsoft technology to NiFi. There are design patterns available for this.
  • JDBC - (May require additional components / license from OSIsoft) - NiFi or MiNiFi connects to PI using JDBC and pulls data by tag and window of time using SQL queries
  • API - (May require additional components / license from OSIsoft)
    • OPC DA (again Windows dependent). There is a processor built by Hortonworks Professional Services for direct connection to OPC DA using pure Java, but it is not freely available yet. As an alternative, OPC DA/UA bridges are an option, as we do have an open source OPC UA processor available for NiFi. See
    • OSIsoft offers an HTTP REST server that can provide data directly to NiFi's GetHTTP processors
  • Native methods are under development from OSIsoft that allow direct pushes into Hive or onto Kafka topics. See

Bottom line - If one is just trying to prove out HDF or HDP, the path of least resistance would be the file option, which gets the customer started very quickly.
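
As a rough illustration of the file option, the sketch below parses the contents of an extract file before handing it to downstream processing. The layout (comma-separated tag, timestamp, value lines with ISO 8601 timestamps), the tag name and the names Sample and parse_extract are assumptions made for the example; the real format depends on how the PI admin configures the extract script.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    tag: str
    timestamp: datetime
    value: float


def parse_extract(text):
    """Parse 'tag,timestamp,value' lines; return (samples, rejected_lines)."""
    samples, rejected = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        try:
            tag, ts, value = (field.strip() for field in row)
            samples.append(Sample(tag, datetime.fromisoformat(ts), float(value)))
        except ValueError:
            # Wrong field count, unparseable timestamp or non-numeric value.
            rejected.append(",".join(row))
    return samples, rejected


# Hypothetical extract content; the third line carries a non-numeric value.
example = """FIC101.PV,2017-04-25T17:00:00,12.5
FIC101.PV,2017-04-25T17:01:00,12.7
FIC101.PV,2017-04-25T17:02:00,Bad Input
"""
samples, rejected = parse_extract(example)
print(len(samples), "samples parsed,", len(rejected), "rejected")
```

Keeping rejected lines rather than dropping them silently makes it easier to route them to a separate failure path for inspection.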

Additional notes

If you find very old instances of PI Server, there is a real performance concern for the “bulk load” situation regardless of the access method. If you plan to load any large quantity of historical data from a PI Server, then the PI Server will 1) retrieve the binary storage files that cover the selected range, 2) decompose those files and 3) serve your client the data. To accomplish these steps, the server must load the necessary binaries into memory and then have the spare processing capacity to unpack them. If you have years and years of data to bring over, start small and scale up. If the number of data points in the server has not changed frequently, look at the PI Archive files to get an idea of the average amount of time a binary covers, and try selecting a range that covers one binary, then two, then three and so on.
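
The "start small and scale up" approach can be sketched as a generator of progressively larger time windows. The archive span is an assumption that you would estimate from the PI Archive files, and the function name and the growth of one archive per step are only illustrative.

```python
from datetime import datetime, timedelta


def growing_windows(start, end, archive_span):
    """Yield (window_start, window_end) pairs covering [start, end).

    The first window spans one archive file's worth of time, the second two,
    the third three, and so on, so load on the PI Server ramps up gradually.
    """
    if archive_span <= timedelta(0):
        raise ValueError("archive_span must be positive")
    cursor, multiple = start, 1
    while cursor < end:
        window_end = min(cursor + multiple * archive_span, end)
        yield cursor, window_end
        cursor, multiple = window_end, multiple + 1


# Assumed: each archive file covers about 30 days.
windows = list(growing_windows(datetime(2016, 1, 1), datetime(2017, 1, 1),
                               timedelta(days=30)))
for w_start, w_end in windows:
    print(w_start.date(), "->", w_end.date(), (w_end - w_start).days, "days")
```

In practice you would watch the PI Server's memory and CPU after each window and stop growing the range once the load becomes noticeable.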

Another thing to look for in this bulk loading step: often there are two PI Servers. The first is close to the asset(s), and a second, to which data from the first is replicated, is where most of the enterprise applications connect. You will always want to work on the second. If you bring it down, you will only interrupt the customer's reporting tasks and not their data collection.

After the bulk loading step, if you have a requirement to continuously acquire data from the PI Server, then you should ask the PI admin to treat Apache NiFi like any other client and plan for performance accordingly. Often the challenge here is the impact of a frequent select * against the real time database, which amounts to a significant performance hit.
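
One way to reduce that impact is to ask only for the tags you need and only for samples newer than the last one already acquired. The sketch below builds such a query. The table and column names (pi_samples, tag, time, value) and the tag names are hypothetical placeholders rather than the actual PI schema, and the ? placeholder style depends on the driver in use.

```python
from datetime import datetime


def build_incremental_query(tags, last_seen, now):
    """Return (sql, params) selecting only the listed tags in (last_seen, now]."""
    if not tags:
        raise ValueError("at least one tag is required")
    placeholders = ", ".join("?" for _ in tags)
    sql = (
        "SELECT tag, time, value FROM pi_samples "  # hypothetical table name
        f"WHERE tag IN ({placeholders}) AND time > ? AND time <= ? "
        "ORDER BY time"
    )
    return sql, [*tags, last_seen, now]


sql, params = build_incremental_query(
    ["FIC101.PV", "TIC202.PV"],  # hypothetical tag names
    datetime(2017, 4, 25, 17, 0),
    datetime(2017, 4, 25, 17, 5),
)
print(sql)
print(params)
```

The upper bound of one poll becomes the lower bound of the next, so consecutive polls neither overlap nor leave gaps.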

Secondly, keep in mind that you are often NOT GETTING RAW DATA; you will be getting interpolated data. Increasing NiFi's polling frequency does not necessarily increase the resolution of the acquired data. If updates are requested faster than the PI Server itself is acquiring data, then all that is returned is interpolations between the samples. This can impact data science calculations, and analysts using this data must be aware that this has happened.
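
A small calculation makes the point concrete. The sketch assumes linear interpolation and a 60 second acquisition interval; both are assumptions for illustration, and the server's actual interpolation may differ.

```python
def interpolate(raw, t):
    """Linearly interpolate a value at time t from sorted (time, value) samples."""
    for (t0, v0), (t1, v1) in zip(raw, raw[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")


# Assumed: the server acquires one raw sample every 60 seconds.
raw = [(0, 10.0), (60, 16.0), (120, 13.0)]

# Polling every 15 seconds returns 9 values, but only 3 are measurements.
polled = [(t, interpolate(raw, t)) for t in range(0, 121, 15)]
for t, v in polled:
    marker = "raw" if (t, v) in raw else "interpolated"
    print(t, v, marker)
```

The six extra values lie exactly on the straight lines between the three measurements, so they add rows without adding information about the process.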

The last piece to cover is error handling. As above, the most straightforward method is the file based approach. When a connection to the enterprise historian is broken, files will accumulate on the local server and NiFi will start moving them again when the back pressure falls. For all of the programmatic data access methods, error logic will have to be built for loss of upstream connectivity, loss of downstream connectivity and late arriving data.
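
For the programmatic methods, the error logic might look something like the sketch below: a retry wrapper with doubling delays for lost connectivity, and a check that separates late arriving samples from on-time ones. The function names, retry counts and lateness threshold are all assumptions for illustration, not part of NiFi or the PI System.

```python
import time
from datetime import datetime, timedelta


def call_with_retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on ConnectionError wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up and let the caller route the failure
            sleep(base_delay * 2 ** attempt)


def split_late(samples, watermark, allowed_lateness):
    """Split (timestamp, value) samples into on-time and late relative to the watermark."""
    cutoff = watermark - allowed_lateness
    on_time = [s for s in samples if s[0] >= cutoff]
    late = [s for s in samples if s[0] < cutoff]
    return on_time, late


# Simulated upstream that fails twice before answering.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("historian unreachable")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_fetch, sleep=delays.append)

watermark = datetime(2017, 4, 25, 17, 5)
batch = [(datetime(2017, 4, 25, 17, 4), 12.5), (datetime(2017, 4, 25, 17, 1), 12.1)]
on_time, late = split_late(batch, watermark, timedelta(minutes=2))
print(result, delays, len(on_time), len(late))
```

The same wrapper can guard both the upstream read and the downstream write, while late samples can be routed to a separate path that backfills rather than appends.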

Super Collaborator

Hi @wsalazar ,

Your article mostly talks about the PI System. Should this be the same for MatrikonOPC?

Can MiNiFi / NiFi read from it using the OPC DA and HDA specifications?



For Matrikon, you need to enable the OPC UA server and use the UA processor provided by

This article is a closer guide to connecting to Matrikon.

Comment back here if you need more help to get this working


One additional strategy that I recommend is using the OSI PI SDK and the OPC UA stack from the OPC Foundation to build an OPC UA server for PI. This should be very straightforward to implement and can be scoped down to only the features necessary. I hope that, with growing demand for OPC UA, this is something OSIsoft will make available in the same way OPC DA is today.

You can find more information regarding the Pi SDK in

and you can find the OPC UA Stack from the OPC foundation here

New Member

Thanks @wsalazar for the insights. I know it is an older article, but it is worth revisiting. For a real time data need, what approach would you take to connect from NiFi?

Last update:
‎04-25-2017 05:13 PM