How to store a Web API response in MongoDB with Apache NiFi?

Hello all, I am using Apache NiFi for my Master's thesis work on integrating genomic data sources. As an example, I would like to ingest this API response into MongoDB. As you will notice, the native response data is in a flat format. Could someone please guide me on how to go about it? Thanks and regards, Jasim

2 REPLIES

This appears to be in tab-delimited format. The first line gives the Data Type ("commented out" with a pound (#) sign), the second line gives the column definitions (1124 columns, I believe), also commented out, and the rest is a sparse matrix of values, one row per line. Since MongoDB expects a JSON document, I'm guessing you want to convert each row to a JSON document like so?

{
  "COLOR_GRADIENT_SETTINGS": 672,
  "MUTATION_EXTENDED": "BRCA1",
  "GENE_IDCOMMON": "NaN",
  "TCGA-AR-A1AR-01": "NaN",
...
}

You should be able to use ExtractText to match the first two lines (ignoring the first one and matching everything but the # in the second), but with this many columns it would be cumbersome to write a regular expression that matches each column name. If you had the column names (perhaps extracted into attributes), you could use SplitText to split the rows into individual flow files, one row per flow file. Then another ExtractText could be used (with the same regex, I imagine) to get the values into attributes, followed by a ReplaceText to build a JSON document like the one above.
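To make the "ignore the first line, match everything but the # in the second" idea concrete, here is a rough sketch of such a pattern. It is written in Python only for illustration (ExtractText uses Java regular expressions, so the pattern may need adjusting there), and the sample text and the name `header_columns` are made up for this example, not taken from the actual API response.

```python
import re

# Hypothetical sample that mirrors the layout described above:
# a commented data-type line, a commented header line, then data rows.
SAMPLE = "#DATA_TYPE\tmutation\n#GENE_ID\tCOMMON\tCASE_1\n672\tBRCA1\tNaN\n"

# Skip the whole first line, then capture everything after '#' on the second.
HEADER_RE = re.compile(r"\A[^\n]*\n#([^\n]*)")


def header_columns(text):
    """Return the column names found on the second (commented) line."""
    match = HEADER_RE.match(text)
    if match is None:
        raise ValueError("no commented header line found on line 2")
    # Drop a trailing carriage return in case the input uses CRLF endings.
    return match.group(1).rstrip("\r").split("\t")
```

The single capture group grabs the whole header line, which sidesteps writing one group per column; splitting it into individual names still has to happen somewhere, which is why the scripted approach tends to be easier with this many columns.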

Alternatively, you could use ExecuteScript to write custom logic that skips the first line, splits the second line by tabs into column names, and then outputs a flow file in the above JSON format for each remaining line. Then you can use PutMongo to send each document to your MongoDB instance.
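The parsing logic described above might look roughly like the following. This is a standalone Python sketch, not a ready-made ExecuteScript body: the flow-file reading and writing that ExecuteScript would wrap around it is not shown, the function names are hypothetical, and converting integer-looking values to numbers is an assumption made to match the example document above.

```python
import json


def _coerce(value):
    """Assumption: integer-looking cells become numbers, the rest stay strings."""
    try:
        return int(value)
    except ValueError:
        return value


def tsv_to_documents(text):
    """Turn the commented-header TSV layout into one dict per data row."""
    lines = text.splitlines()
    if len(lines) < 2:
        return []
    # Line 1 is the data-type comment and is skipped.
    # Line 2 holds the column names once the leading '#' is removed.
    columns = lines[1].lstrip("#").split("\t")
    documents = []
    for line in lines[2:]:
        if not line.strip():
            continue  # ignore blank lines
        values = line.split("\t")
        # Pad short rows with empty strings; zip drops any extra cells.
        values += [""] * (len(columns) - len(values))
        documents.append({c: _coerce(v) for c, v in zip(columns, values)})
    return documents


def to_json_lines(text):
    """Serialize each row as its own JSON string, one per would-be flow file."""
    return [json.dumps(doc) for doc in tsv_to_documents(text)]
```

Each string returned by `to_json_lines` corresponds to the content of one flow file that would then be routed to PutMongo.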

In upcoming NiFi/HDF releases, you will be able to do arbitrary conversions between formats, such as CSV/TSV to JSON. This should remove the need for custom code to parse large numbers of columns and values.