
Use Case:

We have data stored in MongoDB by a third-party application running in Amazon.

Export from MongoDB to Parquet.

Moving data from a single-purpose data silo to your enterprise data lake is a common use case. Using Apache NiFi, we can easily pull data out of this remote silo and stream it into your analytics store for machine learning and deep analytics with Impala, Hive and Spark. It doesn't matter which cloud you are coming from or going to, or whether you are moving from cloud to on-premise or through various hybrid setups. Apache NiFi works in all of these situations, with full data lineage and provenance on what it did and when.

I have created a mock dataset with Mockaroo. It's all about yummy South Jersey sandwiches.

Our easy MongoDB flows: one to ingest MongoDB data into our data lake, and another to load MongoDB.

104478-mongodbcloudoverview.png

In our test, we loaded all the data from our mock REST API into a MongoDB instance in the cloud. In the real world, an application would have populated that dataset, and now we need to bring it into our central data lake for analytics.

104475-mongodbcloud.png

We use Jolt to rename the built-in MongoDB _id field, which is not Hadoop friendly, to the friendlier mongo_id.

104471-mongojolt.png

Storing to Parquet on HDFS is Easy (Let's compress with Snappy)

104472-mongotoparquet.png

Connecting to MongoDB is easy: set up a controller service and specify the database and collection.

104473-putmongo.png

For our MongoDB connection service, just enter your URI in the form username:password@server.

104474-mongodburl.png

GetHTTP URL
https://my.api.mockaroo.com/hoagie.json

GetHTTP Filename
${filename:append('hoagie.'):append(${now():format('yyyyMMddHHmmss'):append(${md5}):append('.json')})}
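
To make the naming scheme easier to see, here is a rough Python sketch of the string this expression appears to build. It assumes the timestamp is meant to run from year down to seconds and that the filename and md5 attributes are plain strings (possibly empty). The function name build_filename is hypothetical and not part of NiFi.

```python
from datetime import datetime


def build_filename(filename: str, md5: str, now: datetime) -> str:
    """Concatenate the pieces in the same order as the NiFi expression.

    Assumption: the timestamp is year, month, day, hour, minute, second.
    """
    timestamp = now.strftime("%Y%m%d%H%M%S")
    # filename + 'hoagie.' + timestamp + md5 + '.json'
    return f"{filename}hoagie.{timestamp}{md5}.json"


# Hypothetical inputs: empty filename and md5 attributes, fixed clock value.
example = build_filename("", "", datetime(2019, 8, 17, 4, 44, 0))
print(example)
```

Embedding a timestamp in each name keeps files fetched at different seconds from overwriting one another.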

JSON Path Expression
$.*
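
As a rough illustration of what $.* selects, the Python sketch below returns every direct child of the JSON root. It assumes the expression is used to split the API response into one record per element. The helper name split_top_level and the sample payload are hypothetical.

```python
import json
from typing import List


def split_top_level(payload: str) -> List[str]:
    """Return each direct child of the JSON root as its own JSON string.

    For an array root these are the elements; for an object root, the values.
    """
    parsed = json.loads(payload)
    children = parsed if isinstance(parsed, list) else list(parsed.values())
    return [json.dumps(child) for child in children]


# Hypothetical payload with two records.
records = split_top_level('[{"id": 1}, {"id": 2}]')
print(records)
```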

JOLT Chain
[{
  "operation": "shift",
  "spec": {
    "_id": "mongo_id",
    "*": "&"
  }
}]
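
To show the effect of this shift spec on a single record, here is a minimal Python sketch that does the same rename outside of Jolt. It is only an emulation for flat, top-level keys. The function name and the sample record are hypothetical.

```python
def shift_rename_id(record: dict) -> dict:
    """Emulate the spec: "_id" moves to "mongo_id", every other key is kept."""
    return {
        ("mongo_id" if key == "_id" else key): value
        for key, value in record.items()
    }


# Hypothetical record as it might come out of MongoDB.
before = {"_id": "abc123", "sandwich": "hoagie"}
after = shift_rename_id(before)
print(after)
```

The explicit "_id" rule takes precedence over the "*" wildcard, which is why the original _id key does not also pass through unchanged.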

Mongo URI
mongodb://user:userpassword@server.cloud.com:13916/nifi
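
If it helps to see how the pieces of this URI break down, the following Python sketch splits it with the standard library. It is not how NiFi parses the value, and it treats the path segment as the database name, which is an assumption. The function name describe_mongo_uri is hypothetical.

```python
from urllib.parse import urlsplit


def describe_mongo_uri(uri: str) -> dict:
    """Split a single-host MongoDB URI into its visible components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # Assumption: the path segment names the database.
        "database": parts.path.lstrip("/") or None,
    }


info = describe_mongo_uri("mongodb://user:userpassword@server.cloud.com:13916/nifi")
print(info)
```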

Many files stored in HDFS as Parquet

104479-parquetinhdfshoagie.png

Last update:
‎08-17-2019 04:44 AM