Support Questions
Find answers, ask questions, and share your expertise

Importing data from MongoDB


New Contributor

Can data from MongoDB be imported using Sqoop and MongoDB's JDBC driver? Or do I need to use the MongoDB Connector for Hadoop plugin and a MapReduce job to import the data to Hadoop?

3 Replies

Re: Importing data from MongoDB

Master Guru
Either way should work.

The success of the Sqoop + JDBC method depends on the robustness of the MongoDB JDBC driver: it will work only if the driver supports the operations Sqoop requests of it.
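If such a driver is available, the Sqoop side would look like an ordinary JDBC import. This is only a sketch under that assumption; the driver class, host, database, and collection names below are illustrative, not part of any official MongoDB distribution:

```shell
# Hypothetical Sqoop import over a third-party MongoDB JDBC driver.
# Assumes the driver jar is on Sqoop's classpath and that the driver
# supports the metadata and split queries Sqoop issues.
sqoop import \
  --driver mongodb.jdbc.MongoDriver \
  --connect jdbc:mongodb://mongo-host:27017/mydb \
  --table mycollection \
  --split-by _id \
  --target-dir /user/hive/warehouse/mycollection
```

Whether this works in practice comes down to how completely the driver implements the JDBC surface Sqoop relies on.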

The latter approach, running regular MapReduce imports through the MongoDB Connector for Hadoop, is likely more reliable, since the connector is built specifically for that purpose.

Re: Importing data from MongoDB

New Contributor

Is there any documentation on using MongoDB Connector for Hadoop plugin and writing a MapReduce job to import the data to Hadoop?

Re: Importing data from MongoDB

New Contributor
We used the MongoDB Connector for Hadoop but did not use MapReduce. This was a small trial to see if Cognos could query data in Hadoop. Here's what we did.

1. Built the MongoDB Connector for Hadoop from its open-source code.

2. Created an external table in Apache Hive with a CREATE EXTERNAL TABLE statement; the data physically remains in MongoDB. The data model is denormalized (the documents contain arrays), and the structure of the JSON documents was left intact, so the Hive external table contains arrays and nested documents. This table is queryable with SQL, but it is not suitable for Cognos because of the nested documents and arrays.
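Step 2 can be sketched roughly as follows, using the mongo-hadoop project's Hive storage handler (`com.mongodb.hadoop.hive.MongoStorageHandler`). The table, column, host, and database names are illustrative assumptions, not the poster's actual schema:

```sql
-- Hive external table whose data lives in MongoDB (nothing is copied into HDFS).
-- Arrays and nested documents are mapped to Hive ARRAY and STRUCT types.
CREATE EXTERNAL TABLE orders_raw (
  id       STRING,
  customer STRUCT<name:STRING, city:STRING>,
  items    ARRAY<STRUCT<sku:STRING, qty:INT>>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES ('mongo.columns.mapping' = '{"id":"_id"}')
TBLPROPERTIES ('mongo.uri' = 'mongodb://mongo-host:27017/mydb.orders');
```

Queries against such a table are pushed through the storage handler to MongoDB, which is why the nested structure survives intact.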

3. Created a second Apache Hive table with a CREATE TABLE statement, selecting from the external table so that the data is physically ingested into Hadoop. The statement used a Hive lateral view with the explode() UDTF, which outputs the elements of an array as separate rows, eliminating the arrays from the data; the nested documents were also flattened. This second table is queryable with SQL via Impala, and its data is suitable for Cognos because it contains no nested documents or arrays.
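Assuming the external table sketched above, the flattening step might look like this. It is a sketch, not the poster's exact statement; the CTAS form and Parquet format are assumptions:

```sql
-- Materialize a flat copy in Hadoop: LATERAL VIEW + explode() turns each
-- array element into its own row, and dotted paths flatten the nested struct.
CREATE TABLE orders_flat STORED AS PARQUET AS
SELECT
  o.id,
  o.customer.name AS customer_name,
  o.customer.city AS customer_city,
  item.sku        AS item_sku,
  item.qty        AS item_qty
FROM orders_raw o
LATERAL VIEW explode(o.items) exploded AS item;
```

The resulting table has only scalar columns, which is what makes it usable from Impala and BI tools such as Cognos.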
