Community Articles

Cloudera Employee

I have been struggling to configure Spark to connect to MongoDB over SSL. Unfortunately, the steps are not well documented. Here is some quick guidance:

Command line ( spark-shell / spark-submit / pyspark )

1) According to the JIRA below:

Copy both the Spark Connector and the Mongo-Java-Driver to your datanodes (the nodes where your Spark executors and driver are expected to run). The Spark Connector uses the Mongo-Java-Driver, and the driver must be configured to work with SSL. See the SSL tutorial in the Java driver documentation.


- spark_mongo-spark-connector_2.11-2.1.0.jar

- mongodb_mongo-java-driver-3.4.2.jar

Note: Find the versions that match your environment on the MongoDB website.
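One way to make the two jars visible to the driver and executors is the `--jars` option of spark-shell / spark-submit, which takes a comma-separated list. The sketch below only builds and prints the command. The directory `/opt/spark-extra-jars` is a hypothetical location, not something from the original setup.

```shell
# Hypothetical directory where both jars were copied on the node
JAR_DIR=/opt/spark-extra-jars

# --jars expects a single comma-separated list with no spaces
JARS="${JAR_DIR}/spark_mongo-spark-connector_2.11-2.1.0.jar,${JAR_DIR}/mongodb_mongo-java-driver-3.4.2.jar"

# Print the command for review instead of running it
echo "spark-shell --master yarn-client --jars ${JARS}"
```

Adjust the jar file names if your connector or driver versions differ.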

2) In Ambari, go to Spark > Custom spark-defaults and add these two parameters so that Spark (executors and driver) is aware of the certificates.

Example from my lab: 
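The lab screenshot is not reproduced in this text. As a rough sketch, assuming the two parameters are `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` carrying the standard JVM truststore properties (`javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`), the entries could be generated like this. The truststore path and password are hypothetical placeholders.

```shell
# Hypothetical truststore location and password; replace with your own
TRUSTSTORE=/etc/security/truststore.jks
TRUSTSTORE_PASSWORD=changeit

# Standard JVM system properties that point the JVM at a truststore
SSL_OPTS="-Djavax.net.ssl.trustStore=${TRUSTSTORE} -Djavax.net.ssl.trustStorePassword=${TRUSTSTORE_PASSWORD}"

# Print key/value pairs to enter under Custom spark-defaults
# (assumption: these are the two parameters the step refers to)
printf 'spark.driver.extraJavaOptions %s\n' "$SSL_OPTS"
printf 'spark.executor.extraJavaOptions %s\n' "$SSL_OPTS"
```

The same value is used for both the driver and the executors so that every JVM taking part in the job trusts the MongoDB certificate.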

3) Afterwards, move the .jks file to a location shared by all your datanodes (executors and driver).
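A quick sanity check can confirm that the file is actually readable at that path on a given node. This is a minimal sketch; `check_truststore` is a hypothetical helper name, and the path you pass to it is whatever location you chose.

```shell
# Hypothetical helper: succeeds only if the given file exists and is readable
check_truststore() {
  if [ -r "$1" ]; then
    echo "OK: $1"
  else
    echo "MISSING or unreadable: $1" >&2
    return 1
  fi
}

# Example call (path is a placeholder):
#   check_truststore /etc/security/truststore.jks
```

Run it as the user that the Spark containers run as, since file permissions are a common cause of truststore errors.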

4) Submit your spark code

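./spark-shell --master yarn-client

Then, at the Scala prompt:

import com.mongodb.spark.config._
import com.mongodb.spark._
// ssl=true in the connection URI enables SSL for the connection
val readConfig = ReadConfig(Map("uri" -> "mongodb://user:password@host:port/<database>?ssl=true"))
// Load the data as an RDD using the SSL-enabled read configuration
val rdd = MongoSpark.load(sc, readConfig)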

5) Zeppelin (optional): If you want to use it through Zeppelin, you should also configure the interpreter (%spark) to look at the correct truststore location (where your certificate resides).

Normally the Zeppelin interpreter is a separate process spawned by Zeppelin, so we need to check whether the interpreter process has its own truststore, something like:

ps -ef | grep zeppelin
zeppelin 18064     1  0 Nov06 ?        00:01:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version= -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/ -Dzeppelin.log.file=/var/log/zeppelin/ -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
ps auxx | grep interpreter
zeppelin 18064  0.3  6.3 4664040 513732 ?      Sl   Nov06   1:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version= -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/ -Dzeppelin.log.file=/var/log/zeppelin/ -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
.. /interpreter/spark/zeppelin-spark- 36304
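Scanning these long command lines by eye is error-prone, so a small helper can do the check. This sketch assumes the option to look for is the JVM flag `-Djavax.net.ssl.trustStore=...`; `has_truststore_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if a command line sets -Djavax.net.ssl.trustStore
# (assumption: this is the option that determines the interpreter's truststore)
has_truststore_flag() {
  case "$1" in
    *-Djavax.net.ssl.trustStore=*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call against a running process, where PID is the interpreter's process id:
#   has_truststore_flag "$(ps -o args= -p "$PID")" && echo "custom truststore set"
```

The trailing `=` in the pattern keeps `-Djavax.net.ssl.trustStorePassword` on its own from counting as a match.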

If the clause "" is not specified, we will need to import our certificate into our default cacerts ($JAVA_HOME):

If you need to export the certificate from an existing truststore:

keytool -export -keystore /tmp/path/truststore.ts -alias mongodb-cert -file /tmp/mongodb-cert.crt 

Then import it into the default cacerts:

keytool -import -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -alias mongodb-cert -file /tmp/mongodb-cert.crt 
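To confirm that the import worked, the entry can be listed by alias with `keytool -list`. The sketch below only builds and prints the command, reusing the cacerts path and alias from the commands above; keytool will prompt for the keystore password when the command is run.

```shell
# Values taken from the import command above
CACERTS=/usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts
ALIAS=mongodb-cert

# keytool -list with -alias shows only the matching entry
VERIFY_CMD="keytool -list -keystore ${CACERTS} -alias ${ALIAS}"

# Print the command for review instead of running it
echo "$VERIFY_CMD"
```

If the alias is missing, keytool reports an error instead of printing the certificate fingerprint.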

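Optional: With openssl you can also convert certificates to various forms, sign certificate requests like a "mini CA", or edit certificate trust settings. For example, to print the exported certificate (DER-encoded, as produced by keytool -export) in text form:

Ex. openssl x509 -in /tmp/mongodb-cert.crt -inform der -noout -text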

6) Rerun the job using Zeppelin.


Thanks for the great information. I am having trouble connecting to MongoDB over SSL (.pem configuration) from Spark and Scala via IDEA. Do you have any suggestions?