Community Articles
Find and share helpful community-sourced technical articles
Labels (2)
Cloudera Employee

I have been struggling to configure Spark through Mongodb with SSL. Unfortunately, the steps are not well documented there. Here is some quick guidance:

Command line ( spark-shell / spark-submit / pyspark )

1) According with the JIRA below:

https://jira.mongodb.org/browse/SPARK-115?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20Docume...

Copy both Spark Connector and Mongo-Java-Driver among your datanodes, ( Nodes where your spark executors / driver are supposed to be running ). The Spark Connector uses the Mongo-Java-Driver and the driver will need to be configured to work with SSL. See the ssl tutorial in the java documentation.

Ex.

- spark_mongo-spark-connector_2.11-2.1.0.jar

- mongodb_mongo-java-driver-3.4.2.jar

OBS: Find yours at the mongodb website.

2) Go to ambari > Spark > Custom spark-defaults, now pass these two parameters in order to make spark (executors/driver) aware about the certificates.

Example from my lab:

spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata 
spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata 

3) Afterwards, move the .jks file to a common shared location among your datanodes ( Executors and Driver ).

4) Submit your spark code

./spark-shell --master yarn-client
import com.mongodb.spark.config._ 
import com.mongodb.spark._ 
val readConfig = ReadConfig(Map("uri" -> "mongodb://user:password@host:port/<database>?ssl=true")) 
val rdd = MongoSpark.load(sc, readConfig) 
println(rdd.count) 

5) Zeppelin (Optional) If you want to use it though zeppelin, you should also configure the interpreter (%spark) to look at the correct truststore location ( where your certificate resides ).

Normally Zeppelin Interpreter is another process spawned from Zeppelin, we need to check if the interpreter process has it's own truststore (javax.net.ssl.trustStore), something like:

ps -ef | grep zeppelin
zeppelin 18064     1  0 Nov06 ?        00:01:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
ps auxx | grep interpreter
zeppelin 18064  0.3  6.3 4664040 513732 ?      Sl   Nov06   1:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
.. /interpreter/spark/zeppelin-spark-0.6.0.2.5.0.0-1245.jar 36304

If the clause "-Djavax.net.ssl.trustStore" is not specified, we will need to import our certificate into our default cacerts ($JAVA_HOME):

If you need to export from an existing truststore.

keytool -export -keystore /tmp/path/truststore.ts -alias mongodb-cert -file /tmp/mongodb-cert.crt 

and now import into the default cacerts.

keytool -import -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -alias mongodb-cert -file /tmp/mongodb-cert.crt 

Optional: You could also convert certificates to various forms, sign certificate requests like a "mini CA " or edit certificate trust settings.

Ex. openssl x509 -in /tmp/mongodb-cert.crt -noout -text -format -inform der

6) Rerun the job using zeppelin.

4,111 Views
0 Kudos
Comments
Not applicable

Thank for your great information. I have a trouble with connecting Mongodb with .ssl (.pem configuartion) from spark and scala via IDEA. Do you have any suggestion on this?