Created on 11-07-2017 01:43 AM - edited on 02-12-2020 02:45 AM by SumitraMenon
I have been struggling to configure Spark with MongoDB over SSL, and unfortunately the steps are not well documented. Here is some quick guidance:
Command line (spark-shell / spark-submit / pyspark)
1) Copy both the Spark Connector and the Mongo-Java-Driver to all of your datanodes (the nodes where your Spark executors/driver are supposed to run). According to the related JIRA, the Spark Connector uses the Mongo-Java-Driver, and the driver needs to be configured to work with SSL (see the SSL tutorial in the Java driver documentation).
Ex.
- spark_mongo-spark-connector_2.11-2.1.0.jar
- mongodb_mongo-java-driver-3.4.2.jar
Note: find the versions that match your environment on the MongoDB website.
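Before going further, it can help to confirm that both jars are actually visible to the JVM. The following is a minimal sketch to paste into spark-shell; the object name is hypothetical, and the two class names are assumptions based on the mongo-spark-connector 2.x and mongo-java-driver 3.x packages.

```scala
// Hypothetical helper: checks whether the connector and driver classes can be loaded.
// Class names are assumptions for mongo-spark-connector 2.x / mongo-java-driver 3.x.
object ClasspathCheck {
  // True if the named class can be found by the current context class loader
  // (initialize = false, so no static initializers are run).
  def isOnClasspath(className: String): Boolean =
    try { Class.forName(className, false, Thread.currentThread.getContextClassLoader); true }
    catch { case _: ClassNotFoundException | _: LinkageError => false }

  val requiredClasses: Seq[String] = Seq(
    "com.mongodb.spark.MongoSpark", // Spark Connector
    "com.mongodb.MongoClient"       // Mongo-Java-Driver
  )

  // Returns the classes that could NOT be loaded; empty means both jars are present.
  def missing(classes: Seq[String] = requiredClasses): Seq[String] =
    classes.filterNot(isOnClasspath)
}
```

Running `ClasspathCheck.missing()` in the shell should return an empty list once both jars are on the classpath.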
2) Go to Ambari > Spark > Custom spark-defaults and add the following two parameters so that Spark (executors and driver) is aware of the certificates.
Example from my lab:
spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata
spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata
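Since the driver and executor entries must carry the same four JVM options, a small generator can keep them in sync. This is only a sketch; the object and method names are hypothetical, the values are the lab placeholders from above, and paths or passwords containing spaces are not handled.

```scala
// Hypothetical helper that renders the extraJavaOptions value from step 2.
// Values containing spaces are NOT quoted by this sketch.
object SslJavaOptions {
  def render(trustStore: String, trustStorePassword: String,
             keyStore: String, keyStorePassword: String): String =
    Seq(
      s"-Djavax.net.ssl.trustStore=$trustStore",
      s"-Djavax.net.ssl.trustStorePassword=$trustStorePassword",
      s"-Djavax.net.ssl.keyStore=$keyStore",
      s"-Djavax.net.ssl.keyStorePassword=$keyStorePassword"
    ).mkString(" ")

  // Produces the two spark-defaults lines (driver first, then executor) with identical options.
  def sparkDefaults(opts: String): Seq[String] =
    Seq("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions")
      .map(key => s"$key=$opts")
}
```

For example, `SslJavaOptions.sparkDefaults(SslJavaOptions.render("/tmp/path/keystore.jks", "bigdata", "/tmp/path/keystore.jks", "bigdata"))` reproduces the two lines from my lab.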
3) Afterwards, copy the .jks file to the same path on all of your datanodes (executors and driver), matching the location referenced in the parameters above.
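A quick way to catch a missing file or a wrong password on a node is to try opening the keystore from a JVM there. The sketch below assumes a JKS store (as in my lab); the object name and return shape are hypothetical.

```scala
import java.io.{File, FileInputStream}
import java.security.KeyStore

// Hypothetical per-node check: does the .jks exist at the configured path,
// and does it open with the configured password?
object KeystoreCheck {
  // Right(number of entries) on success, Left(reason) on failure.
  def canOpen(path: String, password: String, storeType: String = "JKS"): Either[String, Int] = {
    val file = new File(path)
    if (!file.isFile) Left(s"not found: $path")
    else {
      val in = new FileInputStream(file)
      try {
        val ks = KeyStore.getInstance(storeType)
        ks.load(in, password.toCharArray) // throws on a wrong password or corrupt file
        Right(ks.size())
      } catch {
        case e: Exception => Left(s"cannot open $path: ${e.getMessage}")
      } finally in.close()
    }
  }
}
```

Usage: `KeystoreCheck.canOpen("/tmp/path/keystore.jks", "bigdata")` should return a `Right` on every node.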
4) Submit your Spark code:
./spark-shell --master yarn --deploy-mode client
import com.mongodb.spark.config._
import com.mongodb.spark._
// The connector needs both a database and a collection in the URI
val readConfig = ReadConfig(Map("uri" -> "mongodb://user:password@host:port/<database>.<collection>?ssl=true"))
val rdd = MongoSpark.load(sc, readConfig)
println(rdd.count)
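If the user name or password contains characters such as `@` or `:`, they have to be percent-encoded in the MongoDB URI. A minimal sketch of a URI builder, where the object and parameter names are hypothetical:

```scala
import java.net.URLEncoder

// Hypothetical helper that builds the SSL connection string used in step 4.
// Credentials are percent-encoded; the target is given as <database>.<collection>.
object MongoUri {
  private def enc(s: String): String =
    URLEncoder.encode(s, "UTF-8").replace("+", "%20") // URLEncoder turns spaces into '+'

  def build(user: String, password: String, host: String, port: Int,
            database: String, collection: String): String =
    s"mongodb://${enc(user)}:${enc(password)}@$host:$port/$database.$collection?ssl=true"
}
```

The result can be passed straight in as `ReadConfig(Map("uri" -> MongoUri.build(...)))`.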
5) Zeppelin (optional): if you want to run this through Zeppelin, you also need to configure the interpreter (%spark) to use the correct truststore location (where your certificate resides).
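To see which truststore a JVM will actually use, a small sketch like this can be pasted into a %spark paragraph. It assumes the standard JSSE lookup: the `javax.net.ssl.trustStore` property if set, otherwise `jssecacerts` if it exists, otherwise `cacerts`, both under `java.home/lib/security` (on Java 8, `java.home` is the `jre` directory). The object name is hypothetical.

```scala
import java.io.File

// Hypothetical snippet: reports the truststore this JVM would use.
object EffectiveTrustStore {
  // `props` defaults to the real system properties; injectable for testing.
  def resolve(props: String => Option[String] = k => Option(System.getProperty(k))): String =
    props("javax.net.ssl.trustStore").getOrElse {
      val sec = new File(props("java.home").getOrElse(""), "lib/security")
      val jssecacerts = new File(sec, "jssecacerts")
      if (jssecacerts.isFile) jssecacerts.getPath else new File(sec, "cacerts").getPath
    }
}
```

In Zeppelin, `println(EffectiveTrustStore.resolve())` shows the path for the interpreter's own JVM.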
The Zeppelin interpreter normally runs as a separate process spawned by the Zeppelin server, so we need to check whether the interpreter process has its own truststore (javax.net.ssl.trustStore), with something like:
ps -ef | grep zeppelin
zeppelin 18064 1 0 Nov06 ? 00:01:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
ps aux | grep interpreter
zeppelin 18064 0.3 6.3 4664040 513732 ? Sl Nov06 1:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
.. /interpreter/spark/zeppelin-spark-0.6.0.2.5.0.0-1245.jar 36304
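Scanning long ps lines by eye is error-prone, so here is a hypothetical helper that extracts the truststore option from one line of ps output. It matches only `-Djavax.net.ssl.trustStore=` exactly, so the `trustStorePassword` option is not picked up.

```scala
// Hypothetical helper: returns the -Djavax.net.ssl.trustStore value from a ps line, if any.
object PsTrustStore {
  private val Pattern = """-Djavax\.net\.ssl\.trustStore=(\S+)""".r

  def find(psLine: String): Option[String] =
    Pattern.findFirstMatchIn(psLine).map(_.group(1))
}
```

A `None` result corresponds to the case below, where the certificate has to go into the default cacerts.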
If the -Djavax.net.ssl.trustStore option is not specified, we need to import our certificate into the default cacerts under $JAVA_HOME.
If you first need to export the certificate from an existing truststore:
keytool -export -keystore /tmp/path/truststore.ts -alias mongodb-cert -file /tmp/mongodb-cert.crt
Then import it into the default cacerts:
keytool -import -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -alias mongodb-cert -file /tmp/mongodb-cert.crt
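To confirm the import worked, the aliases in the keystore can be listed from a JVM. This is a sketch: the object name is hypothetical, and `changeit` is assumed because it is the JDK's default cacerts password.

```scala
import java.io.FileInputStream
import java.security.KeyStore

// Hypothetical check: lists the aliases in a keystore so you can look for "mongodb-cert".
object ListAliases {
  def aliases(path: String, password: String = "changeit",
              storeType: String = KeyStore.getDefaultType): List[String] = {
    val in = new FileInputStream(path)
    try {
      val ks = KeyStore.getInstance(storeType)
      ks.load(in, password.toCharArray)
      val e = ks.aliases()
      var acc = List.empty[String]
      while (e.hasMoreElements) acc = e.nextElement() :: acc
      acc.sorted
    } finally in.close()
  }
}
```

Usage: `ListAliases.aliases("/usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts").contains("mongodb-cert")`.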
Optional: openssl can also display certificate information, convert certificates to various forms, sign certificate requests like a "mini CA", or edit certificate trust settings.
Ex. openssl x509 -in /tmp/mongodb-cert.crt -inform der -noout -text
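The same basic details can be read from the JVM side, which is handy on nodes without openssl. This hypothetical sketch uses `CertificateFactory`, which accepts both DER (as written by `keytool -export`) and PEM input; it prints only subject, issuer and validity, not the full `-text` dump.

```scala
import java.io.ByteArrayInputStream
import java.nio.file.{Files, Paths}
import java.security.cert.{CertificateFactory, X509Certificate}

// Hypothetical JVM-side counterpart to the openssl command above.
object DescribeCert {
  def describe(bytes: Array[Byte]): String = {
    val cf = CertificateFactory.getInstance("X.509")
    val cert = cf.generateCertificate(new ByteArrayInputStream(bytes)).asInstanceOf[X509Certificate]
    s"""Subject: ${cert.getSubjectX500Principal.getName}
       |Issuer:  ${cert.getIssuerX500Principal.getName}
       |Valid:   ${cert.getNotBefore} to ${cert.getNotAfter}""".stripMargin
  }

  def describeFile(path: String): String = describe(Files.readAllBytes(Paths.get(path)))
}
```

Usage: `println(DescribeCert.describeFile("/tmp/mongodb-cert.crt"))`.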
6) Rerun the job in Zeppelin.
Created on 10-07-2018 09:31 AM
Thanks for the great information. I am having trouble connecting to MongoDB over SSL (.pem configuration) from Spark and Scala in IntelliJ IDEA. Do you have any suggestions?