Pyspark on YARN to push data to secure ElasticSearch Cluster - Certification Issue

I have an ElasticSearch cluster with SearchGuard enabled, and I am trying to push data into it with Spark.

OS - CentOS7

ElasticSearch Version - 6.4.1

Spark - 2.3.0

Java - openjdk-1.8.0

Yarn - 2.7.3

HDFS - 2.7.3

HDP - 2.6.5.0

ElasticSearch has been secured with SearchGuard using PEM keys. The chain-ca.pem has been added to the truststore on all the Spark nodes, and I have added the required configuration to my PySpark code:

es_write_conf = {
    "es.nodes" : "esm1,esm2,esm3",
    "es.port" : "9200",
    "es.resource" : str(topic+"_"+year_week+"/"+topic),
    "es.input.json": "true",
    "es.nodes.ingest.only": "true",
    "es.net.http.auth.user": "admin",
    "es.net.http.auth.pass": "admin",
    "es.net.ssl":"true",
    "es.net.ssl.cert.allow.self.signed":"true",
    "es.net.ssl.keystore.location":"file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    "es.net.ssl.keystore.pass":"changeit"
}
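One thing that may be worth checking in the configuration above: as far as I understand the elasticsearch-hadoop settings, `es.net.ssl.keystore.location` describes the client's own certificate, while the CA certificates used to validate the server are read from `es.net.ssl.truststore.location` and `es.net.ssl.truststore.pass`. The following is only a minimal sketch of the same SSL settings pointed at the truststore instead. It assumes chain-ca.pem really was imported into that cacerts file, and the helper name `with_truststore` is hypothetical.

```python
def with_truststore(conf, location, password):
    """Return a copy of conf that treats the file as a truststore, not a keystore."""
    updated = dict(conf)
    # Drop the keystore entries: they describe the client's own certificate.
    updated.pop("es.net.ssl.keystore.location", None)
    updated.pop("es.net.ssl.keystore.pass", None)
    # The truststore holds the CA chain used to validate the ElasticSearch nodes.
    updated["es.net.ssl.truststore.location"] = location
    updated["es.net.ssl.truststore.pass"] = password
    return updated


cacerts = (
    "file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64"
    "/jre/lib/security/cacerts"
)

es_ssl_conf = with_truststore(
    {
        "es.net.ssl": "true",
        "es.net.ssl.cert.allow.self.signed": "true",
        "es.net.ssl.keystore.location": cacerts,
        "es.net.ssl.keystore.pass": "changeit",
    },
    cacerts,
    "changeit",
)
```

The resulting entries would be merged into `es_write_conf` in place of the two keystore lines; the other settings stay unchanged.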

I ran this with spark-submit as the hdfs user:

spark-submit --master local[4] --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

This produced the following error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

In local mode this was solved by running spark-submit with sudo. Before that, I had changed the permissions and ownership of the cacerts file, but the error stayed the same.
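Since sudo changes both the user and the environment, a quick check of what each user actually sees may help narrow this down. This is a rough diagnostic sketch, not a fix; the function name `truststore_status` is hypothetical, and it only reports whether the file is readable and which `JAVA_HOME` is set for the current user.

```python
import os


def truststore_status(path):
    """Report how the current user sees the truststore file."""
    return {
        # False also when a parent directory cannot be traversed by this user.
        "exists": os.path.exists(path),
        # True only if the current user may open the file for reading.
        "readable": os.access(path, os.R_OK),
        # A different JAVA_HOME under sudo would mean a different cacerts file.
        "java_home": os.environ.get("JAVA_HOME"),
    }


status = truststore_status(
    "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64"
    "/jre/lib/security/cacerts"
)
print(status)
```

Running it once as hdfs and once with sudo and comparing the two results would show whether the difference is file access or the JVM being picked up.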

I am now trying to run it through YARN and am getting the same error.

Running it via:

spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

Produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
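In yarn cluster mode the driver and the executors run inside YARN containers on the cluster nodes, so neither sudo on the submitting host nor a path that only exists there necessarily applies to them. One approach that is sometimes used is to ship the truststore with `--files` and point the driver and executor JVMs at it through the standard `javax.net.ssl.trustStore` system properties. The sketch below only builds such a command as an argument list. It assumes the copy at /tmp/cacerts mentioned later in this post, and it assumes elasticsearch-hadoop falls back to the JVM default truststore when no `es.net.ssl.truststore.location` is set, which I have not verified. The function name `build_submit_command` is hypothetical.

```python
import os
import shlex


def build_submit_command(truststore, password, jars, script, script_args):
    """Build a spark-submit argument list that ships a truststore to YARN."""
    # Files passed with --files are placed in each container's working
    # directory under their base name, so a relative path is used here.
    name = os.path.basename(truststore)
    java_opts = (
        "-Djavax.net.ssl.trustStore={0} "
        "-Djavax.net.ssl.trustStorePassword={1}"
    ).format(name, password)
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--files", truststore,
        "--conf", "spark.driver.extraJavaOptions=" + java_opts,
        "--conf", "spark.executor.extraJavaOptions=" + java_opts,
        "--jars", ",".join(jars),
        script,
    ] + list(script_args)


command = build_submit_command(
    "/tmp/cacerts",
    "changeit",
    [
        "/home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar",
        "/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar",
    ],
    "/home/hdfs/code/ingestion.py",
    ["demo_machine001"],
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list keeps the space-separated Java options together as one `--conf` value; `shlex.quote` adds the quoting needed when the line is pasted into a shell.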

Running it via:

sudo SPARK_HOME=/usr/hdp/current/spark2-client SPARK_MAJOR_VERSION=2 PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.6-src.zip:$PYTHONPATH spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/everlytics/ingestion.py demo_machine001

Produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

I then put the cacerts file into HDFS and changed the setting in the Spark code:

"es.net.ssl.keystore.location":"hdfs://spm1:8020/certificates/cacerts"

This produced a different error:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Expected to find keystore file at [hdfs://spm1:8020/certificates/cacerts] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
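The message above mentions only the classpath and a valid URI, which suggests the connector may be opening the location through ordinary Java resource handling rather than the Hadoop FileSystem API. That is a guess, not something confirmed here. Under that assumption, a small sketch like the following could be used to sort candidate values before trying them; the function name `classify_location` and the three category labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_location(location):
    """Guess how a keystore/truststore location value would be looked up."""
    scheme = urlparse(location).scheme
    if scheme == "":
        # No scheme: presumably treated as a classpath resource name.
        return "classpath-resource"
    if scheme == "file":
        # A local file that must exist on every node running a task.
        return "local-file"
    # Anything else (for example hdfs) is assumed to need a URL handler
    # that a plain JVM does not provide.
    return "needs-url-handler"


for candidate in (
    "hdfs://spm1:8020/certificates/cacerts",
    "file:///tmp/cacerts",
    "cacerts",
):
    print(candidate, "->", classify_location(candidate))
```

If the guess holds, the `hdfs://` value would fall into the last category, while a `file://` path present on every node, or a bare file name shipped to the containers, would be the forms to test.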

I have tried multiple combinations of permissions and ownership for the cacerts file on both HDFS and the local file system, but to no avail.

I have also copied the cacerts file to /tmp on each node so that every user can access it, but nothing has worked so far.