Pyspark on YARN to push data to secure ElasticSearch Cluster - Certification Issue

I have an ElasticSearch cluster with SearchGuard enabled, and I am trying to push data into it with Spark.

OS - CentOS7

ElasticSearch Version - 6.4.1

Spark - 2.3.0

Java - openjdk-1.8.0

Yarn - 2.7.3

HDFS - 2.7.3

HDP - 2.6.5.0

ElasticSearch has been secured with SearchGuard using a PEM key. The chain-ca.pem has been added to the truststore on all the Spark nodes. I have added the required configuration to my PySpark code:

es_write_conf = {
    "es.nodes" : "esm1,esm2,esm3",
    "es.port" : "9200",
    "es.resource" : str(topic+"_"+year_week+"/"+topic),
    "es.input.json": "true",
    "es.nodes.ingest.only": "true",
    "es.net.http.auth.user": "admin",
    "es.net.http.auth.pass": "admin",
    "es.net.ssl":"true",
    "es.net.ssl.cert.allow.self.signed":"true",
    "es.net.ssl.keystore.location":"file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    "es.net.ssl.keystore.pass":"changeit"
}
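One thing that may be worth checking, offered as a sketch rather than a confirmed fix: elasticsearch-hadoop has separate `es.net.ssl.keystore.*` settings (the client certificate presented to the server) and `es.net.ssl.truststore.*` settings (the CA certificates used to verify the server). The configuration above points the keystore settings at `cacerts`, which is a CA store. The variant below passes the same path through the truststore settings instead. The helper name `build_es_write_conf` and the example index name are illustrative assumptions, not part of the original code.

```python
def build_es_write_conf(resource, truststore_location, truststore_pass):
    """Build an elasticsearch-hadoop write config (hypothetical helper).

    The CA store goes under es.net.ssl.truststore.*; the es.net.ssl.keystore.*
    settings are left out because no client certificate is presented here.
    """
    return {
        "es.nodes": "esm1,esm2,esm3",
        "es.port": "9200",
        "es.resource": resource,
        "es.input.json": "true",
        "es.nodes.ingest.only": "true",
        "es.net.http.auth.user": "admin",
        "es.net.http.auth.pass": "admin",
        "es.net.ssl": "true",
        "es.net.ssl.cert.allow.self.signed": "true",
        "es.net.ssl.truststore.location": truststore_location,
        "es.net.ssl.truststore.pass": truststore_pass,
    }


# Example values: the index/type name below is made up for illustration.
es_write_conf = build_es_write_conf(
    resource="demo_2018_40/demo",
    truststore_location="file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    truststore_pass="changeit",
)
```

The resulting dictionary is a drop-in replacement for `es_write_conf` wherever the job passes it to the connector. Whether this resolves the PKIX error depends on the chain-ca.pem actually being present in that store.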

I ran the job with spark-submit as the hdfs user:

spark-submit --master local[4] --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

This produced the following error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

The error went away when I ran spark-submit with sudo. Before that, I had changed the permissions and ownership of the cacerts file, but that still produced the same error.

I am now trying to run the job through YARN and am getting the same error.

Running it via:

spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

Produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Running it via:

sudo SPARK_HOME=/usr/hdp/current/spark2-client SPARK_MAJOR_VERSION=2 PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.6-src.zip:$PYTHONPATH spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/everlytics/ingestion.py demo_machine001

Produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

I then put the cacerts file into HDFS and changed the Spark code accordingly:

"es.net.ssl.keystore.location":"hdfs://spm1:8020/certificates/cacerts"

This produced an error:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Expected to find keystore file at [hdfs://spm1:8020/certificates/cacerts] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.

I have tried multiple combinations of permissions and ownership for cacerts on both HDFS and the local file system, but to no avail.

I have also copied cacerts to /tmp on each node so that it is globally readable, but nothing has worked so far.
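For the YARN cluster-mode case, one approach that might be worth trying is to let YARN distribute the store instead of relying on a path that exists on every node. The sketch below assembles a `spark-submit` command that ships the file with `--files`, which copies it into each container's working directory. The helper `build_submit_command` is hypothetical, and the assumption that elasticsearch-hadoop can then find the file by its bare name (because the container working directory is normally on the classpath under YARN) has not been verified against this cluster.

```python
import os


def build_submit_command(script, script_args, jars, truststore_path):
    """Return (argv, truststore_name) for a YARN cluster-mode submit.

    Hypothetical helper: --files asks YARN to copy truststore_path into each
    container's working directory, where it is visible under its base name.
    """
    argv = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--jars", ",".join(jars),
        "--files", truststore_path,
        script,
    ] + list(script_args)
    return argv, os.path.basename(truststore_path)


argv, truststore_name = build_submit_command(
    script="/home/hdfs/code/ingestion.py",
    script_args=["demo_machine001"],
    jars=[
        "/home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar",
        "/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar",
    ],
    truststore_path="/tmp/cacerts",
)
print(" ".join(argv))
```

Under that assumption, the SSL store location in the job's configuration would be set to the bare name held in `truststore_name` rather than to a `file://` or `hdfs://` URI.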
