Member since 06-27-2018, 3 posts, 0 kudos received, 0 solutions
10-05-2018 07:19 PM
I have an Elasticsearch cluster with SearchGuard enabled, and I am trying to push data into Elasticsearch with Spark.

OS: CentOS 7
Elasticsearch: 6.4.1
Spark: 2.3.0
Java: openjdk-1.8.0
YARN: 2.7.3
HDFS: 2.7.3
HDP: 2.6.5.0

Elasticsearch has been secured with SearchGuard using PEM keys. The chain-ca.pem has been added to the truststore on all the Spark nodes. I have added the required configuration to my PySpark code:

es_write_conf = {
    "es.nodes": "esm1,esm2,esm3",
    "es.port": "9200",
    "es.resource": topic + "_" + year_week + "/" + topic,
    "es.input.json": "true",
    "es.nodes.ingest.only": "true",
    "es.net.http.auth.user": "admin",
    "es.net.http.auth.pass": "admin",
    "es.net.ssl": "true",
    "es.net.ssl.cert.allow.self.signed": "true",
    "es.net.ssl.keystore.location": "file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    "es.net.ssl.keystore.pass": "changeit",
}
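The JDK cacerts file is a truststore (the CA certificates the client trusts), while the configuration above points the *keystore* settings at it. elasticsearch-hadoop has separate es.net.ssl.truststore.location and es.net.ssl.truststore.pass settings, so one variant that may be worth trying is sketched below. This is an assumption, not a confirmed fix for this thread, and the helper name build_es_write_conf and its parameters are hypothetical:

```python
def build_es_write_conf(topic, year_week, truststore_location, truststore_pass,
                        user="admin", password="admin",
                        nodes="esm1,esm2,esm3", port="9200"):
    """Hypothetical helper: same settings as the post, but the CA store is
    passed as a *truststore* instead of a keystore."""
    return {
        "es.nodes": nodes,
        "es.port": port,
        # index/type naming from the post: <topic>_<year_week>/<topic>
        "es.resource": topic + "_" + year_week + "/" + topic,
        "es.input.json": "true",
        "es.nodes.ingest.only": "true",
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        "es.net.ssl": "true",
        "es.net.ssl.cert.allow.self.signed": "true",
        # Assumption: cacerts holds the trusted CA chain, so it goes under
        # truststore.* (keystore.* is meant for the client's own key pair)
        "es.net.ssl.truststore.location": truststore_location,
        "es.net.ssl.truststore.pass": truststore_pass,
    }
```

For example, build_es_write_conf("demo_machine001", "2018_40", "file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts", "changeit") returns a dict that can be passed wherever es_write_conf is used; the "2018_40" week string is only an illustrative value.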
I ran this with spark-submit as the hdfs user:

spark-submit --master local[4] --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

It produced this error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

This was solved by running spark-submit with sudo. I had previously changed the permissions and ownership of the cacerts file, but that produced the same error.

I am now trying to run it through YARN and am getting the same error. Running it via:

spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

produces the same SSLHandshakeException (PKIX path building failed: unable to find valid certification path to requested target).

Running it via:

sudo SPARK_HOME=/usr/hdp/current/spark2-client SPARK_MAJOR_VERSION=2 PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.6-src.zip:$PYTHONPATH spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/everlytics/ingestion.py demo_machine001
This also produces the same error: SSLHandshakeException, PKIX path building failed (unable to find valid certification path to requested target).
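Because sudo changed the outcome in local mode, one quick check is whether the user that actually runs the JVM (the submitting user locally, or the YARN container user on each NodeManager host in cluster mode) can read the store file. A minimal standard-library sketch; the function name check_store_readable is hypothetical, and it only tests file existence and readability, not whether the CA certificate is actually in the store:

```python
import os
from urllib.parse import urlparse


def check_store_readable(location):
    """Hypothetical diagnostic: report whether the current user can read a
    key/trust store given as a local path or a file:// URI."""
    parsed = urlparse(location)
    # Only local files can be checked this way (e.g. not hdfs:// URIs)
    if parsed.scheme not in ("", "file"):
        raise ValueError("only local paths or file:// URIs can be checked: " + location)
    path = parsed.path
    exists = os.path.isfile(path)
    return {
        "path": path,
        "exists": exists,
        # os.access uses the real uid/gid of the current process
        "readable": exists and os.access(path, os.R_OK),
    }
```

Running it on each node, once as the hdfs user and once under sudo (or as whichever user the executors run as), against the es.net.ssl.keystore.location value would show whether file permissions actually differ between the failing and working runs.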
I then put the cacerts file into HDFS and changed the Spark code to:

"es.net.ssl.keystore.location": "hdfs://spm1:8020/certificates/cacerts"

This produced another error:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Expected to find keystore file at [hdfs://spm1:8020/certificates/cacerts] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
I have tried many combinations of permissions and ownership for cacerts on both HDFS and the local file system, to no avail. I have also copied cacerts to /tmp on each node so that every user can access it, but nothing has worked so far.
07-03-2018 09:06 AM
@Vijay Radha There is an error in your Grok parser: end_time is returned blank, so I had to change its pattern to GREEDYDATA:

%{CUS_TIME_FORMAT:start_time} %{IP:ip_src_addr} %{GREEDYDATA:end_time}

The dateFormat field seems to accept only one date format, so multiple date-format definitions cannot be used. If you remove end_time from timeFields, you can ingest the data.
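To illustrate what the GREEDYDATA change does, here is a rough Python regex approximation of that Grok expression. GREEDYDATA is `.*` in the standard Grok patterns, so end_time simply takes everything after the IP. Two assumptions: IP is simplified to dotted IPv4 (Grok's IP also matches IPv6), and CUS_TIME_FORMAT is a custom pattern not shown in the thread, so a hypothetical `yyyy-MM-dd HH:mm:ss` timestamp stands in for it, along with a made-up sample line:

```python
import re

# Hypothetical stand-in for the thread's custom CUS_TIME_FORMAT pattern
CUS_TIME_FORMAT = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# Simplified IPv4 only; Grok's IP pattern also covers IPv6
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
# Same definition as Grok's GREEDYDATA: the rest of the line
GREEDYDATA = r".*"

LINE_RE = re.compile(
    "(?P<start_time>" + CUS_TIME_FORMAT + ") "
    "(?P<ip_src_addr>" + IPV4 + ") "
    "(?P<end_time>" + GREEDYDATA + ")"
)


def parse(line):
    """Return the named fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None
```

With this sketch, parse("2018-07-03 09:06:00 10.0.0.5 2018-07-03 09:07:30") yields start_time "2018-07-03 09:06:00", ip_src_addr "10.0.0.5" and end_time "2018-07-03 09:07:30". Only start_time would then be listed in timeFields, matching the single dateFormat limitation described above.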
06-28-2018 04:28 AM
@Bob Van Haute There is an error in the syntax: timestamp should be quoted, not wrapped in brackets. Here is the corrected parser configuration:

"parserConfig": {
    "grokPath": "/apps/metron/patterns/accesslog",
    "patternLabel": "ACCESSLOG",
    "timestampField": "timestamp",
    "timeFields": ["timestamp"],
    "dateFormat": "dd/MMM/yyyy:HH:mm:ss Z"
}
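The dateFormat value is a Java date pattern. As a quick sanity check outside Metron, a Python equivalent can be tried against a sample access-log timestamp. The mapping of `dd/MMM/yyyy:HH:mm:ss Z` to `%d/%b/%Y:%H:%M:%S %z` is my own translation and the sample value is illustrative. Building the config as a dict and serializing it with json also guarantees that timestampField is a quoted string and timeFields is a list of quoted strings:

```python
import json
from datetime import datetime

# parserConfig from the answer above, as a Python dict
parser_config = {
    "grokPath": "/apps/metron/patterns/accesslog",
    "patternLabel": "ACCESSLOG",
    "timestampField": "timestamp",
    "timeFields": ["timestamp"],
    "dateFormat": "dd/MMM/yyyy:HH:mm:ss Z",
}

# Assumed strptime equivalent of the Java pattern "dd/MMM/yyyy:HH:mm:ss Z";
# %b expects English month abbreviations (default C locale)
PY_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_access_log_time(value):
    """Parse an access-log timestamp such as 10/Oct/2000:13:55:36 -0700."""
    return datetime.strptime(value, PY_FORMAT)


if __name__ == "__main__":
    # json.dumps emits every key and string value in double quotes
    print(json.dumps({"parserConfig": parser_config}, indent=2))
    print(parse_access_log_time("10/Oct/2000:13:55:36 -0700").isoformat())
```

If a sample from the actual log fails to parse with this pattern, the Java dateFormat is likely to need the same adjustment.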