Member since: 09-16-2017
Posts: 20
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2263 | 12-01-2017 05:05 PM
 | 4238 | 12-01-2017 04:59 PM
12-01-2017
04:59 PM
All - just an update. I was able to get help resolving this on StackOverflow. See the post here: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391
11-03-2017
06:20 AM
@Charles Bradbury, glad that the issue is resolved. Could you kindly accept the answer so that community users can quickly find it?
10-31-2017
09:56 PM
Hi @Charles Bradbury, for your information, Spark 2.2 is supported in HDP 2.6.3, announced today: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_release-notes/content/comp_versions.html
12-01-2017
05:05 PM
All - just an update. The ES-Hadoop connector, as you might expect, mostly benefits Elasticsearch, not so much Spark or Hadoop. It lets me connect to the Elasticsearch cluster with spark-shell or PySpark, which is great for ad-hoc queries; for long-term data movement, however, use Apache NiFi. The setup, if you are interested, can be found on StackOverflow, where I got some great help: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391

One issue I ran into: we have SSL set up on Elasticsearch, and even though I was referencing that cert (I had to convert it from PEM to JKS, since Hadoop/Spark only understand JKS), it wasn't working. After working with Elasticsearch support, they had me add the cert to the cacerts file in my Java installation, and everything worked after that. I had to do this on each box in the cluster for Spark/Hadoop jobs that ran across the cluster; in stand-alone mode, the single box was fine. Either way, this can save you a lot of trouble: just add your Elasticsearch cert to cacerts using keytool, as in the sketch below.
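For reference, a minimal sketch of that keytool import. The cert path and alias are hypothetical, and the cacerts location and "changeit" store password are the Java 8 defaults; adjust all of these for your environment:

```
# Hypothetical cert path and alias; cacerts location/password are Java 8 defaults.
sudo keytool -importcert \
  -alias elasticsearch-ca \
  -file /path/to/elasticsearch-ca.pem \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" \
  -storepass changeit \
  -noprompt

# Repeat on every node if jobs run across the cluster;
# a single node is enough for stand-alone runs.
```

keytool accepts the PEM file directly here, so no separate PEM-to-JKS conversion is needed for this step.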
10-17-2017
08:47 AM
That seems more reasonable. But if you want to reduce spark.port.maxRetries to 250, then you should keep a spacing of 250 ports. And I think there was a typo: 40000-40031 covers only 32 ports, so if you are using maxRetries of 32 the range should extend to 40032, since the base port plus 32 retries needs 33 ports. And again, the executor ports will depend on which mode you are running Spark in (standalone vs. cluster vs. client). See the sketch below.
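As an illustration, a hedged spark-submit sketch. The port values and the application file are hypothetical; the relevant properties, spark.port.maxRetries and spark.blockManager.port, are standard Spark configuration:

```
# With maxRetries=32, Spark may probe ports 40000 through 40032,
# i.e. 33 ports in total (the base port plus 32 retries).
spark-submit \
  --conf spark.port.maxRetries=32 \
  --conf spark.blockManager.port=40000 \
  your_app.py   # hypothetical application
```

Whatever values you pick, make sure the firewall range is at least maxRetries + 1 ports wide from each base port.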
09-17-2017
07:32 AM
@Charles Bradbury It must be frustrating; a simple diagnostic isn't easy. From a quick look I saw an incompatibility:

2017-09-15 16:58:19,901 - Stack Feature Version Info: Cluster Stack=2.5, Cluster Current Version=None, Command Stack=None, Command Version=None -> 2.5
2017-09-15 16:58:19,933 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-09-15 16:58:19,953 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2017-09-15 16:58:19,985 - checked_call returned (0, '2.6.0.3-8', '')

There is a conflict between stack 2.5 and 2.6.0.3-8. Can you validate your hdp.repo in /etc/yum.repos.d/*? Make sure you have only the repo you intend to install, in this case I think 2.6, then run:

yum clean all
yum repolist

Please revert.
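A minimal sketch consolidating those diagnostic steps (the grep filter and repo file names are assumptions; they vary by install):

```
# Which HDP build is actually installed? (matches the Ambari log above)
rpm -q --queryformat '%{version}-%{release}\n' hdp-select

# List the repo files; there should be only one HDP repo,
# matching the version you intend to install (2.6 here).
ls /etc/yum.repos.d/ | grep -i hdp   # hypothetical filter; names vary

# After removing or fixing stray repo files, refresh yum's metadata:
yum clean all
yum repolist
```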