This article shows how to set up and secure a SolrCloud cluster with Kerberos and
Ranger. It also outlines some important configurations that are necessary
to use the combination of Solr + HDFS + Kerberos.
Tested on HDP 2.3.4, Ambari 2.1.2, Ranger 0.5, Solr 5.2.1; MIT Kerberos
Pre-Requisites & Service Allocation
You should have a running HDP cluster, including Kerberos, Ranger and HDFS.
In this article I am going to use a 6 node (3 master + 3 worker) cluster with the
following service allocation.
Depending on the size and use case of your Solr environment, you can either install Solr
on separate nodes (for larger workloads and collections) or install it on the
same nodes as the DataNodes. For this installation I have decided to install
Solr on the 3 DataNodes.
Note: The picture above only shows the main
services and components; there are additional clients and services installed
(YARN, MR, Hive, ...).
# Install the Lucidworks HDP Search package on every Solr node
yum install lucidworks-hdpsearch
# Start Solr and link its log directory to /var/log/solr
service solr start
ln -s /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr
Note: Make sure /opt/lucidworks-hdpsearch is owned by
the solr user and that Solr is available as a service ("service solr status"
should return the Solr status).
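If the ownership is not correct, a recursive chown fixes it (the solr:solr owner and group below are an assumption; adjust them to whatever user and group your installation created):
chown -R solr:solr /opt/lucidworks-hdpsearch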
Keytabs and Principals
In order for Solr to authenticate itself with the kerberized cluster, it is necessary to
create a Solr and a SPNEGO keytab. The latter is used for authenticating HTTP requests. It is recommended to create a
keytab per host instead of one keytab that is distributed to all hosts, e.g.
solr/myhostname@EXAMPLE.COM instead of solr@EXAMPLE.COM.
The Solr service keytab will also be used to enable Solr collections to write to HDFS.
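A minimal sketch of creating the per-host principals and keytabs with an MIT KDC (the EXAMPLE.COM realm, the horton hostnames and the keytab file name are assumptions; adapt them to your environment):
# On the KDC, create one Solr service principal and keytab per Solr host
kadmin.local -q "addprinc -randkey solr/horton04.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /tmp/solr.service.keytab solr/horton04.example.com@EXAMPLE.COM"
# Repeat for horton05 and horton06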
Move the Keytabs to the individual hosts (in my case =>
horton04,horton05,horton06) and save them under /etc/security/keytabs/solr.service.keytab
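For example (the solr:hadoop ownership is an assumption; use the user and group your Solr process runs as):
scp /tmp/solr.service.keytab horton04:/etc/security/keytabs/solr.service.keytab
ssh horton04 "chown solr:hadoop /etc/security/keytabs/solr.service.keytab; chmod 400 /etc/security/keytabs/solr.service.keytab"
# Verify the keytab on the target host
ssh horton04 "klist -kt /etc/security/keytabs/solr.service.keytab"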
Create Spnego Service Keytab
To authenticate HTTP requests, it is necessary to create a SPNEGO service keytab, either by making a copy of the existing spnego keytab or by creating a separate solr/spnego principal + keytab. On each Solr host do the following:
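A minimal sketch based on reusing the cluster's existing spnego keytab (the target file name and the solr:hadoop ownership are assumptions):
cp /etc/security/keytabs/spnego.service.keytab /etc/security/keytabs/spnego.solr.keytab
chown solr:hadoop /etc/security/keytabs/spnego.solr.keytab
chmod 400 /etc/security/keytabs/spnego.solr.keytab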
Since the Solr data will be stored in the Hadoop filesystem, it is important to adjust
the time Solr waits before it shuts down or "kills" the Solr process (whenever you execute "service solr stop/restart"). If
this setting is not adjusted, Solr will try to shut down the process gracefully, but
because the shutdown takes a bit longer when using HDFS, Solr will simply kill the process
and, most of the time, lock the Solr indexes of your collections. If the index of a collection is locked, the following exception is shown after the startup routine:
"org.apache.solr.common.SolrException: Index locked for write"
Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
(Set this password to whatever you used when running the MySQL pre-requisite steps for Ranger.)
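For orientation, a minimal sketch of the policy-manager and audit-DB entries in a typical Ranger 0.5 plugin install.properties; the URL, repository name, hostnames and database/user names below are assumptions, the password referenced above would correspond to XAAUDIT.DB.PASSWORD, and the exact property names should be verified against the file shipped with your plugin version:
POLICY_MGR_URL=http://horton01:6080
REPOSITORY_NAME=solr_cluster
XAAUDIT.DB.IS_ENABLED=true
XAAUDIT.DB.FLAVOUR=MYSQL
XAAUDIT.DB.HOSTNAME=horton01
XAAUDIT.DB.DATABASE_NAME=ranger_audit
XAAUDIT.DB.USER_NAME=rangerlogger
XAAUDIT.DB.PASSWORD=<password from the Ranger MySQL pre-requisite steps>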
Enable the Plugin and (Re)start Solr
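The plugin is enabled by running its enable script from the directory where the Ranger Solr plugin was extracted (the placeholder path below is an assumption; use the location you unpacked the plugin to):
cd <directory where the ranger-solr-plugin was extracted>
./enable-solr-plugin.sh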
service solr restart
The enable script will distribute some files and create symlinks inside the Solr installation directory.
If you go
to the Ranger UI, you should be able to see whether your Solr instances are communicating with Ranger or not.
Now that everything has been set up and the policies have been synced with the Solr nodes, it's time for some smoke tests :)
To test our installation, we are going to set up a test collection with one of the sample
datasets that ship with Solr.
Go to the first node of your SolrCloud cluster (e.g. horton04).
Create the initial Solr collection configuration by using basic_configs, which is
part of every Solr installation.
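A minimal sketch, assuming the ZooKeeper ensemble runs on horton01-03 with default ports and that you have obtained a Kerberos ticket for an admin user first (depending on your ZooKeeper security setup, zkcli.sh may additionally need a JAAS configuration); the config and collection names are examples:
# Obtain a Kerberos ticket before talking to the kerberized services
kinit <admin principal>
# Upload the basic_configs configset to ZooKeeper under the name "test_conf"
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost horton01:2181,horton02:2181,horton03:2181 -cmd upconfig -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf -confname test_conf
# Create a collection with 3 shards via the SPNEGO-authenticated Collections API
curl --negotiate -u : "http://horton04:8983/solr/admin/collections?action=CREATE&name=test_collection&numShards=3&collection.configName=test_conf"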
This bug was acknowledged and fixed by Oracle in Java JDK >= 1.8.0_60
White Page / Too many groups
Problem: When the Solr Admin UI
is secured with Kerberos, users with too many AD groups can't access the page. Usually these users only see a white page, and the Solr log
shows the following message:
badMessage: java.lang.IllegalStateException: too much data after closed for
HttpParser Header is too large >8192
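The message indicates that the Kerberos/SPNEGO token, which grows with the number of a user's group memberships, no longer fits into Jetty's default 8192-byte request header buffer. A common workaround, sketched here as an assumption rather than taken from the original article, is to raise the requestHeaderSize value in Solr's Jetty configuration and restart Solr:
# Locate the request header size setting in Solr's Jetty configuration
grep -n "requestHeaderSize" /opt/lucidworks-hdpsearch/solr/server/etc/jetty.xml
# Raise the value from the 8192 default to e.g. 65536, then restart Solr
service solr restart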