
Solr MapReduce with Kerberized backend fails

Explorer

Dear all,

We have a job which runs `MapReduceIndexerTool` in a Kerberized environment. With a couple of tweaks we managed to get it running, and the map/reduce phases even complete successfully; however, it fails at the go-live stage while inserting data:

--- bunch of earlier log entries ---
15/12/17 19:33:08 INFO mapreduce.Job:  map 100% reduce 99%
15/12/17 19:33:28 INFO mapreduce.Job:  map 100% reduce 100%
15/12/17 19:34:58 INFO mapreduce.Job: Job job_1450203660079_0013 completed successfully
15/12/17 19:34:58 INFO mapreduce.Job: Counters: 52
	File System Counters
		FILE: Number of bytes read=1933903322
		FILE: Number of bytes written=3643256225
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=13020909852
		HDFS: Number of bytes written=20619046734
		HDFS: Number of read operations=10964
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1344
	Job Counters 
		Launched map tasks=236
		Launched reduce tasks=24
		Other local map tasks=236
		Total time spent by all maps in occupied slots (ms)=5822436
		Total time spent by all reduces in occupied slots (ms)=15745656
		Total time spent by all map tasks (ms)=5822436
		Total time spent by all reduce tasks (ms)=7872828
		Total vcore-seconds taken by all map tasks=5822436
		Total vcore-seconds taken by all reduce tasks=7872828
		Total megabyte-seconds taken by all map tasks=14905436160
		Total megabyte-seconds taken by all reduce tasks=40308879360
	Map-Reduce Framework
		Map input records=1886
		Map output records=16964842
		Map output bytes=11060997974
		Map output materialized bytes=1650839353
		Input split bytes=41536
		Combine input records=0
		Combine output records=0
		Reduce input groups=16964842
		Reduce shuffle bytes=1650839353
		Reduce input records=16964842
		Reduce output records=16964842
		Spilled Records=35286185
		Shuffled Maps =5664
		Failed Shuffles=0
		Merged Map outputs=5664
		GC time elapsed (ms)=313229
		CPU time spent (ms)=8043320
		Physical memory (bytes) snapshot=479611183104
		Virtual memory (bytes) snapshot=818600177664
		Total committed heap usage (bytes)=530422693888
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=20537547
	File Output Format Counters 
		Bytes Written=20619046734
	org.apache.solr.hadoop.SolrCounters
		SolrReducer: Number of document batches processed=848257
		SolrReducer: Number of documents processed=16964842
		SolrReducer: Time spent by reducers on physical merges [ms]=1316244849188
15/12/17 19:34:58 INFO hadoop.MapReduceIndexerTool: Done. Indexing 1886 files using 236 real mappers into 24 reducers took 3.31220419E11 secs
15/12/17 19:34:58 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00000 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00003 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00005 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00001 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00006 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00002 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00004 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00011 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00012 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00010 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00009 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00008 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00007 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00014 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00013 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00015 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00016 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00017 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00018 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00019 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00020 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00021 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00022 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00023 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:59 ERROR hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre>    Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:118)
	at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:866)
	at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:608)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:595)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre>    Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:527)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:131)
	at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:99)
	at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:90)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging of index shards into Solr cluster took 9.355796E7 secs
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging failed
Job failed, leaving temporary directory: hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676

We had similar issues in a few other places that call the Solr REST services, but we fixed those by using `Krb5HttpClientConfigurer`. In this case, however, we can't change the code, because it comes from the Solr codebase.
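
For reference, a minimal sketch of that workaround as we apply it in our own client code (the factory class and method are ours and purely illustrative; `Krb5HttpClientConfigurer` picks up the JAAS file named by the java.security.auth.login.config system property):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer;

public class KerberizedSolrClientFactory {
    public static CloudSolrClient create(String zkHost) {
        // Must run before SolrJ builds its first HttpClient, so that all
        // subsequently created clients send SPNEGO/Kerberos credentials.
        HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
        return new CloudSolrClient(zkHost);
    }
}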

1 ACCEPTED SOLUTION

Explorer

I solved the problem by adding one line in org.apache.solr.hadoop.GoLive:

HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());

This enables Kerberos support in the SolrJ client instances that are created later while processing requests, and propagates the ticket to the HTTP calls made to the backend. It should be made configurable somehow, e.g. via a command-line switch. It is definitely a bug, because the go-live phase of jobs will not work against fully Kerberized Solr backends.
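
To illustrate where the line goes, here is a rough sketch of the patched class (the surrounding GoLive code is paraphrased, not copied from the Solr sources, so treat everything except the added line as illustrative):

import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer;

public class GoLive {

    public boolean goLive(/* indexer options, shard URLs, ... */) {
        // The added line: register Kerberos/SPNEGO support before any
        // HttpSolrClient is built for the mergeindexes/commit requests.
        HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
        // ... existing logic, unchanged: submit a CoreAdmin mergeindexes
        // request per shard in a thread pool, then commit ...
        return true;
    }
}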

// CC: @Jonas Straub @Artem Ervits @Neeraj Sabharwal


9 REPLIES

Master Mentor
@Łukasz Dywicki

You may have to open a support case to troubleshoot this.


The error looks familiar 🙂 The MR application does not pass a Kerberos ticket to the Solr instance, and hence the SPNEGO authentication fails on the Solr side.

How do you start your job? Is it a custom MapReduce application, or the Hadoop job jar that is provided with HDP Search?

SolrCloud or Solr Standalone?

Can you access the SolrAdmin interface with your browser (<solr host>:8983/solr)?

@Łukasz Dywicki

Explorer

@Jonas Straub Solr is started by a separate command with the -c switch, so it does have connectivity to the Kerberized ZooKeeper instance. The job is launched from a bash script via the `hadoop jar` command. The bash script has extra parameters embedded:

export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"

I can't access Solr from a browser unless I enable negotiation and run kinit on my machine; after that, Firefox can access the Solr administrative interface.


Thanks for the additional info.

When you start your job through your bash script, do you have a valid Kerberos ticket on the machine, or does your MR job use a keytab file to retrieve a valid Kerberos ticket? Without a valid ticket, Solr will always deny access.

You might want to enable Kerberos security for ZooKeeper as well; see https://cwiki.apache.org/confluence/display/solr/K...

"When setting up a kerberized Solr cluster, it is recommended to enable Kerberos security for Zookeeper as well. In such a setup, the client principal used to authenticate requests with Zookeeper can be shared for internode communication as well."

Also see this article https://cwiki.apache.org/confluence/display/RANGER...

Alternatively, you could try the Hadoop job jar to ingest your data (I have successfully used it in both kerberized and non-kerberized Solr environments):

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar \
     com.lucidworks.hadoop.ingest.IngestJob \
     -Dlww.commit.on.close=true \
     -Dlww.jaas.file=jaas.conf \
     -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper \
     --collection test \
     -i file:///data/* \
     -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
     --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr

Could you share more details about your bash script and MR job?

Master Mentor

@Łukasz Dywicki has this been resolved? Can you post your solution or accept the best answer?

Explorer

It's not solved yet; I have posted more details. Sorry for the delay.

Explorer

@Jonas Straub I do have the java.security.auth.login.config parameter specified in HADOOP_OPTS. I am able to execute the job until it tries to talk to Solr directly over HTTP. Everything is secured: HDFS, ZooKeeper, and Solr as well. I have not initialized the keytab, because as far as I understand it should be picked up by Java.
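
A quick way to check that the JAAS wiring is in effect is a throwaway class like the one below (a minimal sketch; as far as I can tell, Krb5HttpClientConfigurer only enables SPNEGO when the java.security.auth.login.config property points at a JAAS file, so without it SolrJ sends unauthenticated requests, which is exactly what a 401 would suggest):

import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer;

public class JaasCheck {
    public static void main(String[] args) {
        // Print the JAAS file the Kerberos configurer would use.
        String jaas = System.getProperty("java.security.auth.login.config");
        System.out.println("JAAS config: " + (jaas == null ? "NOT SET" : jaas));
        // With the property unset, this call does not enable SPNEGO.
        HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
    }
}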

As I wrote earlier, we had a similar issue with the Solr client when talking to Kerberized Solr, but we solved it by adding this call before creating the client:

HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());

The job is launched from the command line. The Hadoop invocation we use is this:

export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"
hadoop jar \
     $FIND_JAR \
     org.apache.hadoop.fs.FsShell \
     -find "/$DATA_PATH" \
     -name '*.parquet' \
     -print \
  | \
  hadoop jar \
         $JOB_JAR \
         --libjars $LIB_JARS \
         -D magic_mapper.minTs=$MIN_TS \
         -D magic_mapper.maxTs=$MAX_TS \
         -D magic_mapper.zkHost=$ZOOKEEPER \
         -D magic_mapper.collection=$COLLECTION \
         -D mapreduce.map.output.compress=true \
         -D mapreduce.job.user.classpath.first=true \
         -D mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
         -D mapreduce.job.map.class=com.magic.solr.hadoop.IndexMapper \
         --morphline-file /tmp/blank-morphlines.conf \
         --output-dir $TEMP_DIR \
         --zk-host $ZOOKEEPER \
         --collection $COLLECTION \
         --go-live \
         --verbose \
         --input-list -

The input parameters magic_mapper.zkHost, the collection, and the time range are used to calculate partitions, i.e. they are only used to read information from ZooKeeper. The mapper is responsible for mapping Parquet files to Solr documents.
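
For context, a purely hypothetical sketch of the mapper contract (the real com.magic.solr.hadoop.IndexMapper is not shown in this thread; MapReduceIndexerTool's reduce side consumes <Text, SolrInputDocumentWritable> pairs keyed by the unique document id):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.hadoop.SolrInputDocumentWritable;

// Illustrative only: pretends each input record is "id<TAB>body" text,
// whereas the real mapper reads Parquet records.
public class IndexMapperSketch
    extends Mapper<LongWritable, Text, Text, SolrInputDocumentWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", 2);
        SolrInputDocument doc = new SolrInputDocument();
        doc.setField("id", fields[0]);
        doc.setField("body", fields.length > 1 ? fields[1] : "");
        context.write(new Text(fields[0]), new SolrInputDocumentWritable(doc));
    }
}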

Explorer

@Jonas Straub I'm sure the keytab I'm using when the job is initialized is fine, because I can use cURL calls like this one to verify that Solr is allowing HTTP calls with the given ticket/credentials:

curl --ntlm --negotiate -u : "http://hdp-1el7.magic.com:8983/solr/events/query" -d '{query: "*:*"}'   

This call doesn't fail, even when the job failed just a few seconds earlier.
