Created 12-17-2015 10:21 PM
Dear all,
We have a job which runs `MapReduceIndexerTool` in a kerberized environment. With a couple of tweaks we managed to get it running, and the map/reduce phase even completes successfully; however, it fails at the go-live stage while inserting data:
```
--- bunch of earlier log entries ---
15/12/17 19:33:08 INFO mapreduce.Job:  map 100% reduce 99%
15/12/17 19:33:28 INFO mapreduce.Job:  map 100% reduce 100%
15/12/17 19:34:58 INFO mapreduce.Job: Job job_1450203660079_0013 completed successfully
15/12/17 19:34:58 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=1933903322
        FILE: Number of bytes written=3643256225
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=13020909852
        HDFS: Number of bytes written=20619046734
        HDFS: Number of read operations=10964
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1344
    Job Counters
        Launched map tasks=236
        Launched reduce tasks=24
        Other local map tasks=236
        Total time spent by all maps in occupied slots (ms)=5822436
        Total time spent by all reduces in occupied slots (ms)=15745656
        Total time spent by all map tasks (ms)=5822436
        Total time spent by all reduce tasks (ms)=7872828
        Total vcore-seconds taken by all map tasks=5822436
        Total vcore-seconds taken by all reduce tasks=7872828
        Total megabyte-seconds taken by all map tasks=14905436160
        Total megabyte-seconds taken by all reduce tasks=40308879360
    Map-Reduce Framework
        Map input records=1886
        Map output records=16964842
        Map output bytes=11060997974
        Map output materialized bytes=1650839353
        Input split bytes=41536
        Combine input records=0
        Combine output records=0
        Reduce input groups=16964842
        Reduce shuffle bytes=1650839353
        Reduce input records=16964842
        Reduce output records=16964842
        Spilled Records=35286185
        Shuffled Maps =5664
        Failed Shuffles=0
        Merged Map outputs=5664
        GC time elapsed (ms)=313229
        CPU time spent (ms)=8043320
        Physical memory (bytes) snapshot=479611183104
        Virtual memory (bytes) snapshot=818600177664
        Total committed heap usage (bytes)=530422693888
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=20537547
    File Output Format Counters
        Bytes Written=20619046734
    org.apache.solr.hadoop.SolrCounters
        SolrReducer: Number of document batches processed=848257
        SolrReducer: Number of documents processed=16964842
        SolrReducer: Time spent by reducers on physical merges [ms]=1316244849188
15/12/17 19:34:58 INFO hadoop.MapReduceIndexerTool: Done. Indexing 1886 files using 236 real mappers into 24 reducers took 3.31220419E11 secs
15/12/17 19:34:58 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00000 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00003 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00005 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00001 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00006 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00002 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00004 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00011 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00012 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00010 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00009 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00008 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00007 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00014 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00013 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00015 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00016 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00017 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00018 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00019 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00020 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00021 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00022 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00023 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:59 ERROR hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre>    Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:118)
    at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:866)
    at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:608)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre>    Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:527)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:131)
    at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:99)
    at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:90)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging of index shards into Solr cluster took 9.355796E7 secs
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging failed
Job failed, leaving temporary directory: hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676
```
We had similar issues in other places in our own code that call Solr REST services, and we fixed those by using `Krb5HttpClientConfigurer`; however, in this case we can't change the code, since it comes from the Solr codebase.
Created 12-17-2015 11:03 PM
You may have to open a support case to troubleshoot this.
Created 12-21-2015 03:31 PM
The error looks familiar 🙂 The MR application does not pass a Kerberos ticket to the Solr instance, hence the SPNEGO authentication fails on the Solr side.
How do you start your job? Is it a custom MapReduce application or the Hadoop job jar that is provided with HDP Search?
SolrCloud or Solr Standalone?
Can you access the Solr Admin interface with your browser (`<solr host>:8983/solr`)?
Created 12-21-2015 09:37 PM
@Jonas Straub Solr is started by a separate command with the -c switch, so it does have connectivity to the kerberized ZooKeeper instance. The job is launched from a bash script via the `hadoop jar` command. The bash script has extra parameters embedded:
```bash
export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"
```
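For context, the referenced `jaas-client.conf` follows the standard JAAS layout; a minimal sketch (the keytab path and principal below are placeholders, not our actual values) would look like:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/banana.service.keytab"
  storeKey=true
  useTicketCache=false
  principal="banana@MAGIC.COM";
};
```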
I can't access Solr from a browser unless I enable negotiation and run kinit on my machine; after that my Firefox can access the Solr administrative interface.
Created 12-23-2015 08:03 AM
Thanks for the additional info.
When you start your job through your bash script, do you have a valid Kerberos ticket on the machine, or does your MR job use a keytab file to retrieve a valid Kerberos ticket? Without a valid ticket, Solr will always deny access.
You might want to enable Kerberos security for Zookeeper as well, see this https://cwiki.apache.org/confluence/display/solr/K...
"When setting up a kerberized Solr cluster, it is recommended to enable Kerberos security for Zookeeper as well. In such a setup, the client principal used to authenticate requests with Zookeeper can be shared for internode communication as well."
Also see this article https://cwiki.apache.org/confluence/display/RANGER...
Alternatively, you could try the Hadoop job jar to ingest your data (I have successfully used it in both kerberized and non-kerberized Solr environments):
```bash
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar \
  com.lucidworks.hadoop.ingest.IngestJob \
  -Dlww.commit.on.close=true \
  -Dlww.jaas.file=jaas.conf \
  -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper \
  --collection test \
  -i file:///data/* \
  -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
  --zkConnect horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr
```
Could you share more details about your bash script and MR job?
Created 02-03-2016 02:37 AM
@Łukasz Dywicki has this been resolved? Can you post your solution or accept the best answer?
Created 02-03-2016 10:23 AM
It's not solved yet; I've posted more details. Sorry for the delay.
Created 02-03-2016 10:23 AM
@Jonas Straub I do have the java.security.auth.login.config parameter specified in HADOOP_OPTS. I am able to execute the job until it tries to talk to Solr directly over HTTP. Everything is secured: HDFS, ZooKeeper, and Solr as well. I have not initialized a ticket from the keytab manually, because as far as I understand it should be retrieved by Java.
As I wrote earlier, we had a similar issue with the Solr client when talking to kerberized Solr, but we solved it by adding this call before creating the client:
```java
HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
```
The job is launched from the command line. The Hadoop call we have is this:
```bash
export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"

hadoop jar \
  $FIND_JAR \
  org.apache.hadoop.fs.FsShell \
  -find "/$DATA_PATH" \
  -name '*.parquet' \
  -print \
| \
hadoop jar \
  $JOB_JAR \
  --libjars $LIB_JARS \
  -D magic_mapper.minTs=$MIN_TS \
  -D magic_mapper.maxTs=$MAX_TS \
  -D magic_mapper.zkHost=$ZOOKEEPER \
  -D magic_mapper.collection=$COLLECTION \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.job.user.classpath.first=true \
  -D mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -D mapreduce.job.map.class=com.magic.solr.hadoop.IndexMapper \
  --morphline-file /tmp/blank-morphlines.conf \
  --output-dir $TEMP_DIR \
  --zk-host $ZOOKEEPER \
  --collection $COLLECTION \
  --go-live \
  --verbose \
  --input-list -
```
The input parameters magic_mapper.zkHost, magic_mapper.collection, and the time range are used to calculate partitions, so they are only used to read information from ZooKeeper. The mapper is responsible for mapping parquet files to Solr documents, as sketched below.
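For illustration, a custom mapper plugged in via `mapreduce.job.map.class` has to emit `Text`/`SolrInputDocumentWritable` pairs so that the tool's reduce phase can consume them. A minimal sketch; the field mapping here is placeholder logic, and the real mapper reads parquet records rather than text lines:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.hadoop.SolrInputDocumentWritable;

// Hypothetical, simplified stand-in for com.magic.solr.hadoop.IndexMapper:
// the real mapper parses parquet records and maps many more fields.
public class IndexMapper extends Mapper<LongWritable, Text, Text, SolrInputDocumentWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    SolrInputDocument doc = new SolrInputDocument();
    // Placeholder field mapping; assumes "id" and "text" fields exist in the schema.
    doc.setField("id", key.toString());
    doc.setField("text", value.toString());
    // Key by document id; the reduce phase partitions documents into shards.
    context.write(new Text(key.toString()), new SolrInputDocumentWritable(doc));
  }
}
```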
Created 02-22-2016 11:34 PM
@Jonas Straub I'm sure the keytab I'm using when the job is initialized is fine, because I can use cURL calls like this one to verify that Solr allows HTTP calls with the given token/credentials:
```bash
curl --ntlm --negotiate -u : "http://hdp-1el7.magic.com:8983/solr/events/query" -d '{query: "*:*"}'
```
This call doesn't fail even when the job failed just a few seconds earlier.
Created 02-25-2016 02:30 AM
I solved the problem by adding one line in org.apache.solr.hadoop.GoLive:
```java
HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
```
This enables Kerberos support in the SolrJ client instances used later while processing requests and propagates the token to the HTTP calls made to the backend. It should be configurable somehow, e.g. via a command-line switch. It is definitely a bug, because the go-live phase of jobs will not work against fully kerberized Solr backends.
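For anyone wanting to reproduce the fix outside a patched GoLive: the essential point is that the configurer must be registered before any `HttpSolrClient` is created, so SPNEGO negotiation gets wired into the underlying HTTP client. A minimal standalone sketch of the same pattern against Solr 5.x SolrJ (host, collection, and JAAS path are illustrative):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer;

public class KerberizedSolrClientExample {

  public static void main(String[] args) throws Exception {
    // JAAS login configuration holding the client principal/keytab; path is a placeholder.
    System.setProperty("java.security.auth.login.config", "/etc/solr/jaas-client.conf");

    // The one-line fix: register the Kerberos-aware configurer before any
    // HttpSolrClient instance is created.
    HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());

    // Subsequent clients negotiate SPNEGO transparently.
    try (HttpSolrClient client = new HttpSolrClient("http://hdp-1.magic.com:8983/solr/events")) {
      System.out.println(client.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
  }
}
```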