Member since: 12-17-2015
Posts: 6
Kudos Received: 5
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3875 | 02-25-2016 02:30 AM |
02-25-2016 02:30 AM
1 Kudo
I solved the problem by adding one line in org.apache.solr.hadoop.GoLive: HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
This enables Kerberos support in the SolrJ client instances used later while processing requests, and propagates the token to the HTTP calls made to the backend. It should be configurable somehow, e.g. via a command-line switch. It is definitely a bug, because the go-live phase of jobs will not work against fully kerberized Solr backends. // CC: @Jonas Straub @Artem Ervits @Neeraj Sabharwal
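For context, the change boils down to registering the Kerberos-aware configurer before any SolrJ client is created. A minimal sketch against the Solr 5.x SolrJ API (the collection URL below is a placeholder, and this is illustrative rather than the exact GoLive patch; it needs the SolrJ jars on the classpath, a jaas config passed via -Djava.security.auth.login.config, and a live kerberized Solr to actually run):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer;

public class KerberizedSolrClientSketch {
    public static void main(String[] args) throws Exception {
        // Must run before the first HttpSolrClient is constructed, because
        // HttpClientUtil applies the registered configurer when it builds
        // the underlying HttpClient instances.
        HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());

        // Any SolrJ client created from this point on negotiates
        // SPNEGO/Kerberos on its HTTP calls. Placeholder URL:
        try (SolrClient solr = new HttpSolrClient("http://hdp-1.magic.com:8983/solr/events")) {
            System.out.println(solr.ping().getStatus());
        }
    }
}
```

In GoLive the same one-liner just has to execute before the merge requests are built, which is why a command-line switch would be the natural place to enable it.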
02-22-2016 11:34 PM
@Jonas Straub I'm sure the keytab I'm using when the job is initialized is fine, because I can use a cURL call like this to verify that Solr accepts HTTP calls with the given token/credentials: curl --ntlm --negotiate -u : "http://hdp-1el7.magic.com:8983/solr/events/query" -d '{query: "*:*"}'
This call does not fail even if the job failed just a few seconds earlier.
02-03-2016 10:23 AM
It's not solved yet; I posted more details. Sorry for the delay.
02-03-2016 10:23 AM
1 Kudo
@Jonas Straub I do have the java.security.auth.login.config parameter specified in HADOOP_OPTS. I am able to execute the job until it tries to talk to Solr directly over HTTP. Everything is secured: HDFS, ZooKeeper, and Solr too. I have not initialized the keytab myself because, as far as I understand, it should be picked up by Java.
As I wrote earlier, we had a similar issue with a Solr client talking to kerberized Solr, but we solved it by adding this call before creating the client:
HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
The job is launched from the command line. The Hadoop invocation we use is this:
export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"
hadoop jar \
$FIND_JAR \
org.apache.hadoop.fs.FsShell \
-find "/$DATA_PATH" \
-name '*.parquet' \
-print \
| \
hadoop jar \
$JOB_JAR \
--libjars $LIB_JARS \
-D magic_mapper.minTs=$MIN_TS \
-D magic_mapper.maxTs=$MAX_TS \
-D magic_mapper.zkHost=$ZOOKEEPER \
-D magic_mapper.collection=$COLLECTION \
-D mapreduce.map.output.compress=true \
-D mapreduce.job.user.classpath.first=true \
-D mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
-D mapreduce.job.map.class=com.magic.solr.hadoop.IndexMapper \
--morphline-file /tmp/blank-morphlines.conf \
--output-dir $TEMP_DIR \
--zk-host $ZOOKEEPER \
--collection $COLLECTION \
--go-live \
--verbose \
--input-list -
The input parameters magic_mapper.zkHost, magic_mapper.collection, and the time range are used to calculate partitions, so they are only used to read information from ZooKeeper. The mapper is responsible for mapping Parquet files to Solr documents.
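For reference, the jaas-client.conf we point HADOOP_OPTS at looks roughly like this (the keytab path and principal below are placeholders for illustration; as far as I know, Krb5HttpClientConfigurer looks up the JAAS section named "Client"):

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/banana.keytab"
  principal="banana@MAGIC.COM"
  storeKey=true
  useTicketCache=false;
};
```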
12-21-2015 09:37 PM
1 Kudo
@Jonas Straub Solr is started by a separate command with the -c switch, so it does have connectivity to the kerberized ZooKeeper instance. The job is launched from a bash script via the `hadoop jar` command. The bash script has an extra parameter embedded:
export HADOOP_OPTS="-Djava.security.auth.login.config=$MAGIC_CONF_DIR/jaas-client.conf"
I can't access Solr from a browser unless I enable negotiation and run kinit on my machine; after that my Firefox can access the Solr administrative interface.
12-17-2015 10:21 PM
2 Kudos
Dear all,
We have a job which runs `MapReduceIndexerTool` in a kerberized environment. With a couple of tweaks we managed to get it running, and the map/reduce phase even succeeds; however, it fails at the go-live stage while inserting data:
--- bunch of earlier log entries ---
15/12/17 19:33:08 INFO mapreduce.Job: map 100% reduce 99%
15/12/17 19:33:28 INFO mapreduce.Job: map 100% reduce 100%
15/12/17 19:34:58 INFO mapreduce.Job: Job job_1450203660079_0013 completed successfully
15/12/17 19:34:58 INFO mapreduce.Job: Counters: 52
File System Counters
FILE: Number of bytes read=1933903322
FILE: Number of bytes written=3643256225
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=13020909852
HDFS: Number of bytes written=20619046734
HDFS: Number of read operations=10964
HDFS: Number of large read operations=0
HDFS: Number of write operations=1344
Job Counters
Launched map tasks=236
Launched reduce tasks=24
Other local map tasks=236
Total time spent by all maps in occupied slots (ms)=5822436
Total time spent by all reduces in occupied slots (ms)=15745656
Total time spent by all map tasks (ms)=5822436
Total time spent by all reduce tasks (ms)=7872828
Total vcore-seconds taken by all map tasks=5822436
Total vcore-seconds taken by all reduce tasks=7872828
Total megabyte-seconds taken by all map tasks=14905436160
Total megabyte-seconds taken by all reduce tasks=40308879360
Map-Reduce Framework
Map input records=1886
Map output records=16964842
Map output bytes=11060997974
Map output materialized bytes=1650839353
Input split bytes=41536
Combine input records=0
Combine output records=0
Reduce input groups=16964842
Reduce shuffle bytes=1650839353
Reduce input records=16964842
Reduce output records=16964842
Spilled Records=35286185
Shuffled Maps =5664
Failed Shuffles=0
Merged Map outputs=5664
GC time elapsed (ms)=313229
CPU time spent (ms)=8043320
Physical memory (bytes) snapshot=479611183104
Virtual memory (bytes) snapshot=818600177664
Total committed heap usage (bytes)=530422693888
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=20537547
File Output Format Counters
Bytes Written=20619046734
org.apache.solr.hadoop.SolrCounters
SolrReducer: Number of document batches processed=848257
SolrReducer: Number of documents processed=16964842
SolrReducer: Time spent by reducers on physical merges [ms]=1316244849188
15/12/17 19:34:58 INFO hadoop.MapReduceIndexerTool: Done. Indexing 1886 files using 236 real mappers into 24 reducers took 3.31220419E11 secs
15/12/17 19:34:58 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00000 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00003 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00005 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00001 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00006 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00002 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00004 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00011 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00012 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00010 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00009 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00008 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00007 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00014 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00013 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00015 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00016 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00017 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00018 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00019 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00020 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00021 into http://hdp-2.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00022 into http://hdp-3.magic.com:8983/solr
15/12/17 19:34:58 INFO hadoop.GoLive: Live merge hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676/results/part-00023 into http://hdp-1.magic.com:8983/solr
15/12/17 19:34:59 ERROR hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre> Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:118)
at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:866)
at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:608)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://hdp-1.magic.com:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/admin/cores. Reason:
<pre> Authentication required</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:527)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:131)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:99)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:90)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging of index shards into Solr cluster took 9.355796E7 secs
15/12/17 19:34:59 INFO hadoop.GoLive: Live merging failed
Job failed, leaving temporary directory: hdfs://ambari.magic.com:8020/user/banana/mapreduceindexer-temp/temp-29676
We had similar issues in other places that call Solr REST services, but we fixed those by using Krb5HttpClientConfigurer; in this case, however, we cannot change code that comes from the Solr codebase.
Labels:
- Apache Hadoop
- Apache Solr