
Can't find resource 'solrconfig.xml' in classpath

Contributor

I followed the Cloudera Quick Start User Guide to create and index my data.
I was able to successfully execute the following steps:

# 1) generate the instance configuration, then copy the schema
$ solrctl instancedir --generate $HOME/party_name_config
$ cp schema.xml $HOME/party_name_config/conf

# 2) upload the configuration to ZooKeeper
$ solrctl instancedir --create party_name_config $HOME/party_name_config/

# 3) create the new collection
$ solrctl collection --create party_name -c party_name_config
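As a quick sanity check after these steps, the uploaded config and the new collection can be confirmed with solrctl's listing subcommands (a sketch; output depends on your cluster):

```shell
# list instance configs registered in ZooKeeper and existing collections
$ solrctl instancedir --list
$ solrctl collection --list
```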

Now when I run the following script:

hadoop --config /etc/hadoop/conf.cloudera.hdfs jar \
  /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapred.child.java.opts=-Xmx500m' \
  --log4j ~/search/log4j.properties \
  --morphline-file ~/search/readCSV.conf \
  --output-dir hdfs://dwh-mst-dev02.stor.nccourts.org:8020/hdfs/data-lake/civil/solr/party-name \
  --verbose --go-live \
  --zk-host dwh-mst-dev02.stor.nccourts.org:2181/solr \
  --collection party_name \
  hdfs://dwh-mst-dev02.stor.nccourts.org:8020/hdfs/data-lake/civil/party_search

 

I am receiving the following exception:

  2726 [Thread-18] WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local542592213_0001
java.lang.Exception: org.kitesdk.morphline.api.MorphlineRuntimeException: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/home/iapima/file:/tmp/hadoop-iapima/mapred/local/1490304732115/07193328-e9c3-454c-8523-4a782f9371e4.solr.zip/conf'
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/home/iapima/file:/tmp/hadoop-iapima/mapred/local/1490304732115/07193328-e9c3-454c-8523-4a782f9371e4.solr.zip/conf'
        at org.kitesdk.morphline.solr.SolrLocator.getIndexSchema(SolrLocator.java:209)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.<init>(MorphlineMapRunner.java:141)
        at org.apache.solr.hadoop.morphline.MorphlineMapper.setup(MorphlineMapper.java:75)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/home/iapima/file:/tmp/hadoop-iapima/mapred/local/1490304732115/07193328-e9c3-454c-8523-4a782f9371e4.solr.zip/conf'
        at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:362)
        at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:308)
        at org.apache.solr.core.Config.<init>(Config.java:117)
        at org.apache.solr.core.Config.<init>(Config.java:87)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:167)
        at org.kitesdk.morphline.solr.SolrLocator.getIndexSchema(SolrLocator.java:201)
        ... 11 more

 

When I checked the party_name_config directory created in step 1, solrconfig.xml does exist under its conf sub-directory.

 

I am running CDH 5.10.


Help is appreciated. Thanks

8 REPLIES

Rising Star

It looks like the error says the tool is trying to load solrconfig.xml locally and cannot find it.

 

1. I noticed you are passing hadoop --config /etc/hadoop/conf.cloudera.hdfs.

Try passing hadoop --config /etc/hadoop/conf.cloudera.yarn instead, as the MapReduceIndexerTool is a MapReduce job and needs that configuration. Also make sure the node you run this from has both a YARN gateway and a Solr gateway.

 

2. Can you check whether all the configs are present in ZooKeeper?

 

Log in to dwh-mst-dev02.stor.nccourts.org and run:

zookeeper-client

ls /solr

ls /solr/configs

ls /solr/configs/party_name_config

ls /solr/configs/party_name_config/solrconfig.xml

Make sure all of these are present under /solr in ZooKeeper.

3. Can you paste the contents of ~/search/readCSV.conf? Make sure you have zkHost : "dwh-mst-dev02.stor.nccourts.org:2181/solr" set in your morphline config.

4. Do you have $HOME set up?

solrctl instancedir --create party_name_config $HOME/party_name_config/ 

 

5. Here are the Cloudera examples for the MapReduceIndexerTool; please have a look at these:

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/search_data_index_prepare.html#csug_t...

 

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/search_batch_index_use_mapreduce.html...

 

 

Contributor

I made the change suggested and pointed to /etc/hadoop/conf.cloudera.yarn. That took care of the earlier error. When I reran the script, I got the error below.

-------------

Error: java.io.IOException: Batch Write Failure
        at org.apache.solr.hadoop.BatchWriter.throwIf(BatchWriter.java:239)
        at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:181)
        at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:275)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.solr.common.SolrException: ERROR: [doc=1966-05-19 10:36:59.373733] unknown field 'file_length'

-----------

It seems not to like the id field, which is a string representation of a timestamp.

Here is an excerpt of the schema that I am including:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<uniqueKey>id</uniqueKey>
<field name="county" type="text_general" indexed="false" stored="true"/>
<field name="year" type="int" indexed="false" stored="true"/>
<field name="court_type" type="text_general" indexed="false" stored="true"/>
<field name="seq_num" type="int" indexed="false" stored="true"/>
<field name="role" type="text_general" indexed="false" stored="true"/>
<field name="num" type="int" indexed="false" stored="true"/>
<field name="stat" type="text_general" indexed="false" stored="true"/>
<field name="biz_name" type="text_general" indexed="true" stored="true"/>

--------------------

And here is an excerpt of my files to be indexed:

id,county,year,court_type,seq_num,party_role,party_num,party_status,biz_name,prefix,last_name,first_name,middle_name,suffix,in_regards_to,case_status,row_of_origin
1994-11-03 12:15:32.12172,180,1994,CVM,558,P,1,DISPOSED,WINDSOR ARMS HOUSING LTD PTNSHP,null,null,null,null,null,null,null,T48
1999-04-16 14:28:37.009778,000,1999,CVD,862,P,1,null,null,null,CRITZER,KAREN,YVONNE,null,null,null,T46

-----------------------

Here is the readCSV.conf

SOLR_LOCATOR : {
  # Name of solr collection
  collection : party_name

  # ZooKeeper ensemble
  zkHost : "dwh-mst-dev02.stor.nccourts.org:2181/solr"

  # The maximum number of documents to send to Solr per network batch (throughput knob)
  # batchSize : 100
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readCSV {
          separator : ","
          columns : [id,county,year,court_type,seq_num,party_role,party_num,party_status,biz_name,prefix,last_name,first_name,middle_name,suffix,in_regards_to,case_status,row_of_origin]
          ignoreFirstLine : true
          trim : true
          charset : UTF-8
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer.
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]

--

Thanks

Rising Star

Glad to hear the original error went away.

 

The new error is caused by a mismatch between the fields in your schema and your morphline config.

 

 

If you want the id field treated as a timestamp, you have to use convertTimestamp:

http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html#convertTimestamp
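For example, a sketch of a convertTimestamp command (the input formats below are only a guess based on your sample data, which has 5- and 6-digit fractional seconds, and the timezones are placeholders; adjust to your data):

```
{
  convertTimestamp {
    field : id
    inputFormats : ["yyyy-MM-dd HH:mm:ss.SSSSSS", "yyyy-MM-dd HH:mm:ss.SSSSS"]
    inputTimezone : America/New_York
    outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
    outputTimezone : UTC
  }
}
```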

 

Otherwise, for now, do you want to test it with string field data and see if that works? Just try 3-4 columns.

 

Also look at this for the unique key field definition:

 

https://wiki.apache.org/solr/UniqueKey

 

Also, can you provide a full stack trace of the error?

 

Also, for testing purposes, you can pass the --dry-run option with your MapReduceIndexerTool command; once that succeeds, you can retry with --go-live.
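For example, the earlier command could first be run in dry-run mode (a sketch reusing the paths from your command; with --dry-run the tool runs the morphline locally and prints the resulting documents instead of loading them into Solr):

```shell
hadoop --config /etc/hadoop/conf.cloudera.yarn jar \
  /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  --log4j ~/search/log4j.properties \
  --morphline-file ~/search/readCSV.conf \
  --output-dir hdfs://dwh-mst-dev02.stor.nccourts.org:8020/hdfs/data-lake/civil/solr/party-name \
  --zk-host dwh-mst-dev02.stor.nccourts.org:2181/solr \
  --collection party_name \
  --verbose --dry-run \
  hdfs://dwh-mst-dev02.stor.nccourts.org:8020/hdfs/data-lake/civil/party_search
```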

 

Contributor

The key being a string is not an issue, as there will be no searches based on the timestamp. Is there a way in the morphline to specify that the field is indeed a string and not a timestamp?

Below is the full stack trace:

Error: java.io.IOException: Batch Write Failure
        at org.apache.solr.hadoop.BatchWriter.throwIf(BatchWriter.java:239)
        at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:181)
        at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:275)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.solr.common.SolrException: ERROR: [doc=1966-05-19 10:36:59.365118] unknown field 'file_length'
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:185)
        at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:238)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:940)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1095)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:701)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2135)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
        at org.apache.solr.hadoop.BatchWriter.runUpdate(BatchWriter.java:135)
        at org.apache.solr.hadoop.BatchWriter$Batch.run(BatchWriter.java:90)
        at org.apache.solr.hadoop.BatchWriter.queueBatch(BatchWriter.java:180)
        ... 9 more

98871 [main] ERROR org.apache.solr.hadoop.MapReduceIndexerTool  - Job failed! jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1489673434857_0012

Rising Star
The error shows that the file_length field is unknown:

Caused by: org.apache.solr.common.SolrException: ERROR: [doc=1966-05-19 10:36:59.365118] unknown field 'file_length'

Solr doesn't have a file_length field specified, so either sanitize unknown fields using the sanitizeUnknownSolrFields morphline command before sending records to Solr, or add file_length to your schema.

Solr throws an exception on any attempt to load a document that contains a field not specified in schema.xml, and it looks like your data has such a field while your schema.xml doesn't define it.
More details regarding sanitizeUnknownSolrFields
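As a sketch, the command would slot into the morphline's commands list just before loadSolr, reusing the SOLR_LOCATOR already defined in your readCSV.conf:

```
# drop record fields that have no corresponding field in the Solr schema
{
  sanitizeUnknownSolrFields {
    solrLocator : ${SOLR_LOCATOR}
  }
}
```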
 
 
 
 

Contributor

Frankly, I am at a loss. There was some mismatch between my morphline fields and the schema, but I fixed that. There are no columns unaccounted for. Here is my schema, which I confirmed by retrieving it from the Solr web interface:

<uniqueKey>id</uniqueKey>
<field name="county" type="text_general" indexed="false" stored="true"/>
<field name="year" type="int" indexed="false" stored="true"/>
<field name="court_type" type="text_general" indexed="false" stored="true"/>
<field name="seq_num" type="int" indexed="false" stored="true"/>
<field name="party_role" type="text_general" indexed="false" stored="true"/>
<field name="party_num" type="int" indexed="false" stored="true"/>
<field name="party_status" type="text_general" indexed="false" stored="true"/>
<field name="biz_name" type="text_general" indexed="true" stored="true"/>
<field name="prefix" type="text_general" indexed="false" stored="true"/>
<field name="last_name" type="text_general" indexed="true" stored="true"/>
<field name="first_name" type="text_general" indexed="true" stored="true"/>
<field name="middle_name" type="text_general" indexed="true" stored="true"/>
<field name="suffix" type="text_general" indexed="false" stored="true"/>
<field name="in_regards_to" type="string" indexed="false" stored="true"/>
<field name="case_status" type="string" indexed="false" stored="true"/>
<field name="row_of_origin" type="string" indexed="false" stored="true"/>

And here is the fields as defined in readCSV.conf:
columns : [id,county,year,court_type,seq_num,party_role,party_num,party_status,biz_name,prefix,last_name,first_name,middle_name,suffix,in_regards_to,case_status,row_of_origin]

They are identical, yet I still get the same exception. Any other advice is appreciated.
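For what it's worth, the two lists can be cross-checked mechanically. The sketch below (scratch paths under /tmp are arbitrary) extracts the field names from the schema excerpt above and diffs them against the CSV header columns:

```shell
# Cross-check: list CSV header columns that have no matching
# <field name="..."> in the schema (uniqueKey "id" is appended by hand,
# since the excerpt declares it only via <uniqueKey>).

cat > /tmp/schema_excerpt.xml <<'EOF'
<field name="county" type="text_general" indexed="false" stored="true"/>
<field name="year" type="int" indexed="false" stored="true"/>
<field name="court_type" type="text_general" indexed="false" stored="true"/>
<field name="seq_num" type="int" indexed="false" stored="true"/>
<field name="party_role" type="text_general" indexed="false" stored="true"/>
<field name="party_num" type="int" indexed="false" stored="true"/>
<field name="party_status" type="text_general" indexed="false" stored="true"/>
<field name="biz_name" type="text_general" indexed="true" stored="true"/>
<field name="prefix" type="text_general" indexed="false" stored="true"/>
<field name="last_name" type="text_general" indexed="true" stored="true"/>
<field name="first_name" type="text_general" indexed="true" stored="true"/>
<field name="middle_name" type="text_general" indexed="true" stored="true"/>
<field name="suffix" type="text_general" indexed="false" stored="true"/>
<field name="in_regards_to" type="string" indexed="false" stored="true"/>
<field name="case_status" type="string" indexed="false" stored="true"/>
<field name="row_of_origin" type="string" indexed="false" stored="true"/>
EOF

# CSV header columns, one per line, sorted
echo "id,county,year,court_type,seq_num,party_role,party_num,party_status,biz_name,prefix,last_name,first_name,middle_name,suffix,in_regards_to,case_status,row_of_origin" \
  | tr ',' '\n' | sort > /tmp/csv_cols.txt

# schema field names plus the uniqueKey, sorted
{ grep -o 'name="[^"]*"' /tmp/schema_excerpt.xml | sed 's/name="//;s/"$//'; echo id; } \
  | sort -u > /tmp/schema_fields.txt

# columns present in the CSV header but absent from the schema
comm -23 /tmp/csv_cols.txt /tmp/schema_fields.txt
```

Empty output confirms every header column has a schema field, i.e. the stray 'file_length' is not coming from the CSV header itself.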

Contributor

Is there an alternative way to index HDFS files for Solr other than the MapReduceIndexerTool, such as a MapReduce Java program? Any samples that can be shared are welcome.

Rising Star

A simple Solr indexing example using the post tool:

 

http://www.solrtutorial.com/solr-in-5-minutes.html
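As a hedged sketch, a CSV file can also be posted straight to the collection's update handler over HTTP (host, port, and file name below are placeholders for your cluster):

```shell
# placeholders: adjust host/port and file to your environment
curl 'http://localhost:8983/solr/party_name/update?commit=true' \
  -H 'Content-Type: text/csv' \
  --data-binary @party_search.csv
```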

 

Or you can use SolrJ:

 

http://www.solrtutorial.com/solrj-tutorial.html

 

You can even try indexing using Flume and the MorphlineSolrSink:

 

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/search_tutorial.html