Explorer
Posts: 20
Registered: ‎07-29-2013

Re: Make Solr to use HDFS

Hi Bala,

I have found that data is very rarely truly unstructured; typically there is some form of structure to it. What kind of data is it? Can you send a sample file to kevin@cloudera.com?

Expert Contributor
Posts: 87
Registered: ‎06-16-2014

Re: Make Solr to use HDFS

Kevin, the data consists of rich documents (txt, pdf, and doc files). It does not have any particular structure. Is it possible to extract the data from these formats?
Thanks
Bala
Explorer
Posts: 20
Registered: ‎07-29-2013

Re: Make Solr to use HDFS

Bala,

It absolutely is. I was just giving you a sample set of instructions so you could experiment with a CSV file ingest. For rich documents you will want to use Apache Tika. The good news is that there is a morphline to help you with that. The bad news is that you will have to write that morphline. I would recommend starting here: https://github.com/cloudera/search#cdk-morphlines-solr-cell
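
As a rough illustration of what such a morphline could look like, here is a minimal sketch, not a tested configuration. The collection name, ZooKeeper address, morphline id, and the Solr field names `id` and `text` are placeholders assumed for the example. The package names in `importCommands` depend on the release (older CDK builds used `com.cloudera.**`, later Kite builds use `org.kitesdk.**`), so check them against the version you have installed.

```
# Placeholder collection and ZooKeeper ensemble; replace with your own values.
SOLR_LOCATOR : {
  collection : collection1
  zkHost : "127.0.0.1:2181/solr"
}

morphlines : [
  {
    id : richDocs   # hypothetical name for this morphline
    # Assumption: Kite-era package names; older CDK releases used com.cloudera.**
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      # Detect the MIME type of each file so a matching Tika parser can be chosen.
      { detectMimeType { includeDefaultMimeTypes : true } }

      # Extract text and metadata from txt, pdf, and doc files with Tika.
      {
        solrCell {
          solrLocator : ${SOLR_LOCATOR}
          # Map Tika's body text onto a Solr field; "text" is assumed to exist in schema.xml.
          fmap : { content : text }
          lowernames : true
          parsers : [
            { parser : org.apache.tika.parser.txt.TXTParser }
            { parser : org.apache.tika.parser.pdf.PDFParser }
            { parser : org.apache.tika.parser.microsoft.OfficeParser }
          ]
        }
      }

      # Assign a unique key, drop fields the schema does not know, then load into Solr.
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
```

The overall shape mirrors the CSV case: only the parsing command in the middle changes, while the final steps that sanitize the record and load it into Solr stay the same.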

Expert Contributor
Posts: 87
Registered: ‎06-16-2014

Re: Make Solr to use HDFS

Kevin, in the earlier briefing you mentioned morphlines. Should I proceed with the earlier steps you asked me to follow, or should I go through this first? https://github.com/cloudera/search#cdk-morphlines-solr-cell
Thanks
Bala
Explorer
Posts: 20
Registered: ‎07-29-2013

Re: Make Solr to use HDFS

You can follow the same steps I sent you, but you will need to switch to the cdk-morphlines-solr-cell morphline (https://github.com/cloudera/search#cdk-morphlines-solr-cell) instead of the CSV one in the example.

Expert Contributor
Posts: 87
Registered: ‎06-16-2014

Re: Make Solr to use HDFS

Kevin, how do I use the CDK?
Thanks
Bala
Expert Contributor
Posts: 87
Registered: ‎06-16-2014

Re: Make Solr to use HDFS

Hello Kevin ,

I am still not able to figure out how to use the CDK you mentioned :( I need help.

Thanks
Bala
Expert Contributor
Posts: 87
Registered: ‎06-16-2014

Re: Make Solr to use HDFS

Kevin, I followed the steps. It works as expected in a dry run, but when I run without the --dry-run argument, it stops at this step :(

 

770  [main] INFO  org.apache.solr.cloud.ZkController  – Write file /tmp/1404354031741-0/velocity/facet_fields.vm
771  [main] INFO  org.apache.solr.cloud.ZkController  – Write file /tmp/1404354031741-0/elevate.xml
773  [main] INFO  org.apache.solr.cloud.ZkController  – Write file /tmp/1404354031741-0/admin-extra.menu-bottom.html
774  [main] INFO  org.apache.solr.cloud.ZkController  – Write file /tmp/1404354031741-0/schema.xml
897  [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  – Indexing 1 files using 1 real mappers into 1 reducers

 

It stops at the line stamped 897. I restarted and tried again, but the result is the same.

Any help would be appreciated.

 

Thanks

Bala

New Contributor
Posts: 1
Registered: ‎05-09-2018

Re: Make Solr to use HDFS

Is an incremental load into Solr possible? That is, if the dataset being loaded contains unique keys that are already present in the Solr collection (with or without changes to the other fields of the record), I want the existing records to be updated and the new records to be inserted. Could you please let me know whether this is possible in Solr? If so, please advise on how to achieve it.
