I want to automatically index CSV data from a directory on HDFS with Solr

New Contributor

Hi!

I want to automatically index CSV data from a directory on HDFS with Solr.

How can I do this?

PS: CDH5/4

1 REPLY

Re: I want to automatically index CSV data from a directory on HDFS with Solr

Expert Contributor

You can use the Solr Hadoop connector from Lucidworks. Below is a sample command that uses this connector.

1 - Create a directory on HDFS to hold your CSV file(s), e.g. "csv_directory/".

2 - Download the connector and run this command (update the CSV field mapping, the ZooKeeper address, and the Solr core name to match your setup):
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c solr_core_name -i csv_directory/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk localhost:2181
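
The command above indexes the directory once. To make it "automatic", one option is to wrap the two steps in a small script and run it on a schedule. The following is only a minimal sketch under that assumption: the variable names (JOB_JAR, COLLECTION, INPUT_DIR, ZK_HOSTS, DRY_RUN) and the `run` and `ingest_csv` helpers are made up for this example and are not part of the connector. By default it only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Sketch only: re-run the two steps from the reply (create directory, ingest).
# JOB_JAR, COLLECTION, INPUT_DIR, ZK_HOSTS and DRY_RUN are placeholder names
# invented for this example; adjust the defaults to your cluster.

JOB_JAR="${JOB_JAR:-/opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar}"
COLLECTION="${COLLECTION:-solr_core_name}"
INPUT_DIR="${INPUT_DIR:-csv_directory}"
ZK_HOSTS="${ZK_HOSTS:-localhost:2181}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

ingest_csv() {
  # Step 1: make sure the HDFS input directory exists (-p: no error if it does).
  run hdfs dfs -mkdir -p "$INPUT_DIR"

  # Step 2: same arguments as the sample command in the reply.
  # The input glob is quoted so the local shell does not expand it.
  run hadoop jar "$JOB_JAR" com.lucidworks.hadoop.ingest.IngestJob \
    -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author \
    -DcsvFirstLineComment \
    -DidField=id \
    -DcsvDelimiter="," \
    -Dlww.commit.on.close=true \
    -cls com.lucidworks.hadoop.ingest.CSVIngestMapper \
    -c "$COLLECTION" \
    -i "$INPUT_DIR/*" \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
    -zk "$ZK_HOSTS"
}

ingest_csv
```

If the printed commands look right, a crontab entry along the lines of `*/15 * * * * DRY_RUN=0 /path/to/ingest_csv.sh` (the script path is hypothetical) would run the job every 15 minutes. Note that each run re-reads every file currently in the directory, so you may want to move or delete files after they have been indexed.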