Explorer
Posts: 9
Registered: 09-15-2014
Accepted Solution

csv Morphline and solr

Hi, I'm trying to use the following morphline to index some CSV files:

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : "\u0001" # the CSV file separator
          columns : [date, canal, filename, indicator, value, category_tags] # the CSV fields
          ignoreFirstLine : false # whether to skip the first line
          trim : false
          charset : UTF-8
        }
      }
      {
        convertTimestamp {
          field : date
          inputFormats : ["yyyy-MM-dd''HH:mm:ss.SSS"]
          inputTimezone : Europe/Paris
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          outputTimezone : Europe/Paris
        }
      }
      { generateUUID { field : id } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : monitor_flows # Name of solr collection
            zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr" # ZooKeeper ensemble
            batchSize : 1000 # batchSize
          }
        }
      }
    ]
  }
]

As you can see, I'm trying to convert the timestamp to the Solr format and to add a unique id. Unfortunately, nothing is indexed. Any help would be welcome.
Regards.
Cloudera Employee
Posts: 146
Registered: 08-21-2013

Re: csv Morphline and solr

Check the log files of the MapReduce job and of the Solr server. The issue is probably that your morphline is missing a sanitizeUnknownSolrFields command.
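A minimal sketch of where such a command could go, assuming it is placed in the commands list just before loadSolr and pointed at the same collection and ZooKeeper ensemble as the loadSolr block above. As described in the Kite morphlines reference, sanitizeUnknownSolrFields fetches the Solr schema through its solrLocator and removes record fields that the schema does not declare; check the reference guide for your Kite version before relying on the exact parameters.

```
# Sketch: insert into the commands list, before loadSolr.
# Removes record fields that are not defined in the monitor_flows schema,
# so that Solr does not reject documents containing unknown fields.
{
  sanitizeUnknownSolrFields {
    solrLocator : {
      collection : monitor_flows
      zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr"
    }
  }
}
```

To avoid repeating the ZooKeeper address, the Kite example morphlines define the locator once at the top of the file (SOLR_LOCATOR : { ... }) and reference it from both commands as solrLocator : ${SOLR_LOCATOR}.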

Explorer
Posts: 9
Registered: 09-15-2014

Re: csv Morphline and solr

Thanks for your answer.
I should point out that no errors are reported during the process.
Regards,

Explorer
Posts: 9
Registered: 09-15-2014

Re: csv Morphline and solr

After some testing: the morphline works without convertTimestamp (with the date indexed as a string in that case). Still working on it...
Explorer
Posts: 9
Registered: 09-15-2014

Re: csv Morphline and solr

Update: I resolved the problem: the timestamp inputFormat was incorrect. Moreover, I had to put the 'Z' in the outputFormat between quotes, even though there are no quotes in https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/resources/te...
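For reference, a corrected convertTimestamp command might look like the sketch below. The thread does not show what the dates in the CSV look like, so the input pattern is hypothetical (it assumes values such as 2014-09-15 10:20:30.123). The patterns follow java.text.SimpleDateFormat syntax, where '' stands for a literal single quote; the original "yyyy-MM-dd''HH:mm:ss.SSS" could therefore only match dates with an apostrophe between the date and the time. An unquoted Z in the output pattern emits a numeric offset such as +0200, whereas Solr expects a literal trailing Z meaning UTC. For that reason this sketch also sets outputTimezone to UTC: keeping Europe/Paris while appending a literal 'Z' would label Paris local time as UTC.

```
{
  convertTimestamp {
    field : date
    # Hypothetical input pattern: assumes CSV values like "2014-09-15 10:20:30.123".
    # Adjust it to the real layout of the date column.
    inputFormats : ["yyyy-MM-dd HH:mm:ss.SSS"]
    inputTimezone : Europe/Paris
    # Quoted 'Z' emits a literal Z, which Solr reads as UTC,
    # so the output timezone is set to UTC as well.
    outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
    outputTimezone : UTC
  }
}
```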

Regards.