csv Morphline and solr


Explorer
Hi,

I'm trying to use the following morphline to index some CSV files:

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : "\u0001" # the CSV file separator
          columns : [date, canal,filename,indicator,value,category_tags] # the CSV fields
          ignoreFirstLine : false # whether to ignore the first line
          trim : false
          charset : UTF-8
        }
      }
      {
        convertTimestamp {
          field : date
          inputFormats : ["yyyy-MM-dd''HH:mm:ss.SSS"]
          inputTimezone : Europe/Paris
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          outputTimezone : Europe/Paris
        }
      }
      {
        generateUUID {
          field : id
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : monitor_flows # Name of solr collection
            zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr" # ZooKeeper ensemble
            batchSize : 1000 # batchSize
          }
        }
      }
    ]
  }
]

As you can see, I'm trying to convert the timestamp to the Solr format and to add a unique id. Unfortunately, nothing is indexed. Any help will be welcome.

Regards.
1 ACCEPTED SOLUTION

Re: csv Morphline and solr

Explorer
Update: I resolved the problem. The timestamp input format (inputFormats) was incorrect. In addition, I had to put the 'Z' in the outputFormat between quotes, even though there are no quotes in https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/resources/te...

Regards.
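
The thread does not show the corrected inputFormats value, so the following is only a sketch of the two pattern issues described above. It assumes that convertTimestamp interprets its formats as java.text.SimpleDateFormat patterns, and it uses a hypothetical sample timestamp with a space between date and time. The class and method names are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimestampPatternDemo {

    // Returns true if the sample parses with the given SimpleDateFormat pattern.
    static boolean parses(String pattern, String sample) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setLenient(false);
        try {
            fmt.parse(sample);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Formats the epoch instant (0 ms) with the given pattern in the given timezone.
    static String formatEpoch(String pattern, String tz) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(tz));
        return fmt.format(new Date(0L));
    }

    public static void main(String[] args) {
        // Hypothetical input value; the real CSV data is not shown in the thread.
        String sample = "2014-03-05 10:15:30.123";

        // '' in a pattern means a literal single quote, so this expects
        // 2014-03-05'10:15:30.123 and rejects the sample.
        System.out.println(parses("yyyy-MM-dd''HH:mm:ss.SSS", sample));   // false
        System.out.println(parses("yyyy-MM-dd HH:mm:ss.SSS", sample));    // true

        // An unquoted Z is the RFC 822 zone offset; a quoted 'Z' is a literal letter.
        System.out.println(formatEpoch("yyyy-MM-dd'T'HH:mm:ss.SSSZ", "UTC"));   // 1970-01-01T00:00:00.000+0000
        System.out.println(formatEpoch("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "UTC")); // 1970-01-01T00:00:00.000Z
    }
}
```

Since a quoted 'Z' is emitted as a plain letter regardless of the zone, it only describes the value correctly when the output timezone is UTC, which is worth checking against the outputTimezone setting.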
5 REPLIES

Re: csv Morphline and solr

Expert Contributor

Check the log files of the MapReduce job and of the Solr server. The issue is probably that your morphline is missing a sanitizeUnknownSolrFields command.

Re: csv Morphline and solr

Explorer
Thanks for your answer.
I should point out that no error is reported during the process.
Regards,

Re: csv Morphline and solr

Explorer
After some tests, the morphline works without convertTimestamp (with the date indexed as a string in this case). Still working on it...

Re: csv Morphline and solr

New Contributor

Please share the final command you are executing to index.

I am getting the following exception:

2018-02-18 14:54:01,759 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 192
2018-02-18 14:54:01,765 ERROR [IPC Server handler 15 on 61527] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1516267882526_0659_m_000002_3 - exited : org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:220)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at java.net.URI.create(URI.java:852)
	at org.apache.solr.hadoop.PathParts.stringToUri(PathParts.java:128)
	at org.apache.solr.hadoop.PathParts.<init>(PathParts.java:48)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:192)
	... 10 more
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.checkChar(URI.java:3031)
	at java.net.URI$Parser.parse(URI.java:3047)
	at java.net.URI.<init>(URI.java:588)
	at java.net.URI.create(URI.java:850)
	... 13 more

Command:

hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/solr/contrib/mr/search-mr-1.0.0-cdh5.10.2-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  --solr-home-dir /var/lib/hadoop-hdfs/senario_config \
  --morphline-file /var/lib/hadoop-hdfs/senario_config/conf/morphline.conf \
  --output-dir hdfs://10.10.16.134:8020/solr/senario_collection/core_node1/data/index \
  --input-list hdfs://10.10.16.134:8020/user/hdfs/AMIT/nwsecp-emaster-20180115-175804.log \
  --shards 1
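
The stack trace fails in java.net.URI.create, called from PathParts.stringToUri, and the "URI" in the message is a whole CSV record. One plausible reading is that the lines of the file passed to --input-list were treated as input paths: that option expects a file listing the URIs to index, one per line, not the data file itself. The sketch below reproduces only the URI parsing step with plain JDK calls. The record is a shortened prefix of the one in the log, and the class and method names are hypothetical.

```java
import java.net.URI;

public class InputListUriDemo {

    // Returns null if the line parses as a URI, otherwise the parser's error message.
    static String uriError(String line) {
        try {
            URI.create(line);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Shortened prefix of the CSV record from the exception above.
        String record = "2018-01-10 05:31:10,2,100,100";
        // The HDFS path from the command above, as it could appear in a list file.
        String path = "hdfs://10.10.16.134:8020/user/hdfs/AMIT/nwsecp-emaster-20180115-175804.log";

        // Everything before the first ':' is read as a URI scheme, and a scheme
        // must start with a letter, so the record is rejected at index 0.
        System.out.println(uriError(record));
        System.out.println(uriError(path)); // null: a valid URI
    }
}
```

If that reading is right, passing the log file as a positional input argument, or pointing --input-list at a separate text file that contains the HDFS path on a line of its own, would be the things to try first.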