<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question csv Morphline and solr in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22723#M4107</link>
    <description>&lt;P&gt;Hi, I'm trying to use the following morphline to index some CSV files:&lt;/P&gt;&lt;PRE&gt;morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : "\u0001"  # CSV field separator
          columns : [date, canal,filename,indicator,value,category_tags]  # CSV fields
          ignoreFirstLine : false  # whether to skip the first line
          trim : false
          charset : UTF-8
        }
      }
      {
        convertTimestamp {
          field : date
          inputFormats : ["yyyy-MM-dd''HH:mm:ss.SSS"]
          inputTimezone : Europe/Paris
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          outputTimezone : Europe/Paris
        }
      }
      {
        generateUUID {
          field : id
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : monitor_flows  # Name of solr collection
            zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr"  # ZooKeeper ensemble
            batchSize : 1000  # batchSize
          }
        }
      }
    ]
  }
]&lt;/PRE&gt;&lt;P&gt;As you can see, I'm trying to convert the timestamp to the Solr format and to add a unique id. Unfortunately, nothing is indexed. Any help would be welcome.&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:15:28 GMT</pubDate>
    <dc:creator>fkrantz</dc:creator>
    <dc:date>2022-09-16T09:15:28Z</dc:date>
    <item>
      <title>csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22723#M4107</link>
      <description>&lt;P&gt;Hi, I'm trying to use the following morphline to index some CSV files:&lt;/P&gt;&lt;PRE&gt;morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : "\u0001"  # CSV field separator
          columns : [date, canal,filename,indicator,value,category_tags]  # CSV fields
          ignoreFirstLine : false  # whether to skip the first line
          trim : false
          charset : UTF-8
        }
      }
      {
        convertTimestamp {
          field : date
          inputFormats : ["yyyy-MM-dd''HH:mm:ss.SSS"]
          inputTimezone : Europe/Paris
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          outputTimezone : Europe/Paris
        }
      }
      {
        generateUUID {
          field : id
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : monitor_flows  # Name of solr collection
            zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr"  # ZooKeeper ensemble
            batchSize : 1000  # batchSize
          }
        }
      }
    ]
  }
]&lt;/PRE&gt;&lt;P&gt;As you can see, I'm trying to convert the timestamp to the Solr format and to add a unique id. Unfortunately, nothing is indexed. Any help would be welcome.&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:15:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22723#M4107</guid>
      <dc:creator>fkrantz</dc:creator>
      <dc:date>2022-09-16T09:15:28Z</dc:date>
    </item>
    <item>
      <title>Re: csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22726#M4108</link>
      <description>&lt;P&gt;Check the log files of the MapReduce job and of the Solr server. The most likely cause is that your morphline is missing a sanitizeUnknownSolrFields command.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Dec 2014 19:27:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22726#M4108</guid>
      <dc:creator>whosch</dc:creator>
      <dc:date>2014-12-16T19:27:53Z</dc:date>
    </item>
    <item>
      <title>Re: csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22780#M4109</link>
      <description>Thanks for your answer.&lt;BR /&gt;I should point out that no error is reported during the process.&lt;BR /&gt;Regards,</description>
      <pubDate>Wed, 17 Dec 2014 09:47:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22780#M4109</guid>
      <dc:creator>fkrantz</dc:creator>
      <dc:date>2014-12-17T09:47:53Z</dc:date>
    </item>
    <item>
      <title>Re: csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22783#M4110</link>
      <description>After some tests, the morphline works when convertTimestamp is removed (the date is then indexed as a string). Still working on it...</description>
      <pubDate>Wed, 17 Dec 2014 10:16:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22783#M4110</guid>
      <dc:creator>fkrantz</dc:creator>
      <dc:date>2014-12-17T10:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22785#M4111</link>
      <description>Update: I resolved the problem. The timestamp input format was incorrect. I also had to put the 'Z' in the output format between quotes, even though there are no quotes in &lt;A target="_blank" href="https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/resources/test-morphlines/convertTimestamp.conf"&gt;https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/resources/test-morphlines/convertTimestamp.conf&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Regards.</description>
      <pubDate>Wed, 17 Dec 2014 12:56:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/22785#M4111</guid>
      <dc:creator>fkrantz</dc:creator>
      <dc:date>2014-12-17T12:56:44Z</dc:date>
    </item>
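    <!--
The fix described above comes down to the target format: Solr date fields expect UTC timestamps that end in a literal Z, whereas an unquoted Z in a Java SimpleDateFormat pattern (the pattern syntax that convertTimestamp uses) emits a numeric offset such as +0100. As a minimal illustration in Python rather than morphline syntax, the intended conversion could look like the sketch below. The input layout with a space between date and time, the function name to_solr_timestamp and the fixed CET offset are assumptions, since the thread does not show a sample CSV row.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumed input layout; the thread does not show a sample CSV row.
INPUT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def to_solr_timestamp(text: str, source_tz: tzinfo) -> str:
    """Parse a local timestamp and render it as UTC with a literal 'Z'."""
    local = datetime.strptime(text, INPUT_FORMAT).replace(tzinfo=source_tz)
    utc = local.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # keep three fractional digits
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


# CET (UTC+1) stands in for Europe/Paris in winter;
# zoneinfo.ZoneInfo("Europe/Paris") would also handle daylight saving time.
CET = timezone(timedelta(hours=1))
print(to_solr_timestamp("2014-12-17 13:56:44.250", CET))  # 2014-12-17T12:56:44.250Z
```

In the morphline itself, the equivalent would presumably be an outputFormat that ends in a quoted 'Z' together with a UTC outputTimezone; the thread only confirms the quoting.
    -->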
    <item>
      <title>Re: csv Morphline and solr</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/64692#M4112</link>
      <description>&lt;P&gt;Please share the final command you are running to index.&lt;/P&gt;&lt;P&gt;I am getting the following exception:&lt;/P&gt;&lt;PRE&gt;2018-02-18 14:54:01,759 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 192
2018-02-18 14:54:01,765 ERROR [IPC Server handler 15 on 61527] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1516267882526_0659_m_000002_3 - exited : org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:220)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at java.net.URI.create(URI.java:852)
	at org.apache.solr.hadoop.PathParts.stringToUri(PathParts.java:128)
	at org.apache.solr.hadoop.PathParts.&amp;lt;init&amp;gt;(PathParts.java:48)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:192)
	... 10 more
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: 2018-01-10 05:31:10,2,100,100,12,1,1515542470144,311480275243412,18052600405,5808,,310,590,6,190370299670,513,,334,020,7,52941000779800,513,,334,020,7,52941000779800,0,2,3,0,,6,,1,0,52941000779800,,190370299070,0,0,,0,0,,0,0,,0,0,0,0
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.checkChar(URI.java:3031)
	at java.net.URI$Parser.parse(URI.java:3047)
	at java.net.URI.&amp;lt;init&amp;gt;(URI.java:588)
	at java.net.URI.create(URI.java:850)
	... 13 more&lt;/PRE&gt;&lt;P&gt;Command:&lt;/P&gt;&lt;PRE&gt;hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/solr/contrib/mr/search-mr-1.0.0-cdh5.10.2-job.jar org.apache.solr.hadoop.MapReduceIndexerTool --solr-home-dir /var/lib/hadoop-hdfs/senario_config --morphline-file /var/lib/hadoop-hdfs/senario_config/conf/morphline.conf --output-dir hdfs://10.10.16.134:8020/solr/senario_collection/core_node1/data/index --input-list hdfs://10.10.16.134:8020/user/hdfs/AMIT/nwsecp-emaster-20180115-175804.log --shards 1&lt;/PRE&gt;</description>
      <pubDate>Sun, 18 Feb 2018 09:30:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/csv-Morphline-and-solr/m-p/64692#M4112</guid>
      <dc:creator>Mani</dc:creator>
      <dc:date>2018-02-18T09:30:14Z</dc:date>
    </item>
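    <!--
The stack trace above shows each input line being handed to URI.create via PathParts.stringToUri, which suggests that the file passed with the input-list option is read as a list of file URIs, one per line, rather than as data. A CSV row such as the one in the message then fails because a URI scheme must start with a letter. A rough pre-flight check, written in Python with hypothetical names (non_uri_lines, the sample path), might look like this:

```python
import re

# RFC 3986: a scheme starts with a letter, followed by letters, digits, '+', '.' or '-'.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def non_uri_lines(lines):
    """Return (line_number, text) for every non-empty line that lacks a URI scheme."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text and not SCHEME.match(text):
            bad.append((number, text))
    return bad


sample = [
    "hdfs://namenode.example:8020/data/part-0001.csv",  # hypothetical path
    "2018-01-10 05:31:10,2,100,100,12",                 # a data row, shortened
]
print(non_uri_lines(sample))  # [(2, '2018-01-10 05:31:10,2,100,100,12')]
```

If the file contains data rows, as it does here, it is probably meant to be given to the tool as an input file rather than as the list of inputs.
    -->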
  </channel>
</rss>

