About fkrantz

fkrantz · 01-16-2015

I resolved this issue. The projectionSchemaString parameter is not needed in readAvroParquetFile; after removing it, everything worked. Cheers.
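For readers following along, the fix described above can be sketched as the reader command with the projection removed. This is a minimal sketch, not the author's exact final config: it assumes no other readAvroParquetFile parameters are needed and that, without a projection, the reader uses the schema stored in the Parquet file itself.

```
# Sketch only: readAvroParquetFile without projectionSchemaString.
# Assumption: the schema embedded in the Parquet file is sufficient here.
{
  readAvroParquetFile {
  }
}
```

The remaining commands (extractAvroPaths, addValues, logDebug, loadSolr) would stay as posted in the original question.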

fkrantz · 01-14-2015

Hi all, I'm using the following morphline to index some Parquet files:

    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**","org.apache.solr.**"]
        commands : [
          {
            readAvroParquetFile {
              projectionSchemaString : """
              {
                "name": "record_parquet",
                "namespace": "parquet.avro",
                "type": "record",
                "fields": [
                  { "name": "id", "type": ["null", "string"] },
                  { "name": "date_time", "type": ["null", "string"] },
                  { "name": "sessionid", "type": ["null", "string"] },
                  { "name": "client_id", "type": ["null", "string"] }
                ]
              }
              """
              # supportedMimeTypes : [avro/binary]
              # projectionSchemaString : "" # optional, avro json schema blurb for getSchema()
              # projectionSchemaFile : /path/to/syslog.avsc
            }
          }
          {
            extractAvroPaths {
              flatten : true
              paths : {
                id : /id
                date_time : "/date_time"
                session_id : /sessionid
                client_id : /client_id
              }
            }
          }
          {
            addValues {
              # add values "text/log" and "text/log2" to the source_type output field
              Channel : [canal]
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
          # load the record into a Solr server or MapReduce Reducer
          {
            loadSolr {
              solrLocator : {
                collection : parquet_test   # Name of solr collection
                zkHost : "$IP:2181/solr"    # ZooKeeper ensemble
                batchSize : 1000            # batchSize
              }
            }
          }
        ]
      }
    ]

Everything works except that the date_time field (already in Solr format, and of type date in schema.xml) is converted to Unix epoch format. Any ideas? Thanks in advance.

fkrantz · 12-17-2014

Update: I resolved the problem. The timestamp input format was incorrect. Moreover, I had to put the 'Z' in the outputFormat between quotes, even though the example at https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/resources/test-morphlines/convertTimestamp.conf has no quotes. Regards.
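The change described above can be sketched as follows. The post does not give the corrected input format, so the inputFormats pattern here is purely hypothetical; only the quoted 'Z' in outputFormat comes from the post. convertTimestamp patterns follow java.text.SimpleDateFormat, where an unquoted Z is a pattern letter that prints a numeric offset such as +0100, while 'Z' in single quotes is printed literally, which is the form Solr date fields expect. Because a literal Z denotes UTC, this sketch also assumes outputTimezone is set to UTC rather than Europe/Paris as in the original config.

```
# Sketch only: convertTimestamp with a literal 'Z' in the output format.
{
  convertTimestamp {
    field : date
    # Hypothetical pattern: the actual corrected input format is not stated in the post.
    inputFormats : ["yyyy-MM-dd HH:mm:ss.SSS"]
    inputTimezone : Europe/Paris
    # 'Z' is quoted, so it is emitted as the literal character Z.
    outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
    # Assumption: UTC, so that the literal Z is accurate.
    outputTimezone : UTC
  }
}
```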

fkrantz · 12-17-2014

After some tests, the morphline works without convertTimestamp (with the date kept as a string in this case). Still working on it...

fkrantz · 12-17-2014

Thanks for your answer. I should clarify that no error is reported during the process. Regards,

fkrantz · 12-16-2014

Hi, I'm trying to use the following morphline to index some CSV files:

    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
          {
            readCSV {
              separator : "\u0001"    # CSV file separator
              columns : [date, canal,filename,indicator,value,category_tags]    # CSV fields
              ignoreFirstLine : false    # whether to ignore the first line
              trim : false
              charset : UTF-8
            }
          }
          {
            convertTimestamp {
              field : date
              inputFormats : ["yyyy-MM-dd''HH:mm:ss.SSS"]
              inputTimezone : Europe/Paris
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
              outputTimezone : Europe/Paris
            }
          }
          {
            generateUUID {
              field : id
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
          # load the record into a Solr server or MapReduce Reducer
          {
            loadSolr {
              solrLocator : {
                collection : monitor_flows    # Name of solr collection
                zkHost : "sa12254:2181/solr,sa12253:2181/solr,sa12255:2181/solr"    # ZooKeeper ensemble
                batchSize : 1000    # batchSize
              }
            }
          }
        ]
      }
    ]

As you can see, I'm trying to convert the timestamp to the Solr format and to add a unique id. Unfortunately, nothing is indexed. Any help is welcome. Regards.

Member Since	09-15-2014 08:45 AM
Last Visited	03-11-2015 10:46 AM
Posts	9

Cloudera Community

Re: Morphline ReadAvroParquetFile timestamp proble...

Re: csv Morphline and solr

Re: Morphline ReadAvroParquetFile timestamp proble...

Morphline ReadAvroParquetFile timestamp problem

Re: csv Morphline and solr

Re: csv Morphline and solr

Re: csv Morphline and solr

csv Morphline and solr