Explorer
Posts: 9
Registered: ‎09-15-2014
Accepted Solution

Morphline ReadAvroParquetFile timestamp problem


Hi all, I'm using the following morphline to index some Parquet files:

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readAvroParquetFile {
          projectionSchemaString : """
            {
              "name": "record_parquet",
              "namespace": "parquet.avro",
              "type": "record",
              "fields": [
                { "name": "id", "type": ["null", "string"] },
                { "name": "date_time", "type": ["null", "string"] },
                { "name": "sessionid", "type": ["null", "string"] },
                { "name": "client_id", "type": ["null", "string"] }
              ]
            }
          """
          # supportedMimeTypes : [avro/binary]
          # projectionSchemaString : "" # optional, avro json schema blurb for getSchema()
          # projectionSchemaFile : /path/to/syslog.avsc
        }
      }

      {
        extractAvroPaths {
          flatten : true
          paths : {
            id : /id
            date_time : "/date_time"
            session_id : /sessionid
            client_id : /client_id
          }
        }
      }

      {
        addValues {
          # add values "text/log" and "text/log2" to the source_type output field
          Channel : [canal]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : {
            collection : parquet_test  # Name of solr collection
            zkHost : "$IP:2181/solr"   # ZooKeeper ensemble
            batchSize : 1000           # batchSize
          }
        }
      }
    ]
  }
]

 

 

Everything works except that the date_time field (already in Solr date format, and declared as a date type in schema.xml) is converted to Unix epoch format. Any idea why? Thanks in advance.

Explorer
Posts: 9
Registered: ‎09-15-2014

Re: Morphline ReadAvroParquetFile timestamp problem

I resolved this issue.
There is no need for projectionSchemaString in readAvroParquetFile.
After removing it, everything worked.

Cheers.
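
For readers applying the same fix, the first command of the morphline might look like the sketch below once the projection is removed. This is only an illustration of the change described above, assuming the remaining commands (extractAvroPaths, addValues, logDebug, loadSolr) stay exactly as posted; the comment about which schema the reader then uses is an assumption, not something confirmed in this thread.

```
{
  readAvroParquetFile {
    # projectionSchemaString removed, as described in the reply above.
    # Assumption: without a projection, the reader falls back to the
    # schema stored in the Parquet file itself.
  }
}
```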