07-18-2018
04:09 AM
Expected: It should read the actual values of the DECIMAL data type.
07-11-2018
11:07 PM
While reading a Parquet file, how do I convert the Parquet DECIMAL data type to a String?
07-11-2018
11:00 PM
How to read DECIMAL datatype present parquet file ...
I want to read Parquet files using Morphlines.
Reference: https://medium.com/@bkvarda/index-parquet-with-morphlines-and-solr-20671cd93a41
My Parquet file has columns of the DECIMAL data type. I cannot find any documentation on how to handle DECIMAL in Morphlines. I am using the configuration below in my conf file, and it is not working.
===============================================================================
SOLR_LOCATOR : {
  # Name of solr collection
  #collection : citiscreening
  collection : icttdnee_ttsd_collection
  #solrHomeDir : ${HOME}/solr_citiscreening_configs

  # ZooKeeper ensemble -- edit this for your cluster's Zk hostname(s)
  zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr"
  #zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr" #bdgtr013x04h2:9983/solr

  # The maximum number of documents to send to Solr per network batch (throughput knob)
  # batchSize : 1000
}

morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : solrTest

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "com.cloudera.**", "org.apache.solr.**"]

    commands : [
      # Read the Parquet data
      {
        readAvroParquetFile {
          # For Parquet files that were not written with the parquet.avro package
          # (e.g. Impala Parquet files) there is no Avro write schema stored in
          # the Parquet file metadata. To read such files using the
          # readAvroParquetFile command you must either provide an Avro reader
          # schema via the readerSchemaFile parameter, or a default Avro schema
          # will be derived using the standard mapping specification.

          # Optionally, use this Avro schema in JSON format inline for projection:
          readerSchemaString:"""{ "type": "record"
            ,"name": "my_record"
            ,"fields": [
              {"name": "audit_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "alert_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "created_date", "type":["null","string"]}
              ,{"name": "event", "type":["null","string"]}
              ,{"name": "comments", "type":["null","string"]}
              ,{"name": "user_identifier", "type":["null","string"]}
              ,{"name": "user_role", "type":["null","string"]}
              ,{"name": "status", "type":["null","string"]}
              ,{"name": "step_identifier", "type":["null","string"]}
              ,{"name": "attachment_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "note_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "owner", "type":["null","string"]}
            ] }"""
        }
      }

      { logDebug { format : "output record {}", args : ["@{}"] } }

      {
        extractAvroPaths {
          flatten : true
          paths : {
            audit_internal_id : /audit_internal_id
            alert_id : /alert_id
            created_date : /created_date
            event : /event
            comments : /comments
            user_identifier : /user_identifier
            user_role : /user_role
            status : /status
            step_identifier : /step_identifier
            attachment_internal_id: /attachment_internal_id
            note_internal_id : /note_internal_id
            owner : /owner
          }
        }
      }

      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

      # load the record into a Solr server or MapReduce Reducer.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
==================================
Output logs:
DEBUG org.kitesdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record [{_attachment_body=[{"audit_internal_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002#\u0014I��\u0000", "alert_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u000F\u001A(��\u0000", "created_date": "2018-03-19", "event": "ALERT_REVIEWED", "comments": "Alert Reviewed, and Submitted by User:LV1234"}]
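For context on what those byte strings probably are: the Parquet and Avro decimal logical types store the unscaled value as a big-endian two's-complement integer, and the scale (10 in the schema above) lives in the schema rather than in the data. Below is a minimal standalone Java sketch of turning such bytes into a readable string. The class name `DecimalBytes` and the sample bytes are hypothetical; they are not part of Morphlines or Kite and are not taken from the log. Wiring this into the morphline itself is not shown here.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not a Morphlines or Kite class.
class DecimalBytes {

    // Treats the bytes as a big-endian two's-complement unscaled integer
    // and applies the scale declared in the schema.
    static String decode(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale).toPlainString();
    }

    // Convenience overload for ByteBuffer input (assumption: the reader
    // hands "bytes" values over as a ByteBuffer). The buffer is duplicated
    // so the caller's read position is left untouched.
    static String decode(ByteBuffer buffer, int scale) {
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy);
        return decode(copy, scale);
    }

    public static void main(String[] args) {
        // Made-up sample: 0x04D2 is 1234, so with scale 2 the value is 12.34.
        byte[] sample = {0x04, (byte) 0xD2};
        System.out.println(decode(sample, 2)); // prints 12.34
    }
}
```

With the scale of 10 from the schema above, the same unscaled value 1234 would come out as 0.0000001234.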
Labels:
- Apache Solr