Created on 07-11-2018 11:00 PM - edited 09-16-2022 06:27 AM
How to read DECIMAL datatype present parquet file ... - Cloudera Community
I want to read Parquet files using Morphlines.
Reference: https://medium.com/@bkvarda/index-parquet-with-morphlines-and-solr-20671cd93a41
My Parquet file has columns of the DECIMAL datatype. I cannot find any documentation on how to handle DECIMAL in Morphlines. I am using the configuration below in my conf file, and it is not working.
===============================================================================
SOLR_LOCATOR : {
# Name of solr collection
#collection : citiscreening
collection : icttdnee_ttsd_collection
#solrHomeDir : ${HOME}/solr_citiscreening_configs
# ZooKeeper ensemble -- edit this for your cluster's Zk hostname(s)
zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr"
#zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr"
#bdgtr013x04h2:9983/solr
# The maximum number of documents to send to Solr per network batch (throughput knob)
# batchSize : 1000
}
morphlines : [
{
# Name used to identify a morphline. E.g. used if there are multiple
# morphlines in a morphline config file
id : solrTest
# Import all morphline commands in these java packages and their
# subpackages. Other commands that may be present on the classpath are
# not visible to this morphline.
importCommands : ["org.kitesdk.**", "com.cloudera.**", "org.apache.solr.**"]
commands : [
# Read the Parquet data
{ readAvroParquetFile {
# For Parquet files that were not written with the parquet.avro package
# (e.g. Impala Parquet files) there is no Avro write schema stored in
# the Parquet file metadata. To read such files using the
# readAvroParquetFile command you must either provide an Avro reader
# schema via the readerSchemaFile parameter, or a default Avro schema
# will be derived using the standard mapping specification.
# Optionally, use this Avro schema in JSON format inline for projection:
readerSchemaString:"""{ "type": "record"
,"name": "my_record"
,"fields": [
{"name": "audit_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
,{"name": "alert_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
,{"name": "created_date", "type":["null","string"]}
,{"name": "event", "type":["null","string"]}
,{"name": "comments", "type":["null","string"]}
,{"name": "user_identifier", "type":["null","string"]}
,{"name": "user_role", "type":["null","string"]}
,{"name": "status", "type":["null","string"]}
,{"name": "step_identifier", "type":["null","string"]}
,{"name": "attachment_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
,{"name": "note_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
,{"name": "owner", "type":["null","string"]}
]
}"""
}
}
{ logDebug { format : "output record {}", args : ["@{}"] } }
{ extractAvroPaths {
flatten : true
paths : {
audit_internal_id : /audit_internal_id
alert_id : /alert_id
created_date : /created_date
event : /event
comments : /comments
user_identifier : /user_identifier
user_role : /user_role
status : /status
step_identifier : /step_identifier
attachment_internal_id: /attachment_internal_id
note_internal_id : /note_internal_id
owner : /owner
}
}
}
{ sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
# load the record into a Solr server or MapReduce Reducer.
{ loadSolr { solrLocator : ${SOLR_LOCATOR} } }
]
}
]
==================================
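One thing that may be worth checking in the reader schema above: in the Avro specification, `logicalType`, `precision` and `scale` are attributes of the `bytes` type itself, not of the field, and the default of a union field has to match the first branch of the union. The sketch below generates a schema string in that shape. The helper names `decimal_field` and `string_field` are made up for illustration, only three of the fields are shown, and nothing in this thread confirms that `readAvroParquetFile` converts decimals even when the schema is declared this way.

```python
import json


def decimal_field(name, precision=38, scale=10):
    """Nullable field whose bytes branch carries the decimal annotation.

    Hypothetical helper: "null" comes first so that the default can be null.
    """
    return {
        "name": name,
        "type": [
            "null",
            {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": precision,
                "scale": scale,
            },
        ],
        "default": None,
    }


def string_field(name):
    """Nullable string field, same shape as in the original schema."""
    return {"name": name, "type": ["null", "string"]}


schema = {
    "type": "record",
    "name": "my_record",
    "fields": [
        decimal_field("audit_internal_id"),
        decimal_field("alert_id"),
        string_field("created_date"),
    ],
}

# Text that could be pasted into readerSchemaString (remaining fields omitted).
reader_schema_string = json.dumps(schema, indent=2)
print(reader_schema_string)
```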
Data in Logs:
Output logs: DEBUG org.kitesdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record [{_attachment_body=[{"audit_internal_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002#\u0014I��\u0000", "alert_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u000F\u001A(��\u0000", "created_date": "2018-03-19", "event": "ALERT_REVIEWED", "comments": "Alert Reviewed, and Submitted by User:LV1234"}]
Created 07-18-2018 04:09 AM
Expected: the actual values of the DECIMAL columns should be read, not the raw bytes shown in the log.
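For context on what the logged bytes are: Parquet and Avro both store a decimal as the big-endian two's-complement bytes of its unscaled integer, and the scale (10 in the schema above) says where the decimal point goes. The values in the log, with their runs of leading `\u0000` bytes, look consistent with that layout. Below is a minimal sketch of the decoding step, assuming the raw bytes of one field are already available as a `bytes` object. The function name `decode_avro_decimal` and the sample value are hypothetical, and the sample is not taken from the log, whose values are partly garbled.

```python
from decimal import Decimal


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Turn big-endian two's-complement unscaled bytes into a Decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact, so all
    # 38 digits survive regardless of the decimal context precision.
    return Decimal((sign, digits, -scale))


# Hypothetical 16-byte sample: unscaled integer 12345678900000, scale 10.
sample = (12345678900000).to_bytes(16, byteorder="big", signed=True)
print(decode_avro_decimal(sample, 10))
```

The same conversion in Java would be `new BigDecimal(new BigInteger(bytes), scale)`. Where to hook such a conversion into the morphline (for example in a custom command) is not settled anywhere in this thread.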
Created 07-11-2022 10:32 AM
Hi Pooja,
Did you ever find a solution for this?