How to read a Parquet file containing DECIMAL columns using a Morphlines config

New Contributor

I want to read Parquet files using Morphlines.

Reference: https://medium.com/@bkvarda/index-parquet-with-morphlines-and-solr-20671cd93a41

My Parquet file has DECIMAL columns. I cannot find any documentation on how to handle the DECIMAL datatype in Morphlines. I am using the config file below, which is not working: the DECIMAL fields appear as raw bytes in the log output at the end of this post.

===============================================================================

 

SOLR_LOCATOR : {
  # Name of solr collection
  #collection : citiscreening
  collection : icttdnee_ttsd_collection
  #solrHomeDir : ${HOME}/solr_citiscreening_configs
  # ZooKeeper ensemble -- edit this for your cluster's Zk hostname(s)
  zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr"

  #zkHost : "bdgtr018x01h2.nam.nsroot.net:2181,bdgtr013x03h2.nam.nsroot.net:2181,bdgtr015x02h2.nam.nsroot.net:2181/solr"
  #bdgtr013x04h2:9983/solr

  # The maximum number of documents to send to Solr per network batch (throughput knob)
  # batchSize : 1000
}

morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : solrTest

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "com.cloudera.**", "org.apache.solr.**"]

    commands : [

      # Read the Parquet data
      { readAvroParquetFile {
          # For Parquet files that were not written with the parquet.avro package
          # (e.g. Impala Parquet files) there is no Avro write schema stored in
          # the Parquet file metadata. To read such files using the
          # readAvroParquetFile command you must either provide an Avro reader
          # schema via the readerSchemaFile parameter, or a default Avro schema
          # will be derived using the standard mapping specification.

          # Optionally, use this Avro schema in JSON format inline for projection:
          readerSchemaString : """{ "type": "record"
            ,"name": "my_record"
            ,"fields": [
              {"name": "audit_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "alert_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "created_date", "type":["null","string"]}
              ,{"name": "event", "type":["null","string"]}
              ,{"name": "comments", "type":["null","string"]}
              ,{"name": "user_identifier", "type":["null","string"]}
              ,{"name": "user_role", "type":["null","string"]}
              ,{"name": "status", "type":["null","string"]}
              ,{"name": "step_identifier", "type":["null","string"]}
              ,{"name": "attachment_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "note_internal_id","type":["bytes","null"],"logicalType":"decimal","precision":38,"scale":10,"default":0 }
              ,{"name": "owner", "type":["null","string"]}
            ]
          }"""
        }
      }

      { logDebug { format : "output record {}", args : ["@{}"] } }

      { extractAvroPaths {
          flatten : true
          paths : {
            audit_internal_id : /audit_internal_id
            alert_id : /alert_id
            created_date : /created_date
            event : /event
            comments : /comments
            user_identifier : /user_identifier
            user_role : /user_role
            status : /status
            step_identifier : /step_identifier
            attachment_internal_id : /attachment_internal_id
            note_internal_id : /note_internal_id
            owner : /owner
          }
        }
      }

      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

      # load the record into a Solr server or MapReduce Reducer.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }

    ]
  }
]
==================================

Data in Logs:
Output logs: DEBUG org.kitesdk.morphline.stdlib.LogDebugBuilder$LogDebug - output record [{_attachment_body=[{"audit_internal_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002#\u0014I��\u0000", "alert_id": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u000F\u001A(��\u0000", "created_date": "2018-03-19", "event": "ALERT_REVIEWED", "comments": "Alert Reviewed, and Submitted by User:LV1234"}]
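
For context on the byte strings in the log above: both the Parquet and Avro specifications encode a DECIMAL as the big-endian two's-complement bytes of the unscaled integer, with the scale kept in the schema. The sketch below is plain Java, not a Morphlines or Kite API. It only shows how such bytes map back to a number, and might be adapted into a custom morphline command. The class name DecimalBytes and its decode methods are hypothetical, and the ByteBuffer overload assumes the field arrives as a java.nio.ByteBuffer, which is how Avro's generic API usually represents bytes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Morphlines, Kite, or Avro.
final class DecimalBytes {
    private DecimalBytes() {}

    // Interprets the bytes as a big-endian two's-complement unscaled integer
    // and applies the scale from the schema (10 in the reader schema above).
    // BigInteger rejects an empty array, so callers should handle nulls and
    // empty values before calling this.
    static BigDecimal decode(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Same conversion for a ByteBuffer; duplicate() leaves the caller's
    // buffer position untouched.
    static BigDecimal decode(ByteBuffer buf, int scale) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes);
        return decode(bytes, scale);
    }

    public static void main(String[] args) {
        byte[] raw = {0x04, (byte) 0xD2};   // unscaled value 1234
        System.out.println(decode(raw, 2)); // prints 12.34
    }
}
```

Separately, the Avro specification places logicalType, precision and scale on the bytes type itself rather than on the field, so the field-level annotations in the schema above may simply be ignored. Whether readAvroParquetFile honours logical types at all depends on the Avro version bundled with it, which is not confirmed here.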

1 REPLY

Re: How to read a Parquet file containing DECIMAL columns using a Morphlines config

New Contributor

Any help please?
