
We need to ingest C-CDA files, which often contain odd element names. To compound the naming issues, the Extract C-CDA Attributes processor flattens the XML into attribute names separated by periods, and periods are not allowed in Apache Avro names.
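To make the naming rule concrete, here is a small Java check based on the Avro specification's rule for names: a name must start with a letter or underscore and may then contain only letters, digits and underscores. The class and method names are illustrative only, not part of NiFi or the Avro library.

```java
import java.util.regex.Pattern;

final class AvroNameCheck {
    // Avro name rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidAvroName(String name) {
        return AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // A flattened (dotted) name fails; the same name without periods passes.
        String[] names = {"problemSection.act_01.code", "problemSectionact_01code", "file-size", "01code"};
        for (String n : names) {
            System.out.println(n + " -> " + isValidAvroName(n));
        }
    }
}
```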

Apache NiFi DataFlow

  • Ingest C-CDA XML Files
  • Extract C-CDA Attributes
  • Route on Attribute
  • AttributeCleaner (my new custom processor)
  • AttributesToJSON
  • SetSchema
  • QueryRecord
  • ConvertAvroToORC
  • PutHDFS

AttributeCleanerProcessor is my new processor that renames every attribute to an Avro-safe name. This is a very simple version.

[Image: new processor]

[Image: attribute values]

These are the converted attribute names.

[Image: JSON value]

Above are the JSON fields with their values.

[Image: RouteOnAttribute]

Route when the code field is not null.
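In plain Java, the routing rule amounts to the check below. This is a hedged sketch: the attribute name `code.code` is an assumption (the article does not show the exact RouteOnAttribute property), and the sketch also treats a blank value as missing, which is slightly stricter than a pure null check.

```java
import java.util.Map;

final class CodeRoute {
    // Hypothetical attribute name; routing happens before AttributeCleaner,
    // so the name is assumed to still contain its period.
    static final String CODE_ATTRIBUTE = "code.code";

    // True when the code attribute exists and is not blank,
    // i.e. the flow file should continue down the matched route.
    static boolean hasCode(Map<String, String> attributes) {
        String code = attributes.get(CODE_ATTRIBUTE);
        return code != null && !code.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasCode(Map.of("code.code", "ABC123")));   // true
        System.out.println(hasCode(Map.of("filename", "doc1.xml")));  // false
    }
}
```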

[Image: QueryRecord]

If the code is active, convert the record to Apache ORC for Hive.

SQL Table

CREATE EXTERNAL TABLE IF NOT EXISTS ccda (problemSectionact_02observationproblemStatuscodecodeSystemName STRING, vitalSignsSectionorganizerobservations_05idsroot STRING, problemSectionact_02observationproblemStatusstatusCodecode STRING, vitalSignsSectionorganizerobservations_01texttext_01value STRING, vitalSignsSectionorganizerobservations_04texttext_01value STRING, vitalSignsSectionorganizercodecodeSystem STRING, vitalSignsSectionorganizerobservations_01valuesvalue STRING, vitalSignsSectionorganizerobservations_04valuesvalue STRING, problemSectionact_03observationidroot STRING, problemSectionact_02codecodeSystem STRING, vitalSignsSectionorganizerobservations_05effectiveTimevalue STRING, RouteOnAttributeRoute STRING, vitalSignsSectionorganizerobservations_03codecode STRING, vitalSignsSectionorganizerobservations_04statusCodecode STRING, problemSectionidroot STRING, codecode STRING, problemSectionact_01effectiveTimelow STRING, problemSectioncodecodeSystemName STRING, codedisplayName STRING, problemSectionact_01observationstatusCodecode STRING, vitalSignsSectionorganizerobservations_01idsroot STRING, vitalSignsSectionorganizerobservations_02idsroot STRING, vitalSignsSectionorganizerobservations_04idsroot STRING, vitalSignsSectionorganizerobservations_03idsroot STRING, problemSectionact_01observationeffectiveTimelow STRING, filecreationTime STRING, problemSectionact_01observationproblemStatusvaluesdisplayName STRING, problemSectionact_02observationvaluestranslationscode STRING, problemSectionact_03statusCodecode STRING, problemSectionact_03observationvaluestranslationscodeSystem STRING, problemSectionact_02idroot STRING, problemSectionact_03codecode STRING, problemSectionact_01observationidroot STRING, problemSectionact_02observationvaluestranslationscodeSystem STRING, problemSectionact_03observationproblemStatuscodecodeSystem STRING, problemSectionact_03observationproblemStatusvaluescodeSystemName STRING, vitalSignsSectionorganizercodecode STRING, 
vitalSignsSectionorganizerobservations_02statusCodecode STRING, problemSectionact_03observationvaluestranslationsdisplayName STRING, vitalSignsSectionorganizerobservations_03codecodeSystem STRING, problemSectionact_03observationproblemStatuscodecode STRING, problemSectionact_01observationproblemStatusstatusCodecode STRING, vitalSignsSectionorganizerobservations_04textreferencevalue STRING, filelastAccessTime STRING, vitalSignsSectionorganizerobservations_01codedisplayName STRING, filesize STRING, problemSectioncodecodeSystem STRING, vitalSignsSectionorganizerobservations_01valuesunit STRING, vitalSignsSectionorganizerobservations_02effectiveTimevalue STRING, vitalSignsSectionorganizerobservations_05idsextension STRING, vitalSignsSectionorganizerobservations_04codecode STRING, vitalSignsSectionorganizerobservations_05valuesvalue STRING, vitalSignsSectionorganizerobservations_04idsextension STRING, vitalSignsSectionorganizerobservations_02valuesvalue STRING, problemSectionact_02observationproblemStatusvaluescodeSystem STRING, vitalSignsSectionorganizerobservations_02idsextension STRING, vitalSignsSectionorganizerobservations_03idsextension STRING, vitalSignsSectionorganizerobservations_01idsextension STRING, problemSectiontitle STRING, vitalSignsSectionorganizerobservations_01codecodeSystemName STRING, problemSectionact_03observationvaluestranslationsoriginalTextreferencevalue STRING, vitalSignsSectionorganizerobservations_04valuesunit STRING, problemSectionact_02idextension STRING, vitalSignsSectionorganizerobservations_05statusCodecode STRING, vitalSignsSectionorganizerobservations_04effectiveTimevalue STRING, problemSectionact_02observationvaluestranslationscodeSystemName STRING, vitalSignsSectionorganizerobservations_03valuesvalue STRING, vitalSignsSectionorganizereffectiveTimevalue STRING, problemSectionact_03observationvaluestranslationscode STRING, vitalSignsSectionorganizerobservations_03codedisplayName STRING, 
vitalSignsSectionorganizerobservations_02texttext_01value STRING, vitalSignsSectionorganizerobservations_05texttext_01value STRING, absolutepath STRING, vitalSignsSectioncodedisplayName STRING, problemSectionact_03idextension STRING, problemSectionact_01observationvaluestranslationsoriginalTextreferencevalue STRING, problemSectionact_02observationvaluestranslationsoriginalTextreferencevalue STRING, filelastModifiedTime STRING, problemSectioncodecode STRING, vitalSignsSectionorganizeridsroot STRING, problemSectionact_02observationproblemStatuscodecodeSystem STRING, vitalSignsSectionorganizerobservations_05codecodeSystem STRING, filegroup STRING, problemSectionact_01observationproblemStatusvaluescode STRING, problemSectionact_01observationvaluestranslationsdisplayName STRING, problemSectionact_02codecode STRING, idextension STRING, vitalSignsSectioncodecode STRING, problemSectionact_03observationproblemStatuscodecodeSystemName STRING, problemSectionact_01idroot STRING, vitalSignsSectiontitle STRING, problemSectionact_01observationproblemStatuscodecodeSystemName STRING, vitalSignsSectionorganizerobservations_03valuesunit STRING, vitalSignsSectionorganizerobservations_01textreferencevalue STRING, effectiveTime STRING, vitalSignsSectionorganizerobservations_03codecodeSystemName STRING, problemSectionact_03observationstatusCodecode STRING, problemSectionact_02statusCodecode STRING, problemSectionact_02observationidextension STRING, problemSectionact_01idextension STRING, vitalSignsSectionorganizerstatusCodecode STRING, vitalSignsSectionorganizerobservations_05codedisplayName STRING, vitalSignsSectionorganizerobservations_04codecodeSystem STRING, vitalSignsSectionorganizerobservations_02codedisplayName STRING, problemSectionact_01observationvaluestranslationscodeSystemName STRING, vitalSignsSectionorganizerobservations_05codecode STRING, vitalSignsSectionorganizerobservations_04codecodeSystemName STRING, problemSectionact_02observationvaluestranslationsdisplayName STRING, 
idroot STRING, vitalSignsSectionorganizerobservations_02textreferencevalue STRING, problemSectionact_01observationidextension STRING, problemSectionact_01observationvaluestranslationscodeSystem STRING, problemSectionact_01codecode STRING, problemSectionact_02observationproblemStatusvaluesdisplayName STRING, problemSectionact_01codecodeSystem STRING, codecodeSystemName STRING, vitalSignsSectionorganizerobservations_01effectiveTimevalue STRING, vitalSignsSectionorganizercodedisplayName STRING, vitalSignsSectionorganizerobservations_02codecodeSystemName STRING, vitalSignsSectionorganizerobservations_03textreferencevalue STRING, vitalSignsSectionorganizerobservations_02valuesunit STRING, problemSectionact_03observationproblemStatusvaluesdisplayName STRING, problemSectionact_02observationproblemStatuscodecode STRING, vitalSignsSectionorganizerobservations_03statusCodecode STRING, problemSectionact_03observationproblemStatusvaluescode STRING, problemSectionact_02observationeffectiveTimelow STRING, problemSectionact_03observationvaluestranslationscodeSystemName STRING, vitalSignsSectionorganizerobservations_03texttext_01value STRING, problemSectionact_01observationproblemStatuscodecodeSystem STRING, problemSectionact_03observationproblemStatusvaluescodeSystem STRING, vitalSignsSectionorganizercodecodeSystemName STRING, problemSectionact_03observationidextension STRING, vitalSignsSectionorganizerobservations_01codecode STRING, codecodeSystem STRING, problemSectionact_02effectiveTimelow STRING, problemSectioncodedisplayName STRING, problemSectionact_02observationproblemStatusvaluescode STRING, vitalSignsSectionorganizeridsextension STRING, problemSectionact_02observationstatusCodecode STRING, vitalSignsSectionorganizerobservations_02codecode STRING, title STRING, problemSectionact_03idroot STRING, problemSectionidextension STRING, problemSectionact_03observationproblemStatusstatusCodecode STRING, problemSectionact_03effectiveTimelow STRING, 
problemSectionact_02observationproblemStatusvaluescodeSystemName STRING, fileowner STRING, vitalSignsSectionorganizerobservations_01statusCodecode STRING, vitalSignsSectionorganizerobservations_05textreferencevalue STRING, filepermissions STRING, vitalSignsSectionorganizerobservations_02codecodeSystem STRING, vitalSignsSectionorganizerobservations_05valuesunit STRING, problemSectionact_01observationvaluestranslationscode STRING, problemSectionact_01statusCodecode STRING, vitalSignsSectionorganizerobservations_05codecodeSystemName STRING, problemSectionact_03codecodeSystem STRING, vitalSignsSectioncodecodeSystem STRING, problemSectionact_01observationproblemStatusvaluescodeSystemName STRING, vitalSignsSectioncodecodeSystemName STRING, problemSectionact_01observationproblemStatuscodecode STRING, problemSectionact_02observationidroot STRING, vitalSignsSectionorganizerobservations_01codecodeSystem STRING, problemSectionact_01observationproblemStatusvaluescodeSystem STRING, vitalSignsSectionorganizerobservations_03effectiveTimevalue STRING, vitalSignsSectionorganizerobservations_04codedisplayName STRING, problemSectionact_03observationeffectiveTimelow STRING) STORED AS ORC
LOCATION '/ccda'
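Every column in this table is simply a cleaned attribute name typed as STRING. The hypothetical helper below (not part of the article's flow; `CcdaDdl` and `createTable` are illustrative names) sketches how a DDL of the same shape could be assembled from a list of cleaned names instead of being typed by hand.

```java
import java.util.List;
import java.util.StringJoiner;

final class CcdaDdl {
    // Builds a Hive DDL with one STRING column per cleaned attribute name,
    // stored as ORC at the given HDFS location.
    static String createTable(String table, List<String> columns, String location) {
        StringJoiner cols = new StringJoiner(",\n  ", "(\n  ", "\n)");
        for (String c : columns) {
            cols.add(c + " STRING");
        }
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table + " " + cols
                + "\nSTORED AS ORC\nLOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        // A few column names taken from the table above.
        List<String> columns = List.of("codecode", "codedisplayName", "filesize");
        System.out.println(createTable("ccda", columns, "/ccda"));
    }
}
```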

AttributeCleaner

Apache Avro names can't contain spaces, dots, dashes or other special symbols, so we remove them.

Dirty Name

problem.section.act_01...

Clean Safe Name

problemSectionact_01observationvaluestranslationsoriginalTextreferencevalue

In particular, we need to remove the periods that the Extract C-CDA Attributes processor adds when it flattens the XML.
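The cleaning rule can be sketched in plain Java as below. This is not the code from the linked repository; it is an assumption-based sketch that keeps only letters, digits and underscores. It also adds two behaviours the article does not mention: prefixing an underscore when the result would start with a digit or be empty, and keeping the first value when two dirty names collapse to the same clean name. `AttributeNameCleaner`, `clean` and `cleanAll` are hypothetical names, and the dotted input in `main` is an assumed example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeNameCleaner {
    // Drops every character Avro does not allow in a name (dots, spaces,
    // dashes, other symbols), keeping ASCII letters, digits and underscores.
    static String clean(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "");
        // Assumption beyond the article: Avro names cannot start with a digit
        // or be empty, so prefix an underscore in those cases.
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    // Renames every key of an attribute map; if two dirty names clean to the
    // same name, the value encountered first in iteration order is kept.
    static Map<String, String> cleanAll(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            result.putIfAbsent(clean(e.getKey()), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String dirty = "problemSection.act_01.observation.values.translations.originalText.reference.value";
        System.out.println(clean(dirty));
        // prints problemSectionact_01observationvaluestranslationsoriginalTextreferencevalue
    }
}
```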

[Image: C-CDA flow]

Source Code for Custom Processor

https://github.com/tspannhw/nifi-attributecleaner-processor

References

Apache Calcite SQL reference (QueryRecord's SQL is handled by Calcite): http://calcite.apache.org/docs/reference.html

Apache NiFi Flow

c-cda-ingest-with-custom-processor.xml

Last update: 08-17-2019 09:58 AM