Created on 03-01-201612:22 AM - edited on 02-20-202012:39 AM by SumitraMenon
Because patients visit many doctors, trends in their ailments and complaints may be difficult to identify. The steps in this article will help you address exactly this problem by creating a TagCloud of the most frequent complaints per patient. Below is a sample:
We will generate random HL7 MDM^T02 (v2.3) messages that contain a doctor's note about a fake patient and that patient's fake complaint to their doctor. Apache NiFi will be used to parse these messages and send them to Apache Solr. Finally Banana is used to create the visual dashboard.
In the middle of the dashboard is a TagCloud where the more frequently mentioned symptoms for a selected patient appear larger than others. Because this project relies on randomly generated data, some interesting results are possible. In this case, I got lucky and all the symptoms seem related to the patient's most frequent complaint: Morning Alcohol Drinking. The list of all possible symptoms comes from Google searches.
Summary of steps
Download and install the HDP Sandbox
Download and install the latest NiFi release
Download the HL7 message generator
Create a Solr dashboard to visualize the results
Create and execute a new NiFi flow
Detailed Step-by-step guide
1. Download and install the HDP Sandbox
Download the latest (2.3 as of this writing) HDP Sandbox here. Import it into VMware or VirtualBox, start the instance, and update the DNS entry on your host machine to point to the new instance’s IP. On Mac, edit /etc/hosts, on Windows, edit %systemroot%\system32\drivers\etc\ as administrator and add a line similar to the below:
192.168.56.102 sandbox sandbox.hortonworks.com
2. Download and install the latest NiFi release
Follow the directions here. These were the steps that I executed for 0.5.1
A big thank you to HAPI for their excellent library to parse and create HL7 messages on which my code relies. The generator creates a very simple MDM^T02 that includes an in-line note from a doctor. MDM stands for Medical Document Management, and T02 specifies that this is a message for a new document. For more details about this message type read this document. Here is a sample message for Beatrice Cunningham:
OBX|1|TX|1001^Reason For Visit: |1|Evaluated patient for skin_scaling. ||||||F
As a pre-requisite to executing the code, we need to install Java 8. Execute this on the Sandbox:
yum -y install java-1.8.0-openjdk.x86_64
Now, download the pre-build jar file that has the HL7 generator and execute it to create a single message in /tmp/hl7-messages. I chose to store the jar file in /var/ftp/pub because my IDE uploads files during code development. If you change this directory, also change it in the NiFi flow.
4. Create a Solr dashboard to visualize the results
Now we need to configure Solr to ignore some words that don't add value. We do this by modifying stopwords.txt
cat <<EOF > /opt/hostname-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/stopwords.txt
Next, we download the custom dashboard and start Solr in cloud mode
Click "Browse", then find the template on your local machine, click "Import", and close the Template Window.
Drag and drop to instantiate a new template:
Double click the new process group called HL7, and start all of the processes. To do so, hold down the Shift-key, and select all of the processes on the screen. Then click the "Start" button:
Here is a quick walk through of the processes starting in the top-left corner. First, we use ListFile process to get a directory listing from /tmp/hl7-messages. Second, the FetchFile process reads each file one-by-one, passes the contents to the next step, and deletes if successful. Third, the text file is parsed as an HL7 formatted message. Next, the UpdateAttribute and AttributesToJSON processes get the contents ready for insertion into Solr. Finally, we use the PutSolrContentStream process to add new documents via Solr REST API. The remaining two processes on the very bottom are for spawning the custom Java code and logging details for troubleshooting.
Now open the Banana UI. You should see a dashboard that looks similar to the screenshot in the beginning of this article. You can see how many messages have been processed by clicking the link in the top-right panel called "Filter By".
If you are not seeing any data in Solr/Banana, then reload the page. Also perform a search via this page to validate that results are being indexed via Solr correctly.