I used the wellindex.csv file to obtain a list of well file numbers (file_no), scraped each well's Production, Injection, and Scout Ticket web pages along with any available LAS-format well log files, and loaded everything into HDFS (/user/dev/wellbook/) for analysis.
To avoid the HDFS small-files problem, I used the Apache Mahout seqdirectory tool to combine the text files into SequenceFiles: the keys are the filenames and the values are the contents of each file.
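The SequenceFile layout produced by seqdirectory is just a sequence of (filename, contents) records. As a minimal sketch of that idea, the following pure-Python function (the function name and directory layout are illustrative, not part of Mahout) packs a directory of small text files into one list of key/value records:

```python
from pathlib import Path

def pack_directory(input_dir):
    """Combine many small text files into a single list of
    (key, value) records, mirroring what Mahout's seqdirectory
    tool does when it writes a SequenceFile:
    key = filename, value = file contents."""
    records = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        records.append((path.name, path.read_text()))
    return records
```

Writing one large record-oriented file like this keeps the NameNode from having to track millions of tiny file entries.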
I then used a combination of Hive queries and the pyquery Python library to parse the relevant fields out of the raw HTML pages.
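The parsing step boils down to pulling field values out of HTML table cells. The actual pipeline uses pyquery's CSS selectors (e.g. `pq(html)('td').text()`); as a self-contained, standard-library sketch of the same idea, the class below (names are illustrative) collects every `<td>` cell in document order:

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collect the text of every <td> cell, in document order.
    A stdlib stand-in for the pyquery CSS-selector approach
    used in the actual pipeline."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data.strip()

def extract_cells(html):
    """Return the text of every table cell in the page."""
    parser = FieldExtractor()
    parser.feed(html)
    return parser.cells
```

Pairing up adjacent cells then yields the label/value fields (file number, operator, status, and so on) that get loaded into Hive tables.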
3 Join with Production / EOR / Auction data (Power BI)
Get a 360-degree view of the well
<Hive tables - Master>
a. Predictive Analytics (Linear Regression)
b. Visualize the data using YARN-ready applications
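For the predictive-analytics step, a single-variable ordinary least-squares fit is enough to show the shape of the approach. This is a generic sketch, not the project's actual model; the example of regressing a production value on depth is hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.
    Returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y over variance of x.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical example: predict a production metric from well depth.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

In practice a tool like Spark MLlib or scikit-learn would replace this hand-rolled fit once more predictors are joined in from the Hive master tables.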
4 Dynamic Well Logs
Query multiple mnemonic readings for a single well, or for multiple wells in a given region. Normalize and graph the data at specific depth steps on the fly.
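Graphing curves from different wells on one axis needs two small transforms: rescaling readings recorded in different units, and aligning logs sampled at different depth increments. A minimal sketch of both (function names are illustrative):

```python
def normalize(readings):
    """Min-max normalize log readings to [0, 1] so curves recorded
    in different units can share one axis."""
    lo, hi = min(readings), max(readings)
    if hi == lo:
        return [0.0 for _ in readings]
    return [(r - lo) / (hi - lo) for r in readings]

def resample(depths, values, step):
    """Pick the reading nearest each requested depth step, a simple
    way to align wells logged at different depth increments."""
    out = []
    d = depths[0]
    while d <= depths[-1]:
        i = min(range(len(depths)), key=lambda j: abs(depths[j] - d))
        out.append((d, values[i]))
        d += step
    return out
```

A real implementation would interpolate between samples rather than take the nearest reading, but the nearest-sample version keeps the sketch short.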
5 Dynamic Time Warping
Run the algorithm per well, or across all wells and all mnemonics, and visualize the results to see which readings belong to the same curve class. With supervised machine learning, mnemonics belonging to the same curve class can then be bucketed automatically.
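Dynamic time warping compares two curves while allowing one to stretch or compress along the depth axis, so curves of the same class score as similar even when sampled differently. A minimal textbook implementation:

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.
    Fills a cumulative-cost matrix; each cell extends the cheapest
    of the three neighboring alignments (match, insert, delete).
    Small distances suggest two mnemonic curves belong to the
    same curve class."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a point in b
                                 cost[i][j - 1],      # skip a point in a
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]
```

For example, `[1, 2, 3]` and `[1, 2, 2, 3]` have DTW distance 0 because the repeated sample can be warped away, whereas a plain pointwise comparison would not even be defined for the two lengths.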
Build your own
Clone the git repository below and follow the steps in the README to create your own demo.