Many of us in Hortonworks Community Connection feel most at home when we are talking about technologies, tools, and the "animals in the zoo". However, if we want to grow the data lake and gain support from the business, we have to learn to think a little differently and use a new vocabulary to communicate.

Start by meeting with the business to identify possible use cases. Talk to the analysts about the highest priorities and pain points for the business. Before thinking about anything remotely resembling a Hadoop animal, summarize "what" needs to be done. It may take several interviews with different business analysts to gain a full understanding of the problem.

Then determine whether Big Data can solve the problem. Are data silos preventing the organization from getting a complete view of the customer or logistics? Is the volume of data required to solve the problem too large or too expensive for existing systems to handle? Is the unstructured or semi-structured data required to solve the problem a poor fit for existing systems? If the answer to any of these questions is yes, then Big Data is likely a good fit.

Next, calculate the return of the solution to the business. Return can come from cost savings due to increased efficiency or reduced loss, from increased sales resulting from improved customer satisfaction, or from new revenue and growth driven by new data products.

Then estimate the investment required for the solution. What are the costs of the development and infrastructure required for the solution? How much will it cost to operationalize the solution? How much will it cost to maintain the solution in coming years?

The value of the solution is the return minus the investment. Project the figures out over several years. In the first year, the development, infrastructure, and operationalization costs will most likely be higher, so the value will be lower. However, if the maintenance costs are low, years two and three may have much higher value with lower investment.
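The multi-year projection can be sketched in a few lines of Python. This is only an illustration: the class name `YearEstimate`, the function `project_value`, and all of the dollar figures are hypothetical and are not taken from any real project.

```python
from dataclasses import dataclass


@dataclass
class YearEstimate:
    year: int
    expected_return: float  # estimated savings plus new revenue for the year
    investment: float       # development, infrastructure, operations, maintenance

    @property
    def value(self) -> float:
        # Value is the return minus the investment.
        return self.expected_return - self.investment


def project_value(estimates):
    """Return (year, value, cumulative value) rows for each yearly estimate."""
    rows = []
    cumulative = 0.0
    for estimate in estimates:
        cumulative += estimate.value
        rows.append((estimate.year, estimate.value, cumulative))
    return rows


# Hypothetical figures: high up-front costs in year one, lower maintenance later.
estimates = [
    YearEstimate(year=1, expected_return=400_000, investment=600_000),
    YearEstimate(year=2, expected_return=500_000, investment=150_000),
    YearEstimate(year=3, expected_return=550_000, investment=150_000),
]

projection = project_value(estimates)
for year, value, cumulative in projection:
    print(f"Year {year}: value = {value:,.0f}, cumulative = {cumulative:,.0f}")
```

With these made-up numbers, the first year shows a negative value, and the solution pays for itself during the second year once the up-front costs are behind it.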

Let's look at some example use cases:

1. Customer 360 brings everything that the organization knows about the customer into the data lake. The insights gained from Customer 360 can reduce churn, improve customer loyalty, and improve campaign effectiveness. The return is the estimate of increased sales due to reduced churn and better campaign performance. The investment is the cost to develop the Customer 360, the cost to obtain the data needed, the infrastructure and personnel required to run the system, and the training required to enable analysts to use it effectively.

2. Fraud detection prevents loss due to theft. For example, a retailer can flag fraudulent returns of stolen goods or detect theft of merchandise. The return is estimated by measuring the amount of loss that could be prevented. The investment is the cost to develop the system, the cost of the infrastructure and personnel to run the system, and the cost to deploy the system to stores.

3. Predictive maintenance minimizes downtime and reduces the cost of maintaining machinery in a factory or vehicles in a fleet. It uses algorithms that look at the historical failure of parts and the operating conditions of the machines and determine what maintenance needs to be done and when. The return of predictive maintenance is calculated from the reductions in downtime or breakdowns and the savings in parts and labor from doing maintenance only when the operating conditions indicate it. How much does a breakdown or downtime cost? Will the contents of the vehicle be lost if the vehicle is down for a lengthy period of time? How much is lost in sales when a delivery is not completed? How much is spent on maintenance, and what is the cost of preventable maintenance? The investment is the cost to collect the machinery or vehicle information, the cost to develop the algorithms, and the cost of the infrastructure needed to collect and process the machine or fleet data.

Examine the results of the use case discovery and build a roadmap that shows which use cases will be implemented and when each implementation will start and end. Create a map of the use cases on two dimensions: value and difficulty of implementation. Start with the high-value use cases that are easy to implement. Save the use cases that have higher value but are more difficult to implement for later in the roadmap. By then, your team will be more experienced and better able to tackle them.
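One possible way to turn the value and difficulty map into an ordered roadmap is sketched below. The `UseCase` class, the `build_roadmap` function, the 1 to 5 difficulty scale, the `easy_threshold` cutoff, and the scores assigned to the three example use cases are all assumptions made for illustration; real scores would come from the discovery interviews and estimates.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    value: float     # estimated value: return minus investment (hypothetical)
    difficulty: int  # assumed scale: 1 = easy to implement, 5 = very hard


def build_roadmap(use_cases, easy_threshold=2):
    """Order use cases: easy ones first by descending value, then harder ones."""
    easy = [u for u in use_cases if u.difficulty <= easy_threshold]
    hard = [u for u in use_cases if u.difficulty > easy_threshold]
    # Among the easy use cases, deliver the most valuable first.
    easy.sort(key=lambda u: -u.value)
    # Defer the harder ones, taking the least difficult of them first.
    hard.sort(key=lambda u: (u.difficulty, -u.value))
    return easy + hard


# Hypothetical scores for the three example use cases.
candidates = [
    UseCase("Customer 360", value=900_000, difficulty=4),
    UseCase("Fraud detection", value=500_000, difficulty=2),
    UseCase("Predictive maintenance", value=700_000, difficulty=3),
]

roadmap = build_roadmap(candidates)
for position, use_case in enumerate(roadmap, start=1):
    print(f"{position}. {use_case.name} "
          f"(value {use_case.value:,.0f}, difficulty {use_case.difficulty})")
```

With these made-up scores, fraud detection comes first because it is the only easy use case, and Customer 360, the most valuable but hardest, is saved for last. The threshold and the tie-breaking rules are a design choice, not a rule from the article; adjust them to match how your organization weighs quick wins against larger payoffs.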

Communicate the roadmap to the business in terms of the value and investment required. Don't dive into too many technical details. Keep it high level and focus on the what and the why.

When you start executing on your use cases, don't forget to measure. Tracking your actual return and investment will help you realize the value of the solutions and improve your estimation skills going forward.
