Resource chargeback

Contributor

Are there any use cases we can refer to on how to create a chargeback model for usage of the cluster? For example, we will have different business units that will run jobs and store data on the cluster, and internally, operations needs to define a chargeback for their usage. We were thinking about incorporating the cost per CPU/memory/storage, or the number of jobs/queues defined. Any ideas or guidance is always appreciated.

1 ACCEPTED SOLUTION


Hi there @Hassan Faouaz. I had an exchange with a colleague of mine, @Aaron Wiebe, quite a while ago that I often refer back to whenever this topic comes up. I'll paste it below and can certainly answer questions on it if you have any, as I'm pretty familiar with the approaches. I also know that people have built one-off chargeback/showback-style dashboards that pull data from clusters, but they tend to be heavily bespoke in each case, not something that could be easily shared.

Aaron's discussion on this topic is shown below:

-------------------------------------------------------------------------------------------------------------------------------

There are effectively two portions to implementing a chargeback system for Hadoop.

Building the Model

The first is to build a full TCO model for the Hadoop implementation, inclusive of both capital and operational costs. From this number, you should be able to calculate a full cost per month to run the Hadoop system.

Using the fully loaded cost per month, you then set two targets - and these targets are somewhat arbitrary:

The first target is the system utilization at which you reach a break-even cost metric - aka your target margin. For most people, this ranges from 60% to 80%. The goal of this target is to provide a realistic resource utilization point, while also giving yourself room to initiate an expansion once your average utilization for a month exceeds the target.

The second target is a resource split between CPU and storage. This will be driven primarily by the intended use cases. Most people will split the costs at 75% storage and 25% CPU, since they want to encourage CPU use on the platform - meaning that analysis is actually performed, rather than Hadoop simply being used as a storage mechanism.

Once you've built this model, you can calculate two costs: the cost per GB of storage per month, and the cost per hour per GB of memory for CPU utilization.
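
To make the arithmetic concrete, here is a rough sketch in Python with invented example figures for the monthly TCO, cluster capacity, target utilization, and the 75/25 split - substitute your own numbers:

```python
# Hypothetical example numbers -- substitute your own TCO and capacity figures.
monthly_tco = 100_000.00               # fully loaded cost per month (capex amortization + opex), USD
target_utilization = 0.70              # break-even utilization target (60-80% is typical)
storage_split, cpu_split = 0.75, 0.25  # share of cost recovered from storage vs. CPU/memory

# Cluster capacity (example figures).
usable_storage_gb = 2_000_000          # usable HDFS capacity after replication, in GB
total_memory_gb = 20_000               # total YARN memory available for containers, in GB
hours_per_month = 730

# Cost to recover from each resource pool.
storage_cost = monthly_tco * storage_split
cpu_cost = monthly_tco * cpu_split

# Unit rates at the target utilization: per GB of storage per month,
# and per GB-hour of memory for processing.
rate_per_gb_month = storage_cost / (usable_storage_gb * target_utilization)
rate_per_gb_hour = cpu_cost / (total_memory_gb * hours_per_month * target_utilization)

print(f"Storage: ${rate_per_gb_month:.4f} per GB-month")
print(f"Compute: ${rate_per_gb_hour:.6f} per GB-hour of memory")
```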

Implementing the Model

This step is relatively easy in comparison to the first part. There are two ways to implement this model: charge by reservation or charge by use. Charging by reservation is fairly straightforward - a given use-case or customer will request a certain amount of space, and they will be provided a quota. They are then charged as if their quota was fully utilized, since it is reserved for their use. For CPU/Memory, the calculation can be based on full use of their processing queue over the month.
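
A minimal sketch of the reservation-based calculation, assuming each business unit is mapped to an HDFS space quota and a YARN queue memory capacity; the tenant figures and unit rates below are invented examples in the same units as derived above:

```python
# Reservation-based chargeback: bill each tenant as if the reservation were fully used.
# Rates and tenant quotas below are hypothetical example figures.
RATE_PER_GB_MONTH = 0.05   # $ per GB of storage per month
RATE_PER_GB_HOUR = 0.0025  # $ per GB-hour of YARN memory
HOURS_PER_MONTH = 730

tenants = {
    # tenant: (HDFS space quota in GB, YARN queue memory capacity in GB)
    "finance":   (150_000, 2_000),
    "marketing": (50_000, 500),
}

for name, (storage_quota_gb, queue_memory_gb) in tenants.items():
    storage_charge = storage_quota_gb * RATE_PER_GB_MONTH
    compute_charge = queue_memory_gb * HOURS_PER_MONTH * RATE_PER_GB_HOUR
    print(f"{name}: storage ${storage_charge:,.2f} + compute ${compute_charge:,.2f} "
          f"= ${storage_charge + compute_charge:,.2f} / month")
```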

Charging by use is more complex and less common - however, it is possible. Today, content from the logging subsystems needs to be pulled to determine processing usage, and the filesystem needs to be traversed to determine storage usage. One danger in this approach is that users could, in theory, dump their datasets just before the end of the month, resulting in skewed numbers - but that is also easy to detect.
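
One possible sketch of usage-based billing: pull per-application memory usage from the YARN ResourceManager REST API and storage usage from HDFS, then apply the rates from the model. The ResourceManager URL, the queue-to-directory mapping, the rates, and the billing window below are all assumptions for illustration, not a packaged solution:

```python
# Usage-based chargeback sketch: bill on measured consumption instead of reservations.
# Assumes each business unit maps to a YARN queue and an HDFS directory under /data/<queue>;
# the ResourceManager URL, rates, and directory layout are illustrative assumptions.
import subprocess
from collections import defaultdict

import requests

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical ResourceManager address
RATE_PER_GB_HOUR = 0.0025                           # example compute rate, $ per GB-hour
RATE_PER_GB_MONTH = 0.05                            # example storage rate, $ per GB-month
WINDOW_START_MS = 1_719_792_000_000                 # billing window start, epoch milliseconds

# 1. Compute usage per queue from finished applications (memorySeconds is in MB-seconds).
resp = requests.get(
    f"{RM_URL}/ws/v1/cluster/apps",
    params={"states": "FINISHED", "startedTimeBegin": WINDOW_START_MS},
)
apps = (resp.json().get("apps") or {}).get("app") or []

gb_hours_by_queue = defaultdict(float)
for app in apps:
    gb_hours_by_queue[app["queue"]] += app["memorySeconds"] / 1024 / 3600  # MB-sec -> GB-hours

# 2. Storage usage per business-unit directory, by asking HDFS for the space used.
def hdfs_usage_gb(path):
    out = subprocess.check_output(["hdfs", "dfs", "-du", "-s", path], text=True)
    return int(out.split()[0]) / 1024 ** 3  # first column is bytes stored (pre-replication)

# 3. Combine into a charge per business unit for the billing window.
for queue, gb_hours in sorted(gb_hours_by_queue.items()):
    storage_gb = hdfs_usage_gb(f"/data/{queue}")  # assumes a /data/<queue> layout
    charge = gb_hours * RATE_PER_GB_HOUR + storage_gb * RATE_PER_GB_MONTH
    print(f"{queue}: {gb_hours:,.0f} GB-hours compute, {storage_gb:,.0f} GB stored "
          f"-> ${charge:,.2f}")
```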


2 REPLIES

Contributor

Thanks @drussell, I think this is a good place to start. Hopefully, in conjunction with PepperData, we can get somewhere.