Operational Costs
- Count the operational costs for the product (see the sketch below)
- Software (?)
- Extract cost insights from Local LLM Challenge | Speed vs Efficiency (15:40)
- etc., all other costs involved
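As a rough starting point for counting those costs, here is a minimal sketch of the per-prompt electricity cost of a locally hosted LLM. Every number in it (power draw, generation speed, electricity price) is a hypothetical placeholder, not a figure taken from the video above.

```python
# Rough per-prompt electricity cost for a locally hosted LLM.
# All constants below are hypothetical placeholders, not measured values.

POWER_DRAW_W = 350        # assumed average wall power while generating, in watts
TOKENS_PER_SECOND = 30    # assumed generation speed of the local model
KWH_PRICE = 0.30          # assumed electricity price per kWh, in your currency


def electricity_cost_per_prompt(output_tokens: int) -> float:
    """Cost of the energy used to generate one response of `output_tokens` tokens."""
    generation_seconds = output_tokens / TOKENS_PER_SECOND
    energy_kwh = POWER_DRAW_W * generation_seconds / 3_600_000  # W * s -> kWh
    return energy_kwh * KWH_PRICE


if __name__ == "__main__":
    # Example: a 500-token answer
    print(f"{electricity_cost_per_prompt(500):.6f} per prompt")
```

With these placeholder inputs a 500-token answer costs well under a cent of electricity; the point of the sketch is the formula, not the specific numbers.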
Considerations
Hardware cost considerations
There are different options for consuming LLMs (compared in the sketch after this list):
- Locally, on your own hardware
- Remotely, from LLM-hosting services (tokens as a service)
- Remotely, by renting hardware in a GPU cloud
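To compare the three options on one axis, you can normalize everything to a cost per million generated tokens. The sketch below does that with hypothetical inputs: the GPU rental rate, the hosted token price, and the local power draw and speed are all assumptions for illustration, not real quotes.

```python
# Normalize the three consumption options to a cost per 1M generated tokens.
# Every constant here is a hypothetical assumption, for illustration only.

KWH_PRICE = 0.30            # assumed electricity price per kWh
LOCAL_POWER_W = 350         # assumed wall power of the local machine while generating
LOCAL_TOK_PER_S = 30        # assumed local generation speed

HOSTED_PRICE_PER_1M = 2.00  # assumed "tokens as a service" price per 1M output tokens

CLOUD_GPU_PER_HOUR = 1.50   # assumed GPU cloud rental rate per hour
CLOUD_TOK_PER_S = 80        # assumed generation speed on the rented GPU


def local_cost_per_1m() -> float:
    """Electricity cost only; hardware purchase is not amortized here."""
    hours = 1_000_000 / LOCAL_TOK_PER_S / 3600
    return hours * (LOCAL_POWER_W / 1000) * KWH_PRICE


def cloud_cost_per_1m() -> float:
    """Rental cost of the GPU for the time it takes to generate 1M tokens."""
    hours = 1_000_000 / CLOUD_TOK_PER_S / 3600
    return hours * CLOUD_GPU_PER_HOUR


print(f"local (electricity only): {local_cost_per_1m():.2f}")
print(f"hosted tokens:            {HOSTED_PRICE_PER_1M:.2f}")
print(f"GPU cloud rental:         {cloud_cost_per_1m():.2f}")
```

Note that the local figure covers electricity only; a fuller comparison would also amortize the upfront hardware purchase over its useful lifetime.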
Conclusion
Taken together, the calculations above are one more argument in favor of consuming LLMs locally, hosted on your own machine, provided that power consumption, and therefore the electricity bill, stays low. Locally self-hosted LLMs are probably where the industry will head. To enable that, we need much lower power consumption than we have today, and much more compute in the hardware we own locally.
Scale
Now, given these numbers, consider scaling to a larger number of prompts / users. Ideally, you would not want to be constrained:
- energy-wise (power consumption, and therefore the electricity bill, stays low)
- or in the number of prompts
In other words: any number of prompts, with a low electricity bill. That requires very efficient hardware, with more compute per unit of energy spent (see the sketch below).
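A minimal sketch of how the electricity bill grows with prompt volume, assuming a hypothetical per-prompt energy figure and electricity price (placeholder numbers again):

```python
# How the monthly electricity bill scales with prompt volume.
# Hypothetical placeholder numbers, for illustration only.

ENERGY_PER_PROMPT_KWH = 0.0016  # assumed energy per prompt (e.g. ~500 tokens locally)
KWH_PRICE = 0.30                # assumed electricity price per kWh


def monthly_bill(prompts_per_day: int) -> float:
    """Electricity cost of serving `prompts_per_day` prompts for 30 days."""
    return prompts_per_day * 30 * ENERGY_PER_PROMPT_KWH * KWH_PRICE


for prompts in (100, 1_000, 10_000, 100_000):
    print(f"{prompts:>7} prompts/day -> {monthly_bill(prompts):8.2f} per month")
```

The bill scales linearly with prompt count, so the only way to stay unconstrained on both axes is to drive down the energy spent per prompt, i.e. hardware that delivers more tokens per joule.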