Operational Costs
- Count the operational costs for the product (see the sketch below)
- Software (?)
- Extract cost insights from Local LLM Challenge | Speed vs Efficiency (15:40)
- etc., all other costs involved
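As a rough starting point for counting those costs, here is a minimal sketch of the per-prompt electricity cost of a locally hosted LLM. Every number in it (power draw, generation speed, electricity price) is a hypothetical placeholder, not a figure taken from the video above.

```python
# Rough per-prompt electricity cost for a locally hosted LLM.
# All constants below are hypothetical placeholders, not measured values.

POWER_DRAW_W = 350        # assumed average wall power while generating, in watts
TOKENS_PER_SECOND = 30    # assumed generation speed of the local model
KWH_PRICE = 0.30          # assumed electricity price per kWh, in your currency


def electricity_cost_per_prompt(output_tokens: int) -> float:
    """Cost of the energy used to generate one response of `output_tokens` tokens."""
    generation_seconds = output_tokens / TOKENS_PER_SECOND
    energy_kwh = POWER_DRAW_W * generation_seconds / 3_600_000  # W * s -> kWh
    return energy_kwh * KWH_PRICE


if __name__ == "__main__":
    # Example: a 500-token answer
    print(f"{electricity_cost_per_prompt(500):.6f} per prompt")
```

With these placeholder inputs a 500-token answer costs well under a cent of electricity; the point of the sketch is the formula, not the specific numbers.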
Considerations
Hardware cost considerations
There are different options for consuming LLMs (compared in the sketch after this list):
- Locally, on your own hardware
- Remotely, from LLM-hosting services (tokens as a service)
- Remotely, by renting hardware in a GPU cloud
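To compare the three options on one axis, you can normalize everything to a cost per million generated tokens. The sketch below does that with hypothetical inputs: the GPU rental rate, the hosted token price, and the local power draw and speed are all assumptions for illustration, not real quotes.

```python
# Normalize the three consumption options to a cost per 1M generated tokens.
# Every constant here is a hypothetical assumption, for illustration only.

KWH_PRICE = 0.30            # assumed electricity price per kWh
LOCAL_POWER_W = 350         # assumed wall power of the local machine while generating
LOCAL_TOK_PER_S = 30        # assumed local generation speed

HOSTED_PRICE_PER_1M = 2.00  # assumed "tokens as a service" price per 1M output tokens

CLOUD_GPU_PER_HOUR = 1.50   # assumed GPU cloud rental rate per hour
CLOUD_TOK_PER_S = 80        # assumed generation speed on the rented GPU


def local_cost_per_1m() -> float:
    """Electricity cost only; hardware purchase is not amortized here."""
    hours = 1_000_000 / LOCAL_TOK_PER_S / 3600
    return hours * (LOCAL_POWER_W / 1000) * KWH_PRICE


def cloud_cost_per_1m() -> float:
    """Rental cost of the GPU for the time it takes to generate 1M tokens."""
    hours = 1_000_000 / CLOUD_TOK_PER_S / 3600
    return hours * CLOUD_GPU_PER_HOUR


print(f"local (electricity only): {local_cost_per_1m():.2f}")
print(f"hosted tokens:            {HOSTED_PRICE_PER_1M:.2f}")
print(f"GPU cloud rental:         {cloud_cost_per_1m():.2f}")
```

Note that the local figure covers electricity only; a fuller comparison would also amortize the upfront hardware purchase over its useful lifetime.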
Conclusion
Taken together, the calculations above are one more argument in favor of consuming LLMs locally, hosted on your own machine, provided that power consumption, and therefore the electricity bill, stays low. Locally self-hosted LLMs are probably where the industry will head. To enable that, we need much lower power consumption than we have today, and much more compute in the hardware we own locally.
Scale
Now, given these numbers, consider scaling to a larger number of prompts / users. Ideally, you would not want to be constrained:
- energy-wise (power consumption, and therefore the electricity bill, stays low)
- or in the number of prompts
In other words: any number of prompts, with a low electricity bill. That requires very efficient hardware, with more compute per unit of energy spent (see the sketch below).
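A minimal sketch of how the electricity bill grows with prompt volume, assuming a hypothetical per-prompt energy figure and electricity price (placeholder numbers again):

```python
# How the monthly electricity bill scales with prompt volume.
# Hypothetical placeholder numbers, for illustration only.

ENERGY_PER_PROMPT_KWH = 0.0016  # assumed energy per prompt (e.g. ~500 tokens locally)
KWH_PRICE = 0.30                # assumed electricity price per kWh


def monthly_bill(prompts_per_day: int) -> float:
    """Electricity cost of serving `prompts_per_day` prompts for 30 days."""
    return prompts_per_day * 30 * ENERGY_PER_PROMPT_KWH * KWH_PRICE


for prompts in (100, 1_000, 10_000, 100_000):
    print(f"{prompts:>7} prompts/day -> {monthly_bill(prompts):8.2f} per month")
```

The bill scales linearly with prompt count, so the only way to stay unconstrained on both axes is to drive down the energy spent per prompt, i.e. hardware that delivers more tokens per joule.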