Moved an AI feature into production a few months ago and the cost profile has been a constant surprise since. The demos and early prototypes ran cheap because volume was tiny and the prompts were short, but once it hit real traffic, token usage scaled much faster than expected. I think there were two drivers: customers ask longer, less structured questions than anything in our test set, and we ended up adding context retrieval that roughly doubled the input length on every call.
We started on GPT-4o for the early version, and the response quality was good enough that nobody pushed back. But after a few weeks of volume the bill came in much higher, and finance had no way to break out which feature or which model was driving it. Right now I am pulling exports from the OpenAI dashboard and trying to map them back to features manually, which is not sustainable.
I shipped the feature, so now I am the de facto owner of the cost question. The OpenAI dashboard tells me the total, but not what I actually need to answer. I spend half a day every week trying to reconcile token counts against feature usage, and I am still not confident in the numbers I hand off.
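For what it's worth, the workaround I've been converging on is tagging every API call with a feature name at call time and aggregating tokens and cost in our own ledger, instead of reverse-engineering the dashboard after the fact. A minimal sketch of that idea, assuming you can read `prompt_tokens` / `completion_tokens` off each response (the OpenAI SDK exposes these on `response.usage`) and that the per-million-token prices below are placeholders you'd replace with current pricing:

```python
from collections import defaultdict

# Placeholder per-1M-token prices in USD; swap in current pricing
# for whatever models you actually run.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

class CostLedger:
    """Aggregates token usage and estimated cost per (feature, model)."""

    def __init__(self):
        self.totals = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
        )

    def record(self, feature, model, prompt_tokens, completion_tokens):
        # Compute the estimated cost of this one call.
        p = PRICES[model]
        cost = (prompt_tokens / 1_000_000) * p["input"] \
             + (completion_tokens / 1_000_000) * p["output"]
        # Roll it up under the feature/model pair.
        row = self.totals[(feature, model)]
        row["input_tokens"] += prompt_tokens
        row["output_tokens"] += completion_tokens
        row["cost_usd"] += cost
        return cost

ledger = CostLedger()
# In real code the token counts come from response.usage on each call;
# here they are hardcoded to illustrate the accounting.
ledger.record("support_answers", "gpt-4o", 1_000_000, 100_000)
ledger.record("support_answers", "gpt-4o", 500_000, 50_000)
print(ledger.totals[("support_answers", "gpt-4o")])
```

Even this crude version gives finance a per-feature breakdown that the dashboard export can't, and it catches model-migration cost deltas the day they land instead of at month end.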




