I don't have a specific "tokens per second" rate that I can report — that's not information I have direct access to or track during our conversations.
Here's what I can tell you:
Token generation speed varies depending on the underlying model, the server infrastructure, request load, and response length. It's not a fixed number.
Typically, large language models can generate anywhere from 20 to 200+ tokens per second, depending on the deployment setup (hardware, batching, quantization, etc.).
The number of tokens in a response depends on how complex the answer is. That is a count per response, not a per-second rate.
I don't have visibility into my own inference latency or throughput metrics at runtime.
If you're asking about API performance (e.g., for a specific model like GPT-4, Claude, or Gemini), the actual tokens/second figure would come from the provider's infrastructure and can be measured by timing API responses on your end.
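As a rough illustration of timing responses on your end, here is a minimal Python sketch. It assumes you already have some iterable that yields tokens (or text chunks) as they arrive from a streaming API. The names `measure_throughput` and `fake_token_stream` are hypothetical helpers made up for this example, not part of any provider's SDK, and the fake stream only simulates a model with a fixed delay.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_throughput(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report simple timing figures.

    `stream` is any iterable yielding tokens or text chunks (an assumption:
    many streaming APIs yield chunks that are not exactly one token each,
    so use the provider's reported token count if you need precision).
    """
    start = clock()
    first_token_at: Optional[float] = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            # Record when the first token arrived (time to first token).
            first_token_at = clock()
        count += 1
    end = clock()

    total = end - start
    return {
        "tokens": count,
        "time_to_first_token_s": (
            None if first_token_at is None else first_token_at - start
        ),
        "total_s": total,
        # Average rate over the whole request, including the initial wait.
        "tokens_per_second": count / total if total > 0 else 0.0,
    }


def fake_token_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a real streaming response."""
    for i in range(n):
        time.sleep(delay_s)  # simulate generation latency per token
        yield f"tok{i}"


if __name__ == "__main__":
    stats = measure_throughput(fake_token_stream(n=50, delay_s=0.01))
    print(stats)
```

To use it against a real API, you would replace `fake_token_stream(...)` with the iterator your client library returns for a streamed response. Results will vary from run to run with network conditions and server load, so averaging over several requests gives a more useful figure than a single measurement.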
Is there something specific you're trying to figure out, such as estimating response time, API costs, or throughput for a project?