Benchmarks based on https://artificialanalysis.ai/models/gpt-4-turbo

TTFT (time to first token) for each model depends on model size, which directly impacts latency and average response time. We plan to build fine-tuned models that address our requirements based on the flows and flexibility of the multi-agent system.
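As a rough illustration of how TTFT and total response time could be tracked per model, here is a minimal Python sketch. `stream_tokens` is a hypothetical streaming client supplied by the caller (anything that yields output tokens as they arrive), not a specific SDK call.

```python
import time
from typing import Callable, Iterable, Tuple

def measure_ttft(stream_tokens: Callable[[str], Iterable[str]],
                 prompt: str) -> Tuple[float, float]:
    """Return (time_to_first_token, total_response_time) in seconds.

    `stream_tokens` is a placeholder for whatever streaming interface the
    model is served behind; it should yield tokens as they are generated.
    """
    start = time.perf_counter()
    ttft = None
    for _ in stream_tokens(prompt):
        if ttft is None:
            # First token arrived: record TTFT once.
            ttft = time.perf_counter() - start
    total = time.perf_counter() - start
    return ttft if ttft is not None else total, total
```

Running this across candidate models with the same prompt set gives comparable TTFT and end-to-end latency numbers for the sizing decisions above.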

Depending on the task, we will also employ a cascading approach: a mixture of model sizes that keeps output quality high while supporting the overall performance of the platform.
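A minimal sketch of that cascade is below: try the smallest model first and escalate only when its output fails a quality check. The model names, `call_model`, and `good_enough` are placeholders for illustration, not real endpoints or agreed-on components.

```python
# Cheapest-first cascade: smaller models handle a task unless a quality
# gate rejects their output, in which case we escalate to a larger model.
CASCADE = ["small-fast-model", "mid-size-model", "large-model"]

def call_model(model_name: str, task: str) -> str:
    # Stand-in for the real inference call (API request or local model).
    return f"[{model_name}] draft answer for: {task}"

def good_enough(output: str, task: str) -> bool:
    # Stand-in quality gate, e.g. schema validation or a scoring model.
    return len(output) > 20

def run_with_cascade(task: str) -> str:
    output = ""
    for model_name in CASCADE:
        output = call_model(model_name, task)
        if good_enough(output, task):
            return output  # cheapest model that cleared the quality bar
    return output  # otherwise return the largest model's attempt

if __name__ == "__main__":
    print(run_with_cascade("summarise the latest incident report"))
```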

On standard hardware the benchmarked output generation is noticeably slower, while inference latency is roughly the same.