How to Reduce the Cost of Evaluating LLM Applications

Here’s how not to waste your budget on evaluating models and systems

Towards Data Science
Image created by the author using Flux1.1 Pro.

You can build a fortress in two ways: start stacking bricks one on top of the other, or draw up a plan of the fortress first and then keep checking the build against that plan.

We all know the second is the only way that actually produces a fortress.

Sometimes, I’m the worst follower of my own advice. I’m talking about jumping straight into a notebook to build an LLM app. It’s one of the surest ways to ruin a project.

Before we begin anything, we need a mechanism that tells us we’re moving in the right direction: one that says whether the last thing we tried was better or worse than what came before.

In software engineering, it’s called test-driven development. For machine learning, it’s evaluation.

The first step and the most valuable skill in developing LLM-powered applications is to define how you’ll evaluate your project.

Evaluating LLM applications is nothing like software testing. I don’t mean to downplay the challenges of software testing, but evaluating LLMs isn’t as straightforward.