
Possibly partition tests that hit the OpenAI API from those that don't #25

Open · falquaddoomi opened this issue on Jul 12, 2023 · 5 comments

@falquaddoomi (Collaborator)

I'd noticed that running the full test suite uses the OpenAI API, which consumes a modest amount of credits. It also takes a little while for the requests to go through, since it's hitting an external service.

This is just a proposal, so feel free to decline and close it if you disagree. What if we partitioned the tests into, say, full integration tests that can cost money, and non-integration tests that exercise only our own code? If a non-integration test needs an API response, we could mock it or return a locally cached response. Being able to run just the subset that covers our code would let us iterate more quickly and safely wire the suite into a CI/CD system.
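For illustration, a minimal, self-contained sketch of the mocking idea; every name in it (call_model, revise) is a hypothetical stand-in, not anything from the actual codebase:

```python
# Hypothetical sketch of mocking out the API call; runnable under pytest.
import sys


def call_model(prompt: str) -> str:
    """Stand-in for the code path that would hit the OpenAI API."""
    raise RuntimeError("this test should never reach the external API")


def revise(paragraph: str) -> str:
    """Stand-in for 'our code': wraps the model call with post-processing."""
    return call_model(f"Revise: {paragraph}").strip()


def test_revise_without_hitting_api(monkeypatch):
    # Swap the API call for a canned (or locally cached) response so the test
    # runs offline, for free, and deterministically.
    monkeypatch.setattr(
        sys.modules[__name__], "call_model", lambda prompt: "  A revised paragraph.  "
    )
    assert revise("An original paragraph.") == "A revised paragraph."
```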

@miltondp (Collaborator)

This is an important point that needs to be addressed (maybe in the next iteration?). In the current code I tried to do something like this by putting all unit tests that hit the API into test_model_revision.py, so that could serve as a quick-and-dirty solution to this issue.

Nonetheless, I think that using unit tests to test prompts is not the best idea. One possibility is to try something like OpenAI Evals, as briefly described here.

@d33bs (Collaborator) commented on Jul 28, 2023

Just wanted to upvote this idea; I think it'd be useful to distinguish the OpenAI tests. Maybe pytest's custom markers could be used here with the existing test suite structure? This could also be extended to additional markers beyond the scope of this issue, each configured with sensible defaults.
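For reference, the standard pytest pattern for custom markers looks roughly like the following; the marker name openai_api is just an example, not something settled in this thread:

```python
# conftest.py -- register the marker so pytest doesn't warn about unknown marks.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "openai_api: tests that call the external OpenAI API and may cost money",
    )


# In a test module (e.g. the expensive tests): tag them with the marker.
import pytest


@pytest.mark.openai_api
def test_revision_against_real_api():
    ...
```

Cheap local runs (or CI) could then use pytest -m "not openai_api" to skip the expensive tests, while pytest -m openai_api would run only them.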

@miltondp (Collaborator) commented on Aug 1, 2023

I'm inclined to remove from the unit tests any prompt testing that hits an external, paid API, and keep only tests that can run locally for free. The current unit tests that do hit the API should probably move to a framework built for prompt evaluation, such as OpenAI Evals. We could potentially keep some small integration tests that hit a local model, and use pytest functionality to run them only when the host has a GPU.
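A rough sketch of the GPU-gated idea, assuming (purely for illustration) that torch is what would detect the GPU and load the local model:

```python
# Hypothetical sketch: run local-model integration tests only on GPU hosts.
import pytest

torch = pytest.importorskip("torch")  # skip the whole module if torch is absent

requires_gpu = pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason="local-model integration tests need a CUDA-capable GPU",
)


@requires_gpu
def test_local_model_revision():
    # Would load a small local model and revise one paragraph here.
    ...
```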

@falquaddoomi (Collaborator, Author)

Agreed about moving the tests that run at cost to another framework. In the meantime, just to show how @d33bs's suggestion about markers would work, I implemented them in PR #33 and marked all the tests in test_model_revision.py as "cost". We can use that implementation as a starting point for, say, your GPU testing idea, @miltondp. We could probably infer whether the user has a GPU and use that as the default, but IMHO it's still nice for users to be able to enable or disable the feature explicitly.
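For readers who can't follow the PR link, the general pytest idiom for an opt-in marker with a command-line toggle looks something like the snippet below; this is a generic sketch, not necessarily what PR #33 actually implements:

```python
# conftest.py -- generic opt-in pattern for a "cost" marker.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-cost",
        action="store_true",
        default=False,
        help="also run tests marked 'cost', which spend OpenAI API credits",
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "cost: tests that spend OpenAI API credits")


def pytest_collection_modifyitems(config, items):
    # Unless --run-cost is given, mark every 'cost' test as skipped.
    if config.getoption("--run-cost"):
        return
    skip_cost = pytest.mark.skip(reason="needs --run-cost to run")
    for item in items:
        if "cost" in item.keywords:
            item.add_marker(skip_cost)
```

A GPU-based default could slot into the same hooks by deriving the skip condition from a hardware check instead of a hard-coded False.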

@castedo commented on Sep 6, 2024

#51 might be useful here
