
Possibly partition tests that hit the OpenAI API from those that don't #25

Open · falquaddoomi opened this issue on Jul 12, 2023 · 5 comments

@falquaddoomi (Collaborator)

I'd noticed that running the full test suite uses the OpenAI API, which consumes a modest amount of credits. It also takes a little while for the requests to go through, since it's hitting an external service.

This is just a proposal, so feel free to decline and close it if you disagree. What if we partitioned the tests into, say, full integration tests that can cost money, and non-integration tests that exercise only our own code? If a non-integration test needs an API response, we could mock it or return a locally cached response. Being able to run just the subset that covers our code would let us iterate more quickly and safely wire the suite into a CI/CD system.
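For illustration, a minimal, self-contained sketch of the mocking idea; every name in it (call_model, revise) is a hypothetical stand-in, not anything from the actual codebase:

```python
# Hypothetical sketch of mocking out the API call; runnable under pytest.
import sys


def call_model(prompt: str) -> str:
    """Stand-in for the code path that would hit the OpenAI API."""
    raise RuntimeError("this test should never reach the external API")


def revise(paragraph: str) -> str:
    """Stand-in for 'our code': wraps the model call with post-processing."""
    return call_model(f"Revise: {paragraph}").strip()


def test_revise_without_hitting_api(monkeypatch):
    # Swap the API call for a canned (or locally cached) response so the test
    # runs offline, for free, and deterministically.
    monkeypatch.setattr(
        sys.modules[__name__], "call_model", lambda prompt: "  A revised paragraph.  "
    )
    assert revise("An original paragraph.") == "A revised paragraph."
```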

@miltondp (Collaborator)

This is an important point that needs to be addressed (maybe in the next iteration?). In the current code I tried to do something like this by putting all unit tests that hit the API into test_model_revision.py, so that could serve as a quick-and-dirty solution to this issue.

Nonetheless, I think that using unit tests to test prompts is not the best idea. One possibility is to try something like OpenAI Evals, as briefly described here.

@d33bs (Collaborator) commented on Jul 28, 2023

Just wanted to upvote this idea; I think it'd be useful to distinguish the OpenAI tests. Maybe pytest's custom markers could be used here with the existing test suite structure? This could also be extended to additional markers beyond the scope of this issue, each configured with sensible defaults.
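For reference, the standard pytest pattern for custom markers looks roughly like the following; the marker name openai_api is just an example, not something settled in this thread:

```python
# conftest.py -- register the marker so pytest doesn't warn about unknown marks.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "openai_api: tests that call the external OpenAI API and may cost money",
    )


# In a test module (e.g. the expensive tests): tag them with the marker.
import pytest


@pytest.mark.openai_api
def test_revision_against_real_api():
    ...
```

Cheap local runs (or CI) could then use pytest -m "not openai_api" to skip the expensive tests, while pytest -m openai_api would run only them.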

@miltondp (Collaborator) commented on Aug 1, 2023

I'm inclined to remove from the unit tests any prompt testing that hits an external, paid API, and keep only tests that can run locally for free. The current unit tests that do hit the API should probably move to a framework built for prompt evaluation, such as OpenAI Evals. We could potentially keep some small integration tests that hit a local model, and use pytest functionality to run them only when the host has a GPU.
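A rough sketch of the GPU-gated idea, assuming (purely for illustration) that torch is what would detect the GPU and load the local model:

```python
# Hypothetical sketch: run local-model integration tests only on GPU hosts.
import pytest

torch = pytest.importorskip("torch")  # skip the whole module if torch is absent

requires_gpu = pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason="local-model integration tests need a CUDA-capable GPU",
)


@requires_gpu
def test_local_model_revision():
    # Would load a small local model and revise one paragraph here.
    ...
```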

@falquaddoomi (Collaborator, Author)

Agreed about moving the tests that run at cost to another framework. In the meantime, just to show how @d33bs's suggestion about markers would work, I implemented them in PR #33 and marked all the tests in test_model_revision.py as "cost". We can use that implementation as a starting point for, say, your GPU testing idea, @miltondp. We could probably infer whether the user has a GPU and use that as the default, but IMHO it's still nice for users to be able to enable or disable the feature explicitly.
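For readers who can't follow the PR link, the general pytest idiom for an opt-in marker with a command-line toggle looks something like the snippet below; this is a generic sketch, not necessarily what PR #33 actually implements:

```python
# conftest.py -- generic opt-in pattern for a "cost" marker.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-cost",
        action="store_true",
        default=False,
        help="also run tests marked 'cost', which spend OpenAI API credits",
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "cost: tests that spend OpenAI API credits")


def pytest_collection_modifyitems(config, items):
    # Unless --run-cost is given, mark every 'cost' test as skipped.
    if config.getoption("--run-cost"):
        return
    skip_cost = pytest.mark.skip(reason="needs --run-cost to run")
    for item in items:
        if "cost" in item.keywords:
            item.add_marker(skip_cost)
```

A GPU-based default could slot into the same hooks by deriving the skip condition from a hardware check instead of a hard-coded False.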

@castedo commented on Sep 6, 2024

#51 might be useful here
