bug: reactivate benchmarks with quick fixes #2766
Conversation
LGTM, one last step before merging would be to change the title such that it complies with the commit conventions.
Suggested change:
- To run all benchmarks (e.g. for a new haystack release):
+ To start all benchmarks (e.g. for a new Haystack release), run:
Context: "Results will be stored in this directory as"

Suggested change:
- - retriever_index_results.csv (+ .md)
+ - retriever_index_results.csv and retriever_index_results.md
Suggested change:
- - retriever_query_results.csv (+ .md)
+ - retriever_query_results.csv and retriever_query_results.md
Suggested change:
- - reader_results.csv (+ .md)
+ - reader_results.csv and reader_results.md
Context:

`docker start opensearch > /dev/null 2>&1 || docker run -d -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_JAVA_OPTS=-Xms4096m -Xmx4096m" --name opensearch opensearchproject/opensearch:2.2.1`
and

Adding a line break on either side of this "and" would make the paragraph much more readable.
* quick fix benchmark runs to make them work with current haystack version
* fix minor typo
* update readme. fix minor things to make benchmarks run again
* Update Documentation & Code Style
* fix typo in readme
* update result files for reader and retriever querying
* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)
* change default memory allocation back to normal. add note to readme
* add first indexing results
* add memory to docker cmd
* full benchmarks results on commit c5a2651

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Related Issue(s): fixes #2813
In the past we ran benchmarks for Haystack quite regularly and published the results here. At some point the benchmarking script stopped working, so we stopped running benchmarks and never found time to fix the script.
Proposed changes:
- quick fixes to make the benchmark scripts run again with the current Haystack version (e.g. renaming `text` -> `content`)

Future work
Note
In order to successfully finish the 500k indexing runs on Elasticsearch and OpenSearch, we need to allocate more RAM to the Docker containers. The default commands used in Haystack's utils `launch_opensearch()` and `launch_es()` are not sufficient here. This means we need to start the two containers manually before running the benchmarks. This is documented in the benchmarks' README, but it would be nice to automate it (e.g. by exposing the memory param in the util functions and passing a value from the benchmarks).

Pre-flight checklist
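The note above suggests exposing the memory allocation as a parameter in the launch utilities. A minimal sketch of what that could look like, assuming a hypothetical helper (`build_opensearch_cmd` is not an existing `haystack.utils` function; the ports and image tag are taken from the docker command in the README):

```python
# Hypothetical sketch: expose the JVM heap size as a parameter, as the note
# suggests. `build_opensearch_cmd` is NOT part of haystack.utils; it only
# illustrates the idea. Ports and image match the README's docker command.

def build_opensearch_cmd(heap_mb: int = 2048,
                         name: str = "opensearch",
                         image: str = "opensearchproject/opensearch:2.2.1") -> str:
    """Return a start-or-run docker command with a configurable JVM heap."""
    java_opts = f"-Xms{heap_mb}m -Xmx{heap_mb}m"
    return (
        f"docker start {name} > /dev/null 2>&1 || "
        f"docker run -d -p 9201:9200 -p 9600:9600 "
        f'-e "discovery.type=single-node" '
        f'-e "OPENSEARCH_JAVA_OPTS={java_opts}" '
        f"--name {name} {image}"
    )

# The benchmark script could then request the 4 GB heap needed for the
# dense 500k indexing runs:
cmd = build_opensearch_cmd(heap_mb=4096)
```

The benchmarks would pass a larger value only for the heavy runs, while the util's default stays at the normal allocation.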