ToolQA is a new dataset for evaluating how well LLMs answer challenging questions using external tools. It offers two difficulty levels (easy and hard) across eight real-life scenarios.
Updated Aug 19, 2023 - Jupyter Notebook
An exploration of Google Dialogflow through a Bear Transit voice assistant! View the writeup: https://shomil.me/dialogflow-caltransit/