asc_rag_chatbot

The deployed app is available here: https://ascragbot-fnbmsd9sdtaqc77x75r7zi.streamlit.app/

How the app works

This chatbot is able to answer questions about Adult Social Care webpages, with a focus on content related to Caring For Someone, Social Care and Assessments, and Independent Living.

The chatbot uses a technique called Retrieval Augmented Generation (RAG) to retrieve contextual information from the relevant webpages and makes this available to OpenAI's GPT3.5 turbo model, so that it can provide a more contextualised response to the user.

The app uses the OpenAI API to achieve this. The main application code is available in app.py, while various core functionality is defined in utils.py.

The high level logic / steps in app.py is as follows:

Loads the embeddings cache - this actually happens in utils.py
Loads and parses the html documents in 'downloaded_pdfs' - see the load_documents function in utils.py
Generates embeddings for all of the webpages - see the get_embedding, embedding_from_string and get_embeddings functions in utils.py
Generates a series of urls for the bot to include in its responses
Builds out the streamlit front-end
Initialises the session state for the model - this is where the model is prompted to provide it with instructions and examples of good responses
Sets out further logic for the UI, and to allow for an ongoing conversation
In this final section it also calls the retrieve_documents function from utils.py. This is the key one that looks at the user's prompt and then retrieves the most similar documents from downloaded_pdfs and includes these in the model's context. This is what allows the model to provide a more contextualised response.

Data

The data that has been used can be found in the folder 'downloaded_pdfs' (these are in fact html files!). These are all webpages, primarily from from kingston.gov.uk.

The file 'asc_website.csv' provides a full list of the pages that have been used and their urls.

In addition to the two .py files, there is also a file called extract.ipynb, which can be used to extract the webpages as html documents. To add more webpages, simply add the title and urls to the 'asc_website.csv' file and re-run extract.ipynb and they will be downloaded to downloaded_pdfs.

Evaluation

Some automated evaluation of the model's responses has been carried out in the file 'evaluate.ipynb'.

This file uses the model and actual answers in the spreadsheet 'AI assistant - questions and answers'. It creates vector embeddings of these answers and then compares them using cosine similarity. Nearly all of the answers have a similarity of over 0.9, which shows a strong match between the model answer and the answer provided by the chatbot.

Running the app locally

First clone this repository on to your local machine.

Next, ensure that you are running the correct version of Python, which is 3.9.7.

Then, in your terminal create a virtual environment and install the required packages:

python -m venv asc_rag_chatbot_venv
source asc_rag_chatbot_venv/bin/activate
pip install -r requirements.txt

Next, create a file called .env and put your OpenAI API key inside e.g.:

OPENAI_API_KEY= 'your key goes here'

Then create a folder called .streamlit and inside it create a file called secrets.toml. Also put your OpenAI API key inside this e.g.:

OPENAI_API_KEY= 'your key goes here'

Next, run the app using the command:

streamlit run app.py

Deploying the app to Streamlit

Deploying the app to Streamlit is very straighforward

First go to platform.openai.com and create an account
Got to dashboard > API keys > Create secret key
Create a key and make a copy of it
Visit streamlit.io and create an account
Connect your Streamlit account to the relevant GitHub account that has the app code
Click on Create App and then 'Yup, I have an app'
Select the relevant repository, branch and file path
Select 'Advanced settings', select Python version 3.9, and in the Secrets section enter OPENAI_API_KEY= ''. Enter your OpenAI API key between the apostrophes
Click on 'Deploy!'

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
downloaded_pdfs		downloaded_pdfs
.gitignore		.gitignore
AI assistant - questions and answers.xlsx		AI assistant - questions and answers.xlsx
README.md		README.md
app.py		app.py
asc_website.csv		asc_website.csv
embeddings_cache.pkl		embeddings_cache.pkl
evaluate.ipynb		evaluate.ipynb
extract.ipynb		extract.ipynb
requirements.txt		requirements.txt
runtime.txt		runtime.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

asc_rag_chatbot

How the app works

Data

Evaluation

Running the app locally

Deploying the app to Streamlit

About

Releases

Packages

Languages

RoyalBoroughKingston/asc_rag_chatbot

Folders and files

Latest commit

History

Repository files navigation

asc_rag_chatbot

How the app works

Data

Evaluation

Running the app locally

Deploying the app to Streamlit

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages