chelokot/AidBot

AidBot

A chatbot for finding volunteer help using semantic text search

Visual Studio Code @Aid_Ai_Bot

Description

With the beginning of the war, many people were left without homes, personal belongings or documents. These people urgently need to find help from volunteers, while volunteers need to find both the victims and the people who can provide the necessary resources.

Our project is a Telegram bot with semantic text search that uses AI embedding technology from OpenAI. With our bot you can quickly get data about people or organizations that can help in a given situation. After entering a request, the user receives a list of the things or services that can be provided, together with contacts and geolocation. Charitable foundations can also use the bot to find those in need of help, find resources, or use additional analysis tools.

User stories can be found here

Sources

We should first build the simplest working project and improve its quality later if needed. Therefore, we start by parsing just one website and can add more sources later:

Database

pgvector

Finding similar vectors by directly calculating the cosine similarity to every element in the database is expensive. Moreover, fetching the whole column with each query is inefficient. Vector databases are usually used for such "find similar" searches; they provide indexing algorithms designed for fast searching.
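To make the cost concrete, here is a minimal sketch of the brute-force approach described above, using only the standard library. The function names and the (id, embedding) row layout are illustrative assumptions, not code from this repository. Every query touches every stored vector, so the work grows with both the number of rows and the embedding size.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, rows, k=1):
    """Return the ids of the k rows most similar to `query`.

    `rows` is a list of (id, embedding) pairs (a hypothetical layout).
    The whole list is scanned and scored on every call.
    """
    scored = [(cosine_similarity(query, emb), row_id) for row_id, emb in rows]
    scored.sort(reverse=True)  # highest similarity first
    return [row_id for _, row_id in scored[:k]]


rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
print(brute_force_search([2.0, 0.1], rows, k=2))
```

An index avoids this full scan by only comparing the query against a subset of the stored vectors.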

Postgres can actually work as a vector database using the pgvector extension. It allows storing vector embeddings in a table and using cosine similarity in queries with fast indexing.

A build of pgvector compiled for Windows is provided in the pgvector directory. vector.dll must be placed in the ./lib folder of the PostgreSQL directory, and all .sql files and the vector.control file must be placed in the ./share/extension folder of the PostgreSQL directory.
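The copy step can also be scripted. The sketch below is one possible way to do it and assumes that vector.dll, vector.control and the .sql files all sit directly inside the source directory; the function name install_pgvector and both path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def install_pgvector(build_dir, postgres_dir):
    """Copy the prebuilt pgvector files into a PostgreSQL installation.

    Assumes a flat layout inside `build_dir` (an assumption, check your copy).
    Returns the list of destination paths that were written.
    """
    build_dir, postgres_dir = Path(build_dir), Path(postgres_dir)
    lib_dir = postgres_dir / "lib"
    ext_dir = postgres_dir / "share" / "extension"
    for target in (lib_dir, ext_dir):
        if not target.is_dir():
            raise FileNotFoundError(f"expected PostgreSQL folder is missing: {target}")

    # vector.dll goes to ./lib
    copied = [Path(shutil.copy2(build_dir / "vector.dll", lib_dir))]
    # all .sql files and vector.control go to ./share/extension
    for src in sorted(build_dir.glob("*.sql")) + [build_dir / "vector.control"]:
        copied.append(Path(shutil.copy2(src, ext_dir)))
    return copied
```

Writing into the PostgreSQL directory usually requires administrator rights, so the script may need to be run from an elevated prompt.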

Once the files are in place, enable the extension in your database with CREATE EXTENSION IF NOT EXISTS vector. You can then create a column of type vector(n), where n is the number of dimensions, for example: CREATE TABLE table_name (id bigserial primary key, embedding vector(1568)).

After the table is filled with some data, you can create an index with CREATE INDEX ON table_name USING ivfflat (embedding vector_cosine_ops). The index makes searches much faster, at the cost of returning approximate rather than exact nearest neighbours.
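An ivfflat index also accepts a lists option that controls how many clusters the vectors are split into. The sketch below builds the statement with a starting value based on the rule of thumb from the pgvector documentation (rows / 1000 up to one million rows, the square root of the row count above that). The helper names are hypothetical, and the result is only a starting point to tune, not a setting taken from this project.

```python
import math


def suggested_ivfflat_lists(row_count):
    """Starting value for `lists`, following the pgvector rule of thumb."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def create_index_sql(table, column, row_count):
    """Build a CREATE INDEX statement for a cosine-distance ivfflat index.

    `table` and `column` are inserted as-is, so they must be trusted names.
    """
    lists = suggested_ivfflat_lists(row_count)
    return (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {lists})"
    )


print(create_index_sql("table_name", "embedding", 50_000))
```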

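Then you can search the database simply by using ORDER BY with the cosine distance operator <=>, which is the operator that a vector_cosine_ops index supports (<-> is Euclidean distance and would not use that index): SELECT * FROM table_name ORDER BY embedding <=> '[3,1,2]' LIMIT 1; The query vector is shortened here for readability; in practice it must have the same number of dimensions as the column.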

pgvector-python

pgvector-python is used for convenient work with vectors in the database from Python. Once it is installed, all you need to do is:

from pgvector.psycopg import register_vector
register_vector(conn)

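This lets you pass numpy arrays as parameters to conn.execute, for example:

import numpy as np

# Query parameters must be a sequence, so wrap the array in a one-element tuple
embedding = np.random.rand(1568).astype(np.float32)
conn.execute(
    'INSERT INTO table_name (embedding) VALUES (%s)',
    (embedding,)
)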

You can see a more detailed example in the example file.
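Reading the nearest rows back works the same way. The sketch below is a hypothetical helper, not code from the example file: it assumes a psycopg connection on which register_vector(conn) has already been called, and the table_name table with id and embedding columns from the examples above. It orders by <=>, the pgvector cosine distance operator, which matches an index created with vector_cosine_ops.

```python
def find_nearest(conn, query_embedding, limit=5):
    """Return the ids of the rows closest to `query_embedding`.

    `conn` is assumed to be a psycopg connection with register_vector(conn)
    already applied, so a numpy array can be passed as a query parameter.
    """
    sql = (
        "SELECT id FROM table_name "
        "ORDER BY embedding <=> %s "  # <=> is cosine distance in pgvector
        "LIMIT %s"
    )
    rows = conn.execute(sql, (query_embedding, limit)).fetchall()
    return [row[0] for row in rows]
```

A call could look like find_nearest(conn, embedding, limit=3), where embedding is a numpy array with the same number of dimensions as the column.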
