MAJOR UPDATE:
1) Google Drive Integration complete! Downloads files and folders recursively. Filtering, sorting and queued-loading of Google Drive docs are now available via the UI.
2) Improved highlighting: Implemented fuzzy-search logic, replacing exact matching, resulting in expanded highlighting on pages.
3) Improved RAG: Increased the cosine similarity search threshold to 80% for more stringent and accurate matching, and sources data is now passed to the LLM for improved response quality.
4) Improved handling of images for citations: image extraction is skipped for scanned docs.
5) Clearer document naming in citations: The unique ID of the highlighted document is no longer attached to the document name in the 'Refer to the following documents' citations block.
6) BUG FIX: When using the free tier of the AzureCV OCR service, UsageLimitExceeded errors are now handled even when submitting multiple documents back-to-back, auto-waiting and resuming correctly.
7) BUG FIX: handle_api_error events will now actually return to the front-end!
8) Refactored the process_new_file method into smaller blocks that are now shared with the GoogleDrive loader and can be used by other integrations in the future too.
9) Increased chunk size to 500 and removed '250' from the name of the SBERT VectorDB created.
10) Cleaned up print and newline statements.
11) Improvements to accuracy and relevance of page numbers and doc names cited in responses; further refinements on-going.
12) Changed the Whoosh indexing search operator from the default AND to OR.
13) HF-Waitress local-LLM server integration begins!
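To illustrate the stricter retrieval cutoff described in item 3, here is a minimal sketch of filtering candidate chunks by a cosine similarity threshold of 0.8. It is not the code from this commit: the function names `cosine_similarity` and `filter_by_threshold`, the `(text, vector)` chunk layout, and the plain-list vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def filter_by_threshold(query_vec, chunks, threshold=0.8):
    """Keep only chunks scoring at or above the threshold, best match first.

    `chunks` is assumed to be a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [(score, text) for score, text in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [
        ("exact match", [1.0, 0.0]),
        ("diagonal", [1.0, 1.0]),    # similarity about 0.71, dropped
        ("orthogonal", [0.0, 1.0]),  # similarity 0.0, dropped
        ("near match", [0.9, 0.1]),
    ]
    for score, text in filter_by_threshold(query, candidates):
        print(f"{score:.3f}  {text}")
```

Raising the threshold trades recall for precision: fewer chunks reach the LLM, but those that do are closer to the query.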
abgulati committed Aug 15, 2024
1 parent d82f748 commit 15d426a
Showing 2 changed files with 948 additions and 376 deletions.