Commit
MAJOR UPDATE:
1) Google Drive integration complete! Downloads files and folders recursively. Filtering, sorting and queued loading of Google Drive docs are now available via the UI
2) Improved highlighting: implemented fuzzy-search logic, replacing exact matching, resulting in expanded highlighting on pages
3) Improved RAG: increased the cosine-similarity search threshold to 80% for more stringent and accurate matching, and now passing sources data to the LLM for improved response quality
4) Improved handling of images for citations: image extraction is skipped for scanned docs
5) Clearer document naming in citations: the unique ID of the highlighted document is no longer attached to the document name in the 'Refer to the following documents' citations block
6) BUG FIX: when using the free tier of the AzureCV OCR service, UsageLimitExceeded errors are now handled even when multiple documents are submitted back-to-back, auto-waiting and resuming correctly
7) BUG FIX: handle_api_error events will now actually return to the front-end!
8) Refactored the process_new_file method into smaller blocks that are now shared with the GoogleDrive loader and can be used by other integrations in the future too
9) Increased chunk size to 500 and removed '250' from the name of the SBERT VectorDB created
10) Cleaned up print and newline statements
11) Improvements to accuracy and relevance of page numbers and doc names cited in responses, further refinements ongoing
12) Replaced the Whoosh indexing search operator, from the default AND to OR
13) HF-Waitress local-LLM server integration begins!
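As a rough illustration of item 3, the 80% cosine-similarity cut-off could look something like the sketch below. This is not the code from this commit: the function names `cosine_similarity` and `filter_chunks`, the `(text, vector)` chunk layout, and the plain-Python vectors are all assumptions made for the example. Only the 0.8 threshold comes from the message above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_chunks(query_vec, chunks, threshold=0.8):
    """Keep only chunks scoring at or above the threshold, best match first.

    `chunks` is a hypothetical list of (text, embedding) pairs; a real
    vector DB would return its own result objects.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Toy 2-D "embeddings" purely for demonstration.
example_chunks = [
    ("exact match", [1.0, 0.0]),
    ("loosely related", [1.0, 1.0]),
    ("close match", [0.9, 0.1]),
]
results = filter_chunks([1.0, 0.0], example_chunks)
```

Raising the threshold trades recall for precision: fewer chunks reach the LLM, but those that do are more likely to be relevant to the query.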