iuliaferoli/harry-potter-search

Harry Potter and the Elasticsearch Engine

This project covers search use cases on Harry Potter text databases, with a focus on Python integrations.

Part 1: Intro to Elasticsearch (Notebooks 0 --> 4)

Create an index where each document is a Harry Potter character with their attributes. This index can then be used to create customized search queries to identify subsets of characters with particular properties.

This example project covers the basic introductory concepts of Elasticsearch and Kibana.

Phase 2: Notebooks 5 --> 9

Introduce the Python client to communicate with the Elasticsearch engine via code. Create an index from the first Harry Potter movie script to use for more complex, natural language queries. Use Hugging Face models to add sentiment analysis and embeddings for semantic search. Combine multiple models for hybrid search; compare to the native functionality of ELSER (kNN search).
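One common way to combine rankings from several models for hybrid search is reciprocal rank fusion (RRF). The sketch below is not taken from the notebooks, which may combine scores differently. It assumes each model returns an ordered list of document IDs; the IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several models rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a semantic query.
lexical = ["line_12", "line_7", "line_3"]
semantic = ["line_7", "line_3", "line_40"]
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Documents that appear in both lists (`line_7`, `line_3`) outrank documents that only one model returned.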

Phase 3: Files 11

Build a simple Flask app as a user interface for search. Introduce a new index to store historical searches as they are run; we can use this for observability and tracking. Separate out some helper_functions that we can reuse.

Implemented features and planned additions

  • HP characters index & search
  • HP characters index - Python client interface for search
  • HP sentiment analysis on movie subtitles
  • Embeddings and semantic search with ELSER
  • Python-DSL client
  • Web app with Flask
  • Observability & Monitoring

Watch the video

Setup Environment

Requirements: an installation of Elasticsearch (either local or in the cloud); see docs.

For the Python environment, we recommend setting up a virtual environment; see docs. Requirements: pandas.

Harry Potter Characters Index | Intro to Elasticsearch

Python notebook for some essential data cleaning with pandas dataframes.
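As a rough illustration of this kind of cleaning, the sketch below normalizes column names, trims whitespace, drops duplicate rows and converts missing values to `None` so the records serialize cleanly to JSON. The column names and sample rows are invented for the example and are not the notebook's actual dataset.

```python
import pandas as pd

# Hypothetical raw data; the real dataset has different columns.
raw = pd.DataFrame(
    {
        "Name": [" Harry Potter", "Hermione Granger", "Hermione Granger", "Dobby"],
        "House": ["Gryffindor", "gryffindor ", "gryffindor ", None],
    }
)


def clean_characters(df):
    """Return a tidied copy of the character table."""
    cleaned = df.copy()
    # Lower-case, underscore-separated column names are easier to query later.
    cleaned.columns = [c.strip().lower().replace(" ", "_") for c in cleaned.columns]
    # Trim stray whitespace and normalize capitalization.
    cleaned["name"] = cleaned["name"].str.strip()
    cleaned["house"] = cleaned["house"].str.strip().str.title()
    return cleaned.drop_duplicates().reset_index(drop=True)


def to_records(df):
    """Convert rows to plain dicts, replacing missing values with None."""
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in df.to_dict(orient="records")
    ]


records = to_records(clean_characters(raw))
print(records)
```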

Instructions for adding data to the Elastic cluster.

Short intro to Dashboards and visualizations in Kibana.

Short intro to Discover and KQL.

Working with Console / Dev Tools, intro to data types in Elastic.
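To make the data types concrete, here is a possible mapping for a characters index, written as a Python dict and printed as the JSON body you could paste into Console. The field names are assumptions for illustration; `text`, `keyword`, `integer` and `boolean` are standard Elasticsearch field types.

```python
import json

# Hypothetical fields; adapt them to the columns in your own dataset.
CHARACTER_MAPPINGS = {
    "properties": {
        # Full-text searchable, with a keyword sub-field for exact matches and sorting.
        "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        # Exact-value field, suited to filters and aggregations.
        "house": {"type": "keyword"},
        "patronus": {"type": "text"},
        "birth_year": {"type": "integer"},
        "is_wizard": {"type": "boolean"},
    }
}


def fields_of_type(mappings, field_type):
    """List the top-level fields declared with the given type."""
    return sorted(
        name
        for name, spec in mappings["properties"].items()
        if spec["type"] == field_type
    )


# JSON body for an index-creation request.
print(json.dumps({"mappings": CHARACTER_MAPPINGS}, indent=2))
```

Choosing `keyword` for `house` keeps values such as "Gryffindor" intact for exact filtering, while `text` fields are analyzed into tokens for full-text matching.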

Building requests and intro to queries.
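A request body for this kind of query can be assembled as a plain dict. The sketch below builds a `bool` query that combines a full-text `match` with an exact `term` filter; the field names `name` and `house` are assumed for illustration and may not match the index in the notebooks.

```python
def characters_query(text=None, house=None, size=10):
    """Build a search body for a hypothetical characters index."""
    must = []
    filters = []
    if text:
        # Scored full-text match on the character's name.
        must.append({"match": {"name": text}})
    if house:
        # Unscored exact filter; assumes `house` is a keyword field.
        filters.append({"term": {"house": house}})
    if not must:
        # With no text given, match everything and rely on the filters.
        must.append({"match_all": {}})
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


body = characters_query(text="Weasley", house="Gryffindor", size=5)
print(body)
```

The resulting dict has the same shape as the JSON body you would send to the `_search` endpoint from Console.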

Harry Potter Movie Dialogue Index | Intro to Elasticsearch Python Client

Working with the Python client to build an index and mapping, bulk ingest documents, and run queries.
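For bulk ingestion, the Python client's `helpers.bulk` accepts an iterable of action dicts. A minimal sketch of generating those actions follows; the index name `hp_dialogue` and the fields `character` and `sentence` are hypothetical, and the commented lines at the end need the `elasticsearch` package and a running cluster.

```python
def dialogue_actions(lines, index_name="hp_dialogue"):
    """Yield one bulk action per line of dialogue."""
    for position, line in enumerate(lines):
        yield {
            "_index": index_name,
            "_id": position,  # position in the script doubles as a stable ID
            "_source": {
                "character": line["character"],
                "sentence": line["sentence"],
            },
        }


# Two made-up rows standing in for the parsed movie script.
script_lines = [
    {"character": "Hagrid", "sentence": "You're a wizard, Harry."},
    {"character": "Harry", "sentence": "I'm a what?"},
]
actions = list(dialogue_actions(script_lines))
print(actions[0])

# With a cluster available, the actions could be sent like this:
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("http://localhost:9200")  # assumed local URL
# helpers.bulk(es, dialogue_actions(script_lines))
```

Using a generator means the whole script does not need to be held in memory as action dicts before it is sent.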

6 TBD - Elasticsearch Python DSL Client

Use the Eland client to import models from Hugging Face and run Sentiment Analysis on the data

Create embeddings for semantic (natural language) search
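Once documents carry an embedding, semantic search can be expressed with the `knn` option of the search API. The sketch below only builds the request body. The field name `sentence_embedding` and the three-dimensional vector are placeholders; a real query vector must come from the same model that embedded the documents.

```python
def knn_search_body(query_vector, field="sentence_embedding", k=5, num_candidates=50):
    """Build a kNN search body for a hypothetical dense_vector field."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,  # number of nearest neighbours to return
            "num_candidates": num_candidates,  # candidates examined per shard
        },
        # Return only the readable fields, not the vectors themselves.
        "_source": ["character", "sentence"],
    }


# Toy vector for illustration only; real embeddings have hundreds of dimensions.
body = knn_search_body([0.12, -0.03, 0.88], k=3, num_candidates=20)
print(body)
```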

Compare with the ELSER model built by Elastic

See blog for our Advent Calendar here

Phase 3

Using Flask for a simple user interface that lets users search (for a live demo). Added historical query tracking in a new index, to use later for observability.
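The query tracking could be handled by a small helper that turns each search into a document for the history index. This is a sketch under assumed field names (`query`, `hit_count`, `@timestamp`), not the project's actual helper; indexing the document is left to the caller.

```python
from datetime import datetime, timezone


def build_history_doc(query_text, hit_count, now=None):
    """Describe one executed search as a document for a history index."""
    timestamp = now or datetime.now(timezone.utc)
    return {
        "query": query_text.strip(),
        "hit_count": int(hit_count),
        # ISO 8601 timestamps are parsed by Elasticsearch date fields by default.
        "@timestamp": timestamp.isoformat(),
    }


doc = build_history_doc(
    "  who is the half-blood prince ",
    hit_count=7,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(doc)
```

A Flask view could call this helper right after running the search and index the returned dict, which keeps the logging logic reusable outside the web app.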
