Add models to demo docker image #1978

Merged
merged 2 commits into from
Jan 11, 2022
5 changes: 2 additions & 3 deletions Dockerfile
@@ -18,11 +18,10 @@ COPY haystack /home/user/haystack

# install as a package
COPY setup.py requirements.txt README.md /home/user/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN pip install -e .

# download punkt tokenizer to be included in image
RUN python3 -c "import nltk;nltk.download('punkt', download_dir='/usr/nltk_data')"
RUN python3 -c "from haystack.utils.docker import cache_models;cache_models()"

# create folder for /file-upload API endpoint with write permissions, this might be adjusted depending on FILE_UPLOAD_PATH
RUN mkdir -p /home/user/file-upload
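
For context, the new `RUN python3 -c "from haystack.utils.docker import cache_models;cache_models()"` step is equivalent to running the following at build time; it assumes the preceding `pip install -e .` has already made the haystack package importable inside the build container.

```python
# Build-time equivalent of the new RUN line: both downloads happen once during
# `docker build` and end up baked into the resulting image layer.
from haystack.utils.docker import cache_models

cache_models()  # fetches NLTK's punkt tokenizer and deepset/roberta-base-squad2
```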
7 changes: 4 additions & 3 deletions Dockerfile-GPU
@@ -37,15 +37,13 @@ RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
# Copy package setup files
COPY setup.py requirements.txt README.md /home/user/

RUN pip install --upgrade pip
RUN echo "Install required packages" && \
# Install PyTorch for CUDA 11
pip3 install torch==1.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html && \
# Install from requirements.txt
pip3 install -r requirements.txt

# download punkt tokenizer to be included in image
RUN python3 -c "import nltk;nltk.download('punkt', download_dir='/usr/nltk_data')"

# copy saved models
COPY README.md models* /home/user/models/

@@ -58,6 +56,9 @@ COPY haystack /home/user/haystack
# Install package
RUN pip3 install -e .

# Cache Roberta and NLTK data
RUN python3 -c "from haystack.utils.docker import cache_models;cache_models()"

# optional : copy sqlite db if needed for testing
#COPY qa.db /home/user/
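
A quick way to confirm that the caches actually ended up in the built image is to load everything with networking disabled. This is a hypothetical smoke test, not part of the PR; it assumes the container runs as root so that `/root/nltk_data` is on NLTK's default search path.

```python
# Hypothetical smoke test (not in the PR): run inside the built image to verify
# that the punkt data and the model are already cached, without network access.
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"  # fail instead of downloading if the cache is missing

import nltk
from transformers import AutoModel, AutoTokenizer

nltk.data.find("tokenizers/punkt")  # raises LookupError if punkt was not cached
AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
AutoModel.from_pretrained("deepset/roberta-base-squad2")
print("punkt and deepset/roberta-base-squad2 caches are present")
```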

18 changes: 18 additions & 0 deletions haystack/utils/docker.py
@@ -0,0 +1,18 @@
import logging

def cache_models():
    """
    Small function that caches models and other data.
    Used only in the Dockerfile to include these caches in the images.
    """
    # download punkt tokenizer
    logging.info("Caching punkt data")
    import nltk
    nltk.download('punkt', download_dir='/root/nltk_data')

    # Cache roberta-base-squad2 model
    logging.info("Caching deepset/roberta-base-squad2")
    import transformers
    model_to_cache = 'deepset/roberta-base-squad2'
    transformers.AutoTokenizer.from_pretrained(model_to_cache)
    transformers.AutoModel.from_pretrained(model_to_cache)
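
At demo runtime the cached weights should be picked up transparently: a component that requests the same model name resolves it from the Hugging Face cache written at build time instead of downloading it on container start. A rough sketch of that path, assuming the Haystack 1.x reader API (the exact reader class used by the demo is an assumption):

```python
# Hypothetical runtime usage (not part of this PR): the reader requests the same
# model name, so the weights cached by cache_models() are reused rather than
# downloaded again when the container starts.
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
```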