Redesign primitives #1398

Merged Oct 13, 2021 (140 commits)
Showing changes from 129 commits
Commits
2803fa4
first draft / notes on new primitives
tholor Sep 1, 2021
956f54b
wip label / feedback refactor
tholor Sep 1, 2021
cddfb81
rename doc.text -> doc.content. add doc.content_type
tholor Sep 13, 2021
5166966
add datatype for content
tholor Sep 13, 2021
a73f334
remove faq_question_field from ES and weaviate. rename text_field -> …
tholor Sep 13, 2021
c814b9a
update converters for . Add warning for empty
tholor Sep 13, 2021
3434c8a
renam label.question -> label.query. Allow sorting of Answers.
tholor Sep 13, 2021
f99382a
WIP primitives
tholor Sep 16, 2021
b37674a
update ui/reader for new Answer format
tholor Sep 20, 2021
a881a5d
Improve Label. First refactoring of MultiLabel. Adjust eval code
tholor Sep 20, 2021
f7fd715
fixed workflow conflict with introducing new one (#1472)
PiffPaffM Sep 17, 2021
7f3abaa
merge latest master
tholor Sep 20, 2021
b1377c9
Add latest docstring and tutorial changes
github-actions[bot] Sep 20, 2021
d13eb6e
make add_eval_data() work again
tholor Sep 20, 2021
95b7a9c
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 20, 2021
923e8e0
fix reader formats. WIP fix _extract_docs_and_labels_from_dict
tholor Sep 20, 2021
d2938c6
fix test reader
tholor Sep 20, 2021
3de4c12
Add latest docstring and tutorial changes
github-actions[bot] Sep 20, 2021
e82dd48
fix another test case for reader
tholor Sep 20, 2021
cd5be8e
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 20, 2021
bd5937e
fix mypy in farm reader.eval()
tholor Sep 21, 2021
c764cf3
fix mypy in farm reader.eval()
tholor Sep 21, 2021
6c05302
WIP ORM refactor
tholor Sep 21, 2021
b00cd01
Add latest docstring and tutorial changes
github-actions[bot] Sep 21, 2021
4d05f19
fix mypy weaviate
tholor Sep 21, 2021
3ebf36b
make label and multilabel dataclasses
tholor Sep 21, 2021
f3aa4c8
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 21, 2021
83b3330
bump mypy env in CI to python 3.8
tholor Sep 21, 2021
0425cb7
WIP refactor Label ORM
tholor Sep 21, 2021
06db9e6
WIP refactor Label ORM
tholor Sep 21, 2021
f7c8dfd
simplify tests for individual doc stores
tholor Sep 22, 2021
d2c4fd4
WIP refactoring markers of tests
tholor Sep 22, 2021
625d4db
test alternative approach for tests with existing parametrization
tholor Sep 22, 2021
ece09e9
WIP refactor ORMs
tholor Sep 22, 2021
f02f44a
fix skip logic of already parametrized tests
tholor Sep 22, 2021
b7c6e88
fix weaviate behaviour in tests - not parametrizing it in our general…
tholor Sep 22, 2021
c442cb7
Add latest docstring and tutorial changes
github-actions[bot] Sep 22, 2021
9b6b3e5
fix some tests
tholor Sep 22, 2021
81be94e
Merge branch 'simplify_tests' of github.com:deepset-ai/haystack into …
tholor Sep 22, 2021
535d7b8
remove sql from document_store_types
tholor Sep 22, 2021
a95523c
fix markers for generator and pipeline test
tholor Sep 22, 2021
8449517
remove inmemory marker
tholor Sep 22, 2021
18b74ad
remove unneeded elasticsearch markers
tholor Sep 22, 2021
39af157
add dataclasses-json dependency. adjust ORM to just store JSON repr
tholor Sep 22, 2021
96b4612
ignore type as dataclasses_json seems to miss functionality here
tholor Sep 22, 2021
ea841a4
update readme and contributing.md
tholor Sep 23, 2021
ce42682
update contributing
tholor Sep 23, 2021
1b3b899
adjust example
tholor Sep 23, 2021
cd01afd
merge simplified tests PR
tholor Sep 23, 2021
51b715f
fix duplicate doc handling for custom index
tholor Sep 24, 2021
c64b64a
Add latest docstring and tutorial changes
github-actions[bot] Sep 24, 2021
7f9b7af
fix some ORM issues. fix get_all_labels_aggregated.
tholor Sep 24, 2021
86db6ee
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 24, 2021
186674a
update drop flags where get_all_labels_aggregated() was used before
tholor Sep 24, 2021
d4f60a0
Add latest docstring and tutorial changes
github-actions[bot] Sep 24, 2021
f3c28cd
add to_json(). add + fix tests
tholor Sep 24, 2021
064b2dd
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 24, 2021
4b77685
fix no_answer handling in label / multilabel
tholor Sep 27, 2021
43d4642
fix duplicate docs in memory doc store. change primary key for sql do…
tholor Sep 27, 2021
deaa812
merge latest master
tholor Sep 27, 2021
d168f2b
fix mypy issues
tholor Sep 27, 2021
2e224d6
merge latest master
tholor Sep 27, 2021
d2b3f47
fix mypy issues
tholor Sep 27, 2021
6d7b7d5
haystack/retriever/base.py
tholor Sep 27, 2021
80adad9
fix test_write_document_meta[elastic]
tholor Sep 27, 2021
494b592
fix test_elasticsearch_custom_fields
tholor Sep 27, 2021
37b5f8b
fix test_labels[elastic]
tholor Sep 27, 2021
5734381
fix crawler
tholor Sep 28, 2021
595c4df
fix converter
tholor Sep 28, 2021
37ec9ba
fix docx converter
tholor Sep 28, 2021
cab2183
fix preprocessor
tholor Sep 28, 2021
c053d99
fix test_utils
tholor Sep 28, 2021
37c4532
fix tfidf retriever. fix selection of docstore in tests with multiple…
tholor Sep 28, 2021
0b4857e
Add latest docstring and tutorial changes
github-actions[bot] Sep 28, 2021
67c2b81
fix crawler test. fix ocrconverter attribute
tholor Sep 28, 2021
96808f9
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 28, 2021
76dfe61
fix test_elasticsearch_custom_query
tholor Sep 28, 2021
cb9b523
fix generator pipeline
tholor Sep 28, 2021
d50c1af
fix ocr converter
tholor Sep 28, 2021
af0c5ed
fix ragenerator
tholor Sep 28, 2021
8365a5d
Add latest docstring and tutorial changes
github-actions[bot] Sep 28, 2021
3a59c63
fix test_load_and_save_yaml for elasticsearch
tholor Sep 28, 2021
afd32fa
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 28, 2021
01fee78
fixes for pipeline tests
tholor Sep 28, 2021
56ea66e
fix faq pipeline
tholor Sep 28, 2021
dd83a68
fix pipeline tests
tholor Sep 28, 2021
8e96c1a
Add latest docstring and tutorial changes
github-actions[bot] Sep 28, 2021
3badf36
fix weaviate
tholor Sep 29, 2021
874deb4
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 29, 2021
845b0b6
Add latest docstring and tutorial changes
github-actions[bot] Sep 29, 2021
81f04f6
trigger CI
tholor Sep 29, 2021
9597b94
merge latest master
tholor Sep 29, 2021
e91d928
satisfy mypy
tholor Sep 30, 2021
4c8291f
Add latest docstring and tutorial changes
github-actions[bot] Sep 30, 2021
0a144b1
satisfy mypy
tholor Sep 30, 2021
4ae88c5
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 30, 2021
3739e0e
Add latest docstring and tutorial changes
github-actions[bot] Sep 30, 2021
1d62ff5
trigger CI
tholor Sep 30, 2021
f256875
fix question generation test
tholor Sep 30, 2021
1b83b89
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Sep 30, 2021
c814b22
fix ray. fix Q-generation
tholor Sep 30, 2021
8babf49
fix translator test
tholor Sep 30, 2021
e09cd9f
satisfy mypy
tholor Sep 30, 2021
7d8b811
wip refactor feedback rest api
tholor Oct 4, 2021
94b7e5b
merge latest master
tholor Oct 4, 2021
ee5ef45
Merge branch 'master' into primitives
tholor Oct 5, 2021
b28abd4
fix rest api feedback endpoint
tholor Oct 6, 2021
15dedfb
fix doc classifier
tholor Oct 6, 2021
5191efd
remove relation of Labels -> Docs in SQL ORM
tholor Oct 6, 2021
e98a9fe
fix faiss/milvus tests
tholor Oct 6, 2021
8ef3d50
fix doc classifier test
tholor Oct 6, 2021
9649193
fix eval test
tholor Oct 6, 2021
985cf55
fixing eval issues
tholor Oct 6, 2021
e6627c3
Add latest docstring and tutorial changes
github-actions[bot] Oct 6, 2021
e47e3b0
fix mypy
tholor Oct 6, 2021
955576b
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Oct 6, 2021
cb2e260
WIP replace dataclasses-json with manual serialization
tholor Oct 8, 2021
db9c76c
merge latest master
tholor Oct 11, 2021
5276821
Add latest docstring and tutorial changes
github-actions[bot] Oct 11, 2021
08070c1
revert to dataclass-json serialization for now. remove debug prints.
tholor Oct 11, 2021
c0fa82f
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Oct 11, 2021
dbaeee8
update docstrings
tholor Oct 11, 2021
a4ec30e
Merge branch 'master' into primitives
tholor Oct 11, 2021
2c42918
fix extractor. fix Answer Span init
tholor Oct 11, 2021
02b43e4
fix api test
tholor Oct 11, 2021
5318df4
keep meta data of answers in reader.run()
tholor Oct 11, 2021
e94a270
fix meta handling
tholor Oct 11, 2021
e7173b2
adress review feedback
tholor Oct 12, 2021
bf85fbe
Add latest docstring and tutorial changes
github-actions[bot] Oct 12, 2021
60ae172
make document=None for open domain labels
tholor Oct 12, 2021
353b9ad
add import
tholor Oct 12, 2021
2ee1d05
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Oct 12, 2021
a5dc8e9
fix print utils
tholor Oct 12, 2021
b9f9375
merge latest master
tholor Oct 12, 2021
d76c469
merge latest master
tholor Oct 12, 2021
3f29c27
fix rest api
tholor Oct 12, 2021
e0c9a05
adress review feedback
tholor Oct 13, 2021
d1df7ef
Add latest docstring and tutorial changes
github-actions[bot] Oct 13, 2021
3377327
fix mypy
tholor Oct 13, 2021
06a9eed
Merge branch 'primitives' of github.com:deepset-ai/haystack into prim…
tholor Oct 13, 2021
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Expand Up @@ -13,7 +13,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.7
python-version: 3.8
- name: Test with mypy
run: |
pip install mypy types-Markdown types-requests types-PyYAML
Expand Down
18 changes: 9 additions & 9 deletions docs/_src/api/api/document_store.md
Expand Up @@ -63,7 +63,7 @@ Get documents from the document store.
#### get\_all\_labels\_aggregated

```python
| get_all_labels_aggregated(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, open_domain: bool = True, aggregate_by_meta: Optional[Union[str, list]] = None) -> List[MultiLabel]
| get_all_labels_aggregated(index: Optional[str] = None, filters: Optional[Dict[str, List[str]]] = None, open_domain: bool = True, drop_negative_labels: bool = False, drop_no_answers: bool = False, aggregate_by_meta: Optional[Union[str, list]] = None) -> List[MultiLabel]
```

Return all labels in the DocumentStore, aggregated into MultiLabel objects.
Expand All @@ -88,6 +88,7 @@ object, provided that they have the same product_id (to be found in Label.meta["
When False, labels are aggregated in a closed domain fashion based on the question text
and also the id of the document that the label is tied to. In this setting, this function
might return multiple MultiLabel objects with the same question string.
:param TODO drop params
- `aggregate_by_meta`: The names of the Label meta fields by which to aggregate. For example: ["product_id"]

<a name="base.BaseDocumentStore.add_eval_data"></a>
Expand Down Expand Up @@ -131,7 +132,7 @@ class ElasticsearchDocumentStore(BaseDocumentStore)
#### \_\_init\_\_

```python
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "text", text_field: str = "text", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, faq_question_field: Optional[str] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False, duplicate_documents: str = 'overwrite', index_type: str = "flat")
| __init__(host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, index: str = "document", label_index: str = "label", search_fields: Union[str, list] = "content", content_field: str = "content", name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768, custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None, analyzer: str = "standard", scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, create_index: bool = True, refresh_type: str = "wait_for", similarity="dot_product", timeout=30, return_embedding: bool = False, duplicate_documents: str = 'overwrite', index_type: str = "flat")
```

A DocumentStore using Elasticsearch to store and query the documents for our search.
Expand All @@ -152,7 +153,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our sea
- `index`: Name of index in elasticsearch to use for storing the documents that we want to search. If not existing yet, we will create one.
- `label_index`: Name of index in elasticsearch to use for storing labels. If not existing yet, we will create one.
- `search_fields`: Name of fields used by ElasticsearchRetriever to find matches in the docs to our incoming query (using elastic's multi_match query), e.g. ["title", "full_text"]
- `text_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
- `content_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
- `name_field`: Name of field that contains the title of the doc
- `embedding_field`: Name of field containing an embedding vector (Only needed when using a dense retriever (e.g. DensePassageRetriever, EmbeddingRetriever) on top)
Expand Down Expand Up @@ -239,12 +240,12 @@ they will automatically get UUIDs assigned. See the `Document` class for details
**Arguments**:

- `documents`: a list of Python dictionaries or a list of Haystack Document objects.
For documents as dictionaries, the format is {"text": "<the-actual-text>"}.
Optionally: Include meta data via {"text": "<the-actual-text>",
For documents as dictionaries, the format is {"content": "<the-actual-text>"}.
Optionally: Include meta data via {"content": "<the-actual-text>",
"meta":{"name": "<some-document-name>, "author": "somebody", ...}}
It can be used for filtering and is accessible in the responses of the Finder.
Advanced: If you are using your own Elasticsearch mapping, the key names in the dictionary
should be changed to what you have set for self.text_field and self.name_field.
should be changed to what you have set for self.content_field and self.name_field.
- `index`: Elasticsearch index where the documents should be indexed. If not supplied, self.index will be used.
- `batch_size`: Number of documents that are passed to Elasticsearch's bulk function at a time.
- `duplicate_documents`: Handle duplicates document based on parameter options.
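
Since the dictionary format changes from `{"text": ...}` to `{"content": ...}`, callers with existing dictionaries may want to rename the key before writing. The helper below is a sketch of one way to do that; `rename_text_key` is a hypothetical name and not part of Haystack. It assumes the default field names; with a custom mapping, the target key would be whatever `content_field` is set to.

```python
from typing import Any, Dict, List


def rename_text_key(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the dicts with a top-level "text" key renamed to "content"."""
    migrated = []
    for doc in docs:
        new_doc = dict(doc)  # shallow copy so the caller's dicts stay untouched
        if "text" in new_doc and "content" not in new_doc:
            new_doc["content"] = new_doc.pop("text")
        migrated.append(new_doc)
    return migrated


old_docs = [
    {"text": "Berlin is the capital of Germany.", "meta": {"name": "berlin.txt"}},
    {"content": "Already migrated."},
]
new_docs = rename_text_key(old_docs)
```

Dictionaries that already use `content` pass through unchanged, so the helper can be applied repeatedly without harm.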
Expand Down Expand Up @@ -1561,7 +1562,7 @@ The current implementation is not supporting the storage of labels, so you canno
#### \_\_init\_\_

```python
| __init__(host: Union[str, List[str]] = "http://localhost", port: Union[int, List[int]] = 8080, timeout_config: tuple = (5, 15), username: str = None, password: str = None, index: str = "Document", embedding_dim: int = 768, text_field: str = "text", name_field: str = "name", faq_question_field="question", similarity: str = "dot_product", index_type: str = "hnsw", custom_schema: Optional[dict] = None, return_embedding: bool = False, embedding_field: str = "embedding", progress_bar: bool = True, duplicate_documents: str = 'overwrite', **kwargs, ,)
| __init__(host: Union[str, List[str]] = "http://localhost", port: Union[int, List[int]] = 8080, timeout_config: tuple = (5, 15), username: str = None, password: str = None, index: str = "Document", embedding_dim: int = 768, content_field: str = "content", name_field: str = "name", similarity: str = "dot_product", index_type: str = "hnsw", custom_schema: Optional[dict] = None, return_embedding: bool = False, embedding_field: str = "embedding", progress_bar: bool = True, duplicate_documents: str = 'overwrite', **kwargs, ,)
```

**Arguments**:
Expand All @@ -1574,10 +1575,9 @@ The current implementation is not supporting the storage of labels, so you canno
- `password`: password (standard authentication via http_auth)
- `index`: Index name for document text, embedding and metadata (in Weaviate terminology, this is a "Class" in Weaviate schema).
- `embedding_dim`: The embedding vector size. Default: 768.
- `text_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
- `content_field`: Name of field that might contain the answer and will therefore be passed to the Reader Model (e.g. "full_text").
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
- `name_field`: Name of field that contains the title of the doc
- `faq_question_field`: Name of field containing the question in case of FAQ-Style QA
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default.
- `index_type`: Index type of any vector object defined in weaviate schema. The vector index type is pluggable.
Currently, only HNSW is supported.
Expand Down
2 changes: 1 addition & 1 deletion docs/_src/api/api/evaluation.md
Expand Up @@ -89,7 +89,7 @@ open vs closed domain eval (https://haystack.deepset.ai/tutorials/evaluation).
#### run

```python
| run(labels: List[Label], answers: List[dict], correct_retrieval: bool)
| run(labels: List[Label], answers: List[Answer], correct_retrieval: bool)
```

Run this node on one sample and its labels
Expand Down
2 changes: 1 addition & 1 deletion docs/_src/api/api/file_converter.md
Expand Up @@ -279,7 +279,7 @@ Extract text from a .pdf file using the pdftotext library (https://www.xpdfreade
others if your doc contains special characters (e.g. German Umlauts, Cyrillic characters ...).
Note: With "UTF-8" we experienced cases, where a simple "fi" gets wrongly parsed as
"xef\xac\x81c" (see test cases). That's why we keep "Latin 1" as default here.
(See list of available encodings by running `pdftotext -listencodings` in the terminal)
(See list of available encodings by running `pdftotext -listenc` in the terminal)

<a name="pdf.PDFToTextOCRConverter"></a>
## PDFToTextOCRConverter Objects
Expand Down
4 changes: 2 additions & 2 deletions docs/_src/api/api/generator.md
Expand Up @@ -75,7 +75,7 @@ i.e. the model can easily adjust to domain documents even after training has fin
| 'meta': { 'doc_ids': [...],
| 'doc_scores': [80.42758 ...],
| 'doc_probabilities': [40.71379089355469, ...
| 'texts': ['Albert Einstein was a ...]
| 'content': ['Albert Einstein was a ...]
| 'titles': ['"Albert Einstein"', ...]
| }}]}
```
Expand Down Expand Up @@ -134,7 +134,7 @@ Generated answers plus additional infos in a dict like this:
| 'meta': { 'doc_ids': [...],
| 'doc_scores': [80.42758 ...],
| 'doc_probabilities': [40.71379089355469, ...
| 'texts': ['Albert Einstein was a ...]
| 'content': ['Albert Einstein was a ...]
| 'titles': ['"Albert Einstein"', ...]
| }}]}
```
Expand Down
3 changes: 2 additions & 1 deletion docs/_src/api/api/preprocessor.md
Expand Up @@ -137,7 +137,7 @@ If batch_size is set to None, this method will yield all documents and labels.
#### convert\_files\_to\_dicts

```python
convert_files_to_dicts(dir_path: str, clean_func: Optional[Callable] = None, split_paragraphs: bool = False) -> List[dict]
convert_files_to_dicts(dir_path: str, clean_func: Optional[Callable] = None, split_paragraphs: bool = False, encoding: Optional[str] = None) -> List[dict]
```

Convert all files(.txt, .pdf, .docx) in the sub-directories of the given path to Python dicts that can be written to a
Expand All @@ -148,6 +148,7 @@ Document Store.
- `dir_path`: path for the documents to be written to the DocumentStore
- `clean_func`: a custom cleaning function that gets applied to each doc (input: str, output:str)
- `split_paragraphs`: split text in paragraphs.
- `encoding`: character encoding to use when converting pdf documents.

**Returns**:

Expand Down
2 changes: 1 addition & 1 deletion docs/_src/api/api/reader.md
Expand Up @@ -241,7 +241,7 @@ Returns a dict containing the following metrics:
#### eval

```python
| eval(document_store: BaseDocumentStore, device: Optional[str] = None, label_index: str = "label", doc_index: str = "eval_document", label_origin: str = "gold_label", calibrate_conf_scores: bool = False)
| eval(document_store: BaseDocumentStore, device: Optional[str] = None, label_index: str = "label", doc_index: str = "eval_document", label_origin: str = "gold-label", calibrate_conf_scores: bool = False)
```

Performs evaluation on evaluation documents in the DocumentStore.
Expand Down
6 changes: 3 additions & 3 deletions docs/_src/api/api/retriever.md
Expand Up @@ -39,7 +39,7 @@ Wrapper method used to time functions.
#### eval

```python
| eval(label_index: str = "label", doc_index: str = "eval_document", label_origin: str = "gold_label", top_k: int = 10, open_domain: bool = False, return_preds: bool = False) -> dict
| eval(label_index: str = "label", doc_index: str = "eval_document", label_origin: str = "gold-label", top_k: int = 10, open_domain: bool = False, return_preds: bool = False) -> dict
```

Performs evaluation on the Retriever.
Expand Down Expand Up @@ -105,7 +105,7 @@ class ElasticsearchRetriever(BaseRetriever)
| "should": [{"multi_match": {
| "query": ${query}, // mandatory query placeholder
| "type": "most_fields",
| "fields": ["text", "title"]}}],
| "fields": ["content", "title"]}}],
| "filter": [ // optional custom filters
| {"terms": {"year": ${years}}},
| {"terms": {"quarter": ${quarters}}},
Expand Down Expand Up @@ -430,7 +430,7 @@ class EmbeddingRetriever(BaseRetriever)
**Arguments**:

- `document_store`: An instance of DocumentStore from which to retrieve documents.
- `embedding_model`: Local path or name of model in Hugging Face's model hub such as ``'deepset/sentence_bert'``
- `embedding_model`: Local path or name of model in Hugging Face's model hub such as ``'sentence-transformers/all-MiniLM-L6-v2'``
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
- `use_gpu`: Whether to use gpu or not
- `model_format`: Name of framework that was used for saving the model. Options:
Expand Down
8 changes: 4 additions & 4 deletions docs/_src/api/api/translator.md
Expand Up @@ -15,7 +15,7 @@ Abstract class for a Translator component that translates either a query or a do

```python
| @abstractmethod
| translate(query: Optional[str] = None, documents: Optional[Union[List[Document], List[str], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None) -> Union[str, List[Document], List[str], List[Dict[str, Any]]]
| translate(query: Optional[str] = None, documents: Optional[Union[List[Document], List[Answer], List[str], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None) -> Union[str, List[Document], List[Answer], List[str], List[Dict[str, Any]]]
```

Translate the passed query or a list of documents from language A to B.
Expand All @@ -24,7 +24,7 @@ Translate the passed query or a list of documents from language A to B.
#### run

```python
| run(query: Optional[str] = None, documents: Optional[Union[List[Document], List[str], List[Dict[str, Any]]]] = None, answers: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None)
| run(query: Optional[str] = None, documents: Optional[Union[List[Document], List[Answer], List[str], List[Dict[str, Any]]]] = None, answers: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None)
```

Method that gets executed when this class is used as a Node in a Haystack Pipeline
Expand Down Expand Up @@ -89,7 +89,7 @@ They also have a few multilingual models that support multiple languages at once
#### translate

```python
| translate(query: Optional[str] = None, documents: Optional[Union[List[Document], List[str], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None) -> Union[str, List[Document], List[str], List[Dict[str, Any]]]
| translate(query: Optional[str] = None, documents: Optional[Union[List[Document], List[Answer], List[str], List[Dict[str, Any]]]] = None, dict_key: Optional[str] = None) -> Union[str, List[Document], List[Answer], List[str], List[Dict[str, Any]]]
```

Run the actual translation. You can supply a query or a list of documents. Whatever is supplied will be translated.
Expand All @@ -98,5 +98,5 @@ Run the actual translation. You can supply a query or a list of documents. Whate

- `query`: The query string to translate
- `documents`: The documents to translate
- `dict_key`:
- `dict_key`: If you pass a dictionary in `documents`, you can specify here the field which shall be translated.
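
To illustrate the new `dict_key` description, here is a small sketch of translating only one field of each dictionary and passing the rest through. The function `translate_dicts` and the lookup-table "model" are hypothetical stand-ins, not the Translator's actual code; a real translator would call a model instead.

```python
from typing import Any, Callable, Dict, List


def translate_dicts(
    documents: List[Dict[str, Any]],
    dict_key: str,
    translate_text: Callable[[str], str],
) -> List[Dict[str, Any]]:
    """Translate only the value stored under dict_key; other fields pass through."""
    translated = []
    for doc in documents:
        new_doc = dict(doc)  # shallow copy, input dicts are not modified
        new_doc[dict_key] = translate_text(doc[dict_key])
        translated.append(new_doc)
    return translated


# Stand-in for a real translation model: a tiny lookup table.
lookup = {"Hallo Welt": "Hello world"}

docs = [{"content": "Hallo Welt", "meta": {"lang": "de"}}]
result = translate_dicts(
    docs,
    dict_key="content",
    translate_text=lambda s: lookup.get(s, s),  # unknown strings stay as-is
)
```

Only the value under `"content"` changes here; `"meta"` is carried over untouched.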

6 changes: 3 additions & 3 deletions docs/_src/tutorials/tutorials/13.md
Expand Up @@ -72,9 +72,9 @@ text1 = "Python is an interpreted, high-level, general-purpose programming langu
text2 = "Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line."
text3 = "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 2021, the band shared details of their debut studio album, New Long Leg. They also shared the single 'Strong Feelings'.[9] The album, which was produced by John Parish, was released on 2 April 2021.[10]"

docs = [{"text": text1},
{"text": text2},
{"text": text3}]
docs = [{"content": text1},
{"content": text2},
{"content": text3}]

# Initialize document store and write in the documents
document_store = ElasticsearchDocumentStore()
Expand Down
4 changes: 2 additions & 2 deletions docs/_src/tutorials/tutorials/5.md
Expand Up @@ -119,7 +119,7 @@ document_store.add_eval_data(
)

# Let's prepare the labels that we need for the retriever and the reader
labels = document_store.get_all_labels_aggregated(index=label_index)
labels = document_store.get_all_labels_aggregated(index=label_index, drop_negative_labels=True, drop_no_answers=False)
```

## Initialize components of QA-System
Expand Down Expand Up @@ -220,7 +220,7 @@ results = []
# This is how to run the pipeline
for l in labels:
res = p.run(
query=l.question,
query=l.query,
labels=l,
params={"index": doc_index, "Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
)
Expand Down
2 changes: 1 addition & 1 deletion docs/_src/tutorials/tutorials/7.md
Expand Up @@ -82,7 +82,7 @@ documents: List[Document] = []
for title, text in zip(titles, texts):
documents.append(
Document(
text=text,
content=text,
meta={
"name": title or ""
}
Expand Down