Doc.print_tree(flat=True) broken in spacy 2.0.18 #3150

Closed
xrmx opened this issue Jan 12, 2019 · 3 comments
Labels
bug Bugs and behaviour differing from documentation feat / doc Feature: Doc, Span and Token objects help wanted (easy) Contributions welcome! (also suited for spaCy beginners) help wanted Contributions welcome!

Comments

xrmx commented Jan 12, 2019

How to reproduce the behaviour

$ python
Python 3.7.2 (default, Jan  3 2019, 02:55:40) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp('Alice ate the pizza')
>>> doc.print_tree(flat=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "doc.pyx", line 983, in spacy.tokens.doc.Doc.print_tree
  File "/venv/lib/python3.7/site-packages/spacy/tokens/printers.py", line 74, in parse_tree
    for sent in doc_clone.sents]
  File "/venv/lib/python3.7/site-packages/spacy/tokens/printers.py", line 74, in <listcomp>
    for sent in doc_clone.sents]
  File "/venv/lib/python3.7/site-packages/spacy/tokens/printers.py", line 41, in POS_tree
    subtree["modifiers"].append(POS_tree(c))
KeyError: 'modifiers'

For comparison, doc.print_tree() without flat=True works and returns the following:

[{'NE': '',
  'POS_coarse': 'VERB',
  'POS_fine': 'VBD',
  'arc': 'ROOT',
  'lemma': 'eat',
  'modifiers': [{'NE': '',
                 'POS_coarse': 'PROPN',
                 'POS_fine': 'NNP',
                 'arc': 'nsubj',
                 'lemma': 'alice',
                 'modifiers': [],
                 'word': 'Alice'},
                {'NE': '',
                 'POS_coarse': 'NOUN',
                 'POS_fine': 'NN',
                 'arc': 'dobj',
                 'lemma': 'pizza',
                 'modifiers': [{'NE': '',
                                'POS_coarse': 'DET',
                                'POS_fine': 'DT',
                                'arc': 'det',
                                'lemma': 'the',
                                'modifiers': [],
                                'word': 'the'}],
                 'word': 'pizza'}],
  'word': 'ate'}]

Your Environment

  • spaCy version: 2.0.18
  • Platform: Linux-4.19.0-1-amd64-x86_64-with-debian-buster-sid
  • Python version: 3.7.2
  • Models: en_core_web_sm

mauryaland (Contributor) commented

When flat=True is passed, the "modifiers" key is removed from the dict that holds the tree, so the append in POS_tree raises the KeyError. I think guarding the loop over the children with an if statement in POS_tree would fix the issue:

def POS_tree(root, light=False, flat=False):
    """Helper: generate a POS tree for a root token. The doc must have
    `merge_ents(doc)` ran on it.
    """
    subtree = format_POS(root, light=light, flat=flat)
    if not flat:
        for c in root.children:
            subtree["modifiers"].append(POS_tree(c))
    return subtree

If the solution is approved, I can submit a small PR.
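
For anyone who wants to see the failure and the proposed guard without installing spaCy, here is a minimal sketch. It is not spaCy's code: `format_pos`, `pos_tree_buggy`, `pos_tree_fixed` and the `SimpleNamespace` tokens are hypothetical stand-ins, and the only behaviour taken from this thread is that flat=True removes the "modifiers" key before the children are appended.

```python
from types import SimpleNamespace


def format_pos(token, flat=False):
    """Hypothetical stand-in for the per-token formatter."""
    subtree = {"word": token.text, "arc": token.dep_, "modifiers": []}
    if flat:
        # Assumption from the thread: flat output drops these keys.
        subtree.pop("arc")
        subtree.pop("modifiers")
    return subtree


def pos_tree_buggy(root, flat=False):
    """Appends children unconditionally, as in the traceback."""
    subtree = format_pos(root, flat=flat)
    for child in root.children:
        subtree["modifiers"].append(pos_tree_buggy(child))
    return subtree


def pos_tree_fixed(root, flat=False):
    """Only descends into children when the tree is not flat."""
    subtree = format_pos(root, flat=flat)
    if not flat:
        for child in root.children:
            subtree["modifiers"].append(pos_tree_fixed(child))
    return subtree


# Stand-in tokens for "Alice ate the pizza" (not real spaCy Token objects).
the = SimpleNamespace(text="the", dep_="det", children=[])
pizza = SimpleNamespace(text="pizza", dep_="dobj", children=[the])
alice = SimpleNamespace(text="Alice", dep_="nsubj", children=[])
ate = SimpleNamespace(text="ate", dep_="ROOT", children=[alice, pizza])

try:
    pos_tree_buggy(ate, flat=True)
except KeyError as err:
    print("buggy version raised:", repr(err))

print("fixed version:", pos_tree_fixed(ate, flat=True))
```

Note that the buggy version only fails when the root has at least one child, since the loop body never runs for a token without children.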

@ines ines added bug Bugs and behaviour differing from documentation feat / doc Feature: Doc, Span and Token objects labels Jan 13, 2019
@ines
Member

ines commented Jan 13, 2019

Yes, a PR would be nice!

One thing to note re Doc.print_tree: At the moment, we've deprecated Doc.print_tree for spacy-nightly, because 2.1.x will introduce a new Doc.to_json() method that's also used for training and will output data that exactly matches spaCy's new training format (see #2928).

However, Doc.to_json won't actually cover everything that Doc.print_tree covered, so maybe we do want to bring that one back. I'm not 100% sure if people actually use it much, or whether it makes more sense for users to build their own logic that's more specific to their application.
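
As a rough illustration of what "building your own logic" could look like, here is a small sketch that flattens a parse into one row per token. `doc_to_rows` is a hypothetical helper, not part of spaCy. It only reads the token attributes `i`, `text`, `pos_`, `dep_` and `head`, and the demo uses plain stand-in objects so it runs without a model.

```python
from types import SimpleNamespace


def doc_to_rows(doc):
    """Return one dict per token, with the head referenced by index."""
    return [
        {
            "i": tok.i,
            "word": tok.text,
            "pos": tok.pos_,
            "dep": tok.dep_,
            "head": tok.head.i,
        }
        for tok in doc
    ]


# Stand-in tokens for "Alice ate the pizza"; with spaCy you would pass a Doc.
ate = SimpleNamespace(i=1, text="ate", pos_="VERB", dep_="ROOT")
ate.head = ate  # the root is its own head, as in spaCy
alice = SimpleNamespace(i=0, text="Alice", pos_="PROPN", dep_="nsubj", head=ate)
pizza = SimpleNamespace(i=3, text="pizza", pos_="NOUN", dep_="dobj", head=ate)
the = SimpleNamespace(i=2, text="the", pos_="DET", dep_="det", head=pizza)
doc = [alice, ate, the, pizza]

for row in doc_to_rows(doc):
    print(row)
```

With spaCy installed, the same function should accept the result of `nlp('Alice ate the pizza')` directly, since iterating a Doc yields tokens with these attributes.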

@ines ines added help wanted Contributions welcome! help wanted (easy) Contributions welcome! (also suited for spaCy beginners) labels Jan 13, 2019
@ines ines closed this as completed Jan 16, 2019
@lock

lock bot commented Feb 15, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Feb 15, 2019