
QubitPi/wilhelm-vocabulary


Wilhelm Vocabulary


Data Format

This repository holds the data that serves wilhelmlang.com. The data is written in YAML because

  1. it is machine-readable, so it can be consumed quickly in data pipelines
  2. it is human-readable and, thus, easy to read and modify
  3. it supports multi-line values, which are very handy for language data

Encoding Table in YAML

To encode the inflections that are common in most Indo-European languages, an application-specific YAML format like the following is employed throughout this repository:

  - term: der Kaufmann
    definition: (male) trader
    declension:
      - ["",         singular, singular, singular,                plural, plural                   ]
      - ["",         indef.,   def.,     noun,                    def.,   noun                     ]
      - [nominative, ein,      der,      Kaufmann,                die,    "Kaufmänner, Kaufleute"  ]
      - [genitive,   eines,    des,      "Kaufmannes, Kaufmanns", der,    "Kaufmänner, Kaufleute"  ]
      - [dative,     einem,    dem,      Kaufmann,                den,    "Kaufmännern, Kaufleuten"]
      - [accusative, einen,    den,      Kaufmann,                die,    "Kaufmänner, Kaufleute"  ]

Note

  • A list under declension is a table row
  • All rows have the same number of columns
  • Each element of the list corresponds to a table cell

The declension (inflection) table above is equivalent to:

case       | singular, indef. | singular, def. | singular noun         | plural, def. | plural noun
nominative | ein              | der            | Kaufmann              | die          | Kaufmänner, Kaufleute
genitive   | eines            | des            | Kaufmannes, Kaufmanns | der          | Kaufmänner, Kaufleute
dative     | einem            | dem            | Kaufmann              | den          | Kaufmännern, Kaufleuten
accusative | einen            | den            | Kaufmann              | die          | Kaufmänner, Kaufleute
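Because every row has the same width, a parsed declension is just a rectangular list of lists, and a cell can be addressed by case and column. The sketch below assumes the YAML has already been loaded (by any YAML library, not shown) and uses hypothetical helper names (`validate_table`, `cell`) that are not part of wilhelm-python-sdk. It also assumes that a comma inside a quoted cell separates variant forms, as in "Kaufmänner, Kaufleute".

```python
from typing import List

Table = List[List[str]]

def validate_table(rows: Table) -> None:
    """Raise ValueError unless the table is non-empty and every row has the same width."""
    if not rows:
        raise ValueError("declension table is empty")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {index} has {len(row)} cells, expected {width}")

def cell(rows: Table, case: str, column: int) -> str:
    """Return the cell at `column` in the row whose first cell names `case`."""
    validate_table(rows)
    for row in rows:
        if row[0] == case:
            return row[column]
    raise KeyError(case)

# The "der Kaufmann" declension as a YAML parser would load it (hand-copied here).
kaufmann: Table = [
    ["", "singular", "singular", "singular", "plural", "plural"],
    ["", "indef.", "def.", "noun", "def.", "noun"],
    ["nominative", "ein", "der", "Kaufmann", "die", "Kaufmänner, Kaufleute"],
    ["genitive", "eines", "des", "Kaufmannes, Kaufmanns", "der", "Kaufmänner, Kaufleute"],
    ["dative", "einem", "dem", "Kaufmann", "den", "Kaufmännern, Kaufleuten"],
    ["accusative", "einen", "den", "Kaufmann", "die", "Kaufmänner, Kaufleute"],
]

# Column 5 holds the plural noun; a quoted cell may list several variants.
dative_plural = [form.strip() for form in cell(kaufmann, "dative", 5).split(",")]
print(dative_plural)  # ['Kaufmännern', 'Kaufleuten']
```

Validating the width first means a missing comma in the YAML (which would silently merge two cells into one) is reported instead of shifting every later column.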

Data Pipeline

[Figure: Data pipeline]

How Data (Vocabulary) is Stored in a Graph Database

Why Graph Database

Graph data representation assumes universal connectivity among world entities, which applies well to the realm of languages. Multilanguage learners have already seen that Indo-European languages are similar in many aspects. These similarities not only reflect historical facts studied in philology but also give multilanguage learners a great opportunity to take advantage of them and study much more efficiently. What has been missing is a way to connect the dots: a graph database that visually presents these enlightening links between related languages in a natural way.

Base Schema

vocabulary:
  - term: string
    definition: list

The meaning of a word is called the definition. A term has a natural relationship to its definition(s). For example, the German noun "Ecke" has at least 4 definitions:

[Figure: Relationship between term and definition(s); graph data generated by wilhelm-python-sdk]

Tip

The parenthesized value at the beginning of each definition item plays an important role: it becomes the label of the relationship between the term and the definition in the graph database loaded by the Wilhelm SDK. For example, both German words

- term: denn
  definition:
    - (adv.) then, thus
    - (conj.) because

and

 - term: nämlich
   definition:
     - (adj.) same
     - (adv.) namely
     - (adv.) because

can mean "because", each acting as a different part of speech. This is visualized as follows:

[Figure: example.png]

Visualizing synonyms this way is a big advantage for the human brain, which is exceedingly good at memorizing patterns.
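A minimal sketch of how the leading label could be separated from a definition and turned into labeled term-to-definition edges. It assumes the label is always the first parenthesized group, that a definition without one gets no label, and that a single-string definition (as in der Kaufmann) counts as a one-item list; `parse_definition` and `build_edges` are hypothetical names, and the real loader in wilhelm-python-sdk may differ. Grouping the edges by definition text shows how denn and nämlich meet at "because".

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple, Union

# "(adv.) then, thus" -> label "adv.", text "then, thus"
LABELED = re.compile(r"^\((?P<label>[^)]*)\)\s*(?P<text>.*)$", re.DOTALL)

def parse_definition(definition: str) -> Tuple[Optional[str], str]:
    """Split off the leading parenthesized label; the label is None if absent."""
    match = LABELED.match(definition.strip())
    if match is None:
        return None, definition.strip()
    return match.group("label"), match.group("text")

def build_edges(entries: List[dict]) -> List[Tuple[str, Optional[str], str]]:
    """Return (term, relationship label, definition) triples for every entry."""
    edges = []
    for entry in entries:
        definitions: Union[str, List[str]] = entry["definition"]
        if isinstance(definitions, str):
            definitions = [definitions]  # a single definition string
        for definition in definitions:
            label, text = parse_definition(definition)
            edges.append((entry["term"], label, text))
    return edges

entries = [
    {"term": "denn", "definition": ["(adv.) then, thus", "(conj.) because"]},
    {"term": "nämlich", "definition": ["(adj.) same", "(adv.) namely", "(adv.) because"]},
]

# Terms that share a definition node are connected through it in the graph.
by_definition: Dict[str, List[Tuple[str, Optional[str]]]] = defaultdict(list)
for term, label, text in build_edges(entries):
    by_definition[text].append((term, label))

print(by_definition["because"])  # [('denn', 'conj.'), ('nämlich', 'adv.')]
```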

Inflections

Most Indo-European languages are inflected. How inflections are encoded is documented separately for each language below.

Basic Graph Query

  • Search for a term along with all of its links:

    MATCH (term:Term)-[r]-(x) WHERE term.name = "der Amerikaner" RETURN term, r, x;
    

Languages

German

Noun Declension

Declension is the inflection for nouns and adjectives.

Tip

The declension tables for all nouns are sourced from Wiktionary

Declension Templates
Masculine

A term with the definite article der denotes a masculine noun, which uses the following declension table template:

declension:
  - ["",         singular, singular, singular, plural, plural]
  - ["",         indef.,   def.,     noun,     def.,   noun  ]
  - [nominative, ein,      der,      ████████, die,    ██████]
  - [genitive,   eines,    des,      ████████, der,    ██████]
  - [dative,     einem,    dem,      ████████, den,    ██████]
  - [accusative, einen,    den,      ████████, die,    ██████]

For example:

  - term: der Gegenstand
    definition:
      - object
      - thing
    declension:
      - ["",         singular, singular, singular,                    plural, plural      ]
      - ["",         indef.,   def.,     noun,                        def.,   noun        ]
      - [nominative, ein,      der,      Gegenstand,                  die,    Gegenstände ]
      - [genitive,   eines,    des,      "Gegenstandes, Gegenstands", der,    Gegenstände ]
      - [dative,     einem,    dem,      Gegenstand,                  den,    Gegenständen]
      - [accusative, einen,    den,      Gegenstand,                  die,    Gegenstände ]

Caution

Adjectival nouns, however, do NOT follow the template above. Instead, they use the adjective declension table template.

The definition of an adjectival noun begins with "(Adjectival nouns)". For example:

  - term: der Kranker
    definition: (Adjectival nouns) the sick person
    declension:
      strong declension (without article):
        - ["",         singular,  singular, singular, plural ]
        - ["",         masculine, feminine, neuter,   ""     ]
        - [nominative, Kranker,   N/A,      N/A,      Kranke ]
        - [genitive,   Kranken,   N/A,      N/A,      Kranker]
        - [dative,     Krankem,   N/A,      N/A,      Kranken]
        - [accusative, Kranken,   N/A,      N/A,      Kranke ]
      weak declension (with definite article):
        - ["",         singular,    singular, singular, plural     ]
        - ["",         masculine,   feminine, neuter,   ""         ]
        - [nominative, der Kranke,  N/A,      N/A,      die Kranken]
        - [genitive,   des Kranken, N/A,      N/A,      der Kranken]
        - [dative,     dem Kranken, N/A,      N/A,      den Kranken]
        - [accusative, den Kranken, N/A,      N/A,      die Kranken]
      mixed declension (with indefinite article):
        - ["",         singular,      singular, singular, plural          ]
        - ["",         masculine,     feminine, neuter,   ""              ]
        - [nominative, ein Kranker,   N/A,      N/A,      (keine)  Kranken]
        - [genitive,   eines Kranken, N/A,      N/A,      (keiner) Kranken]
        - [dative,     einem Kranken, N/A,      N/A,      (keinen) Kranken]
        - [accusative, einen Kranken, N/A,      N/A,      (keine)  Kranken]

Note that since "Kranker" is masculine, all feminine and neuter declensions are undefined and, thus, are marked with "N/A".

Feminine

A term with the definite article die denotes a feminine noun, which uses the following declension table template:

declension:
  - ["",         singular, singular, singular, plural, plural]
  - ["",         indef.,   def.,     noun,     def.,   noun  ]
  - [nominative, eine,     die,      ████████, die,    ██████]
  - [genitive,   einer,    der,      ████████, der,    ██████]
  - [dative,     einer,    der,      ████████, den,    ██████]
  - [accusative, eine,     die,      ████████, die,    ██████]

Neuter

A term with the definite article das denotes a neuter noun, which uses the following declension table template:

declension:
  - ["",         singular, singular, singular, plural, plural]
  - ["",         indef.,   def.,     noun,     def.,   noun  ]
  - [nominative, ein,      das,      ████████, die,    ██████]
  - [genitive,   eines,    des,      ████████, der,    ██████]
  - [dative,     einem,    dem,      ████████, den,    ██████]
  - [accusative, ein,      das,      ████████, die,    ██████]
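The three noun templates differ only in their article columns, so a template can be derived from the article at the start of term. This sketch assumes the first word of term is always der, die, or das, and that adjectival nouns are detected by the "(Adjectival nouns)" prefix described in the Caution above, in which case it returns None to signal that the adjective template applies. `noun_template` is a hypothetical helper; None stands in for the ████ cells that hold the noun forms.

```python
from typing import List, Optional, Union

CASES = ["nominative", "genitive", "dative", "accusative"]

# (indefinite, definite) singular articles per case, copied from the templates above.
SINGULAR_ARTICLES = {
    "der": [("ein", "der"), ("eines", "des"), ("einem", "dem"), ("einen", "den")],  # masculine
    "die": [("eine", "die"), ("einer", "der"), ("einer", "der"), ("eine", "die")],  # feminine
    "das": [("ein", "das"), ("eines", "des"), ("einem", "dem"), ("ein", "das")],    # neuter
}
# The definite plural article is identical for all three genders.
PLURAL_ARTICLES = ["die", "der", "den", "die"]

def noun_template(term: str, definition: Union[str, List[str]]) -> Optional[List[list]]:
    """Return a noun declension template, or None when the adjective template applies."""
    # definition may be a single string or a list; the prefix sits on the first item.
    first = definition if isinstance(definition, str) else next(iter(definition), "")
    if first.startswith("(Adjectival nouns)"):
        return None
    article = term.split()[0]
    if article not in SINGULAR_ARTICLES:
        raise ValueError(f"{term!r} does not start with der, die, or das")
    rows: List[list] = [
        ["", "singular", "singular", "singular", "plural", "plural"],
        ["", "indef.", "def.", "noun", "def.", "noun"],
    ]
    for case, (indefinite, definite), plural in zip(CASES, SINGULAR_ARTICLES[article], PLURAL_ARTICLES):
        # None marks the cells that hold the noun forms (the ████ placeholders).
        rows.append([case, indefinite, definite, None, plural, None])
    return rows

print(noun_template("der Gegenstand", ["object", "thing"])[3])
# ['genitive', 'eines', 'des', None, 'der', None]
```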

(Attributive) Adjective Declension

Tip

  • Predicate adjectives (e.g. kalt in mir ist kalt "I am cold") are undeclined
  • An adjective can be either attributive (a brave boy) or predicative (the boy is brave). Only attributive adjectives are declined.

There are 3 types of declensions for German adjectives

  1. strong declension,
  2. weak declension, and
  3. mixed declension

Each type has its own declension table. To master German adjectives, we will need to memorize all three.

There is also adjective comparison, which potentially adds 6 more tables. But the rules for the comparative and superlative forms are quite universal, so we ignore those two and focus only on the positive form, which is the basic form of the adjective.

The adjective declension template is as follows:

declension:
  strong declension (without article):
    - ["",         singular,  singular, singular, plural]
    - ["",         masculine, feminine, neuter,   ""    ]
    - [nominative, █████████, ████████, ████████, ██████]
    - [genitive,   █████████, ████████, ████████, ██████]
    - [dative,     █████████, ████████, ████████, ██████]
    - [accusative, █████████, ████████, ████████, ██████]
  weak declension (with definite article):
    - ["",         singular,  singular, singular, plural]
    - ["",         masculine, feminine, neuter,   ""    ]
    - [nominative, der █████, die ████, das ███,  die ██]
    - [genitive,   des █████, der ████, des ███,  der ██]
    - [dative,     dem █████, der ████, dem ███,  den ██]
    - [accusative, den █████, die ████, das ███,  die ██]
  mixed declension (with indefinite article):
    - ["",         singular,  singular, singular, plural        ]
    - ["",         masculine, feminine, neuter,   ""            ]
    - [nominative, ein █████, eine ███, ein ███,  (keine)  █████]
    - [genitive,   eines ███, einer ██, eines █,  (keiner) █████]
    - [dative,     einem ███, einer ██, einem █,  (keinen) █████]
    - [accusative, einen ███, eine ███, ein ███,  (keine)  █████]

Here is an example of the adjective declension for "unterschiedlich":

  - term: unterschiedlich
    definition: (adj.) different
    declension:
      strong declension (without article):
        - ["",         singular,          singular,          singular,          plural           ]
        - ["",         masculine,         feminine,          neuter,            ""               ]
        - [nominative, unterschiedlicher, unterschiedliche,  unterschiedliches, unterschiedliche ]
        - [genitive,   unterschiedlichen, unterschiedlicher, unterschiedlichen, unterschiedlicher]
        - [dative,     unterschiedlichem, unterschiedlicher, unterschiedlichem, unterschiedlichen]
        - [accusative, unterschiedlichen, unterschiedliche,  unterschiedliches, unterschiedliche ]
      weak declension (with definite article):
        - ["",         singular,              singular,              singular,              plural               ]
        - ["",         masculine,             feminine,              neuter,                ""                   ]
        - [nominative, der unterschiedliche,  die unterschiedliche,  das unterschiedliche,  die unterschiedlichen]
        - [genitive,   des unterschiedlichen, der unterschiedlichen, des unterschiedlichen, der unterschiedlichen]
        - [dative,     dem unterschiedlichen, der unterschiedlichen, dem unterschiedlichen, den unterschiedlichen]
        - [accusative, den unterschiedlichen, die unterschiedliche,  das unterschiedliche,  die unterschiedlichen]
      mixed declension (with indefinite article):
        - ["",         singular,                singular,                singular,                plural                    ]
        - ["",         masculine,               feminine,                neuter,                  ""                        ]
        - [nominative, ein unterschiedlicher,   eine unterschiedliche,   ein unterschiedliches,   (keine) unterschiedlichen ]
        - [genitive,   eines unterschiedlichen, einer unterschiedlichen, eines unterschiedlichen, (keiner) unterschiedlichen]
        - [dative,     einem unterschiedlichen, einer unterschiedlichen, einem unterschiedlichen, (keinen) unterschiedlichen]
        - [accusative, einen unterschiedlichen, eine unterschiedliche,   ein unterschiedliches,   (keine) unterschiedlichen ]
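Because the positive-form endings are the same for every regular adjective, the three tables can be generated from the stem. The sketch below is an illustration under that assumption; the function `decline` and its constants are hypothetical and not part of this repository. It reproduces the "unterschiedlich" tables above, but it does not handle stem changes such as hoch → hoh- or dunkel → dunkl-.

```python
from typing import Dict, List

CASES = ["nominative", "genitive", "dative", "accusative"]
HEADER = [["", "singular", "singular", "singular", "plural"],
          ["", "masculine", "feminine", "neuter", ""]]

# Positive-form endings per case; columns are masculine, feminine, neuter, plural.
ENDINGS = {
    "strong": [["er", "e", "es", "e"], ["en", "er", "en", "er"],
               ["em", "er", "em", "en"], ["en", "e", "es", "e"]],
    "weak": [["e", "e", "e", "en"], ["en", "en", "en", "en"],
             ["en", "en", "en", "en"], ["en", "e", "e", "en"]],
    "mixed": [["er", "e", "es", "en"], ["en", "en", "en", "en"],
              ["en", "en", "en", "en"], ["en", "e", "es", "en"]],
}
# Articles that precede the adjective in each type, copied from the template above.
ARTICLES = {
    "strong": [["", "", "", ""]] * 4,
    "weak": [["der", "die", "das", "die"], ["des", "der", "des", "der"],
             ["dem", "der", "dem", "den"], ["den", "die", "das", "die"]],
    "mixed": [["ein", "eine", "ein", "(keine)"], ["eines", "einer", "eines", "(keiner)"],
              ["einem", "einer", "einem", "(keinen)"], ["einen", "eine", "ein", "(keine)"]],
}
TITLES = {
    "strong": "strong declension (without article)",
    "weak": "weak declension (with definite article)",
    "mixed": "mixed declension (with indefinite article)",
}

def decline(stem: str) -> Dict[str, List[List[str]]]:
    """Build the strong, weak, and mixed tables for a regular adjective stem."""
    tables = {}
    for kind, title in TITLES.items():
        rows = [list(row) for row in HEADER]
        for i, case in enumerate(CASES):
            # strip() removes the leading space left by the empty strong-declension article.
            cells = [f"{article} {stem}{ending}".strip()
                     for article, ending in zip(ARTICLES[kind][i], ENDINGS[kind][i])]
            rows.append([case] + cells)
        tables[title] = rows
    return tables

mixed = decline("unterschiedlich")["mixed declension (with indefinite article)"]
print(mixed[2])
# ['nominative', 'ein unterschiedlicher', 'eine unterschiedliche', 'ein unterschiedliches', '(keine) unterschiedlichen']
```

Such a generator is mainly useful as a cross-check: a Wiktionary-sourced table that disagrees with it points either to an irregular adjective or to a copying mistake.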

Tip

The declension tables for all adjectives are sourced from Wiktionary

Pronoun Declension

Declension template:

declension:
  - ["",         masculine, feminine, neuter, plural]
  - [nominative, ████████, ████████, ██████, ██████]
  - [genitive,   ████████, ████████, ██████, ██████]
  - [dative,     ████████, ████████, ██████, ██████]
  - [accusative, ████████, ████████, ██████, ██████]

Verb Conjugation

The conjugation is the inflection paradigm of a German verb. An entry with a conjugation field denotes a verb; its definition also begins with the infinitive form, i.e. "to ...".

There are 3 persons, 2 numbers, and 4 moods (indicative, conditional, imperative, and subjunctive) to consider in conjugation. There are 6 tenses in German: the present and past are conjugated, and there are four compound tenses. There are two categories of verbs in German: weak and strong[1]. In addition, strong verbs are grouped into 7 "classes".

The conjugation table of a German verb on Wiktionary is hard for German beginners to interpret. It does, however, serve as a very good philological reference. For example, it tells us which of the 7 "classes" a strong verb belongs to. We, therefore, keep the Wiktionary link to the verb's conjugation table for future data processing, for example,

  - term: aufwachsen
    definition: to grow up
    conjugation: https://en.wiktionary.org/wiki/aufwachsen#Conjugation

and advise users to learn everyday conjugation with the much more practical method below. We take "aufwachsen" as an example.

Important

I'm not advertising for any organizations. I'm simply sharing good resources.

Netzverb Dictionary is the best German dictionary for vocabulary inflections. Search for "aufwachsen" and we will see much more intuitive conjugation tables.

This pretty much serves our needs, but what sets Netzverb apart from the alternatives is that every verb comes with

  1. A printable version that looks much better than the browser's Control+P export

    • There is also a "Sentences with German verb aufwachsen" section with a link that offers a large number of conjugated examples, which familiarize us with the inflections of the verb
  2. An on-the-fly generated flashcard sheet, which lets us make better use of our spare moments

  3. A YouTube video that offers audio of almost every conjugated form, which helps a lot with pronunciation

Tip

  • It is, thus, strongly recommended to study conjugation separately through Netzverb Dictionary
  • Netzverb Dictionary, however, lacks a programmable API, whereas Wiktionary has a good one. This is why we keep the conjugation link to Wiktionary for now; it will serve us well as I work toward processing conjugations automatically
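As a rough illustration of what a programmable API means here, the sketch below turns a stored conjugation link into a request URL for the MediaWiki Action API that Wiktionary runs (`action=parse` with `prop=wikitext`). It only builds the URL and sends no request; the helper names `page_title` and `wikitext_request_url` are hypothetical, and wilhelm-python-sdk may query Wiktionary differently.

```python
from urllib.parse import unquote, urlencode

WIKTIONARY_API = "https://en.wiktionary.org/w/api.php"

def page_title(conjugation_link: str) -> str:
    """Extract the page title from a link such as .../wiki/aufwachsen#Conjugation."""
    without_fragment = conjugation_link.split("#", 1)[0]
    return unquote(without_fragment.rsplit("/wiki/", 1)[1])

def wikitext_request_url(title: str) -> str:
    """Build a MediaWiki Action API URL that returns the page's wikitext as JSON."""
    params = {
        "action": "parse",   # run the parse module
        "page": title,       # page to parse, e.g. "aufwachsen"
        "prop": "wikitext",  # ask only for the raw wikitext
        "format": "json",
    }
    return WIKTIONARY_API + "?" + urlencode(params)

link = "https://en.wiktionary.org/wiki/aufwachsen#Conjugation"
print(wikitext_request_url(page_title(link)))
# https://en.wiktionary.org/w/api.php?action=parse&page=aufwachsen&prop=wikitext&format=json
```

Fetching the URL (for example with urllib.request.urlopen) and extracting the conjugation section from the returned wikitext are left out; the sketch relies only on the link already stored in the entry.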

Ancient Greek

Unless otherwise mentioned, we are talking about Attic Greek throughout this repository.

Diacritic Mark Convention

We employ only the following 3 diacritic signs in the vocabulary:

  1. the acute (ά)
  2. the circumflex (ᾶ), and
  3. the grave (ὰ)

This convention is called medium diacritics, and it is the same convention used in the Loeb Classical Library editions published by Harvard University Press. Note, however, that Wiktionary, our common source, uses full diacritics, including the breve; we don't.
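The convention can be applied mechanically: decompose a word into base letters plus combining marks, drop the vowel-length marks, and recompose. This sketch keeps the acute, circumflex, and grave (and the breathings and iota subscript, which are also combining marks). Dropping the breve follows the text above; also dropping the macron is an assumption on our part. `to_medium_diacritics` is a hypothetical name.

```python
import unicodedata

# Combining marks for vowel length. The breve is excluded per the convention above;
# ASSUMPTION: the macron is dropped too.
LENGTH_MARKS = {"\u0304", "\u0306"}  # combining macron, combining breve

def to_medium_diacritics(word: str) -> str:
    """Drop vowel-length marks, keeping accents (acute, circumflex, grave) and breathings."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in LENGTH_MARKS)
    return unicodedata.normalize("NFC", kept)

print(to_medium_diacritics("\u1fb1") == "\u03b1")  # True: ᾱ becomes α
```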

Noun Declension

The vocabulary entry for each noun consists of its nominative and genitive forms, an article that indicates the noun's gender, its English meanings, and its declension class. For example:

  - term: τέχνη τέχνης, ἡ
    definition:
      - art
      - skill
      - craft
    declension class: 1st

The vocabulary entry above consists of the following 5 items:

  1. τέχνη: nominative singular

  2. τέχνης: genitive singular

  3. ἡ: nominative feminine singular of the article, which shows that the gender of the noun is feminine. Gender is indicated by the appropriate form of the definite article "the":

    • ὁ for the masculine nouns
    • ἡ for the feminine nouns
    • τό for the neuter nouns
  4. a list of English meanings of the word

  5. 1st: the noun follows the first declension. The 3 declension classes are

    1. first declension (1st)
    2. second declension (2nd)
    3. third declension (3rd)

    A multi-form entry maps each form to its declension class in this field. For example

      - term: αὐτός αὐτή αὐτό
        definition:
          - (without article) he, she, it, they
          - (without article) himself, herself, itself, themselves
          - (with definite article) same
        declension class:
          αὐτός: 2nd
          αὐτή: 1st
          αὐτό: 2nd
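A sketch of splitting a noun term into its parts, assuming the exact shape "nominative genitive, article" shown above. Entries that do not fit, such as the multi-form αὐτός αὐτή αὐτό, raise ValueError. `parse_noun_term` is a hypothetical helper; the article-to-gender table comes straight from the list above, normalized to NFC so precomposed and decomposed accents compare equal.

```python
import unicodedata
from typing import Tuple

_ARTICLES = {"ὁ": "masculine", "ἡ": "feminine", "τό": "neuter"}
ARTICLE_GENDER = {unicodedata.normalize("NFC", k): v for k, v in _ARTICLES.items()}

def parse_noun_term(term: str) -> Tuple[str, str, str]:
    """Split 'τέχνη τέχνης, ἡ' into (nominative, genitive, gender)."""
    forms, article = (part.strip() for part in term.rsplit(",", 1))  # ValueError if no comma
    nominative, genitive = forms.split()                             # ValueError unless 2 forms
    gender = ARTICLE_GENDER[unicodedata.normalize("NFC", article)]   # KeyError if unknown
    return nominative, genitive, gender

print(parse_noun_term("τέχνη τέχνης, ἡ")[2])  # feminine
```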

The declension of the entry is not shown because any noun can be declined by taking the genitive singular, removing the genitive singular ending to get the stem, and then adding the proper set of endings to the stem based on its declension class[2].

For example, to decline τέχνη τέχνης, ἡ, art, take the genitive singular τέχνης, remove the genitive singular ending -ης, and add the appropriate endings to the stem, which gives the following paradigm:

case       | singular | plural
nominative | τέχνη    | τέχναι
genitive   | τέχνης   | τεχνῶν
dative     | τέχνῃ    | τέχναις
accusative | τέχνην   | τέχνᾱς
vocative   | τέχνη    | τέχναι
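A sketch of the stem-plus-endings procedure for the τέχνη pattern only. It assumes an η-type first-declension noun whose acute accent stays on the stem, and it hard-codes the one accent change visible in the paradigm above, the circumflex genitive plural -ῶν; other first-declension types (α-stems, nouns accented on the ending) need different endings or accent rules and are out of scope. The name `decline_first_eta` is hypothetical.

```python
import unicodedata
from typing import Dict, List

CASES = ["nominative", "genitive", "dative", "accusative", "vocative"]
# Endings read off the τέχνη paradigm above (η-type first declension).
SINGULAR_ENDINGS = ["η", "ης", "ῃ", "ην", "η"]
PLURAL_ENDINGS = ["αι", "ῶν", "αις", "ᾱς", "αι"]

def strip_acute(stem: str) -> str:
    """Remove acute accents (combining U+0301) from a stem."""
    decomposed = unicodedata.normalize("NFD", stem)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))

def decline_first_eta(genitive_singular: str) -> Dict[str, List[str]]:
    """Return {case: [singular, plural]} for an η-type noun accented on the stem."""
    genitive_singular = unicodedata.normalize("NFC", genitive_singular)
    if not genitive_singular.endswith("ης"):
        raise ValueError("expected a genitive singular ending in -ης")
    stem = genitive_singular[: -len("ης")]  # τέχνης -> τέχν
    table = {}
    for case, singular, plural in zip(CASES, SINGULAR_ENDINGS, PLURAL_ENDINGS):
        # The first-declension genitive plural always carries the circumflex on -ῶν,
        # so the stem loses its own accent in that form only.
        plural_stem = strip_acute(stem) if case == "genitive" else stem
        table[case] = [stem + singular, plural_stem + plural]
    return table

print(decline_first_eta("τέχνης")["genitive"])  # ['τέχνης', 'τεχνῶν']
```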

Adjective Declension

Declension template:

declension:
  - ["",         singular,  singular, singular, dual,      dual,     dual,   plural,    plural,   plural]
  - ["",         masculine, feminine, neuter,   masculine, feminine, neuter, masculine, feminine, neuter]
  - [nominative, █████████, ████████, ████████, █████████, ████████, ██████, █████████, ████████, ██████]
  - [genitive,   █████████, ████████, ████████, █████████, ████████, ██████, █████████, ████████, ██████]
  - [dative,     █████████, ████████, ████████, █████████, ████████, ██████, █████████, ████████, ██████]
  - [accusative, █████████, ████████, ████████, █████████, ████████, ██████, █████████, ████████, ██████]
  - [vocative,   █████████, ████████, ████████, █████████, ████████, ██████, █████████, ████████, ██████]

Verb Conjugation

The Greek verb has 6 principal parts. All 6 must be learned whenever a new verb is encountered:

  1. (first person singular) present indicative active
  2. (first person singular) future indicative active
  3. (first person singular) aorist indicative active
  4. (first person singular) perfect indicative active
  5. (first person singular) perfect indicative passive
  6. (first person singular) aorist indicative passive

Tip

The minimum number of forms which one must know in order to generate all possible forms of a verb are called the principal parts of that verb.

From the 6 forms above, various verb forms (i.e. stems & endings) can be derived by rules[3].

In practice, however, obtaining precise and complete principal parts for some verbs has proven impossible. While we have made our best efforts to reconstruct the complete principal parts, we also include a link to each verb's Wiktionary page for wilhelm-python-sdk, which dynamically loads the complete conjugation tables into the graph database.

The reconstructed principal parts are loaded as well, along with a list of references that validate the reconstruction.

In summary, a verb entry has the following form:

- term: string
  definition: list
  conjugation:
    wiktionary: string
    principal parts:
      - ["",                                                 Attic, (Possibly other dialects)]
      - [(first person singular) present indicative active,  █████, ...                      ]
      - [(first person singular) future indicative active,   █████, ...                      ]
      - [(first person singular) aorist indicative active,   █████, ...                      ]
      - [(first person singular) perfect indicative active,  █████, ...                      ]
      - [(first person singular) perfect indicative passive, █████, ...                      ]
      - [(first person singular) aorist indicative passive,  █████, ...                      ]
    references: list

For example:

  - term: λέγω
    definition:
      - to say, speak
      - to pick up
    conjugation:
      wiktionary: https://en.wiktionary.org/wiki/λέγω#Verb_2
      principal parts:
        - ["",                                                 Attic    , Koine          ]
        - [(first person singular) present indicative active,  λέγω     , λέγω           ]
        - [(first person singular) future indicative active,   λέξω     , ἐρῶ            ]
        - [(first person singular) aorist indicative active,   ἔλεξα    , εἶπον/εἶπα     ]
        - [(first person singular) perfect indicative active,  (missing), εἴρηκα         ]
        - [(first person singular) perfect indicative passive, λέλεγμαι , λέλεγμαι       ]
        - [(first person singular) aorist indicative passive,  ἐλέχθην  , ἐρρέθην/ἐρρήθην]
      references:
        - https://en.wiktionary.org/wiki/λέγω#Inflection
        - http://atticgreek.org/downloads/allPPbytypes.pdf
        - https://books.openbookpublishers.com/10.11647/obp.0264/ch25.xhtml
        - https://www.billmounce.com/greek-dictionary/lego
        - https://koine-greek.fandom.com/wiki/Λέγω
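Once loaded, the principal parts table can be queried by form and dialect. This sketch assumes the parsed table has the header row shown above, treats the literal value (missing) as an absent form, and splits alternatives on "/" (as in the Koine εἶπον/εἶπα); `principal_part` is a hypothetical helper, not part of wilhelm-python-sdk.

```python
from typing import List, Optional

def principal_part(rows: List[List[str]], form: str, dialect: str) -> Optional[List[str]]:
    """Return the alternatives for one principal part, or None if it is missing."""
    header = [cell.strip() for cell in rows[0]]
    column = header.index(dialect)  # ValueError if the dialect is not listed
    for row in rows[1:]:
        if row[0] == form:
            value = row[column].strip()
            if value == "(missing)":
                return None
            return [alternative.strip() for alternative in value.split("/")]
    raise KeyError(form)

# The λέγω principal parts above, as a YAML parser would load them (hand-copied).
lego = [
    ["", "Attic", "Koine"],
    ["(first person singular) present indicative active", "λέγω", "λέγω"],
    ["(first person singular) future indicative active", "λέξω", "ἐρῶ"],
    ["(first person singular) aorist indicative active", "ἔλεξα", "εἶπον/εἶπα"],
    ["(first person singular) perfect indicative active", "(missing)", "εἴρηκα"],
    ["(first person singular) perfect indicative passive", "λέλεγμαι", "λέλεγμαι"],
    ["(first person singular) aorist indicative passive", "ἐλέχθην", "ἐρρέθην/ἐρρήθην"],
]

print(principal_part(lego, "(first person singular) perfect indicative active", "Attic"))  # None
```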
vocabulary:
  - term: string
    definition: list

Classical Hebrew (Coming Soon)

The vocabulary is presented to help read and understand Biblical Hebrew. Complementary audio helps with pronunciation.

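Korean

Chinese speakers have an inherent advantage in learning Korean, and Korean itself is a fairly simple language, so grammar and vocabulary are merged here; each entry consists only of a term (Korean) and a definition (Chinese):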

vocabulary:
  - term: string
    definition: list of strings
    example:
      - Korean: 제가 아무렴 그쪽 편에 서겠어요
        Chinese: 我无论如何都会站在你这边
      - Korean: ...
        Chinese: ...

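There is no need to spend much effort memorizing the simple grammar and vocabulary; the rest is simply practicing listening, speaking, reading, and writing nonstop with Korean-subtitled dramas. The example sentences under example all come from native Korean sources.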

Note

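Korean does not belong to the Sino-Tibetan language family. Because its own language family is very small and cannot form enough connections with other languages, its data is not stored in the graph database for analysis for the time being.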

License

The use and distribution terms for wilhelm-vocabulary are covered by the Apache License, Version 2.0.

Footnotes

  1. https://en.wikipedia.org/wiki/German_verbs#Conjugation

  2. Greek: An Intensive Course, 2nd Revised Edition, Hansen & Quinn, p.20

  3. Greek: An Intensive Course, 2nd Revised Edition, Hansen & Quinn, p.44
