Named Entity Recognition (NER), also known as entity chunking or entity extraction, is an information extraction technique that identifies and segments named entities in text and classifies them into predefined classes.
Dataset used: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
The GMB (Groningen Meaning Bank) dataset uses IOB (Inside, Outside, Beginning) tagging, a common format for tagging tokens that we discussed earlier. To refresh your memory:
- I- prefix before a tag indicates that the tag is inside a chunk.
- B- prefix before a tag indicates that the tag is the beginning of a chunk.
- An O tag (which takes no prefix or entity type) indicates that a token belongs to no chunk (outside).
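
To make the scheme concrete, here is a minimal sketch of how IOB tags can be grouped back into chunks. The function name `iob_to_chunks` and the example sentence are illustrative assumptions, not part of the dataset or of any library.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (chunk_text, entity_type) pairs using their IOB tags."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        # "B-geo" -> ("B", "geo"); the bare tag "O" -> ("O", "").
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A B- tag always opens a new chunk. An I- tag that does not
            # continue the open chunk is treated as a beginning (an assumption
            # made here so that malformed sequences are still handled).
            flush()
            current_tokens, current_type = [token], entity_type
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": the token is outside any chunk.
            flush()
    flush()
    return chunks


# Illustrative example, not taken from the dataset.
tokens = ["Barack", "Obama", "visited", "New", "York", "on", "Monday"]
tags = ["B-per", "I-per", "O", "B-geo", "I-geo", "O", "B-tim"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

Each B- tag opens a chunk, the following I- tags of the same type extend it, and an O tag closes whatever chunk is open.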
The tags in this dataset are explained as follows:
- geo = Geographical Entity
- org = Organization
- per = Person
- gpe = Geopolitical Entity
- tim = Time indicator
- art = Artifact
- eve = Event
- nat = Natural Phenomenon
Any token outside these classes is labeled as other, denoted O.
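
Combining the IOB prefixes with the entity types above gives the label set a tagger has to predict. The sketch below enumerates it and decodes a tag into a readable description. The names `ENTITY_TYPES`, `ALL_LABELS`, and `describe_tag` are hypothetical helpers, and the sketch assumes tags are written as `B-geo`, `I-per`, and so on; the dataset itself may not contain every combination.

```python
# Entity types as listed above, keyed by their abbreviation.
ENTITY_TYPES = {
    "geo": "Geographical Entity",
    "org": "Organization",
    "per": "Person",
    "gpe": "Geopolitical Entity",
    "tim": "Time indicator",
    "art": "Artifact",
    "eve": "Event",
    "nat": "Natural Phenomenon",
}

# One B- and one I- label per entity type, plus the single O label.
ALL_LABELS = ["O"] + [
    f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")
]


def describe_tag(tag: str) -> str:
    """Return a human-readable description of an IOB tag such as 'B-geo'."""
    if tag == "O":
        return "outside any entity"
    prefix, _, entity = tag.partition("-")
    position = {"B": "beginning of", "I": "inside"}[prefix]
    return f"{position} {ENTITY_TYPES[entity]}"


print(len(ALL_LABELS))
print(describe_tag("B-geo"))
print(describe_tag("I-per"))
```

With eight entity types this yields 2 × 8 + 1 = 17 possible labels.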
A Conditional Random Field (CRF) is an undirected graphical model whose nodes can be divided into exactly two disjoint sets