ShreeCharranR/Named-Entity-Recognition-NER-

Named Entity Recognition (NER)

Named Entity Recognition (NER), also known as entity chunking or entity extraction, is an information-extraction technique that identifies and segments named entities in text and classifies them into predefined classes.

Dataset used: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus

Tag Distribution

The GMB (Groningen Meaning Bank) dataset uses IOB tagging (Inside, Outside, Beginning), a common format for tagging tokens in a chunking task:

  • An I- prefix before a tag indicates that the token is inside a chunk.
  • A B- prefix before a tag indicates that the token is the beginning of a chunk.
  • An O tag indicates that the token belongs to no chunk (outside).
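The tagging scheme above can be illustrated by turning a tagged sentence back into entity chunks. This is a minimal sketch: the function name `iob_to_spans` and the sample sentence are made up for illustration and are not taken from the repository or the dataset.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk text, entity type) pairs."""
    spans = []
    current_tokens: List[str] = []
    current_type = None

    def close_chunk():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            close_chunk()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open chunk.
            current_tokens.append(token)
        else:
            # An O tag (or an I- tag that does not match the open chunk)
            # closes the chunk; the token itself is not kept.
            close_chunk()
            current_tokens = []
            current_type = None

    close_chunk()
    return spans


# Illustrative sentence (an assumption, not a row from the dataset).
tokens = ["Barack", "Obama", "visited", "New", "York", "in", "2015", "."]
tags = ["B-per", "I-per", "O", "B-geo", "I-geo", "O", "B-tim", "O"]
spans = iob_to_spans(tokens, tags)
print(spans)
```

In this sketch a stray I- tag that does not continue a chunk of the same type is simply dropped; other conventions (for example, treating it as the start of a new chunk) are equally possible.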

The tags in this dataset are explained as follows:

  • geo = Geographical Entity
  • org = Organization
  • per = Person
  • gpe = Geopolitical Entity
  • tim = Time indicator
  • art = Artifact
  • eve = Event
  • nat = Natural Phenomenon

Any token outside these classes is labeled as other, denoted O.
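As a rough sketch of how a tag distribution over these classes could be computed, the snippet below counts tokens per entity class after stripping the B-/I- prefix. The helper name `entity_class_counts` and the sample tags are assumptions for illustration; in practice the tags would come from the dataset's tag column.

```python
from collections import Counter
from typing import Iterable


def entity_class_counts(tags: Iterable[str]) -> Counter:
    """Count tokens per entity class, ignoring the B-/I- prefix."""
    counts = Counter()
    for tag in tags:
        if tag == "O":
            counts["O"] += 1
        else:
            # "B-geo" and "I-geo" both count toward "geo".
            counts[tag.split("-", 1)[1]] += 1
    return counts


# Illustrative tags (an assumption, not real dataset rows).
sample_tags = ["B-per", "I-per", "O", "B-geo", "I-geo", "O", "B-tim", "O"]
counts = entity_class_counts(sample_tags)
for entity_class, n in counts.most_common():
    print(entity_class, n)
```

Note that this counts tokens, not chunks: a two-token person name contributes 2 to `per`.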

1- Conditional Random Fields

A conditional random field (CRF) is an undirected graphical model whose nodes can be divided into exactly two disjoint sets, $X$ (the observed variables) and $Y$ (the output variables); the model then specifies the conditional distribution $p(Y|X)$.
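For sequence labeling, a common special case is the linear-chain CRF, where each label depends on the observed words and on the previous label. The toy sketch below computes $p(Y|X)$ by brute force: it scores every possible label sequence and normalizes. The label set, the hand-set weights, and the example sentence are all assumptions for illustration; a real CRF learns its weights from data and computes the normalizer with dynamic programming rather than enumeration.

```python
import itertools
import math
from typing import Dict, List, Tuple

LABELS = ["O", "B-per", "I-per"]  # reduced label set, for illustration only


def emission_score(word: str, label: str) -> float:
    """Hypothetical hand-set score for a (word, label) pair."""
    capitalized = word[0].isupper()
    if label == "B-per" and capitalized:
        return 2.0
    if label == "I-per" and capitalized:
        return 1.0
    if label == "O" and not capitalized:
        return 2.0
    return 0.0


def transition_score(prev: str, label: str) -> float:
    """Hypothetical hand-set score for moving from prev to label."""
    if label == "I-per" and prev not in ("B-per", "I-per"):
        return -5.0  # I- should not appear without a preceding chunk
    if prev == "B-per" and label == "I-per":
        return 1.5
    return 0.0


def sequence_score(words: List[str], labels: Tuple[str, ...]) -> float:
    """Sum emission and transition scores along the chain."""
    score = 0.0
    prev = "START"
    for word, label in zip(words, labels):
        score += emission_score(word, label) + transition_score(prev, label)
        prev = label
    return score


def conditional_distribution(words: List[str]) -> Dict[Tuple[str, ...], float]:
    """Return p(Y|X) over all label sequences via explicit normalization."""
    sequences = list(itertools.product(LABELS, repeat=len(words)))
    scores = [sequence_score(words, seq) for seq in sequences]
    z = sum(math.exp(s) for s in scores)  # partition function Z(X)
    return {seq: math.exp(s) / z for seq, s in zip(sequences, scores)}


words = ["John", "Smith", "slept"]
dist = conditional_distribution(words)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

Enumeration grows exponentially with sentence length, so this only works for very short inputs; it is meant to make the definition of the conditional distribution concrete, not to be used for training or prediction.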

2- spaCy
