Bag

Bag is a powerful yet user-friendly bag of words (BoW) implementation written in Go, built on a Naive Bayes classifier for efficient text analysis. It works both as a library that integrates seamlessly into Go code and as a command line tool, so bag of words functionality can be used from any programming language. It also supports a training-set file format designed for ease of use and flexible integration across environments.


What is Bag of Words (BoW)?

The bag of words (BoW) model is a fundamental text representation technique in natural language processing (NLP). In this model, a text (such as a sentence or a document) is represented as an unordered collection of words, disregarding grammar and word order but keeping multiplicity. The key idea is to build a vocabulary of all the unique words in the corpus and then represent each text as a vector that records the frequency (or simple presence) of each vocabulary word. The BoW model is widely used for text classification tasks, including sentiment analysis, because it is simple and effective at capturing word occurrences.
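To make the idea concrete, here is a minimal, self-contained sketch of building a word-frequency vector in Go. This is only an illustration of the BoW concept; it is not Bag's internal tokenizer, and `bagOfWords` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// bagOfWords builds a word-frequency map for a document: grammar and
// word order are discarded, only word counts (multiplicity) are kept.
func bagOfWords(doc string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		w = strings.Trim(w, ".,!?") // strip trailing punctuation
		counts[w]++
	}
	return counts
}

func main() {
	v := bagOfWords("Good product, very very good!")
	fmt.Println(v["good"], v["very"], v["product"]) // 2 2 1
}
```

A classifier then compares these per-label counts to score which label a new document most resembles.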

Demo

Examples

New

func ExampleNew() {
	var cfg Config
	// Initialize with default values
	exampleBag = New(cfg)
}

NewFromTrainingSet

func ExampleNewFromTrainingSet() {
	var t TrainingSet
	t.Samples = SamplesByLabel{
		"positive": {
			"I love this product, it is amazing!",
			"I am very happy with this.",
			"Very good",
		},

		"negative": {
			"This is the worst thing ever.",
			"I hate this so much.",
			"Not good",
		},
	}

	// Initialize with default values
	exampleBag = NewFromTrainingSet(t)
}

Bag.Train

func ExampleBag_Train() {
	exampleBag.Train("I love this product, it is amazing!", "positive")
	exampleBag.Train("This is the worst thing ever.", "negative")
	exampleBag.Train("I am very happy with this.", "positive")
	exampleBag.Train("I hate this so much.", "negative")
	exampleBag.Train("Not good", "negative")
	exampleBag.Train("Very good", "positive")
}

Bag.GetResults

func ExampleBag_GetResults() {
	exampleResults = exampleBag.GetResults("I am very happy with this product.")
	fmt.Println("Collection of results", exampleResults)
}

Results.GetHighestProbability

func ExampleResults_GetHighestProbability() {
	match := exampleResults.GetHighestProbability()
	fmt.Println("Highest probability", match)
}

TrainingSet File

config:
  ngram-size: 1
samples:
  yes:
    - "yes"
    - "Yeah"
    - "Yep"

  no:
    - "No"
    - "Nope"
    - "Nah"

# Note: This training set is short for the sake of README filesize,
# please look in the examples directory for more complete examples
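The `ngram-size` option controls how input text is grouped into tokens before counting. As a rough illustration of n-gram extraction (an assumption about the general technique, not Bag's actual implementation), overlapping character n-grams can be produced like this:

```go
package main

import "fmt"

// charNGrams splits s into overlapping character n-grams of length n.
// With n = 1 each character is its own token; larger n captures
// short character sequences, which helps with misspellings.
func charNGrams(s string, n int) []string {
	runes := []rune(s)
	if len(runes) < n {
		return []string{s}
	}
	grams := make([]string, 0, len(runes)-n+1)
	for i := 0; i+n <= len(runes); i++ {
		grams = append(grams, string(runes[i:i+n]))
	}
	return grams
}

func main() {
	fmt.Println(charNGrams("yes", 2)) // [ye es]
}
```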

Road to v1.0.0

  • Working implementation as Go library
  • Training sets
  • Support Character NGrams
  • Text normalization added to inbound text processing
  • CLI utility

Long term goals

  • Generated model as MMAP file

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Josh Montoya 💻 📖
Matt Stay 🎨
Chewxy ⚠️
Jack Muir ⚠️

This project follows the all-contributors specification. Contributions of any kind are welcome!