This is an effort to see if we can do some kind of clustering using the warning and error messages in the server. The goal will be to:
- retrieve current warnings and errors via the spack monitor API
- build a word2vec model from them
- output embeddings for each message
- cluster!
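The pipeline above can be sketched end to end. This is a minimal, dependency-free stand-in: the real run trains word2vec embeddings, while here bag-of-words vectors and a tiny k-means (both illustrative, not the project's code) show the shape of "embed each message, then cluster":

```python
import random
from collections import Counter

def embed(message, vocab):
    """Bag-of-words vector over a fixed vocabulary (stand-in for word2vec)."""
    counts = Counter(message.lower().split())
    return [counts[w] for w in vocab]

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on lists of floats."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute centers as cluster means (keep the old center if empty).
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Made-up messages standing in for spack monitor output.
messages = [
    "error: undefined reference to foo",
    "error: undefined reference to bar",
    "warning: unused variable x",
    "warning: unused variable y",
]
vocab = sorted({w for m in messages for w in m.lower().split()})
points = [embed(m, vocab) for m in messages]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # -> [2, 2]: errors and warnings separate
```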
⭐️ View Interface ⭐️
You'll see that the best clustering comes from using just the error or warning messages, and that most of the clusters are boost errors. Could a direct match (e.g., parsing libraries in advance to identify error text and matching against it) work better? Perhaps! And in fact we could do some kind of KNN based on that too. This is more of an unsupervised clustering (we don't have labels).
First create a virtual environment and install dependencies:

$ python -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
Install umap-learn either from conda-forge or with pip:

$ conda install -c conda-forge umap-learn
$ pip install umap-learn
Then download data from spack monitor:
$ python 1.get_data.py
This will generate a file of errors and warnings!
$ tree data/
data/
├── errors.json
└── warnings.json
Next, preprocess the data and generate the models and vectors:
$ python 2.vectors.py
We currently parse only errors, as they are a smaller set and we are more interested in build errors than in warnings that clutter the signal. For the "error only" (or parsed) approach, we look for strings that contain error: and split on it, keeping the right side. For all other processing methods, we remove paths (e.g., tokenize, then remove any token containing an os.sep or path separator).
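The two preprocessing strategies above can be sketched as follows (the function names are illustrative, not the actual script's):

```python
import os

def parse_error(message):
    """'Error only' approach: split on 'error:' and keep the right side."""
    if "error:" in message:
        return message.split("error:", 1)[1].strip()
    return message.strip()

def remove_paths(message):
    """Other methods: tokenize, then drop tokens containing a path separator."""
    tokens = message.split()
    return " ".join(t for t in tokens if os.sep not in t)

# A made-up example message with a leading file path.
raw = "/tmp/spack-stage/build.txt:12: error: undefined reference to 'boost::foo'"
print(parse_error(raw))   # -> "undefined reference to 'boost::foo'"
print(remove_paths(raw))  # drops the /tmp/... token, keeps the rest
```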
Then generate counts from the data (written into docs if we eventually want to visualize them):
$ python 3.charts.py
Found 30000 errors!
1832 out of 30000 mention 'undefined reference'
Some data will be generated in data, and assets for the web interface will go into docs. The interface allows you to select and see the difference between the models, and clearly just using the error messages (parsed or not) has the strongest signal (best clustering).
And finally, generate a quick plot showing, for each error under KNN, the mean similarity of its closest 10 points (the standard deviation is also calculated, but not shown):
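That statistic can be reproduced in miniature: for each embedding, take the cosine similarity to its 10 nearest neighbors and average. Random vectors stand in for the real embeddings here; the helper name is an assumption, not the script's API:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mean_knn_similarity(vectors, k=10):
    """Mean cosine similarity of each vector to its k closest neighbors."""
    means = []
    for i, v in enumerate(vectors):
        sims = sorted(
            (cosine(v, w) for j, w in enumerate(vectors) if j != i),
            reverse=True,
        )
        means.append(sum(sims[:k]) / k)
    return means

# Random stand-in embeddings: 30 vectors of dimension 50.
random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(50)] for _ in range(30)]
means = mean_knn_similarity(vectors)
print(round(sum(means) / len(means), 3))
```

Plotting these per-error means (e.g., as a histogram) gives the quick picture described above.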
Spack is distributed under the terms of both the MIT license and the Apache License (Version 2.0). Users may choose either license, at their option.
All new contributions must be made under both the MIT and Apache-2.0 licenses.
See LICENSE-MIT, LICENSE-APACHE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (Apache-2.0 OR MIT)
LLNL-CODE-811652