Introduce splitRegex option.
Issue:
Splitting the text with `/\b/` only works for languages written entirely with these 63
characters:

```
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 _
```

cf https://stackoverflow.com/a/2449892
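The 63-character figure can be checked directly: `\b` marks a position between a `\w` character and a non-`\w` character, and in a regex without flags `\w` is exactly `[A-Za-z0-9_]`. A small sketch that counts the code points in the Basic Multilingual Plane accepted by `\w`:

```javascript
// \b is a boundary between a \w and a non-\w character, so whatever \w
// accepts decides where /\b/ can split.
const wordChar = /\w/; // no flags: \w is [A-Za-z0-9_]

// Count the code points from U+0000 to U+FFFF that \w accepts.
let wordCharCount = 0;
for (let cp = 0; cp <= 0xffff; cp++) {
  if (wordChar.test(String.fromCharCode(cp))) wordCharCount++;
}

console.log(wordCharCount); // 63
console.log(wordChar.test('ç')); // false: accented letters are not word characters
```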

For example, in French, the string `je suis français` is split into
`["je", " ", "suis", " ", "fran", "ç", "ais"]`, so `français` never reaches
the bad-words check as a whole word and cannot be cleaned.
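The tokenization in the French example can be reproduced directly. The second half sketches one possible value for the new option, `/[^\p{L}\p{N}_]+/u`, which splits on runs of characters that are not letters, digits or `_` in any script. This regex is an illustrative assumption, not something the commit ships, and it needs a runtime that supports Unicode property escapes (Node.js 10 or later).

```javascript
const text = 'je suis français';

// Default splitter: \b only sees boundaries around [A-Za-z0-9_], so "ç"
// is treated as a separator and "français" is cut in three.
const asciiTokens = text.split(/\b/);
console.log(asciiTokens); // [ 'je', ' ', 'suis', ' ', 'fran', 'ç', 'ais' ]

// Illustrative Unicode-aware splitter (assumption, not part of the commit):
// split on runs of anything that is not a letter, digit or underscore.
const unicodeSplit = /[^\p{L}\p{N}_]+/u;
const unicodeTokens = text.split(unicodeSplit);
console.log(unicodeTokens); // [ 'je', 'suis', 'français' ]

// The commit rejoins tokens with the first separator match.
const separator = unicodeSplit.exec(text)[0];
console.log(unicodeTokens.join(separator) === text); // true
```

Because the commit rejoins the tokens with the first separator match only, the round trip is exact only when every separator in the input is the same.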

Therefore I added a `splitRegex` option, which allows overriding the
regex used to split the text.
p9f committed Aug 23, 2019
1 parent 3b6febd commit 6c97852
Showing 2 changed files with 28 additions and 2 deletions.
lib/badwords.js: 6 changes (4 additions, 2 deletions)

```diff
@@ -12,11 +12,13 @@ class Filter {
    * @param {string} options.placeHolder - Character used to replace profane words.
    * @param {string} options.regex - Regular expression used to sanitize words before comparing them to blacklist.
    * @param {string} options.replaceRegex - Regular expression used to replace profane words with placeHolder.
+   * @param {string} options.splitRegex - Regular expression used to split a string into words.
    */
   constructor(options = {}) {
     Object.assign(this, {
       list: options.emptyList && [] || Array.prototype.concat.apply(localList, [baseList, options.list || []]),
       exclude: options.exclude || [],
+      splitRegex: options.splitRegex || /\b/,
       placeHolder: options.placeHolder || '*',
       regex: options.regex || /[^a-zA-Z0-9|\$|\@]|\^/g,
       replaceRegex: options.replaceRegex || /\w/g
@@ -51,9 +53,9 @@ class Filter {
    * @param {string} string - Sentence to filter.
    */
   clean(string) {
-    return string.split(/\b/).map((word) => {
+    return string.split(this.splitRegex).map((word) => {
       return this.isProfane(word) ? this.replaceWord(word) : word;
-    }).join('');
+    }).join(this.splitRegex.exec(string)[0]);
   }
 
   /**
```

@winrid commented on the `.join(this.splitRegex.exec(string)[0])` line (Aug 22, 2021):

This blows up sometimes, since "exec" can return null. Example input: "?"
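One way to address that report, sketched as a standalone function rather than a patch to `Filter`: fall back to an empty separator when the regex never matches. The name `cleanText` and its `isProfane`/`replaceWord` callback parameters are hypothetical stand-ins for the class and its methods. Using `String.prototype.match` instead of `exec` also sidesteps a second pitfall: if a user passes a `splitRegex` with the `g` flag, `exec` resumes from the regex's `lastIndex`, which an earlier call may have left non-zero, whereas `match` starts from index 0.

```javascript
// Hypothetical standalone version of Filter#clean from this commit.
// isProfane and replaceWord stand in for the Filter methods of the same names.
function cleanText(string, splitRegex, isProfane, replaceWord) {
  // match() returns null when splitRegex never matches (e.g. /\b/ on "?"),
  // where the committed exec(string)[0] throws a TypeError. For a /g regex,
  // match() also starts at index 0 instead of a stale lastIndex.
  const firstMatch = string.match(splitRegex);
  const separator = firstMatch ? firstMatch[0] : '';

  return string
    .split(splitRegex)
    .map((word) => (isProfane(word) ? replaceWord(word) : word))
    .join(separator);
}

// Toy callbacks for illustration only.
const isProfane = (word) => word === 'darn';
const replaceWord = (word) => '*'.repeat(word.length);

console.log(cleanText('?', /\b/, isProfane, replaceWord)); // prints: ?
console.log(cleanText('oh darn it', /\b/, isProfane, replaceWord)); // prints: oh **** it
console.log(cleanText('oh darn it', / /, isProfane, replaceWord)); // prints: oh **** it
```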
test/options.js: 24 changes (24 additions, 0 deletions)

```diff
@@ -0,0 +1,24 @@
+require('assert');
+var Filter = require('../lib/badwords.js'),
+  assert = require('better-assert');
+
+describe('options', function() {
+  describe('split regex', function() {
+
+    it('default value', function() {
+      filter = new Filter();
+      filter.addWords('français');
+      assert(filter.clean('fucking asshole') == '******* *******');
+      assert(filter.clean('mot en français') == 'mot en français');
+    });
+
+    it('override value', function() {
+      filter = new Filter({splitRegex: / /});
+      filter.addWords('français');
+      assert(filter.clean('fucking asshole') == '******* *******');
+      assert(filter.clean('mot en français') == 'mot en *******');
+    });
+
+
+  });
+});
```

1 comment on commit 6c97852

@Fitmavincent commented on 6c97852 Nov 24, 2020

Apparently this splitRegex causes an error when the input is a single emoji. @p9f
#93
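A single emoji runs into the same `null` from `exec` that @winrid describes, which plausibly is the failure reported in #93 (an assumption based on the two comments). `/\b/` can only match next to a `\w` character, so any input without an ASCII letter, digit or `_` has no boundary at all. The sketch below isolates the committed separator expression; `committedSeparator` and `throwsOnCommittedJoin` are hypothetical helper names.

```javascript
const splitRegex = /\b/; // the commit's default

// Same expression as the committed .join(...) argument.
function committedSeparator(string) {
  return splitRegex.exec(string)[0];
}

// Hypothetical helper: true when the committed expression throws for the input.
function throwsOnCommittedJoin(string) {
  try {
    committedSeparator(string);
    return false;
  } catch (err) {
    return err instanceof TypeError; // reading [0] of null
  }
}

for (const input of ['mot en français', '?', '😀', 'ç']) {
  // /\b/ matches somewhere exactly when the input has at least one \w character.
  console.log(JSON.stringify(input), /\w/.test(input), throwsOnCommittedJoin(input));
}
// "mot en français" true false
// "?" false true
// "😀" false true
// "ç" false true
```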
