Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

srdo · 2021-06-04T09:33:14Z

Hi.

According to the docs, globstar will only match path separators if ** are the only characters in a path segment. The following match returns true as expected, since ** are the only characters in the last path segment.

pm.isMatch("dir1/sub/sub", "dir1/**")

Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?

pm.isMatch("dir1/sub/sub", "dir1/**|dir2/**") -> false
pm.isMatch("dir2/sub/sub", "dir1/**|dir2/**") -> true
pm.isMatch("dir1/sub", "dir1/**|dir2/**") -> true, so the globstar seems to now not match / anymore

The text was updated successfully, but these errors were encountered:

conartist6 · 2022-03-06T16:34:51Z

I just got burned by this too. I need it to work, so I'll see what I can figure out.

conartist6 · 2022-03-06T17:59:03Z

I think the bug must be in the parser somewhere. My minimal repro is:

assert(isMatch('a/b', '(x|**)'));

That expression is parsed as follows:

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  { "type": "text", "value": "x|", "output": "|" },
  { "type": "star", "value": "*", "output": "[^/]*?" },
  { "type": "star", "value": "*", "output": "" },
  { "type": "paren", "value": ")", "output": ")" }
]

conartist6 · 2022-03-06T19:44:39Z

Compare that with pm.parse('(**)'):

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  {
    "type": "globstar",
    "value": "**",
    "output": "(?:(?:(?!(?:^|\\\\/)\\\\.).)*?)"
  },
  { "type": "paren", "value": ")", "output": ")" }
]

conartist6 · 2022-03-06T19:49:45Z

And pm.parse('(x|y)'):

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  { "type": "text", "value": "x|y", "output": "|y" },
  { "type": "paren", "value": ")", "output": ")" }
]

Something else is clearly very odd about this, since the output would appear to be invalid, i.e. no part of the reported output would ever match x. In practice though x can be matched by (x|y).

conartist6 · 2022-03-06T20:13:12Z

I wonder why | is always treated as type: 'text' by the parser:

picomatch/lib/parse.js

Lines 616 to 622 in 5467a5a

    
           if (value === '|') { 
        
             if (extglobs.length > 0) { 
        
               extglobs[extglobs.length - 1].conditions++; 
        
             } 
        
             push({ type: 'text', value }); 
        
             continue; 
        
           }

conartist6 · 2022-03-06T20:29:05Z

I'm also a bit suspicious about this:

picomatch/lib/parse.js

Lines 825 to 828 in 5467a5a

    
           const prior = prev.prev; 
        
           const before = prior.prev; 
        
           const isStart = prior.type === 'slash' || prior.type === 'bos'; 
        
           const afterStar = before && (before.type === 'star' || before.type === 'globstar');

conartist6 · 2022-03-06T20:50:06Z

It looks to me like what should happen is we should fall through a couple cases and end up here:

picomatch/lib/parse.js

Line 909 in 5467a5a

prev.type = 'globstar';

Instead we're falling into this case:

picomatch/lib/parse.js

Lines 837 to 840 in 5467a5a

    
           if (!isStart && prior.type !== 'paren' && !isBrace && !isExtglob) { 
        
             push({ type: 'star', value, output: '' }); 
        
             continue; 
        
           }

I suspect the confusing naming scheme with prev, prior and before has led to confusion about what contains what.

jonschlinkert · 2022-03-06T22:06:00Z

Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?

This isn't a bug, the pattern you're using isn't a supported glob. It's just creating a regular expression that happens to not match what you want. Try wrapping the segment in parentheses like this:

pm.isMatch('dir1/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir2/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir1/sub', 'dir1/(**|dir2)/**');

Or you can use braces, like: dir1/{**,dir2}/**.

conartist6 · 2022-03-07T01:28:42Z

@jonschlinkert Could you take another look at this? The example pattern you've provided always matches everything, and you seem to be asking me to do exactly what I am already doing that does not work.

Again the failing test case is:

pm.isMatch('a/b', '(x|**)'); // false

and it fails because the ** is interpreted as if it were a * for no reason that I understand.

conartist6 · 2022-03-07T01:48:24Z

Your parser also has different output for (**|x) and (x|**):

> pm.parse('(**|x)').tokens.map(({prev, ...rest}) => rest)
[
  { type: 'bos', value: '', output: '' },
  { type: 'paren', value: '(' },
  { type: 'star', value: '*', output: '[^/]*?' },
  { type: 'text', value: '|x', output: 'x' },
  { type: 'paren', value: ')', output: ')' }
]
> pm.parse('(x|**)').tokens.map(({prev, ...rest}) => rest)
[
  { type: 'bos', value: '', output: '' },
  { type: 'paren', value: '(' },
  { type: 'text', value: 'x|', output: '|' },
  { type: 'star', value: '*', output: '[^/]*?' },
  { type: 'star', value: '*', output: '' },
  { type: 'paren', value: ')', output: ')' }
]

conartist6 · 2022-03-07T01:57:06Z

Ah I'm sorry, I guess I've sort of co-opted this issue. You're saying that OPs code would work if they just added parens, and I'm saying that parens or no it's still gonna be at least partly busted (and busted in different ways depending on the order of the ** within the alternative group).

conartist6 · 2022-03-07T06:19:48Z

Sorry it's a bit off topic, but it's late and I'm having fun. I think I'm going to try my hand at rewriting this parser using a technique I've been building slowly towards. My technique would create a fully streaming (!) n-lookahead parser with top-down control and easy and expressive error handling. An implementation of parserate (to be built on top of the existing forkerate) would allow a parser to be written like so:

function *tokens(iterable, options) {
  const parsr = parserate(iterable);

  while (!parsr.done) {
    if (parsr.consume(/\w+/), 'text') {
      yield {
        type: 'text',
        value: parsr.consumed,
      }
    } else if (parsr.consume(/\d+/), 'digit') {
      yield {
        type: 'digit',
        value: parseInt(parsr.consumed, 10),
      }
    } else if (options.backreferences && parsr.consume(/\{(\d+)\}/, 'backreference')) {
      yield {
        type: 'backreference',
        value: parseInt(parsr.consumedMatch[1], 10),
      }
    } else {
      throw parsr.error(`Unexpected character: '${parsr.value}'`);
      // SyntaxError: Unexpected character: '&'
      // Expected one of: text, digit
      // line: 1,
      // col: 17
    }
  }
}

srdo · 2022-03-07T16:25:40Z

@jonschlinkert You are right, I was missing some parentheses. Changing the pattern to (dir1/**)|(dir2/**) did the trick to express "paths starting with dir1/ or dir2/". Thanks for the hint.

@conartist6 If your original pattern is intended to express "x or nonempty string", it looks like (x)|(**) does that.

I'll close this issue, I'm hoping this hint also solves your problem Conrad, but either way it's probably better if you raise a separate issue for it :)

conartist6 · 2022-03-10T15:59:31Z

I just want to make clear for anyone else that happens across this that the problem described by the title of this issue is not currently resolved, and is now tracked in #104.

srdo closed this as completed Mar 7, 2022

conartist6 mentioned this issue Mar 10, 2022

** behaves like * inside or expression #104

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

srdo commented Jun 4, 2021

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022

jonschlinkert commented Mar 6, 2022

conartist6 commented Mar 7, 2022 •

edited

Loading

conartist6 commented Mar 7, 2022

conartist6 commented Mar 7, 2022 •

edited

Loading

conartist6 commented Mar 7, 2022 •

edited

Loading

srdo commented Mar 7, 2022

conartist6 commented Mar 10, 2022

Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

Comments

srdo commented Jun 4, 2021

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022 • edited Loading

conartist6 commented Mar 6, 2022 • edited Loading

conartist6 commented Mar 6, 2022 • edited Loading

conartist6 commented Mar 6, 2022

conartist6 commented Mar 6, 2022

jonschlinkert commented Mar 6, 2022

conartist6 commented Mar 7, 2022 • edited Loading

conartist6 commented Mar 7, 2022

conartist6 commented Mar 7, 2022 • edited Loading

conartist6 commented Mar 7, 2022 • edited Loading

srdo commented Mar 7, 2022

conartist6 commented Mar 10, 2022

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 6, 2022 •

edited

Loading

conartist6 commented Mar 7, 2022 •

edited

Loading

conartist6 commented Mar 7, 2022 •

edited

Loading

conartist6 commented Mar 7, 2022 •

edited

Loading