Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globstar in an or pattern seems to behave inconsistently with uses outside or. #88

Closed
srdo opened this issue Jun 4, 2021 · 14 comments
Closed

Comments

@srdo
Copy link

srdo commented Jun 4, 2021

Hi.

According to the docs, globstar will only match path separators if ** are the only characters in a path segment. The following match returns true as expected, since ** are the only characters in the last path segment.

pm.isMatch("dir1/sub/sub", "dir1/**")

Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?

pm.isMatch("dir1/sub/sub", "dir1/**|dir2/**") -> false
pm.isMatch("dir2/sub/sub", "dir1/**|dir2/**") -> true
pm.isMatch("dir1/sub", "dir1/**|dir2/**") -> true, so the globstar seems to now not match / anymore
@conartist6
Copy link

I just got burned by this too. I need it to work, so I'll see what I can figure out.

@conartist6
Copy link

I think the bug must be in the parser somewhere. My minimal repro is:

assert(isMatch('a/b', '(x|**)'));

That expression is parsed as follows:

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  { "type": "text", "value": "x|", "output": "|" },
  { "type": "star", "value": "*", "output": "[^/]*?" },
  { "type": "star", "value": "*", "output": "" },
  { "type": "paren", "value": ")", "output": ")" }
]

@conartist6
Copy link

conartist6 commented Mar 6, 2022

Compare that with pm.parse('(**)'):

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  {
    "type": "globstar",
    "value": "**",
    "output": "(?:(?:(?!(?:^|\\\\/)\\\\.).)*?)"
  },
  { "type": "paren", "value": ")", "output": ")" }
]

@conartist6
Copy link

conartist6 commented Mar 6, 2022

And pm.parse('(x|y)'):

[
  { "type": "bos", "value": "", "output": "" },
  { "type": "paren", "value": "(" },
  { "type": "text", "value": "x|y", "output": "|y" },
  { "type": "paren", "value": ")", "output": ")" }
]

Something else is clearly very odd about this, since the output would appear to be invalid, i.e. no part of the reported output would ever match x. In practice though x can be matched by (x|y).

@conartist6
Copy link

conartist6 commented Mar 6, 2022

I wonder why | is always treated as type: 'text' by the parser:

picomatch/lib/parse.js

Lines 616 to 622 in 5467a5a

if (value === '|') {
if (extglobs.length > 0) {
extglobs[extglobs.length - 1].conditions++;
}
push({ type: 'text', value });
continue;
}

@conartist6
Copy link

I'm also a bit suspicious about this:

picomatch/lib/parse.js

Lines 825 to 828 in 5467a5a

const prior = prev.prev;
const before = prior.prev;
const isStart = prior.type === 'slash' || prior.type === 'bos';
const afterStar = before && (before.type === 'star' || before.type === 'globstar');

@conartist6
Copy link

It looks to me like what should happen is we should fall through a couple cases and end up here:

prev.type = 'globstar';

Instead we're falling into this case:

picomatch/lib/parse.js

Lines 837 to 840 in 5467a5a

if (!isStart && prior.type !== 'paren' && !isBrace && !isExtglob) {
push({ type: 'star', value, output: '' });
continue;
}

I suspect the confusing naming scheme with prev, prior and before has led to confusion about what contains what.

@jonschlinkert
Copy link
Member

Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?

This isn't a bug, the pattern you're using isn't a supported glob. It's just creating a regular expression that happens to not match what you want. Try wrapping the segment in parentheses like this:

pm.isMatch('dir1/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir2/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir1/sub', 'dir1/(**|dir2)/**');

Or you can use braces, like: dir1/{**,dir2}/**.

@conartist6
Copy link

conartist6 commented Mar 7, 2022

@jonschlinkert Could you take another look at this? The example pattern you've provided always matches everything, and you seem to be asking me to do exactly what I am already doing that does not work.

Again the failing test case is:

pm.isMatch('a/b', '(x|**)'); // false

and it fails because the ** is interpreted as if it were a * for no reason that I understand.

@conartist6
Copy link

Your parser also has different output for (**|x) and (x|**):

> pm.parse('(**|x)').tokens.map(({prev, ...rest}) => rest)
[
  { type: 'bos', value: '', output: '' },
  { type: 'paren', value: '(' },
  { type: 'star', value: '*', output: '[^/]*?' },
  { type: 'text', value: '|x', output: 'x' },
  { type: 'paren', value: ')', output: ')' }
]
> pm.parse('(x|**)').tokens.map(({prev, ...rest}) => rest)
[
  { type: 'bos', value: '', output: '' },
  { type: 'paren', value: '(' },
  { type: 'text', value: 'x|', output: '|' },
  { type: 'star', value: '*', output: '[^/]*?' },
  { type: 'star', value: '*', output: '' },
  { type: 'paren', value: ')', output: ')' }
]

@conartist6
Copy link

conartist6 commented Mar 7, 2022

Ah I'm sorry, I guess I've sort of co-opted this issue. You're saying that OPs code would work if they just added parens, and I'm saying that parens or no it's still gonna be at least partly busted (and busted in different ways depending on the order of the ** within the alternative group).

@conartist6
Copy link

conartist6 commented Mar 7, 2022

Sorry it's a bit off topic, but it's late and I'm having fun. I think I'm going to try my hand at rewriting this parser using a technique I've been building slowly towards. My technique would create a fully streaming (!) n-lookahead parser with top-down control and easy and expressive error handling. An implementation of parserate (to be built on top of the existing forkerate) would allow a parser to be written like so:

function *tokens(iterable, options) {
  const parsr = parserate(iterable);

  while (!parsr.done) {
    if (parsr.consume(/\w+/), 'text') {
      yield {
        type: 'text',
        value: parsr.consumed,
      }
    } else if (parsr.consume(/\d+/), 'digit') {
      yield {
        type: 'digit',
        value: parseInt(parsr.consumed, 10),
      }
    } else if (options.backreferences && parsr.consume(/\{(\d+)\}/, 'backreference')) {
      yield {
        type: 'backreference',
        value: parseInt(parsr.consumedMatch[1], 10),
      }
    } else {
      throw parsr.error(`Unexpected character: '${parsr.value}'`);
      // SyntaxError: Unexpected character: '&'
      // Expected one of: text, digit
      // line: 1,
      // col: 17
    }
  }
}

@srdo
Copy link
Author

srdo commented Mar 7, 2022

@jonschlinkert You are right, I was missing some parentheses. Changing the pattern to (dir1/**)|(dir2/**) did the trick to express "paths starting with dir1/ or dir2/". Thanks for the hint.

@conartist6 If your original pattern is intended to express "x or nonempty string", it looks like (x)|(**) does that.

I'll close this issue, I'm hoping this hint also solves your problem Conrad, but either way it's probably better if you raise a separate issue for it :)

@conartist6
Copy link

I just want to make clear for anyone else that happens across this that the problem described by the title of this issue is not currently resolved, and is now tracked in #104.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants