Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Include offsets in return value #13

Closed
SebastianZ opened this issue Sep 15, 2016 · 12 comments
Closed

Include offsets in return value #13

SebastianZ opened this issue Sep 15, 2016 · 12 comments

Comments

@SebastianZ
Copy link

A disadvantage of String.prototype.match() and RegExp.prototype.exec() is also that you don't get the offsets of the captured groups.

Therefore I've created https://github.com/SebastianZ/es-proposal-advanced-regexp-match-info yesterday. Though I think it might be worth to combine both to avoid another regexp function doing a similar thing.

Sebastian

@ljharb
Copy link
Member

ljharb commented Sep 15, 2016

Can you be more specific?

var str = 'abc def abc';
var regex = /a(b)c/g;
var a = regex.exec(str);
console.log(a, a.index, a.input, regex.lastIndex);
var b = regex.exec(str);
console.log(b, b.index, b.input, regex.lastIndex);

logs:

["abc", "b"] (2)
0
"abc def abc"
3

["abc", "b"] (2)
8
"abc def abc"
11

@SebastianZ
Copy link
Author

The index of the captured group "b" is not returned.

In your example you could still get the index of b by running another regex, which only covers the capturing group (submatch) ‒ i.e. /b/ ‒ on the match of the first expression. But this is obviously a bad workaround and doesn't work in all cases.

Example:

var str = 'aabbcc';
var regex = /a+.(b)/
var a = regex.exec(str);
console.log(a, a.index, a.input, regex.lastIndex);
var b = regex.exec(str);
console.log(b, b.index, b.input, regex.lastIndex);

Logs:

["aabb", "b"] 0 aabbcc 0
["aabb", "b"] 0 aabbcc 0

There's no chance to know whether the captured group "b" is the first or the second b within the match.

Sebastian

@ljharb
Copy link
Member

ljharb commented Sep 16, 2016

ah, i see what you mean. That seems like the sort of property that should just be added on the exec return value - thus making match and matchAll both get it for free.

@SebastianZ
Copy link
Author

exec() may be changed to take a parameter, but I was told by @caridy in the discussion about formatToParts() that

if the function forks the logic based on some arguments, then in principle it should be splitted into to functions.

Sebastian

@SebastianZ
Copy link
Author

So, the idea was to let String.prototype.matchAll() return an array of objects containing the matched strings plus the information about their start and end indexes.

Taking the example from your proposal:

var regex = /t(e)(st(\d?))/g;
var string = 'test1test2';
var result = string.matchAll(regex);

result would contain this:

[
  {
    match: "test1",
    start: 0,
    end: 5,
    groups: [
      {
        match: "e",
        start: 1,
        end: 2,
      },
      {
        match: "st1",
        start: 2,
        end: 5,
      },
      {
        match: "1",
        start: 4,
        end: 5,
      }
    ]
  },
  {
    match: "test2",
    start: 5,
    end: 10,
    groups: [
      {
        match: "e",
        start: 6,
        end: 7,
      },
      {
        match: "st2",
        start: 7,
        end: 10,
      },
      {
        match: "d",
        start: 9,
        end: 10,
      }
    ]
  }
]

instead of this:

[
  [
    "test1",
    "e",
    "st1",
    "1"
  ],
  [
    "test2",
    "e",
    "st2",
    "2"
  ]
]

@SebastianZ
Copy link
Author

And without the "g" flag, i.e. matching against /t(e)(st(\d?))/ you'd get this as result:

[
  {
    match: "test1",
    start: 0,
    end: 5,
    groups: [
      {
        match: "e",
        start: 1,
        end: 2,
      },
      {
        match: "st1",
        start: 2,
        end: 5,
      },
      {
        match: "1",
        start: 4,
        end: 5,
      }
    ]
  }
]

@ljharb
Copy link
Member

ljharb commented Sep 16, 2016

Why would it need a parameter tho? My thinking is that every .exec call could just unconditionally have an extra property stuck on the array.

@SebastianZ
Copy link
Author

You mean exec() would return the array of matches and add a property with the object to it?
That's another solution, but I see two issues with that:

  1. It extends a standard array object by a property. (Is there any other API doing that?)

  2. The structure would contain duplicate info. To avoid that the structure may then be changed to only contain the start and end offsets for the matches to be consistent with the matches array. Applied to the previous example, this would then look like:

    [
      {
        start: 0,
        end: 5
      },
      {
        start: 1,
        end: 2
      },
      {
        start: 2,
        end: 5
      },
      {
        start: 4,
        end: 5
      }
    ]

Sebastian

@ljharb
Copy link
Member

ljharb commented Sep 16, 2016

exec already does return an array of matches with 2 properties on it (see input and lastIndex above) - this would just be a third.

I definitely think it should be a separate proposal, but I'm not sure if a separate method is the best approach.

@ljharb ljharb closed this as completed Sep 16, 2016
@SebastianZ
Copy link
Author

exec already does return an array of matches with 2 properties on it (see input and lastIndex above) - this would just be a third.

I have to admit I never realized that. So, then it's probably ok to add a third parameter.

I definitely think it should be a separate proposal, but I'm not sure if a separate method is the best approach.

I still believe it would have been great to merge both, but ok. So I'll continue to write my proposal at https://github.com/SebastianZ/es-proposal-advanced-regexp-match-info. Would be great if you could review it once it's ready for being proposed to es-discuss.

Sebastian

@ljharb
Copy link
Member

ljharb commented Sep 16, 2016

Don't get me wrong, I think both would be great :-) but this proposal intentionally keeps it simple by returning the same thing .exec does. If your proposal simply modifies that in a backwards-compatible way, the two proposals could proceed in parallel, whereas combining them might slow both.

@SebastianZ
Copy link
Author

Sounds reasonable.
So, I hope both proposals get traction and make it into the standard then. 😄

Sebastian

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants