Issue with multiple named group #216

Closed
nowox opened this issue Jan 20, 2015 · 5 comments

Comments

@nowox

nowox commented Jan 20, 2015

In this example:
https://regex101.com/r/tQ5qT2/2

Match 15 is never processed.

@nhahtdh
Collaborator

nhahtdh commented Jan 21, 2015

Referring to captured text by name in the replacement string is not supported in PHP (nor in R). The PCRE library itself doesn't have a straightforward replacement API either.

However, the preg wrapper in PHP seems to work in this case if you use preg_replace_callback:
http://ideone.com/gRGVUG

So it is worth looking into the implementation of this function in PHP:
https://github.com/php/php-src/blob/bf59acdea75cf13d179f10ce89d296a30f38676d/ext/pcre/php_pcre.c#L1097
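As a rough analogue of the callback approach, here is a minimal sketch in Python rather than PHP. Python's `re.sub` accepts a function in place of a replacement string, much as preg_replace_callback does, so the callback can read a capture by name. The pattern, the `spaced` and `normalize` helpers, and the sample input are hypothetical and are not taken from the regex101 link.

```python
import re

# Hypothetical pattern: an operator between two word-like operands.
PATTERN = re.compile(r"(?P<lhs>\w+)\s*(?P<sign>[=+*-])\s*(?P<rhs>\w+)")

def spaced(match):
    """Rebuild the match with exactly one space around the operator."""
    return f"{match.group('lhs')} {match.group('sign')} {match.group('rhs')}"

def normalize(text):
    # re.sub calls `spaced` once for each non-overlapping match.
    return PATTERN.sub(spaced, text)

print(normalize("r1=r2"))
```

Because the callback receives the whole match object, it does not depend on whether the replacement-string syntax understands group names.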

@CasimirEtHippolyte

Indeed, you can't refer to a capture by its name in a replacement pattern. However, every named capture is also a numbered capture, so you can always use the associated number(s).
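To illustrate that a named group is also a numbered group, here is a small sketch using Python's `re` module (an assumption for illustration; it is not PCRE, and unlike the PHP behaviour described here, Python also accepts named references in replacement strings). The pattern and input are made up.

```python
import re

# Hypothetical pattern with one named group; it is also group number 1.
pattern = re.compile(r"(?P<sign>[=+*-])")

# groupindex maps each group name to its group number.
name_to_number = dict(pattern.groupindex)

match = pattern.search("r1 + r2")
by_name = match.group("sign")
by_number = match.group(name_to_number["sign"])

# A numbered reference in the replacement string works without the name.
bracketed = pattern.sub(r"[\1]", "r1 + r2")

print(name_to_number, by_name, by_number, bracketed)
```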

About the J modifier in PCRE
(Note that in the PHP implementation it is only an inline modifier, not a global one, so you need to write (?J) and (?-J) in the pattern to toggle it.)

When you use it, you are allowed to give the same name to several captures. This is useful if you want to add semantics to your pattern, and it is handy if you need a backreference inside the pattern.

This feature highlights an issue with how PCRE manages duplicate named captures. A named capture is only an alias for one group within one match. But if you perform a global search, the named capture refers alternately to the first or the second capture group (in your pattern), so you can't define a bijective relation between a name and a number. If you use preg_match_all in PHP, you can see that the named capture contains only the matches of the last group defined in the pattern. It seems that regex101 has chosen the first. Whether that is a bug, I can't say; at the least, it is a difference in implementation. Note that I haven't tested this with R or Boost.

The only way to obtain a many-to-one relation between numbered groups and a named group in your case is to use the branch reset feature, which gives the two groups the same number:

(?|
    (?&R)\K\s*(?<sign>[=+*-])\s*(?=(?&R))
  |
    (?&R)\s*\K\s*(?<sign>=)\s*(?=-(?&R))
)

(?(DEFINE)
    (?<R>(?<!\w) ( (?:f|r|s) (?:1[0-5]|[0-9]) | CI | 1))
)

As an aside, you can easily design your pattern differently to avoid the problem.
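One possible redesign, sketched in Python under the assumption that distinct group names are acceptable: give each alternative its own name and merge the two afterwards. Python's `re` module rejects duplicate group names and has no branch reset, `\K`, or `(?&R)`, so the operand subpattern is inlined through a string and the `(?<!\w)` lookbehind is left out for brevity. The names `sign_a`, `sign_b`, `OPERAND`, and `signs` are hypothetical.

```python
import re

# Simplified stand-in for the R subpattern defined above.
OPERAND = r"(?:[frs](?:1[0-5]|[0-9])|CI|1)"

# Each alternative gets its own uniquely named operator group.
PATTERN = re.compile(
    rf"{OPERAND}\s*(?P<sign_a>[=+*-])\s*(?={OPERAND})"
    rf"|{OPERAND}\s*(?P<sign_b>=)\s*(?=-{OPERAND})"
)

def signs(text):
    """Return the operator from each match, whichever branch captured it."""
    found = []
    for match in PATTERN.finditer(text):
        # Only one of the two groups participates in any given match.
        found.append(match.group("sign_a") or match.group("sign_b"))
    return found

print(signs("r1 = -r2 + r3"))
```

With unique names, each name maps to exactly one group number, so the ambiguity described above does not arise; the cost is the explicit merge step in the calling code.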

@firasdib
Owner

As previously discussed, this is a tricky edge case. Adjusting the behaviour one way or the other will only fix one case and break another.

This is all a custom implementation of mine, as PCRE has no support for any form of substitution. Every language has its own way of doing it. To be safe, you should design your pattern more carefully to avoid running into problems like these.

@firasdib
Owner

I just read this in the perldoc:

When different groups within the same pattern have the same name, any reference to that name assumes the leftmost defined group.

@totalavatar

In this example: https://regex101.com/r/tQ5qT2/2

Match 15 is never processed.
