String patterns #140

Open
AnthonyDGreen opened this issue Jul 13, 2017 · 7 comments
@AnthonyDGreen
Contributor

This builds on proposal #124 "Pattern Matching" with a pattern specifically designed for decomposing (parsing) strings. The syntax is designed to mirror, as closely as practical, the interpolated string syntax introduced in VB14 for creating strings.

Select Case input
    Case Match $"{scheme}://{rest}" When scheme = "ms-word"
        
    Case Match $"{user}@{domain}" When domain <> "outlook.com"
        Throw New NotSupportedException("All must use outlook.com!")
    Case Match $"{*}.docx"
                
    Case Match $"{CInt(id)}|{CDate(date)}|{description}|{CDbl(total)}"
        ' CSVs are never this easy to parse. Sooner or later 75 rows in somebody gets clever.
End Select

After 20 years of InStr, Mid, Left, and Right this really excites me. Even now that I'm using IndexOf and Substring, this kind of parsing is still a frequently recurring task in my life.

This would compose with other patterns too: the "interpolations" bounded by { and } could contain other patterns. Right now I've only been able to figure out how to make it work with lazy matching and without backtracking, and it falls apart if two interpolations appear side by side (with no text between them), since the first will eat up all the text.
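
For illustration, here is a minimal sketch of a lazy, no-backtracking matcher written in VB that runs today. The helper name TryMatch and the literals array (the fixed text before, between, and after the holes) are assumptions made for this example, not part of the proposal. The sketch anchors the trailing literal at the end of the input and simply rejects two adjacent holes, since nothing marks where the first one should stop.

```vb
Imports System
Imports System.Collections.Generic

Module LazyStringPattern
    ' Hypothetical helper. For $"{scheme}://{rest}" the literals are {"", "://", ""}:
    ' the text before the first hole, between the holes, and after the last hole.
    Function TryMatch(input As String, literals As String(),
                      ByRef captures As List(Of String)) As Boolean
        captures = New List(Of String)
        If literals.Length < 2 Then Return False ' A pattern needs at least one hole.

        ' The leading literal must appear at the very start of the input.
        If Not input.StartsWith(literals(0), StringComparison.Ordinal) Then Return False
        Dim pos As Integer = literals(0).Length

        For i As Integer = 1 To literals.Length - 1
            Dim lit As String = literals(i)
            Dim stopAt As Integer
            If i = literals.Length - 1 Then
                ' The last hole runs up to the trailing literal, anchored at the end.
                If Not input.EndsWith(lit, StringComparison.Ordinal) Then Return False
                stopAt = input.Length - lit.Length
                If stopAt < pos Then Return False
            Else
                ' Two adjacent holes: there is no text to tell them apart.
                If lit.Length = 0 Then Return False
                ' Lazy: the hole ends at the first occurrence of the next literal.
                stopAt = input.IndexOf(lit, pos, StringComparison.Ordinal)
                If stopAt < 0 Then Return False
            End If
            captures.Add(input.Substring(pos, stopAt - pos))
            pos = stopAt + lit.Length
        Next
        Return True
    End Function

    Sub Main()
        Dim parts As List(Of String) = Nothing
        If TryMatch("ms-word://report.docx", {"", "://", ""}, parts) Then
            Console.WriteLine($"scheme={parts(0)}, rest={parts(1)}")
        End If
    End Sub
End Module
```

Because each hole stops at the first occurrence of the next literal and never revisits that choice, an input such as "a://b://c" yields scheme "a" and rest "b://c".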

Should the "alignment" part of an interpolation be usable to require/match a substring of fixed or minimal length? For example: Match $"{y,4}-{m}-{d}"
Maybe. It could help with the problem mentioned above.
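
As a rough sketch of why a fixed length could help with adjacent interpolations, the code below reads the alignment as an exact width, so that consecutive holes can be cut apart with no text between them. The helper name TrySplitFixed and the "exact width" reading are assumptions for this example; the proposal leaves the meaning of alignment open.

```vb
Imports System

Module FixedWidthHoles
    ' Hypothetical helper: cuts input into consecutive fields of the given widths.
    ' Roughly what a pattern like $"{y,4}{m,2}{d,2}" could do if alignment meant
    ' "exactly this many characters".
    Function TrySplitFixed(input As String, widths As Integer(),
                           ByRef fields As String()) As Boolean
        fields = New String(widths.Length - 1) {}
        Dim pos As Integer = 0
        For i As Integer = 0 To widths.Length - 1
            ' Fail if the input is too short for the next field.
            If pos + widths(i) > input.Length Then Return False
            fields(i) = input.Substring(pos, widths(i))
            pos += widths(i)
        Next
        ' Succeed only if the fields consumed the whole input.
        Return pos = input.Length
    End Function

    Sub Main()
        Dim f As String() = Nothing
        If TrySplitFixed("20170713", {4, 2, 2}, f) Then
            Console.WriteLine($"{f(0)}-{f(1)}-{f(2)}") ' prints 2017-07-13
        End If
    End Sub
End Module
```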

Is there anything at all that would make sense with the "format" part of an interpolation? It seems hard since there's really no way to get that part to mean the same thing coming out as it does going in.

We need to find the sweet spot for productivity vs. power (as always). These be dangerous waters.

Can we do something to modernize the VB Like operator instead?
Not sure.

Why not full-blown regex literals like in Perl?
This has always been a great personal temptation for me. At the moment I don't think this is the right way to go. RegEx is about making extremely complicated things more terse (and cryptic). That's counter to our goals with VB of making things straight-forward and approachable. All but the very simplest of regexes (regexen?) quickly become arcane magic. One test of this proposal will be how much people still need to fall back to regex with it.

Does this pattern need built-in alternation?

It does seem natural to support alternation within this pattern as a way of describing optionality:

Select Case url
    Case Match $"{scheme}://{domain}:{port}/{path}?{query}",
               $"{scheme}://{domain}:{port}/{path}",
               $"{scheme}://{domain}/{path}",
               $"{domain}/{path}?{query}",
               $"{domain}/{path}",
               $"{domain}"
               
    Case Match $"{drive}:\{absolute}", ' Full path.
               $"\{absolute}",         ' Absolute path on current drive.
               $"{drive}:{relative}",  ' Relative path on current drive.
               $"{relative}"           ' Wait, what, this is a thing?
                              
End Select

Any term that isn't matched in all cases may be null (maybe we should require a ? after the name then). The second Case is complete (I think), but the first does not exhaustively handle all permutations. I think the number of cases you'd need to write to represent all possibilities is 16 (could be wrong). I think the correct code is:

Select Case url
    Case Match $"{scheme}://{domain}:{port}/{path}?{query}",
               $"{scheme}://{domain}:{port}/{path}",
               $"{scheme}://{domain}:{port}?{query}",
               $"{scheme}://{domain}:{port}",
               $"{scheme}://{domain}/{path}?{query}",
               $"{scheme}://{domain}/{path}",
               $"{scheme}://{domain}?{query}",
               $"{scheme}://{domain}",
               $"{domain}:{port}/{path}?{query}",
               $"{domain}:{port}/{path}",
               $"{domain}:{port}?{query}",
               $"{domain}:{port}",
               $"{domain}/{path}?{query}",
               $"{domain}/{path}",
               $"{domain}?{query}",
               $"{domain}"
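
The count of 16 follows from four independently optional parts (scheme, port, path, query), giving 2^4 combinations. A small sketch that enumerates them mechanically is shown here as a way to check such a list by machine; the module and function names are made up for this example.

```vb
Imports System
Imports System.Collections.Generic

Module EnumerateUrlPatterns
    ' Builds every combination of the four optional parts around {domain}.
    ' Each bit of mask switches one optional part on or off.
    Function AllPatterns() As List(Of String)
        Dim result As New List(Of String)
        For mask As Integer = 15 To 0 Step -1
            Dim p As String = ""
            If (mask And 8) <> 0 Then p &= "{scheme}://"
            p &= "{domain}"
            If (mask And 4) <> 0 Then p &= ":{port}"
            If (mask And 2) <> 0 Then p &= "/{path}"
            If (mask And 1) <> 0 Then p &= "?{query}"
            result.Add(p)
        Next
        Return result
    End Function

    Sub Main()
        For Each p In AllPatterns()
            Console.WriteLine(p)
        Next
    End Sub
End Module
```

Counting down from 15 lists the fullest pattern first and the bare {domain} last.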

Is that really so much more readable than "((<scheme>.+)://)?(<domain>*+)(:(<port>\d+))?(/(<path>.*))?(\?(<query>.+))?"
Well, yes, and infinitely easier to reason about (my brain froze several times writing it), but that's not the point.
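
For comparison, here is one possible working .NET form of that regex. Named groups in .NET are spelled (?<name>...); the character classes and the ^/$ anchors are this sketch's own choices, made so that the domain does not swallow the optional parts, and are not taken from the post.

```vb
Imports System
Imports System.Text.RegularExpressions

Module UrlRegexDemo
    ' One possible pattern; scheme, port, path and query are all optional.
    Private ReadOnly UrlPattern As New Regex(
        "^((?<scheme>[^:/?]+)://)?(?<domain>[^:/?]+)(:(?<port>\d+))?(/(?<path>[^?]*))?(\?(?<query>.+))?$")

    Function Parse(url As String) As Match
        Return UrlPattern.Match(url)
    End Function

    Sub Main()
        Dim m = Parse("https://example.com:8080/docs/index.html?lang=vb")
        If m.Success Then
            Console.WriteLine(m.Groups("scheme").Value)
            Console.WriteLine(m.Groups("domain").Value)
            Console.WriteLine(m.Groups("port").Value)
            Console.WriteLine(m.Groups("path").Value)
            Console.WriteLine(m.Groups("query").Value)
        End If

        ' A group that took no part in the match reports Success = False,
        ' which plays the role of the "may be null" terms in the pattern version.
        Dim bare = Parse("example.com")
        Console.WriteLine(bare.Groups("port").Success)
    End Sub
End Module
```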

Is there some way to support greedier matching or backtracking?
So far pattern functions as I envision them take the form <Function([ByRef p1 [, ByRef p2, ...]]) As Boolean>. Maybe there's some other form we could consider for strings (or maybe all enumerables?) that wouldn't make the match function too darn complicated.

This could be pretty hard to prototype and needs a lot of spec work.

@AdamSpeight2008
Contributor

@AnthonyDGreen
Could you describe what should happen in the presence of variables with identifiers named the same as "parts" of the pattern? What if they are const or readonly?

Dim scheme = "http"
Dim rest   = "www.microsoft.com"
Select Case input
    Case Match $"{scheme}://{rest}" When scheme = "ms-word"
        
    Case Match $"{user}@{domain}" When domain <> "outlook.com"
        Throw New NotSupportedException("All must use outlook.com!")
    Case Match $"{*}.docx"
                
    Case Match $"{CInt(id)}|{CDate(date)}|{description}|{CDbl(total)}"
        ' CSVs are never this easy to parse. Sooner or later 75 rows in somebody gets clever.
End Select

@zspitz

zspitz commented Aug 5, 2017

@AdamSpeight2008 In C#'s pattern matching, using a variable in the match clause with the same name as an outer-scope variable is a compiler error:
A local or parameter named 'scheme' cannot be declared in this scope because that name is used in an enclosing local scope to define a local or parameter.

@bandleader

BTW, not to complain, but Scala/Kotlin/Swift et al are fast becoming the new VB (not in BASIC syntax, but in the core VB values: expressiveness, sugar, aiding the programmer, etc.)

@zspitz

zspitz commented Mar 22, 2020

I want to note one hesitation: currently VB.NET supports two forms of patterns for strings: Like operator patterns, and regular expressions. Adding interpolated strings will bring in a third way; isn't that a bit too many?

@aarondglover

aarondglover commented Mar 27, 2020

@zspitz

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Jamie Zawinski

Regex is arguably the most powerful parsing method, but I get tired of searching Google for the appropriate expression... And @AnthonyDGreen's proposal is much more powerful than the Like operator.

I'd welcome an additional way to achieve this, and since it is simpler in my mind than Regex, I feel it is in keeping with VB's stated goals of remaining approachable.

Thoughts?

@zspitz

zspitz commented Mar 27, 2020

@aarondglover

I'm not denying that this proposal would be very powerful; if it had been available in the language from the start, it might have been an excellent choice. But now that we have the Like syntax, I don't think it appropriate to add another syntax.

That said, perhaps there's some way to incorporate string pattern matching into the existing Like syntax? As @AnthonyDGreen noted in the original post,

Can we do something to modernize the VB Like operator instead?
Not sure.
