Replaces '.Replace' calls with a Regex #72

GStefanowich · 2024-03-25T18:37:22Z

I removed the block of .Replace calls from the AddressExtractor class.

Not that this is a corporate environment with a bunch of people editing over each other, but I wrote the filter system so that there wouldn't be a spaghetti mess of overlapping logic (including people trying to make overlapping edits). Each filter either accepts or denies an email as valid.

I replaced the block of Replaces with a new Filter: if an address starts with an illegal character ('!`{#\n\), the filter trims those characters from the start rather than doing a full replace.

If anything was trimmed, the filter returns REVALIDATE, which runs the address through the other filters again to make sure it's still valid.
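As a rough illustration of the trim-then-revalidate flow described above, here is a minimal sketch in Python (not the project's C#). The names `FilterResult`, `trim_leading_filter`, `basic_shape_filter`, and `run_filters` are hypothetical and do not come from the repository, and the character set is only copied from the comment above.

```python
from enum import Enum


class FilterResult(Enum):
    CONTINUE = "continue"      # address unchanged, try the next filter
    DENY = "deny"              # address rejected outright
    REVALIDATE = "revalidate"  # address changed, start the filter chain again


# Characters copied from the comment above; the exact set is an assumption.
ILLEGAL_LEADING = "'!`{#\n\\"


def trim_leading_filter(address):
    """Trim illegal characters from the start only (not a full replace)."""
    trimmed = address.lstrip(ILLEGAL_LEADING)
    if trimmed != address:
        return FilterResult.REVALIDATE, trimmed
    return FilterResult.CONTINUE, address


def basic_shape_filter(address):
    """Hypothetical stand-in for the other filters: local@domain.tld shape."""
    local, sep, domain = address.partition("@")
    if not sep or not local or "." not in domain:
        return FilterResult.DENY, address
    return FilterResult.CONTINUE, address


def run_filters(address, filters, max_passes=10):
    """Run every filter; restart the chain whenever one asks to revalidate."""
    for _ in range(max_passes):
        for apply_filter in filters:
            result, address = apply_filter(address)
            if result is FilterResult.DENY:
                return None
            if result is FilterResult.REVALIDATE:
                break  # address was modified, so re-check it from the top
        else:
            return address  # a full pass completed with no modification
    return None  # give up rather than loop forever
```

With these assumptions, `run_filters("'!test@example.com", [basic_shape_filter, trim_leading_filter])` trims the two leading characters, restarts the chain, and returns `"test@example.com"` once a full pass leaves it unchanged. The `max_passes` cap is an illustrative safeguard, not something the thread mentions.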

With filters, if one filter removes an illegal character (e.g. -) and another filter checks for valid domains, you might have a case where you pass in the email @my-test.dom-ain. The first filter removes our "illegal" - character, which results in @mytest.domain, and then the second filter denies .domain for being an invalid TLD.
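The interaction in that example can be traced in a few lines. This is only a Python sketch under stated assumptions: `strip_hyphens`, `tld_is_known`, and the tiny `KNOWN_TLDS` set are hypothetical and stand in for whatever the real filters do.

```python
# Hypothetical, deliberately tiny TLD list used only for this illustration.
KNOWN_TLDS = {"com", "net", "org"}


def strip_hyphens(address):
    """First filter: remove the "illegal" hyphen everywhere in the address."""
    return address.replace("-", "")


def tld_is_known(address):
    """Second filter: accept only addresses whose last label is a known TLD."""
    return address.rsplit(".", 1)[-1].lower() in KNOWN_TLDS


raw = "@my-test.dom-ain"
cleaned = strip_hyphens(raw)      # the second filter sees this, not `raw`
accepted = tld_is_known(cleaned)  # ".domain" is not in KNOWN_TLDS
```

The point of the sketch is that the second filter judges the modified address rather than the original input, so a modifying filter changes what every later filter sees.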

For the remaining test you last added, EmailAddressesInExtracted, with the email foo|test@example.com|bar, the Filter system needs to be modified, because a filter can currently only return one modified email address. A filter could see | as an illegal character, run a .Split('|') to get ["foo", "test@example.com", "bar"], and then run each piece through the filter system again, where foo and bar would be skipped.
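A minimal Python sketch of that split-and-refilter idea follows. It is not the project's C# code: `split_and_refilter` is a hypothetical helper, and `SIMPLE_ADDRESS` is a deliberately loose placeholder for running the full filter system again on each piece.

```python
import re

# Loose placeholder for "run it through the filter system again".
SIMPLE_ADDRESS = re.compile(r"[^@\s|]+@[^@\s|]+\.[A-Za-z]{2,}")


def is_plausible(candidate):
    """Return True when the whole candidate looks like local@domain.tld."""
    return SIMPLE_ADDRESS.fullmatch(candidate) is not None


def split_and_refilter(candidate, delimiter="|"):
    """Split on the illegal delimiter and keep only the pieces that pass."""
    return [piece for piece in candidate.split(delimiter) if is_plausible(piece)]
```

Under these assumptions, `split_and_refilter("foo|test@example.com|bar")` keeps `test@example.com` and skips `foo` and `bar`. Returning a list is what the current one-address-per-filter design would need to change to support.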

troyhunt · 2024-03-25T23:27:09Z

With issue #61 in mind, wouldn't it be better just to treat any character not in the allow list as a terminator that denotes either the start or the end of the address? I can't think of a downside to this, given the specific criteria required for the email address pattern that occurs within those characters.
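One way to picture the allow-list suggestion is the Python sketch below. The allow list shown here is an assumption made for illustration (the thread does not spell out the actual list), and `ALLOWED_RUN`, `looks_like_address`, and `candidates` are hypothetical names.

```python
import re

# Assumed allow list: letters, digits, and . _ + - @
# Any other character ends the current run, so it acts as a start/end marker.
ALLOWED_RUN = re.compile(r"[A-Za-z0-9._+\-@]+")


def looks_like_address(run):
    """Exactly one '@', a non-empty local part, and a dotted domain."""
    local, sep, domain = run.partition("@")
    return bool(
        sep
        and local
        and "@" not in domain
        and "." in domain
        and all(domain.split("."))  # no empty labels such as "a..b"
    )


def candidates(text):
    """Return every run of allowed characters that is shaped like an address."""
    return [run for run in ALLOWED_RUN.findall(text) if looks_like_address(run)]
```

With this assumed allow list, `candidates("foo|test@example.com|bar")` yields only `test@example.com`, because `|` terminates each run. Leading characters such as `'`, `!`, or `{` are dropped the same way, with no separate trimming step.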

GStefanowich · 2024-03-26T04:35:15Z

How you want to handle it is ultimately up to you.

It'll muddle up the main Regex more, and goes a bit against the design of "Loose main Regex, filter out the bad results" by making the main Regex stricter.

It could perform better with fewer Regexes, or it could perform better by quickly capturing loose results and running the "heavy" Regexes fewer times. I'm not sure, and I'm not about to implement both ways to test.

GStefanowich added 2 commits March 21, 2024 12:45

Replaced the block of '.Replace's with a Regex replace

f9c66e3

Made more readable (Maintainability ftw)

1c615ab
