Extension API to find the public suffix (eTLD) of a given URL/domain #231

Open
Rob--W opened this issue Jun 17, 2022 · 18 comments
Labels: proposal, supportive: chrome, supportive: firefox, supportive: safari

Comments

@Rob--W
Member

Rob--W commented Jun 17, 2022

The public suffix list is a database of effective top-level domains (eTLD), which are the public suffix of URLs. This database is included in browsers (at least by Firefox, Chrome and Safari - sources below) and may be updated remotely. There have been feature requests for an API that allows extensions to identify the public suffix (eTLD) of a given URL:

There are some known problems with the application of the public suffix list (https://github.com/sleevi/psl-problems), but that does not necessarily rule out an extension API with such access. Extensions that need the public suffix list currently have to rely on alternatives, such as bundling the database with the extension, at the risk of having incompatible interpretations of the "public suffix of a URL" between the browser and the extension. With proper documentation of the problems associated with the public suffix list, extension authors can make a conscious decision to use the API when they need to.

This issue is to track use cases and the desired shape of the API. For example, the following would be the minimum:

let suffix = await browser.publicsuffix.getPublicSuffix("www.example.co.uk");
// Result: co.uk
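
As a minimal sketch of how an extension might consume such an API while it is still only a proposal, the helper below prefers `browser.publicsuffix.getPublicSuffix` when it exists and otherwise falls back to a lookup backed by a copy of the list bundled with the extension. The helper name and the `bundledLookup` parameter are hypothetical, and the API shape is only the one suggested above, not something any browser ships.

```javascript
// Hypothetical helper. `browser.publicsuffix.getPublicSuffix` is the shape
// proposed in this issue; it is assumed here, not an implemented API.
// `bundledLookup` stands in for whatever bundled-list lookup the extension uses.
async function getPublicSuffix(hostname, bundledLookup) {
  const api = globalThis.browser?.publicsuffix;
  if (typeof api?.getPublicSuffix === "function") {
    // Prefer the browser's own copy of the list when it is exposed.
    return api.getPublicSuffix(hostname);
  }
  // Fallback: the list shipped with the extension, which may be out of date.
  return bundledLookup(hostname);
}
```

Feature detection of this kind would let an extension keep working in browsers that have not adopted the API, at the cost of still shipping a bundled list.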

Here are other examples of APIs to query the public suffix:

@oliverdunk
Member

Similar proposal: #58

@xeenon
Collaborator

xeenon commented Jun 23, 2022

I would be in favor of this for Safari.

@gijsk

gijsk commented Jun 24, 2022

I'm in favour of this proposal. In addition to the consistency issue that was pointed out in the meeting (i.e. if the extension and the browser have different versions of the PSL, there's a potential for security issues), here are some other arguments for having this exposed as an API:

  • the PSL updates frequently (ie several times a month, sometimes several times a week), as besides TLDs it also covers things like AWS/GCP-style subdomains and informs cookie restrictions for hosted services like those. Having to push webextension updates every time it updates is burdensome on webextension developers and impacts bandwidth and storage use for the webextension distributor (no matter whether that's the browser or webextension vendor). Trying to avoid that by not pushing the PSL updates into the webextension until there's another reason to update the webextension leaves room for the first problem (ie discrepancy with the browser version). It also means that if the webextension is abandoned by the developer, it is forever stuck with an out-of-date copy of the PSL which causes more and more issues for unsuspecting users, even if the webextension "seems fine" in general. It impacts the reliability of the web for those users.
  • having each webextension keep its own copy has implications for bandwidth and disk space use for end users (the bandwidth/download side being especially problematic on mobile), as well as exponentially multiplying the risk of inconsistencies between the browser and the various webextensions that are installed (ie webextension A, B, and C can all have different versions, which are yet different from the browser's own). Although space- and CPU-efficient ways of storing and loading all the PSL data exist, they are easier for browsers than webextensions to implement given limited IO APIs available to webextensions. See also the next point...
  • the PSL is a security-relevant piece of web infrastructure at this point - despite all its flaws. Implementing support correctly is not entirely trivial, and having each webextension do it themselves is a recipe for issues (much like "roll your own crypto" is).
  • the PSL is not perfect, and if we abstract a reasonable API, we stand a chance of replacing it with something that addresses some of the problems. If we leave it all up to webextensions themselves, it will result in proliferated hardcoded dependencies on the PSL and make it more difficult to move away from it in future (ie even if browsers did move away from it, all the extensions would have to follow suit before site owners could stop caring about the PSL).

@rdcronin
Contributor

I'd be in favor of this for Chrome.

@rdcronin rdcronin added supportive: chrome Supportive from Chrome and removed follow-up: chrome Needs a response from a Chrome representative labels Mar 18, 2024
@Rob--W Rob--W added the supportive: firefox Supportive from Firefox label Mar 18, 2024
@zombie zombie self-assigned this Mar 18, 2024
@dotproto
Member

To proceed with this issue, we need a more concrete proposal. Some points that a more fleshed out proposal would address include:

  • Some more investigation needed; e.g. include vs exclude private registries.
  • Consider additional methods.

@zombie
Collaborator

zombie commented Mar 18, 2024

I'll follow up to see if Mozilla's multi-account-containers maintainers want to put up a proposal for the api shape here.

@Rob--W
Member Author

Rob--W commented Mar 18, 2024

@oliverdunk is going to reach out to PSL maintainers to inform them of our intent to offer this API.

@Dreamsorcerer

Dreamsorcerer commented Mar 21, 2024

> some other arguments for having this exposed as an API:

Just to reiterate the points from my original request (#58) which covers many of the same arguments:

Issues with the current approach include:

> This issue is to track use cases and the desired shape of the API.

As per the title of my original request, I think the most common scenario is to get the organisational domain, rather than the suffix alone. So, to save some manual string manipulation, it would be great for the API to include a function to get the organisational domain.
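
The manual string manipulation mentioned above can be sketched as follows, assuming the public suffix has already been obtained by some means. The function name is hypothetical. The organisational (registrable) domain is taken to be the public suffix plus one additional label, which is how the PSL algorithm defines it.

```javascript
// Hypothetical helper: derive the organisational domain from a hostname and
// its already-known public suffix. Returns null when the hostname is itself
// a public suffix, because there is no extra label to add.
function getOrganisationalDomain(hostname, publicSuffix) {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop trailing dot
  const suffix = publicSuffix.toLowerCase();
  if (host === suffix) return null;
  if (!host.endsWith("." + suffix)) {
    throw new Error(`"${suffix}" is not a suffix of "${host}"`);
  }
  // Everything left of the suffix, e.g. "www.example" for "www.example.co.uk".
  const prefix = host.slice(0, host.length - suffix.length - 1);
  // Keep only the label directly next to the suffix.
  const lastLabel = prefix.slice(prefix.lastIndexOf(".") + 1);
  return `${lastLabel}.${suffix}`;
}

console.log(getOrganisationalDomain("www.example.co.uk", "co.uk")); // example.co.uk
```

Having the API return this value directly would spare every extension from repeating this step.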

@oliverdunk
Member

I have started an email thread with the maintainers of the PSL - will keep this thread updated.

@simon-friedberger

Such a proposal should contain some guidance on what to do when the result changes, or at least a warning. This is a rare event so it is prone to getting overlooked.
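
One possible way for an extension to notice such a change, sketched under the assumption that it persists the suffixes it computed earlier, is to recompute them and report any differences. The function name and the shape of the stored data are hypothetical. `lookup` stands in for whatever suffix lookup is available.

```javascript
// Hypothetical helper: compare previously stored suffixes against fresh
// lookups and report every hostname whose public suffix has changed.
// `stored` is a Map of hostname -> suffix recorded at an earlier time.
function findChangedSuffixes(stored, lookup) {
  const changed = [];
  for (const [hostname, oldSuffix] of stored) {
    const newSuffix = lookup(hostname);
    if (newSuffix !== oldSuffix) {
      changed.push({ hostname, oldSuffix, newSuffix });
    }
  }
  return changed;
}
```

What the extension should then do with the changed entries (migrate data, warn the user, or discard state) depends on the use case and is exactly the kind of guidance a proposal could spell out.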

@oliverdunk
Member

As mentioned in a recent meeting, I met with Simon Friedberger (Mozilla) and Simone Carletti, both PSL maintainers. They were generally very supportive and would like to see this API. We agreed that introducing an extension API is unlikely to generate a significant number of additions to the list, since developers are already using the PSL in other ways today. While volume is not a concern, any education that helps maintain submission quality would be appreciated.

We also discussed several practical thoughts on the API signature / functionality:

  • There is a public section of the list (for TLDs) and a private section (including sites like github.io). A flag to specify which (or both) you are interested in would be helpful.
  • It would be interesting to be able to provide your own list. This isn't needed for an MVP.
  • Keeping the list up to date in browsers is important.
    • Related, we should expose a list version to developers. There isn't currently an official version but Simone took an action item to investigate adding this as it has come up in the past.
  • It would be interesting if you could access the original rule that influenced a particular result.
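
To make the points above more concrete, here is a rough sketch of what a lookup with a section flag, a list version, and the matched rule could look like. Everything in it is an assumption for illustration: the function name, the `sections` option, the result fields, the version string, and the three sample rules. Wildcard and exception rules are omitted for brevity.

```javascript
// Illustrative data only; a real list has thousands of rules.
const SECTIONED_RULES = [
  { rule: "com", section: "icann" },
  { rule: "io", section: "icann" },
  { rule: "github.io", section: "private" },
];
// Hypothetical version string; the PSL has no official version at present.
const SAMPLE_LIST_VERSION = "sample-1";

// Hypothetical lookup: returns the suffix plus the rule and section behind it.
function lookupSuffix(hostname, { sections = ["icann", "private"] } = {}) {
  const labels = hostname.toLowerCase().split(".");
  // Try candidates from longest to shortest so the longest rule wins.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    const match = SECTIONED_RULES.find(
      (r) => r.rule === candidate && sections.includes(r.section)
    );
    if (match) {
      return {
        publicSuffix: candidate,
        rule: match.rule,
        section: match.section,
        listVersion: SAMPLE_LIST_VERSION,
      };
    }
  }
  // No rule matched: the PSL algorithm falls back to the implicit "*" rule.
  return {
    publicSuffix: labels[labels.length - 1],
    rule: "*",
    section: null,
    listVersion: SAMPLE_LIST_VERSION,
  };
}

console.log(lookupSuffix("user.github.io").publicSuffix); // github.io
console.log(lookupSuffix("user.github.io", { sections: ["icann"] }).publicSuffix); // io
```

The two calls show why the flag matters: the same hostname yields a different suffix depending on whether the private section is consulted.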

@erosman

erosman commented May 23, 2024

Just as an idea for the API ....

Last year, I wrote a pure JavaScript PSL (Public Suffix List) module.

I looked at similar available methods in order to base the properties on.
The module outputs four values: subdomain, domain, sld, and tld.

@Dreamsorcerer

Dreamsorcerer commented May 23, 2024

> Keeping the list up to date in browsers is important.

I would assume this part is already being done (maybe not in all browsers though..)? e.g. Firefox shows passwords on mail.google.com that were created at calendar.google.com or similar. I assume they must be using the PSL for such functionality.

> Last year, I wrote a pure JavaScript PSL (Public Suffix List) module.

I don't think that code is correct (it doesn't appear to handle ! or * rules). There are several other examples online too.
Here's one I've done based on an existing solution, which, in theory, should be a lot more optimised for performance: https://github.com/AiondaDotCom/trashmail-addon/blob/master/publicsuffixlist.js
It does, however, require preprocessing the list into a more optimal format for querying first: https://github.com/AiondaDotCom/trashmail-addon/blob/master/update_suffixes.py (Result: https://github.com/AiondaDotCom/trashmail-addon/blob/master/public_suffix.json)
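
For reference, the handling of `!` (exception) and `*` (wildcard) rules can be sketched as below. This is a minimal, unoptimised illustration of the matching algorithm described on publicsuffix.org, not the code from either module linked above. It assumes wildcards appear only as a whole label, and the short rule list is a sample rather than the real data.

```javascript
// A small sample in PSL rule syntax, including a wildcard and an exception.
const SAMPLE_RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"];

// A rule matches when each of its labels, compared right to left, equals the
// corresponding hostname label or is the wildcard "*".
function ruleMatches(ruleLabels, hostLabels) {
  if (ruleLabels.length > hostLabels.length) return false;
  for (let i = 1; i <= ruleLabels.length; i++) {
    const r = ruleLabels[ruleLabels.length - i];
    const h = hostLabels[hostLabels.length - i];
    if (r !== "*" && r !== h) return false;
  }
  return true;
}

function publicSuffixOf(hostname, rules = SAMPLE_RULES) {
  const hostLabels = hostname.toLowerCase().replace(/\.$/, "").split(".");
  let best = null;
  for (const raw of rules) {
    const exception = raw.startsWith("!");
    const labels = (exception ? raw.slice(1) : raw).split(".");
    if (!ruleMatches(labels, hostLabels)) continue;
    if (exception) {
      // A matching exception rule always prevails over other rules.
      best = { labels, exception };
      break;
    }
    // Otherwise the matching rule with the most labels prevails.
    if (!best || labels.length > best.labels.length) best = { labels, exception };
  }
  // No match: the implicit "*" rule applies, i.e. one label.
  let count = best ? best.labels.length : 1;
  // An exception rule is applied with its leftmost label removed.
  if (best && best.exception) count -= 1;
  return hostLabels.slice(hostLabels.length - count).join(".");
}

console.log(publicSuffixOf("www.example.co.uk")); // co.uk
console.log(publicSuffixOf("foo.bar.ck"));        // bar.ck (wildcard rule)
console.log(publicSuffixOf("www.ck"));            // ck (exception rule)
```

A linear scan over every rule is far too slow for the full list, which is why implementations such as the one linked above preprocess the data into a structure that is cheaper to query.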

@oliverdunk
Member

> I would assume this part is already being done (maybe not in all browsers though..)? e.g. Firefox shows passwords on mail.google.com that were created at calendar.google.com or similar. I assume they must be using the PSL for such functionality.

All browsers include the PSL (it is required for things like cookie handling), but updates aren't necessarily as frequent as would be ideal. I can only speak for Chrome where I understand it is currently a manual process we run every ~6 months.

@gijsk

gijsk commented May 23, 2024

On the Firefox side, each build ships with a copy that is up-to-date at time of build, I believe. The update process for the source code is automated, cf. the commit log for the data file: https://hg.mozilla.org/mozilla-central/log/tip/netwerk/dns/effective_tld_names.dat . There was some attempt in the past to be able to update out-of-release-band (like safebrowsing and other similar services that update more frequently than the standard release cadence) but I think that stalled once we hit issues with how this changed origin parsing/serialization (and doing so while a multi-process browser is running while keeping all the processes aligned on that change is... not trivial). Cf. https://twitter.com/ValentinGosu/status/1510295473864728581

@Dreamsorcerer

From my side maintaining a list in an extension, I'm satisfied if I remember to update once a year, so even a 6 monthly update seems good to me and a big improvement.

@dannycolin

> From my side maintaining a list in an extension, I'm satisfied if I remember to update once a year, so even a 6 monthly update seems good to me and a big improvement.

It also saves you from pushing it to all the addon stores. Granted, most of the time it's a minor change that gets reviewed and accepted quickly. However, there's always a risk that things take more time, or that something else comes up.

With a builtin API, we're just giving more peace of mind to the addon developers :).

@Rob--W
Member Author

Rob--W commented Jul 24, 2024

FYI I asked the contributor who submitted a patch to Firefox before whether they're interested in creating a proposal according to our proposal process: https://bugzilla.mozilla.org/show_bug.cgi?id=1315558#c27
