
Early design review for the FLoC API #601

Closed
1 task done
xyaoinum opened this issue Jan 25, 2021 · 17 comments
Labels
  • privacy-tracker: Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.
  • Progress: pending external feedback: The TAG is waiting on response to comments/questions asked by the TAG during the review.
  • Review type: CG early review: An early review of general direction from a Community Group.
  • Topic: privacy
  • Venue: WICG

Comments

xyaoinum commented Jan 25, 2021

HIQaH! QaH! TAG!

I'm requesting a TAG review of the FLoC API.

In today's web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. User privacy could be better protected if interest-based advertising could be accomplished without needing to collect a particular individual’s exact browsing history.

The FLoC API would enable ad targeting based on the user's general browsing interests, without websites learning their exact browsing history.

Please read the Security and Privacy self-review for the privacy goals and concerns.

Further details:

  • I have reviewed the TAG's API Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): Unknown
  • Existing major pieces of multi-stakeholder review or discussion of this design: Unknown
  • Major unresolved issues with or opposition to this design: None at the moment
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

☂️ open a single issue in our GitHub repo for the entire review

💬 leave review feedback as a comment in this issue and @-notify @xyaoinum, @jkarlin, @michaelkleber

xyaoinum added the Progress: untriaged and Review type: CG early review labels Jan 25, 2021
torgo assigned torgo and rhiaro Feb 9, 2021
torgo added the privacy-tracker, Topic: privacy, and Venue: WICG labels and removed the Progress: untriaged label Feb 9, 2021
torgo added this to the 2021-02-15-week milestone Feb 9, 2021
plinss self-assigned this Feb 10, 2021
atanassov self-assigned this Feb 15, 2021
torgo (Member) commented Feb 22, 2021

One thing we are particularly concerned about is the topic of "sensitive categories." As we wrote in the Ethical Web Principles, the web should not cause harm to society. Members of marginalised groups can often be harmed simply by being identified as part of that group. So we need to be really careful about this. Can you provide some additional information about possible mitigations against this type of misuse?

rhiaro (Contributor) commented Feb 23, 2021

Sensitive categories

The only documentation of "sensitive categories" visible so far is on Google ad policy pages. Categories that are considered "sensitive" are, as stated, not likely to be universal, and are also likely to change over time. I'd like to see:

  • an in-depth treatment of how sensitive categories will be determined (by a diverse set of stakeholders, so that the definition of "sensitive" is not biased by the backgrounds of implementors alone);
  • discussion of whether it is possible, and desirable (it might not be), for sensitive categories to differ based on external factors (e.g. geographic region);
  • a persistent and authoritative means of documenting what they are that is not tied to a single implementor or company;
  • how such documentation can be updated and maintained in the long run;
  • and what the spec can do to ensure implementers actually abide by restrictions around sensitive categories.

Language about erring on the side of user privacy and safety when the "sensitivity" of a category is unknown might be appropriate.

Browser support

I imagine not all browsers will actually want to implement this API. Is the result of this, from an advertiser's point of view, that serving personalised ads is not possible in certain browsers? Does this create a risk of platform segmentation, in that some websites could detect non-implementation of the API and refuse to serve content altogether (which would severely limit user choice and increase concentration of a smaller set of browsers)? A mitigation for this could be to specify explicit 'not-implemented' return values for the API calls that are indistinguishable from a full implementation.

The description of the experimentation phase mentions refreshing cohort data every 7 days; is timing something that will be specified, or is that left to implementations? Is there anything about cohort data "expiry" if a browser is not used (or only used to browse opted-out sites) for a certain period?

Opting out

I note that "Whether the browser sends a real FLoC or a random one is user controllable", which is good. I would hope to see some further work on guaranteeing that the "random" FLoCs sent in this situation do not become a de-facto "user who has disabled FLoC" cohort.

It's worth further thought about how sending a random "real" FLoC affects personalised advertising the user sees - when it is essentially personalised to someone who isn't them. It might be better for disabling FLoC to behave the same as incognito mode, where a "null" value is sent, indicating to the advertiser that personalised advertising is not possible in this case.

I note that sites can opt out of being included in the input set. Good! I would be more comfortable if sites had to explicitly opt in though.

Have you also thought about more granular controls for the end user which would allow them to see the list of sites included from their browsing history (and which features of the sites are used) and selectively exclude/include them?

If I am reading this correctly, sites that opt out of being included in the cohort input data cannot access the cohort information from the API themselves. Sites may have very legitimate reasons for opting out (eg. they serve sensitive content and wish to protect their visitors from any kind of tracking) yet be supported by ad revenue themselves. It is important to better explore the implications of this.

Centralisation of ad targeting

Centralisation is a big concern here. This proposal makes it the responsibility of browser vendors (a small group) to determine what categories of user are of interest to advertisers for targeting. This may make it difficult for smaller organisations to compete or innovate in this space. What mitigations can we expect to see for this?

How transparent and auditable will the algorithms used to generate the cohorts be? When some browser vendors are also advertising companies, how do we separate concerns and ensure that the privacy needs of users are always put first?

Accessing cohort information

I can't see any information about how cohorts are described to advertisers, other than their "short cohort name". How does an advertiser know what ads to serve to a cohort given the value "43A7"? Are the cohort descriptions/metadata served out of band to advertisers? I would like an idea of what this looks like.

Security & privacy concerns

I would like to challenge the assertion that there are no security impacts.

  • A large set of potentially very sensitive personal data is being collected by the browser to enable cohort generation. The impact of a security vulnerability causing this data to be leaked could be great.
  • The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.
  • Sites which log cohort data for their visitors (with or without supplementary PII) will be able to log changes in this data over time, which may turn into a fingerprinting vector or allow them to infer other information about the user.
  • We have seen over past years the tendency for sites to gather and hoard data that they don't actually need for anything specific, just because they can. The temptation to track cohort data alongside any other user data they have with such a straightforward API may be great. This in turn increases the risk to users when data breaches inevitably occur, and correlations can be made between known PII and cohorts.
  • How many cohorts can one user be in? When a user is in multiple cohorts, what are the correlation risks related to the intersection of multiple cohorts? "Thousands" of users per cohort is not really that many, and membership of a hundred cohorts could quickly become identifying.
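To put rough numbers on that last point (the figures below are illustrative assumptions, not taken from the proposal): treating each observed cohort as an independent signal, a back-of-the-envelope calculation shows how quickly cohort observations could narrow down an individual:

```python
import math

web_users = 3_000_000_000   # illustrative population size (assumption)
cohort_size = 3_000         # "thousands of users per cohort" (assumption)

# Observing one cohort narrows a user down by roughly this many bits:
bits_per_cohort = math.log2(web_users / cohort_size)   # ~19.9 bits

# Bits needed to uniquely identify one user in the whole population:
bits_to_identify = math.log2(web_users)                # ~31.5 bits

# If successive cohort observations were fully independent (they are not,
# which is exactly the open question above), this many would suffice:
observations_needed = math.ceil(bits_to_identify / bits_per_cohort)
print(observations_needed)
```

In practice cohort memberships correlate heavily, so more observations would be needed, but the calculation illustrates why intersections of cohorts deserve scrutiny.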
  1. How do the features in this specification work in the context of a browser's Private Browsing or Incognito mode?

The behavior is the same as if the interest cohort is invalid/null in a regular browsing mode, i.e. an exception will be thrown.

To clarify - does this mean that sites calling the API would receive an invalid/null result? In what circumstances in regular browsing mode is this the case? When a user hasn't been assigned to a valid cohort yet? Is that a common enough case that the probability of a 'null' result being due to use of incognito mode is relatively low? (Sites should not be able to detect the use of incognito mode.)

Q14 is missing a response about how the browser gathers inputs for cohort calculation in incognito mode. I assume it gathers no data at all, but it would be good to say that explicitly.

Thanks!

plinss added the Progress: pending external feedback label Feb 24, 2021
torgo (Member) commented Mar 8, 2021

Hi @xyaoinum - do you have anything you can share with us in response to the above points? It would be good to understand where we go from here. How would you like to proceed? At this point we are waiting for your feedback. /cc @chrishtr.

xyaoinum (Author) commented Mar 8, 2021

Hi @torgo, @rhiaro: Thank you for your questions and comments. We're still thinking through them and we hope to respond to these points within a week or two.

torgo (Member) commented Mar 9, 2021

Thanks @xyaoinum. Just to follow up, you are probably also aware of the EFF article which makes many of the same points as Amy's feedback. Despite the incendiary headline, please have a look through it and take it on board, as the EFF is an important and credible stakeholder organisation when it comes to security and privacy on the web.

kuro68k commented Mar 11, 2021

* The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.

Just to add, I don't think this is an accurate description of the status quo, and any response should acknowledge that. Particularly in the last few years, efforts have been made to deny sites behavioural and interest data from sources like third-party cookies and browser history detection via JavaScript. One of the major motivations behind this has been the ability to combine such data with PII for purposes that users consider unacceptable.

At the very least this description of the status quo needs to be justified before use.

kuro68k commented Mar 12, 2021

To clarify - does this mean that sites calling the API would receive an invalid/null result? In what circumstances in regular browsing mode is this the case? When a user hasn't been assigned to a valid cohort yet? Is that a common enough case that the probability of a 'null' result being due to use of incognito mode is relatively low? (Sites should not be able to detect the use of incognito mode.)

I don't think this can be relied upon. Any change in behaviour can be used for tracking, and the null result is itself a cohort.

A randomly selected cohort would be better. In fact it would be overall better if the browser selected a number of possible cohorts that fit the user's profile and randomly selected one in normal operation. Otherwise cohort membership will change too slowly to prevent it being used for tracking.

The real problem is sites that already hold PII. There is no way I can think of to detect and frustrate that, and as it stands FLoC simply gives such sites more information than they would otherwise be able to gather under the current default tracking protections in major browsers.

lknik (Member) commented Mar 15, 2021

@rhiaro

To clarify - does this mean that sites calling the API would receive an invalid/null result?

Thanks for this review. I'm happy that the TAG is continuing the tradition of broad security-privacy aspects :-)

In the meantime, perhaps this answers the concerns regarding incognito.

lknik (Member) commented Mar 25, 2021

Hello again,

Not sure if this belongs in this review, but I sure hope that the final FLoC will not have the potential of leaking web browsing history (which is not mentioned in the S&P questionnaire).

michaelkleber commented

Hi @lknik! The 50-bit SimHash values that you're calculating get masked down to many fewer bits before being used to pick your flock. It's designed for lots of collisions: each cohort will cover thousands of people with hundreds of different browsing histories.
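For readers following along, here is a minimal sketch of the masking idea in Python. The feature-hashing scheme, the per-bit majority vote, and the truncated width below are illustrative assumptions for exposition; they are not Chrome's actual algorithm or parameters.

```python
import hashlib

HASH_BITS = 50    # full SimHash width mentioned above
COHORT_BITS = 16  # hypothetical truncated width; the real value is implementation-defined

def feature_hash(feature: str) -> int:
    """Hash one browsing-history feature (e.g. a domain) to HASH_BITS bits."""
    digest = hashlib.sha256(feature.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << HASH_BITS)

def simhash(features: list[str]) -> int:
    """Classic SimHash: a per-bit majority vote over the feature hashes,
    so similar feature sets produce similar hashes."""
    counts = [0] * HASH_BITS
    for f in features:
        h = feature_hash(f)
        for i in range(HASH_BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    out = 0
    for i in range(HASH_BITS):
        if counts[i] > 0:
            out |= 1 << i
    return out

def cohort_id(features: list[str]) -> int:
    """Mask the 50-bit SimHash down to COHORT_BITS, forcing collisions:
    at most 2**COHORT_BITS cohorts exist, so many distinct histories
    necessarily map to the same cohort."""
    return simhash(features) & ((1 << COHORT_BITS) - 1)

print(hex(cohort_id(["news.example", "sports.example", "recipes.example"])))
```

The masking step is what makes each cohort cover many different browsing histories; the open question in the comments above is whether the surviving bits still leak more than intended.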

lknik (Member) commented Mar 26, 2021

@michaelkleber Can we then learn exactly what the bit size is and how it's defined? It would be great to have a full writeup to understand this proposal entirely.

kuro68k commented Mar 26, 2021

It seems like designing the SimHash to be resilient against all kinds of analysis, so as to prevent information about the user's browsing history being leaked, is likely to be extremely difficult.

To prove it to be robust it would need to undergo extensive mathematical analysis, a very specialist subject that would probably require paying some academics to work on it. It should be externally validated.

samuelweiler commented

It's possible there's some confusion about the TAG's suggestion re: incognito mode.

torgo added the Review type: Already shipped and privacy-needs-resolution labels and removed the privacy-needs-resolution label May 11, 2021
rhiaro (Contributor) commented May 13, 2021

Hello, we looked at this again during our virtual face-to-face this week. I haven't seen a response to the points in my earlier feedback yet, and we also note that there has been a lot of community discussion about the potential negative implications of this work both for end-user privacy, and for the ad-supported sites which might depend on it. We're particularly concerned that FLoC is already being trialed, despite a lot of this feedback remaining unaddressed. We would be happy to arrange a call with you to discuss further, if that would help.

jkarlin commented Aug 10, 2021

Sorry for the very long delay in response. The delay was mostly because your feedback, in concert with feedback from other parts of the community, convinced us that we should take another go at the design. When we post the updated design, I will address the remaining relevant questions and concerns here.

Note that it might make sense to remove the "already shipped" tag as it was in an Origin Trial only which has since ended.

rhiaro (Contributor) commented Aug 11, 2021

Thanks @jkarlin! We'll close this issue for now then. Please either reopen this one with updates, or open a new design review when you have a new design.

rhiaro closed this as completed Aug 11, 2021
rhiaro removed the Review type: Already shipped label Aug 11, 2021
jkarlin commented Mar 25, 2022

To close the loop, I've opened a review in #726 for the Topics API that replaces FLoC. In that issue, I responded to the questions that were asked here.
