Skip to content
This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

improve make-trivial-rule #6868

Merged
merged 2 commits into from
Oct 21, 2016
Merged

improve make-trivial-rule #6868

merged 2 commits into from
Oct 21, 2016

Conversation

J0WI
Copy link
Contributor

@J0WI J0WI commented Sep 13, 2016

improve make-trivial-rule duplicate heuristic
add comment block for nonfunctional hosts

add comment block for nonfunctional hosts
@J0WI
Copy link
Contributor Author

J0WI commented Sep 13, 2016

This is a really useful tool to create simple rules, I think we should also recommend this in our guides.

We may also give a hint in this script to manually search for more subdomains. Or even providing the search link?

I decided to use the letters from the superscript minuscule Unicode table for common errors. This hopefully brings more consistent and order in our comments.
cc @loveithateit if I missed something there.

@jeremyn
Copy link
Contributor

jeremyn commented Sep 13, 2016

@J0WI I think it's great you did this work and I definitely like having a template top comment.

I don't like the superscript pattern because it's harder to read, and Unicode can be annoying. I'd much rather just use (h), (m) etc.

When you specifically list categories, contributors who encounter a problem that's not listed like incomplete certificate chain or some bizarre, rare HTTPS configuration problem have to figure out what to do. You could add an other problem category. Also I've been thinking that we should just roll self-signed, mismatch etc into a single invalid certificate category. We really don't need that specific breakdown and it's annoying to have to use multiple categories.

Should we ask contributors who submit a new ruleset with no problematic subdomains to delete the top comment?

Also, when I make new rulesets, I just copy an existing ruleset and edit it. I haven't used make-trivial-rule in a while. Just throwing that out there.

@J0WI
Copy link
Contributor Author

J0WI commented Sep 13, 2016

I don't like the superscript pattern because it's harder to read, and Unicode can be annoying. I'd much rather just use (h), (m) etc.

Personally I prefer Unicode symbols for readability. We also already have quite a few rules using them:

$ grep " ᵐ$" * | wc -l
1072
$ grep " m$" * | wc -l
59

The initial idea of the comments template was, that I was not able to type these characters on my keyboard and it was annoying to search them. So I wasn't using them.
I hope this will make it easier.

When you specifically list categories, contributors who encounter a problem that's not listed like incomplete certificate chain or some bizarre, rare HTTPS configuration problem have to figure out what to do. You could add an other problem category. Also I've been thinking that we should just roll self-signed, mismatch etc into a single invalid certificate category. We really don't need that specific breakdown and it's annoying to have to use multiple categories.

The best practice for mixed content or cert chain issues would be to use a platform.
For uncommon issues contributors will have to write an extra comment/exclusion anyway.

Should we ask contributors who submit a new ruleset with no problematic subdomains to delete the top comment?

I thought this is self-explaining, but we could easily do that.

@jeremyn
Copy link
Contributor

jeremyn commented Sep 13, 2016

I know we have a bunch of rulesets with this Unicode pattern, I've seen them and don't like it.

The Unicode is just plain hard to read for my eyeballs, even with a large monitor. Also I feel that almost anyone would agree that

Refused:
    a.example.com
    c.example.com

Self-signed:
    b.example.com

Timeout:
    d.example.com
    e.example.com

Is easier to write and review than:

    a.example.com ʳ
    b.example.com ˢ
    c.example.com ʳ
    d.example.com ᵗ
    e.example.com ᵗ

ʳ connection refused
ˢ self-signed certificate
ᵗ timeout on https

Also I really don't want to have to walk a contributor through using Unicode footnotes to comment something like migros.xml. What would that look like?

@J0WI
Copy link
Contributor Author

J0WI commented Sep 13, 2016

The sort-by-category solution is nice for small rulesets, but when you have many commented subdomains you have to sort each category separately.

When you have all comments of a host in a list, you can easy add new ones and then just sort the whole list. Therefore it's also easier to find a host in the list, because they are only sorted in alphabetical order and not spitted into categories.

Maybe you have more than one domain in one ruleset, then you need to sort it in three times in the first example and only two times in the second:

Refused:
    a.example.com
    c.example.com

    a.example.org
    c.example.org

Self-signed:
    b.example.com

    b.example.org

vs

    a.example.com ʳ
    b.example.com ˢ
    c.example.com ʳ

    a.example.org ʳ
    b.example.org ˢ
    c.example.org ʳ

ʳ connection refused
ˢ self-signed certificate

In the category list you also can't note multiple issues, e.g.:

    bad.example.com ⁴ʰᵐ

    ⁴ 404/diffrent content
    ʰ http redirect
    ᵐ certificate mismatch

(Means bad.example.com serves a mismatching cert, if you accept that it will redirect the wrong location and to http)

But let see how others think about it.
cc @fuglede @Hainish

@Hainish
Copy link
Member

Hainish commented Sep 20, 2016

I think there's a middle-ground here. I agree with the readability of non-superscripted characters, as well as the niceties of having one master list and the ability to tag multiple times for a given host. These are not mutually exclusive. Take the following:

    a.example.com (r)
    b.example.com (s)
    c.example.com (r)
    d.example.com (t)
    e.example.com (t)
    f.example.com (4,s)

r: connection refused
s: self-signed certificate
t: timeout on https
4: 40X response

Look good?

@J0WI
Copy link
Contributor Author

J0WI commented Sep 20, 2016

These are some more characters to type, but I'm also fine with that.

@jeremyn
Copy link
Contributor

jeremyn commented Sep 20, 2016

If we're going to use annotations then I agree regular ASCII is better than Unicode superscripts.

Does (4, s) actually come up in practice? Who would discover that?

I think the exhaustive list of categories ought to be this, with whatever names we find appropriate:

  • Site doesn't load (includes refused, timeout, secure connection failed)
  • Site loads, invalid certificate (includes expired, self-signed, mismatch, incomplete certificate chain)
  • Site loads, bad mixed content blocking
  • Site loads, HTTPS -> HTTP redirect loop

That's it. Possibly incomplete certificate chain should be split from invalid certificate because an incomplete chain can require specific testing to see.

Also see the discussion here.

@Hainish
Copy link
Member

Hainish commented Sep 20, 2016

@jeremyn I think we should settle on some set of standard categories and note them in https://github.com/EFForg/https-everywhere/blob/master/ruleset-style.md. But I don't think we should prematurely limit ourselves to a predefined set and prevent ourselves from adding more. Who knows what situations will come up. For instance, as SHA-1 certs become deprecated, if the StartSSL SHA-1 heisenbug presents a problem for many of the subdomains on a specific ruleset, we'll want to note that. So I don't see any reason not to create a new category for it.

@jeremyn
Copy link
Contributor

jeremyn commented Sep 20, 2016

If we're going to pre-list categories in the trivial rule template, the list of categories should minimal, obvious, and complete. make-trivial-rule is for new contributors who aren't sure what to do and are nervous about breaking things. For obscure problems we can walk them through how to document it but that should be unnecessary in most cases.

@J0WI
Copy link
Contributor Author

J0WI commented Sep 21, 2016

I plan to add a CONTRIBUTING.md file when this one is done.

@jeremyn jeremyn mentioned this pull request Sep 22, 2016
@Hainish
Copy link
Member

Hainish commented Sep 26, 2016

With CONTRIBUTING.md (which I think is a great idea) perhaps we can include the canonical, full list there, and have some common subset included in make-trivial-rule.

@J0WI
Copy link
Contributor Author

J0WI commented Oct 1, 2016

I just updated this without Unicode chars.

@J0WI J0WI mentioned this pull request Oct 1, 2016
@Hainish Hainish merged commit 028a42f into EFForg:master Oct 21, 2016
@Hainish
Copy link
Member

Hainish commented Oct 21, 2016

lgtm!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants