Subdomain support for CIDs longer than 63 #7318

Open
lidel opened this issue May 14, 2020 · 26 comments · Fixed by multiformats/multibase#65
Labels
kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up status/in-progress In progress topic/cidv1b32 Topic cidv1b32 topic/ed25519 Issues related to ed25519 Peer IDs topic/gateway Topic gateway
Milestone

Comments

@lidel
Member

lidel commented May 14, 2020

I hoped to punt this until we need to switch away from sha256 in CIDs, but we may need to solve this problem sooner than expected because ED25519 keys will soon become the new default (#6916).

Problem: DNS label limit of 63

RFC 1034: "each node has a label, which is zero to 63 octets in length"

The default CIDv1 Base32 with multihash of sha256 and RSA libp2p-key fits:

but if we use ED25519 libp2p-key then we are 2 characters over the limit:

Label longer than 63 characters means the hostname can't resolve:

$ ping bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link
ping: bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link: Name or service not known
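A minimal sketch of the check that fails here, assuming only the 63-octet label limit quoted from RFC 1034 above. The helper name `oversized_labels` is hypothetical and not part of go-ipfs:

```python
DNS_LABEL_MAX = 63  # RFC 1034: a label is zero to 63 octets long


def oversized_labels(hostname):
    """Return the labels of `hostname` that exceed the DNS label limit."""
    return [label for label in hostname.split(".") if len(label) > DNS_LABEL_MAX]


host = "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipns.dweb.link"
for label in oversized_labels(host):
    # Report each label that a resolver would reject
    print(f"{len(label)} characters: {label}")
```

Only the leftmost label is reported: it is 65 characters long, which is the "2 characters over the limit" mentioned above.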

And links are not picked up by tools like Slack:

[screenshot oops-2020-05-14--17-59-27: Slack does not linkify the over-long hostname]

Note: I used ED25519 as an example, but the problem is not limited to that single type of CID. Even if we find a way to fit ED25519 in a single label, the problem remains for CIDs with a multihash created with longer hash functions.

Solved: IPNS-specific fix for ED25519 keys

In parallel to the generic fix, we could represent ED25519 keys in a way that fits under 63 characters, solving the UX issue for IPNS websites loaded from public gateways.

Done: #7441 – we support {cidv1base36}.ipns.dweb.link which perfectly fits

Open Problem: generic solution for long CIDs

I am happy to open a PR with a fix, but I am unsure if I have the best fix in mind, and would love to gather feedback first.

❓ (A) support split CIDs (but have broken TLS)

The first idea I have is to split the label when the max is reached.
To maximize entropy for Origin isolation, the remainder should be on the left side:

Pros:

  • 👍 each long CID gets own Origin – we keep isolation
  • 👍 path redirect provided by subdomain gateway can take care of splitting
  • 👍 future-proof solution for longer hashes such as sha2-512
    • the next limit is pretty far away: the maximum length of full domain name: 253 characters, including dots
    • sha2-512 on dweb.link is 121 characters

Cons:

  • 💢 decreased entropy in security guarantees provided by origin isolation
  • 💢 wildcard TLS certificate does not pass validation for more than a single level of labels
  • 💢 copying & pasting CID as-is no longer works on public gateways (user needs to put . in the middle etc)
    • Note: to make it easier UX-wise, we should allow . anywhere inside of CID, but internally merge labels, and return a redirect to canonical version that splits at deterministic position (enforcing maximum label for Origin).
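The canonical split described in option (A) could look roughly like the sketch below. It assumes dots may appear anywhere in the incoming CID label, merges them, and re-splits from the right so the full 63-character label sits next to the gateway domain and the remainder ends up on the left. The function name `canonical_split` is hypothetical, and this is not the code from any go-ipfs PR:

```python
DNS_LABEL_MAX = 63


def canonical_split(cid_label, limit=DNS_LABEL_MAX):
    """Merge any user-supplied dots, then re-split counting from the right."""
    merged = cid_label.replace(".", "")
    labels = []
    while merged:
        # Take up to `limit` characters from the right-hand end
        labels.insert(0, merged[-limit:])
        merged = merged[:-limit]
    return ".".join(labels)


cid = "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
print(canonical_split(cid))
```

A gateway following this idea would redirect any request whose split differs from `canonical_split(...)` to the canonical form, so every CID maps to exactly one Origin.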

❓ (B) redirect long CIDs to an "insecure" subdomain

This would make it possible for content to load, but longer CIDs would not get Origin isolation per CID.

To make this a bit clearer and more idiomatic, we could present this as a "cross origin resource sharing" endpoint that allows CORS requests, supports loading everything from a single origin, and has paths locked down in browsers as noted in ipfs/in-web-browsers#157.

Think in terms of

  • https://dweb.link/ipfs/superlongcid redirecting to https://cors.dweb.link/superlongcid

Pros:

  • 👍 does not break TLS wildcard certs (easy setup for gateway operators)
  • 👍 useful outside this problem: provides idiomatic way for exposing path gateway on subdomain gateways (for use when origin isolation is not needed)

Cons:

  • 💢 long CIDs don't get Origin isolation
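A rough sketch of the redirect decision in option (B), using the `cors.dweb.link` example from above. The function name, the `shared_label` parameter, and the exact URL shapes are assumptions for illustration only:

```python
DNS_LABEL_MAX = 63


def redirect_target(cid, gateway="dweb.link", shared_label="cors"):
    """Pick the redirect target for a path-gateway request to /ipfs/<cid>."""
    if len(cid) <= DNS_LABEL_MAX:
        # Fits in one DNS label: the CID gets its own Origin
        return f"https://{cid}.ipfs.{gateway}/"
    # Too long for a label: fall back to the shared origin, without per-CID isolation
    return f"https://{shared_label}.{gateway}/{cid}"
```

Only CIDs that cannot fit in a single label lose Origin isolation under this scheme; everything else keeps the regular subdomain behavior.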

❓ (C) swap DAG root with CID that uses shorter hash function

Pros:

  • 👍 "just works"

Cons:

  • 💢 decreased entropy
  • 💢 newly created root blocks need to be persisted somehow: if I bookmark the page loaded via shortened CID and then the root block gets garbage-collected, the address is dead.
    • potential fix: we could always create redundant sha256 root block for every DAG that uses longer hash function for interop

❓ (D) leverage HTTP proxy mode (on localhost)

When the Gateway port is used as an HTTP proxy, the local client does not perform a DNS lookup; instead, the original URL is sent in the HTTP request to the proxy for processing.

Because the HTTP proxy IS the go-ipfs node in that scenario, it does not do a DNS lookup, but extracts the original (long) CID and resolves it, without involving DNS.

As long as user agents are not overzealous in validating URLs, this would allow for long (>63) CIDs on subdomains.

This is important, because it enables the localhost gateway (used by Brave) to resolve long CIDs correctly without any additional hacks.

UX details tbd. This could be the solution for localhost gateway, but for public ones we still need something else.
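To illustrate why DNS is not involved in option (D): a proxy receives the full URL as the request target, so the CID can be read straight from the hostname string. The sketch below assumes a `<cid>.ipfs.localhost` hostname layout; the function name is hypothetical, and this is not how go-ipfs implements it:

```python
from urllib.parse import urlsplit


def cid_from_proxy_target(target, suffix=".ipfs.localhost"):
    """Extract the CID label from an absolute-form proxy request target."""
    host = urlsplit(target).hostname or ""
    if not host.endswith(suffix):
        return None
    # Plain string slicing: no DNS lookup, so the 63-character limit is never checked
    return host[: -len(suffix)]
```

Whether a given user agent will actually send such a URL to the proxy depends on how strictly it validates hostnames, as noted above.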

Other ideas?

Would love to find a better way to work around this

cc @aschmahmann @Stebalien ipfs/in-web-browsers#89

@lidel lidel added kind/bug A bug in existing code (including security flaws) topic/gateway Topic gateway need/triage Needs initial labeling and prioritization topic/cidv1b32 Topic cidv1b32 topic/ed25519 Issues related to ed25519 Peer IDs labels May 14, 2020
@ribasushi
Contributor

ribasushi commented May 14, 2020

It's a bit unfortunate that keys are so overly verbose: https://cid.ipfs.io/#bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk

It looks like we have an actual protobuf construct inside the raw bytes. Is this... something we need to do?

If we shave off 2 bytes, nothing extra needs to be done...

@lidel
Member Author

lidel commented May 14, 2020

I am afraid that even if we find a hacky workaround for libp2p-keys in ED25519, the problem remains for CIDs that use longer hash functions than sha256.

@Stebalien
Member

For context, we're trying to encode 40 bytes into 62 characters (with one character for the multibase prefix).

I believe base36 would work, if that's an option. That should give us exactly 63 characters.
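The arithmetic behind this can be sketched as an upper bound on the encoded length: the number of digits needed for `num_bytes` of data in a given base, plus one character for the multibase prefix. The helper name is hypothetical, and the result is a worst case (a concrete CID may encode to fewer digits in base36):

```python
import math


def max_encoded_length(num_bytes, base, prefix_chars=1):
    """Upper bound on text length for `num_bytes` encoded in `base`, plus multibase prefix."""
    digits = math.ceil(num_bytes * 8 / math.log2(base))
    return digits + prefix_chars


for base in (32, 36):
    print(f"base{base}: at most {max_encoded_length(40, base)} characters")
```

For 40 bytes this gives at most 65 characters in base32 (two over the 63-character label limit) and at most 63 in base36.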

We could change how we encode these peer IDs in text and use an ed25519 specific codec (<cidv1>-<ed25519>-<multihash>). That would still be a reasonable encoding of an ed25519 CID but I'd prefer to avoid it.

@Stebalien
Member

But I agree we should support longer keys regardless. But will this be a problem for TLS certs? Can we get a double-star cert?

@lidel
Member Author

lidel commented May 14, 2020

  • I am not aware of any CA that provides double wildcard certs.
    That is why ENS gateway still has the TLS warning (example: https://blog.almonit.eth.link).

  • Switching the default text representation of PeerID to Base36 would require work across the ecosystem to bubble up support (it is missing from multibase.csv atm), and it's not as popular as the RFC version of Base32. Not sure what's the lesser evil: that, or a new codec.

@Stebalien
Member

@aschmahmann and I discussed this and it is possible to shrink ed25519 pids, but it's painful and requires coordination with all libp2p implementations.

To shrink ed25519 keys, we need to:

  1. Encode them as <cidv1>-<ed25519>-<multihash> in text. This will reduce the id size to 36 bytes (from 40).
  2. Ideally, migrate to CIDs on the wire in libp2p. That would save us 10% on the wire for ed25519 keys and make it easier to interoperate with other p2p networks (because we could use their native key formats instead of wrapping them in protobufs before hashing).

Unfortunately, if we want to get 1 in the near future, we'd make it significantly harder to get 2. Basically, if we start using the new ed25519 pid encoding now, we'd have to convert back to the normal pid binary format (raw multihash) when decoding. However, if/when we decide to use CIDs as the binary pid format, we'd have trouble round-tripping.

That is, in the ideal world, if we encounter a text-based PID as a CID:

  • If it uses the libp2p-key multicodec, it's a legacy peer ID. Encode it as a multihash on the wire.
  • If it uses any other multicodec, it's a new peer ID. Encode it as a CID on the wire.

However, if we implement 1 before 2, we'd have to encode legacy keys in this new CID format. When converting back, we'd end up with the wrong "on the wire" format.

@MichaelMure
Contributor

* I am not aware of any CA that provides double wildcard certs.

This seems to not be possible: https://serverfault.com/a/946120

@MichaelMure
Contributor

^ might have been closed a bit eagerly by github.

So am I correct to assume that multi-subdomain is not considered anymore? That'd be nice, as it would be a pain to host with TLS due to the certificate limitation.

@ribasushi
Contributor

@MichaelMure yeah, github is too eager indeed. Yes, this is precisely why we went with b36 - to keep TLS possible for the time being.

@lidel
Member Author

lidel commented May 22, 2020

We met yesterday and came up with next steps to always resolve CIDs over DNS and have no TLS errors when current defaults/ED25519 keys are used:
(1) solve TLS problem for IPNS with ED25519 keys
(2) make it possible to load longer CIDs

Notes at: ipfs/team-mgmt#1159 – early feedback / questions appreciated!

@MichaelMure
Contributor

Could you explain what (2) is in more detail? This document mainly discusses IPNS.

@lidel
Member Author

lidel commented May 22, 2020

@MichaelMure see ipfs/team-mgmt#1159 (comment)
Note: it won't be needed for defaults, but will make it possible to load custom CIDs if someone has to use longer hashes for some reason.

@MichaelMure
Contributor

Alright. Due to the TLS problem, Infura is unlikely to support that, but I suppose that's sort of OK, as it should be a very rare use case.

@Stebalien
Member

Well, the hope is that use of companion and/or native IPFS support is wide-spread before that ever becomes an issue...

@Stebalien Stebalien removed the need/triage Needs initial labeling and prioritization label May 22, 2020
lidel added a commit that referenced this issue May 25, 2020
This adds subdomain gateway support for CIDs longer than 63 characters.
CID is split after reaching 63 character limit counting from right to
left. Requests made with random splits are redirected to canonical split
version to ensure every CID gets exactly one Origin.

Ref.
- https://tools.ietf.org/html/rfc1034#page-7
- #7318

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@bmwiedemann
Contributor

bmwiedemann commented May 28, 2020

I found an interesting Proposed Standard https://tools.ietf.org/html/rfc4343#section-2.2
that suggests that there may be 230=256-26 different usable byte values in DNS hostnames.
But I guess in practice, many servers and clients will not support these as part of FQDNs.

@lidel
Member Author

lidel commented Jun 8, 2020

Leveraging RFC4343 is a no-go – no browser support afaik.

FYSA I've talked with @Stebalien last week, and we are re-evaluating.

Neither of us is happy with the ramifications of splitting into multiple DNS labels, originally proposed in #7358. It will cause us trouble with TLS in the future, and the ultimate goal of subdomain gateways is seamless UX in web browsers.

We decided to look into an alternative approach that prioritizes UX in user agents and removes the problem of TLS errors caused by more than one level of wildcards: #7441

@Stebalien
Member

@lidel can we close this?

@lidel
Member Author

lidel commented Jun 7, 2021

No, we need to solve this in a way that enables people to load all CIDs, no matter what gateway type is used.

Right now, subdomains are limited to a subset of CIDs: https://dweb.link/ipfs/bafkriqdv2ut4g2hs57uer3hwwbz2gz3hqaeal2po6kyyk7k7tbhqg3vw36er25pxfwnrkriyyhgvra2sq3i5vgry325d32mlljj6l3lyvbexm → "CID incompatible with DNS label length limit of 63"

Hot take: our options are limited here. It could be that longer CIDs end up on a separate subdomain with the same sandboxing / local storage / API limitations as the ones proposed for the path gateway (ipfs/in-web-browsers#157). Those would not work as website roots, but would be fine for loading other types of content.

lidel added a commit to ipfs/ipfs-webui that referenced this issue Sep 6, 2021
We can't use dweb.link as the default until
ipfs/kubo#7318 is open.

Default gateway should be able to open all CIDs, and dweb.link is
limited to 63char ones max.
@BigLep BigLep unassigned lidel Mar 3, 2022
@BigLep BigLep added this to the TBD milestone Mar 3, 2022
@Winterhuman
Contributor

Winterhuman commented Apr 16, 2022

Just wanted to add an idea to this discussion: what if you used a query parameter to hold the ID part of the CID? E.g.

bafkreievmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq becomes:

vmw4c7yvuhvxt4qjcgqz4nsejxrw4wy4xkhtq54dc62ptceu6xq.ipfs.dweb.link/?id="bafkreie" (or maybe keeping the multihash ID in the subdomain is better)

Only CIDs for the same content can share the multihash subdomain, so subdomain isolation should be maintained. (unless I'm missing something major, in which case correct me)

(Also, I think topic/ed25519 can be removed)

@lidel
Member Author

lidel commented Apr 19, 2022

@Winterhuman
Contributor

Winterhuman commented Oct 21, 2022

As another option, using CIDv2 (ipfs/specs#305) may allow for "case-insensitive" CIDs which are actually case-sensitive when parsed.

The difference between foo and FOO can be expressed as 000 and 111, where 0 is lowercase and 1 is uppercase, so if CIDs had metadata to describe their casing, then you could do case-insensitive versions of case-sensitive encoded CIDs. e.g.

CIDv1 doesn't fit, but is case-insensitive: id...long-cid
CIDv1 fits, but is case-sensitive: ID...LoNg-CiD
CIDv2 fits, and is case-insensitive: id+metadata...long-cid (or wherever the metadata for CIDv2 will be placed)

The advantage is that the CID metadata changes the CID slightly, so each CID will still have Origin Isolation. But, if the metadata itself gets too long, then extremely long CID strings will still be too big, however, encoding the case-binary efficiently to take the minimal space should make the limit pretty high in theory.
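The foo/FOO idea can be sketched as follows: lowercase the text and keep one bit per character recording whether it was uppercase. The function names are hypothetical, the sketch handles ASCII text only, and it says nothing about where such metadata would live in a CIDv2:

```python
def split_case(text):
    """Return the lowercased text plus a bit string (1 = uppercase, 0 = anything else)."""
    bits = "".join("1" if ch.isupper() else "0" for ch in text)
    return text.lower(), bits


def restore_case(lowered, bits):
    """Re-apply the recorded casing to recover the original text."""
    return "".join(ch.upper() if bit == "1" else ch for ch, bit in zip(lowered, bits))


lowered, bits = split_case("FOO")
print(lowered, bits)
```

Note that the bit string costs one bit per character, so a compact encoding of it is what decides whether the combined result still fits in a 63-character label.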

@lidel
Member Author

lidel commented Oct 21, 2022

@Winterhuman how can you fit sha512 in the proposed CIDv2 and have no more than 63 characters?
Are you suggesting using a different (weaker) hash like sha256 to point at the stronger one sha512?
If so, I am afraid that is not a fix, just a workaround – you are decreasing security of use cases that need longer hashes.

@Winterhuman
Contributor

Winterhuman commented Oct 23, 2022

No, that's already described in option C. I mean encoding a SHA512 CID using a case-sensitive encoding, like base58btc. Then, you store the casing of the characters as metadata, e.g.

zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt

Could be z+metadata+yajkonbau5kiqmhpmsxycvn66da1vlmwbt, where the metadata bytes describe the casing to apply to the all-lowercase multicodec + multihash characters to restore the original case-sensitive encoding. Since the metadata changes the CID slightly, each casing would be a unique CID. One complication is that you'd need the metadata to be encoded as case-insensitive inside the case-sensitive CID in order for it to be read.

@Winterhuman
Contributor

Either that or you could nest a case-sensitive CIDv1 inside a multibase-esque multiformat so it's constructed like:

<multicasing code><multicasing bytes (variable)><multibase><multicodecs>...<multihash digest>

That'd get around having to encode the casing metadata inside the case-sensitive encoding itself, but requires making a new multiformat or modifying multibase significantly.

@MicahZoltu

Couldn't you go with the splitting option, but instead of putting the remainder in a subdomain, you put it in the path?
Instead of:

https://ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link/

Do:

https://fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link/remainder/ba/

This is an annoying UX, but it preserves as much subdomain isolation as is possible with 63 characters and doesn't result in TLS wildcard problems.
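A minimal sketch of this path-based variant, assuming the rightmost 63 characters go into the subdomain and whatever is left over goes under a `/remainder/` path segment as in the example URL above. The function name and defaults are hypothetical:

```python
def remainder_in_path_url(cid, gateway="ipfs.dweb.link", limit=63):
    """Build a URL with the last `limit` characters as the label and the rest in the path."""
    head, tail = cid[:-limit], cid[-limit:]
    if not head:
        # CID already fits in a single DNS label
        return f"https://{tail}.{gateway}/"
    return f"https://{tail}.{gateway}/remainder/{head}/"


cid = "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
print(remainder_in_path_url(cid))
```

With this layout, two CIDs that share their last 63 characters would share an Origin, which is the isolation trade-off this proposal accepts.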
