This repository has been archived by the owner on Sep 18, 2021. It is now read-only.

Add test case for 2 letter top level domains #59

Open
chmac opened this issue Sep 26, 2013 · 2 comments

Comments

chmac commented Sep 26, 2013

I've been testing a little by posting to Twitter, and domains like neustar.us, cal.io and github.io are not auto-linked. However, domains like chmac.com and army.mil are. These differences are not covered in the test cases.

I tried to dig into the JavaScript implementation to see what the actual behaviour is, but I've spent enough time on PHP regexes today. Maybe another day!

I just did another test: foo.dd.uk is auto-linked, even though dd.uk is a non-existent second-level domain. Meanwhile gov.uk, which is a valid domain, is not linked, but www.gov.uk is.

I'd hazard a guess that any domain directly under a two-letter top-level domain (a.xx) is not linked, while domains like a.xxx.xx or a.xx.xx are linked. Ironically, t.co is not linked!
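
To make that guess concrete, here is a rough Python sketch that checks it against the domains tried above. It is only the hypothesis written down as code, not Twitter's actual implementation; the function name `guess_linked` is made up for illustration.

```python
# Observations reported in this issue: domain -> was it auto-linked?
OBSERVED = {
    "neustar.us": False,
    "cal.io": False,
    "github.io": False,
    "chmac.com": True,
    "army.mil": True,
    "foo.dd.uk": True,
    "gov.uk": False,
    "www.gov.uk": True,
    "t.co": False,
}


def guess_linked(domain: str) -> bool:
    """Hypothesis only: a bare name under a two-letter TLD is not linked."""
    labels = domain.lower().split(".")
    if len(labels) < 2:
        return False
    tld = labels[-1]
    if len(tld) == 2:
        # a.xx is skipped; a.xx.xx and a.xxx.xx are linked
        return len(labels) >= 3
    # Longer TLDs are assumed to link here; validity is not checked.
    return True


# Domains where the guess disagrees with what was observed.
mismatches = [d for d, linked in OBSERVED.items() if guess_linked(d) != linked]
print(mismatches)
```

An empty list means the guess is consistent with every domain tested so far, which says nothing about domains that were not tried.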

Contributor

jakl commented Sep 27, 2013

Good sleuthing! Tweets tend to have a low signal-to-noise ratio, so we only autolink text when we're fairly certain it's a URL; otherwise it could be an emoticon, an internet meme, or a word in another language. Some TLDs, such as .com, are treated as especially strong signals.

For now, you're right, we should have clear tests. Later we can revisit which domains should link and how changes would affect a sample set of tweets.

Author

chmac commented Sep 27, 2013

After a little more sleuthing, it looks like the following happens on both the API and the JavaScript frontend on twitter.com:

  • Any first-level domain under a valid ccTLD is not linked: github.io
  • Any second-level domain under a valid ccTLD is linked: www.github.io or chmac.github.io
  • Any domain with a non-existent ccTLD is not linked: github.pp, www.github.pp or chmac.github.pp
  • Any first- or second-level domain under a valid global or US TLD is linked: github.aero, github.museum, github.mil or github.edu, or chmac.github.museum
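
The four observations above could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the real twitter-text logic: the two TLD sets are tiny hand-picked subsets, and `would_autolink` is a hypothetical name.

```python
# Tiny illustrative subsets; a real list of valid TLDs would be much longer.
CC_TLDS = {"io", "uk", "us", "co"}
GENERIC_TLDS = {"com", "mil", "edu", "aero", "museum"}


def would_autolink(domain: str) -> bool:
    """Encode the observed behaviour; not the actual implementation."""
    labels = domain.lower().split(".")
    if len(labels) < 2 or not all(labels):
        return False  # no dot, or an empty label such as "a..io"
    tld = labels[-1]
    if tld in GENERIC_TLDS:
        return True  # global/US TLDs link at any level
    if tld in CC_TLDS:
        return len(labels) >= 3  # ccTLDs need something like www. in front
    return False  # unknown TLDs such as .pp never link


for d in ["github.io", "www.github.io", "github.pp", "github.aero"]:
    print(d, would_autolink(d))
```

Keeping the two sets separate makes it easy to see which rule applies to a given domain, and to change the ccTLD rule later without touching the rest.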

There's presumably a list of valid TLDs in the JS and other codebases. It's probably possible to generate tests from that.
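
As a sketch of what generating tests from a TLD list might look like, assuming the behaviour observed so far is the behaviour we want to pin down: the case format (`text` and `expected` keys) and the function name `make_cases` are made up here and would need adapting to the real test suite.

```python
def make_cases(cc_tlds, generic_tlds, name="github"):
    """Build expected-autolink cases from two TLD lists (hypothetical format)."""
    cases = []
    for tld in sorted(generic_tlds):
        bare = f"{name}.{tld}"
        # Global/US TLDs: the bare domain is expected to link.
        cases.append({"text": bare, "expected": [bare]})
    for tld in sorted(cc_tlds):
        bare = f"{name}.{tld}"
        sub = f"www.{name}.{tld}"
        # ccTLDs: the bare domain is skipped, the www. form links.
        cases.append({"text": bare, "expected": []})
        cases.append({"text": sub, "expected": [sub]})
    return cases


# Sample input only; a real run would pass in the full TLD lists.
cases = make_cases({"io"}, {"com"})
for case in cases:
    print(case)
```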

I'd suggest that domains like github.io should probably be linked automatically. A domain with a non-existent TLD, like github.pp, isn't linked anyway, so strings like that.ll still wouldn't be linked. But that's a decision for somebody else to make.
