Fixes "case-sensitive" URI matching for Disallow rules in robots.txt (#46)

* Fixes "case-sensitive" URI matching for Disallow rules in robots.txt

Based on Issue #45 (Robots.txt "Disallow" URI matching should be case-sensitive), I removed the use of `strtolower` in `parseDisallow` so that the path in a Disallow rule keeps its original case.

The issue cites Google's robots.txt documentation, which states:
"The value of the disallow rule is case-sensitive."
(Source: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt?hl=en#disallow)

---

I ran PHPUnit and all tests passed, since none of them specifically tested case sensitivity. I added the test `the_disallows_uri_check_is_case_sensitive` to cover this issue.

* Remove .idea files

---------

Co-authored-by: Matthew Kesack <matthew.kesack@coursehero.com>
mattfo0 and matt-learneo committed Sep 25, 2024
1 parent 9533d45 commit 560d6d1
Showing 3 changed files with 11 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/RobotsTxt.php
@@ -253,7 +253,7 @@ protected function parseUserAgent(string $line): string

     protected function parseDisallow(string $line): string
     {
-        return trim(substr_replace(strtolower(trim($line)), '', 0, 8), ': ');
+        return trim(substr_replace(trim($line), '', 0, 8), ': ');
     }

     protected function isDisallowLine(string $line): string
9 changes: 9 additions & 0 deletions tests/RobotsTxtTest.php
@@ -149,6 +149,15 @@ public function the_disallows_user_agent_check_is_case_insensitive()
         $this->assertFalse($robots->allows('/no-agents', strtolower('UserAgent007')));
     }

+    /** @test */
+    public function the_disallows_uri_check_is_case_sensitive()
+    {
+        $robots = RobotsTxt::readFrom(__DIR__.'/data/robots.txt');
+
+        $this->assertFalse($robots->allows('/Case-Sensitive/Disallow'));
+        $this->assertTrue($robots->allows(strtolower('/Case-Sensitive/Disallow')));
+    }
+
     /** @test */
     public function it_can_handle_multiple_user_agent_query_strings()
     {
1 change: 1 addition & 0 deletions tests/data/robots.txt
@@ -7,6 +7,7 @@ Disallow: /nl/admin/
 Disallow: /en/admin/*
 Disallow: /fr/admin$
 Disallow: /es/admin-disallow/
+Disallow: /Case-Sensitive/Disallow
 User-agent: google

 Disallow: /
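To make the effect of the one-line change in `src/RobotsTxt.php` concrete, here is a minimal sketch that copies the old and new bodies of `parseDisallow` into two standalone functions. The names `parseDisallowBefore` and `parseDisallowAfter` are hypothetical and not part of the library; only the function bodies come from the diff. Both remove the first 8 characters of the trimmed line (the `Disallow` directive) and then trim colons and spaces. The only difference is whether the line is lowercased first.

```php
<?php

// Hypothetical standalone copy of parseDisallow BEFORE this commit.
function parseDisallowBefore(string $line): string
{
    // Lowercases the whole line, so the stored path loses its original case.
    return trim(substr_replace(strtolower(trim($line)), '', 0, 8), ': ');
}

// Hypothetical standalone copy of parseDisallow AFTER this commit.
function parseDisallowAfter(string $line): string
{
    // Removes the 8-character "Disallow" directive by position,
    // then trims colons and spaces; the path's case is preserved.
    return trim(substr_replace(trim($line), '', 0, 8), ': ');
}

echo parseDisallowBefore('Disallow: /Case-Sensitive/Disallow'), PHP_EOL; // /case-sensitive/disallow
echo parseDisallowAfter('Disallow: /Case-Sensitive/Disallow'), PHP_EOL;  // /Case-Sensitive/Disallow
```

Because the directive is stripped by position (`substr_replace` at offset 0, length 8) rather than by comparing against a lowercase string, dropping `strtolower` does not change how the directive itself is removed; a lowercase `disallow:` line is still handled, and only the case of the extracted path differs.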