
instruct search engine not to crawl /catalog folder #3585

Closed
pvgenuchten opened this issue Feb 14, 2019 · 0 comments · Fixed by #7327
pvgenuchten commented Feb 14, 2019

Reproduce:
Search on Google for "mdView.current.record.title"
-> you'll find hits from the AngularJS template files of various GeoNetwork deployments.

Fix it by excluding the /catalog folder from indexing in robots.txt.
It would also be good to hide other pages from crawlers, such as /doc/api:

Disallow: /catalog/*

Note that the robots.txt included in GeoNetwork will only work if you deploy GeoNetwork at the root path. If not, create your own robots.txt in the server's root folder, and in that scenario include /geonetwork in the disallowed paths.
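As a sketch, a robots.txt covering both scenarios might look like the following. The /static and /doc/api entries follow the suggestions above; the exact paths depend on your deployment and are assumptions here, not the file shipped with GeoNetwork:

```
# Served from the web server root for a root deployment:
User-agent: *
Disallow: /catalog/
Disallow: /static/
Disallow: /doc/api

# For a deployment under /geonetwork, prefix the context path instead:
# Disallow: /geonetwork/catalog/
# Disallow: /geonetwork/static/
# Disallow: /geonetwork/doc/api
```

Note that wildcard patterns like `/catalog/*` are a common crawler extension rather than part of the original robots.txt convention; a plain prefix rule such as `Disallow: /catalog/` is the most portable form.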

Read more at https://webmasters.stackexchange.com/questions/89395/can-robots-txt-be-in-a-servers-sub-directory and http://www.robotstxt.org
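To sanity-check such a file before deploying it, Python's standard-library `urllib.robotparser` can be used. This is a minimal sketch with illustrative paths (the record template path and search entry point below are assumptions, not actual GeoNetwork URLs); it uses the wildcard-free `Disallow: /catalog` form, since `robotparser` does plain prefix matching:

```python
# Sketch: verify that a robots.txt like the one proposed in this issue
# blocks crawlers from the AngularJS template folders while leaving the
# rest of the site crawlable.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog
Disallow: /static
Disallow: /doc/api
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Template files under /catalog should be off-limits to crawlers...
print(rp.can_fetch("*", "/catalog/components/viewer/record.html"))  # False
# ...while pages outside the disallowed prefixes stay crawlable.
print(rp.can_fetch("*", "/srv/eng/catalog.search"))  # True
```

The same check can be pointed at a live deployment with `rp.set_url(...)` followed by `rp.read()`.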

juanluisrp added a commit to GeoCat/core-geonetwork that referenced this issue Sep 9, 2023
* Some search engines are indexing the AngularJS HTML template files. Exclude
${context_path}/catalog and ${context_path}/static from being crawled by robots.

Fix geonetwork#3585.
juanluisrp added a commit that referenced this issue Sep 11, 2023
…vided (#7327)

* Return 200 OK for robots.txt and sitemap. Return 200 OK for /robots.txt and /srv/api/sitemap instead
of 500 "service not found". Previously, if the request didn't contain "Accept: text/plain" or
"Accept: application/xml", the server returned a 500 error. Now the server accepts any "Accept" header
without complaining, returning a 200 response with "Content-Type: text/plain" or
"Content-Type: application/xml" and the right content.
* Disallow /catalog and /static in robots.txt. Some search engines are indexing the Angular JS HTML 
template files. Exclude the ${context_path}/catalog and ${context_path}/static from being crawled by 
robots. Fix #3585.