
instruct search engine not to crawl /catalog folder #3585

Closed
pvgenuchten opened this issue Feb 14, 2019 · 0 comments · Fixed by #7327
pvgenuchten commented Feb 14, 2019

Reproduce:
Search on Google for "mdView.current.record.title"
-> you'll find hits from the AngularJS template files of various GeoNetwork deployments.

Fix it by excluding the /catalog folder from indexing in robots.txt.
It would also be good to hide other pages from crawlers, such as /doc/api:

Disallow: /catalog/*

Note that the robots.txt included in GeoNetwork will only work if you deploy GeoNetwork at the root path. If not, create your own robots.txt in the server's root folder, and in that scenario include /geonetwork in the disallowed paths.
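As a sketch, a robots.txt covering both scenarios might look like the following. The /static and /doc/api entries follow the suggestions above; the exact paths depend on your deployment and are assumptions here, not the file shipped with GeoNetwork:

```
# Served from the web server root for a root deployment:
User-agent: *
Disallow: /catalog/
Disallow: /static/
Disallow: /doc/api

# For a deployment under /geonetwork, prefix the context path instead:
# Disallow: /geonetwork/catalog/
# Disallow: /geonetwork/static/
# Disallow: /geonetwork/doc/api
```

Note that wildcard patterns like `/catalog/*` are a common crawler extension rather than part of the original robots.txt convention; a plain prefix rule such as `Disallow: /catalog/` is the most portable form.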

Read more at https://webmasters.stackexchange.com/questions/89395/can-robots-txt-be-in-a-servers-sub-directory and http://www.robotstxt.org
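To sanity-check such a file before deploying it, Python's standard-library `urllib.robotparser` can be used. This is a minimal sketch with illustrative paths (the record template path and search entry point below are assumptions, not actual GeoNetwork URLs); it uses the wildcard-free `Disallow: /catalog` form, since `robotparser` does plain prefix matching:

```python
# Sketch: verify that a robots.txt like the one proposed in this issue
# blocks crawlers from the AngularJS template folders while leaving the
# rest of the site crawlable.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog
Disallow: /static
Disallow: /doc/api
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Template files under /catalog should be off-limits to crawlers...
print(rp.can_fetch("*", "/catalog/components/viewer/record.html"))  # False
# ...while pages outside the disallowed prefixes stay crawlable.
print(rp.can_fetch("*", "/srv/eng/catalog.search"))  # True
```

The same check can be pointed at a live deployment with `rp.set_url(...)` followed by `rp.read()`.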

juanluisrp added a commit to GeoCat/core-geonetwork that referenced this issue Sep 9, 2023
* Some search engines are indexing the AngularJS HTML template files. Exclude
${context_path}/catalog and ${context_path}/static from being crawled by robots.

Fix geonetwork#3585.
juanluisrp added a commit that referenced this issue Sep 11, 2023
…vided (#7327)

* Return 200 OK for robots.txt and sitemap. Return 200 OK for /robots.txt and /srv/api/sitemap instead
of 500 "service not found". Previously, if the request didn't contain "Accept: text/plain" or
"Accept: application/xml", the server returned a 500 error. Now the server accepts any "Accept" header
without complaining, returning a 200 response with "Content-Type: text/plain" or
"Content-Type: application/xml" and the right content.
* Disallow /catalog and /static in robots.txt. Some search engines are indexing the Angular JS HTML 
template files. Exclude the ${context_path}/catalog and ${context_path}/static from being crawled by 
robots. Fix #3585.