Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The Web connector already has an option to ignore robots.txt.
This ticket adds another option that lets the connector ignore robots instructions in <meta name="robots" ...> tags.
Proposal (to be discussed)
Add a new option list "Page level robots instructions" to the "Robots" Tab. List entries:
- Obey meta robots tags (the default)
- Don't look at meta robots tags
The end-user documentation needs to be updated.
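To make the two proposed list entries concrete, here is a minimal sketch in Python of how a crawler might apply them to a fetched page. It is not the ManifoldCF implementation: the names OBEY_META_ROBOTS, IGNORE_META_ROBOTS, MetaRobotsParser, and page_permissions are hypothetical, and only the noindex, nofollow, and none directives are handled.

```python
from html.parser import HTMLParser

# Hypothetical option values mirroring the two proposed list entries.
OBEY_META_ROBOTS = "obey"
IGNORE_META_ROBOTS = "ignore"


class MetaRobotsParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values keep their case
        # and may be None when the attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_permissions(html, mode=OBEY_META_ROBOTS):
    """Return (may_index, may_follow) for a page under the chosen option."""
    if mode == IGNORE_META_ROBOTS:
        # "Don't look at meta robots tags": the page is never parsed for them.
        return True, True
    parser = MetaRobotsParser()
    parser.feed(html)
    directives = parser.directives
    blocked_all = "none" in directives  # "none" means noindex plus nofollow
    may_index = not (blocked_all or "noindex" in directives)
    may_follow = not (blocked_all or "nofollow" in directives)
    return may_index, may_follow
```

With the default mode, a page carrying content="noindex, nofollow" yields (False, False); with IGNORE_META_ROBOTS the same page yields (True, True), which is the behaviour the new option is meant to enable.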
Google resources on robots instructions in HTML pages:
[0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
[1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
[2] https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
Thread on the mailing list:
[3] https://www.mail-archive.com/user@manifoldcf.apache.org/msg03258.html