Details
Description
The rule
<!-- removes duplicate slashes -->
<regex>
  <pattern>(?<!:)/{2,}</pattern>
  <substitution>/</substitution>
</regex>
in regex-normalize.xml removes the third slash of file:///path/index.html. The resulting URL, file://path/index.html, fails to fetch because "path" is now interpreted as the host part of the URL, i.e. the position that localhost occupies in file://localhost/path/index.html; cf. Wikipedia, RFC 1738 (1994), and RFC 3986 (2005).
(split as sub-task from NUTCH-1483)
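The effect can be reproduced outside Nutch with a minimal sketch. It uses Python's re module rather than the Java regex engine Nutch runs on, on the assumption that both treat this particular pattern the same way; the helper name remove_duplicate_slashes is hypothetical and only serves the illustration.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the rule quoted above: a run of two or more slashes
# whose first slash is not directly preceded by ':'.
DUPLICATE_SLASHES = re.compile(r"(?<!:)/{2,}")


def remove_duplicate_slashes(url: str) -> str:
    """Hypothetical helper: apply the rule's substitution to one URL."""
    return DUPLICATE_SLASHES.sub("/", url)


if __name__ == "__main__":
    original = "file:///path/index.html"
    normalized = remove_duplicate_slashes(original)

    # The first slash follows ':' and is skipped by the lookbehind, but the
    # second slash follows '/', so the run "//" after it collapses to "/".
    print(normalized)                    # file://path/index.html

    # After normalization, "path" sits in the authority (host) position.
    print(urlsplit(original).netloc)     # (empty string)
    print(urlsplit(normalized).netloc)   # path
    print(urlsplit(normalized).path)     # /index.html
```

For http URLs the same pattern behaves as intended, since only two slashes follow the colon and the lookbehind protects the first of them: http://example.com//a///b becomes http://example.com/a/b.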
Attachments
Issue Links
- is superseded by
  NUTCH-1879 Regex URL normalizer should remove multiple slashes after file: protocol (Closed)