Description
When one calls org.apache.any23.Any23.createDocumentSource(String documentURI), we only perform the following simple prefix checks:
documentURI.toLowerCase().startsWith("http:") || documentURI.toLowerCase().startsWith("file:") || documentURI.toLowerCase().startsWith("https:") ...
before picking the appropriate DocumentSource.
An improvement to this algorithm would be to add code that, once the above checks have failed, tries prepending each of the above protocol strings to the beginning of documentURI. This way we carry out the same logical checks, in the same order, but also make a better attempt to find an appropriate DocumentSource before throwing an IllegalArgumentException with the message "Unsupported protocol for document URI: '%s' ." (formatted with documentURI).
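A minimal sketch of the proposed fallback follows, in plain Java rather than against Any23 itself. The class SchemeFallback, the method resolve, and the canOpen probe are all hypothetical names for illustration, not part of Any23's API. The prefix list mirrors the "http:", "file:", "https:" order given above; the real list is elided ("...") and may contain more entries. Because simply prepending "http:" to a host name does not yield a well-formed URL (it would need "//"), the sketch leaves the decision of whether a prefixed candidate is usable to a caller-supplied probe.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

public class SchemeFallback {

    // Assumed prefix list, in the order the issue text gives them.
    static final List<String> PREFIXES = List.of("http:", "file:", "https:");

    // Existing behaviour: does documentURI already start with a supported prefix?
    static boolean hasSupportedPrefix(String documentURI) {
        String lower = documentURI.toLowerCase(Locale.ROOT);
        for (String prefix : PREFIXES) {
            if (lower.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Hypothetical helper: returns the URI string to hand to the
     * DocumentSource selection logic. canOpen is a caller-supplied probe
     * (for example "does this local file exist?") that accepts or rejects
     * a prefixed candidate.
     */
    static String resolve(String documentURI, Predicate<String> canOpen) {
        if (documentURI == null) {
            throw new NullPointerException("documentURI cannot be null.");
        }
        // 1. Same checks, same order, as today.
        if (hasSupportedPrefix(documentURI)) {
            return documentURI;
        }
        // 2. Proposed fallback: prepend each prefix in turn and keep the
        //    first candidate the probe accepts.
        for (String prefix : PREFIXES) {
            String candidate = prefix + documentURI;
            if (canOpen.test(candidate)) {
                return candidate;
            }
        }
        // 3. Nothing matched: fail with the same message as before.
        throw new IllegalArgumentException(
                String.format("Unsupported protocol for document URI: '%s' .", documentURI));
    }

    public static void main(String[] args) {
        // Hypothetical path; the toy probe accepts only file: candidates.
        System.out.println(resolve("/home/user/page.html", c -> c.startsWith("file:")));
        // prints "file:/home/user/page.html"
    }
}
```

Keeping the original checks as step 1 means every documentURI that works today is returned untouched; the fallback only runs on input that would previously have raised the exception.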
For example, suppose someone passes in the following documentURI:
'/Downloads/github/Scottish-Technical-Standards-Domestic/html_domestic/domestic/section6'
In this case the file happens to reside on the local file system; however, no 'file:' protocol has been added to the documentURI, so none of the prefix checks match and the IllegalArgumentException is thrown.
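For the local-file case above specifically, one possible probe is to check whether the scheme-less string names an existing regular file and, if so, let java.nio build the file: URI rather than concatenating strings. The sketch below assumes that approach; LocalFileFallback and asLocalFileUri are hypothetical names, and whether Any23 should treat directories or relative paths this way is left open.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class LocalFileFallback {

    /**
     * Hypothetical helper: if documentURI names an existing regular file on
     * the local file system, return an equivalent absolute file: URI;
     * otherwise return Optional.empty().
     */
    static Optional<URI> asLocalFileUri(String documentURI) {
        try {
            Path path = Paths.get(documentURI);
            if (Files.isRegularFile(path)) {
                // toUri() produces an absolute, properly escaped URI (file:///...).
                return Optional.of(path.toAbsolutePath().toUri());
            }
        } catch (InvalidPathException e) {
            // Not a usable local path on this platform; fall through.
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a path like the one in the example above.
        Path tmp = Files.createTempFile("section6", ".html");
        System.out.println(asLocalFileUri(tmp.toString()).map(URI::toString).orElse("none"));
        Files.delete(tmp);
    }
}
```

Letting Path.toUri() build the URI avoids hand-escaping spaces or other reserved characters that a literal "file:" + documentURI concatenation would pass through unescaped.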