Description
Incredibly enough, it seems that our encoding detector does not take the Content-Type header into account at all when guessing a document's character encoding!
This has caused a problem for me with the page: http://w3c.github.io/microdata-rdf/tests/0065.html
Even though the Content-Type header is set to "text/html; charset=utf-8", we guess the charset to be "IBM500", which renders the page as complete gibberish.
This also looks like a bug in Tika, because even when I set the declared encoding of the charset detector to UTF-8, IBM500 is still the most confident result.
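
To illustrate the behavior I would expect, here is a minimal sketch in plain Java that reads the charset parameter from a Content-Type header and falls back to the detector's guess only when the header declares nothing usable. The class and method names (DeclaredCharset, fromContentType, choose) are hypothetical and are not part of our code base or of Tika. Letting the declared charset always win is an assumption made for the sketch, not necessarily the right policy.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not an existing project or Tika class.
final class DeclaredCharset {
    private DeclaredCharset() {}

    // Returns the charset named in a Content-Type header value, if any.
    // Example input: "text/html; charset=utf-8"
    static Optional<Charset> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes: charset="utf-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: treat as undeclared.
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    // Assumed policy: the declared charset wins, otherwise use the detector's guess.
    static Charset choose(String contentType, Charset detected) {
        return fromContentType(contentType).orElse(detected);
    }
}
```

With this policy, choose("text/html; charset=utf-8", detected) returns UTF-8 regardless of what the detector guessed. A softer alternative would be to treat the header only as a strong hint and still let the detector override it when the bytes are clearly not valid in the declared encoding.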