Description
A rare case triggers an AssertionError in the backtrace step of JapaneseTokenizer; we (Amazon Product Search) found it in our tests.
If there is a text span of length 1024 (determined by MAX_BACKTRACE_GAP) in which the regular backtrace is not called, a forced backtrace is applied. If the best partial path at that point happens to end at the last position, the final backtrace, which is always applied at the end, tries to backtrace from and to the same position. This causes an AssertionError in RollingCharBuffer.get() when it is asked to produce an empty buffer.
We are fixing it by returning early from the backtrace() method when the from and to positions are the same:
if (endPos == lastBackTracePos) { return; }
The backtrace() method is essentially a no-op when this condition holds, so the tokenizer still outputs the correct tokens when assertions are disabled (that is, when -ea is not set).
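To illustrate the failure mode and the guard in isolation, here is a minimal, self-contained sketch. It is a toy model, not the Lucene code: the class BacktraceGuardSketch and the methods slice, backtraceUnguarded and backtraceGuarded are hypothetical names, and slice only stands in for the behavior we observed in RollingCharBuffer.get(), namely that a request for an empty span fails. The failure is thrown explicitly so that it shows up even without -ea.

```java
import java.util.Arrays;

// Toy model of the issue; every name here is hypothetical and not part of Lucene.
class BacktraceGuardSketch {

    // Stand-in for a rolling buffer lookup that refuses to return an empty span.
    static char[] slice(char[] text, int start, int length) {
        if (length <= 0) {
            // Thrown explicitly (instead of using `assert`) so it fires without -ea.
            throw new AssertionError("empty span requested at pos " + start);
        }
        return Arrays.copyOfRange(text, start, start + length);
    }

    // Backtrace without the guard: fails when both positions are equal.
    static String backtraceUnguarded(char[] text, int lastBackTracePos, int endPos) {
        return new String(slice(text, lastBackTracePos, endPos - lastBackTracePos));
    }

    // Backtrace with the guard: an empty span means there is nothing left to emit.
    static String backtraceGuarded(char[] text, int lastBackTracePos, int endPos) {
        if (endPos == lastBackTracePos) {
            return "";
        }
        return new String(slice(text, lastBackTracePos, endPos - lastBackTracePos));
    }

    public static void main(String[] args) {
        char[] text = "abcdef".toCharArray();

        // Normal case: positions differ, the span between them is processed.
        System.out.println(backtraceGuarded(text, 2, 6));

        // A forced backtrace already reached the end; the final backtrace is a no-op.
        System.out.println(backtraceGuarded(text, 6, 6).isEmpty());

        // The same call without the guard fails.
        try {
            backtraceUnguarded(text, 6, 6);
        } catch (AssertionError e) {
            System.out.println("unguarded call failed: " + e.getMessage());
        }
    }
}
```

In this sketch the guarded call with equal positions returns an empty result instead of failing, which mirrors why skipping the final backtrace does not change the emitted tokens.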
We will open a PR for this issue.