Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version: 2.0.0-BETA
None
None
OS: ArcoLinux
Kernel: 5.10.60-1-lts
CPU: Intel i5-8400 (6) @ 4.000GHz
Memory: 32 GB
Description
It appears that canceling a request does not stop the work in Tika. The handler finishes the job and only then fails, when it attempts to return the data.
I would have expected Tika to detect client-side cancellations and propagate them to the relevant child processes, such as Tesseract, thus avoiding unnecessary work.
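To illustrate the behavior I would expect, here is a minimal Python sketch of propagating a cancellation to a child process. It is not Tika code and says nothing about how Tika is implemented: the function name `run_cancellable`, the `cancelled` event, and the sleeping child that stands in for a slow OCR job are all hypothetical.

```python
import subprocess
import sys
import threading
import time


def run_cancellable(cmd, cancelled, poll_interval=0.05):
    """Run cmd as a child process and stop it early if `cancelled` is set.

    Returns the child's exit code, or None if the run was cancelled.
    """
    child = subprocess.Popen(cmd)
    try:
        while child.poll() is None:
            # Event.wait returns True as soon as the event has been set.
            if cancelled.wait(poll_interval):
                child.terminate()  # ask the child to stop
                try:
                    child.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    child.kill()  # force it if it ignores the request
                    child.wait()
                return None
        return child.returncode
    finally:
        # Never leave the child running if we exit unexpectedly.
        if child.poll() is None:
            child.kill()
            child.wait()


if __name__ == "__main__":
    cancelled = threading.Event()
    # Stand-in for a slow OCR job (hypothetical, not a real tesseract call).
    slow_job = [sys.executable, "-c", "import time; time.sleep(30)"]
    # Simulate the client disconnecting after half a second.
    threading.Timer(0.5, cancelled.set).start()
    started = time.monotonic()
    result = run_cancellable(slow_job, cancelled)
    elapsed = time.monotonic() - started
    print("cancelled" if result is None else f"exit code {result}",
          f"after {elapsed:.1f}s")
```

In this sketch the 30-second job is stopped shortly after the simulated disconnect instead of running to completion, which is the kind of early exit I was hoping to see for the OCR work.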
I send a request like the following, where FILE is a PDF that has inline images and requires OCR:
curl -T "$FILE" \
  -s "http://localhost:9998/tika/text" \
  -H "Accept: application/json" \
  -H "X-Tika-OCRLanguage: dan+eng" \
  -H "X-Tika-PDFextractInlineImages: true"
Then I press Ctrl-C before the response is returned.
Dockerfile:
FROM apache/tika:2.0.0-full
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install tesseract-ocr-dan
docker-compose.yaml:
version: "3.9"
services:
  tika:
    build: tika/
    ports:
      - "9998:9998"