Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
Background:
An image caption is a short piece of text, usually a single line, added to an image's metadata to briefly summarize the scene it depicts.
Automatic image captioning is a challenging and interesting problem in computer vision. Tika already supports image recognition via the Object Recognition Parser (TIKA-1993), which uses an InceptionV3 model pre-trained on the ImageNet dataset with TensorFlow.
Captioning images is a very useful feature, since it helps text-based Information Retrieval (IR) systems "understand" the scenery in images.
Technical details and references:
- Google open-sourced its 'Show and Tell' neural network and pre-trained model for auto-generating captions some time ago. Source Code, Research blog
- Integrate it the same way as the ObjectRecognitionParser
- Create a RESTful API service similar to this
- Extend or enhance ObjectRecognitionParser or one of its implementations
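One way to picture the RESTful route above is a thin HTTP service in front of the captioning model, with the Tika parser as its client. The sketch below uses only Python's standard library and stubs the service with a canned caption so the request/response contract is visible end to end. The `/caption` path, the JSON shape, and the canned sentence are illustrative assumptions, not Tika's actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class CaptionHandler(BaseHTTPRequestHandler):
    """Stub captioning endpoint. A real deployment would run the
    Show and Tell model here; this stub returns a canned caption so
    the HTTP contract (assumed, not Tika's real one) is visible."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # image bytes; ignored by the stub
        body = json.dumps({
            "captions": [
                {"sentence": "a person riding a bicycle", "confidence": 0.83}
            ]
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def fetch_caption(port, image_bytes):
    """Client side: roughly what a Tika parser would do over HTTP."""
    req = Request(f"http://localhost:{port}/caption", data=image_bytes,
                  headers={"Content-Type": "application/octet-stream"})
    with urlopen(req) as resp:
        return json.loads(resp.read())

# Start the stub on a free port, send fake image bytes, read the caption.
server = HTTPServer(("localhost", 0), CaptionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_caption(server.server_address[1], b"\x89PNG fake image")
print(result["captions"][0]["sentence"])
server.shutdown()
```

On the Java side, the parser would issue the same POST and copy each returned sentence (and its confidence) into the document's Metadata, mirroring how ObjectRecognitionParser records recognized objects.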
Skills, learning, and homework for GSoC students:
- Knowledge of both Java and Python, and the Maven build system
- RESTful APIs
- TensorFlow/Keras
- Deep learning
Alternatively, a somewhat harder path for experienced contributors:
Import Keras/TensorFlow models into Deeplearning4j and run them natively inside the JVM.
Benefits
- No RESTful integration required, and thus no external service dependencies
- Easy to distribute on Hadoop/Spark clusters
Hurdles:
- Model import is still a work-in-progress feature in Deeplearning4j, so expect plenty of trouble along the way!
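The Deeplearning4j path could look roughly like the fragment below. This is a sketch under stated assumptions, not a working import: the model file name is hypothetical, the pre/post-processing is elided, and it assumes the deeplearning4j-modelimport module (which provides `KerasModelImport`) and ND4J are on the classpath.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class CaptionModelLoader {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDF5 file produced by Keras model.save(...);
        // Show and Tell is a multi-input/output model, so it maps to a
        // ComputationGraph rather than a simple MultiLayerNetwork.
        ComputationGraph model =
                KerasModelImport.importKerasModelAndWeights("showandtell.h5");

        // Inference would then run entirely inside the JVM, e.g.:
        //   INDArray image = ...preprocess the input image...;
        //   INDArray[] out = model.output(image);
        //   ...decode the output token ids into a caption string...
        System.out.println("Model loaded: " + model.summary());
    }
}
```

The payoff is the one named above: no HTTP hop and no external Python process, so the parser can ship as plain JVM code on Hadoop/Spark workers. The hurdle is equally real, since Keras model import in Deeplearning4j does not yet cover every layer type a captioning model may use.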