Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
Background:
An image caption is a short piece of text, usually a single line, added to an image's metadata to briefly summarize the scene it depicts.
Automatic image captioning is a challenging and interesting problem in computer vision. Tika already supports image recognition via the Object Recognition Parser (TIKA-1993), which uses an InceptionV3 model pre-trained on the ImageNet dataset with TensorFlow.
Captioning images is a very useful feature, since it helps text-based Information Retrieval (IR) systems "understand" the scenery in images.
Technical details and references:
- Google open-sourced its 'Show and Tell' neural network and pre-trained model for auto-generating captions some time ago. Source Code, Research blog
- Integrate it the same way as the ObjectRecognitionParser
- Create a RESTful API service similar to this
- Extend or enhance ObjectRecognitionParser or one of its implementations
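One way to picture the RESTful route above is a thin HTTP service in front of the captioning model, with the Tika parser as its client. The sketch below uses only Python's standard library and stubs the service with a canned caption so the request/response contract is visible end to end. The `/caption` path, the JSON shape, and the canned sentence are illustrative assumptions, not Tika's actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class CaptionHandler(BaseHTTPRequestHandler):
    """Stub captioning endpoint. A real deployment would run the
    Show and Tell model here; this stub returns a canned caption so
    the HTTP contract (assumed, not Tika's real one) is visible."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # image bytes; ignored by the stub
        body = json.dumps({
            "captions": [
                {"sentence": "a person riding a bicycle", "confidence": 0.83}
            ]
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def fetch_caption(port, image_bytes):
    """Client side: roughly what a Tika parser would do over HTTP."""
    req = Request(f"http://localhost:{port}/caption", data=image_bytes,
                  headers={"Content-Type": "application/octet-stream"})
    with urlopen(req) as resp:
        return json.loads(resp.read())

# Start the stub on a free port, send fake image bytes, read the caption.
server = HTTPServer(("localhost", 0), CaptionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_caption(server.server_address[1], b"\x89PNG fake image")
print(result["captions"][0]["sentence"])
server.shutdown()
```

On the Java side, the parser would issue the same POST and copy each returned sentence (and its confidence) into the document's Metadata, mirroring how ObjectRecognitionParser records recognized objects.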
Skills, learning, and homework for GSoC students:
- Knowledge of both Java and Python, and the Maven build system
- RESTful APIs
- TensorFlow/Keras
- Deep learning
Alternatively, a somewhat harder path for experienced contributors:
Import Keras/TensorFlow models into Deeplearning4j and run them natively inside the JVM.
Benefits
- No RESTful integration required, and thus no external service dependencies
- Easy to distribute on Hadoop/Spark clusters
Hurdles:
- Model import is still a work-in-progress feature in Deeplearning4j, so expect plenty of trouble along the way!
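The Deeplearning4j path could look roughly like the fragment below. This is a sketch under stated assumptions, not a working import: the model file name is hypothetical, the pre/post-processing is elided, and it assumes the deeplearning4j-modelimport module (which provides `KerasModelImport`) and ND4J are on the classpath.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class CaptionModelLoader {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDF5 file produced by Keras model.save(...);
        // Show and Tell is a multi-input/output model, so it maps to a
        // ComputationGraph rather than a simple MultiLayerNetwork.
        ComputationGraph model =
                KerasModelImport.importKerasModelAndWeights("showandtell.h5");

        // Inference would then run entirely inside the JVM, e.g.:
        //   INDArray image = ...preprocess the input image...;
        //   INDArray[] out = model.output(image);
        //   ...decode the output token ids into a caption string...
        System.out.println("Model loaded: " + model.summary());
    }
}
```

The payoff is the one named above: no HTTP hop and no external Python process, so the parser can ship as plain JVM code on Hadoop/Spark workers. The hurdle is equally real, since Keras model import in Deeplearning4j does not yet cover every layer type a captioning model may use.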