Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
##Description
This program identifies geographic names (geonames) in unstructured text data for the NSF polar research project: https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon/issues/1
It is a content-based geotagging solution built from a variety of NLP tools, and it can be used for any geotagging purpose.
##Workflow
1. Plain text input is passed to the geoparser.
2. Location names are extracted from the text using OpenNLP NER.
3. Each extracted location name is assigned one of two roles:
- The most frequent location name is chosen as the best match for the input text.
- All other extracted locations are treated as alternatives, with equal weight.
4. For each location extracted above, the parser searches for the best GeoName object and returns the resolved objects with these fields: name in the gazetteer, longitude, latitude.
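Step 3 of the workflow can be sketched as follows. This is a minimal illustration, not the parser's actual code: the class `LocationRoles` and its methods are hypothetical names, and breaking ties in favor of the name seen first is an assumption the description does not state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating step 3; not part of the parser itself.
public class LocationRoles {

    // Returns the most frequent name, or null for an empty list.
    // Assumption: on a tie, the name that appeared first wins.
    public static String bestMatch(List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : names) {
            counts.merge(name, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // Returns every other distinct name in first-seen order.
    // The order carries no ranking: alternatives are treated as equal.
    public static List<String> alternatives(List<String> names) {
        List<String> rest = new ArrayList<>(new LinkedHashSet<>(names));
        rest.remove(bestMatch(names));
        return rest;
    }
}
```

Given the NER output for one document, `bestMatch` supplies the name for the best-match role and `alternatives` supplies the remaining names, each of which is then resolved against the gazetteer in step 4.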
##How to Use
Caution: This program requires at least 1.2 GB of disk space to build the Lucene index.
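```Java
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
// GeoParserConfig is provided by this project; import it from the package where it is built.

// geoparser is an instance of this project's parser (assumed here to implement
// Tika's Parser interface); gazetteerPath and nerPath are the locations of the
// gazetteer and the NER model.
static void printGeoMetadata(Parser geoparser, InputStream stream,
        String gazetteerPath, String nerPath) throws Exception {
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    GeoParserConfig config = new GeoParserConfig();
    config.setGazetterPath(gazetteerPath);
    config.setNERModelPath(nerPath);
    context.set(GeoParserConfig.class, config);
    geoparser.parse(stream, new BodyContentHandler(), metadata, context);
    // Print every metadata field the parser produced.
    for (String name : metadata.names()) {
        String value = metadata.get(name);
        System.out.println(name + " " + value);
    }
}
```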
This parser adds geographic information to Tika's Metadata object.
Fields for the best-matched location:
```
Geographic_NAME
Geographic_LONGTITUDE
Geographic_LATITUDE
```
Fields for the alternatives (numbered from 1):
```
Geographic_NAME1
Geographic_LONGTITUDE1
Geographic_LATITUDE1
Geographic_NAME2
Geographic_LONGTITUDE2
Geographic_LATITUDE2
...
```
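The naming scheme above can be read back with a small loop. The sketch below uses a plain `Map<String, String>` as a stand-in for Tika's Metadata object so that it runs without Tika on the classpath. The class `GeoFields` and its `Location` type are hypothetical, and it assumes that alternatives are numbered consecutively from 1 and that longitude and latitude are stored as decimal strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the Geographic_* fields; not part of the parser.
public class GeoFields {

    public static final class Location {
        public final String name;
        public final double longitude;
        public final double latitude;

        Location(String name, double longitude, double latitude) {
            this.name = name;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    // Returns the best match first, then the alternatives in numeric order.
    // Stops at the first missing Geographic_NAME<suffix> key.
    public static List<Location> read(Map<String, String> fields) {
        List<Location> locations = new ArrayList<>();
        for (int i = 0; ; i++) {
            // The best match has no suffix; alternatives use "1", "2", ...
            String suffix = (i == 0) ? "" : Integer.toString(i);
            String name = fields.get("Geographic_NAME" + suffix);
            if (name == null) {
                break;
            }
            // Field names are spelled exactly as the parser emits them.
            double lon = Double.parseDouble(fields.get("Geographic_LONGTITUDE" + suffix));
            double lat = Double.parseDouble(fields.get("Geographic_LATITUDE" + suffix));
            locations.add(new Location(name, lon, lat));
        }
        return locations;
    }
}
```

With a real Metadata object, the map could be filled by iterating over `metadata.names()` and calling `metadata.get(name)`, as in the usage example. The sketch does not handle a name whose coordinate fields are missing; in that case `Double.parseDouble` throws.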
If you have any questions, contact me: anyayunli@gmail.com