# nlp-gate-generic-component

Generic GATE text mining component for running in batch/pipeline mode using software containers (Docker).

## Description

This tool runs the Default Gazetteer or Flexible Gazetteer lookup over the dictionaries passed as parameters and, in a second stage, executes the JAPE rules from a given main.jape file.
The list of dictionary/gazetteer entries must be provided in the GATE format.
The ANNIE SerialAnalyserController is used to execute the pipeline.
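
As a rough illustration of the inputs described above, the following sketch creates a small input folder with one dictionary and one JAPE rule. The folder layout mirrors the paths used in the run command of this README; the file `drugs.lst`, its entries, the `id` feature, and the rule name `DrugFromLookup` are hypothetical. It assumes the usual GATE conventions: one `file:majorType:minorType` line per list in `lists.def`, and gazetteer matches emitted as `Lookup` annotations.

```shell
#!/bin/sh
# Illustrative sketch only: file names, entries, and feature values are made up.
set -eu

INPUT_DIR="${INPUT_DIR:-input_folder}"
mkdir -p "$INPUT_DIR/dictionaries" "$INPUT_DIR/jape_rules"

# lists.def: one line per list file, "file:majorType:minorType" (GATE convention).
printf '%s\n' 'drugs.lst:drug:generic' > "$INPUT_DIR/dictionaries/lists.def"

# drugs.lst: one entry per line; optional features follow the separator (tab here).
printf 'aspirin\tid=D001\n'   >  "$INPUT_DIR/dictionaries/drugs.lst"
printf 'ibuprofen\tid=D002\n' >> "$INPUT_DIR/dictionaries/drugs.lst"

# main.jape: a single-phase grammar that turns drug Lookups into Drug annotations.
cat > "$INPUT_DIR/jape_rules/main.jape" <<'EOF'
Phase: DrugMentions
Input: Lookup
Options: control = appelt

Rule: DrugFromLookup
(
  {Lookup.majorType == "drug"}
):mention
-->
:mention.Drug = {rule = "DrugFromLookup"}
EOF
```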

This component is a Docker wrapper that executes, in batch mode, the following GATE Processing Resources:

	DefaultGazetteer: https://gate.ac.uk/sale/tao/splitch13.html#x18-32200013.2  
	FlexibleGazetteer: https://gate.ac.uk/sale/tao/splitch13.html#x18-33300013.6
	JAPE Transducer: https://gate.ac.uk/sale/thakker-jape-tutorial/GATE%20JAPE%20manual.pdf

To this end, it uses the corresponding GATE plugins present in ANNIE and in Tools.

This library is useful if you need to execute gazetteer lookups and JAPE rules in batch mode, for example, using Nextflow as the workflow manager.

## Current Version: 1.4, 2020-08-22
## [Changelog](https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component/blob/master/CHANGELOG) 
## Docker

javicorvi/nlp-gate-generic-component

## Build and Run the Docker 

	# To build the Docker image, go into the nlp-gate-generic-component folder and run
	docker build -t nlp-gate-generic-component .
	# To run the container, set the input folder and the output folder
	mkdir ${PWD}/output_folder; docker run --rm -u $UID -v ${PWD}/input_folder:/in:ro -v ${PWD}/output_folder:/out:rw nlp-gate-generic-component nlp-gate-generic-component -i /in -o /out -a ANNOTATION_SET -l in/dictionaries/lists.def -j in/jape_rules/main.jape
Parameters:
<p>
-i or input: Input folder with the documents to annotate. The documents can be plain text or GATE XML documents.
</p>
<p>
-o or -output: Output folder for the annotated documents, written in GATE format.
</p>
<p>
-gt or -gazetter_type: Gazetteer type: default or flexible. If no value is provided, the DefaultGazetteer is used.
</p>
<p>
-inputFeatureNames: See the required fields of the Flexible Gazetteer. These feature values are used to replace the corresponding original text.
</p>
<p>
-a or outputASName: Output Annotation Set. The annotation set where the annotations produced by the gazetteer lookup and by the JAPE rules are stored.
</p>
<p>
-ia or inputASName: Input Annotation Set. Set this parameter if you want to provide a different input annotation set. By default, the -a output annotation set is used as input.
</p>
<p>
-l or listsURL: Dictionary list definitions. Either a GATE-formatted, tab-separated lists.def file or a zip file that contains the dictionary/gazetteer files, including the lists.def.
</p>
<p>
-gazetteerFeatureSeparator: The character used to add arbitrary features to gazetteer entries. Default: tab.
</p>
<p>
-caseSensitive: Whether the gazetteer should be case sensitive during matching. Default: false.
</p>
<p>
-longestMatchOnly: This parameter is only relevant when the list of lookups contains proper prefixes. The default behaviour (when this parameter is set to true) is to only match the longest entry. Setting this parameter to false will cause the gazetteer to match all possible prefixes.
</p>
<p>
-j or jape_main: Path to the main.jape file with the JAPE rules to be executed.
</p>	
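
The parameters above can be combined in a small wrapper. The sketch below is only an illustration: the function name `run_gate` and the `DOCKER` override variable are hypothetical, and passing the gazetteer type as `-gt flexible` is an assumption based on the values listed above, not a verified invocation. Setting `DOCKER=echo` prints the command instead of running it, which is handy for checking the arguments first.

```shell
#!/bin/sh
# Illustrative sketch only: run_gate and DOCKER are hypothetical helper names.
set -eu

# Override with DOCKER=echo to print the command instead of executing it.
DOCKER="${DOCKER:-docker}"

# Usage: run_gate INPUT_DIR OUTPUT_DIR ANNOTATION_SET [GAZETTEER_TYPE]
run_gate() {
  in_dir=$1
  out_dir=$2
  as_name=$3
  gaz_type=${4:-default}   # assumed values: default or flexible

  mkdir -p "$out_dir"
  "$DOCKER" run --rm -u "$(id -u)" \
    -v "$in_dir:/in:ro" -v "$out_dir:/out:rw" \
    nlp-gate-generic-component nlp-gate-generic-component \
    -i /in -o /out -a "$as_name" -gt "$gaz_type" \
    -l in/dictionaries/lists.def -j in/jape_rules/main.jape
}
```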

## Built With

* [Docker](https://www.docker.com/) - Docker Containers
* [Maven](https://maven.apache.org/) - Dependency Management
* [GATE](https://gate.ac.uk/overview.html) - GATE: a full-lifecycle open source solution for text processing

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component/-/tags). 

## Authors

* **Javier Corvi** 


## License

This project is licensed under the GNU General Public License Version 3 - see the [LICENSE.md](LICENSE.md) file for details.