# nlp-gate-generic-component

Text mining GATE generic component for running in batch/pipeline mode.

## Description

This component is a Docker wrapper that executes the GATE ANNIE Default Gazetteer and JAPE rules in batch mode.

The tool first runs the Default Gazetteer lookup using the dictionaries passed as parameters and then, in a second stage, executes the JAPE rules defined in a main.jape file.

The dictionary/gazetteer entries must be provided in the GATE Default Gazetteer format.
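
As a minimal sketch, the shell snippet below creates a gazetteer with a single list. It assumes the colon-separated `lists.def` layout described in the ANNIE documentation (`file.lst:majorType:minorType`) and one entry per line in each `.lst` file. The folder `input_folder/dictionaries`, the file `city.lst` and its entries are illustrative only. The `-l` parameter description mentions tab separation, so check which separator this component actually expects.

```shell
#!/bin/sh
# Hypothetical example gazetteer with a single list of city names.
# Assumes the standard ANNIE lists.def layout: file.lst:majorType:minorType
mkdir -p input_folder/dictionaries

# lists.def declares each list file and the majorType/minorType that the
# resulting Lookup annotations will carry.
cat > input_folder/dictionaries/lists.def <<'EOF'
city.lst:location:city
EOF

# city.lst holds one gazetteer entry per line.
cat > input_folder/dictionaries/city.lst <<'EOF'
Barcelona
Madrid
Buenos Aires
EOF
```

With this list, every match of an entry in a document would be annotated as a `Lookup` with `majorType=location` and `minorType=city`.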

More information about the ANNIE Default Gazetteer: https://gate.ac.uk/sale/tao/splitch13.html#x18-32200013.2

More information about JAPE rules:
https://gate.ac.uk/sale/thakker-jape-tutorial/GATE%20JAPE%20manual.pdf
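
A minimal, hypothetical grammar follows. It assumes `main.jape` follows the ANNIE convention of a `MultiPhase` file that lists its phase files by name (without the `.jape` extension). The snippet writes a `main.jape` plus one phase, `city.jape`, which turns gazetteer `Lookup` annotations with majorType `location` and minorType `city` into `City` annotations. The file names, phase name and rule are illustrative. Per the `-ia` parameter, the `Lookup` annotations would be read from the `-a` annotation set unless a different input set is given.

```shell
#!/bin/sh
# Hypothetical JAPE rules: annotate every gazetteer Lookup with
# majorType "location" and minorType "city" as a City annotation.
mkdir -p input_folder/jape_rules

# main.jape as a multiphase grammar listing its phase files (no .jape suffix).
cat > input_folder/jape_rules/main.jape <<'EOF'
MultiPhase: Main
Phases:
city
EOF

# A single phase that matches the Lookup annotations created by the gazetteer.
cat > input_folder/jape_rules/city.jape <<'EOF'
Phase: City
Input: Lookup
Options: control = appelt

Rule: CityLookup
(
  {Lookup.majorType == "location", Lookup.minorType == "city"}
):city
-->
:city.City = {rule = "CityLookup"}
EOF
```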

This component is useful when you need to run gazetteer lookups and JAPE rules in batch mode, for example inside a Nextflow pipeline.

## Current Version: 1.1, 2020-03-04
## [Changelog](https://gitlab.bsc.es/inb/text-mining/generic-tools/import-json-to-mongo/blob/master/CHANGELOG) 
## Docker

javicorvi/import-json-to-mongo

## Build and Run the Docker Image

	# To build the image, go into the nlp-gate-generic-component folder and run:
	docker build -t nlp-gate-generic-component .
	# To run the container, set the input folder and the output folder:
	mkdir -p ${PWD}/output_folder
	docker run --rm -u $UID \
	  -v ${PWD}/input_folder:/in:ro \
	  -v ${PWD}/output_folder:/out:rw \
	  nlp-gate-generic-component nlp-gate-generic-component \
	  -i /in -o /out -a ANNOTATION_SET \
	  -l /in/dictionaries/lists.def -j /in/jape_rules/main.jape
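
For repeated runs, the documented command can be wrapped in a small POSIX shell function. This is a hypothetical helper (`run_gate` and the `DRY_RUN` variable are not part of the component). It builds the same `docker run` invocation, uses `id -u` instead of the Bash-only `$UID`, assumes the dictionary and JAPE paths are resolved under the `/in` mount, and prints the command instead of executing it when `DRY_RUN=1`. It expects absolute folder paths, as the original command does by using `${PWD}`.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the component) around the documented
# docker run command. Both folder arguments must be absolute paths.
# Usage: run_gate IN_DIR OUT_DIR ANNOTATION_SET
# Set DRY_RUN=1 to print the command instead of running it.
run_gate() {
  in_dir=$1
  out_dir=$2
  set_name=$3
  # Create the output folder on the host, as the original command does.
  mkdir -p "$out_dir"
  # Rebuild the positional parameters as the full docker invocation.
  # Assumes lists.def and main.jape live under the mounted input folder.
  set -- docker run --rm -u "$(id -u)" \
    -v "$in_dir:/in:ro" -v "$out_dir:/out:rw" \
    nlp-gate-generic-component nlp-gate-generic-component \
    -i /in -o /out -a "$set_name" \
    -l /in/dictionaries/lists.def -j /in/jape_rules/main.jape
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_gate "$PWD/input_folder" "$PWD/output_folder" ANNOTATION_SET` shows the command that would be executed without starting a container.
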
Parameters:
<p>
-i Input folder with the documents to be annotated. The documents can be plain text (.txt) or GATE XML documents.
</p>
<p>
-o Output folder where the annotated documents are written in GATE format.
</p>
<p>
-a Output annotation set. Annotation set in which the annotations produced by the gazetteer lookup and the JAPE rules are stored.
</p>
<p>
-ia Input annotation set. Set this parameter if you want to use a different annotation set as input. By default, the -a output annotation set is used as input.
</p>
<p>
-l Dictionary list definitions. Either a GATE-formatted, tab-separated lists.def file or a zip file containing the dictionary/gazetteer files (including the lists.def) can be provided.
</p>
<p>
-j Path to the main.jape file with the JAPE rules to be executed.
</p>	
		
In this example, the dictionaries/gazetteers and the JAPE rules are stored in the input folder.
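
Because this layout keeps the documents, dictionaries and JAPE rules together in one folder, a quick pre-flight check can catch a missing file before the container starts. The function below is a hypothetical helper, not part of the component. It assumes the paths used in the run command (`dictionaries/lists.def`, `jape_rules/main.jape`) and that the documents sit at the top level of the input folder as `.txt` or `.xml` files. Adjust it if you pass a zip file to `-l` instead.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the example input folder layout:
#   input_folder/
#     *.txt or *.xml          documents to annotate (assumed top level)
#     dictionaries/lists.def  gazetteer list definitions
#     jape_rules/main.jape    JAPE rules
# Returns 0 if the layout looks complete, 1 otherwise.
check_input_folder() {
  dir=$1
  # Both definition files must exist.
  for f in "$dir/dictionaries/lists.def" "$dir/jape_rules/main.jape"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # Require at least one plain-text or GATE XML document.
  # An unmatched glob stays literal, so the -f test simply fails.
  for doc in "$dir"/*.txt "$dir"/*.xml; do
    [ -f "$doc" ] && return 0
  done
  echo "no .txt or .xml documents in $dir" >&2
  return 1
}
```

Running `check_input_folder "$PWD/input_folder" && echo ready` before the `docker run` line is one way to use it.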

## Built With

* [Docker](https://www.docker.com/) - Docker Containers
* [Maven](https://maven.apache.org/) - Dependency Management
* [GATE](https://gate.ac.uk/overview.html) - GATE: a full-lifecycle open source solution for text processing

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component/-/tags). 

## Authors

* **Javier Corvi** 


## License

This project is licensed under the GNU GENERAL PUBLIC LICENSE Version 3 - see the [LICENSE.md](LICENSE.md) file for details