# nlp-gate-generic-component

A generic GATE text-mining component that runs in batch/pipeline mode using software containers (Docker).

## Description

This tool runs the Default Gazetteer or Flexible Gazetteer lookup with the dictionaries passed as parameters and, in a second stage, executes the JAPE rules given in a main.jape file. The dictionary/gazetteer entries must be provided in the GATE format. The ANNIE SerialAnalyserController is used to execute the pipeline.

This component is a Docker wrapper that executes, in batch mode, the following GATE Processing Resources:

* DefaultGazetteer: https://gate.ac.uk/sale/tao/splitch13.html#x18-32200013.2
* FlexibleGazetteer: https://gate.ac.uk/sale/tao/splitch13.html#x18-33300013.6
* JAPE Transducer: https://gate.ac.uk/sale/thakker-jape-tutorial/GATE%20JAPE%20manual.pdf

To this end, it uses the corresponding GATE plugins present in ANNIE and in TOOLS.

This library is useful if you need to run gazetteer lookups and JAPE rules in batch mode, for example, using Nextflow as the workflow manager.

## Current Version: 1.4, 2020-08-22

## [Changelog](https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component/blob/master/CHANGELOG)

## Docker

javicorvi/nlp-gate-generic-component

## Build and Run the Docker

To build the Docker image, go into the nlp-gate-generic-component folder and execute:

    docker build -t nlp-gate-generic-component .

To run the Docker container, set the input folder and the output folder:

    mkdir ${PWD}/output_folder
    docker run --rm -u $UID -v ${PWD}/input_folder:/in:ro -v ${PWD}/output_folder:/out:rw nlp-gate-generic-component nlp-gate-generic-component -i /in -o /out -a ANNOTATION_SET -l in/dictionaries/lists.def -j in/jape_rules/main.jape

Parameters:

* -i or input: Input folder with the documents to annotate. The documents can be plain text (txt) or GATE XML documents.
* -o or -output: Output folder for the annotated documents, written in GATE format.
* -gt or -gazetter_type: Gazetteer type: default or flexible. If no value is provided, the DefaultGazetteer is used.
* -inputFeatureNames: See the Flexible Gazetteer required fields. These feature values are used to replace the corresponding original text.
* -a or outputASName: Output annotation set. The annotation set where the annotations produced by the gazetteer lookup and by the JAPE rules are stored.
* -ia or inputASName: Input annotation set. Set this parameter if you want to provide a different input annotation set. By default, the -a output annotation set is used as input.
* -l or listsURL: Dictionary list definitions. Either a GATE-formatted lists.def file separated by tab, or a zip file that contains the dictionary/gazetteer files including the lists.def.
* -gazetteerFeatureSeparator: The character used to add arbitrary features to gazetteer entries. Default: tab.
* -caseSensitive: Whether the gazetteer is case sensitive during matching. Default: false.
* -longestMatchOnly: Only relevant when the list of lookups contains proper prefixes. The default behaviour (when this parameter is set to true) is to match only the longest entry. Setting this parameter to false causes the gazetteer to match all possible prefixes.
* -j or jape_main: Path to the main.jape file with the JAPE rules to be executed.

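
As an illustration of how the -l and -j parameters fit together, the following sketch creates a minimal input folder that matches the paths used in the run command above (dictionaries/lists.def and jape_rules/main.jape). Everything else is hypothetical and only for illustration: the diseases.lst dictionary, its entries and features, the Disease annotation type, and the sample document. It assumes the colon-separated lists.def layout described in the GATE documentation (file:majorType:minorType) and the default tab feature separator in the list file; adjust it if your setup expects something different.

```shell
#!/bin/sh
# Minimal, hypothetical input layout for a test run.
# Only dictionaries/lists.def and jape_rules/main.jape come from the README;
# all other names and contents are illustrative assumptions.
set -eu

mkdir -p input_folder/dictionaries input_folder/jape_rules output_folder

# lists.def: one line per dictionary, assumed form file:majorType:minorType
printf 'diseases.lst:disease:generic\n' > input_folder/dictionaries/lists.def

# diseases.lst: one entry per line; features follow the separator (tab by default)
printf 'diabetes\tcode=D1\nasthma\tcode=D2\n' > input_folder/dictionaries/diseases.lst

# main.jape: a single-phase grammar that turns matching Lookup annotations
# into (hypothetical) Disease annotations
cat > input_folder/jape_rules/main.jape <<'EOF'
Phase: disease_lookup
Input: Lookup
Options: control = appelt

Rule: DiseaseFromLookup
({Lookup.majorType == "disease"}):match
-->
:match.Disease = {rule = "DiseaseFromLookup"}
EOF

# A sample plain-text document to annotate
printf 'The patient has diabetes and asthma.\n' > input_folder/doc1.txt
```

With these folders in place, the docker run command from the Build and Run section can be pointed at input_folder and output_folder. Keeping the dictionaries and JAPE rules inside the input folder follows that command; whether the component skips those subfolders when reading documents is not stated here, so check the behaviour on your data.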
## Built With

* [Docker](https://www.docker.com/) - Docker Containers
* [Maven](https://maven.apache.org/) - Dependency Management
* [GATE](https://gate.ac.uk/overview.html) - GATE: a full-lifecycle open source solution for text processing

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component/-/tags).

## Authors

* **Javier Corvi**

## License

This project is licensed under the GNU General Public License Version 3 - see the [LICENSE.md](LICENSE.md) file for details.