Add the Flexible Gazetteer Processing Resource. Make it possible to run the gazetteer lookup using features of the Token annotation.
Minor changes to parameter names, so that they match the parameter names used in the GATE plugin definitions.
## Version 1.2, 2020-03-25
Externalization of the GATE parameters gazetteerFeatureSeparator, caseSensitive and longestMatchOnly: these parameters can now be passed to the component.
To see the defaults, please check the help.
## Version 1.1, 2020-03-10
Possibility of providing a .zip file in the dictionary definition.
Parameter -l: dictionary list definitions. Either a tab-separated, GATE-formatted lists.def file or a zip file containing the dictionary/gazetteer files (including the lists.def) can be provided.
## Version 1.0, 2020-03-03
First version of the component.
Parameters are managed internally through a Map, to keep their handling clear.
Generic GATE text-mining component that runs in batch/pipeline mode using software containers (Docker).
## Description
This component is a Docker wrapper that executes GATE Processing Resources in batch mode.
It first runs the Default Gazetteer or the Flexible Gazetteer lookup with the dictionaries passed as parameters. In a second stage, it executes the JAPE rules given in a main.jape file.
The list of dictionary/gazetteer entries must be provided in the GATE DefaultGazetteer format.
The ANNIE SerialAnalyserController is used to execute the pipeline.
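The Java sketch below gives a rough picture of the two-stage flow described above. It runs a list of stages over every document in order, which is how a serial controller behaves: the JAPE stage only sees what the gazetteer stage produced before it. It is not GATE code. `SerialPipelineSketch`, `runSerial` and the string-rewriting lambdas are hypothetical stand-ins for the gazetteer and JAPE Processing Resources, and plain strings stand in for documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration of serial execution; not the GATE API.
public class SerialPipelineSketch {

    // Runs every stage, in order, on every document of the corpus.
    static <D> List<D> runSerial(List<D> corpus, List<UnaryOperator<D>> stages) {
        List<D> results = new ArrayList<>();
        for (D document : corpus) {
            D current = document;
            for (UnaryOperator<D> stage : stages) {
                current = stage.apply(current); // each stage sees the previous stage's output
            }
            results.add(current);
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy stand-ins: the "gazetteer" marks a lookup, the "JAPE rule" turns it into an entity.
        UnaryOperator<String> gazetteer = d -> d.replace("Barcelona", "[Lookup:Barcelona]");
        UnaryOperator<String> jape = d -> d.replace("[Lookup:", "[Location:");
        System.out.println(runSerial(List.of("I live in Barcelona"), List.of(gazetteer, jape)));
    }
}
```

If the two stages are swapped, the `[Lookup:...]` marker is never converted. This is why the gazetteer has to run before the JAPE rules.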
-i or -input: input folder with the documents to be annotated. The documents can be plain text files or GATE XML documents.
</p>
<p>
-o or -output: output folder where the annotated documents are saved in GATE format.
</p>
<p>
-gt or -gazetter_type: gazetteer type, either default or flexible. If no value is provided, the DefaultGazetteer is used.
</p>
<p>
-inputFeatureNames: required by the Flexible Gazetteer (see its required parameters). The values of these features replace the corresponding original text. Default: Token.root,Token.word.
</p>
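The sketch below illustrates what `-inputFeatureNames` changes. It builds the text a gazetteer would match against when each token's surface string is replaced by one of its feature values. This is a simplified, hypothetical model (`FlexibleLookupSketch`, `Token` and `replaceWithFeatures` are invented names), not the GATE Flexible Gazetteer itself. Two assumptions apply: the sketch uses the first listed feature present on a token, and it models only Token annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the Flexible Gazetteer idea; not GATE's implementation.
public class FlexibleLookupSketch {

    // Simplified Token annotation: its surface text plus its features (e.g. "root").
    record Token(String string, Map<String, String> features) {}

    // Replaces each token by the value of the first listed feature it carries
    // (an assumption of this sketch), or keeps the token text if it has none.
    static List<String> replaceWithFeatures(List<Token> tokens, List<String> inputFeatureNames) {
        List<String> result = new ArrayList<>();
        for (Token token : tokens) {
            String text = token.string();
            for (String name : inputFeatureNames) {
                // "Token.root" -> feature "root"; the annotation type prefix is ignored here
                String feature = name.substring(name.indexOf('.') + 1);
                String value = token.features().get(feature);
                if (value != null) {
                    text = value;
                    break;
                }
            }
            result.add(text);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Mice", Map.of("root", "mouse")),
                new Token("were", Map.of("root", "be")),
                new Token("found", Map.of()));
        // A gazetteer entry "mouse" could now match the surface form "Mice".
        System.out.println(replaceWithFeatures(tokens, List.of("Token.root", "Token.word")));
    }
}
```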
<p>
-a or -outputASName: output annotation set, i.e. the annotation set where the annotations produced by the gazetteer lookup and by the JAPE rules are added.
</p>
<p>
-ia or -inputASName: input annotation set. Set this parameter if you want to use a different input annotation set. By default, the -a output annotation set is used as input.
</p>
<p>
-l or -listsURL: dictionary list definitions. Either a tab-separated, GATE-formatted lists.def file or a zip file containing the dictionary/gazetteer files (including the lists.def) can be provided.
</p>
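When `-l` points to a zip file, the component has to unpack it and locate the lists.def inside. The following is one possible way to do that with the JDK's `java.util.zip` package. `ListsDefFromZip` and `unzipAndFindListsDef` are hypothetical names, and the component's actual handling may differ. The sketch also rejects entries whose path would escape the target folder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: unpacks a dictionary zip and returns the lists.def it contains.
public class ListsDefFromZip {

    static Path unzipAndFindListsDef(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would be written outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Zip entry outside target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream reports EOF at the entry's end.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // The dictionaries may sit in a sub-folder of the zip, so search the whole tree.
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("lists.def"))
                    .findFirst()
                    .orElseThrow(() -> new IOException("No lists.def found in " + zipFile));
        }
    }

    public static void main(String[] args) throws IOException {
        Path listsDef = unzipAndFindListsDef(Path.of(args[0]), Files.createTempDirectory("gazetteer"));
        System.out.println("lists.def extracted to " + listsDef);
    }
}
```

The returned path, or its URL, is what a gazetteer `listsURL` would then point to. The .lst files stay next to it, as referenced by lists.def.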
<p>
-gazetteerFeatureSeparator: the character used to add arbitrary features to gazetteer entries. Default: tab.
</p>
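In GATE gazetteer lists, extra features are, as far as we understand the format, appended to an entry as `name=value` pairs separated by the `gazetteerFeatureSeparator` character. The sketch below parses such a line. `GazetteerEntrySketch` and `parse` are hypothetical names. Skipping fragments without `=` is an assumption of the sketch, not documented GATE behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not the GATE implementation: splits one gazetteer (.lst) line
// into the entry text and its arbitrary "name=value" features.
public class GazetteerEntrySketch {

    record Entry(String text, Map<String, String> features) {}

    static Entry parse(String line, String separator) {
        // Quote the separator so characters such as "|" or "." are taken literally.
        String[] parts = line.split(Pattern.quote(separator));
        Map<String, String> features = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) { // skip fragments that are not name=value pairs
                features.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return new Entry(parts[0], features);
    }

    public static void main(String[] args) {
        // With the default tab separator:
        System.out.println(parse("Barcelona\tcountry=Spain\tlang=ca", "\t"));
    }
}
```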
<p>
-caseSensitive: whether the gazetteer should be case sensitive during matching. Default: false.
</p>
<p>
-longestMatchOnly: only relevant when the list of lookups contains entries that are proper prefixes of other entries. With the default value (true), only the longest entry is matched. Setting this parameter to false makes the gazetteer match all possible prefixes.
</p>
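The effect of `-caseSensitive` and `-longestMatchOnly` can be seen with a small token-based matcher. This is a hypothetical, simplified sketch (`LookupMatchSketch` and `lookup` are invented names). It only shows what the two flags change, not how GATE implements the lookup, which works on characters with a finite-state structure rather than re-scanning token sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified matcher used only to illustrate the two flags.
public class LookupMatchSketch {

    static List<String> lookup(List<String> tokens, Set<String> entries,
                               boolean caseSensitive, boolean longestMatchOnly) {
        // Case-insensitive matching: compare lower-cased forms on both sides.
        Set<String> dictionary = caseSensitive ? entries
                : entries.stream().map(e -> e.toLowerCase(Locale.ROOT)).collect(Collectors.toSet());
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            List<String> found = new ArrayList<>(); // matches starting here, shortest first
            StringBuilder candidate = new StringBuilder();
            for (int end = start; end < tokens.size(); end++) {
                if (end > start) {
                    candidate.append(' ');
                }
                candidate.append(tokens.get(end));
                String text = candidate.toString();
                String key = caseSensitive ? text : text.toLowerCase(Locale.ROOT);
                if (dictionary.contains(key)) {
                    found.add(text); // report the text as it appears in the document
                }
            }
            if (longestMatchOnly && !found.isEmpty()) {
                matches.add(found.get(found.size() - 1)); // keep only the longest match
            } else {
                matches.addAll(found); // keep every matching prefix
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("New", "York", "City");
        Set<String> entries = Set.of("New York", "New York City");
        System.out.println(lookup(tokens, entries, false, true));
        System.out.println(lookup(tokens, entries, false, false));
    }
}
```

Take the entries `New York` and `New York City` and the tokens `New York City`. With `longestMatchOnly` set to true, only `New York City` is reported; with false, both entries are reported. The nested scan is quadratic in the number of tokens, which is acceptable for an illustration but not how a production gazetteer is built.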
<p>
-j or -jape_main: path to the main.jape file with the JAPE rules to be executed.
</p>
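The parameter defaults stated above, together with the `-ia` fallback to `-a`, could be gathered in a parameter Map like the one mentioned in the changelog. The sketch below is hypothetical. `ParameterDefaultsSketch` and `withDefaults` are invented names, and using the long option names as Map keys is an assumption. Only the default values come from this README and the option help texts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keys are the long option names used in this README.
public class ParameterDefaultsSketch {

    // Fills in the documented defaults for every parameter the user did not pass.
    static Map<String, String> withDefaults(Map<String, String> given) {
        Map<String, String> params = new HashMap<>(given);
        params.putIfAbsent("gazetter_type", "default"); // DefaultGazetteer
        params.putIfAbsent("inputFeatureNames", "Token.root,Token.word");
        params.putIfAbsent("gazetteerFeatureSeparator", "\t"); // tab
        params.putIfAbsent("caseSensitive", "false");
        params.putIfAbsent("longestMatchOnly", "true");
        // -ia / inputASName falls back to the -a / outputASName annotation set.
        String outputAS = params.get("outputASName");
        if (outputAS != null) {
            params.putIfAbsent("inputASName", outputAS);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(withDefaults(Map.of("outputASName", "Output")));
    }
}
```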
In this example, the dictionaries/gazetteers and the JAPE rules are in the input folder.
* Generic library to run the GATE DefaultGazetteer or FlexibleGazetteer and JAPE rules processing in batch mode.
*
*/
public class App {
...
...
output.setRequired(true);
options.addOption(output);
Option listDefinitions = new Option("l", "listsURL", true, "Dictionary List definitions. "
+ "A lists.def Gate-formatted file separated by tab can be provided or a zip file that contains the dictionary/gazetteer files including the lists.def ");
listDefinitions.setRequired(false);
options.addOption(listDefinitions);
Option gazetterType = new Option("gt", "gazetter_type", true, "Gazetteer type: default, flexible. If no value is provided the DefaultGazetteer is used");
gazetterType.setRequired(false);
options.addOption(gazetterType);
Option inputFeatureNames = new Option("inputFeatureNames", "inputFeatureNames", true, "See flexible gazetteer required fields. These feature values are used to replace the corresponding original text. "
+ "Default values are Token.root,Token.word. Format if there is more than one feature: Token.xxx,Token.yyy");
inputFeatureNames.setRequired(false);
options.addOption(inputFeatureNames);
Option japeMain = new Option("j", "jape_main", true, "Jape Main file for processing rules");
japeMain.setRequired(false);
options.addOption(japeMain);
Option set = new Option("a", "outputASName", true, "Output Annotation Set. Annotation set where the annotations will be included for the gazetteer lookup and for the Jape Rules");
set.setRequired(true);
options.addOption(set);
Option iset = new Option("ia", "inputASName", true, "Input Annotation Set. If you want to provide a different input annotation set, set this parameter. By default the -a output annotation set is used as input.");