Commit 09db99c9 authored by javi

remove anatomy and finding stop words and add this rule into the ner-postprocessing.
parent 173fd773
Pipeline #3722 passed with stage in 2 minutes and 20 seconds
......@@ -18,4 +18,6 @@ Improve of several dictionaries and rules:
Possibility to add parameters during the execution. Modification in nlp-gate-generic-component.
\ No newline at end of file
## Version 1.3, 2020-03-27
Remove anatomy and finding stop words and add this rule into the ner-postprocessing.
\ No newline at end of file
......@@ -24,7 +24,7 @@ For the fields FINDING and STUDY_TESTCD a STUDY_DOMAIN feature is added to descr
Internally, the cdisc-etox-annotation library uses the generic nlp-gate-generic-component https://gitlab.bsc.es/inb/text-mining/generic-tools/nlp-gate-generic-component. This library is a generic component that annotates text with parameterized GATE-formatted gazetteers/dictionaries. In other words, the cdisc-etox-annotation library is an instance of the nlp-gate-generic-component with a specific set of dictionaries.
## Actual Version: 1.2, 2020-03-25
## Actual Version: 1.3, 2020-03-27
## [Changelog](https://gitlab.bsc.es/inb/text-mining/bio-tools/cdisc-etox-annotation/blob/master/CHANGELOG)
## Docker
......
......@@ -2,6 +2,4 @@ etox_in-life-observations_dict.lst:ETOX_ILO:ETOX_ILO
etox_anatomy_dict.lst:ANATOMY_ETOX:ANATOMY_ETOX
etox_send_dict.lst:SEND_ETOX:SEND_ETOX
etox_moa_dict.lst:MOA_ETOX:MOA_ETOX
cdisc_send_dict.lst:SEND_CIDSC:SEND_CIDSC
stop_words_finding.lst:STOP_WORD_FINDING:STOP_WORD_FINDING
stop_words_anatomy.lst:STOP_WORD_ANATOMY:STOP_WORD_ANATOMY
\ No newline at end of file
cdisc_send_dict.lst:SEND_CIDSC:SEND_CIDSC
\ No newline at end of file
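Each line of the gazetteer index above follows the `file:majorType:minorType` layout, so dropping the two `stop_words_*` lines is what removes the `STOP_WORD_FINDING` and `STOP_WORD_ANATOMY` lookups. A minimal sketch of reading such an index, assuming only that three-field layout; the class `GazetteerIndex` and its `Entry` type are hypothetical names for illustration and are not part of this repository or of GATE:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: parses "file:majorType:minorType" lines.
public class GazetteerIndex {

    /** One index entry: the list file, its majorType and an optional minorType. */
    public static final class Entry {
        public final String file;
        public final String majorType;
        public final String minorType; // null when the line has only two fields

        public Entry(String file, String majorType, String minorType) {
            this.file = file;
            this.majorType = majorType;
            this.minorType = minorType;
        }
    }

    /** Parses index lines, skipping blank ones and rejecting lines without a majorType. */
    public static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(":");
            if (parts.length < 2) {
                throw new IllegalArgumentException(
                        "Expected file:majorType[:minorType], got: " + line);
            }
            entries.add(new Entry(parts[0], parts[1], parts.length > 2 ? parts[2] : null));
        }
        return entries;
    }
}
```

Listing the `majorType` values of the parsed entries is a quick way to check that no `STOP_WORD_*` type is still registered after this change.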
liver weight
general
body weigh
body weight
weight
\ No newline at end of file
administration/collection site
animal identification
anus
body temperature
bodyweight/growth
breathing
digit/claw
dosing
ear
eye
feces/urine
urine
food consumption
general behaviour
general condition
locomotive behaviour
mouth
normal
nouse
posture
pulmonary parameter
skin/fur
tail
teeth
tongue
unclassified
varia
unspecified
\ No newline at end of file
Imports: {
  import static gate.Utils.*;
}
Phase:secondphase
Input: Lookup
Options: control = appelt
Rule: removeStopwords
(
  {Lookup.majorType == "STOP_WORD_ANATOMY"}
) :stop
-->
{
  System.out.println("ENTER RULE STOP ANATOMY");
  gate.AnnotationSet lookup = (gate.AnnotationSet) bindings.get("stop");
  gate.Annotation ann = (gate.Annotation) lookup.iterator().next();
  gate.AnnotationSet to_remove = outputAS.get("SPECIMEN", ann.getStartNode().getOffset(), ann.getEndNode().getOffset());
  for (Annotation rem : to_remove) {
    if(ann.getStartNode().getOffset()==rem.getStartNode().getOffset() && ann.getEndNode().getOffset()==rem.getEndNode().getOffset()){
      System.out.println(rem.getType() + " : " + stringFor(doc, rem));
      outputAS.remove(rem);
    }
  }
}
\ No newline at end of file
Imports: {
  import static gate.Utils.*;
}
Phase:secondphase
Input: Lookup
Options: control = appelt
Rule: removeStopwords
(
  {Lookup.majorType == "STOP_WORD_FINDING"}
) :stop
-->
{
  System.out.println("ENTER RULE STOP ILO");
  gate.AnnotationSet lookup = (gate.AnnotationSet) bindings.get("stop");
  gate.Annotation ann = (gate.Annotation) lookup.iterator().next();
  gate.AnnotationSet to_remove = outputAS.get("FINDING", ann.getStartNode().getOffset(), ann.getEndNode().getOffset());
  for (Annotation rem : to_remove) {
    if(ann.getStartNode().getOffset()==rem.getStartNode().getOffset() && ann.getEndNode().getOffset()==rem.getEndNode().getOffset()){
      System.out.println(rem.getType() + " : " + stringFor(doc, rem));
      outputAS.remove(rem);
    }
  }
}
\ No newline at end of file
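Both rules apply the same logic: an annotation of the target type (`SPECIMEN` or `FINDING`) is dropped only when a stop-word lookup covers exactly the same span. The sketch below restates that logic in plain Java, without GATE, as one way the check could look once it lives in ner-postprocessing; it is not the actual ner-postprocessing code. `StopWordFilter`, `Span` and `removeStopWords` are hypothetical names. Offsets are held as primitive `long` values so that `==` compares numbers; in the rules above `getOffset()` returns a boxed `Long`, where `==` compares object references, so `equals` would be the safer comparison there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, GATE-free restatement of the stop-word rules, for illustration only.
public class StopWordFilter {

    /** A typed text span with start and end character offsets. */
    public static final class Span {
        public final String type;
        public final long start;
        public final long end;

        public Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the annotations that survive filtering. An annotation is dropped
     * only if it has the target type and some stop-word lookup has exactly the
     * same start and end offsets; partial overlaps are kept.
     */
    public static List<Span> removeStopWords(List<Span> annotations,
                                             List<Span> stopLookups,
                                             String targetType) {
        List<Span> kept = new ArrayList<>();
        for (Span a : annotations) {
            boolean stopped = false;
            if (a.type.equals(targetType)) {
                for (Span s : stopLookups) {
                    // Primitive comparison: exact span match, as in the JAPE rules.
                    if (s.start == a.start && s.end == a.end) {
                        stopped = true;
                        break;
                    }
                }
            }
            if (!stopped) {
                kept.add(a);
            }
        }
        return kept;
    }
}
```

With this shape, a `FINDING` spanning only "weight" is removed when "weight" is a stop word, while a longer `FINDING` that merely contains the word is kept, which matches the exact-offset condition of the deleted rules.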