Wowool
Wowool is an NLP development platform for intelligent content management. It performs basic linguistic analysis (tokenization, lemmatization and part-of-speech tagging), which serves as the basis for a sophisticated linguistic pattern-matching rule language.
The output of this rule language is a series of annotations, each of which can carry attribute-value pairs.
Using this rule language, you get access to the following features:
Character pattern matching: you can match at character level. For instance, +1 followed by a series of 10 digits is a typical US telephone number.
Pattern matching based on tokens and their characteristics, like literal, lemma or stem, part of speech or token attributes.
Context-based matching: for instance, a sequence of proper names followed by a company form descriptor, like “Inc” or “GmbH”, is a Company.
You can match discontinuous related items, also known as facts: for instance, the name of a drug and its suspected side effect. This is particularly useful if you are trying to create a relational database out of your textual data.
Reusable annotations: you can write rules that use annotations already created by other rules. You do not need to worry about the order of the rules; the engine finds whatever it needs to match a rule.
There is support for ontologies: you can organize your annotations in classes and subclasses, which lets you structure your knowledge better.
There is a mechanism to deal with coreference: you can link common noun phrases, like ‘this drug’, to entity names, like ‘fentanyl’.
Negation of tokens or concepts.
From the rules there is a gateway to Python, which allows for extremely powerful manipulation of the data in the annotations: aggregating information, creating profiles and knowledge networks, enriching databases and ontologies, normalizing strings, consulting external sources like the internet or other databases, and performing document categorization, to name just a few examples.
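To make two of the features above more concrete, here is a minimal sketch in plain Python of what character pattern matching and context-based matching produce: annotations with attribute-value pairs. It does not use the Wowool rule language or the Wowool API. The `Annotation` class, the `annotate` function, the label names and the attribute names are all hypothetical and chosen only for illustration, and regular expressions over raw text stand in for rules that, in Wowool, operate on tokens and their linguistic properties.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation: a labelled text span with attribute-value pairs."""
    label: str
    begin: int
    end: int
    text: str
    attributes: dict = field(default_factory=dict)


# Character-level pattern: "+1" followed by exactly 10 digits.
PHONE = re.compile(r"\+1\d{10}(?!\d)")

# Context-based pattern: one or more capitalized words followed by a
# company form descriptor. Capitalization is a crude stand-in for the
# "proper name" information a part-of-speech tagger would provide.
COMPANY = re.compile(r"((?:[A-Z][\w&-]*\s+)+)(Inc|GmbH)\b")


def annotate(text):
    """Return all phone and company annotations, ordered by start offset."""
    annotations = []
    for m in PHONE.finditer(text):
        annotations.append(
            Annotation("PhoneNumber", m.start(), m.end(), m.group(),
                       {"country": "US"})
        )
    for m in COMPANY.finditer(text):
        annotations.append(
            Annotation("Company", m.start(), m.end(), m.group(),
                       {"name": m.group(1).strip(), "form": m.group(2)})
        )
    return sorted(annotations, key=lambda a: a.begin)


if __name__ == "__main__":
    for annotation in annotate("We called Acme Widgets Inc at +12125550123."):
        print(annotation.label, repr(annotation.text), annotation.attributes)
```

For the sample sentence this yields a Company annotation covering "Acme Widgets Inc" and a PhoneNumber annotation covering "+12125550123". The sketch also shows why matching on linguistic information matters: a capitalized sentence-initial word directly in front of the company name would be swallowed by the regular expression, which is the kind of error that rules based on part of speech and context are meant to avoid.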