Wowool

Wowool is an NLP development platform for intelligent content management. It performs basic linguistic analysis (tokenization, lemmatization and part-of-speech tagging), which serves as the basis for a sophisticated rule language for linguistic pattern matching.

The output of this rule language is a series of annotations that can carry attribute-value pairs.

Using this rule language, you get access to the following features:

  • Character pattern matching: you can match at character level. For instance, +1 followed by 10 digits is a typical US telephone number.

  • Pattern matching based on tokens and their characteristics, such as literal form, lemma or stem, part of speech, or token attributes.

  • Context-based matching: for instance, a sequence of proper names followed by a company form descriptor, like “Inc” or “GmbH”, is a Company.

  • You can match discontinuous but related items, also known as facts: for instance, the name of a drug and its suspected side effect. This is particularly useful if you are trying to create a relational database out of your textual data.

  • Reusable annotations: you can write rules that use annotations already created by other rules. You do not need to worry about ordering the rules; the engine finds whatever it needs to match a rule.

  • There is support for ontologies: you can organize your annotations in classes and subclasses, which lets you structure your knowledge better.

  • There is a mechanism to deal with coreference: you can link common noun phrases, like ‘this drug’, to entity names, like ‘fentanyl’.

  • Negation of tokens or concepts.
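
To make the character-level and context-based examples from the list concrete, here is a minimal sketch in plain Python using the standard `re` module. It is not Wowool rule syntax, and it works on raw characters rather than on tokens and part-of-speech tags as the engine does. The names `US_PHONE`, `COMPANY` and `annotate` are hypothetical and chosen only for this illustration.

```python
import re

# Illustration only: plain regular expressions, NOT Wowool rule syntax.

# Character-level pattern: "+1" followed by 10 digits, with optional separators.
US_PHONE = re.compile(r"\+1[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

# Context-based pattern: one or more capitalized words followed by a
# company form descriptor such as "Inc" or "GmbH".
COMPANY = re.compile(r"((?:[A-Z][\w&]*\s+)+)(Inc|GmbH)\b")


def annotate(text):
    """Return (label, matched_text) pairs for phone numbers and companies."""
    annotations = [("Phone", m.group(0)) for m in US_PHONE.finditer(text)]
    annotations += [("Company", m.group(0)) for m in COMPANY.finditer(text)]
    return annotations


result = annotate("You can reach Acme Widgets Inc at +1 212-555-0187.")
```

A real rule would identify proper names through linguistic analysis instead of capitalization, which is why a capitalized word at the start of a sentence can mislead a sketch like this one.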

From the rules there is a gateway to Python, which allows for powerful manipulation of the data in the annotations: aggregating information, creating profiles and knowledge networks, enriching databases and ontologies, normalizing strings, consulting external sources such as the internet or other databases, and performing document categorization, to name just a few examples.
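
As a rough idea of what such post-processing can look like, the sketch below aggregates drug and side-effect facts into per-drug profiles and normalizes the strings along the way. The annotation records are modeled as plain dictionaries with a `label` and `attributes` key; this shape, the `SideEffectFact` label and the `build_profiles` function are assumptions made for illustration and do not reflect the actual objects Wowool passes to Python.

```python
from collections import defaultdict

# Hypothetical annotation records; the real Wowool objects may look different.
annotations = [
    {"label": "SideEffectFact", "attributes": {"drug": "Fentanyl", "effect": "nausea"}},
    {"label": "SideEffectFact", "attributes": {"drug": "fentanyl ", "effect": "Dizziness"}},
    {"label": "Company", "attributes": {"name": "Acme Widgets Inc"}},
]


def build_profiles(records):
    """Group side effects by normalized drug name."""
    profiles = defaultdict(set)
    for record in records:
        if record["label"] != "SideEffectFact":
            continue  # ignore annotations that are not drug/side-effect facts
        attrs = record["attributes"]
        drug = attrs["drug"].strip().lower()      # normalize the drug name
        effect = attrs["effect"].strip().lower()  # normalize the side effect
        profiles[drug].add(effect)
    # Sort the effects so the result is deterministic.
    return {drug: sorted(effects) for drug, effects in profiles.items()}


profiles = build_profiles(annotations)
```

The resulting dictionary maps each drug to its distinct side effects, a shape that is straightforward to load into a relational table.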