Samples#

This document will teach you by example how to use the Python bindings of the Wowool Native SDK. You can download several samples from:

GitHub:

Download: git clone phforesteot/eot-wowool-samples.git

Basic analysis#

Consider the following sample application, which performs a basic analysis and prints the results.

english_pipeline_init.py#

from eot.wowool.native.core import PipeLine
from eot.wowool.document import Document

document = Document("Mark Van Den Berg works at Omega Pharma.")
# Create an analyzer for a given language and options
process = PipeLine("english,entity")
# Process the data
document = process(document)
print(document)

Printing the document lists each sentence together with the concepts and tokens found in the given string:

S:(  0, 40)
 C:(  0, 40): Sentence
 C:(  0, 17): Person,@(canonical='Mark Van Den Berg' family='Van Den Berg' gender='male' given='Mark' )
 C:(  0,  4): PersonGiv
 C:(  0,  4): GivenName,@(gender='male' standalone='Yes' )
 T:(  0,  4): Mark,{+giv, +init-cap, +init-token},[Mark:Prop-Std]
 C:(  5, 17): PersonFam
 T:(  5,  8): Van,{+init-cap},[Van:Prop-Std]
 T:(  9, 12): Den,{+init-cap, +nf, +nf-lex},[Den:Prop-Std]
 T:( 13, 17): Berg,{+init-cap},[Berg:Prop-Std]
 T:( 18, 23): works,[work:V-Pres-3-Sg]
 T:( 24, 26): at,[at:Prep-Std]
 C:( 27, 39): Company,@(country='Belgium' sector='pharma' )
 T:( 27, 32): Omega,{+init-cap, +nf},[Omega:Prop-Std]
 T:( 33, 39): Pharma,{+init-cap, +nf, +nf-lex},[Pharma:Prop-Std]
 T:( 39, 40): .,[.:Punct-Sent]

Here’s how to interpret the results:

Sentence: S:(begin_offset, end_offset)
Tokens: T:(begin_offset, end_offset): literal,{properties},[stem:pos]
Concept: C:(begin_offset, end_offset): uri,@(attributes)

In short, C:(  0, 17): Person means we have found a Person from offset 0 up to offset 17, which is Mark Van Den Berg.
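To make the line layout concrete, here is a minimal sketch that splits one printed line into its kind, offsets and remaining text, then slices the input string with those offsets. It does not use the SDK. The regular expression and the helper name `parse_line` are assumptions inferred from the sample output above, not an official grammar, and the sketch assumes the end offset is exclusive, as the output suggests.

```python
import re

# Layout inferred from the sample output above; not an official grammar.
LINE_RE = re.compile(
    r"^\s*(?P<kind>[STC]):\(\s*(?P<begin>\d+),\s*(?P<end>\d+)\)(?::\s*(?P<rest>.*))?$"
)


def parse_line(line):
    """Split one printed line into kind, begin offset, end offset and the rest."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return (
        match["kind"],
        int(match["begin"]),
        int(match["end"]),
        (match["rest"] or "").strip(),
    )


text = "Mark Van Den Berg works at Omega Pharma."
line = (
    " C:(  0, 17): Person,@(canonical='Mark Van Den Berg' "
    "family='Van Den Berg' gender='male' given='Mark' )"
)

kind, begin, end, rest = parse_line(line)
uri = rest.split(",", 1)[0]  # the concept URI precedes the first comma

# Slice the input with the offsets, treating the end offset as exclusive.
print(kind, uri, repr(text[begin:end]))
```

Slicing with the offsets returns the literal text the concept covers, which is a quick way to check that an annotation lines up with the input.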

Custom rule#

Let’s write a rule and extract some results. This example assumes you have a basic knowledge of the Wowoolian language. We will extract a Person followed by a form of the verb werken followed by a concept Company.

dutch_chain_domains.py#

from eot.wowool.native.core import Language, Domain
from eot.wowool.annotation import Concept
from eot.wowool.document import Document

# Create a domain from the rule source
profile = Domain(source=r"""rule:{ Person .. <'werken' > .. Company }=PersonWorkCompany;""")
annotation_filter = {"PersonWorkCompany", "Person", "Company"}


def interesting_concepts(concept: Concept) -> bool:
    # Keep only the concepts whose URI is in the filter set
    return concept.uri in annotation_filter


# Create an analyzer for a given language, the default Dutch domain and the domain we just built
analyzer = Language("dutch")
entities = Domain("dutch-entity")
# Process the input text
document = analyzer(Document("Mark Van Den Berg werkte als hoofdarts by Omega Pharma."))
document = profile(entities(document))
# Print the concepts only
for concept in Concept.iter(document, interesting_concepts):
    print(concept)
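The sample above chains stages by nesting calls, where each stage takes a document and returns a document. The same pattern can be sketched without the SDK. The functions `tokenize`, `add_entities` and `add_profile` below are hypothetical stand-ins, not Wowool APIs, and a plain list plays the role of the document.

```python
from functools import reduce


# Hypothetical stand-ins for SDK stages: each takes a document and returns one.
def tokenize(doc):
    return doc + ["tokens"]


def add_entities(doc):
    return doc + ["entities"]


def add_profile(doc):
    return doc + ["profile"]


def run_stages(doc, stages):
    """Apply each stage in order, feeding the result of one into the next."""
    return reduce(lambda current, stage: stage(current), stages, doc)


# Nested calls and an ordered list of stages give the same result.
nested = add_profile(add_entities(tokenize([])))
chained = run_stages([], [tokenize, add_entities, add_profile])
print(chained)
```

Keeping the stages in a list makes the order explicit and lets you add or remove a stage without rewriting a nested expression.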

dutch_custom_rule.py#

from eot.wowool.native.core import Language, Compiler, Domain, Filter
from eot.wowool.document import Document

# Compile the rule into a domain file named 'profile.dom'
Compiler.compile(
    output_file="profile.dom",
    source=r""" rule:{ Person .. 'werken' .. Company }= PersonWorkCompany;""",
)
analyzer = Language("dutch")

# Load the default Dutch entity domain and the domain we just compiled
domains = [Domain(dn) for dn in ["dutch-entity.dom", "profile.dom"]]

# Named concept_filter to avoid shadowing Python's built-in filter()
concept_filter = Filter(["PersonWorkCompany", "Person", "Company"])
document = analyzer(Document("Mark Van Den Berg werkte als hoofdarts by Omega Pharma."))
# Apply the domains in order
for domain in domains:
    document = domain(document)
document = concept_filter(document)
# Print the concepts only
for sentence in document.analysis:
    for annotation in sentence:
        if annotation.is_concept:
            print(annotation)

This results in the following output:

C:(  0, 54): PersonWorkCompany
C:(  0, 17): Person,@(canonical='Mark Van Den Berg' family='Van Den Berg' gender='male' given='Mark' )
C:( 42, 54): Company,@(country='Belgium' sector='pharma' )
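As a quick sanity check, the offsets in this output can be compared with the input sentence using plain Python, without the SDK. The sketch below assumes the offsets are character positions into the string with an exclusive end. The helper name `covered_text` is hypothetical, and the spans are copied by hand from the output above.

```python
text = "Mark Van Den Berg werkte als hoofdarts by Omega Pharma."

# (begin, end, uri) triples copied from the printed output above.
spans = [
    (0, 54, "PersonWorkCompany"),
    (0, 17, "Person"),
    (42, 54, "Company"),
]


def covered_text(text, begin, end):
    """Return the slice of text covered by a concept, treating end as exclusive."""
    return text[begin:end]


for begin, end, uri in spans:
    print(f"{uri}: {covered_text(text, begin, end)!r}")

# The rule match should enclose the Person and Company concepts it was built from.
outer_begin, outer_end, _ = spans[0]
contained = all(outer_begin <= b and e <= outer_end for b, e, _ in spans[1:])
print("inner concepts contained in rule match:", contained)
```

The PersonWorkCompany span runs from the start of the Person to the end of the Company and stops before the closing period.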