Samples
This document shows by example how to use the Python bindings of the Wowool Native SDK. The samples can be downloaded from:
Download: git clone phforesteot/eot-wowool-samples.git
Basic analysis
The following sample application performs a basic analysis and prints the results.
from eot.wowool.native.core import PipeLine
from eot.wowool.document import Document
document = Document("Mark Van Den Berg works at Omega Pharma.")
# Create an analyzer for a given language and options
process = PipeLine("english,entity")
# Process the data
document = process(document)
print(document)
Printing the document lists each sentence with its concepts and tokens, giving the analysis of the input string:
S:( 0, 40)
C:( 0, 40): Sentence
C:( 0, 17): Person,@(canonical='Mark Van Den Berg' family='Van Den Berg' gender='male' given='Mark' )
C:( 0, 4): PersonGiv
C:( 0, 4): GivenName,@(gender='male' standalone='Yes' )
T:( 0, 4): Mark,{+giv, +init-cap, +init-token},[Mark:Prop-Std]
C:( 5, 17): PersonFam
T:( 5, 8): Van,{+init-cap},[Van:Prop-Std]
T:( 9, 12): Den,{+init-cap, +nf, +nf-lex},[Den:Prop-Std]
T:( 13, 17): Berg,{+init-cap},[Berg:Prop-Std]
T:( 18, 23): works,[work:V-Pres-3-Sg]
T:( 24, 26): at,[at:Prep-Std]
C:( 27, 39): Company,@(country='Belgium' sector='pharma' )
T:( 27, 32): Omega,{+init-cap, +nf},[Omega:Prop-Std]
T:( 33, 39): Pharma,{+init-cap, +nf, +nf-lex},[Pharma:Prop-Std]
T:( 39, 40): .,[.:Punct-Sent]
Here’s how to interpret the results:
Tokens:
T:(begin_offset, end_offset): literal, {properties}, [stem:pos]
Concepts:
C:(begin_offset, end_offset): uri, @(attributes)
In short, C:( 0, 17): Person
means we have found a Person from offset 0 up to, but not including, offset 17, which is Mark Van Den Berg.
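To make the offset convention concrete, here is a small sketch that does not use the SDK at all. It parses a printed annotation line with a regular expression and slices the input string by the reported offsets. The helper name `parse_span` and the regular expression are illustrative assumptions based on the sample output above; they are not part of the Wowool API.

```python
import re

# Assumed shape of a printed annotation line, inferred from the sample output:
#   <kind>:( <begin>, <end>): <rest>
SPAN_RE = re.compile(
    r"^(?P<kind>[STC]):\(\s*(?P<begin>\d+),\s*(?P<end>\d+)\)(?::\s*(?P<rest>.*))?$"
)


def parse_span(line: str) -> tuple[str, int, int, str]:
    """Hypothetical helper: split a printed line into kind, offsets, and the rest."""
    match = SPAN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return (match["kind"], int(match["begin"]), int(match["end"]), match["rest"] or "")


text = "Mark Van Den Berg works at Omega Pharma."
kind, begin, end, rest = parse_span("C:( 0, 17): Person,@(canonical='Mark Van Den Berg' )")

# The end offset is exclusive, so a plain slice recovers the literal.
literal = text[begin:end]
print(kind, begin, end, repr(literal))
```

Because the end offset is exclusive, `end - begin` is also the length of the matched text, which is why the Person concept ends at 17 rather than 16.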
Custom rule
Let’s write a rule and extract some results. This example assumes you have a basic knowledge of the Wowoolian language. We will extract a Person followed by a form of the Dutch verb werken (to work) followed by a Company concept.
from eot.wowool.native.core import Language, Domain
from eot.wowool.annotation import Concept
from eot.wowool.document import Document
# Create a domain from the rule source
profile = Domain(source=r"""rule:{ Person .. <'werken' > .. Company }=PersonWorkCompany;""")
annotation_filter = {"PersonWorkCompany", "Person", "Company"}
def interesting_concepts(concept: Concept) -> bool:
    # Keep only the concepts whose URI is in the filter set
    return concept.uri in annotation_filter
# Create an analyzer for a given language, the default dutch domain and the domain we just built
analyzer = Language("dutch")
entities = Domain("dutch-entity")
# Process the input text
document = analyzer(Document("Mark Van Den Berg werkte als hoofdarts by Omega Pharma."))
document = profile(entities(document))
# Print the concepts only
for concept in Concept.iter(document, interesting_concepts):
    print(concept)
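The same extraction can also be written by compiling the rule to 'profile.dom' and using a Filter instead of a custom predicate:
from eot.wowool.native.core import Language, Compiler, Domain, Filter
from eot.wowool.document import Document
# Compile the rule source into a domain file named 'profile.dom'
Compiler.compile(
    output_file="profile.dom",
    source=r""" rule:{ Person .. 'werken' .. Company }= PersonWorkCompany;""",
)
analyzer = Language("dutch")
# Load the default dutch entity domain and the domain we just compiled
domains = [Domain(dn) for dn in ["dutch-entity.dom", "profile.dom"]]
# Keep only the annotations we are interested in
concept_filter = Filter(["PersonWorkCompany", "Person", "Company"])
document = analyzer(Document("Mark Van Den Berg werkte als hoofdarts by Omega Pharma."))
# Apply each domain in turn, then filter the result once
for domain in domains:
    document = domain(document)
document = concept_filter(document)
# Print the concepts only
for sentence in document.analysis:
    for annotation in sentence:
        if annotation.is_concept:
            print(annotation)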
This results in the following output:
C:( 0, 54): PersonWorkCompany
C:( 0, 17): Person,@(canonical='Mark Van Den Berg' family='Van Den Berg' gender='male' given='Mark' )
C:( 42, 54): Company,@(country='Belgium' sector='pharma' )
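The offsets in this output can be checked against the input without the SDK. The sketch below hard-codes the three spans from the printed output (it does not call Wowool) and shows that the PersonWorkCompany match encloses both the Person and the Company. The `contains` helper is a hypothetical name introduced only for this illustration.

```python
text = "Mark Van Den Berg werkte als hoofdarts by Omega Pharma."

# Spans copied by hand from the printed output: uri -> (begin, end), end exclusive.
spans = {
    "PersonWorkCompany": (0, 54),
    "Person": (0, 17),
    "Company": (42, 54),
}


def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    """Hypothetical helper: True when `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Show the text covered by each concept.
for uri, (begin, end) in spans.items():
    print(f"{uri}: {text[begin:end]!r}")

# Collect the concepts that fall inside the rule match.
rule_span = spans["PersonWorkCompany"]
nested = [
    uri
    for uri, span in spans.items()
    if uri != "PersonWorkCompany" and contains(rule_span, span)
]
print(nested)
```

The rule match covers everything from the start of the Person to the end of the Company, so it stops just before the closing period of the sentence.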