Howto’s#

Match literals#

To match a word exactly as it is written, use a literal match.

Matching using Lexicons#

Only the first kiwi in the text is an exact match; kiwis and KIWI differ from the literal and are not matched.

lexicon:{
            kiwi
    } = fruit;

Mary ate one kiwi. John ate two kiwis. I love a KIWI

kiwi = fruit

Matching using Rules#

Only the first kiwi in the text is an exact match; kiwis and KIWI differ from the literal and are not matched.

rule:{ "kiwi" } = fruit;

Mary ate one kiwi. John ate two kiwis. I love a KIWI

kiwi = fruit

Match Stems#

Match on the root form of a word: the stem buy matches buy, bought, buying, …

Match using Lexicons#

lexicon:(input="stem"){
        pêche
} = french_fruit;

J’ai des pommes et des pêches à vendre.

pêches = french_fruit

Lexicons - normalized stems#

Use input="normalized_stem" to normalize the input before matching: accents are removed and the match is case-insensitive.

lexicon:(input="normalized_stem"){
        peche
} = french_fruit;

J’aime la confiture de pêches.

pêches = french_fruit
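The effect of normalization can be approximated outside the engine. The following Python sketch is only an illustration: it assumes that normalization amounts to Unicode decomposition, removal of combining marks, and case folding. The helper name `normalize` is hypothetical, stemming is not covered, and the engine's actual implementation may differ.

```python
import unicodedata


def normalize(text: str) -> str:
    """Strip accents and fold case (assumed approximation of normalized input)."""
    # NFD splits an accented character into its base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() gives a case-insensitive comparison form.
    return stripped.casefold()


print(normalize("pêche"))                         # peche
print(normalize("PÊCHE") == normalize("peche"))   # True
```

Under these assumptions, the lexicon entry peche and the stem pêche compare equal after normalization, which is why pêches is matched in the example above.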

Match using Rules#

Use single quotes to address the stem of a literal.

rule:{ ( 'pêche' | 'pomme' ) } = french_fruit;

J’ai des pommes et des pêches à vendre.

pommes = french_fruit
pêches = french_fruit

Match Part of Speech#

Matches Noun-Noun Adj-Noun sequences#

rule:{ ( Nn|Adj )+ Nn } = NounSequence;

The quick brown fox jumped over the lazy dog.

quick brown fox = NounSequence
lazy dog = NounSequence

Matches Work as a verb#

Only the first ‘work’ is matched, because the ‘work’ in the second sentence is a noun, not a verb.

rule:{  <'work',V> } = Activity;

I work at the Hospital. The work there is good.

work = Activity

Compounds#

Heads#

Match words containing ‘fiets’ (‘bicycle’ in Dutch) as a head (the last part of the compound).

In lexicons:

lexicon:(input="head") { fiets } = Fiets;

In rules, use ‘h’ in front of the expression to match:

rule: { (<h'fiets'>) } = Fiets;

Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.

bakfiets = Fiets

koersfiets = Fiets

Component#

Match words containing ‘fiets’ in any part of the compound.

In lexicons:

lexicon:(input="component") { fiets } = Fiets;

In rules: use ‘c’ in front of the expression to match.

rule: { (<c'fiets'>) } = FietsDing;

Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.

bakfiets = FietsDing

koersfiets = FietsDing

fietspad = FietsDing
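The difference between a head match and a component match can be sketched in Python. The compound decompositions below are hand-written and hypothetical; in practice the morphological analysis supplies them. The function names are illustrative only and are not part of any API.

```python
# Hypothetical compound decompositions, written by hand for this illustration.
COMPOUNDS = {
    "bakfiets": ["bak", "fiets"],
    "koersfiets": ["koers", "fiets"],
    "fietspad": ["fiets", "pad"],
}


def matches_head(word: str, part: str) -> bool:
    """True if `part` is the last component (the head) of the compound."""
    parts = COMPOUNDS.get(word, [word])
    return parts[-1] == part


def matches_component(word: str, part: str) -> bool:
    """True if `part` occurs anywhere among the components of the compound."""
    parts = COMPOUNDS.get(word, [word])
    return part in parts


words = ["bakfiets", "koersfiets", "fietspad"]
heads = [w for w in words if matches_head(w, "fiets")]
components = [w for w in words if matches_component(w, "fiets")]

print(heads)       # ['bakfiets', 'koersfiets']
print(components)  # ['bakfiets', 'koersfiets', 'fietspad']
```

fietspad is only a component match because fiets is its first part, not its head.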

Canonicals#

You can match on the canonical form of a concept by using the canonical input. This requires that a domain containing canonicals in its annotations, such as the entity domain, has been applied first.

lexicon:(input="canonical") {  Joe Biden, Emmanuel Macron, Boris Johnson } = President;

Emmanuel Macron won the elections. He was happy about it. The main opponent of Macron was Marine Le Pen.

Emmanuel Macron = President

he = President

Macron = President

Filter a concept#

rule :
{
    'Charlie' 'Hebdo'
} = wow::filter@(concept="Person");

You can combine it with a lexicon:

lexicon:
{
    Charlie Hebdo,
    Fannie Mae
} = NoPerson;

// Charlie Hebdo
rule :
{
    NoPerson
} = wow::filter@(concept="Person");

Wildcards#

Lexicon#

lexicon:{ (.)* (I|i)nc\. , (.)*soft } = company;

I worked at Mickysoft and Ashlar Inc. last year.

* Mickysoft = company

* Ashlar Inc. = company

Rules: within tokens#

rule:{ "(.)*ing" } = ing_words;

I was thinking while I was running that eating is a worldly pleasure.

* thinking = ing_words

* running = ing_words

* eating = ing_words
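As a rough analogy, the within-token wildcard behaves like a regular expression applied to each token as a whole. The Python sketch below assumes a naive word tokenizer and full-token matching; both are simplifications for illustration and do not describe the engine's tokenizer.

```python
import re

sentence = "I was thinking while I was running that eating is a worldly pleasure."

# Naive tokenization on word characters (an assumption for this sketch).
tokens = re.findall(r"\w+", sentence)

# The pattern must cover the whole token, so only words ending in "ing" qualify.
ing_words = [t for t in tokens if re.fullmatch(r".*ing", t)]

print(ing_words)  # ['thinking', 'running', 'eating']
```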

Rules: tokens#

rule:{  ( 'buy' | 'purchase') {(Det)? (Adj)* Nn} = Purchase } = buying_things;

I bought apples. He purchased a fancy fast car. That’s life

* bought apples = buying_things

* purchased a fancy fast car = buying_things

Match Character classes#

Classes#

These are the available classes: lower, upper, digit, xdigit, alnum, alpha, range, punct

lexicon:{  ([:upper:]){2}([:digit:]){2,3} } = aa_number;

Matches: AA234 and FZ93 but not aa123 or A123

AA234 = aa_number

FZ93 = aa_number
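For readers more familiar with regular expressions, the pattern above corresponds roughly to the following Python sketch. It assumes ASCII-only classes ([A-Z] for upper, [0-9] for digit) and whole-word matching; the engine's classes may cover more characters.

```python
import re

# Two uppercase letters followed by two or three digits (ASCII approximation).
AA_NUMBER = re.compile(r"[A-Z]{2}[0-9]{2,3}")

candidates = ["AA234", "FZ93", "aa123", "A123"]
matched = [c for c in candidates if AA_NUMBER.fullmatch(c)]

print(matched)  # ['AA234', 'FZ93']
```

aa123 fails because its letters are lowercase, and A123 fails because it has only one leading uppercase letter.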

lower#

Words made up entirely of lowercase characters:

lexicon:{  ([:lower:])+ } = all_lower_case;

Matches: these are lower case words. But NOT tHesE ONEs.

these = all_lower_case

are = all_lower_case

lower = all_lower_case

case = all_lower_case

words = all_lower_case

upper#

Words beginning with an uppercase character followed by one or more lowercase characters:

lexicon:{  [:upper:]([:lower:])+ } = initial_capitals;

Matches: these are Initial Uppercase words.

Initial = initial_capitals

Uppercase = initial_capitals

range#

Words beginning with two uppercase characters followed by two or three digits in the range 0-9:

lexicon:{  ([:range(A-Z):]){2}([:range(0-9):]){2,3} } = aa_numer;

Matches: AA234 and FZ93 but not aa123 or A123

AA234 = aa_numer

FZ93 = aa_numer

Matching alphanumeric and not found#

Match a word made up from alphanumeric characters which is not found in the morphological dictionary (+nf).

rule: {  <"([:alnum:])+", +nf> } = alnum_words;

Matches: these are strang34 45worD00s.

strang34 = alnum_words

45worD00s = alnum_words

Context Rules#

A context rule uses the surrounding context to annotate something. For instance, if someone kills a person, that person is the victim: 'kill' {Person}=Victim.

Matches words that are not found in lexicon#

Only the first ‘Mercedes’ is matched, because the context in the second sentence does not match the rule.

rule: {
        { Prop } = Car@(country="Germany")
        "is" "a" "German" "car"
};

A Mercedes is a German car. Mercedes is a nice person.

Mercedes = Car@(country="Germany")
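The idea behind this context rule can be sketched in Python: a proper noun is annotated only when the literal context follows it. The part-of-speech tags below are assigned by hand and the function name `find_cars` is hypothetical; this is an illustration of the principle, not of how the engine evaluates rules.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (text, part-of-speech tag)

# Hand-tagged tokens for the two example sentences (tags are assumptions).
SENTENCES: List[List[Token]] = [
    [("A", "Det"), ("Mercedes", "Prop"), ("is", "V"),
     ("a", "Det"), ("German", "Adj"), ("car", "Nn")],
    [("Mercedes", "Prop"), ("is", "V"),
     ("a", "Det"), ("nice", "Adj"), ("person", "Nn")],
]

# Literal context that must directly follow the proper noun.
CONTEXT = ["is", "a", "German", "car"]


def find_cars(tokens: List[Token]) -> List[str]:
    """Return proper nouns that are directly followed by the literal context."""
    hits = []
    for i, (text, tag) in enumerate(tokens):
        following = [t for t, _ in tokens[i + 1:i + 1 + len(CONTEXT)]]
        if tag == "Prop" and following == CONTEXT:
            hits.append(text)
    return hits


cars = [hit for sentence in SENTENCES for hit in find_cars(sentence)]
print(cars)  # ['Mercedes']
```

The second Mercedes is a proper noun too, but it is followed by "is a nice person", so it is not annotated.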

Find entities based on context#

lexicon: {baseball, tennis} = Sports;

rule:{
        {(<Prop>)+} = SportsPerson
        'be' 'a' Sports 'player'
};

Haduki Dingledong was a tennis player.

Haduki Dingledong = SportsPerson
tennis = Sports

Fun with rules#

Find NounPhrases#

rule: { (<Adj>)+ <Nn> } = NounPhrase;

He has a very big house.

very big house = NounPhrase

Hyphenation#

Words that have been split across lines by hyphenation are still matched.

rule: { (<h'fiets'>) } = Fiets;

Ik heb een grote groene bak-

fiets en een kleine koers-

fiets.

bakfiets = Fiets
koersfiets = Fiets

Semantic rules#

Rules: semantics (and adding attributes)#

lexicon: { fancy, nice} =  positive_adj;
lexicon: { ugly, polluting } =  negative_adj;

rule: {
        (positive_adj)+ <Nn>
} = Sentiment@(type="positive");

rule: {
        (negative_adj)+ <Nn>
} = Sentiment@(type="negative");

A fancy nice car is better than an ugly polluting truck.

fancy = positive_adj

nice = positive_adj

ugly = negative_adj

polluting = negative_adj

fancy nice car = Sentiment@(type="positive")

ugly polluting truck = Sentiment@(type="negative")