How-tos

Match literals

To match a word exactly as it is written (same spelling, same case), use a literal match.

Matching using Lexicons

Only the first ‘kiwi’ in the text is an exact match; ‘kiwis’ and ‘KIWI’ are not matched.

lexicon:{
    kiwi
} = fruit;

Mary ate one kiwi. John ate two kiwis. I love a KIWI.

  • kiwi = fruit

Matching using Rules

Only the first ‘kiwi’ in the text is an exact match; ‘kiwis’ and ‘KIWI’ are not matched.

rule:{ "kiwi" } = fruit;

Mary ate one kiwi. John ate two kiwis. I love a KIWI.

  • kiwi = fruit
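Here is a rough analogy in plain Python. It is not Wowool's own matcher. A literal match behaves like exact string equality on each token, so both case and inflection must be identical. The function name `literal_matches` and the simple word-character tokenizer are assumptions made for illustration.

```python
import re


def literal_matches(text: str, literal: str) -> list[str]:
    """Return the tokens of `text` that are exactly equal to `literal`."""
    # Naive tokenizer (assumption): each run of word characters is a token.
    tokens = re.findall(r"\w+", text)
    # Exact comparison: 'kiwis' and 'KIWI' are different strings from 'kiwi'.
    return [tok for tok in tokens if tok == literal]


text = "Mary ate one kiwi. John ate two kiwis. I love a KIWI."
print(literal_matches(text, "kiwi"))  # ['kiwi']
```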

Match Stems

Match on the root form (stem) of a word: for example, ‘buy’ matches buy, bought, buying, …

Match using Lexicons

lexicon:(input="stem"){
        pêche
} = french_fruit;

J’ai des pommes et des pêches à vendre.

  • pêches = french_fruit

Match using Lexicons: normalized stems

Use input="normalized_stem" to normalize the input before matching. Accents are removed and the match is case-insensitive.

lexicon:(input="normalized_stem"){
        peche
} = french_fruit;

J’aime la confiture de pêches.

  • pêches = french_fruit
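The stem and normalized-stem inputs can be approximated in Python as a sketch. The `STEMS` table is a hypothetical stand-in for Wowool's morphological dictionary. The normalization step strips combining marks after Unicode NFD decomposition and then applies `casefold()`. That is one common approach, and Wowool's exact normalization rules may differ.

```python
import unicodedata

# Hypothetical stem table; Wowool uses its own morphological dictionary.
STEMS = {"pêches": "pêche", "pommes": "pomme"}


def stem(token: str) -> str:
    """input="stem" (sketch): map an inflected form to its stem."""
    return STEMS.get(token, token)


def normalize(word: str) -> str:
    """Strip accents and fold case, e.g. 'PÊCHE' -> 'peche'."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()


def normalized_stem(token: str) -> str:
    """input="normalized_stem" (sketch): stem first, then normalize."""
    return normalize(stem(token))


# The lexicon entry 'peche' is compared against normalized stems.
lexicon = {normalize("peche")}
tokens = ["J’aime", "la", "confiture", "de", "pêches"]
print([t for t in tokens if normalized_stem(t) in lexicon])  # ['pêches']
```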

Match using Rules

Use single quotes to match on the stem of a literal.

rule:{ ( 'pêche' | 'pomme' ) } = french_fruit;

J’ai des pommes et des pêches à vendre.

  • pommes = french_fruit

  • pêches = french_fruit

Match Part of Speech

Match Noun-Noun and Adj-Noun sequences

rule:{ ( Nn|Adj )+ Nn } = NounSequence;

The quick brown fox jumped over the lazy dog.

  • quick brown fox = NounSequence

  • lazy dog = NounSequence
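The sketch below runs the same pattern as a Python regular expression over a tag sequence, which shows why the rule yields exactly these two spans. The tags are assigned by hand; in Wowool they come from the tagger. Tag names other than Nn, Adj, V and Det are assumptions.

```python
import re

# Hand-tagged tokens (assumption: in Wowool a tagger assigns these tags).
tagged = [
    ("The", "Det"), ("quick", "Adj"), ("brown", "Adj"), ("fox", "Nn"),
    ("jumped", "V"), ("over", "Prep"), ("the", "Det"), ("lazy", "Adj"),
    ("dog", "Nn"),
]


def noun_sequences(tagged):
    """Find spans matching ( Nn|Adj )+ Nn: Nn/Adj tags that end in an Nn."""
    # Encode each tag as one character so a regex can scan the tag sequence.
    code = {"Nn": "N", "Adj": "A"}
    tag_string = "".join(code.get(tag, "x") for _, tag in tagged)
    spans = []
    for m in re.finditer(r"[NA]+N", tag_string):
        # Character offsets in tag_string equal token offsets in `tagged`.
        spans.append(" ".join(word for word, _ in tagged[m.start():m.end()]))
    return spans


print(noun_sequences(tagged))  # ['quick brown fox', 'lazy dog']
```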

Match ‘work’ as a verb

Only the first ‘work’ matches, because the ‘work’ in the second sentence is a noun, not a verb.

rule:{  <'work',V> } = Activity;

I work at the Hospital. The work there is good.

  • work = Activity

Compounds

Heads

Match words that have ‘fiets’ (Dutch for ‘bicycle’) as their head (the last part of the compound).

In lexicons:

lexicon:(input="head") { fiets } = Fiets;

In rules, use ‘h’ in front of the expression to match:

rule: { (<h'fiets'>) } = Fiets;

Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.

  • bakfiets = Fiets

  • koersfiets = Fiets

Component

Match words containing ‘fiets’ in any part of the compound.

In lexicons:

lexicon:(input="component") { fiets } = FietsDing;

In rules, use ‘c’ in front of the expression to match:

rule: { (<c'fiets'>) } = FietsDing;

Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.

  • bakfiets = FietsDing

  • koersfiets = FietsDing

  • fietspad = FietsDing
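The difference between head and component matching can be sketched in Python. The `COMPOUNDS` table is a hypothetical stand-in for the compound decomposition Wowool performs with its Dutch dictionary. Only the lookup logic is meant to mirror input="head" and input="component", and the helper names are illustrative.

```python
# Hypothetical decompositions; Wowool splits compounds with its own dictionary.
COMPOUNDS = {
    "bakfiets": ["bak", "fiets"],
    "koersfiets": ["koers", "fiets"],
    "fietspad": ["fiets", "pad"],
}


def parts(word: str) -> list[str]:
    """Components of a word; a non-compound is its own single component."""
    return COMPOUNDS.get(word, [word])


def head_matches(tokens, target):
    """input="head" (sketch): the last component must equal `target`."""
    return [t for t in tokens if parts(t)[-1] == target]


def component_matches(tokens, target):
    """input="component" (sketch): any component may equal `target`."""
    return [t for t in tokens if target in parts(t)]


sentence = ("Ik heb een grote groene bakfiets en een kleine koersfiets "
            "om op het fietspad te rijden.")
tokens = sentence.rstrip(".").split()
print(head_matches(tokens, "fiets"))       # ['bakfiets', 'koersfiets']
print(component_matches(tokens, "fiets"))  # ['bakfiets', 'koersfiets', 'fietspad']
```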

Canonicals

You can match the canonical form of a concept by using input="canonical". This requires a domain that runs earlier and adds canonicals to its annotations, such as the entity domain.

lexicon:(input="canonical") {  Joe Biden, Emmanuel Macron, Boris Johnson } = President;

Emmanuel Macron won the elections. He was happy about it. The main opponent of Macron was Marine Le Pen.

  • Emmanuel Macron = President

  • He = President

  • Macron = President

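Filter a concept

To remove a concept from a matched span, assign the match to wow::filter with the concept to remove. For example, the following stops ‘Charlie Hebdo’ from being annotated as a Person:
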
rule :
{
    'Charlie' 'Hebdo'
} = wow::filter@(concept="Person");

You can combine it with a lexicon:

lexicon:
{
    Charlie Hebdo,
    Fannie Mae
} = NoPerson;

// Charlie Hebdo
rule :
{
    NoPerson
} = wow::filter@(concept="Person");

Wildcards

Lexicon

lexicon:{ (.)* (I|i)nc\. , (.)*soft } = company;

I worked at Mickysoft and Ashlar Inc. last year.

  • Mickysoft = company

  • Ashlar Inc. = company

Rules: within tokens

rule:{ "(.)*ing" } = ing_words;

I was thinking while I was running that eating is a worldly pleasure.

  • thinking = ing_words

  • running = ing_words

  • eating = ing_words
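Within a quoted token, the wildcard `(.)*` behaves much like `.*` in a regular expression applied to the whole token. The sketch below approximates the within-token rule and the `(.)*soft` lexicon entry in Python, using a hypothetical `token_wildcard` helper and a naive tokenizer. It does not model wildcards that span several tokens, such as `(.)* (I|i)nc\.`.

```python
import re


def token_wildcard(text: str, pattern: str) -> list[str]:
    """Return tokens whose entire text matches `pattern`."""
    # Naive tokenizer (assumption): runs of word characters.
    tokens = re.findall(r"\w+", text)
    # fullmatch anchors the pattern to the whole token, like a quoted literal.
    return [t for t in tokens if re.fullmatch(pattern, t)]


# "(.)*ing" is written as ".*ing" in Python syntax.
print(token_wildcard(
    "I was thinking while I was running that eating is a worldly pleasure.",
    r".*ing",
))  # ['thinking', 'running', 'eating']

# The lexicon entry "(.)*soft" works the same way on single tokens.
print(token_wildcard("I worked at Mickysoft and Ashlar Inc. last year.", r".*soft"))
# ['Mickysoft']
```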

Rules: tokens

rule:{  ( 'buy' | 'purchase') {(Det)? (Adj)* Nn} = Purchase } = buying_things;

I bought apples. He purchased a fancy fast car. That’s life.

  • bought apples = buying_things

  • purchased a fancy fast car = buying_things

Match Character classes

Classes

The available classes are: lower, upper, digit, xdigit, alnum, alpha, range and punct.

lexicon:{  ([:upper:]){2}([:digit:]){2,3} } = aa_number;

Matches: AA234 and FZ93, but not aa123 or A123.

  • AA234 = aa_number

  • FZ93 = aa_number
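For comparison, the classes used above map roughly onto Python regular-expression character sets. The ASCII ranges in `CLASSES` are an assumption, and Wowool's classes may also accept accented or other non-ASCII letters. `range` corresponds to a bracket range such as `[A-Z]`. `punct` is left out of this sketch.

```python
import re

# Assumed ASCII equivalents; Wowool's classes may also cover non-ASCII letters.
CLASSES = {
    "lower": "[a-z]",
    "upper": "[A-Z]",
    "digit": "[0-9]",
    "xdigit": "[0-9A-Fa-f]",
    "alnum": "[A-Za-z0-9]",
    "alpha": "[A-Za-z]",
}

# ([:upper:]){2}([:digit:]){2,3}  ->  [A-Z]{2}[0-9]{2,3}
aa_number = re.compile(CLASSES["upper"] + "{2}" + CLASSES["digit"] + "{2,3}")


def is_aa_number(token: str) -> bool:
    # fullmatch: the whole token must fit the pattern, not just a prefix.
    return aa_number.fullmatch(token) is not None


for token in ["AA234", "FZ93", "aa123", "A123"]:
    print(token, is_aa_number(token))
```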

lower

Words made up entirely of lowercase characters:

lexicon:{  ([:lower:])+ } = all_lower_case;

Matches: these are lower case words. But NOT tHesE ONEs.

  • these = all_lower_case

  • are = all_lower_case

  • lower = all_lower_case

  • case = all_lower_case

  • words = all_lower_case

upper

Words that begin with an uppercase character followed by one or more lowercase characters:

lexicon:{  [:upper:]([:lower:])+ } = initial_capitals;

Matches: these are Initial Uppercase words.

  • Initial = initial_capitals

  • Uppercase = initial_capitals

range

Words that begin with two uppercase characters followed by two or three digits in the range 0-9:

lexicon:{  ([:range(A-Z):]){2}([:range(0-9):]){2,3} } = aa_number;

Matches: AA234 and FZ93, but not aa123 or A123.

  • AA234 = aa_number

  • FZ93 = aa_number

Match alphanumeric words not found in the dictionary

Match a word made up of alphanumeric characters that is not found in the morphological dictionary (+nf).

rule: {  <"([:alnum:])+", +nf> } = alnum_words;

Matches: these are strang34 45worD00s.

  • strang34 = alnum_words

  • 45worD00s = alnum_words
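A rough Python equivalent of `<"([:alnum:])+", +nf>` checks two conditions per token. First, the token must be purely alphanumeric. Second, it must be absent from the dictionary. `DICTIONARY` is a tiny hypothetical stand-in for the morphological dictionary, and the case-insensitive lookup is likewise an assumption.

```python
import re

# Hypothetical stand-in for the morphological dictionary.
DICTIONARY = {"these", "are"}


def alnum_not_found(text: str) -> list[str]:
    """Tokens made only of letters/digits that the dictionary does not know."""
    tokens = re.findall(r"\w+", text)
    return [
        t for t in tokens
        if re.fullmatch(r"[A-Za-z0-9]+", t)   # ([:alnum:])+
        and t.lower() not in DICTIONARY       # +nf: not found (assumed case-insensitive)
    ]


print(alnum_not_found("these are strang34 45worD00s."))  # ['strang34', '45worD00s']
```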

Context Rules

A context rule uses the surrounding context to annotate something. For instance, if someone kills a person, that person is the victim: ‘kill’ {Person}=Victim.

Match words that are not found in a lexicon

Only the first ‘Mercedes’ is matched, because the context in the second sentence does not match.

rule: {
        { Prop } = Car@(country="Germany")
        "is" "a" "German" "car"
};

A Mercedes is a German car. Mercedes is a nice person.

  • Mercedes = Car@(country="Germany")
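The logic of this context rule can be sketched in Python. A Prop token is annotated only when the tokens that follow it are exactly "is a German car". The tags are hand-assigned, and the names `CONTEXT` and `german_cars` are hypothetical. In Wowool, the rule engine does this matching for you.

```python
# Hand-tagged tokens (assumption: in Wowool the tagger supplies these tags).
tagged = [
    ("A", "Det"), ("Mercedes", "Prop"), ("is", "V"), ("a", "Det"),
    ("German", "Adj"), ("car", "Nn"), (".", "Punct"),
    ("Mercedes", "Prop"), ("is", "V"), ("a", "Det"),
    ("nice", "Adj"), ("person", "Nn"), (".", "Punct"),
]

CONTEXT = ["is", "a", "German", "car"]


def german_cars(tagged):
    """Annotate a Prop token only when the words after it equal CONTEXT."""
    found = []
    for i, (word, tag) in enumerate(tagged):
        following = [w for w, _ in tagged[i + 1:i + 1 + len(CONTEXT)]]
        if tag == "Prop" and following == CONTEXT:
            # Only the captured Prop is annotated, with its attribute.
            found.append((word, {"country": "Germany"}))
    return found


print(german_cars(tagged))  # [('Mercedes', {'country': 'Germany'})]
```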

Find entities based on context

lexicon: {baseball, tennis} = Sports;

rule:{
        {(<Prop>)+} = SportsPerson
        'be' 'a' Sports 'player'
};

Haduki Dingledong was a tennis player.

  • Haduki Dingledong = SportsPerson

  • tennis = Sports

Fun with rules

Find noun phrases

rule: { (<Adj>)+ <Nn> } = NounPhrase;

He has a very big house.

  • very big house = NounPhrase

Hyphenation

Words that have been split across lines by hyphenation are still matched:

rule: { (<h'fiets'>) } = Fiets;

Ik heb een grote groene bak-
fiets en een kleine koers-
fiets.

  • bakfiets = Fiets

  • koersfiets = Fiets

Semantic rules

Rules: semantics (and adding attributes)

lexicon: { fancy, nice} =  positive_adj;
lexicon: { ugly, polluting } =  negative_adj;

rule: {
        (positive_adj)+ <Nn>
} = Sentiment@(type="positive");

rule: {
        (negative_adj)+ <Nn>
} = Sentiment@(type="negative");

A fancy nice car is better than an ugly polluting truck.

  • fancy = positive_adj

  • nice = positive_adj

  • ugly = negative_adj

  • polluting = negative_adj

  • fancy nice car = Sentiment@(type="positive")

  • ugly polluting truck = Sentiment@(type="negative")
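Together, the lexicons and rules above behave roughly like the Python sketch below. A run of adjectives from one sentiment list, followed by a noun, produces a Sentiment span with a type attribute. The fixed `NOUNS` set stands in for the <Nn> tag. Like the function name `sentiments`, it is an assumption made for illustration.

```python
POSITIVE = {"fancy", "nice"}      # positive_adj lexicon
NEGATIVE = {"ugly", "polluting"}  # negative_adj lexicon
NOUNS = {"car", "truck"}          # stand-in for the <Nn> tag (assumption)


def sentiments(tokens):
    """Find (adjective)+ noun runs and label them with a sentiment type."""
    results = []
    i = 0
    while i < len(tokens):
        for adjectives, label in ((POSITIVE, "positive"), (NEGATIVE, "negative")):
            j = i
            while j < len(tokens) and tokens[j] in adjectives:
                j += 1  # consume one or more adjectives from the same list
            if i < j < len(tokens) and tokens[j] in NOUNS:
                results.append((" ".join(tokens[i:j + 1]), {"type": label}))
                i = j  # jump to the noun; the increment below moves past it
                break
        i += 1
    return results


tokens = "A fancy nice car is better than an ugly polluting truck".split()
for span, attrs in sentiments(tokens):
    print(span, attrs)
# fancy nice car {'type': 'positive'}
# ugly polluting truck {'type': 'negative'}
```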