Howto’s#
Match literals#
To match a word exactly as it is written, use a literal match.
Matching using Lexicons#
Only the first kiwi in the sentence is an exact match: ‘kiwis’ is a different word form and ‘KIWI’ differs in case.
lexicon:{
kiwi
} = fruit;
Mary ate one kiwi. John ate two kiwis. I love a KIWI
kiwi = fruit
Matching using Rules#
Only the first kiwi in the sentence is an exact match: ‘kiwis’ is a different word form and ‘KIWI’ differs in case.
rule:{ "kiwi" } = fruit;
Mary ate one kiwi. John ate two kiwis. I love a KIWI
kiwi = fruit
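As a rough analogue outside the rule language, the Python sketch below reproduces the same exact-match behaviour with the standard `re` module. It assumes, as the example above suggests, that a literal match is case-sensitive and bounded by token edges; it illustrates the behaviour and is not how the engine itself works.

```python
import re

text = "Mary ate one kiwi. John ate two kiwis. I love a KIWI"

# \b anchors the pattern to word boundaries, so "kiwis" is rejected.
# The search is case-sensitive by default, so "KIWI" is rejected too.
matches = re.findall(r"\bkiwi\b", text)
print(matches)
```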
Match Stems#
Match on the root form (stem) of a word. For example, the stem ‘buy’ matches buy, bought, buying, …
Match using Lexicons#
lexicon:(input="stem"){
pêche
} = french_fruit;
J’ai des pommes et des pêches à vendre.
pêches = french_fruit
Lexicons - normalized stems#
With input="normalized_stem" the input is normalized before matching: accents are removed and the match is case insensitive.
lexicon:(input="normalized_stem"){
peche
} = french_fruit;
J’aime la confiture de pêches.
pêches = french_fruit
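The normalization step described above (accent removal plus case folding) can be sketched in Python with the standard `unicodedata` module. The helper name `normalize` is hypothetical, and the sketch covers only the normalization, not the stemming that reduces ‘pêches’ to ‘pêche’.

```python
import unicodedata

def normalize(word: str) -> str:
    """Strip accents and lowercase a word (hypothetical helper, no stemming)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

print(normalize("Pêche"))
```

This is why the unaccented lexicon entry `peche` can line up with accented input once both sides are normalized.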
Match using Rules#
Use single quotes to address the stem of a literal.
rule:{ ( 'pêche' | 'pomme' ) } = french_fruit;
J’ai des pommes et des pêches à vendre.
pommes = french_fruit
pêches = french_fruit
Match Part of Speech#
Matches Noun-Noun Adj-Noun sequences#
rule:{ ( Nn|Adj )+ Nn } = NounSequence;
The quick brown fox jumped over the lazy dog.
quick brown fox = NounSequence
lazy dog = NounSequence
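To make the sequence pattern concrete, here is a small Python sketch that applies the same idea, `( Nn|Adj )+ Nn`, to a list of tokens. The part-of-speech tags are assigned by hand for illustration (an assumption; the engine's own tagger supplies them in practice), and encoding tags as single characters is just a convenience for this sketch.

```python
import re

# Hand-assigned tags, for illustration only.
tagged = [
    ("The", "Det"), ("quick", "Adj"), ("brown", "Adj"), ("fox", "Nn"),
    ("jumped", "V"), ("over", "Prep"), ("the", "Det"),
    ("lazy", "Adj"), ("dog", "Nn"),
]

# Encode each tag as one character so a regex can scan the tag sequence:
# A = Adj, N = Nn, x = anything else.
code = {"Adj": "A", "Nn": "N"}
tag_string = "".join(code.get(tag, "x") for _, tag in tagged)

# ( Nn|Adj )+ Nn becomes [AN]+N over the encoded tags.
noun_sequences = [
    " ".join(word for word, _ in tagged[m.start():m.end()])
    for m in re.finditer(r"[AN]+N", tag_string)
]
print(noun_sequences)
```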
Matches Work as a verb#
Only the first ‘work’ is matched, because the ‘work’ in the second sentence is a noun, not a verb.
rule:{ <'work',V> } = Activity;
I work at the Hospital. The work there is good.
work = Activity
Compounds#
Heads#
Match words containing ‘fiets’ (‘bicycle’ in Dutch) as a head (the last part of the compound).
In lexicons:
lexicon:(input="head") { fiets } = Fiets;
In rules, use ‘h’ in front of the expression to match:
rule: { (<h'fiets'>) } = Fiets;
Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.
bakfiets = Fiets
koersfiets = Fiets
Component#
Match words containing ‘fiets’ in any part of the compound.
In lexicons:
lexicon:(input="component") { fiets } = Fiets;
In rules: use ‘c’ in front of the expression to match.
rule: { (<c'fiets'>) } = FietsDing;
Ik heb een grote groene bakfiets en een kleine koersfiets om op het fietspad te rijden.
bakfiets = FietsDing
koersfiets = FietsDing
fietspad = FietsDing
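The difference between a head match and a component match can be sketched in Python as follows. The compound decompositions are written by hand here (an assumption; the engine derives them from its own morphological analysis), and the function names are hypothetical.

```python
# Hypothetical, hand-written decompositions of the Dutch compounds above.
compounds = {
    "bakfiets": ["bak", "fiets"],
    "koersfiets": ["koers", "fiets"],
    "fietspad": ["fiets", "pad"],
}

def head_matches(part: str) -> list[str]:
    """Words whose last component (the head) equals `part`."""
    return [word for word, parts in compounds.items() if parts[-1] == part]

def component_matches(part: str) -> list[str]:
    """Words that contain `part` as any component."""
    return [word for word, parts in compounds.items() if part in parts]

print(head_matches("fiets"))
print(component_matches("fiets"))
```

‘fietspad’ only appears in the component result because ‘fiets’ is its first part, not its head.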
Canonicals#
You can match on the canonical form of a concept by using the canonical input. This requires that an earlier domain in the pipeline, such as the entity domain, has already added canonicals to the annotations.
lexicon:(input="canonical") { Joe Biden, Emmanuel Macron, Boris Johnson } = President;
Emmanuel Macron won the elections. He was happy about it. The main opponent of Macron was Marine Le Pen.
Emmanuel Macron = President
he = President
Macron = President
Filter a concept#
rule :
{
'Charlie' 'Hebdo'
} = wow::filter@(concept="Person");
You can combine it with a lexicon:
lexicon:
{
Charlie Hebdo,
Fannie Mae
} = NoPerson;
// Charlie Hebdo
rule :
{
NoPerson
} = wow::filter@(concept="Person");
Wildcards#
Lexicon#
lexicon:{ (.)* (I|i)nc\. , (.)*soft } = company;
I worked at Mickysoft and Ashlar Inc. last year.
Mickysoft = company
Ashlar Inc. = company
Rules: within tokens#
rule:{ "(.)*ing" } = ing_words;
I was thinking while I was running that eating is a worldly pleasure.
thinking = ing_words
running = ing_words
eating = ing_words
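A comparable check in Python, shown only as an analogue: the pattern is applied to each token as a whole, which is what "within tokens" suggests. The tokenization with `\w+` is a simplifying assumption and not the engine's tokenizer.

```python
import re

text = "I was thinking while I was running that eating is a worldly pleasure."

# Simplified tokenization: runs of word characters (assumption).
tokens = re.findall(r"\w+", text)

# fullmatch requires the whole token to fit the pattern "(.)*ing".
ing_words = [tok for tok in tokens if re.fullmatch(r"(.)*ing", tok)]
print(ing_words)
```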
Rules: tokens#
rule:{ ( 'buy' | 'purchase') {(Det)? (Adj)* Nn} = Purchase } = buying_things;
I bought apples. He purchased a fancy fast car. That’s life
bought apples = buying_things
purchased a fancy fast car = buying_things
Match Character classes#
Classes#
These are the available classes: lower, upper, digit, xdigit, alnum, alpha, range, punct.
lexicon:{ ([:upper:]){2}([:digit:]){2,3} } = aa_number;
Matches: AA234 and FZ93 but not aa123 or A123
AA234 = aa_number
FZ93 = aa_number
lower#
Words made up entirely of lowercase characters:
lexicon:{ ([:lower:])+ } = all_lower_case;
Matches: these are lower case words. But NOT tHesE ONEs.
these = all_lower_case
are = all_lower_case
lower = all_lower_case
case = all_lower_case
words = all_lower_case
upper#
Words beginning with an uppercase character followed by one or more lowercase characters:
lexicon:{ [:upper:]([:lower:])+ } = initial_capitals;
Matches: these are Initial Uppercase words.
Initial = initial_capitals
Uppercase = initial_capitals
range#
Words beginning with 2 uppercase characters followed by 2 or 3 digits in the range 0-9:
lexicon:{ ([:range(A-Z):]){2}([:range(0-9):]){2,3} } = aa_numer;
Matches: AA234 and FZ93 but not aa123 or A123
AA234 = aa_numer
FZ93 = aa_numer
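For readers more familiar with ordinary regular expressions, the sketch below translates the class notation used above into Python `re` syntax, which has no `[:upper:]`-style classes of its own. The ASCII-only mapping and the helper name `to_python_regex` are assumptions made for this illustration; the engine may treat these classes as Unicode-aware.

```python
import re
import string

# ASCII approximations of the class names listed above (assumption).
CLASSES = {
    "lower": "[a-z]",
    "upper": "[A-Z]",
    "digit": "[0-9]",
    "xdigit": "[0-9A-Fa-f]",
    "alnum": "[A-Za-z0-9]",
    "alpha": "[A-Za-z]",
    "punct": "[" + re.escape(string.punctuation) + "]",
}

def to_python_regex(pattern: str) -> str:
    """Rewrite [:name:] and [:range(a-b):] into Python character sets."""
    def repl(m: re.Match) -> str:
        name, bounds = m.group(1), m.group(2)
        if name == "range":
            return "[" + bounds + "]"
        return CLASSES[name]
    return re.sub(r"\[:([a-z]+)(?:\(([^)]*)\))?:\]", repl, pattern)

aa_number = re.compile(to_python_regex("([:upper:]){2}([:digit:]){2,3}"))
for word in ["AA234", "FZ93", "aa123", "A123"]:
    print(word, bool(aa_number.fullmatch(word)))
```

Under this mapping the `upper`/`digit` pattern and the `range(A-Z)`/`range(0-9)` pattern translate to the same expression, which is consistent with the two examples above accepting the same words.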
Matching alphanumeric and not found#
Match a word made up of alphanumeric characters that is not found in the morphological dictionary (+nf).
rule: { <"([:alnum:])+", +nf> } = alnum_words;
Matches: these are strang34 45worD00s.
strang34 = alnum_words
45worD00s = alnum_words
Context Rules#
A context rule uses the surrounding context to annotate something. For instance, if someone kills a person, that person is the victim: ‘kill’ {Person}=Victim.
Matches words that are not found in lexicon#
Only the first ‘Mercedes’ is matched, because the second sentence does not provide the required context.
rule: {
{ Prop } = Car@(country="Germany")
"is" "a" "German" "car"
};
A Mercedes is a German car. Mercedes is a nice person.
Mercedes = Car@(country="Germany")
Find entities based on context#
lexicon: {baseball, tennis} = Sports;
rule:{
{(<Prop>)+} = SportsPerson
'be' 'a' Sports 'player'
};
Haduki Dingledong was a tennis player.
Haduki Dingledong = SportsPerson
tennis = Sports
Fun with rules#
Find NounPhrases#
rule: { (<Adj>)+ <Nn> } = NounPhrase;
He has a very big house.
very big house = NounPhrase
Hyphenation#
Words that have been split by hyphenation are still matched.
rule: { (<h'fiets'>) } = Fiets;
Ik heb een grote groene bak-fiets en een kleine koers-fiets.
bakfiets = Fiets
koersfiets = Fiets
Semantic rules#
Rules: semantics (and adding attributes)#
lexicon: { fancy, nice} = positive_adj;
lexicon: { ugly, polluting } = negative_adj;
rule: {
(positive_adj)+ <Nn>
} = Sentiment@(type="positive");
rule: {
(negative_adj)+ <Nn>
} = Sentiment@(type="negative");
A fancy nice car is better than an ugly polluting truck.
fancy = positive_adj
nice = positive_adj
ugly = negative_adj
polluting = negative_adj
fancy nice car = Sentiment@(type="positive")
ugly polluting truck = Sentiment@(type="negative")