Movie Murder Investigation#

This tutorial requires previous knowledge of the basics of wowool.

We are going to go through how to put together a whole project. This is a complex project that will allow us to touch many linguistic issues, so it is a good basis to sharpen your linguistic knowledge.

Let’s imagine that we are the police and we are trying to find criminal actions in movies and important actors: weapons, victims, perpetrators and so on.

Through this project we will learn about Wowool constructs such as:

  • corpus analysis

  • lexicons

  • regular expressions

  • token properties

  • rules

  • containers

  • using existing domains

  • using filters

  • longest match, shortest match and shortest match attribute

  • unordered match

And we will introduce important linguistic concepts:

  • stems

  • part of speech

  • noun phrases

  • verb phrases

  • active and passive sentences

  • coordination

  • ellipsis of the subject

  • collocations

We are going to walk you through how to create lexicons of characters, weapons and verbs of harm and then how to put everything together with the help of rules to find perpetrators, victims and murder weapons.

Setting up the environment#

  • Go to your bin directory.

  • Download the “movies” corpus and copy it with the name movies in your bin directory.

  • Create a directory called movie-rules. That’s where we are going to place our rules.

Creating lexicons#

First we are going to create a series of lexicons that will help us to define later who are the bad ones and the victims.

Character Lexicon#

A very important thing to do is to identify the players, in this case the characters that appear in the movie, to see later who is causing havoc.

The names of characters are typically one or more proper nouns (<’Prop’>), like “Batman” or “Buffalo Bill”. If we search for them, we will see many pass by:

wow -p english -f movies -e "(Prop)+"

Note that in the Windows terminal you can sometimes get weird characters: “Katsushir┼ì”. This is because the locale of the terminal is not utf-8. If you redirect your output to a file, this is not a problem.

We get a lot of characters, like ‘Harry’, ‘Jack’, ‘Vader’, but also other types of proper nouns, like ‘FBI’, ‘New York City’ or ‘Mount Doom’. We could take this list and clean it, but it has 1740 items, so it could take us a couple of hours to go through them.

We could also write a more restrictive rule. For instance, we could filter out other entities that we know are not characters, like City, Organization, Company, Facility, Country and WorldRegion, by using the english-entity domain.

wow -p 'english,english-entity' -f movies \
    -e '{(Prop)+}= Character'    \
    -e '{ Character[^ (City|Organization|Country|PlaceAdj|Facility|Company|WorldRegion|Event) ^] }= wow::filter@(concept="Character")'

Some notes:

  • The square brackets mean a container. In Wowool all annotations are containers, which means that we can write rules that match within the scope of the annotation. In this case, we are going to look inside the scope of what we have defined as a Character (just any series of proper nouns) and check if there are any cities, countries and so on.

  • The ^ in the container is what we call an anchor: a ^ to the left “[^” means that the annotation has to start with the expression that follows. To the right “^]”, it means that the annotation has to end with that expression. In the rule above, we want the entities to start and end at the same points as the Character annotation. So, if we find our character to be “New York City”, which is also a City, it will be filtered out, because the beginning and end coincide. On the other hand, if we have a character called “Abraham Lincoln” and a City called “Lincoln”, the character won’t be filtered out, because it does not have the same start and end points as the city.
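The effect of the two anchors can be pictured with a small plain-Python sketch. This is not Wowool code: the spans, offsets and the function name drop_exact_overlaps are invented for illustration, and the sketch only mimics the idea that a Character is dropped when an entity covers exactly the same begin and end offsets.

```python
# Illustration only: spans are (text, begin_offset, end_offset) tuples with made-up offsets.
characters = [("New York City", 10, 23), ("Abraham Lincoln", 40, 55)]
entities = [("New York City", 10, 23), ("Lincoln", 48, 55)]


def drop_exact_overlaps(characters, entities):
    """Keep only the characters whose span does not coincide exactly with an entity span."""
    entity_spans = {(begin, end) for _, begin, end in entities}
    return [c for c in characters if (c[1], c[2]) not in entity_spans]


kept = drop_exact_overlaps(characters, entities)
print(kept)
```

“Abraham Lincoln” survives because the entity “Lincoln” shares only its end offset, not its start offset.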

This rule gives us more accurate results, as many common entities have been filtered out, but it still cannot get rid of fictional cities like ‘Duloc’ or other entities that we are not capturing, like ‘Project Mayhem’ or ‘Middle Earth’. But anyway, it is a start and we can clean the file as we go along.

To write the results directly to a wow file, use the -o option:

wow -p "english,english-entity" -f movies \
    -e '{(Prop)+}= Character'  \
    -e '{ Character[^ (City|Organization|Country|PlaceAdj|Facility|Company|WorldRegion|Event) ^] }= wow::filter@(concept="Character")'   \
    -o movie-rules/character.wow

Open the file movie-rules/character.wow. Search for ‘=’. You will find 2 annotations: CAPTURE and Character. CAPTURE is everything that we have matched, so it is too extensive. Delete the CAPTURE lexicon and leave only the Character lexicon.

Now you have a starting point. Go through the file and clean up a bit:

  • Take away every name that you do not recognize directly as a character (e.g. “Black Pearl”), but be careful because there are very unusual names in the files.

  • Get rid of positions, professions and forms of address (Mr.) if the person is not always addressed like that:

    Lieutenant Colonel Bill Kilgore -> Bill Kilgore

  • Replace ‘ and . by the escaped sequences \' and \.

    Delmar O’Donnell -> Delmar O\'Donnell

If you can’t decide if a word is a character, get rid of it, or search for it:

wow -p english -f movies -e "<'Anchor'>"

You can see the word in context and decide:

DOCUMENT::grep:movies/FindingNemo.txt
SENTENCE:(612,674) The pair encounter three sharks named Bruce , Anchor , and Chum .
CAPTURE -> Anchor

A Character, thus.

Remember that if there are 2 words, you need two tokens:

wow -p english -f movies -e "'Arch' 'Stanton'"

Save that file as character.wow in your movie-rules directory. It does not matter if it is not perfect; we can clean it as we go along.

Now you can try your new annotation:

wow -p english,movie-rules -f movies -e 'Character'

You can see that the most popular names in our movies are “Jack”, “Harry” and “Danny”.

But we seem to be missing Characters with middle names in initials: “Walter E . Kurtz”. This happens because the ‘.’ is not a proper noun; it is a punctuation mark (Punct).

When the ‘.’ is preceded by a single letter, the tokenizer also splits the token in two, so we need to take this into account. Let’s write a discovery rule and see what we find:

wow -p 'english,english-entity,movie-rules' -f movies -e "(Prop) '[:upper:]' '\.' (Prop)"

Note that the period ‘.’ is a token on its own. If you are unsure of how a string is tokenized, mainly because there could be punctuation involved, the best way to find out is to run the string:

wow -p english -i "Walter E. Kurtz"

t(0,6) "Walter" (Init-Cap, Init-Token)['Walter':Prop-Std]
t(7,8) "E" (NF)['E':Prop-Std]
t(8,9) "." ['.':Punct-Sent]
t(10,15) "Kurtz" (Init-Cap, NF)['Kurtz':Prop-Std]

Every line that starts with t(begin_offset,end_offset) is a token, so ‘.’ is a separate token.
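If you want to inspect such token listings programmatically, a regular expression over the lines is enough for a quick look. The sketch below is plain Python and makes an assumption: it treats the four lines shown above as the whole format, so any variation in the real tool's output is not covered. The name TOKEN_LINE and the dictionary keys are made up for this example.

```python
import re

# Pattern derived only from the four sample lines above (an assumption, not a spec).
TOKEN_LINE = re.compile(
    r't\((?P<begin>\d+),(?P<end>\d+)\) '
    r'"(?P<literal>[^"]*)" '
    r"(?:\((?P<props>[^)]*)\))?"
    r"\['(?P<stem>[^']*)':(?P<pos>[^\]]+)\]"
)

SAMPLE = """\
t(0,6) "Walter" (Init-Cap, Init-Token)['Walter':Prop-Std]
t(7,8) "E" (NF)['E':Prop-Std]
t(8,9) "." ['.':Punct-Sent]
t(10,15) "Kurtz" (Init-Cap, NF)['Kurtz':Prop-Std]
"""


def parse_tokens(text):
    """Turn each matching line into a dict; lines that do not match are skipped."""
    tokens = []
    for line in text.splitlines():
        match = TOKEN_LINE.fullmatch(line.strip())
        if match is None:
            continue
        props = match["props"].split(", ") if match["props"] else []
        tokens.append({
            "begin": int(match["begin"]),
            "end": int(match["end"]),
            "literal": match["literal"],
            "props": props,
            "stem": match["stem"],
            "pos": match["pos"],
        })
    return tokens


tokens = parse_tokens(SAMPLE)
for token in tokens:
    print(token["begin"], token["end"], token["literal"], token["pos"])
```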

By running the discovery rule we get:

Benjamin L . Willard
Gillian B . Loeb
Henry F . Potter
Walter E . Kurtz
John H . Miller
Vernon T . Waldrip
John F . Kennedy

So, let’s add these gentlemen. Do not forget to escape the ‘.’:

Benjamin L \. Willard,
Gillian B \. Loeb,
Henry F \. Potter,
Walter E \. Kurtz,
John H \. Miller,
Vernon T \. Waldrip,
John F \. Kennedy,

As a big fan of Star Wars, I can see that we are also missing names like “Obi-Wan Kenobi”. Some names have hyphens, so, to find those we are going to make another rule that will make use of token properties and regular expressions:

wow -p english -f movies  -e " +Init-Cap '-' '[:upper:](.)+' (Prop)+"

Some explanation:

  • You see that our rule has 3 tokens, even though the words we are after (“Woo-jin”) are written together as one word. The tokenizer splits these words into 3 tokens for analysis convenience, so we need 3 tokens <> <> <>.

  • +Init-Cap means that the token starts with an initial uppercase letter and is a token property. Token properties are assigned by the tokenizer and they appear in between ( ) after the literals:

wow -p english -i "Woo"
t(0,3) "Woo" (Init-Cap, Init-Token)['Woo':Prop-Std]

To use them in a rule, we precede them with a +: +Init-Cap, +Init-Token

  • <’[:upper:](.)+’> means that the stem has to start with an upper case. The rest is not important.

By running this rule, we discover additional characters that we can add now to our lexicon in the character.wow file:

Lloyd - Hughes,
Ben - Hur,
Witch - King,
Obi - Wan Kenobi,
Jean - Jacques Saurel,
Obi - Wan,
Cha - Cha,
Dae - Su,
No - Face,

Because we are in a movie context, we find more unusual names. For instance, in the movie 12 Angry Men, the members of the jury are identified as “Juror” and then a number: “An angry Juror 3 accuses Juror 5”. To cover all 12 jurors, we could use a regular expression:

Juror (1)?[:digit:],

The optional digit is a one (1), because we assume that the only jurors with 2 digits would be: 10,11 and 12, all starting with “1”.
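To check what such a pattern accepts, here is a rough Python equivalent. It assumes that [:digit:] corresponds to a single character in 0-9; the Python pattern and the name JUROR are only an approximation of the lexicon entry, not the Wowool syntax itself.

```python
import re

# Approximate Python counterpart of "Juror (1)?[:digit:]" (assumed mapping).
JUROR = re.compile(r"Juror 1?[0-9]")

for candidate in ["Juror 3", "Juror 12", "Juror 25", "Juror 19"]:
    print(candidate, bool(JUROR.fullmatch(candidate)))
```

Note that the pattern is slightly broader than needed: it also accepts “Juror 0” and “Juror 13” to “Juror 19”, which is harmless for this movie.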

Add this to your character wow:

lexicon :
{
    Juror (1)?[:digit:],
} = Character;

I put it in a separate lexicon, but you can add them to the existing one.

Weapon Lexicon#

To find weapons, we are going to resort to something called collocations.

Collocations refer to the context in which a word appears. For instance, if we take the word ‘gun’ and look at its left context, we might find that “machine gun” comes up often; in its right context we might find “guns and ammunition”. So, it is a sort of analysis of the context.

Collocations are very useful, because words that are similar tend to appear in the same context: in “He killed him with a …”, we will probably find “gun”, “arrow” or “knife”, as they are instruments of killing.

Let’s use a very simple rule to find collocations: let’s look for 2 words to the left of those 3 weapons and see what we get. We will capture those words and add an annotation ‘Left’ to see them more clearly:

wow -p english -f movies -e "{<><>} = Left ('gun'|'knife'|'arrow')"

The top results we get are:

with submachine
1 -> with his
2 -> with an
2 -> holding a
3 -> with a

We see that we have a number of collocations that start with “with”, which makes sense because “with” (a preposition) is often used to point at an instrument.

  • A preposition (Prep) expresses direction (‘to’, ‘towards’), instrument (‘with’), company (‘with’), location (‘on’, ‘into’, ‘behind’), time (‘during’, ‘after’) or possession (‘of’).
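The left-context counting done by the discovery rule can be imitated in a few lines of plain Python. This is only a sketch of the idea: the sample text is invented, the function name left_collocations is hypothetical, and unlike the Wowool rule it compares surface forms, not stems.

```python
import re
from collections import Counter


def left_collocations(text, targets, width=2):
    """Count the `width` words that precede each occurrence of a target word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for index, word in enumerate(words):
        if word in targets and index >= width:
            counts[" ".join(words[index - width:index])] += 1
    return counts


# Invented sample text, not taken from the movies corpus.
sample = (
    "He threatens her with a gun. She stabs him with a knife. "
    "A guard holding a gun arrives. He is shot with an arrow."
)
counts = left_collocations(sample, {"gun", "knife", "arrow"})
for context, count in counts.most_common():
    print(count, "->", context)
```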

So let’s reverse the rule and see what nouns we get behind “with”.

We could make it loose, at the risk of getting many results:

./wow -p english -f movies -e " 'with' .. (Nn)+"

We could go about it in a more linguistic way, which will give us better results:

wow -p english -f movies -e "'with' (Det)? (Adj|Prop)* (Nn)+"

So I am asking for ‘with’, optionally followed by a determiner (Det), then by zero or more adjectives (Adj) or proper nouns (Prop). My expression needs to end with a noun (Nn), as weapons are usually nouns.

A small reminder of what these parts of speech are:

  • A determiner (Det) is a word that ‘determines’ what kind of thing we are talking about. Determiners are words such as ‘the’, ‘a’, ‘this’, ‘another’, ‘your’. For instance, ‘the’ usually introduces a noun that we already know (‘The man did not want to enter’) and ‘a’ an unspecified noun (‘a child told me’); others show position (‘this’, ‘that’) or who owns the thing (‘my’, ‘his’). A determiner always appears at the beginning of a noun phrase.

  • An adjective (Adj) gives us a specification or qualification of the noun it is talking about: ‘ugly’, ‘expensive’, ‘white’, ‘systemic’. It usually appears before a noun or proper noun: ‘[entertaining] movie’. It can also appear after verbs like ‘be’ referring to the subject: “this phone is [expensive]”.

If you find it too complicated, then, try using the noun phrases (NP) from the entity domain:

wow -p english,english-entity -f movies -e "'with' NP[ Nn ^]"

  • A noun phrase (NP) is a group of words that go together around a Noun or a Proper Noun, which is the head of the NP: “Henry”, “life”, “their relationship”, “optimal weather conditions” or “evasive and inconsistent answers” are all noun phrases.

This “NP[ <Nn> ^]” means that within the noun phrase there should be a noun and it should be anchored (^) at the end of the noun phrase, which means that the last word of the NP should be a noun. We do that to restrict the output a bit, as weapons are usually nouns and not proper nouns.

We can now go quickly through the list and start collecting a lexicon with the weapons we find. Add them to a file in movie-rules called weapon.wow

lexicon : (input="stem")
{
    arrow,
    ax,
    axe,
    (bamboo )?spear,
    baseball bat,
    baton,
    bomb,
    explosive,
    gun,
    (rock )?hammer,
    hatchet,
    katana,
    knife,
    lasso,
    machete,
    Molotov cocktail,
    pistol,
    rifle,
    switchblade,
    sword,
    weapon,
} = Weapon;

Note

  • Use (input="stem"), so that your annotation will match both the singular and plural (‘spear’ and ‘spears’).

  • If a word is in plural (guns) you need to put it in the singular in the lexicon (“gun”), otherwise you won’t match the stems of the words.

  • You can generalize expressions by making elements optional: “(bamboo )?spear” . Both “bamboo spear” or just “spear” are valid weapons. Note how in the optional part we have included the space after the word, as in lexicons the space is a character we need to account for.

If you think of more weapons off the top of your head (“machine gun”, “chainsaw”), just add them to the list.

Try again:

wow -p "english,movie-rules" -f movies -e "Weapon"

As you go through your results, you might encounter some issues:

Chuck has adapted to the island ‘s meager living conditions , having become adept at spearing fish and making fires .
CAPTURE -> spearing

In this case, ‘spearing’ is not a weapon. We have asked to match the stem, and ‘spear’ is the stem of the noun, which is the weapon, but also of the verb, which is an action.

Let’s fix it by deleting ‘spear’ from our lexicon and adding the following rule, which will match only the noun:

rule:
{
    <'spear',Nn>
} = Weapon;
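The difference between matching on the stem alone and matching on stem plus part of speech can be shown with a tiny plain-Python sketch. The tokens, their stems and their part-of-speech tags are assigned by hand here (an assumption standing in for the real analysis), and the names WEAPON_STEMS, NOUN_ONLY_STEMS and is_weapon are hypothetical.

```python
# Hand-written (literal, stem, part_of_speech) triples; a real analyzer would produce these.
tokens = [
    ("spearing", "spear", "V"),
    ("spears", "spear", "Nn"),
    ("guns", "gun", "Nn"),
    ("fish", "fish", "Nn"),
]

WEAPON_STEMS = {"gun", "knife"}   # matched on the stem only, like the lexicon
NOUN_ONLY_STEMS = {"spear"}       # matched only when the token is a noun, like the rule


def is_weapon(stem, part_of_speech):
    """Stem lookup for ordinary entries, stem plus noun check for ambiguous ones."""
    if stem in WEAPON_STEMS:
        return True
    return stem in NOUN_ONLY_STEMS and part_of_speech == "Nn"


weapons = [literal for literal, stem, pos in tokens if is_weapon(stem, pos)]
print(weapons)
```

The verb form “spearing” is skipped while the plural noun “spears” is still found through its stem.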

You might also find more weapons just by going through the results of the Weapon search above:

car bomb,
chef 's knife,
electromagnetic pulse weapon,
handsaw,
machine gun,
plasma weapon,
warhammer,

Add whatever you see. You will find more things every time you iterate through your results.

Harm Lexicon#

Now that we have our characters and weapons, let’s try to find harmful actions. Actions are expressed mainly by verbs, so let’s see what kind of verbs appear in our movies:

wow -p english -f movies -e " V "

Think of the pattern ‘Character harms Character’, where ‘harm’ is the verb you want to capture, like ‘kill’.

We can already select some of the verbs that mean harming someone (see below), and maybe we can make a classification of how bad the harm is, like if it results necessarily in death or not:

Deadly:

lexicon : (input="stem")
{
    assassinate,
    choke,
    crucify,
    dismember,
    drown,
    electrocute,
    eliminate,
    execute,
    garrote,
    impale,
    kill,
    liquidate,
    massacre,
    murder,
    slaughter,
    slay,
    strangle,
    suffocate,
} = Harm@(type="deadly");

Non-deadly or unclear:

lexicon : (input="stem")
{
    abuse,
    ambush,
    assault,
    attack,
    avenge,
    beat,
    cripple,
    damage,
    deceive,
    defeat,
    disfigure,
    drug,
    enslave,
    extort,
    frighten,
    grope,
    harm,
    humiliate,
    imprison,
    injure,
    insult,
    intimidate,
    kidnap,
    knock,
    mistreat,
    mutilate,
    punish,
    rape,
    sabotage,
    sodomize,
    threaten,
    torment,
    torture,
    violate,
    wound,
    wrestle,
} = Harm@(type="non-deadly");

lexicon : (input="stem")
{
    shoot,
    stab,
} = Harm@(type="unclear");
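Conceptually, the type attribute attaches a label to every verb stem in the lexicon. A minimal plain-Python picture of that idea, using a handful of stems from the lists above, could look as follows; the dictionary HARM_TYPES and the function harm_type are illustrative names, not part of Wowool.

```python
# A few stems from the three lexicons above, mapped to their type attribute.
HARM_TYPES = {
    "kill": "deadly",
    "strangle": "deadly",
    "kidnap": "non-deadly",
    "wound": "non-deadly",
    "shoot": "unclear",
    "stab": "unclear",
}


def harm_type(stem):
    """Return the harm type for a verb stem, or None if the verb is not a harm verb."""
    return HARM_TYPES.get(stem)


for stem in ["kill", "shoot", "sing"]:
    print(stem, harm_type(stem))
```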

Harm rules#

With rules we can do a lot of interesting things, as they are not just lists of words: they deal with abstractions and relations.

First let’s create a file harm-fact.wow. That’s where we will copy our rules and lexicons to put together the complete harm action.

Let’s start and create our first Harm rule:

wow -p 'english,movie-rules' -f movies -e "Character Harm Character"

We get some good results and we get 2 dangerous villains:

1 -> Hadley beats Bogs
1 -> Hadley murder Tommy
1 -> Quaid killing Lori
1 -> Quaid kills Benny

We can already write a rule in our harm-fact.wow to get us started:

rule:
{
    {Character} = Perpetrator
    Harm
    {Character} = Victim
} = HarmFact;

Try to find good names for your annotations so that your rules become readable. For our own sake, we are going to call this rule “basic rule”, as we will be coming back to it.

With that rule, we get the fact as well as the actors (the sub annotations Perpetrator and Victim).

But this rule is very restrictive, and we are sure that there is much more going on. A simple way to get more recall is to use verb phrases (VP) instead of just the Harm verb and to allow for an optional adverb (Adv, like “accidentally”) in front of the verb phrase.

A verb phrase is a group of words that go together with a verb, which is known as the main verb. It could be a simple verb, like “eat”, or a more complex construction, like “must have been hidden”, “can sing” or “is not working”. Verbs that help the main verb but do not carry the most important meaning, like ‘can’ or ‘must’, are called modal verbs or auxiliary verbs.

rule:
{
    {Character} = Perpetrator
    (Adv)?
    VP[ Harm ]
    {Character} = Victim
} = HarmFact;

wow -p 'english,english-entity,movie-rules' -f movies -e "HarmFact"

Now we get some more hits like: “Gaear has killed Jean” or “Blackie accidentally kills Benny”.

The kinds of semantic actors we are looking for are a Subject, a Verb and an Object.

The Subject is usually the agent that performs the action, though not in all cases (we will explain it later).

The Object is the thing that the action befalls. In these cases, it is the murder victim.

Let’s look at them separately.

The Subject#

Let’s first concentrate on the subject, the person who is doing the Harm action. Let’s loosen up the rule and see what we find between a subject and its Harm verb:

wow -p 'english,movie-rules' -f movies -e 'Character .. Harm Character'

To loosen up the rule we use the .. operator (shortest match) between the Character and the verb. You see that we get many more results now, but many are not valid.

Let’s go through the list bottom up and pick up interesting examples:

“Zé plans to kill Carrot”

“plan” belongs to a class of verbs that mean ‘intent’ or ‘hope’, usually followed by ‘to’ and a verb. To find similar expressions, we can search for the following:

wow -p "english,movie-rules" -f movies -e "Character V 'to' Harm Character"

We get verbs similar to “plan”, like “attempt”, “plot” and “hope”.

1 -> Diana attempts to kill Ares
1 -> Diana hoping to kill Ludendorff
1 -> Groot threaten to attack Yondu
1 -> Mann attempts to kill Cooper
1 -> Michael plots to murder Sollozzo
1 -> Quirrell attempts to kill Harry
1 -> Stewart Menzies appears to threaten Clarke
1 -> Zé plans to kill Carrot

There is no actual harm done here, so we could ignore these expressions or make a rule IntentToHarmFact (as opposed to HarmFact):

lexicon: (input="stem")
{
    attempt,
    hope,
    intend,
    plan,
    plot,
    try,
    want,
} = Intend;

rule:
{
    {Character} = Perpetrator
    Intend <"to"> Harm
    {Character} = Victim
} = IntentToHarmFact;

Let’s look at another phenomenon:

Goose is shot by Li’l Dice

This is a harm fact and we have a victim and a perpetrator (in this case Li’l Dice), but the subject (Goose) is not performing the harm; it is the victim. So what’s happening?

That is what we call a passive. In a passive there is a transformation: the subject becomes the one on which the action is inflicted, the real actor comes after the preposition “by”, and the verb is changed into ‘be’ followed by a past participle. A normal sentence, where this does not happen, is called active.

So, these 2 constructions are similar:

active:

Agent Action Object
“John murders Harry”

passive:

Object ‘be’ Action (‘by’ Agent)
“Harry is murdered by John”

A passive is used for variation, to put the stress on the object or because we don’t know or don’t want to mention the agent: “Harry was murdered”. So we need to take into account that the agent can be optional. Let’s make a rule and see what we get:

// Passive
rule:
{
    {Character} = Victim
    VP[ 'be' Harm ] 'by'
    {Character} = Perpetrator
} = HarmFact;

wow -p "english,movie-rules" -f movies  -e "Character 'be' Harm ('by' Character)?"

As in the first rule, we could have an adverb modifying the action: “Faramir is gravely wounded”, but in this case the adverb is between the verbs. Let’s add it as an optional element:

rule:
{
    {Character} = Victim
    VP[ 'be' (Adv)? Harm ]
    ( 'by' {Character} = Perpetrator )?
} = HarmFact;

This rule we will refer to as “passive rule” from now on.
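The role swap between active and passive sentences can be sketched in plain Python with two regular expressions. This is a deliberately crude stand-in for the rules above: it assumes single-word capitalized names, only ‘is’ and ‘was’ as forms of ‘be’, adverbs ending in “-ly”, and it does not check that the verb is a Harm verb. The names ACTIVE, PASSIVE and harm_roles are made up for the example.

```python
import re

# Active: Agent Action Object, e.g. "John murders Harry".
ACTIVE = re.compile(r"(?P<agent>[A-Z]\w+) (?P<verb>\w+) (?P<object>[A-Z]\w+)")

# Passive: Object 'be' (Adv)? Action ('by' Agent)?, e.g. "Harry is murdered by John".
PASSIVE = re.compile(
    r"(?P<object>[A-Z]\w+) (?:is|was) (?:(?P<adv>\w+ly) )?(?P<verb>\w+)"
    r"(?: by (?P<agent>[A-Z]\w+))?"
)


def harm_roles(sentence):
    """Return perpetrator and victim; the perpetrator is None when the agent is omitted."""
    for pattern in (PASSIVE, ACTIVE):
        match = pattern.fullmatch(sentence)
        if match is not None:
            return {"perpetrator": match["agent"], "victim": match["object"]}
    return None


for sentence in [
    "John murders Harry",
    "Harry is murdered by John",
    "Harry was murdered",
    "Faramir is gravely wounded",
]:
    print(sentence, "->", harm_roles(sentence))
```

In both the active and the passive example the perpetrator is John and the victim is Harry, even though the grammatical subject differs.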

One last construction that we could cover is the following:

“Blondie shoots and kills Angel Eyes”

In movies, sequences of actions are usually described, so it is common to find that kind of construction:

Subject Verb .. ‘and|but’ Verb Object

We see that the second part of the construction, after the ‘and’ or ‘but’, does not have a subject of its own. This phenomenon is called ellipsis of the subject: we do not mention the subject because it is understood. Let’s try it out:

wow -p english,english-entity,movie-rules -f movies -e "Character (Adv)? V ..  ('and'|'but') (Adv)? Harm Character"

We get some good hits, but also some which are not correct:

“[Gollum] attacks Frodo and bites his finger off to reclaim the Ring , but Frodo fights back and knocks [Gollum]”

The first subject, “Gollum”, is not the one that knocks “Gollum”. This is because of how the shortest match operator works.

Let’s review how the longest match operator, the shortest match operator and the shortest match attribute work. To make it easy, take this sentence: “John and Mary visited Japan and China”. We will try to match GivenName and Country using the different forms of matching:

  • Longest match operator: GivenName ... Country

    “[John] and Mary visited Japan and [China]”

    we match from the first GivenName to the last Country

  • Shortest match operator: GivenName .. Country

    “[John] and Mary visited [Japan] and China”

    we match from the first GivenName we find to the first found Country

  • Shortest span attribute: GivenName .? Country

    “John and [Mary] visited [Japan] and China”

    we match the smallest span between GivenName and Country
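The three behaviours listed above can be reproduced on the example sentence with a short plain-Python sketch. It models only the descriptions given here (first name to last country, first name to first country, smallest span), not the actual Wowool matcher, and all function names are hypothetical.

```python
TOKENS = "John and Mary visited Japan and China".split()
GIVEN_NAMES = {"John", "Mary"}
COUNTRIES = {"Japan", "China"}


def candidate_spans(tokens):
    """All (name_index, country_index) pairs where the name comes before the country."""
    return [
        (i, j)
        for i, left in enumerate(tokens) if left in GIVEN_NAMES
        for j, right in enumerate(tokens) if right in COUNTRIES and j > i
    ]


def longest_match(tokens):
    """First GivenName up to the last Country after it (the "..." description)."""
    spans = candidate_spans(tokens)
    start = min(i for i, _ in spans)
    end = max(j for i, j in spans if i == start)
    return tokens[start], tokens[end]


def shortest_match(tokens):
    """First GivenName up to the first Country after it (the ".." description)."""
    spans = candidate_spans(tokens)
    start = min(i for i, _ in spans)
    end = min(j for i, j in spans if i == start)
    return tokens[start], tokens[end]


def shortest_span(tokens):
    """The GivenName and Country pair with the fewest tokens in between (the ".?" description)."""
    start, end = min(candidate_spans(tokens), key=lambda span: span[1] - span[0])
    return tokens[start], tokens[end]


print(longest_match(TOKENS))
print(shortest_match(TOKENS))
print(shortest_span(TOKENS))
```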

In syntax, the important elements of a verb tend to appear close to it, so we are going to use the shortest span in this case. This is my rule now:

//-------------------------
// Ellipsis:
// "Gollum attacks Frodo and bites his finger off to reclaim the Ring , but Frodo fights back and knocks Gollum"
//-------------------------
rule:
{
    {Character} = Perpetrator
    (Adv)?
    VP
    .?
    ('and'|'but')
    (Adv)?
    VP[ Harm ]
    {Character} = Victim
} = HarmFact;

As you can see, I have also added a comment. This is good practice, to mention what you are trying to cover and to add an example, so that someone else can figure out what the rule tries to cover.

This rule we are going to call “subject ellipsis rule”.

So, now, my output looks fine except for this sentence:

[Clove] gloats about Rue ‘s death , Thresh , District 11 ‘s male tribute , appears and brutally kills [Clove]

In this case, it is Thresh that kills Clove, but the shortest match does not capture it, because between ‘Thresh’ and the first verb ‘appears’ there is “, District 11 ‘s male tribute,”, an explanation of who Thresh is.

That type of construction is called an apposition in linguistics, and it is a very common phenomenon. Appositions serve as a kind of explanation and are roughly equivalent to their referent. They usually occur before a proper noun (“[Spanish tennis player] Rafael Nadal”) or after the proper noun or noun, in between commas (“Rafael Nadal, a Spanish tennis player, … ” or “diabetes, a common metabolic disease, …”).

In our current case, we just want to account for them, so that we can skip them in our matcher. Luckily, we can cover them using NPs, or something called NPLong, which is a series of NPs that are related.

Let’s add an optional apposition (("\," NPLong "\,")?) to our subject ellipsis rule:

rule:
{
    {Character} = Perpetrator
    ("\," NPLong "\,")?
    (Adv)?
    VP
    .?
    ('and'|'but')
    (Adv)?
    VP[ Harm ]
    {Character} = Victim
} = HarmFact;

We could also add appositions to our first basic rule to get things like: “Billy Batts , a mobster in the Gambino crime family , insults Tommy”:

rule:
{
    {Character} = Perpetrator
    (<"\,"> NPLong <"\,">)?
    (Adv)?
    VP[ Harm ]
    {Character} = Victim
} = HarmFact;
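The idea of skipping an apposition between commas can be approximated in plain Python. The sketch below is much cruder than NPLong: it assumes tokenized text with spaces around the commas and removes any comma-free stretch between the first pair of commas, without checking that it really is a noun phrase. The name skip_apposition is hypothetical.

```python
import re

# One " , <anything without commas> ," stretch; a rough stand-in for ("\," NPLong "\,")?
APPOSITION = re.compile(r" , [^,]+ ,")


def skip_apposition(sentence):
    """Remove the first comma-delimited insertion so subject and verb end up adjacent."""
    return APPOSITION.sub("", sentence, count=1)


examples = [
    "Thresh , District 11 's male tribute , appears and brutally kills Clove",
    "Billy Batts , a mobster in the Gambino crime family , insults Tommy",
]
for example in examples:
    print(skip_apposition(example))
```

Because it ignores what sits between the commas, this shortcut would also strip the middle of an enumeration such as “A , B , C”, which is exactly what the NPLong check in the rule is meant to avoid.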

The Object#

In general the object does not tend to be far away from the verb. Let’s try this loose rule and see if there is something we can repair:

wow -p "english,english-entity,movie-rules" -f movies -e "Character (<>)?  Harm .. Character"

One thing we find is that sometimes the Character in the object has some words in front: “Göth brutally mistreats his Jewish maid Helen Hirsch”. As we explained above, “Jewish maid”, which is in front of the Character name, is an apposition, an explanation of who Helen Hirsch is.

The rule: Character (Adv)? Harm Character, won’t match it, because “his Jewish maid” is not part of the Character name.

We can fix it by using again the noun phrases that will cover the whole phrase. This is how the basic rule would look then:

rule:
{
    NP [ {Character} = Perpetrator ]
    ("\," NPLong "\,")?
    (Adv)?
    VP[ Harm ]
    NP [ {Character} = Victim ]
} = HarmFact;

Coordination#

Last, but not least, sometimes the victims or perpetrators are more than one person: “Drax and Gamora defeat Korath and Nebula”. An expression with an “and” or “.. , .. and ..” is called coordination. Let’s modify our rules to deal with a coordination of 2 characters by adding an optional pattern to the victims and perpetrators. The rules will now look like this:

// -------------------------------------------------------
// Basic rule:
// "Gaear kills Carl"
// Adv: "Jack then shoots Barbossa"
// Apposition: "Billy Batts , a mobster in the Gambino crime family , insults Tommy"
// Coordination: "Drax and Gamora defeat Korath and Nebula"
// -------------------------------------------------------
rule:
{
    NP [ {Character} = Perpetrator ]
    ( 'and' NP [ {Character} = Perpetrator] )?
    ("\," NPLong "\,")?
    (Adv)?
    VP[ Harm ]
    NP [ {Character} = Victim ]
    ( 'and'  NP [ {Character} = Victim ])?
} = HarmFact;


// -------------------------------------------------------
// Passive rule:
// "Goose is shot by Li"
// -------------------------------------------------------
rule:
{
    NP [ {Character} = Victim ]
    ('and' NP [{Character} = Victim ])?
    VP ['be' (Adv)? Harm]
    (
        'by'
        NP [ {Character} = Perpetrator ]
        ( 'and'  NP [ {Character} = Perpetrator ] )?
    )?
} = HarmFact;

// -------------------------------------------------------
// Subject ellipsis rule:
// "Carl pulls a gun and shoots and kills Gustafson"
// Adv: "Melina then outwit and kill Richter"
// Apposition: "Thresh , District 11 's male tribute , appears and brutally kills Clove"
// -------------------------------------------------------
rule:
{
    {Character} = Perpetrator
    ("\," NPLong "\,")?
    (<Adv>)?
    VP
    .?
    ('and'|'but')
    (Adv)?
    VP[ Harm ]
    NP [ {Character} = Victim ]
    ('and' NP [{Character} = Victim ])?
} = HarmFact;

It looks complicated, but we have managed to cover quite a few patterns with 3 rules. Let’s leave it at this and check our results:

wow -p english,english-entity,movie-rules -f movies -e "HarmFact"
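Once a HarmFact carries several Perpetrator and Victim sub annotations, you may want to expand it into individual pairs. The plain-Python sketch below shows one way to do that on the coordination example. It assumes single-word names and at most two coordinated characters on each side, and reading every perpetrator as harming every victim is an interpretation, not something the rules guarantee. The names COORDINATED and harm_pairs are invented.

```python
import re
from itertools import product

# Perpetrator ('and' Perpetrator)? Verb Victim ('and' Victim)?
COORDINATED = re.compile(
    r"(?P<p1>[A-Z]\w+)(?: and (?P<p2>[A-Z]\w+))? "
    r"(?P<verb>\w+) "
    r"(?P<v1>[A-Z]\w+)(?: and (?P<v2>[A-Z]\w+))?"
)


def harm_pairs(sentence):
    """Expand coordinated perpetrators and victims into (perpetrator, victim) pairs."""
    match = COORDINATED.fullmatch(sentence)
    if match is None:
        return []
    perpetrators = [name for name in (match["p1"], match["p2"]) if name]
    victims = [name for name in (match["v1"], match["v2"]) if name]
    return list(product(perpetrators, victims))


pairs = harm_pairs("Drax and Gamora defeat Korath and Nebula")
print(pairs)
print(harm_pairs("Gaear kills Carl"))
```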

Harmfacts and Weapons#

We can now check how weapons relate to the HarmFact. To see this, we are going to use the following discovery rule:

wow -p english,english-entity,movie-rules -f movies -e "( HarmFact %% Weapon)"