wow – Wowool Driver#

Wowool comes with a series of APIs, but you can also test and develop your rules with the 'wow' driver. The wowool console is part of the eot-wowool-sdk.

Run wow (the wowool console):

wow --help
wow++ --help (wrapper of the C++)
PATH: [python_env]/lib/python3.9/site-packages/eot/wowool/package/lib

Options#

Input options (-i, -f)#

wow takes plain UTF-8 text as input. There are several ways to pass the input text to the driver:

command line input -i#
wow -p english -i "The cat sat on the mat"
passing a single file -f#
wow -p english -f test.txt
passing a folder -f#
wow -p english -f "/home/corpus"
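The three input forms above differ only in the flag that carries the input. The following is a minimal Python sketch for scripting such calls, assuming wow is on the PATH. build_wow_argv and run_wow are hypothetical helpers (not part of the SDK) and use only the -p, -i and -f flags shown above:

```python
import subprocess
from typing import List, Optional


def build_wow_argv(pipeline: str, text: Optional[str] = None,
                   path: Optional[str] = None) -> List[str]:
    """Build the argument list for one wow call (hypothetical helper)."""
    if (text is None) == (path is None):
        raise ValueError("pass exactly one of text or path")
    argv = ["wow", "-p", pipeline]
    if text is not None:
        argv += ["-i", text]   # command line input
    else:
        argv += ["-f", path]   # a single file or a folder
    return argv


def run_wow(argv: List[str]) -> str:
    """Run wow and return what it prints; only works where wow is installed."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


print(build_wow_argv("english", text="The cat sat on the mat"))
print(build_wow_argv("english", path="/home/corpus"))
```

Passing the arguments as a list means no shell is involved, so the input text needs no extra quoting.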

Pipeline (-p)#

The pipeline option is a comma-delimited string containing the components you want to run. The order of the components determines the output. You can add three types of components to a pipeline:

  • Language

  • Domains

  • Applications

wow -p english,entity,topics.app -i "Climate change has a global effect on our planet."

The example above runs the English language analysis and the English entities domain, then adds topics to the result.

Languages and domains are easy to use because they take no arguments, but some applications can be configured. In general, every application supports arguments passed on the command line with the syntax name({…}).app. Inside the round brackets the application expects a JSON object in which each key is a parameter of the application's initialization.

If the signature of an application is app_name( arg1:str , arg2:list ), then your pipeline should look like this:

app_name( {"arg1":"value", "arg2":[ "v2.1", "v2.2"]} ).app
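Because the argument block is a JSON object, it can be generated instead of typed by hand. The sketch below assumes nothing beyond the name({…}).app syntax described above; app_component and pipeline are hypothetical helper names, and app_name, arg1 and arg2 are the placeholders from the signature example:

```python
import json


def app_component(name: str, args: dict) -> str:
    """Render one application component as name({...}).app."""
    return f"{name}({json.dumps(args)}).app"


def pipeline(*components: str) -> str:
    """Join components into the comma-delimited string expected by -p."""
    return ",".join(components)


component = app_component("app_name", {"arg1": "value", "arg2": ["v2.1", "v2.2"]})
print(component)
print(pipeline("english", "entity", component))
```

Using json.dumps keeps the quoting and escaping of the values valid JSON even when they contain quotes or backslashes.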

This is an example of the snippet application.

Annotating a Person followed by a Company:

wow -p 'english,entity,snippet({"source" : "rule:{ Person .. Company }= PersonCompany;"}).app' \
    -i "John Smith works for EyeOnText."

Alternatively, you can use the short form notation, which avoids issues with quotes on the command line:

wow -p 'english,entity,snippet(rule:{ Person .. Company }= PersonCompany;).app' \
    -i "John Smith works for EyeOnText."

See Applications for the options of each of them.

Output views (--tool)#

wow prints the results of the analysis on the console. You choose how the data is displayed with the option --tool followed by the name of a tool.

--tool raw#

The tool "raw" is the default. It displays all the tokenization, annotation and token information:

wow -p english -i "Life is like a box of chocolates" --tool raw

You will get the following output:

{Sentence
  t(0,4) "Life" (Init-Cap, Init-Token)['life':Nn-Sg]
  t(5,7) "is" ['be':V-Pres-Sg-be]
  t(8,12) "like" ['like':Prep-Std], ['like':Conj-Sub]
  t(13,14) "a" ['a':Det-Indef]
  t(15,18) "box" ['box':Nn-Sg]
  t(19,21) "of" ['of':Prep-of]
  t(22,32) "chocolates" ['chocolate':Nn-Pl]
}Sentence

Because raw is the default, you get the same output if you do not pass any tool to the wowool console:

wow -p english -i "Life is like a box of chocolates"

--tool json#

It prints the annotations (or concepts) and the tokens in JSON format:

wow -p english -i "Life is like a box of chocolates" --tool json
{"sentences":[ 0,32],
        "tokens":[
                [0,4,"Life","Init-Cap","Init-Token",["life", "Nn-Sg"]],
                [5,7,"is",["be", "V-Pres-Sg-be"]],
                [8,12,"like",["like", "Prep-Std"],["like", "Conj-Sub"]],
                [13,14,"a",["a", "Det-Indef"]],
                [15,18,"box",["box", "Nn-Sg"]],
                [19,21,"of",["of", "Prep-of"]],
                [22,32,"chocolates",["chocolate", "Nn-Pl"]]
        ],
        "concepts":[ [0,32,1,"Sentence"]]
}
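The JSON view is convenient for post-processing. The sketch below reads the sample output above with Python's json module. The interpretation of a token entry (begin offset, end offset, literal, then string properties and [lemma, part-of-speech] pairs) is an assumption inferred from this one sample, and parse_token is a hypothetical helper:

```python
import json

# The sample output shown above, pasted verbatim.
RAW = """{"sentences":[ 0,32],
        "tokens":[
                [0,4,"Life","Init-Cap","Init-Token",["life", "Nn-Sg"]],
                [5,7,"is",["be", "V-Pres-Sg-be"]],
                [8,12,"like",["like", "Prep-Std"],["like", "Conj-Sub"]],
                [13,14,"a",["a", "Det-Indef"]],
                [15,18,"box",["box", "Nn-Sg"]],
                [19,21,"of",["of", "Prep-of"]],
                [22,32,"chocolates",["chocolate", "Nn-Pl"]]
        ],
        "concepts":[ [0,32,1,"Sentence"]]
}"""
TEXT = "Life is like a box of chocolates"


def parse_token(entry: list) -> dict:
    """Split one token entry into offsets, literal, properties and readings."""
    begin, end, literal, *rest = entry
    return {
        "begin": begin,
        "end": end,
        "literal": literal,
        # Assumption: plain strings are token properties such as Init-Cap.
        "properties": [item for item in rest if isinstance(item, str)],
        # Assumption: two-element lists are (lemma, part-of-speech) readings.
        "readings": [tuple(item) for item in rest if isinstance(item, list)],
    }


tokens = [parse_token(entry) for entry in json.loads(RAW)["tokens"]]
for token in tokens:
    print(token["literal"], TEXT[token["begin"]:token["end"]], token["readings"])
```

For this ASCII sample the offsets slice the input text directly; whether that holds for non-ASCII input depends on the offset convention (see the --utf8 option).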

--tool grep#

This is a powerful corpus analysis tool. It evaluates the wowool expression passed after the -e parameter and returns the matches together with their frequency counts:

wow -p english -f ~/corpus/movies/ --tool grep -e "'kill' (Det)? (Nn|Prop)+ "
3  kill bill
2  killed his brother
2  killed his wife
1  kill germans
1  kill pauline
1  kill the man
1  killed his son
1  kill their attacker
1  kill his friend tommy
1  kill the president
1  kill mobster marion bishop
1  kill the messenger
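The frequency list is easy to load into a script for further analysis. A minimal sketch, assuming each output line is a count followed by whitespace and the matched text, as in the sample above; parse_grep_counts is a hypothetical helper:

```python
from typing import List, Tuple

# The sample output shown above, pasted verbatim.
GREP_OUTPUT = """\
3  kill bill
2  killed his brother
2  killed his wife
1  kill germans
1  kill pauline
1  kill the man
1  killed his son
1  kill their attacker
1  kill his friend tommy
1  kill the president
1  kill mobster marion bishop
1  kill the messenger
"""


def parse_grep_counts(output: str) -> List[Tuple[int, str]]:
    """Return (count, match) pairs; lines that do not fit are skipped."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1].strip()))
    return rows


rows = parse_grep_counts(GREP_OUTPUT)
print(len(rows), "distinct matches,", sum(count for count, _ in rows), "hits in total")
```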

Note

The --tool option is not required to run the corpus analysis tool; you get the same results using just -e:

wow -p english -f ~/corpus/movies/ -e "'kill' (Det)? (Nn|Prop)+ "

Note

On Linux/macOS you can surround your expression with single (') or double (") quotes; they are interchangeable. The exception: if the expression itself contains single quotes (<'kill'>), surround it with double quotes, and if it contains double quotes (<"left">), surround it with single quotes.

On Windows you always have to use double quotes, and escape them when you want to use them as literals inside the expression.
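When the command line is generated by a script, the quoting can be delegated to Python's standard shlex module instead of being chosen by hand. This sketch applies to POSIX shells (Linux/macOS) only, not to Windows; the corpus path is a placeholder:

```python
import shlex

# The expression from the example above, containing single quotes.
expression = "'kill' (Det)? (Nn|Prop)+ "
argv = ["wow", "-p", "english", "-f", "corpus/movies", "-e", expression]

# shlex.join quotes every argument so a POSIX shell reads it back unchanged.
command_line = shlex.join(argv)
print(command_line)

# Round trip: splitting the quoted command line recovers the arguments.
print(shlex.split(command_line) == argv)
```

If the command is started from Python with subprocess and an argument list, no shell quoting is needed at all.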

For more information, see the Semantic corpus analysis tutorial in the tutorials section.

--tool stagger#

The stagger tool shows the annotation markup:

wow -p english,entity -i "Ginni Rometty is the CEO of IBM" --tool stagger
Person -> "Ginni Rometty" @( canonical="Ginni Rometty",company="IBM",family="Rometty",gender="female",given="Ginni",position="CEO",works_at="IBM")
NP -> "Ginni Rometty"
PersonGiv -> "Ginni"
GivenName -> "Ginni" @( gender="female",standalone="Yes")
PersonFam -> "Rometty"
VP -> "is" @( negation="false",voice="active")
NP -> "the CEO"
PersonMention -> "CEO"
Position -> "CEO" @( canonical="CEO",company="IBM",sector="it",theme="business")
NP -> "IBM"
Company -> "IBM" @( canonical="IBM",country="USA",sector="it")
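Each line of this view has the shape Name -> "text", optionally followed by @( key="value",… ). A minimal parsing sketch under that assumption, which is inferred from the sample above and also assumes attribute values contain no double quotes; parse_stagger_line is a hypothetical helper:

```python
import re
from typing import Dict, Optional, Tuple

# Name -> "text", optionally followed by @( key="value",... )
LINE_RE = re.compile(r'^(\w+) -> "(.*?)"(?: @\((.*)\))?$')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_stagger_line(line: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (annotation, text, attributes), or None if the line does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    name, text, raw_attrs = match.groups()
    attributes = dict(ATTR_RE.findall(raw_attrs or ""))
    return name, text, attributes


print(parse_stagger_line('Company -> "IBM" @( canonical="IBM",country="USA",sector="it")'))
print(parse_stagger_line('NP -> "the CEO"'))
```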

--tool profile#

It gives you statistics about the processing:

wow -p english -f ~/corpus/english/movies --tool profile

The results are:

| total               | count                                   |
| --------------------| ----------------------------------------|
| documents           | 76                                      |
| tokens              | 62456                                   |
| not_found           | 4096                                    |
| coverage            | 93                                      |
| time(s)             | 0.749567                                |
| bytes               | 315473                                  |
| Mb/h                | 1515.15                                 |
| Mb/(h/thread)       | 6060.58                                 |
| thread(s):          | 4                                       |
---
|  **annotations**    | count                                   |
| --------------------| ----------------------------------------|
| Sentence            | 2753                                    |
|  **total**          | 2753                                    |

You get the number of documents and tokens, the coverage (words that are in the dictionary), the total time in seconds, the number of bytes, the MB per hour (for one thread and for all threads) and the number of threads running.

The second part tells you which annotations have been found and how often. As we are not running any domain in the above example, the only annotation is Sentence.
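The derived figures in the sample table can be recomputed from the raw counts. The formulas below are assumptions inferred from these sample numbers, not documented definitions: coverage as the percentage of tokens that are not in not_found, and throughput in units of 1,000,000 bytes per hour of elapsed time.

```python
# Raw values copied from the sample table above.
tokens = 62456
not_found = 4096
seconds = 0.749567
size_bytes = 315473
threads = 4

# Assumed formula: share of tokens found in the dictionary, in percent.
coverage = 100 * (tokens - not_found) / tokens

# Assumed formula: megabytes (10**6 bytes) processed per hour of elapsed time.
mb_per_hour = size_bytes / 1_000_000 / seconds * 3600

# Assumed formula: the same figure scaled by the number of threads.
mb_per_hour_scaled = mb_per_hour * threads

print(f"coverage          {coverage:.2f}")
print(f"Mb/h              {mb_per_hour:.2f}")
print(f"Mb/h x thread(s)  {mb_per_hour_scaled:.2f}")
```

Under these assumptions the results agree with the table rows coverage (93, shown without decimals), Mb/h (1515.15) and Mb/(h/thread) (6060.58).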

--tool lid#

'lid' is the tool that requests language identification. Create a file, lid.txt, with the following content:

Vi civiliserede folkeslag har mistet evnen til at være stille, og må tage timer i tavshed fra det vilde liv, førend det vil optage os i sig.

Then run:

wow --tool lid -f lid.txt

(0:146) DA

The result lists the offsets of each section in which a language is found; in this case there is just one section, Danish (DA), from offset 0 to 146.
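A script can turn this output into (begin, end, language) tuples. A minimal sketch, assuming every section is printed as (begin:end) followed by a language code, as in the single sample line above; parse_lid_sections is a hypothetical helper:

```python
import re
from typing import List, Tuple

# (begin:end) followed by a language code, e.g. "(0:146) DA"
SECTION_RE = re.compile(r"\((\d+):(\d+)\)\s+(\S+)")


def parse_lid_sections(output: str) -> List[Tuple[int, int, str]]:
    """Return one (begin, end, language) tuple per section found in the output."""
    return [(int(begin), int(end), language)
            for begin, end, language in SECTION_RE.findall(output)]


print(parse_lid_sections("(0:146) DA"))
```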

Let’s make the file longer:

Vi civiliserede folkeslag har mistet evnen til at være stille, og må tage timer i tavshed fra det vilde liv, førend det vil optage os i sig.

Voyager, c’est naître et mourir à chaque instant.

Wer die Wahrheit nicht weiß, der ist bloß ein Dummkopf. Aber wer sie weiß, und sie eine Lüge nennt, der ist ein Verbrecher!

Wat me wel een beetje dwars zit is dat ik soms niet precies kan uitleggen wat me dwars zit.

At some point in life the world’s beauty becomes enough. You don’t need to photograph, paint, or even remember it. It is enough.

Muchos años después, frente al pelotón de fusilamiento, el coronel Aureliano Buendía había de recordar aquella tarde remota en que su padre lo llevó a conocer el hielo.

växa är att undra, att bli vuxen är att långsamt glömma det man undrade som barn.

This is a multilingual text. To enable multilingual processing, call --tool lid with the option --tool_options '{"section_data":true }':

wow --tool lid -f lid.txt --tool_options '{"section_data":true }'

That will give us the complete multilingual processing. If we just want to see the sections without the analytics, use the silent option -s:

wow --tool lid -f lid.txt --tool_options '{"section_data":true }' -s

Usage#

usage: wow [-h] [-f FILE] -p PIPELINE [-i [TEXT ...]] [--lxware LXWARE]
           [-a ANNOTATIONS]
           [-t {raw,json,concepts,grep,stagger,text,apps,info,input_text,canonical,none}]
           [-v VERBOSE] [--utf8] [--dbg DBG] [-e EXPRESSION] [--grep-stem]
           [--encoding ENCODING] [--sentyziser SENTYZISER]
           [--allow_dev_version]

Named Arguments#

-f, --file

folder or file

-p, --pipeline

pipeline description

-i, --text

The input text to process

--lxware

location of lxware

-a, --annotations

filter the annotations

-t, --tool

Possible choices: raw, json, concepts, grep, stagger, text, apps, info, input_text, canonical, none

name of the tool raw, json, concepts, grep, stagger, text, apps, info, none

Default: “raw”

-v, --verbose

debug levels, trace,debug,…

Default: “error”

--utf8

display the utf8 offsets

Default: False

--dbg

Switches on the extensive debugger options. Current options: print_annotations, nofilter, insertion, rule_trigger, matcher, hmm, rule_info, streams, stream_lookup, overlap

-e, --expression

wowoolian expression, will force the tool grep.

--grep-stem

will return the grep result using the stem.

Default: False

--encoding

set the encoding for reading the input file

Default: “utf-8”

--sentyziser

set the number of line breaks (\n) in one sentence.

Default: “3”

--allow_dev_version

use the lingware dev version if available

Default: True