Entity Graph Application#

Application ID eot_entity_graph

Application Aliases entity-graph.app graph.app eot/graph.app eot/entity-graph.app

Description#

The Entity Graph application produces a list of links between entities. These links represent relations between the entities that have been found in the document, for instance between a Person and their Position, or between a Pathology and its symptoms.

Configuration#

The configuration is an object defined as (bold = required, italic = optional):

  • links (array[Link]) – The list of relations you want to create

  • slots (array[Slot]) – The list of slots to remember

  • themes (DataNode) – The themes (or categories of a document) that link to a node

  • topics (DataNode) – The topics that link to a node

  • content (DataNode) – The content or text you want to attach to a node

Note

All nodes above are optional, but you need at least one of links, themes, or topics to get any output.

With Link being an object describing the nodes that will be linked to each other and their relation, defined as:

  • from (Node) – Describes what will be stored in the from node

  • to (Node) – Describes what will be stored in the to node

  • relation (Node) – Describes what will be used in the relation data

  • scope (dict) – A dictionary object with the uri of the scope that will be used for this link

With Node being an object describing what will be used to create this node (one of the keys uri, label, or slot is required in a Node object), defined as:

  • uri (string) – The concept URI that will be captured and where we will store the concept.canonical content for the node, e.g. Person

  • label (string) – The literal string that will be used as the label of the node.

  • slot (string) – The name of the slot that will be used as a node. Exception: this cannot be used in the from node.

  • attributes (array[string]) – array of the attributes to add to the results, e.g. "gender"

  • content (string) – Alternatively you can use the content field to format the given node. The variable concept will be available for use in f-string formatting

With Slot being an object defined as:

  • uri (string) – The concept URI that will be captured and where we will store the concept.canonical content for the slot, e.g. "Position"

  • label (string) – The literal string that will be used as content for the slot, e.g. "Job"

  • content (string) – Alternatively you can use the content field to format the data of the slot. The variables concept and document will be available for use in f-string formatting. For instance, “Software Engineer”.

With DataNode being an object describing a node’s data, defined as:

  • to (string) – Name of the node to which you want to attach the data

  • count (number) – Number of links you want to create or the size of the node data.
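
The constraints stated above (at least one of links, themes, or topics; one of uri, label, or slot per Node; no slot in the from node) can be checked before a configuration is handed to the application. The following is a minimal sketch, not part of the application itself: the function name `validate_config` is hypothetical, and the application's JSON schema remains the authoritative definition.

```python
from typing import Any

NODE_KEYS = ("uri", "label", "slot")


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: list[str] = []

    # At least one of links, themes or topics is needed to get any output.
    if not any(config.get(key) for key in ("links", "themes", "topics")):
        problems.append("config needs at least one of: links, themes, topics")

    for index, link in enumerate(config.get("links", [])):
        for role in ("from", "to", "relation"):
            node = link.get(role)
            if node is None:
                continue  # this sketch does not decide which roles are mandatory
            # A Node must carry one of uri, label or slot.
            if not any(key in node for key in NODE_KEYS):
                problems.append(f"links[{index}].{role}: needs one of {NODE_KEYS}")
        # A slot cannot be used in the "from" node.
        if "slot" in link.get("from", {}):
            problems.append(f"links[{index}].from: a slot cannot be used here")

    return problems
```

Calling `validate_config` on a configuration whose from node uses a slot returns a message for that link; a configuration that respects the three rules returns an empty list.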

Scope#

One of the attributes in a link node is a scope. Scopes ensure we are not matching outside the given annotation. Note that it is identified not by an expression but rather by a concept URI.

If no scope is provided, you will link the ‘to’ concept to all the ‘from’ concepts that appear in the same sentence. Sometimes we do not want to do that, because we want to be more specific in the kind of relation that the concepts have.

In the rules below we define what a ‘work’ relation from a Person to a Company is, so that we don’t get spurious links. Do not forget to include the domain that contains your scope rules in your pipeline.

"scope": { "uri": "CompanyPosition" }
rule: { Person .. 'work' 'for' Company } = ScopePersonCompany;
rule: { Person .. 'be' Position 'at' Company } = ScopePersonCompany;
rule: { Person '\,' Det Position 'at' Company '\,' } = ScopePersonCompany;
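
To illustrate the difference a scope makes, here is a toy model in Python. It is only a sketch of the behaviour described above, not the application's implementation: the `Annotation` class, the `link_pairs` function, and the character offsets are all hypothetical, and it assumes that a scoped link requires both concepts to lie inside the same scope annotation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Annotation:
    uri: str    # concept URI, e.g. "Person"
    text: str   # matched text
    start: int  # character offsets within the sentence
    end: int


def link_pairs(annotations, from_uri, to_uri, scope_uri=None):
    """Pair `from` concepts with `to` concepts, optionally restricted to a scope."""
    def inside(inner, outer):
        return outer.start <= inner.start and inner.end <= outer.end

    sources = [a for a in annotations if a.uri == from_uri]
    targets = [a for a in annotations if a.uri == to_uri]
    if scope_uri is None:
        # No scope: every `from` in the sentence is paired with every `to`.
        return [(s.text, t.text) for s, t in product(sources, targets)]
    scopes = [a for a in annotations if a.uri == scope_uri]
    # With a scope: both concepts must fall inside the same scope annotation.
    return [
        (s.text, t.text)
        for scope in scopes
        for s, t in product(sources, targets)
        if inside(s, scope) and inside(t, scope)
    ]


# "Ann works for Acme and Bob works for Initech."
sentence = [
    Annotation("Person", "Ann", 0, 3),
    Annotation("Company", "Acme", 14, 18),
    Annotation("Person", "Bob", 23, 26),
    Annotation("Company", "Initech", 37, 44),
    Annotation("ScopePersonCompany", "Ann works for Acme", 0, 18),
    Annotation("ScopePersonCompany", "Bob works for Initech", 23, 44),
]

unscoped = link_pairs(sentence, "Person", "Company")
scoped = link_pairs(sentence, "Person", "Company", "ScopePersonCompany")
```

In this model the unscoped call yields four pairs, including the spurious Ann/Initech and Bob/Acme, while the scoped call keeps only Ann/Acme and Bob/Initech.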

Slots#

Slots are like mementos: things you have seen and want to remember at a later stage. They are used to link to items that have previously been encountered in the document, but are not present in the sentence that is currently being processed. Put differently: it’s a list of items – slots – where each slot corresponds to a concept URI and contains the last thing you have seen of that type. For instance, in job postings, you might want to remember the position and then add the hard and soft skills required for that position.

{
    "slots": [ { "uri": "Person" } ],
    "links": [
        {
            "from": { "uri": "Position" },
            "to": { "slot": "Person" }
        }
    ]
}

Note

Remember that you cannot use the slot in the “from” node.

The slot Document is a predefined one. We can use it to attach topics or themes to a document.
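
The "last thing seen" behaviour of slots can be pictured with a small simulation. This is a sketch under stated assumptions, not the application's code: the function `link_with_slots` and the concept URI `Skill` are hypothetical, and the sketch assumes that a slot is updated before the links of the same sentence are created.

```python
def link_with_slots(sentences, slot_uris, from_uri, to_slot):
    """Walk sentences in order, remembering the last concept seen per slot."""
    slots = {}  # slot name (concept URI) -> last text seen of that type
    links = []
    for concepts in sentences:  # each sentence is a list of (uri, text) pairs
        for uri, text in concepts:
            if uri in slot_uris:
                slots[uri] = text  # overwrite: only the latest value is kept
        for uri, text in concepts:
            if uri == from_uri and to_slot in slots:
                links.append((text, slots[to_slot]))
    return links


# A job posting: positions are remembered, skills are linked to the latest one.
posting = [
    [("Position", "Data Engineer")],
    [("Skill", "Python"), ("Skill", "SQL")],
    [("Position", "Team Lead")],
    [("Skill", "Coaching")],
]
links = link_with_slots(posting, {"Position"}, "Skill", "Position")
```

Here "Python" and "SQL" are linked to "Data Engineer", and "Coaching" is linked to "Team Lead", even though no Position appears in the sentences that mention the skills.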

Scopes#

One of the attributes in a link node is a scope. Scopes ensure we are not matching outside the given annotation. Note that it is identified not by an expression but rather by a concept URI:

"scope": { "uri" : "CompanyPosition" }

Topics#

Topics are the most important noun groups in your document. They provide a short insight into what your document is about, together with a relevancy estimate of how prominent each topic is in the document.

Topics are connected to a slot, usually the document name.

{ "to": [name of slot], "count": [number of topics to insert] }

{
    "slots": [ { "uri": "Title" }  ],
    "topics": { "to": "Title", "count": 3 }
}

Note

Linking the topics of a document requires the Topics application in your pipeline.

Themes#

Themes provide the categories of the document, based on linguistic clues.

The themes connect to a slot. In this case we connect them to a document.

{ "to": [name of slot], "count": [number of themes to insert] }

{
    "themes": { "to": "Document" }
}

Tip

“Document” is a special node to which you can attach your themes.

Note

Linking the themes of a document requires the Themes application in your pipeline.

Example#

In the following example we will create 2 types of links:

  • Person followed by a Company
    • Store the following attributes:
      • “gender” on the Person node, and

      • “country” on the Company node

    • Create a relation and name it P2C (person to company)

  • Person followed by a Position and a Company
    • Store the nodes Person and Company

    • Create a relation and use the stem of the Position expression to create a link

{
    "links": [
        {
            "from": {
                "uri": "Person",
                "attributes": ["gender"]
            },
            "to": {
                "uri": "Company",
                "attributes": ["country"]
            },
            "relation": {
                "label": "P2C"
            }
        },
        {
            "from": {
                "uri": "Person"
            },
            "to": {
                "uri": "Company"
            },
            "relation": {
                "uri": "Position",
                "content": "{concept.stem}"
            }
        }
    ],
    "topics": { "to": "Document" }
}

Using a file name in a slot#

The processed document is uniquely identified by its document ID, which has either been provided manually or derived automatically; see, for example, the use of Input Providers. The slot below stores the file name stem taken from that ID.

{
    "slots": [
        {
            "label": "Filename",
            "content": "{Path(document.id).stem}"
        }
    ],
    "links": [
        {
            "from": { "uri": "Title" },
            "relation": { "label": "file" },
            "to": { "slot": "Filename" }
        }
    ]
}
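
To see what the content template of the Filename slot evaluates to, the same expression can be tried in plain Python. The `document` object below is a hypothetical stand-in with only an `id` attribute, and the path is an invented sample value.

```python
from pathlib import Path
from types import SimpleNamespace

# Hypothetical document object; only its `id` attribute matters here.
document = SimpleNamespace(id="/data/inbox/annual_report_2023.pdf")

# Same expression as in the slot's "content" template.
filename = f"{Path(document.id).stem}"
print(filename)  # annual_report_2023
```

`Path.stem` drops both the directory part and the final suffix, so the slot holds a short, readable name rather than the full document ID.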

For an interpretation of the JSON data, refer to the application’s JSON schema.

Example#

wow -p 'english,entity,entity-graph(links=[{"from":{"uri":"Person","attributes":["gender"]},"to":{"uri":"Company"},"relation":{"label":"works_for"}}]).app'  \
    -i "John Smith works for Eyeontext."
[
    {
        "from": {
            "label": "Person",
            "name": "John Smith",
            "attributes": {
                "gender": "male"
            }
        },
        "relation": {
            "label": "works_for",
            "name": "works_for"
        },
        "to": {
            "label": "Company",
            "name": "Eyeontext"
        }
    }
]

which yields:

[
    {
        "from": {
            "label": "Person",
            "name": "John Smith",
            "attributes": { "gender" : [ "male" ] }
        },
        "relation": {
            "label": "works_for",
            "name": "works_for"
        },
        "to": {
            "label": "Company",
            "name": "Eyeontext"
        }
    }
]
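
The result is a plain JSON array, so it is straightforward to post-process. As a minimal sketch (the helper `to_triples` is hypothetical and relies only on the keys visible in the output above), each link can be reduced to a (from, relation, to) triple:

```python
import json

# Output copied from the example run above.
raw_output = """
[
    {
        "from": {"label": "Person", "name": "John Smith", "attributes": {"gender": ["male"]}},
        "relation": {"label": "works_for", "name": "works_for"},
        "to": {"label": "Company", "name": "Eyeontext"}
    }
]
"""


def to_triples(links):
    """Reduce each link to (from name, relation name, to name)."""
    return [
        (link["from"]["name"], link["relation"]["name"], link["to"]["name"])
        for link in links
    ]


triples = to_triples(json.loads(raw_output))
print(triples)  # [('John Smith', 'works_for', 'Eyeontext')]
```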

For a better way to run it, have a look at the Entity Graph Tutorial.
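
If you do want to stay on the command line, the pipeline string from the example can be assembled from a Python structure instead of being typed by hand. This is only a sketch: the helper `entity_graph_step` is hypothetical, and it assumes nothing beyond the `entity-graph(links=...).app` form and the `-p` and `-i` options shown in the command above.

```python
import json
import shlex


def entity_graph_step(links):
    """Render the links as compact JSON inside the entity-graph step."""
    compact = json.dumps(links, separators=(",", ":"))
    return f"entity-graph(links={compact}).app"


links = [
    {
        "from": {"uri": "Person", "attributes": ["gender"]},
        "to": {"uri": "Company"},
        "relation": {"label": "works_for"},
    }
]

pipeline = "english,entity," + entity_graph_step(links)
# shlex.quote protects the JSON quotes and parentheses from the shell.
command = f"wow -p {shlex.quote(pipeline)} -i {shlex.quote('John Smith works for Eyeontext.')}"
print(command)
```

Building the string this way keeps the JSON well-formed and leaves the shell quoting to `shlex.quote`.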