Analysis Results Schema for JSON

Application ID: eot_analysis

Description

The AnalysisResults schema describes the data that the SDK or the Portal returns from an NLP analysis, before that data is converted into a native programming object such as the Analysis object in Python. It is the structure that holds the basic linguistic analysis results.

Definition

Analysis

The AnalysisResults schema is defined as:

  • object – The analyzed document

    • id (string) – The identifier of the analyzed document

    • language (string) – The language file that processed the document. If multiple languages processed the document, only the first language is set.

    • sentences (array[Sentence]) – The list of analyzed sentences
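The sketch below reads this top-level object in Python with the standard `json` module. It assumes the results arrive as a JSON string; `load_results` is a hypothetical helper, not part of the SDK, and it only checks that the three documented keys are present.

```python
import json
from typing import Any


def load_results(raw: str) -> dict[str, Any]:
    """Parse an AnalysisResults JSON string and check its top-level keys."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("AnalysisResults must be a JSON object")
    # The three keys documented for the AnalysisResults object.
    missing = [key for key in ("id", "language", "sentences") if key not in doc]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    return doc


doc = load_results('{"id": "test.txt", "language": "dutch", "sentences": []}')
print(doc["id"], doc["language"], len(doc["sentences"]))  # test.txt dutch 0
```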

Sentence

The Sentence schema is an extension of the Annotation schema, which is defined as:

  • array – The annotation, formatted as an array for size-efficiency during serialization

    • [0] (number) – The type of the annotation: 1 for Sentence, 2 for Concept, 3 for Token

    • [1] (number) – The begin offset of the annotation

    • [2] (number) – The end offset of the annotation

with the Sentence schema therefore defined as:

  • array – The sentence annotation

    • [0] (number) – Always set to 1

    • [1] (number) – The begin offset of the sentence within the document

    • [2] (number) – The end offset of the sentence within the document

    • [3] (array[Annotation]) – The list of annotations in the sentence

    • [4] (object, each value an array[string]) [optional] – Options that can be set on the sentence

  • Available Sentence options:

    • header: ["true"] – Indicates that the sentence looks like a header.

Concept

The Concept schema is defined as:

  • array – The concept annotation, formatted as an array for size-efficiency during serialization

    • [0] (number) – Always set to 2

    • [1] (number) – The begin offset of the concept within the document

    • [2] (number) – The end offset of the concept within the document

    • [3] (string) – The URI or name of the concept, e.g. "Person"

    • [4] (ConceptAttributes) – The attributes of the concept

with the ConceptAttributes schema defined as:

  • object

    • <name> (array[string]) – The values of the attribute named <name>
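ConceptAttributes values are always arrays, even for attributes that usually hold a single value such as "gender", so reading code often wants just the first value. A minimal Python sketch, assuming a dataclass representation; `Concept`, `to_concept` and `first` are hypothetical names, and the sample data is the Person concept from the example at the end of this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A Concept annotation (type code 2) unpacked into named fields."""
    begin: int
    end: int
    name: str
    attributes: dict[str, list[str]] = field(default_factory=dict)

    def first(self, attr: str, default: Optional[str] = None) -> Optional[str]:
        """Return the first value of an attribute; every value is stored as a list."""
        values = self.attributes.get(attr)
        return values[0] if values else default


def to_concept(annotation: list) -> Concept:
    """Unpack a Concept array [2, begin, end, name, attributes]."""
    if annotation[0] != 2:
        raise ValueError(f"not a Concept annotation: type {annotation[0]}")
    return Concept(annotation[1], annotation[2], annotation[3], dict(annotation[4]))


person = to_concept([2, 0, 12, "Person", {
    "canonical": ["Jan Jansens"], "family": ["Jansens"],
    "gender": ["male"], "given": ["Jan"]}])
print(person.name, person.first("canonical"))  # Person Jan Jansens
```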

Token

The Token schema is defined as:

  • array – The token annotation, formatted as an array for size-efficiency during serialization

    • [0] (number) – Always set to 3

    • [1] (number) – The begin offset of the token

    • [2] (number) – The end offset of the token

    • [3] (string) – The literal representation of the token

    • [4] (array[string]) – The properties of the token

    • [5] (array[MorphData]) – The morphological data of the token

with the MorphData schema defined as a recursive structure:

  • array

    • [0] (string) – The stem of the token

    • [1] (string) – The part-of-speech of the token

    • [2] (string) [optional] – The properties of the token

    • [3] (array[MorphData]) [optional] – The child morphological data, for example the components of a compound

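For example, the Dutch compound "verplegerassistent" is serialized as the following Token:

[
    3,                        --> annotation type: 3 = Token
    23,                       --> begin offset
    41,                       --> end offset
    "verplegerassistent",     --> literal
    [],                       --> token properties
    [                         --> morphological data: array[MorphData]
        [
            "verplegerassistent",         --> stem
            "Nn-Sg",                      --> part of speech
            "compound",                   --> properties
            [                             --> child morphological data (components)
                ["verpleger", "Nn-Sg"],   --> first component: stem, part of speech
                ["assistent", "Nn-Sg"]    --> second component: stem, part of speech
            ]
        ]
    ]
]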
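A Python sketch of unpacking a Token and walking its recursive MorphData, using the compound token above as input. `Token`, `to_token` and `walk_morph` are hypothetical names. The sketch assumes that index [3] of a MorphData entry, when present, holds a list of child entries; because one token in the example below carries the plain string "guesser" in that position, non-list values are skipped rather than recursed into.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    """A Token annotation (type code 3) with named fields."""
    begin: int
    end: int
    literal: str
    properties: list[str]
    morph: list  # array[MorphData], left as raw nested lists


def to_token(annotation: list) -> Token:
    """Unpack [3, begin, end, literal, properties, morph] into a Token."""
    if annotation[0] != 3:
        raise ValueError(f"not a Token annotation: type {annotation[0]}")
    return Token(*annotation[1:6])


def walk_morph(morph: list, depth: int = 0) -> Iterator[tuple[int, str, str]]:
    """Yield (depth, stem, part of speech) for a MorphData entry and its children."""
    yield depth, morph[0], morph[1]
    children = morph[3] if len(morph) > 3 else []
    if isinstance(children, list):  # skip non-list values such as "guesser"
        for child in children:
            yield from walk_morph(child, depth + 1)


token = to_token([3, 23, 41, "verplegerassistent", [],
                  [["verplegerassistent", "Nn-Sg", "compound",
                    [["verpleger", "Nn-Sg"], ["assistent", "Nn-Sg"]]]]])
for reading in token.morph:
    for depth, stem, pos in walk_morph(reading):
        print("  " * depth + f"{stem} ({pos})")
# verplegerassistent (Nn-Sg)
#   verpleger (Nn-Sg)
#   assistent (Nn-Sg)
```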

Example

The following is an example of the JSON results for this application:

{
    "id": "test.txt",
    "language": "dutch",
    "sentences": [
        [1, 0, 62,
            [
                [2, 0, 62, "Sentence", {}],
                [2, 0, 12, "Person", {
                    "canonical": ["Jan Jansens"],
                    "family": ["Jansens"],
                    "gender": ["male"], "given": ["Jan"]}],
                [3, 0, 3, "Jan", ["init-cap", "init-token"], [["Jan", "Prop-Std", "giv"]]],
                [3, 4, 12, "Jansens", ["init-cap", "nf", "nf-lex"], [["Jannsen", "Prop-Std", "gen", "guesser"]]],
                [3, 13, 18, "werkt", [], [["werken", "V-Pres"]]],
                [3, 19, 22, "als", [], [["als", "Prep-Std"]]],
                [2, 23, 41, "Position", {}],
                [3, 23, 41, "verplegerassistent", [], [["verplegerassistent", "Nn-Sg", "compound", [["verpleger", "Nn-Sg"], ["assistent", "Nn-Sg"]]]]],
                [3, 42, 44, "in", [], [["in", "Prep-Std", "prefix"]]],
                [3, 45, 48, "het", [], [["het", "Det-Def"]]],
                [2, 49, 61, "Organization", {}],
                [3, 49, 51, "UZ", ["all-cap", "nf"], [["UZ", "Prop-Std"]]],
                [3, 52, 61, "Midelheim", ["init-cap", "nf", "nf-lex"], [["Midelheim", "Prop-Std"]]],
                [3, 61, 62, ".", [], [[".", "Punct-Sent"]]]
            ]
        ]
    ]
}
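Concepts and tokens are siblings in a sentence's annotation list, so the tokens that make up a concept have to be recovered from their offsets. The Python sketch below does this for a subset of the annotations above, copied verbatim. `concept_tokens` is a hypothetical helper, and it assumes end offsets are exclusive, which is consistent with the example (the token "Jan" spans 0 to 3).

```python
def concept_tokens(sentence: list) -> dict[tuple[str, int, int], list[str]]:
    """Map (name, begin, end) of each concept to the literals of the tokens it spans."""
    annotations = sentence[3]
    concepts = [a for a in annotations if a[0] == 2]
    tokens = [a for a in annotations if a[0] == 3]
    spans: dict[tuple[str, int, int], list[str]] = {}
    for concept in concepts:
        begin, end, name = concept[1], concept[2], concept[3]
        # A token belongs to the concept when its [begin, end) span lies inside it.
        spans[(name, begin, end)] = [t[3] for t in tokens if begin <= t[1] and t[2] <= end]
    return spans


sentence = [1, 0, 62, [
    [2, 0, 12, "Person", {"canonical": ["Jan Jansens"], "family": ["Jansens"],
                          "gender": ["male"], "given": ["Jan"]}],
    [3, 0, 3, "Jan", ["init-cap", "init-token"], [["Jan", "Prop-Std", "giv"]]],
    [3, 4, 12, "Jansens", ["init-cap", "nf", "nf-lex"], [["Jannsen", "Prop-Std", "gen", "guesser"]]],
    [3, 13, 18, "werkt", [], [["werken", "V-Pres"]]],
    [2, 49, 61, "Organization", {}],
    [3, 49, 51, "UZ", ["all-cap", "nf"], [["UZ", "Prop-Std"]]],
    [3, 52, 61, "Midelheim", ["init-cap", "nf", "nf-lex"], [["Midelheim", "Prop-Std"]]],
]]

for (name, begin, end), literals in concept_tokens(sentence).items():
    print(f"{name} [{begin}, {end}): {' '.join(literals)}")
# Person [0, 12): Jan Jansens
# Organization [49, 61): UZ Midelheim
```

Applied to the full example, the "Sentence" concept spanning 0 to 62 would collect every token in the sentence, since its span covers them all.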