Document Schema for JSON

Description

The document schema describes the structure of the JSON data returned from the Portal or the SDK before it is transformed into a native programming object, such as the Document object for Python. All NLP results for a processed document are represented in this structure. Each processed document is assigned a JSON object with:

  • a unique document identifier, either given or assigned through the use of input providers

  • the results of every applied application, each stored under a unique application identifier

The following diagram visualizes the schema:

[Figure: Document schema]

Definition

The Document schema is defined as:

  • object – The document

    • id (str) – The document identifier

    • apps (object) – The applications that have processed the document

      • <app_id> (App) – The application data for the app with the given identifier

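with the App schema, as illustrated by the example below, defined as:

  • object – The application data

    • results – The application results

    • diagnostics – The application diagnostics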

Example

The following is an example of document data in pseudo-JSON:

{
    "id": "test.txt",
    "apps": {
        "<app_id>" : {
            "results" : "<app_results>",
            "diagnostics": "<app_diagnostics>"
        }
    }
}

Note that test.txt is only an example document id. <app_id> is a placeholder for an application id, <app_results> for the actual application results, and <app_diagnostics> for the actual application diagnostics.
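To make the structure concrete, here is a minimal Python sketch that parses such a JSON payload with the standard library and walks over the application entries. It does not use the SDK. The application id "my_app", the contents of its results and diagnostics, and the helper validate_document are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical raw payload: "my_app" and the nested values are made up for this sketch.
raw = """
{
    "id": "test.txt",
    "apps": {
        "my_app": {
            "results": {"tokens": 3},
            "diagnostics": {"elapsed_ms": 12}
        }
    }
}
"""


def validate_document(doc: Any) -> Dict[str, Any]:
    """Check the top-level shape: a string 'id' and an object under 'apps'."""
    if not isinstance(doc, dict):
        raise ValueError("document must be a JSON object")
    if not isinstance(doc.get("id"), str):
        raise ValueError("document 'id' must be a string")
    if not isinstance(doc.get("apps"), dict):
        raise ValueError("document 'apps' must be a JSON object")
    return doc


doc = validate_document(json.loads(raw))

# Each key under "apps" is an application identifier.
for app_id, app in doc["apps"].items():
    # .get() is used because this sketch does not assume both keys are always present.
    print(app_id, app.get("results"), app.get("diagnostics"))
```

The check is deliberately limited to the two top-level keys from the definition above. The inner layout of results and diagnostics depends on the application, so the sketch leaves it untouched.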