Document Schema for JSON

Description

The document schema describes the structure of the JSON data returned from the Portal or the SDK before it is transformed into a native programming object, such as the Document object for Python. All NLP results for a processed document are represented in this structure. Each processed document is assigned a JSON object with:

- a unique document identifier, either given or assigned through an input provider
- the results of every applied application, each stored under a unique application identifier

As an example, the following diagram represents a visualization of the schema:

Definition

The Document schema is defined as:

object – The document
- id (str) – The document identifier
- apps (object) – The applications that have processed the document
  - <app_id> (App) – The application data for the app with the given identifier

with the App schema defined as:

object – The application data
- results (depends on application) – The application results
- diagnostics (diagnostics schema) – The application diagnostics
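
The two definitions above can be mirrored in Python type hints, together with a small structural check. This is only a minimal sketch and not part of the SDK: the names App, Document and check_document are hypothetical, and treating both results and diagnostics as required keys is an assumption, since the schema does not say whether either may be omitted.

```python
from typing import Any, Dict, List, TypedDict


class App(TypedDict):
    """Application data (hypothetical type name)."""
    results: Any       # shape depends on the application
    diagnostics: Any   # follows the diagnostics schema


class Document(TypedDict):
    """Document data (hypothetical type name)."""
    id: str
    apps: Dict[str, App]  # keyed by application identifier


def check_document(data: Any) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    if not isinstance(data, dict):
        return ["document must be a JSON object"]
    problems: List[str] = []
    if not isinstance(data.get("id"), str):
        problems.append("'id' must be a string")
    apps = data.get("apps")
    if not isinstance(apps, dict):
        problems.append("'apps' must be an object")
        return problems
    for app_id, app in apps.items():
        if not isinstance(app, dict):
            problems.append(f"app '{app_id}' must be an object")
            continue
        # Assumption: both keys are required for every application.
        for key in ("results", "diagnostics"):
            if key not in app:
                problems.append(f"app '{app_id}' is missing '{key}'")
    return problems
```

The check only looks at the outer structure. The content of results is left untyped because it depends on the application.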

Example

The following is an example of document data in pseudo-JSON:

{
    "id": "test.txt",
    "apps": {
        "<app_id>" : {
            "results" : "<app_results>",
            "diagnostics": "<app_diagnostics>"
        }
    }
}

Note that "test.txt" stands in for the actual document id. Likewise, <app_id> is a placeholder for an application id, <app_results> for the actual application results, and <app_diagnostics> for the actual application diagnostics.
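
To show how such a document might be read before any conversion to a native object, the following sketch parses the raw JSON with Python's standard json module and collects the results per application. The function name results_by_app, the application id "example_app" and the result values are all made up for illustration; real application ids and result shapes depend on the applications that were applied.

```python
import json
from typing import Any, Dict


def results_by_app(raw_json: str) -> Dict[str, Any]:
    """Map each application identifier to that application's results."""
    document = json.loads(raw_json)
    # "apps" is keyed by application identifier; each value holds the
    # "results" and "diagnostics" of that application.
    return {
        app_id: app.get("results")
        for app_id, app in document.get("apps", {}).items()
    }


# Hypothetical payload: "example_app" and its values are invented for this sketch.
raw = """
{
    "id": "test.txt",
    "apps": {
        "example_app": {
            "results": ["first result"],
            "diagnostics": {}
        }
    }
}
"""

print(results_by_app(raw))
```

Because every application writes under its own identifier, the results of one application can be read without touching those of another.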