Document Schema for JSON
Description
The document schema describes the structure of the JSON data returned from the Portal or the SDK before it is transformed into a native programming object, such as the Document object in Python. All NLP results for a processed document are represented in this structure. Each processed document is assigned a JSON object containing:
a unique document identifier, either given explicitly or assigned through input providers
the results of each applied application, stored under that application's unique identifier
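As an illustration of these two parts, the sketch below assembles a document object from plain Python dictionaries, using the key names "id", "apps", "results" and "diagnostics" from the schema definition in this section. The helpers new_document and add_app_result, the application identifier "language-detection", and its result values are all hypothetical; this is not the SDK's API, and the diagnostics value is an empty placeholder because the diagnostics schema is defined separately.

```python
import json
from typing import Any


def new_document(doc_id: str) -> dict[str, Any]:
    """Create an empty document object with the given identifier."""
    return {"id": doc_id, "apps": {}}


def add_app_result(doc: dict[str, Any], app_id: str, results: Any, diagnostics: Any) -> None:
    """Store one application's output under its identifier.

    Hypothetical helper: it refuses to overwrite an existing entry so
    that each application identifier stays unique within the document.
    """
    if app_id in doc["apps"]:
        raise ValueError(f"application id {app_id!r} already present")
    doc["apps"][app_id] = {"results": results, "diagnostics": diagnostics}


# Hypothetical application id and results; {} is a placeholder for diagnostics.
doc = new_document("test.txt")
add_app_result(doc, "language-detection", {"language": "en"}, {})
print(json.dumps(doc, indent=2))
```

Rejecting a duplicate identifier mirrors the requirement that each application's results live under a unique key; whether the real SDK raises, overwrites, or merges in that case is not stated here.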
The following diagram visualizes the schema:
Definition
The Document schema is defined as:
object: The document
    id (str): The document identifier
    apps (object): The applications that have processed the document
        <app_id> (App): The application data for the app with the given identifier
The App schema is defined as:
object: The application data
    results (depends on application): The application results
    diagnostics (diagnostics schema): The application diagnostics
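The two schemas above can be mirrored as Python type hints, which makes the nesting (document, then apps keyed by application identifier, then results and diagnostics) explicit. This is a minimal sketch: the class names follow the schema names, but is_document is a hypothetical helper that only performs a shallow structural check. It treats both results and diagnostics as required, which is an assumption, and it cannot check their contents because those depend on the application and on the separate diagnostics schema.

```python
from typing import Any, TypedDict


class App(TypedDict):
    """Application data; the shape of both fields depends on the application."""
    results: Any
    diagnostics: Any  # follows the separate diagnostics schema


class Document(TypedDict):
    """Top-level document object."""
    id: str
    apps: dict[str, App]  # keyed by application identifier


def is_document(data: Any) -> bool:
    """Shallow check that parsed JSON has the Document/App structure.

    Assumption: both "results" and "diagnostics" must be present.
    """
    if not isinstance(data, dict) or not isinstance(data.get("id"), str):
        return False
    apps = data.get("apps")
    if not isinstance(apps, dict):
        return False
    return all(
        isinstance(app_id, str)
        and isinstance(app, dict)
        and "results" in app
        and "diagnostics" in app
        for app_id, app in apps.items()
    )
```

The TypedDict classes give static type checkers the structure, while is_document provides a runtime check, since TypedDict itself performs no validation.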
Example
The following is an example in pseudo-JSON of document data:
{
    "id": "test.txt",
    "apps": {
        "<app_id>": {
            "results": "<app_results>",
            "diagnostics": "<app_diagnostics>"
        }
    }
}
In this example, "test.txt" is the document identifier. <app_id> is a placeholder for an application identifier, <app_results> for the actual application results, and <app_diagnostics> for the actual application diagnostics.
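Reading this structure with Python's standard json module, before any SDK conversion into a Document object, might look like the following sketch. The function results_by_app is a hypothetical helper, and the placeholder strings stand in for real application output as described above.

```python
import json
from typing import Any

raw = """
{
    "id": "test.txt",
    "apps": {
        "<app_id>": {
            "results": "<app_results>",
            "diagnostics": "<app_diagnostics>"
        }
    }
}
"""


def results_by_app(doc: dict[str, Any]) -> dict[str, Any]:
    """Map each application identifier to that application's results."""
    return {app_id: app["results"] for app_id, app in doc.get("apps", {}).items()}


doc = json.loads(raw)
for app_id, results in results_by_app(doc).items():
    print(f"{doc['id']} / {app_id}: {results!r}")
# prints: test.txt / <app_id>: '<app_results>'
```

Because every application's output sits under its own key in "apps", a consumer can select one application's results by identifier without knowing which other applications ran.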