Core API for Python

The Core API is the part of EyeOnText’s Python API that is common to both the Portal and the SDK. It covers everything from working with input data to using your results.

Working with Files and Folders Using Input Providers

Any analysis starts with retrieving the text, usually unstructured, from your data source. Often this data is not available in memory but needs to be read from disk and extracted from files (or folders) of different types: text, HTML, PDF, etc.

The Core API provides convenient utilities, called input providers, that allow for easy and seamless extraction of your textual data from a variety of files and folders.
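To make the idea concrete, here is a minimal sketch of what an input provider does conceptually, written with the standard library only. It is not the EyeOnText API: the function name `iter_text_files` and its `suffixes` parameter are hypothetical, and it handles plain-text files only, whereas real input providers also extract text from formats such as HTML and PDF.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_text_files(root: Path, suffixes=(".txt",)) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) pairs for a single file or for every file under a folder.

    Hypothetical helper for illustration; not part of the Core API.
    """
    # Accept either one file or a folder that is walked recursively.
    if root.is_file():
        paths = [root]
    else:
        paths = sorted(p for p in root.rglob("*") if p.is_file())

    for path in paths:
        # Skip file types this sketch cannot read as plain text.
        if path.suffix.lower() in suffixes:
            yield path, path.read_text(encoding="utf-8")
```

A caller would iterate over `iter_text_files(Path("my_folder"))` and hand each piece of text to the next step, without caring whether the source was one file or a whole directory tree.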

Constructing Documents

The central data structure in the Core API is the document: a data structure that gathers the input data to be processed and adds the corresponding results. A document can be created from an input provider and is passed from step to step in your pipeline, acquiring more results as each step is processed.

This approach allows each step in your pipeline to build upon the results already present, enriching your data as the complexity of the pipeline grows.
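The pattern described here can be sketched in plain Python. This is an illustration under stated assumptions, not the Core API itself: `SketchDocument`, `tokenize`, `count_tokens`, and `run_pipeline` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document: input text plus accumulated results."""
    text: str
    results: Dict[str, Any] = field(default_factory=dict)


Step = Callable[[SketchDocument], SketchDocument]


def tokenize(doc: SketchDocument) -> SketchDocument:
    # First step: store a whitespace tokenization of the input text.
    doc.results["tokens"] = doc.text.split()
    return doc


def count_tokens(doc: SketchDocument) -> SketchDocument:
    # Second step: builds on the result that the first step left behind.
    doc.results["token_count"] = len(doc.results["tokens"])
    return doc


def run_pipeline(doc: SketchDocument, steps: List[Step]) -> SketchDocument:
    # Pass the same document from step to step so results accumulate.
    for step in steps:
        doc = step(doc)
    return doc


doc = run_pipeline(
    SketchDocument("input providers feed documents"),
    [tokenize, count_tokens],
)
```

After the run, `doc.results` holds both the tokens and their count. The second step never touches the raw text; it relies only on what the earlier step stored, which is the enrichment behaviour the text describes.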

Analyzing Documents

Any NLP application starts with an aptly named NLP analysis. This data structure contains an annotated representation of the original input document. The supported annotations are:

Using Applications to Enhance Your Analysis

Applications can be added to your pipeline to enrich your analysis. Although most applications are available only through the SDK, some have less stringent requirements and are available in the Core API: