Language Identification#

Application ID: eot_language_identifier

Application Aliases: language-identifier.app, lid.app, eot/lid.app, eot/language-identifier.app

Description#

The Language Identification (lid) application identifies the language of a document. It can also split the document into sections and report the language of each section.

Configuration#

The configuration is an object defined as (bold = required, italic = optional):

  • default_language (string) – The default language code to return when the language of a section cannot be detected. Default: english

  • language_candidates (array[string]) – List of the languages to consider

  • sections (boolean) – Analyze the full document and return the sections with their corresponding language. Default: False

  • section_data (boolean) – Include the text of each section in the results. Default: False
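
As an illustration of the properties listed above, the following minimal sketch assembles a configuration object in Python and prints it as JSON. The helper name `build_lid_config` is hypothetical and not part of the application. The sketch also assumes that `language_candidates` takes the same language names that appear in the results (such as "dutch" and "english"), which this page does not state explicitly.

```python
import json


def build_lid_config(default_language="english", language_candidates=None,
                     sections=False, section_data=False):
    """Assemble a configuration object from the four documented properties.

    Hypothetical helper for illustration only; the defaults mirror the
    ones listed in the Configuration section.
    """
    config = {
        "default_language": default_language,
        "sections": bool(sections),
        "section_data": bool(section_data),
    }
    # No default is documented for language_candidates, so the key is
    # only added when the caller supplies a list.
    if language_candidates is not None:
        config["language_candidates"] = list(language_candidates)
    return config


# Assumed language names, taken from the example results on this page.
config = build_lid_config(language_candidates=["dutch", "english"],
                          sections=True)
print(json.dumps(config, indent=4))
```

How such an object is passed to the application depends on the client being used; the command-line examples below pass properties inline, as in `lid(sections=true,section_data=true).app`.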

Example#

Return the language of a single sentence:

wow -p "lid.app" \
    -i "Ik ga naar het werk met de fiets"

which yields:

eot_language_identifier#
{
    "language": "dutch"
}

Return the sections of a multi-paragraph input, each with its detected language and text:

wow -p 'lid(sections=true,section_data=true).app' \
        -i "Ik ga met de fiets naar het werk, en ik kom terug met de train.

But I'm driving to de gym with my car."

which yields:

eot_language_identifier#
{
    "sections": [
        {
            "begin_offset": 0,
            "end_offset": 65,
            "language": "dutch",
            "text": "Ik ga met de fiets naar het werk, en ik kom terug met de train.\n\n"
        },
        {
            "begin_offset": 65,
            "end_offset": 103,
            "language": "english",
            "text": "But I'm driving to de gym with my car."
        }
    ]
}

For an interpretation of the JSON data, refer to the application’s JSON schema.
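
The section offsets in the second example can be checked with a short Python sketch. It assumes that `begin_offset` is inclusive, that `end_offset` is exclusive, and that both count characters from the start of the input. The example above is consistent with this reading, but the JSON schema remains the authoritative source, and because the sample text is plain ASCII it does not show whether offsets count characters or bytes.

```python
# Input text from the second example, including the blank line
# between the two paragraphs.
text = ("Ik ga met de fiets naar het werk, en ik kom terug met de train.\n\n"
        "But I'm driving to de gym with my car.")

# Result from the second example, written as a Python dict.
result = {
    "sections": [
        {
            "begin_offset": 0,
            "end_offset": 65,
            "language": "dutch",
            "text": "Ik ga met de fiets naar het werk, "
                    "en ik kom terug met de train.\n\n",
        },
        {
            "begin_offset": 65,
            "end_offset": 103,
            "language": "english",
            "text": "But I'm driving to de gym with my car.",
        },
    ]
}


def section_slices(source, sections):
    """Return (language, slice) pairs, treating offsets as [begin, end)."""
    return [
        (s["language"], source[s["begin_offset"]:s["end_offset"]])
        for s in sections
    ]


for language, snippet in section_slices(text, result["sections"]):
    print(language, repr(snippet))
```

Slicing by offset is mainly useful when `section_data` is left at its default of False, since the results then carry only the offsets and languages and the section text has to be recovered from the original input.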