How the (real) Duolingo API works: The Data

The endpoints return a lot of data, and many of them duplicate data to better suit how the apps consume it.

TTS server information

The Text-to-Speech engine information is returned in /version_info. The top-level keys tts_base_url and tts_voice_configuration can be combined to build the audio URL for any individual word or phrase (by phrase id).

For the following JSON returned...

"tts_base_url": "https://d7mj4aqfscim2.cloudfront.net/",
"tts_cdn_url": "http://static.duolingo.com/",
"tts_voice_configuration": {
    "multi_voices": "{\"dn\": [\"dn\"], \"fr\": [\"fr\", \"fr/mathieu\"], \"en\": [\"en/salli\"], \"pt\": [\"pt\"], \"nb\": [\"nb/liv\"], \"de\": [\"de\"], \"tr\": [\"tr/filiz\"], \"it\": [\"it/carla\"], \"da\": [\"da\"], \"sv\": [\"sv/astrid\"], \"es\": [\"es\"]}",
    "path": "tts/{voice}/{type}/{id}",
    "voices": "{\"nb\": \"nb/liv\", \"en\": \"en/salli\", \"tr\": \"tr/filiz\", \"it\": \"it/carla\", \"sv\": \"sv/astrid\"}"
}

... we get urls like:

  • https://d7mj4aqfscim2.cloudfront.net/tts/sv/astrid/token/hej
  • https://d7mj4aqfscim2.cloudfront.net/tts/fr/mathieu/token/bon

As of yet I don’t know of any voice listed in voices that is not listed in multi_voices, so it may be safe to ignore the voices key.
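
As a rough sketch of how those pieces fit together, here is some Python that fills in the path template. The helper name tts_url and its parameters are my own invention, and using tts_base_url as the host is an assumption based on the example URLs above. Note that multi_voices is a JSON string inside the JSON, so it has to be decoded a second time.

```python
import json

# Trimmed-down copy of the /version_info fields shown above.
SAMPLE_VERSION_INFO = {
    "tts_base_url": "https://d7mj4aqfscim2.cloudfront.net/",
    "tts_voice_configuration": {
        # multi_voices arrives as a JSON-encoded string, so mimic that here.
        "multi_voices": json.dumps({"fr": ["fr", "fr/mathieu"], "sv": ["sv/astrid"]}),
        "path": "tts/{voice}/{type}/{id}",
    },
}


def tts_url(version_info, language, kind, phrase_id, voice=None, base_key="tts_base_url"):
    """Build an audio URL (hypothetical helper, not part of any Duolingo client)."""
    config = version_info["tts_voice_configuration"]
    # Second decode: the value of multi_voices is a string containing JSON.
    voices = json.loads(config["multi_voices"])[language]
    if voice is None:
        voice = voices[0]  # assumption: treat the first listed voice as the default
    elif voice not in voices:
        raise ValueError(f"{voice!r} is not listed for {language!r}")
    path = config["path"].format(voice=voice, type=kind, id=phrase_id)
    return version_info[base_key].rstrip("/") + "/" + path


print(tts_url(SAMPLE_VERSION_INFO, "sv", "token", "hej"))
# https://d7mj4aqfscim2.cloudfront.net/tts/sv/astrid/token/hej
```

Phrase ids containing spaces or other special characters would probably need URL-quoting first; I have not checked how the server expects them.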

Dictionary

As described in endpoints, the dictionary API exists on a different server than the other endpoints. Its URI can be found in /version_info as well, as dict_base_url.

Calendar

A user’s per-language practice calendar and their overall calendar are both accessible over the users/show?id= endpoint. No login is required to get a user’s calendars.

The user’s overall calendar is the top-level key calendar, but their language-specific calendar is stored as calendar under the language_data[language] key. More information on the language_data block is below.
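
To make the two locations concrete, here is a small Python sketch that pulls both calendars out of an already-parsed users/show response. The function name get_calendars is hypothetical, and falling back to empty lists for missing keys is my own choice, not something the API promises.

```python
def get_calendars(user, language):
    """Return (overall, per_language) calendars from a parsed users/show dict."""
    # Overall calendar: top-level "calendar" key.
    overall = user.get("calendar", [])
    # Language calendar: "calendar" inside language_data[language].
    per_language = user.get("language_data", {}).get(language, {}).get("calendar", [])
    return overall, per_language


# Made-up response fragment, just to show the shape.
sample_user = {
    "calendar": [{"improvement": 10, "datetime": 1435978875000}],
    "language_data": {
        "fr": {"calendar": [{"improvement": 10, "datetime": 1435979114000}]},
    },
}

overall, french = get_calendars(sample_user, "fr")
print(len(overall), len(french))
```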

Here’s an example user calendar. The improvement is the number of points earned for that practice, and the datetime is a Unix timestamp in milliseconds. I find it worthy of note that Duolingo returns milliseconds while only being accurate to the second: in the example below, the last three digits are always zero.

calendar: [
     {
         improvement: 10,
         datetime: 1435978875000
     },
     {
         improvement: 10,
         datetime: 1435979114000
     },
     {
         improvement: 10,
         datetime: 1435979472000
     },
     {
         improvement: 10,
         datetime: 1435979728000
     },
     {
         improvement: 10,
         datetime: 1435980180000
     },
     ...
 ]
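
If you want to do anything with these entries, the first step is dividing datetime by 1000. Here is a minimal Python sketch that totals the points per day. The function name points_per_day is made up, and bucketing days in UTC is an assumption; the user’s own timezone would probably be a better fit for streak-style calculations.

```python
from collections import defaultdict
from datetime import datetime, timezone


def points_per_day(calendar):
    """Sum improvement per calendar day (UTC), keyed by ISO date string."""
    totals = defaultdict(int)
    for entry in calendar:
        # datetime is in milliseconds; fromtimestamp() expects seconds.
        moment = datetime.fromtimestamp(entry["datetime"] / 1000, tz=timezone.utc)
        totals[moment.date().isoformat()] += entry["improvement"]
    return dict(totals)


# The five entries from the example calendar above.
sample_calendar = [
    {"improvement": 10, "datetime": 1435978875000},
    {"improvement": 10, "datetime": 1435979114000},
    {"improvement": 10, "datetime": 1435979472000},
    {"improvement": 10, "datetime": 1435979728000},
    {"improvement": 10, "datetime": 1435980180000},
]

print(points_per_day(sample_calendar))
# {'2015-07-04': 50}
```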

Certificates

A user’s certificates – the extended tests that cost 25 lingots – are available (without authentication) through the /users/show?id= endpoint.

"certificates": [
    {
        "datetime": "\n\n\n\n\n\n\n\n1 month ago\n\n",
        "id": "abcde",
        "language": "de",
        "language_string": "German",
        "score": 2.09
    },
    ...
]

The datetime is not a timestamp but a human-readable relative string, padded with newlines.
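
Cleaning that string up is a one-liner. A quick Python sketch (the helper name clean_certificate_date is hypothetical):

```python
def clean_certificate_date(raw):
    """Collapse all runs of whitespace, including the newline padding."""
    return " ".join(raw.split())


raw_datetime = "\n\n\n\n\n\n\n\n1 month ago\n\n"
print(clean_certificate_date(raw_datetime))
# 1 month ago
```

Since the value is relative ("1 month ago"), there is no way to recover an exact date from it.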

language_data

Language data is a behemoth. It’s gigantic, it’s horrifying. I love it.

language_data is a field in users/show that stores essentially everything the app might want to know about the user’s progress in a language.

Only one key is present in language_data at a time – the user’s current learning language. Any other language must first be set as the current one using the /me/current_language endpoint before its language_data can be retrieved.

The following data is some of, but not all of, what is stored inside a language_data block:

  • streak: the user’s current streak in days for that language.

  • language_string: the name of the language being learned, e.g. French

  • level_progress: the amount of XP earned so far in the current level

  • fluency_score: a float containing the fluency of the user.

  • level and next_level: integers with the user’s current and next levels

  • notify_time: the time, in minutes, at which the user should be notified to practice. This is stored relative to the user’s current timezone as returned by the users/show endpoint.

  • points_ranking_data: a list containing user objects of the user’s friends, ranked by points. More on user objects below.

  • num_skills_learned: the number of skills learned in the current language.

  • level_left: XP until the user levels up.

  • tracking_properties: data about the user’s tree.

  • next_lesson: the next lesson the user has to learn. Contains the URL, which can be used with the skills API.

  • skills: a list of skill objects that the user has learned.

  • bonus_skills: a list of bonus skill objects that the user has learned.

  • points: integer with the user’s current XP
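
As an illustration of how a couple of these fields relate, here is a Python sketch that turns level_progress and level_left into a progress fraction for a level bar. It assumes the two values add up to the total XP the current level requires, which follows from the descriptions above but is not something I have verified against the apps. The function name and the sample numbers are made up.

```python
def level_fraction(language_data):
    """Fraction of the current level completed, between 0.0 and 1.0."""
    earned = language_data["level_progress"]  # XP earned in this level
    remaining = language_data["level_left"]   # XP still needed to level up
    total = earned + remaining                # assumption: size of the level
    return earned / total if total else 0.0


# Hypothetical language_data fragment.
sample_language_data = {"level": 7, "next_level": 8, "level_progress": 25, "level_left": 75}
print(level_fraction(sample_language_data))
# 0.25
```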

Users

Duolingo uses the same user object wherever users appear in its API – namely, in the points_ranking_data list and in the points_ranking_data_dict dict, which contains the same data on each user with their ID as the key.

{
    username: "me",
    language_string: "French",
    points_data: {
        languages: [ ],
        total: 10000
    },
    avatar: "https://s3.amazonaws.com/duolingo-images/avatar/default_2",
    language: "fr",
    fullname: null,
    id: 1234,
    rank: 1,
    self: true
}
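
Because the list and the dict carry the same objects, you can derive one from the other. Here is a small Python sketch; the helper names are hypothetical, the second user is invented for the example, and re-sorting by points_data.total is only there to show how the ranking could be rebuilt locally.

```python
def index_by_id(users):
    """Mimic the shape of points_ranking_data_dict: user objects keyed by ID."""
    return {user["id"]: user for user in users}


def rank_by_points(users):
    """Return (position, username, total) tuples, highest total first."""
    ordered = sorted(users, key=lambda user: user["points_data"]["total"], reverse=True)
    return [
        (position, user["username"], user["points_data"]["total"])
        for position, user in enumerate(ordered, start=1)
    ]


# Made-up list in the shape of points_ranking_data.
sample_users = [
    {"id": 5678, "username": "friend", "points_data": {"languages": [], "total": 4200}},
    {"id": 1234, "username": "me", "points_data": {"languages": [], "total": 10000}},
]

print(rank_by_points(sample_users))
# [(1, 'me', 10000), (2, 'friend', 4200)]
```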

Skills

A skill object can be obtained either from the skills list inside of language_data or from the skills endpoint.