How the (real) Duolingo API works: The Data
There’s a lot of data returned by the endpoints, and many of the endpoints duplicate data to better suit how the apps consume it.
TTS server information
The Text-to-Speech engine information is returned in /version_info. The top-level keys tts_base_url and tts_voice_configuration can be combined to get the audio URL for any individual word or phrase (by phrase id).
For the following JSON returned...
"tts_base_url": "https://d7mj4aqfscim2.cloudfront.net/",
"tts_cdn_url": "http://static.duolingo.com/",
"tts_voice_configuration": {
"multi_voices": "{\"dn\": [\"dn\"], \"fr\": [\"fr\", \"fr/mathieu\"], \"en\": [\"en/salli\"], \"pt\": [\"pt\"], \"nb\": [\"nb/liv\"], \"de\": [\"de\"], \"tr\": [\"tr/filiz\"], \"it\": [\"it/carla\"], \"da\": [\"da\"], \"sv\": [\"sv/astrid\"], \"es\": [\"es\"]}",
"path": "tts/{voice}/{type}/{id}",
"voices": "{\"nb\": \"nb/liv\", \"en\": \"en/salli\", \"tr\": \"tr/filiz\", \"it\": \"it/carla\", \"sv\": \"sv/astrid\"}"
}
... we get URLs like:
https://d7mj4aqfscim2.cloudfront.net/tts/sv/astrid/token/hej
https://d7mj4aqfscim2.cloudfront.net/tts/fr/mathieu/token/bon
As of yet I don’t know of any voice listed in voices that is not listed in multi_voices, so it may be safe to ignore the voices key.
Dictionary
As described in endpoints, the dictionary API exists on a different server than the other endpoints. Its URI can be found in /version_info as well, as dict_base_url.
Calendar
A user’s per-language practice calendar and their overall calendar are both accessible through the users/show?id= endpoint. No login is required to get a user’s calendars.
The user’s overall calendar is the top-level key calendar, while their language-specific calendar is stored as calendar under the language_data[language] key. More information on the language_data block is below.
Here’s an example user calendar. The improvement is the number of points earned for the practice, etc., and the datetime is a Unix timestamp in milliseconds. I find it worthy of note that Duolingo returns milliseconds even though the values are only accurate to the nearest thousand, that is, to the second.
calendar: [
{
improvement: 10,
datetime: 1435978875000
},
{
improvement: 10,
datetime: 1435979114000
},
{
improvement: 10,
datetime: 1435979472000
},
{
improvement: 10,
datetime: 1435979728000
},
{
improvement: 10,
datetime: 1435980180000
},
...
]
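To work with these entries, the datetime values have to be divided by 1000 before being handed to most date libraries. Here is a small Python sketch under the assumption that the timestamps are milliseconds since the Unix epoch in UTC; the helper names are invented for this example.

```python
from datetime import datetime, timezone

# First two entries from the example calendar above.
calendar = [
    {"improvement": 10, "datetime": 1435978875000},
    {"improvement": 10, "datetime": 1435979114000},
]


def entry_time(entry):
    """Convert a calendar entry's millisecond timestamp to an aware datetime."""
    return datetime.fromtimestamp(entry["datetime"] / 1000, tz=timezone.utc)


def total_improvement(entries):
    """Sum the points earned across a list of calendar entries."""
    return sum(entry["improvement"] for entry in entries)


for entry in calendar:
    print(entry_time(entry).isoformat(), entry["improvement"])
print("total:", total_improvement(calendar))
```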
Certificates
A user’s certificates – the extended tests that cost 25 lingots – are available (without authentication) through the /users/show?id= endpoint.
"certificates": [
{
"datetime": "\n\n\n\n\n\n\n\n1 month ago\n\n",
"id": "abcde",
"language": "de",
"language_string": "German",
"score": 2.09
},
...
]
The datetime is not a timestamp but a relative-time string (here, "1 month ago") padded with newlines.
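Because of that padding, the value needs trimming before it is displayed. A minimal Python sketch, using the example certificate above (the helper name clean_certificate is made up):

```python
certificate = {
    "datetime": "\n\n\n\n\n\n\n\n1 month ago\n\n",
    "id": "abcde",
    "language": "de",
    "language_string": "German",
    "score": 2.09,
}


def clean_certificate(cert):
    """Return a copy of the certificate with the datetime padding stripped."""
    return {**cert, "datetime": cert["datetime"].strip()}


print(clean_certificate(certificate)["datetime"])
```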
language_data
Language data is a behemoth. It’s gigantic, it’s horrifying. I love it.
The language_data field in users/show stores essentially everything the app might want to know about the user’s progress in a language.
Only one key is present in language_data at a time: the user’s current learning language. Another language must be set as the current one using the /me/current_langauge endpoint before its language_data can be retrieved.
The following data is some of, but not all of, what is stored inside a language_data block:
streak: the user’s current streak, in days, for that language.
language_string: the name of the language being learned, e.g. French.
level_progress: the XP earned so far in the current level.
fluency_score: a float containing the fluency of the user.
level and next_level: integers with the user’s current and next levels.
notify_time: the time, in minutes, that a user should be notified to practice at. This is stored relative to the user’s current timezone as returned by the users/show endpoint.
points_ranking_data: a list containing user objects of the user’s friends, ranked by points. More on user objects below.
num_skills_learned: the number of skills learned in the current language.
level_left: XP until the user levels up.
tracking_properties: data about the user’s tree.
next_lesson: the next lesson the user has to learn. Contains the URL, which can be used with the skills API.
skills: a list of skill objects that the user has learned.
bonus_skills: a list of bonus skill objects that the user has learned.
points: an integer with the user’s current XP.
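Since language_data holds a single key, pulling out the current language is a matter of unpacking that one entry. The Python sketch below does that and also derives how far the user is through the current level. The sample values are invented, the helper names are hypothetical, and the progress calculation assumes a level spans level_progress + level_left XP, which I have not confirmed.

```python
# Illustrative users/show fragment; the numbers are made up for this sketch.
user = {
    "language_data": {
        "fr": {
            "language_string": "French",
            "streak": 12,
            "level": 8,
            "next_level": 9,
            "level_progress": 30,
            "level_left": 70,
        }
    }
}


def current_language(user):
    """Return (language code, block) for the one language present.

    Raises ValueError if language_data does not hold exactly one key.
    """
    [(code, block)] = user["language_data"].items()
    return code, block


def level_fraction(block):
    """Fraction of the current level completed (assumed formula)."""
    span = block["level_progress"] + block["level_left"]
    return block["level_progress"] / span if span else 0.0


code, block = current_language(user)
print(code, block["language_string"], level_fraction(block))
```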
Users
Duolingo uses the same user object wherever users appear in its API – namely, in the points_ranking_data list and in the points_ranking_data_dict dict, which holds the data on each user with their ID as the key.
{
username: "me",
language_string: "French",
points_data: {
languages: [ ],
total: 10000
},
avatar: "https://s3.amazonaws.com/duolingo-images/avatar/default_2",
language: "fr",
fullname: null,
id: 1234,
rank: 1,
self: true
}
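As a rough illustration of working with these objects, the Python sketch below indexes a list of user objects by id (similar in spirit to points_ranking_data_dict) and picks out the entry flagged with self. The second user is invented for the example, and whether the real dict uses integer or string keys is something I have not verified.

```python
ranking = [
    {"username": "me", "id": 1234, "rank": 1, "self": True,
     "points_data": {"languages": [], "total": 10000}},
    # Invented second entry, only here to make the example non-trivial.
    {"username": "friend", "id": 5678, "rank": 2, "self": False,
     "points_data": {"languages": [], "total": 7500}},
]


def by_id(users):
    """Index user objects by their id field."""
    return {user["id"]: user for user in users}


def own_entry(users):
    """Return the user object flagged as the requesting user, or None."""
    return next((user for user in users if user.get("self")), None)


print(by_id(ranking)[5678]["username"])
print(own_entry(ranking)["points_data"]["total"])
```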
Skills
A skill object is obtained either from the skills list inside language_data or from the skills endpoint.