Packaging 93K Levels
A story of custom binary serialization for a game made with Unity3D
This year I helped a customer develop and ship a word puzzle game for iOS and Android. One of the challenges in this project was that the game had to ship localized in 31 languages, and every language gets its own set of levels. The customer wanted to start with 3K levels per language, which they were able to generate procedurally offline. The game targets people who know multiple languages, so switching between languages should be seamless. The game also had to be playable offline, and there were no plans to build up a backend team, so the simplest thing to do was to package all 93K levels with the game.
There are many ways to represent a level so that Unity3D can load it, but with 93 thousand levels we have to keep loading time and bundle size in mind. The procedurally generated levels came as one CSV file per language, each larger than 2MB. 2MB multiplied by 31 languages results in more than 62MB of data just for the levels. This was an unacceptable penalty, so I had to get creative.
A level in a word puzzle game consists mainly of … words. In our particular game, every word in a level was formed from the same set of at most 9 different characters. This restriction was a very fortunate detail, which helped us reduce the level size.
Say we have 20 words in a level and a word can be 3 to 9 characters long. For English, this results in a storage cost of 60 to 180 bytes when the words are stored in ASCII/UTF-8, or 120 to 360 bytes when they are stored in UTF-16. For other languages, such as Japanese, Russian, or Greek, we pay an even higher price: in UTF-8 a Cyrillic or Greek letter takes 2 bytes, and a Japanese katakana letter takes 3 bytes, moving the needle to 180 to 540 bytes.
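To double-check these numbers, here is a small C# sketch that asks .NET's Encoding classes how many bytes a word costs and multiplies out the per-level range. The EncodingCost class and its method names are made up for illustration.

```csharp
using System.Text;

// Hypothetical helper for comparing the storage cost of words in different encodings.
public static class EncodingCost
{
    // Bytes needed for a word in UTF-8 (1 per ASCII letter, 2 per Cyrillic/Greek letter, 3 per katakana letter).
    public static int Utf8Bytes(string word) => Encoding.UTF8.GetByteCount(word);

    // Bytes needed in UTF-16 (2 per character in the Basic Multilingual Plane); no BOM is counted.
    public static int Utf16Bytes(string word) => Encoding.Unicode.GetByteCount(word);

    // Min/max bytes for a level of wordCount words, each minLength..maxLength characters long,
    // when every character costs bytesPerChar bytes.
    public static (int Min, int Max) LevelBytes(int wordCount, int minLength, int maxLength, int bytesPerChar) =>
        (wordCount * minLength * bytesPerChar, wordCount * maxLength * bytesPerChar);
}
```

For example, Utf8Bytes("カメラ") reports 9 bytes while Utf16Bytes("カメラ") reports 6, so for katakana UTF-16 is actually the cheaper of the two plain-text options.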
But what if we store the words not as sequences of letters, but as sequences of lookup indexes? Every word in a level is an arrangement of characters taken from the level's set of X characters, where X is between 3 and 9.
OK, I guess it is time to give you an example. Here are the words for one of the simple English levels:
GOLD, OLD, GOD, LOG, DOG
All 5 words can be formed from four characters: G, L, O, and D.
So if we create a lookup table where G = 0, L = 1, O = 2, D = 3, we can write the words in the puzzle as follows:
GOLD = 0213, OLD = 213, GOD = 023, LOG = 120, DOG = 320
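The translation itself is a simple table lookup. Below is a minimal C# sketch; the WordCodec name is hypothetical, and it assumes the level's letter table is given as a string in the level's own order. Note that the table above is not in order of first appearance (GOLD would give G, O, L, D), so the order has to come from the level data rather than from the words.

```csharp
using System;

// Hypothetical helper: replaces each letter of a word with its index in the
// level's letter table (e.g. "GLOD"). Assumes every letter is a single UTF-16
// char, which holds for the alphabets discussed here.
public static class WordCodec
{
    public static byte[] Encode(string word, string letters)
    {
        var indexes = new byte[word.Length];
        for (int i = 0; i < word.Length; i++)
        {
            int index = letters.IndexOf(word[i]);
            if (index < 0)
                throw new ArgumentException($"'{word[i]}' is not in the letter table \"{letters}\".", nameof(word));
            indexes[i] = (byte)index;
        }
        return indexes;
    }
}
```

Encode("GOLD", "GLOD") returns the indexes 0, 2, 1, 3, matching the table above.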
To make the point stick even better, here is an example of a simple Russian puzzle:
ПАРК, ПАР, АКР, КРАП, КАРП, РАК
If the characters are represented as П = 0, А = 1, К = 2, Р = 3, the words translate to:
ПАРК = 0132, ПАР = 013, АКР = 123, КРАП = 2310, КАРП = 2130, РАК = 312
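Decoding is the same lookup in reverse, and it does not care which alphabet the letters come from. Here is a hypothetical sketch (WordDecoder is a made-up name), assuming the letter table is stored alongside the level; the private letters field of the Level class shown later suggests something along these lines, but that is an assumption.

```csharp
using System.Text;

// Hypothetical inverse of the index encoding: looks every index up in the
// level's letter table and rebuilds the language-specific word.
public static class WordDecoder
{
    public static string Decode(byte[] indexes, string letters)
    {
        var builder = new StringBuilder(indexes.Length);
        foreach (byte index in indexes)
            builder.Append(letters[index]); // throws if the index is outside the table
        return builder.ToString();
    }
}
```

With the table "ПАКР", Decode(new byte[] { 1, 2, 3 }, "ПАКР") gives back АКР.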
When we translate language-specific letters into numbers between 0 and 8, we introduce an indirection that lets us store the data much more compactly. Numbers between 0 and 8 can be represented with only 4 bits (half a byte), because 4 bits can hold 16 distinct values and we only need 9. So in the case of the English words:
GOLD, OLD, GOD, LOG, DOG
We go from 16 bytes to 8 bytes: each of the 16 letters takes 1 byte as an English letter, but only half a byte as an index.
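Packing two indexes into one byte is plain bit shifting. Here is a minimal sketch (NibblePacker is a made-up name) that puts the first index of each pair into the high nibble and the second into the low nibble; this nibble order is an arbitrary choice for illustration, not necessarily the one the game uses.

```csharp
using System;
using System.Collections.Generic;

// Packs lookup indexes (0..8, anything up to 15 fits) two per byte:
// even positions go to the high nibble, odd positions to the low nibble.
// With an odd count, the last low nibble is left as 0 (padding).
public static class NibblePacker
{
    public static byte[] Pack(IReadOnlyList<byte> indexes)
    {
        var packed = new byte[(indexes.Count + 1) / 2];
        for (int i = 0; i < indexes.Count; i++)
        {
            if (indexes[i] > 15)
                throw new ArgumentOutOfRangeException(nameof(indexes), "Each index must fit in 4 bits.");
            if (i % 2 == 0)
                packed[i / 2] = (byte)(indexes[i] << 4);
            else
                packed[i / 2] |= indexes[i];
        }
        return packed;
    }

    // The caller has to know how many indexes were packed, so padding is not read back.
    public static byte[] Unpack(byte[] packed, int count)
    {
        var indexes = new byte[count];
        for (int i = 0; i < count; i++)
        {
            byte b = packed[i / 2];
            indexes[i] = (byte)(i % 2 == 0 ? b >> 4 : b & 0x0F);
        }
        return indexes;
    }
}
```

Packing the 16 indexes of GOLD, OLD, GOD, LOG, DOG this way yields 8 bytes, the first two being 0x02 (G, O) and 0x13 (L, D).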
And in the case of Russian words:
ПАРК, ПАР, АКР, КРАП, КАРП, РАК
We reduce the size from 42 bytes (21 letters, each taking 2 bytes) to 11 bytes (21 half-bytes, rounded up), consuming almost 4x less space on disk.
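The counts above only cover the letters themselves; a decoder also has to know where one word ends and the next begins. The article does not say how the game stores this, so the following is just one possible approach, sketched as an assumption: since indexes only use the values 0 to 8, a spare nibble value such as 0xF can act as a word separator within the same stream. WordStream and its members are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical word-boundary scheme (an assumption, not the article's format):
// letter indexes only use 0..8, so the unused nibble value 0xF can separate
// words inside one flat nibble stream.
public static class WordStream
{
    public const byte Separator = 0xF;

    // Flattens index-encoded words into one nibble list, with a separator between words.
    public static List<byte> Join(IEnumerable<byte[]> words)
    {
        var nibbles = new List<byte>();
        bool first = true;
        foreach (byte[] word in words)
        {
            if (!first)
                nibbles.Add(Separator);
            nibbles.AddRange(word);
            first = false;
        }
        return nibbles;
    }

    // Splits a nibble list back into words at every separator.
    public static List<byte[]> Split(IReadOnlyList<byte> nibbles)
    {
        var words = new List<byte[]>();
        if (nibbles.Count == 0)
            return words;

        var current = new List<byte>();
        foreach (byte nibble in nibbles)
        {
            if (nibble == Separator)
            {
                words.Add(current.ToArray());
                current.Clear();
            }
            else
            {
                current.Add(nibble);
            }
        }
        words.Add(current.ToArray());
        return words;
    }
}
```

For the six Russian words, this produces 21 index nibbles plus 5 separators: 26 nibbles, or 13 bytes once packed two per byte, still far below the 42 bytes of plain text. If the nibbles are packed two per byte, the total nibble count still has to be known (or the padding nibble must be a value the decoder ignores) so that padding is not read as an extra letter. Storing word lengths separately would be an equally valid alternative.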
Words are not the only data that needs to be stored in a level, but representing words in a very compact way was one of the biggest gains in reducing the file size of 93K levels. In the end, we managed to store all levels for a language in about 400KB, which results in about 12MB for all 31 languages. Furthermore, the files are designed so that a particular level can be decoded almost instantly into a Level class instance, which the game logic can use directly. Here is the simplified API of the Level and Word classes:
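using System.Collections.Generic;

public class Level
{
    public readonly Language Language;
    public readonly byte Width;
    public readonly byte Height;
    private readonly char[] letters;
    private readonly List<Word> words = new List<Word>();

    public IEnumerable<Word> Words => words.ToArray();

    // Serializes the level into the compact binary format (implementation omitted in this simplified API).
    public byte[] ToBytes() => throw new System.NotImplementedException();

    // Decodes a level from its binary form (implementation omitted in this simplified API).
    public static Level FromBytes(Language language, byte[] bytes) => throw new System.NotImplementedException();
}

public class Word
{
    public readonly string Value;
    public readonly Direction Direction;
    public readonly int X;
    public readonly int Y;
}

public enum Direction : byte
{
    Horizontal,
    Vertical,
    None
}

public enum Language : byte
{
    AZ,
    BR,
    // ... (one entry per supported language, 31 in total)
}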
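The article does not describe the file layout, but one common way to make a single level decodable without parsing all the others is an offset table in front of the level data. Below is a hypothetical C# sketch of such a per-language pack; the layout, the LevelPack name, and its methods are my assumptions, and each blob could be whatever a method like Level.ToBytes produces.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical pack layout (an assumption, not the game's actual format):
//   [int32 levelCount][int32 offsets x (levelCount + 1)][level blobs ...]
// Offsets are relative to the start of the blob area. The extra last offset
// marks the end of the final blob, so blob i spans offsets[i] .. offsets[i + 1].
public static class LevelPack
{
    public static byte[] Write(IReadOnlyList<byte[]> levels)
    {
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream); // BinaryWriter writes little-endian

        writer.Write(levels.Count);
        int offset = 0;
        writer.Write(offset);
        foreach (byte[] level in levels)
        {
            offset += level.Length;
            writer.Write(offset);
        }
        foreach (byte[] level in levels)
            writer.Write(level); // raw bytes, no length prefix

        writer.Flush();
        return stream.ToArray();
    }

    // Reads the header and two offsets, then copies out a single level.
    public static byte[] ReadLevel(byte[] pack, int index)
    {
        int count = ReadInt32(pack, 0);
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));

        int blobStart = 4 + 4 * (count + 1);
        int start = ReadInt32(pack, 4 + 4 * index);
        int end = ReadInt32(pack, 4 + 4 * (index + 1));

        var level = new byte[end - start];
        Array.Copy(pack, blobStart + start, level, 0, level.Length);
        return level;
    }

    private static int ReadInt32(byte[] data, int position) =>
        BinaryPrimitives.ReadInt32LittleEndian(data.AsSpan(position, 4));
}
```

In Unity, a file like this could be shipped as a TextAsset with the .bytes extension and read through TextAsset.bytes. ReadLevel only touches the header and two offsets before copying one level's bytes, so the lookup cost does not grow with the number of levels in the pack.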
As a small bonus, I want to mention that I used FlexBuffers for storing the UI and metagame text localization data. I wrote a much more extensive blog post about the benefits of using FlexBuffers in Unity3D.
For this particular game, I converted the CSV files with text localization data into FlexBuffers files, which let us switch between the 31 languages instantly, with almost zero loading cost on the UI side as well.
🙌 Thank you for reading my story. Please let me know if you have any questions, or if you are interested in learning more about custom binary data serialization.
👋
PS: You can hire me for this kind of work 😉.