Introduction to CSV / TSV workflow
If you haven’t heard the terms CSV or TSV before, do not worry. CSV is short for comma-separated values and TSV is short for tab-separated values. Basically, both describe plain-text file formats where values are separated either by commas (CSV) or by tabs (TSV).
,Unit,Hitpoints,Armor,Movement speed,Damage,Attacks per second,DPS
,Swordman,100,10,1.00,10,1.00,10
,Brute,150,2,0.70,20,0.70,14
,Scout,60,3,1.30,5,1.20,6
,Archer,50,2,0.80,12,0.50,6
(CSV sample file)
Unit	Hitpoints	Armor	Movement speed	Damage	Attacks per second	DPS
Swordman	100	10	1.00	10	1.00	10
Brute	150	2	0.70	20	0.70	14
Scout	60	3	1.30	5	1.20	6
Archer	50	2	0.80	12	0.50	6
(TSV sample file)
Then you might wonder why you should care about CSV or TSV. Well, the answer is simple: most spreadsheet programs (Microsoft Excel, LibreOffice Calc, Google Docs etc.) can export their current content as CSV or TSV files.
And at this point you should hear voices singing “Hello Darkness, My Old Friend” as you figure out that your old enemy (the always-hated spreadsheet program) could be your best friend while you are developing a game.
Basically, the whole process is the following:
- Add CSV or TSV parsing support to your game engine
- Create or edit a spreadsheet in your (least) favourite spreadsheet program
- Export that spreadsheet to a CSV or TSV file
- Read the CSV or TSV file in your game engine and use those values in the game, or generate new code from those values
- Figure out what part of your game needs changes and jump back to step 2.
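As a rough illustration of the "read the file and use those values" step, here is a minimal Python sketch. It uses the standard `csv` module and part of the unit table from the sample above (the empty leading column is left out here). The function name `load_units` is made up for this example, and your engine will most likely want its own data types instead of plain dictionaries.

```python
import csv
import io

# Part of the unit table from the sample above, without the empty leading column.
CSV_TEXT = """Unit,Hitpoints,Armor,Movement speed,Damage,Attacks per second,DPS
Swordman,100,10,1.00,10,1.00,10
Brute,150,2,0.70,20,0.70,14
"""

def load_units(csv_text):
    """Return a dict that maps each unit name to a dict of numeric stats."""
    units = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.pop("Unit")
        # Every remaining column in this table is numeric.
        units[name] = {column: float(value) for column, value in row.items()}
    return units

units = load_units(CSV_TEXT)
print(units["Brute"]["Hitpoints"])  # 150.0
```

After editing and re-exporting the spreadsheet, running the same loading code again picks up the new values.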
Now you might be wondering: “Why should I use a spreadsheet program instead of my own solution / JSON files / hard-coded values / $1?” The answer is that almost anyone can use a spreadsheet program, and with the new shiny online tools (like Google Docs and Excel Online) you can easily add collaboration support to your game development process that is far more powerful than most homegrown tools.
For example, Google Docs supports simultaneous edits from multiple people, keeps a version history, and lets you download the spreadsheet directly as a CSV or TSV file with any HTTP download tool.
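A minimal Python sketch of the download step could look like the one below. Note the assumptions: the `export?format=csv` URL pattern is the one Google spreadsheets commonly expose, but please verify it against your own document, and the spreadsheet has to be shared so that it can be read without logging in. The function names and the `YOUR_SPREADSHEET_ID` placeholder are made up for this example.

```python
import urllib.request

def export_url(spreadsheet_id, fmt="csv"):
    """Build an export URL for a spreadsheet (the URL pattern is an assumption)."""
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format={fmt}"

def download_sheet(spreadsheet_id, fmt="csv"):
    """Download the exported sheet and return its content as text."""
    with urllib.request.urlopen(export_url(spreadsheet_id, fmt)) as response:
        return response.read().decode("utf-8")

# "YOUR_SPREADSHEET_ID" is a placeholder, not a real document.
print(export_url("YOUR_SPREADSHEET_ID", "tsv"))
```

Calling `download_sheet(...)` with a real ID returns the text that you can then feed to your parser.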
Steps in your workflow that you can improve
There are lots of improvements you can make when working with this kind of setup, since a CSV / TSV workflow can contain multiple steps.
Some people want to keep their spreadsheets simple. Others want to add colors and formatting. The thing to remember here is that only the actual values are written to CSV / TSV files. So you can use formulas, colors, bolding, notes etc. to make your spreadsheet more (or less) usable, and it won’t have a negative effect on the CSV / TSV files.
Below are three formattings of the same data (you can find the actual spreadsheet in Google Docs)
(with colors and symbols)
As you can see, the same data can be presented in many different ways. The thing to remember here is that only the first sheet will be converted to a CSV / TSV file. So you can easily add e.g. graphs, additional notes or images to the extra sheets, and they won’t interfere with the CSV / TSV generation at all.
If you want to use Unicode symbols (e.g. emojis like 🛡) in your CSV / TSV files, make sure your text file parsers also support them. Also tell your collaborators which kinds of characters they are allowed to use.
There are many useful functions that you can use while creating your spreadsheet files. I have listed some useful ones below, and you can check out their usage examples in this spreadsheet
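On the parsing side, the main thing is to decode the file with an explicit encoding instead of whatever the platform default happens to be. Below is a small Python sketch that assumes the file was exported as UTF-8 (check what your spreadsheet program actually writes). The file name and the helper `read_rows` are made up for this example.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Read all rows from a CSV file, decoding it explicitly as UTF-8."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))

# Write a tiny file that contains a Unicode symbol and read it back.
path = os.path.join(tempfile.mkdtemp(), "units.csv")
with open(path, "w", encoding="utf-8", newline="") as handle:
    handle.write("Unit,Armor\nSwordman,🛡 10\n")

rows = read_rows(path)
print(rows[1][1] == "🛡 10")  # True when the symbol survived the round trip
```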
- NOW() is used to display the last edit time of the document. Very useful, since one can easily check from the CSV / TSV file when the spreadsheet was last edited.
- SUBSTITUTE() is used to replace existing text with new text in a string. This is useful for localization spreadsheets, where example texts with substituted values might be needed
- ROUND() is used to round decimal numbers
- SUM() is used to sum values from different cells together
- IF() produces conditional output based on a comparison
Since it is easy to add invalid values to a spreadsheet (e.g. a copy+paste error turns 100 into 10t0), you might want to add validations to your workflow. The easiest thing is to add validations directly to the spreadsheet (e.g. the number in this cell must be between 1 and 100), but you might also want additional validation in your code, especially if invalid values can cause crashes or other nasty experiences for users.
If you use separate code for validating, here are a few tips you can follow:
- Check that every line has an equal number of commas (CSV) or tabs (TSV), assuming that the actual elements do NOT contain extra commas or tabs
- You might want to skip lines that only contain commas (CSV) or tabs (TSV)
- Be VERY strict about input and error out immediately if something does not match your requirements (e.g. in the C# programming language the Enum.Parse function also accepts numbers, which can lead to issues if you only want to accept enum names)
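The tips above can be sketched in Python roughly like this. It is only an illustration under the same assumption as above (no commas or tabs inside the values). The `UnitType` enum and both function names are made up for this example.

```python
from enum import Enum

class UnitType(Enum):  # hypothetical enum, for illustration only
    SWORDMAN = "Swordman"
    ARCHER = "Archer"

def validate_lines(lines, separator=","):
    """Return the data lines, raising ValueError on the first structural problem."""
    expected = None
    valid = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip lines that are empty or contain only separators.
        if line.strip(separator + " ") == "":
            continue
        count = line.count(separator)
        if expected is None:
            expected = count  # the first real line defines the column count
        elif count != expected:
            raise ValueError(f"line {number}: expected {expected} separators, found {count}")
        valid.append(line)
    return valid

def parse_unit_type(text):
    """Accept only exact unit names; numbers and unknown names are rejected."""
    try:
        return UnitType(text)
    except ValueError:
        raise ValueError(f"unknown unit type: {text!r}") from None

lines = ["Unit,Hitpoints", "Archer,50", ",", "Brute,150"]
print(validate_lines(lines))  # ['Unit,Hitpoints', 'Archer,50', 'Brute,150']
```

Raising an exception on the first problem follows the "error out immediately" advice. If you would rather collect all problems in one pass, you could gather the messages in a list instead.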
You can extend your CSV / TSV workflow to include code generation. This is not a viable method on all platforms, but if you don’t want or need runtime CSV / TSV parsing in your game, then this is one way to use the power of CSV / TSV files.
If we use the CSV file from the beginning of this post, we can use the following C# code to generate a static readonly dictionary that contains those values.
and the output will be
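To give a general feel for code generation, here is a small Python sketch that turns two columns of the unit table into C# source text for a static readonly dictionary. Everything about the generated code (the class name `UnitData`, the field name `Hitpoints`, the layout) is an assumption made for this example. A real generator would cover all the columns you need.

```python
import csv
import io

# Two columns of the unit table from the sample at the beginning of the post.
CSV_TEXT = """Unit,Hitpoints
Swordman,100
Brute,150
Scout,60
Archer,50
"""

def generate_csharp(csv_text, class_name="UnitData"):
    """Return C# source text that declares a static readonly dictionary of hitpoints."""
    lines = [
        "using System.Collections.Generic;",
        "",
        f"public static class {class_name}",
        "{",
        "    public static readonly Dictionary<string, int> Hitpoints = new Dictionary<string, int>",
        "    {",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # int() fails loudly on invalid cells such as "10t0".
        lines.append(f'        {{ "{row["Unit"]}", {int(row["Hitpoints"])} }},')
    lines += ["    };", "}"]
    return "\n".join(lines)

print(generate_csharp(CSV_TEXT))
```

The generated text can be written to a source file as part of your build, so the game itself never has to parse the CSV at runtime.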
If you only have a few entries (cells with values) in your CSV / TSV file, then you shouldn’t really worry about parsing performance. But if you have hundreds or even thousands of entries in your CSV / TSV file, then you should spend a bit of time on making the CSV / TSV handling more efficient.
The parsing method I suggest is to read the CSV / TSV entries line by line and split each line separately. After the splitting you can feed those values forward, e.g. to your constructors.
Some people like to use Regex for splitting, but I would avoid it with CSV / TSV files since Regex is a bit of an overkill for that purpose. In most cases it is easier to rely on the KISS principle and not construct overly complicated CSV / TSV entries.
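A minimal Python sketch of the line-by-line approach is below. It relies on a plain `split` and therefore assumes that the values never contain the separator or quotes. The `Unit` class and the chosen columns are made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Unit:  # hypothetical class, for illustration only
    name: str
    hitpoints: int
    armor: int
    movement_speed: float

def parse_units(lines, separator=","):
    """Parse an iterable of lines one at a time, skipping the header line."""
    units = []
    for index, line in enumerate(lines):
        if index == 0:
            continue  # header row
        # Unpacking fails with ValueError if the line has the wrong number of fields.
        name, hitpoints, armor, speed = line.rstrip("\r\n").split(separator)
        units.append(Unit(name, int(hitpoints), int(armor), float(speed)))
    return units

lines = ["Unit,Hitpoints,Armor,Movement speed", "Swordman,100,10,1.00", "Scout,60,3,1.30"]
print(parse_units(lines)[1])
```

Since an open text file is also an iterable of lines, the same function can be fed directly from `open(...)` without loading the whole file into memory first.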
In certain situations you can get massive performance improvements if you generate optimized CSV / TSV files from the originals. Things you can optimize in cases like these are:
- enums (UnitType.Archer becomes 1), which are faster to cast from number to chosen enum type than parsing the enum type from string
- booleans (true becomes 1)
- durations (1min 40seconds becomes 100), which can be expressed e.g. as seconds
- color values (RGB Hex #78FF25 becomes 7929637)
Below you can see snippets from the original CSV file and the “optimized” CSV file
Unit type, Is ground unit, Training time, Unit selection color, Bonus damage type, Speed
Archer, true, 1 min, 255 255 0, Fire, 13

Unit type, Is ground unit, Training time, Unit selection color, Bonus damage type, Speed
3, 1, 60, 16776960, 2, 13
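The conversion from the original row to the optimized row could be sketched in Python as follows. The numbers used for the enums (`Archer` as 3, `Fire` as 2) are simply read off the snippets above, and the helper names are made up for this example. In a real project the mappings should come from the same enum definitions your game uses.

```python
import re

UNIT_TYPES = {"Archer": 3}    # hypothetical numbering, read off the snippet
DAMAGE_TYPES = {"Fire": 2}    # hypothetical numbering, read off the snippet
BOOLEANS = {"true": 1, "false": 0}

def duration_to_seconds(text):
    """Convert strings such as '1 min' or '1min 40seconds' to seconds."""
    parts = re.findall(r"(\d+)\s*([a-z]+)", text.lower())
    if not parts:
        raise ValueError(f"cannot parse duration: {text!r}")
    total = 0
    for amount, unit in parts:
        if unit.startswith("min"):
            total += int(amount) * 60
        elif unit.startswith("sec"):
            total += int(amount)
        else:
            raise ValueError(f"unknown time unit: {unit!r}")
    return total

def color_to_int(text):
    """Convert '#RRGGBB' or 'R G B' to a single integer."""
    if text.startswith("#"):
        return int(text[1:], 16)
    red, green, blue = (int(part) for part in text.split())
    return (red << 16) | (green << 8) | blue

def optimize_row(row):
    """Rewrite one original data row into its optimized form."""
    unit, is_ground, training, color, damage, speed = [cell.strip() for cell in row.split(",")]
    values = (UNIT_TYPES[unit], BOOLEANS[is_ground], duration_to_seconds(training),
              color_to_int(color), DAMAGE_TYPES[damage], speed)
    return ", ".join(str(value) for value in values)

print(optimize_row("Archer, true, 1 min, 255 255 0, Fire, 13"))  # 3, 1, 60, 16776960, 2, 13
```

Unknown names raise a `KeyError` here, which fits the strict-validation advice: the optimized file is only produced when every value is understood.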
With massive CSV / TSV files, you should aim for a sparse representation. That means you should define a base/default value and only populate the entries that differ from it. This makes parsing a bit faster and saves some storage space.
Below are examples of sparse vs. dense representation
(empty cells in sparse are highlighted with pink color)
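On the parsing side, a sparse file just means that an empty cell falls back to the default value. Here is a minimal Python sketch of that idea. The default values and the function name `apply_defaults` are made up for this example.

```python
DEFAULTS = {"Hitpoints": 100, "Armor": 0, "Damage": 10}  # hypothetical defaults

def apply_defaults(header, cells, defaults=DEFAULTS):
    """Build a row dict, using the default whenever a cell is empty."""
    row = {}
    for column, cell in zip(header, cells):
        row[column] = int(cell) if cell.strip() else defaults[column]
    return row

header = ["Hitpoints", "Armor", "Damage"]
print(apply_defaults(header, "150,,20".split(",")))
# {'Hitpoints': 150, 'Armor': 0, 'Damage': 20}
```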
Dots vs. commas with decimal numbers
If you are going to use decimal numbers in your CSV / TSV files, make sure you consistently use either dots or commas as decimal separators. Do not mix and match, since it only confuses people. Also note that in CSV files a decimal comma collides with the field separator, so such values need quoting or a different delimiter.
Also make sure you set your CSV / TSV parsing locale to match your selection. Otherwise you might end up in a situation where the parsing doesn’t work in all environments or produces incorrect results.
I have already given a few examples in this post, but here are additional ones that might inspire you
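One way to avoid depending on the locale of the machine is to make the chosen separator explicit in the parsing code. The Python sketch below does that and rejects values that use the other separator. Python's `float()` always expects a dot, so the text is normalized before conversion. The function name `parse_decimal` is made up for this example.

```python
def parse_decimal(text, decimal_separator="."):
    """Parse a decimal number that uses the given separator, rejecting the other one."""
    other = "," if decimal_separator == "." else "."
    if other in text:
        raise ValueError(f"unexpected {other!r} in {text!r}")
    return float(text.replace(decimal_separator, "."))

print(parse_decimal("1.30"))       # 1.3
print(parse_decimal("0,70", ","))  # 0.7
```

In other languages the same idea usually means passing an explicit culture or locale to the number parsing function instead of relying on the system default.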