How to improve your CSV / TSV workflow

This blog post contains tips that you can use to improve your CSV / TSV workflow. Terms CSV and TSV are also explained.

Introduction to CSV / TSV workflow

If you haven’t heard terms CSV or TSV earlier, do not worry. CSV is short for comma-separated values and TSV is short for tab-separated values. Basically both of them describe textual file formats where values are separated either with commas (CSV) or with tabs (TSV).

,Unit,Hitpoints,Armor,Movement speed,Damage,Attacks per second,DPS
,Swordman,100,10,1.00,10,1.00,10
,Brute,150,2,0.70,20,0.70,14
,Scout,60,3,1.30,5,1.20,6
,Archer,50,2,0.80,12,0.50,6

(CSV sample file)

	Unit	Hitpoints	Armor	Movement speed	Damage	Attacks per second	DPS
	Swordman	100	10	1.00	10	1.00	10
	Brute	150	2	0.70	20	0.70	14
	Scout	60	3	1.30	5	1.20	6
	Archer	50	2	0.80	12	0.50	6

(TSV sample file)

Then you might wonder why should you care about CSV or TSV. Well, the answer is simple: most spreadsheet programs (Microsoft Excel, LibreOffice Calc, Google Docs etc.) can output their current content as CSV or TSV files.

And at this point you should hear voices singing “Hello Darkness, My Old Friend” as you figure out that your old enemy (always hated spreadsheet program) could be your best friend while you are developing a game.

Basically the whole process is following:

Add CSV or TSV parsing support to your game engine
Create or edit a spreadsheet in your (least) favourite spreadsheet program
Export that spreadsheet to CSV or TSV file
Read the CSV or TSV file in your game engine and use those values in game or generate new code from those values
Figure out what part of your game needs changes and jump back to step 2.

Now you might be wondering: “Why should I use spreadsheet program instead of my own solution / JSON files / hard coded values / $1 ?”, and answer is that almost anyone can use spreadsheet program, and with those new shiny online tools (like Google Docs and Excel Online) you can easily add collaboration support to your game development process that is far more powerful than most homegrown tools.

e.g. Google Docs supports simultaneously edits from multiple people, you get version history and you can directly download the spreadsheet as CSV or TSV file with any HTTP download tool.

Steps in your workflow that you can improve

There are lots of improvements you can do while working with this kind setup since CSV / TSV workflow can contain multiple steps:

Formatting

Some people want to keep their spreadsheets simple. Others want to add colors and formatting. The thing to remember in here is that only actual values are written to CSV / TSV files. So you can use formulas, colors, bolding, notes etc. to make your spreadsheet more (or less) useable and it won’t have a negative effect to CSV / TSV files.

Below are three formattings from same data (you can find the actual spreadsheet from Google Docs)

Plain spreadsheet (plain)

Simple formatting spreadsheet (simple formatting)

Colors and symbols spreadsheet (with colors and symbols)

As you can see, the same data can be presented in many different ways. The thing to remember in here is that only the first sheet will be converted to CSV / TSV file. So you can easily add e.g. graphs, additional notes or images to those extra sheet and they won’t bother the CSV / TSV generation at all.

If you want to use Unicode symbols (e.g. Emojis like 🛡) in your CSV / TSV files, make sure your text file parsers also supports those. And also inform other collaborators about what kind of things you are allowed to use.

Functions

There are many useful functions that you can use while creating your spreadheet files. I have listed some useful ones below, and you can check out their usage examples from this spreadsheet

NOW(), is used to display last edit time of the document. Very useful since one can easily check from CSV / TSV file when last edit for spreadsheet file was done.
SUBSTITUTE(), is used to replace existing text with new text in a string. This is useful for localization spreadsheets where text examples with replace might be needed
ROUND(), is used to round decimal numbers
SUM(), is used to sum values from different cells together
IF(), does conditional output based on comparision

Useful functions spreadsheet

Validation

Since it is easy to add invalid values to spreadsheet (e.g. copy+paste error turns 100 to 10t0), you might want to add additional validations to your workflow. Easiest thing is to add validations directly to spreadsheet (e.g. number in this cell must be between 1 and 100) but you might want to use additional validation in your code, specially if invalid values can cause crashes or other nasty experiences for users.

If you use separate code for validating, then here are few tips you can do:

Check that every line has equal number of commas (CSV) or tabs (TSV), assuming that actual elements do NOT contain extra commas or tabs
You might want to skip lines that only contain commas (CSV) or tabs (TSV)
Be VERY strict about input and error out immediately if something does not match your requirements. (e.g. in C# programming language Enum.Parse function accepts numbers, and this could lead to an issue, if you only want to accept enums)

Code generation

You can extend your CSV / TSV workflow to include code generation. This is not viable method on all platforms, but if you don’t want or need runtime CSV / TSV parsing in your game then this is one way to use power of CSV / TSV file.

If we use the CSV file from beginning of this post, we can use following C# code to generate static readonly dictionary that contains those values.

using System;
using System.Text;
using System.IO;
					
public class Program
{
	public enum UnitType 
	{
		Swordman,
		Brute,
		Scout,
		Archer
	}
	
	public class UnitStats
	{
		public int hitpoints;
		public int armor;
		public float movement;
		public int damage;
		public float attacksPerSecond;
		public float damagePerSecond;
		
		public UnitStats(int hp, int arm, float mov, int dama, float aps, float dps)
		{
			this.hitpoints = hp;
			this.armor = arm;
			this.movement = mov;
			this.damage = dama;
			this.attacksPerSecond = aps;
			this.damagePerSecond = dps;
		}
	}
	
	
	public static void Main()
	{
		string csvInput = @",Unit,Hitpoints,Armor,Movement speed,Damage,Attacks per second,DPS
                            ,Swordman,100,10,1.00,10,1.00,10
                            ,Brute,150,2,0.70,20,0.70,14
                            ,Scout,60,3,1.30,5,1.20,6
                            ,Archer,50,2,0.80,12,0.50,6";
		
		StringBuilder builder = new StringBuilder();
		
		builder.AppendLine("private readonly Dictionary<UnitType,UnitStats> unitValues = new Dictionary<UnitType,UnitStats>()");
		builder.AppendLine("{");
		
		using (StringReader reader = new StringReader(csvInput))
		{
			// Loop over the lines in the string.
			string line;
			
			UnitType unitType;
			
			while ((line = reader.ReadLine()) != null)
			{
				string[] splitted = line.Split(',');
				
				// Only process entries that start with proper enum
				if (splitted.Length > 1 && Enum.TryParse(splitted[1], out unitType))
				{
					builder.Append("   { UnitType." + splitted[1] + ", new UnitStats(");
					for (int j = 2; j < splitted.Length; j++)
					{
						builder.Append(splitted[j]);
						
						// Add f for floats
						if (splitted[j].Contains("."))
						{
							builder.Append("f");
						}
						
                        // Add commas when needed
						if (j != splitted.Length - 1)
						{
							builder.Append(", ");
						}
					}
					builder.AppendLine(") },");
				}
			}
		}

		builder.AppendLine("};");
		
		Console.Write(builder.ToString());
	}
}

and output will be

private readonly Dictionary<UnitType,UnitStats> unitValues = new Dictionary<UnitType,UnitStats>()
{
   { UnitType.Swordman, new UnitStats(100, 10, 1.00f, 10, 1.00f, 10) },
   { UnitType.Brute, new UnitStats(150, 2, 0.70f, 20, 0.70f, 14) },
   { UnitType.Scout, new UnitStats(60, 3, 1.30f, 5, 1.20f, 6) },
   { UnitType.Archer, new UnitStats(50, 2, 0.80f, 12, 0.50f, 6) },
};

Optimization

If you only have few entries (cells with values) in your CSV / TSV file, then you shouldn’t really worry about the parsing performance. But if you have hundreds or even thousand entries in your CSV / TSV file then you should spend a bit of time to make CSV / TSV handling more efficient.

The parsing method I suggest is to read those CSV / TSV entries line by line and splitting each line separately. After the splitting you can feed those values forward e.g. to your constructors.

Some people like to use Regex for splitting, but I would avoid it with CSV / TSV files since Regex is a bit of overkill for that purpose. In most cases it is easier to rely on KISS principle and not to construct too complicated CSV / TSV entries.

In certain situations you can get massive performance improvements if you generate optimized CSV / TSV files from the originals. Things you can optimize in cases like these are:

enums (UnitType.Archer becomes 1), which are faster to cast from number to chosen enum type than parsing the enum type from string
booleans (true becomes 1)
durations (1min 40seconds becomes 100), which can be expressed e.g. as seconds
color values (RGB Hex #78FF25 becomes 7929637)

below you can see snippets from original CSV file and “optimized” CSV file

Unit type, Is ground unit, Training time, Unit selection color, Bonus damage type, Speed
Archer, true, 1 min, 255 255 0, Fire, 13

▼

Unit type, Is ground unit, Training time, Unit selection color, Bonus damage type, Speed
3, 1, 60, 16776960, 2, 13

With massive CSV / TSV files, you should target for sparse presentation. That means you should set base/default value and only populate the entries if they are different than base/default value. This makes parsing a bit faster and saves some storage space.

Below are examples from sparse vs. dense

Sparse vs. dense spreadsheet

(empty cells in sparse are highlighted with pink color)

Dots vs. commas with decimal numbers

If you are going to use decimal numbers in your CSV / TSV files, make sure you use either dots or commas as decimal separators. Do not mix and match since it only confuses people.

Also make sure you set your CSV / TSV parsing locale to match your selection. Otherwise you might end situation where the parsing doesn’t work in all environments or produces incorrect results.

Examples

I have already given few examples in this post, but here are additional ones that might inspire you

Game map

Game map spreadsheet

Medal times

Medal times spreadsheet

Shop values

Shop values spreadsheet