Generating Character Names in Unity from External Files (Part II)
Check out the (free!) Unity asset, complete with readme and a basic usage guide, on GitHub!
Where we started
Last time we discussed our design goals for the Cultural Names asset and jumped right in with some simple .csv parsing. We also talked through how we chose to organize the cultures into a single dictionary, with the original filename serving as the key and the value being a jagged array of all the possible components of a name: surnames, forenames (both male and female), titles, and suffixes.
Building a name from the arrays
Now that the dictionary is loaded into memory whenever we call the method outlined last time, getting a name is actually very straightforward. We generally prefer .NET’s System.Random class to Unity’s UnityEngine.Random, so we create an instance of System.Random.
Then we can use rnd.Next(). Given a single parameter, it returns a random int from zero up to, but not including, that maximum value, so passing an array’s Length always produces a valid index. You can also supply a minimum value if needed, but for our purposes we want zero, since that’s the starting index of our jagged array. The method we’ve defined takes the array and a NameType as parameters; NameType is an enum that specifies whether the name is a surname, forename, and so on. The method returns a random name from the matching list as a string. Note that we cast the NameType to an int. This is an easy way to use an enum to index an array, but it only works if you don’t assign explicit values to the enum’s members: by default they’re numbered 0, 1, 2, and so on in declaration order, which is what lines them up with the array’s indices.
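The random-pick method described above might look something like the following minimal sketch. The enum members and the names NamePicker and GetRandomName are hypothetical, not the asset’s actual identifiers; what matters is the cast from NameType to int and the exclusive upper bound of Next().

```csharp
using System;

// Hypothetical member names. No explicit values are assigned, so the compiler
// numbers them 0, 1, 2, ... in declaration order, matching the array's first dimension.
public enum NameType { Surname, MaleForename, FemaleForename, Title, Suffix }

public static class NamePicker
{
    // One shared System.Random instance (not UnityEngine.Random).
    private static readonly Random rnd = new Random();

    // Picks a random entry from the sub-array that the NameType selects.
    public static string GetRandomName(string[][] names, NameType type)
    {
        string[] pool = names[(int)type];
        // Next(maxValue) returns 0 <= n < maxValue, so pool.Length is a safe bound.
        return pool[rnd.Next(pool.Length)];
    }
}
```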
Remember, the first dimension of these jagged arrays defines which component of the name we’re addressing, so it’s simply the NameType cast to an int; the second dimension is the arbitrarily long list of names. One thing that always trips me up is the syntax for accessing the dimensions of a jagged array. If you want the Length of a jagged array’s first dimension, it’s just array.Length, as usual; if you want the Length of the second dimension, though, it’s array[first].Length, where first is the index into the first dimension. Pretty obvious, but I always forget!
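To make the two Length expressions concrete, here is a small hypothetical jagged array whose rows have different lengths (the class and method names are illustrative only):

```csharp
public static class JaggedArrayDemo
{
    // Each row is one name component, and rows can differ in length.
    public static readonly string[][] Names =
    {
        new[] { "Smith", "Jones", "Taylor" }, // row 0: surnames
        new[] { "John", "Edward" },           // row 1: male forenames
        new[] { "Mary" }                      // row 2: female forenames
    };

    // array.Length is the size of the first dimension (the number of rows).
    public static int FirstDimensionLength(string[][] array) => array.Length;

    // array[first].Length is the size of one particular row.
    public static int SecondDimensionLength(string[][] array, int first) => array[first].Length;
}
```

Here FirstDimensionLength(Names) is 3, while SecondDimensionLength(Names, 1) is 2.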
But how do we give this handy little method the right array? That’s handled by a dictionary lookup with TryGetValue(); this is how the method that generates a random forename gets its array.
The advantage of TryGetValue() is that it doesn’t throw an exception when the key is missing; it simply returns false. We catch that failure and report it in Unity’s debug log, with a suggestion to check the culture parameter.
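A sketch of a forename method built on TryGetValue() follows, assuming the dictionary maps a culture string to a string[][] as described last time. The female flag, the ForenameGenerator class, and returning null on failure are assumptions for illustration, and Console.Error stands in for Unity’s debug log so the example runs outside Unity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ordering of the name components.
public enum NameType { Surname, MaleForename, FemaleForename, Title, Suffix }

public static class ForenameGenerator
{
    private static readonly Random rnd = new Random();

    // Culture name (the original filename) -> jagged array of name components.
    public static readonly Dictionary<string, string[][]> Cultures =
        new Dictionary<string, string[][]>();

    // Returns a random forename, or null if the culture isn't loaded.
    // Nothing here prevents the same forename from being returned twice.
    public static string GetRandomForename(string culture, bool female)
    {
        if (!Cultures.TryGetValue(culture, out string[][] names))
        {
            // The asset reports this through Unity's debug log; Console keeps the sketch Unity-free.
            Console.Error.WriteLine($"No names loaded for culture '{culture}'; check the culture parameter.");
            return null;
        }

        NameType type = female ? NameType.FemaleForename : NameType.MaleForename;
        string[] pool = names[(int)type];
        return pool[rnd.Next(pool.Length)];
    }
}
```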
So we’re grabbing names now! In the asset there’s also a method for generating a random full name, which returns a Name struct (more on that later). But you may have noticed an issue: duplicate names. There’s no guarantee the name returned is unique; the method simply returns a random name from the array without any knowledge of past names. So how do we guarantee names aren’t duplicated?
Preventing duplicates
Preventing duplicate names could be done in a number of ways, but we elected to use hashing. For the uninitiated, a hash is the output of a hash function: an algorithm that takes input of arbitrary size and returns a fixed-size bit string that’s easy to compare. Cryptographic hash functions such as MD5 were devised for cryptography, but they are also very useful for detecting duplicate data or data corruption. The tremendous advantage is that two different inputs are extremely unlikely to produce the same hash (a collision), and hashing is relatively easy to use.
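As a quick illustration of the fixed-size property, the hypothetical helper below hashes strings with .NET’s MD5 class: the digest is always 16 bytes (128 bits), however long the input, and the same input always produces the same digest.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Returns the raw MD5 digest of a string's UTF-8 bytes.
    public static byte[] Digest(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }
    }
}
```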
Could we have simply kept a list of all the names generated so far and compared those strings? Yes. But this is good practice, it’s more fun, and it scales nicely. We’re using the MD5 algorithm for this project, but do not use it for actual cryptographic purposes! It’s highly vulnerable to attack and essentially worthless for protecting against intentional tampering. It is, however, still simple and quick, which makes it perfect for catching accidental duplicates or corruption. The operations are handled by .NET’s built-in System.Security.Cryptography classes, and the general strategy is as follows: generate a random name; hash that name; compare the hash against the existing list of hashed names; re-roll a new random name if it conflicts, otherwise add the new hash to the list and return the shiny new unique name.
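A minimal sketch of such a hashing method, assuming UTF-8 encoding and a hex-string result (the asset may encode the digest differently); NameHasher and HashName are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class NameHasher
{
    // Hashes a full name with MD5 and returns the digest as a 32-character hex string.
    // MD5 is used only to detect accidental duplicates, never for security.
    public static string HashName(string fullName)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] input = Encoding.UTF8.GetBytes(fullName);
            byte[] digest = md5.ComputeHash(input); // always 16 bytes
            // BitConverter gives "AB-CD-..."; removing the dashes leaves plain hex.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

For example, HashName("abc") returns 900150983CD24FB0D6963F7D28E17F72, the standard MD5 test vector from RFC 1321 in uppercase.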
Note that the MD5 hasher takes a byte array as its input, so we first convert the name to a single string, then to a byte array, and feed that byte array into the hasher. The hasher outputs another byte array (the digest), which we turn back into a string. We keep a static list of hashed names in the NameBuilder class, along with some simple methods for adding and removing hashes from this list; see the Initialization and List Management region of the full source code on GitHub for the implementation. So now let’s look at the final method, which uses this hasher to check each generated name.
Of note is the way we test the name. We generate a name (both forename and surname) and use a do while loop to keep trying until we find a unique one. The check itself is done by the IsUniqueName() method, which returns a bool.
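Here is one way IsUniqueName() and the static hash list could fit together. This is a sketch under assumptions: the NameRegistry class, its AddHash, RemoveHash, and ClearHashes helpers, and the hex encoding are hypothetical, and the real list-management code lives in the asset’s source on GitHub.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class NameRegistry
{
    // Static list of hashes for every name handed out so far.
    private static readonly List<string> usedHashes = new List<string>();

    private static string HashName(string fullName)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(fullName));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // True if this name's hash has not been recorded yet.
    public static bool IsUniqueName(string fullName) => !usedHashes.Contains(HashName(fullName));

    // Records a name as used.
    public static void AddHash(string fullName) => usedHashes.Add(HashName(fullName));

    // Frees a name for reuse; returns false if it was never recorded.
    public static bool RemoveHash(string fullName) => usedHashes.Remove(HashName(fullName));

    public static void ClearHashes() => usedHashes.Clear();
}
```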
Note that we also count the iterations of the while loop. This is critical, because in some edge cases you could conceivably loop forever and cause major issues. One such case is a modder putting only one or two names into each NameType for a particular culture; all possible combinations would quickly be used up, leaving only colliding names for the generator. We arbitrarily capped it at twenty attempts before failing; that’s far more than you’d ever need with a sufficiently large pool of names, but low enough that you won’t see any performance issues.
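The do while loop with its twenty-attempt cap can be sketched as follows. Two assumptions to flag: this version swaps in a HashSet&lt;string&gt; for the list of hashes (constant-time lookups on average), and it returns null when it gives up, which may not match how the asset reports failure. UniqueNameGenerator and GenerateUniqueName are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class UniqueNameGenerator
{
    // Same cap as the article: give up after twenty attempts.
    private const int MaxAttempts = 20;
    private static readonly Random rnd = new Random();

    // Hashes of every name handed out so far (a HashSet instead of a list).
    private static readonly HashSet<string> usedHashes = new HashSet<string>();

    private static string Hash(string s)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(Encoding.UTF8.GetBytes(s))).Replace("-", "");
        }
    }

    // Returns a unique "Forename Surname", or null if every attempt collided.
    public static string GenerateUniqueName(string[] forenames, string[] surnames)
    {
        string candidate;
        string hash;
        bool collides;
        int attempts = 0;

        do
        {
            candidate = forenames[rnd.Next(forenames.Length)] + " " + surnames[rnd.Next(surnames.Length)];
            hash = Hash(candidate);
            collides = usedHashes.Contains(hash);
            attempts++;
        } while (collides && attempts < MaxAttempts);

        if (collides)
        {
            // The pool is probably too small for the number of names requested.
            Console.Error.WriteLine($"No unique name found after {MaxAttempts} attempts.");
            return null;
        }

        usedHashes.Add(hash);
        return candidate;
    }
}
```

With one forename and one surname in the pool, the first call returns that combination and the second gives up after twenty attempts, which is exactly the edge case described above.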
Conclusion
The whole name is stored in a Name struct with a series of public strings inside; it’s just a container that keeps everything neat and makes it easy to pass names around to the parts of your code that need them.
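A Name container along these lines would do the job. The field set (Title, Forename, Surname, Suffix) is an assumption based on the name components listed earlier, and the ToString() override is an illustrative extra, not necessarily part of the asset.

```csharp
using System;

// Hypothetical fields; the asset's Name struct may differ.
public struct Name
{
    public string Title;
    public string Forename;
    public string Surname;
    public string Suffix;

    // Joins the non-empty parts with single spaces.
    public override string ToString()
    {
        string[] parts = { Title, Forename, Surname, Suffix };
        return string.Join(" ", Array.FindAll(parts, p => !string.IsNullOrEmpty(p)));
    }
}
```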
Well, that wraps up our discussion of Cultural Names! Don’t forget to grab the asset from GitHub and let us know what you think about it. Open an issue if you have a problem or a suggestion.