Today I had to write an assignment for my Evolutionary Algorithms course, which was to implement a genetic algorithm to solve the Traveling Salesman Problem. I wrote the Python code, it worked well but it took a while to find a good solution with 100 cities, and I realised it would be a few days before I could get a solution for the dataset with 3000 cities. I definitely needed to speed up my program somehow.

I decided to give Cython a try. Cython is a Python optimizer, you basically annotate your code with types so that Cython can get rid of your Python calls and convert them to pure C, with all the speed that entails. After spending a few hours adding types to my program (it would be a lot less if I didn’t need to pass a huge array to two classes), I finally had working code.

My code didn’t require many changes, I only really optimised two functions. One went from this:

def fitness(self):
    cost = 0
    for index in range(1, GENOME_LENGTH):
        cost += GRAPH[self.genome[index-1], self.genome[index]]
    cost += GRAPH[self.genome[GENOME_LENGTH-1], self.genome[0]]
    return cost

to this:

def fitness(self):
    cdef numpy.float64_t cost = 0
    cdef int index
    cdef numpy.ndarray[numpy.float64_t, ndim=2] graph = self.graph
    cdef numpy.ndarray[numpy.int64_t, ndim=1] genome = self.get_genome()
    for index in range(1, GENOME_LENGTH):
        cost += graph[genome[index-1], genome[index]]
    cost += graph[genome[GENOME_LENGTH-1], genome[0]]
    return cost

The other went from this:

positions = (randrange(GENOME_LENGTH), randrange(GENOME_LENGTH))
start = min(positions)
stop = max(positions)
son = father.copy()
daughter = mother.copy()
if start == stop:
    return (son, daughter)
# Crossover for mother-son.
for gene in mother.genome[start:stop]:
    for counter in range(GENOME_LENGTH):
        if son.genome[counter] == gene:
            son.genome[counter] = -1
for counter in range(start, stop):
    if son.genome[counter] != -1:
        for counter2 in range(start):
            if son.genome[counter2] == -1:
                son.genome[counter2] = son.genome[counter]
                son.genome[counter] = -1
        for counter2 in range(stop, GENOME_LENGTH):
            if son.genome[counter2] == -1:
                son.genome[counter2] = son.genome[counter]
                son.genome[counter] = -1
son.genome[start:stop] = mother.genome[start:stop]

to this:

positions = (randrange(GENOME_LENGTH), randrange(GENOME_LENGTH))
cdef int start = min(positions)
cdef int stop = max(positions)
cdef int counter
cdef int counter2
cdef numpy.int64_t gene
cdef numpy.ndarray[numpy.int64_t, ndim=1] mother_genome = mother.get_genome()
cdef numpy.ndarray[numpy.int64_t, ndim=1] father_genome = father.get_genome()
son = father.copy()
daughter = mother.copy()
cdef numpy.ndarray[numpy.int64_t, ndim=1] son_genome = son.get_genome()
cdef numpy.ndarray[numpy.int64_t, ndim=1] daughter_genome = son.get_genome()
if start == stop:
    return (son, daughter)
# Crossover for mother-son.
for gene in mother_genome[start:stop]:
    for counter in range(GENOME_LENGTH):
        if son_genome[counter] == gene:
            son_genome[counter] = -1
for counter in range(start, stop):
    if son_genome[counter] != -1:
        for counter2 in range(start):
            if son_genome[counter2] == -1:
                son_genome[counter2] = son_genome[counter]
                son_genome[counter] = -1
        for counter2 in range(stop, GENOME_LENGTH):
            if son_genome[counter2] == -1:
                son_genome[counter2] = son_genome[counter]
                son_genome[counter] = -1
son_genome[start:stop] = mother_genome[start:stop]

Compiling it with -O3 in GCC gave me the following results:

cities   python       cython
-------- ------------ ----------
17       20 sec       3.45 sec
48       69 sec       3.92 sec
100      217 sec      4.88 sec
2500     many hours   868 sec

You read that right, there’s a 43x speedup with the 100 cities dataset. I suspect that reading it is what takes the most time. I have not tried this with the largest dataset yet, but pure Python hadn’t finished 100 epochs in about 20 minutes while the Cython code did them in about 30 seconds.

These speedups are nothins short of amazing, especially considering that I only spent an hour or two, and most of that was spent looking at the manual or trying to figure out how to pass my array in the code.

If you have a piece of Python that you need to run fast, then I would recommend you used Cython immediately. This means that I can exploit the beauty of Python and the speed of C together, and that’s a match made in heaven.