In case anyone is interested in what it actually means: the messy upper lines show how well my networks perform on the test data (lower is better). Each line is one trained network with a different configuration. As you can see, they are all basically the same, so I guess the configuration is not the limiting factor here.
The thin lower lines show how well the networks perform on the training data (again, lower is better). A network that is only good at the training data is useless, so the fact that those lines differ from the test lines is of no real consequence. I kept training them mostly to see whether things would improve eventually (pyBrain is not very fast, so I let it run while I worked).
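To make the train/test gap concrete, here is a tiny self-contained sketch (not from my actual setup, just a toy polynomial-fitting example): as model capacity grows, the training error keeps dropping toward zero while the test error stops improving, which is exactly the pattern where being good at the training data alone is useless.

```python
import numpy as np

# Toy illustration of the train/test gap: fit polynomials of growing
# degree to a noisy sine. All names and numbers here are illustrative.
rng = np.random.default_rng(0)

x_train = np.linspace(-1, 1, 15)
x_test = np.linspace(-1, 1, 200)
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=x_train.shape)
y_test = np.sin(3 * x_test)  # clean targets for evaluation

def mse(coeffs, x, y):
    """Mean squared error of the fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 3, 9, 13):
    c = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree:2d}: "
          f"train={mse(c, x_train, y_train):.4f} "
          f"test={mse(c, x_test, y_test):.4f}")
```

The high-degree fits drive the training error far below the noise level while the test error stays flat or gets worse, which is the same shape as the thin lines vs. the messy lines in the plot.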
I think my problem is a combination of the input data I feed the network and the quality of the training data. I suspect some of it is mislabeled, which probably confuses the trainer and prevents it from building an optimal network.
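A quick way to see why mislabeled data caps what the trainer can achieve: even the *perfect* decision rule can't score better than the fraction of correctly labeled examples. This is a purely illustrative sketch (the 20% flip rate and the threshold task are made up, not from my data):

```python
import numpy as np

# Sketch of the label-noise hypothesis: flip a fraction of training
# labels and note that even the true decision rule hits an error floor.
rng = np.random.default_rng(1)

x = rng.uniform(-1, 1, 1000)
y_clean = (x > 0).astype(int)          # true labels

flip = rng.random(x.shape) < 0.2       # mislabel ~20% of examples
y_noisy = np.where(flip, 1 - y_clean, y_clean)

# The true decision rule (threshold at 0) is the best any model can do.
pred = (x > 0).astype(int)
print(f"accuracy vs clean labels: {np.mean(pred == y_clean):.2f}")  # 1.00
print(f"accuracy vs noisy labels: {np.mean(pred == y_noisy):.2f}")  # ~0.80
```

So if a chunk of the labels is wrong, the training error will plateau well above zero no matter which network configuration is used, which would also explain why all the configurations in the plot end up looking the same.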