I recently got three powerful servers and decided to install Ubuntu cloud on them. I did that manually, from packages. However, I'm fairly new to both Linux and Eucalyptus and, naturally, I ran into a lot of trouble.
Here's roughly how it went. Since those three servers have lots of processors, RAM and disks, I decided to take advantage of that. Rather than running three single-node clouds, I thought it might be more manageable to reuse another, older server for the CLC and Walrus (as far as I understand, there's no way to run three Walruses on the more powerful servers instead of one alongside the CLC, correct?). This meant I had to go for a multi-cluster setup, running CC, SC and NC on each of the new servers. My very first question is whether that's actually advisable. I couldn't find any references to such setups on the internet.
Currently, the clusters are connected through a single Ethernet controller and are running Ubuntu 10.04.1. The cloud is in managed-novlan mode (although there's no particular reason to prefer that over any other mode), both pubinterface and privinterface are set to the bridge (br0), and the rest is pretty much default settings.
I can run instances and so on (I use Hybridfox) and everything is generally fine, except that the whole LAN starts to behave oddly.
Sometimes random nodes disappear for 10 minutes or so. Sometimes a ping to the public IP of a virtual machine reaches the correct VM, sometimes it reaches a different one. Sometimes the NAT rules for a VM are set properly, sometimes some of the rules are missing. Sometimes pings on the LAN show very high latency, sometimes they're fine.
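One possible way to narrow down the "ping reaches a different VM" symptom is to check from another Linux host on the LAN whether the MAC address behind a public IP changes over time, which would suggest that more than one machine is answering for it. Below is a minimal sketch of that idea, assuming Python 3 is available and that the host's ARP cache is readable at /proc/net/arp; the function names are made up for illustration.

```python
def parse_arp_table(text):
    """Map each IP in /proc/net/arp-style text to its MAC address."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # malformed or empty line
        ip, _hw_type, flags, mac = fields[:4]
        # Flags 0x0 (or an all-zero MAC) marks an incomplete entry.
        if int(flags, 16) == 0 or mac == "00:00:00:00:00:00":
            continue
        table[ip] = mac.lower()
    return table


def find_mac_changes(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs."""
    return {
        ip: (before[ip], after[ip])
        for ip in before
        if ip in after and before[ip] != after[ip]
    }


def snapshot(path="/proc/net/arp"):
    """Read the current ARP cache (hypothetical helper, Linux only)."""
    with open(path) as handle:
        return parse_arp_table(handle.read())
```

Taking a snapshot(), pinging the VM's public IP for a while, taking a second snapshot() and passing both to find_mac_changes() would list any IP whose MAC moved in between. An empty result does not prove there is no conflict, since the cache only reflects the most recent ARP reply.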
After some experimenting I noticed that even though I had spanning tree turned off on the br0 bridges, it was somehow turned on again. As STP is running on the LAN, I thought that could be the problem. I switched it off manually (brctl stp br0 off), but somehow it came back on. I did that several times and, annoyingly, it comes back on every time. So I disabled it on the switch (some sort of Dell switch): I turned STP off and made the switch filter STP packets on all ports. That didn't help at all.
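To find out when STP gets re-enabled on the bridge, the state could be polled and every change logged with a timestamp, which can then be compared against the Eucalyptus logs. The following is only a sketch, assuming Python 3 and the standard Linux bridge sysfs file /sys/class/net/br0/bridge/stp_state (0 means off, non-zero means on); the function names are hypothetical.

```python
import time
from pathlib import Path


def read_stp_state(bridge="br0", sysfs_root="/sys/class/net"):
    """Return True if STP is enabled on the bridge, False if it is off."""
    state_file = Path(sysfs_root) / bridge / "bridge" / "stp_state"
    # 0 = off; 1 (kernel STP) and 2 (userspace STP) both count as on.
    return state_file.read_text().strip() != "0"


def watch_stp(bridge="br0", interval=1.0, max_polls=None,
              sysfs_root="/sys/class/net"):
    """Poll the STP state and record each change with a timestamp.

    Runs until interrupted when max_polls is None; otherwise stops
    after max_polls reads and returns the list of recorded changes.
    """
    changes = []
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = read_stp_state(bridge, sysfs_root)
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {bridge}: STP {'on' if state else 'off'}")
            changes.append((stamp, state))
            last = state
        polls += 1
        time.sleep(interval)
    return changes
```

Running watch_stp("br0") on one node, then switching STP off with brctl, should show the moment it flips back on. This only reports when the change happens, not which process caused it.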
And now I'm running out of ideas. Is anyone running a similar cloud setup? Any suggestions on what I can do to at least get a better picture of what's going on?