Lithium-SR1 Clustering Problem - LLDP flows not updating to match new Leader

asked 2016-01-07 08:22:50 -0700

DeJuan

updated 2016-01-08 10:04:07 -0700

I am currently using Lithium SR1. I have three VMs that are connected to each other and configured as a 3-node cluster; for convenience, let us call them VM1, VM2, and VM3. I have also installed the integration project on one of these VMs so that I can use the cluster monitor. I boot the VMs in sequence, letting each reach the OpenDaylight welcome screen before booting the next. After boot, the cluster leadership does not appear to be consistent according to the cluster monitor: the leadership roles appear to be split between VM1 and VM2. The configuration files on all three nodes list the seed nodes and replica locations in the order VM1, VM2, VM3. When VM2 fails, I've seen its roles get delegated to VM3. I then create a Mininet topology in which each switch is connected to VM1, VM2, and VM3, in that order. An excerpt of my script, with the switch setup abbreviated, is as follows:

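from functools import partial
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.link import TCLink
from mininet.cli import CLI

OVSSwitch13 = partial(OVSSwitch, protocols='OpenFlow13')
net = Mininet(controller=RemoteController, switch=OVSSwitch13, link=TCLink)
# Pass the address as ip=; a second positional argument would be taken as the controller class
c1 = net.addController('c1', ip='VM1_IP', port=6633)
c2 = net.addController('c2', ip='VM2_IP', port=6633)
c3 = net.addController('c3', ip='VM3_IP', port=6633)

switches = [net.addSwitch('s%d' % i) for i in range(1, 6)]  # five switches
# ...add hosts and links between the switches here...
net.start()  # every switch connects to c1, c2, c3, in that order
CLI(net)
net.stop()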
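For reference, the seed-node and replica ordering described above (VM1, VM2, VM3) can be generated as in the sketch below. It assumes the Lithium clustering defaults as commonly documented (Akka system name `opendaylight-cluster-data`, port 2550, member names `member-1` to `member-3`); the addresses are hypothetical documentation IPs, not the ones from this setup, and the helper names are made up for illustration.

```python
# Hypothetical placeholder addresses for VM1, VM2 and VM3, in seed order.
VM_IPS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]


def seed_nodes(ips, system="opendaylight-cluster-data", port=2550):
    """Build akka seed-node URIs in the given order (assumed Lithium defaults)."""
    return ["akka.tcp://%s@%s:%d" % (system, ip, port) for ip in ips]


def replicas(n):
    """Member names used as shard replicas, assumed to be member-1..member-n."""
    return ["member-%d" % i for i in range(1, n + 1)]


def hocon_list(items):
    """Render a list of strings as a HOCON list literal."""
    return "[" + ", ".join('"%s"' % item for item in items) + "]"


if __name__ == "__main__":
    # Fragments to compare against akka.conf (seed-nodes) and module-shards.conf (replicas).
    print("seed-nodes = " + hocon_list(seed_nodes(VM_IPS)))
    print("replicas = " + hocon_list(replicas(len(VM_IPS))))
```

Keeping the same order on all three nodes matters because the first reachable seed node is the one a booting member joins through.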

I wait for the cluster to report that it is listening on port 6633 before running this script. Once the script is up, I can see the topology, with all of the switches, in the DLUX UI. I've also enabled the table-miss-enforcer, so the links between switches also appear in the DLUX Topology window. If I then issue commands from within Mininet to change the topology, say to isolate a switch, the change is reflected across the cluster within about 12 seconds. No pings complete, though, since, as far as I'm aware, there is no actual routing in Lithium.
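One way to measure the roughly 12 second propagation delay is to poll each controller's operational topology and time how long it takes to change. This is a minimal sketch that assumes RESTCONF on port 8181, the default `admin`/`admin` credentials, and the OpenFlow topology id `flow:1`; `fetch_topology`, `summarize` and `time_until_change` are hypothetical helper names.

```python
import base64
import json
import time
import urllib.request

# Assumed RESTCONF path for the OpenFlow topology in Lithium.
TOPO_PATH = "/restconf/operational/network-topology:network-topology/topology/flow:1"


def fetch_topology(ip, user="admin", password="admin"):
    """GET the operational flow:1 topology from one controller."""
    req = urllib.request.Request("http://%s:8181%s" % (ip, TOPO_PATH))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode())


def summarize(doc):
    """Return (sorted node ids, sorted link ids) from a network-topology document."""
    topo = (doc.get("topology") or [{}])[0]
    nodes = sorted(n["node-id"] for n in topo.get("node", []))
    links = sorted(l["link-id"] for l in topo.get("link", []))
    return nodes, links


def time_until_change(get_doc, baseline, timeout=60.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll get_doc() until its summary differs from baseline.

    Returns the elapsed seconds, or None if nothing changed before timeout.
    """
    start = clock()
    while clock() - start < timeout:
        if summarize(get_doc()) != baseline:
            return clock() - start
        sleep(interval)
    return None
```

Usage: take `baseline = summarize(fetch_topology("VM2_IP"))`, take a link down in the Mininet CLI, then call `time_until_change(lambda: fetch_topology("VM2_IP"), baseline)`; repeating this against each VM shows whether all members converge on the same view.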

So far, this is as expected. If I shut down one of the VMs other than VM1 and leave the other two running, I can see that node go down in the cluster monitor, and if it was not a pure follower, its leadership positions shift to one of the remaining nodes.
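The shard leadership that the cluster monitor displays can also be read directly from a surviving VM through Jolokia. The sketch below assumes the Lithium shard MBean naming (`member-N-shard-<name>-<operational|config>` under `Category=Shards`), Jolokia on port 8181 with default credentials, and the default shard names (`default`, `topology`, `inventory`, `toaster`); verify these against your own installation. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed Jolokia MBean naming for Lithium shards.
MBEAN = ("org.opendaylight.controller:Category=Shards,"
         "name=member-%d-shard-%s-%s,type=Distributed%sDatastore")


def shard_mbean(member, shard, operational=True):
    """MBean name of one member's local replica of a shard."""
    kind, store = ("operational", "Operational") if operational else ("config", "Config")
    return MBEAN % (member, shard, kind, store)


def read_shard(ip, member, shard, operational=True, user="admin", password="admin"):
    """Read one shard's MBean attributes through Jolokia."""
    url = "http://%s:8181/jolokia/read/%s" % (ip, shard_mbean(member, shard, operational))
    req = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode())["value"]


def leader_member(value):
    """Extract 'member-N' from the shard's Leader attribute, or None if there is no leader."""
    leader = value.get("Leader")
    if not leader:
        return None
    return "-".join(leader.split("-")[:2])
```

For example, `leader_member(read_shard("VM3_IP", 3, "inventory"))` asks VM3's replica which member currently leads the operational inventory shard.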

Here is where the problems start. If I shut down VM1 while leaving the other two running, VM1's leaderships are eventually reassigned to VM2 or VM3. However, the topology stops updating. It appears that the enforcer writes rules so that the switches send messages only to VM1, since it is their configured master; once VM1 goes down, we stop getting topology updates. If we wait for leadership of all four shards to finish being reassigned and then reboot VM1, we start getting updates again, even though the cluster monitor shows VM1 as a pure follower at this point. Lastly, if we then shut down VM1 again, the topology as a whole disappears from the ...
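The symptom is consistent with OpenFlow 1.3 controller roles: a switch has at most one MASTER, and with the default asynchronous configuration a SLAVE connection receives no packet-in messages (only port-status), so LLDP discovery packets reach only the master. If VM1 is master for every switch and no surviving controller requests the MASTER role after VM1 disconnects, nobody receives LLDP. The toy model below illustrates only those spec rules; it is not OpenDaylight's implementation, and whether Lithium issues such role requests on failover is not shown here.

```python
from enum import Enum


class Role(Enum):
    EQUAL = "equal"
    MASTER = "master"
    SLAVE = "slave"


class SwitchModel:
    """Toy model of one OpenFlow 1.3 switch's controller roles (default async config)."""

    def __init__(self, controllers):
        # Every new connection starts in the EQUAL role.
        self.roles = {c: Role.EQUAL for c in controllers}

    def request_role(self, controller, role):
        # A controller becoming MASTER demotes the existing master to SLAVE.
        if role is Role.MASTER:
            for c in [c for c, r in self.roles.items() if r is Role.MASTER]:
                self.roles[c] = Role.SLAVE
        self.roles[controller] = role

    def disconnect(self, controller):
        # Losing the master connection does not promote anyone automatically.
        self.roles.pop(controller, None)

    def packet_in_receivers(self):
        # SLAVE connections get no packet-in by default, so no LLDP either.
        return sorted(c for c, r in self.roles.items() if r is not Role.SLAVE)
```

With c1 as MASTER and c2, c3 as SLAVE, `packet_in_receivers()` returns only `["c1"]`; after `disconnect("c1")` it returns an empty list until another controller requests MASTER.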
