Lithium-SR1 Clustering Problem - LLDP flows not updating to match new Leader

I am currently using Lithium SR1. I have three VMs that are connected to each other and configured as a 3-node cluster; for convenience, call them VM1, VM2, and VM3. I have also installed the integration project on one of them so I can use the cluster monitor. I boot each VM in sequence, letting each reach the OpenDaylight welcome screen before booting the next. After boot-up, cluster leadership does not appear to be consistent according to the cluster monitor; the leadership roles appear to be split between VM1 and VM2. The configuration files on all three nodes list the seed nodes and replica locations in the order VM1, VM2, VM3 (a quick sanity check for these files is sketched after the script below). In the event of a failure on VM2, I've seen its roles get delegated to VM3. I then create a Mininet topology in which each switch is connected to VM1, VM2, and VM3, in that order. An excerpt of my code, with slight pseudocode, is as follows:

from functools import partial

from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.link import TCLink
from mininet.cli import CLI

# Force OpenFlow 1.3 on every switch
OVSSwitch13 = partial(OVSSwitch, protocols='OpenFlow13')
net = Mininet(controller=RemoteController, switch=OVSSwitch13, link=TCLink)

# One remote controller per cluster member (IPs redacted)
c1 = net.addController('c1', ip='insert_IP1_here', port=6633)
c2 = net.addController('c2', ip='insert_IP2_here', port=6633)
c3 = net.addController('c3', ip='insert_IP3_here', port=6633)

# -- set up the network of five switches (s1..s5) and their links here --

net.build()
c1.start()
c2.start()
c3.start()
# Point each switch at all three controllers, VM1 first
for sw in net.switches:
    sw.start([c1, c2, c3])

net.staticArp()
CLI(net)
net.stop()
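
For reference, the clustering settings on each VM should be in the stock Karaf locations, configuration/initial/akka.conf for the seed nodes and configuration/initial/module-shards.conf for the shard replicas, if I have the default layout right. Below is a rough sketch of a check one could run on each node to confirm the ordering really is VM1, VM2, VM3 everywhere; the paths and patterns are assumptions based on the standard Lithium clustering setup, not something taken from my actual scripts.

import re

# Assumed default locations under the Karaf distribution root
AKKA_CONF = 'configuration/initial/akka.conf'
SHARDS_CONF = 'configuration/initial/module-shards.conf'

with open(AKKA_CONF) as f:
    akka = f.read()
# The seed-nodes list should name the same members, in the same order, on every VM
print('seed-nodes:', re.findall(r'akka\.tcp://opendaylight-cluster-data@[\d.]+:\d+', akka))

with open(SHARDS_CONF) as f:
    shards = f.read()
# Each shard's replica list should contain member-1, member-2, member-3
print('replicas:', re.findall(r'replicas\s*=\s*\[[^\]]*\]', shards))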

I wait for the cluster to report that it is listening on port 6633 before running this script. Once it comes up, I can see the topology with all of the switches in the DLUX UI. I've also activated the table-miss enforcer, so the links between the switches show up at a higher level in the DLUX Topology window as well. If I then issue commands from within Mininet to, say, change the topology and isolate a switch, the change is reflected across the cluster within roughly 12 seconds. No pings can complete, though, since there's no actual routing in Lithium, as far as I'm aware.
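
To double-check what each member thinks the topology looks like, independently of DLUX, I can also poll the operational topology over RESTCONF on each VM. This is only a sketch: it assumes RESTCONF on port 8181 with the default admin/admin credentials and the standard network-topology path, and the controller IPs are placeholders.

import requests

# Placeholder IPs for the three cluster members
CONTROLLERS = ['insert_IP1_here', 'insert_IP2_here', 'insert_IP3_here']
URL = 'http://{}:8181/restconf/operational/network-topology:network-topology'

for ip in CONTROLLERS:
    resp = requests.get(URL.format(ip), auth=('admin', 'admin'))
    topologies = resp.json()['network-topology']['topology']
    # flow:1 is the OpenFlow topology; count the nodes and links it reports
    flow = next((t for t in topologies if t.get('topology-id') == 'flow:1'), {})
    print(ip, 'nodes:', len(flow.get('node', [])), 'links:', len(flow.get('link', [])))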

So far, everything is as expected. If I shut down one of the VMs that is not VM1 and leave the other two running, I can see the node go down in the cluster monitor, and if it was not a pure follower, its previous leadership positions shift elsewhere.
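
As far as I understand, the cluster monitor gets this information from the shard MBeans, so the same leadership view can be pulled directly over Jolokia. The sketch below is based on the usual ODL clustering MBean naming (member-N plus the shard name and datastore type); the member name, shard names, and admin/admin credentials are assumptions that would need to match the actual setup.

import requests

# Placeholders: a cluster member's IP and the member name it runs as
VM_IP = 'insert_IP2_here'
MEMBER = 'member-2'

# Shards whose leadership I care about in the operational datastore
SHARDS = ['shard-topology-operational', 'shard-inventory-operational']

for shard in SHARDS:
    mbean = ('org.opendaylight.controller:Category=Shards,'
             'name={}-{},type=DistributedOperationalDatastore'.format(MEMBER, shard))
    url = 'http://{}:8181/jolokia/read/{}'.format(VM_IP, mbean)
    value = requests.get(url, auth=('admin', 'admin')).json()['value']
    print(shard, '-> RaftState:', value['RaftState'], 'Leader:', value['Leader'])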

Here is where the problems happen. If I shut down VM1 while leaving the other two running, VM1's shard leaderships eventually get reassigned to VM2 or VM3. However, the topology stops actually updating. It appears that the enforcer writes rules so that the switches only send messages to VM1, as it is their configured master, so when VM1 goes down we stop getting topology updates. If we wait for leadership of the four shards to finish being reassigned and then reboot VM1, we start getting updates again, even though VM1 is a pure follower at that point according to the cluster monitor. Lastly, if we then shut down VM1 again, the topology as a whole disappears from the other two functional members of the cluster. I can no longer see the switches at all, let alone the links that would connect them. In the event that the current primary fails, I believe the behavior should probably be along the lines of clearing out the current LLDP redirection flows and installing new ones that redirect the packets to the current cluster topology leader. Alternatively, send LLDP packets to all cluster members, not just one. However, ODL isn't currently doing either of those, and as a result this problem arises. It seems there is a disconnect between reassigning leadership and actually adjusting the flows in the network to match. I've heard the enforcer is only there for plug-and-play purposes and probably doesn't support what I expected it to do, though...
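
To see what the switches are actually left with after the failover, one thing I can do is dump a switch's operational inventory from one of the surviving members and look at the flows in table 0, where the table-miss and LLDP punt entries should end up. Again just a sketch: it assumes RESTCONF on 8181 with admin/admin, the openflow:1 node-id that Mininet's s1 usually gets, and that the surviving member still has the operational inventory populated.

import requests

# Placeholders: a surviving cluster member and the switch to inspect
VM_IP = 'insert_IP2_here'
NODE = 'openflow:1'  # Mininet's s1 normally shows up under this node-id

url = ('http://{}:8181/restconf/operational/'
       'opendaylight-inventory:nodes/node/{}'.format(VM_IP, NODE))
node = requests.get(url, auth=('admin', 'admin')).json()['node'][0]

# The table list may or may not carry the flow-node-inventory: prefix depending on the release
tables = node.get('flow-node-inventory:table') or node.get('table', [])
for table in tables:
    if str(table.get('id')) != '0':
        continue
    for flow in table.get('flow', []):
        print(flow.get('id'), flow.get('instructions'))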

If anyone sees a potential mistake I've made in my setup, please let me know.