Reg:ODL_Lithium_SR2_clustering_issues

asked 2015-12-06 22:31:55 -0700

Raghav

updated 2015-12-06 22:33:19 -0700

Hi All,

I have a 3-node cluster set up, and the cluster comes up fine with no issues.

But when I trigger my network discovery process, most of the time one of the follower nodes logs the exceptions below throughout its karaf log, and the MD-SAL instance on that node stops working properly.

Could someone explain under what conditions the cluster (akka layer) gets into this state, and which cluster configuration parameters are important for resolving it?

Thanks for your time and assistance.

Please find the details below for your reference.

The following was extracted from the shard MBean instances across the 3 nodes:

./chkcluster.sh chkshards

Shard                                  Leader                                 LeadershipChangeCount
"member-1-shard-default-operational"   "member-3-shard-default-operational"   21
"member-2-shard-default-operational"   "member-3-shard-default-operational"   19
"member-3-shard-default-operational"   "member-3-shard-default-operational"   21

Judging by the LeadershipChangeCount values above, I believe there are frequent elections and shard leader switches.
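To make that reading of the MBean values concrete, here is a minimal Python sketch that checks whether all members report the same leader and which of them show a high LeadershipChangeCount. The numbers are copied from the output above; the function name `summarize` and the threshold of 5 are purely illustrative assumptions, not anything defined by OpenDaylight.

```python
# Values copied from the shard MBean output in this post:
# reporting member -> (leader it sees, LeadershipChangeCount)
SHARD_VIEW = {
    "member-1-shard-default-operational": ("member-3-shard-default-operational", 21),
    "member-2-shard-default-operational": ("member-3-shard-default-operational", 19),
    "member-3-shard-default-operational": ("member-3-shard-default-operational", 21),
}

# Hypothetical threshold; choose a value that suits the deployment.
CHURN_THRESHOLD = 5


def summarize(view, threshold=CHURN_THRESHOLD):
    """Report leader agreement and which members exceed the churn threshold."""
    leaders = {leader for leader, _ in view.values()}
    agreed = len(leaders) == 1
    churning = sorted(m for m, (_, count) in view.items() if count > threshold)
    return {
        "leaders_agree": agreed,
        "leader": next(iter(leaders)) if agreed else None,
        "churning_members": churning,
    }


if __name__ == "__main__":
    print(summarize(SHARD_VIEW))
```

With the values above, all three members agree on member-3 as leader, yet every one of them is over the illustrative threshold, which is consistent with repeated re-elections rather than a stable split.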

The stack traces are shown below:

a) org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Could not find a leader for shard member-2-shard-default-operational. This typically happens when the system is coming up or recovering and a leader is being elected. Try again later.
    at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.readData(NoOpTransactionContext.java:76)[190:org.opendaylight.controller.sal-distributed-datastore:1.2.0.Lithium]
    ... 23 more
d) Caused by: org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Could not find a leader for shard member-2-shard-default-operational. This typically happens when the system is coming up or recovering and a leader is being elected. Try again later.
    at org.opendaylight.controller.cluster.datastore.ShardManager.createNoShardLeaderException(ShardManager.java:392)[190:org.opendaylight.controller.sal-distributed-datastore:1.2.0.Lithium]
    at org.opendaylight.controller.cluster.datastore.ShardManager.onShardNotInitializedTimeout(ShardManager.java:230)[190:org.opendaylight.controller.sal-distributed-datastore:1.2.0.Lithium]
    at org.opendaylight.controller.cluster.datastore.ShardManager.handleCommand(ShardManager.java:182)[190:org.opendaylight.controller.sal-distributed-datastore:1.2.0.Lithium]
    at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:36)[182:org.opendaylight.controller.sal-clustering-commons:1.2.0.Lithium]
    at akka.persistence.UntypedPersistentActor.onReceive(Eventsourced.scala:430)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[182:org.opendaylight.controller.sal-clustering-commons:1.2.0.Lithium]
    at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:534)[175:com.typesafe.akka.actor:2.3.10]
    at akka.persistence.Recovery$State$class.process(Recovery.scala:30)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at akka.persistence.ProcessorImpl$$anon$2.process(Processor.scala:103)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at akka.persistence.ProcessorImpl$$anon$2.aroundReceive(Processor.scala:114)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)[180:com.typesafe.akka.persistence.experimental:2.3.10]
    at akka.persistence.Eventsourced$$anon$2.doAroundReceive(Eventsourced.scala:82)[180:com.typesafe.akka.persistence.experimental ... (more)

Comments

https://www.mail-archive.com/controller-dev@lists.opendaylight.org/msg00709.html

sunilkumarms ( 2017-08-30 05:17:41 -0700 )