Netconf scalability question

asked 2015-09-28

updated 2015-09-28 08:13:31 -0700

Hello, I am running scalability tests for Netconf and can't scale above 450 nodes. The tests are doing the following operations:

  1. Build a Docker image based on a Dockerfile. The image runs ConfD.
  2. Start ODL.
  3. Spawn 400 containers running ConfD. ODL initiates a connection to each ConfD instance, and the schemas within the container are then interpreted by ODL.
  4. I wait until all 400 nodes are connected (with a 200s timeout).
  5. Rinse and repeat.

I have observed that above 450 nodes or so, ODL shuts down RestConf and the Netconf nodes stay in the "Connecting" state. The machine running these tests has 200 GB of RAM and very high I/O. I am also seeing this in the logs:

I have started looking into increasing the fixed and flexible thread pools, but looking at the logs I can see "too many files open" errors, despite the machine having a limit of 25 million open files. Also, ConfD has a couple of YANG files that are synced with ODL.

Any input would be appreciated.

3 answers

answered 2015-09-28

updated 2015-09-28 14:37:08 -0700

As it turns out, the per-process ulimit was set to 1024 while the global file limit was set to 25M. Increasing the per-process limit to a reasonable number fixed the issue.
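
For anyone hitting the same thing, here is a minimal sketch of how the per-process limit can be inspected from Python on a Unix-like system. The helper names `nofile_limits` and `raise_soft_nofile` are made up for illustration; only the standard `resource` module is used. Note that this changes the limit only for the Python process itself and its children, so for ODL the limit would still have to be raised in the environment that launches the JVM (for example with `ulimit -n` in the launching shell).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Raise the soft open-file limit towards `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to do
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft/hard limit before:", nofile_limits())
    print("soft limit after:", raise_soft_nofile(65536))  # 65536 is an arbitrary example value
```

The soft limit is the one that actually triggers the error; the hard limit is only the ceiling up to which an unprivileged process may raise it.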


answered 2015-09-28

One thing to do to get more information is to attach a profiler (I've used jvisualvm to debug openflowplugin issues before). If you hit the file descriptor limit with a 25M limit configured, then something is probably not getting cleaned up.
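
A lighter-weight check than a profiler, sketched here under the assumption of a Linux host with `/proc` mounted: count the entries in `/proc/<pid>/fd` for the controller process and watch whether the number keeps growing as nodes connect. The function name `open_fd_count` is hypothetical, and the PID of the ODL JVM has to be looked up separately.

```python
import os


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    `pid` is a process ID, or "self" for the calling process. Reading another
    user's process requires sufficient permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    print("open descriptors in this process:", open_fd_count())
```

Sampling this periodically during a test run would show whether the count levels off once all nodes are connected or keeps climbing towards the limit.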

answered 2015-09-28

Raising the ulimit will definitely help. In Lithium (not sure about Helium), NETCONF also shares the schema among devices that use the same one.

Memory overhead per device is rather low: in our testing we were continuously able to get ~10k sessions from NETCONF devices to one controller instance with a 2G heap.

