Hello, I am running scalability tests for Netconf and can't scale above 450 nodes. The tests perform the following operations:
- Build a Docker image from a Dockerfile. The image runs ConfD.
- Start ODL.
- Spawn 400 containers running ConfD. ODL initiates a connection to each ConfD instance, and the schemas within the container are then interpreted by ODL.
- I wait until all 400 nodes are connected (with a 200s timeout).
- Rinse and repeat.
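For reference, the wait step is roughly the following polling loop. This is a minimal sketch, not the actual test code: `wait_until_connected` and the `is_connected` callback are hypothetical names, and how a node's connection status is actually queried from ODL is left to the caller.

```python
import time
from typing import Callable, Iterable, Set


def wait_until_connected(
    node_ids: Iterable[str],
    is_connected: Callable[[str], bool],  # hypothetical status check supplied by the caller
    timeout_s: float = 200.0,
    poll_interval_s: float = 1.0,
) -> Set[str]:
    """Poll until every node reports connected or the timeout expires.

    Returns the set of nodes that are still not connected (empty on success).
    """
    pending = set(node_ids)
    deadline = time.monotonic() + timeout_s
    while pending:
        # Keep only the nodes that have not connected yet.
        pending = {node for node in pending if not is_connected(node)}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    return pending
```

A non-empty return value lists the nodes that were still stuck when the timeout expired, which makes it easier to see how many nodes fail per run.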
I have observed that above 450 nodes or so, ODL shuts down RestConf and the Netconf nodes are stuck in the "Connecting" state. The machine running these tests has 200GB RAM and has very high I/O. I am also seeing this in the logs: https://gist.github.com/sniggel/fa337d204a4d245a63cf
I have started looking into increasing the fixed and flexible thread pools, but the logs show "too many files open" errors even though the machine has a limit of 25 million open files. Also, ConfD has a couple of Yang files that are synced with ODL.
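One thing that may be worth ruling out, assuming the host is Linux: the system-wide limit (`fs.file-max`) and the per-process limit (`RLIMIT_NOFILE`, as shown by `ulimit -n`) are separate settings, so the 25 million figure may not be the one that applies to the ODL JVM. The sketch below reads the limit that is in effect for a given process from `/proc/<pid>/limits`. The function names are illustrative, and the PID of the ODL process has to be supplied by hand.

```python
from typing import Optional, Tuple


def _to_int(value: str) -> Optional[int]:
    # /proc/<pid>/limits prints "unlimited" when no limit is set.
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units.
            soft, hard = line.split()[3:5]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' line found")


def read_max_open_files(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the open-file limits that apply to a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

If the soft limit reported for the ODL process is much lower than 25 million, that would be consistent with the error appearing at a few hundred nodes.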
Any input would be appreciated.