nutznboltz
November 25th, 2011, 04:35 PM
What is the best way to submit a patch to fix all the damage that LP: #600941 causes?
I ask because the change for LP: #600941 was put into every version of Ubuntu still supported at this time.
After it was pushed out, all of our systems started experiencing Nagios nrpe restart failures.
Commands like
/etc/init.d/nagios-nrpe-server restart
would cause nrpe to stop but not restart.
I tracked this down to the way that the /etc/init.d/nagios-nrpe-server script calls start-stop-daemon.
The issue is that the "stop" stanza in the /etc/init.d/nagios-nrpe-server script first calls start-stop-daemon, which sends SIGTERM to nrpe, and then waits for only one second.
If nrpe has not exited by that time, the pid file will still exist and the /etc/init.d/nagios-nrpe-server script will remove it.
Worse, if "/etc/init.d/nagios-nrpe-server restart" is used, not only will the pid file be removed, but the attempt to restart nrpe will also fail if the old nrpe daemon is still slow in shutting down.
The start attempt fails under those circumstances because the old nrpe is still bound to its socket, so the new instance cannot bind and aborts its startup.
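To make the idea concrete, here is a rough sketch of what I think the stop path needs to do: wait until the process has really exited before removing the pid file or starting a new copy. The helper name wait_for_exit and the 30 second default are placeholders of mine, not anything taken from the packaged script.

```shell
#!/bin/sh
# Hypothetical helper (not from the packaged init script): poll until the
# process with the given pid has exited, or give up after a timeout.
wait_for_exit() {
    pid="$1"
    timeout="${2:-30}"   # seconds; 30 is an arbitrary placeholder
    waited=0
    # kill -0 sends no signal; it only checks that the pid still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1     # still running: caller should leave the pid file alone
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0             # process is gone: safe to clean up and start again
}
```

The stop stanza would call this after sending SIGTERM and only remove the pid file when it returns 0. start-stop-daemon also has a --retry option that does this kind of waiting itself, which may well be the cleaner route for an actual patch.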
They should have wondered why there was a comment about "sometimes the pid file does not get removed".
They should have tested on systems that have a heavy load and therefore slow nrpe response times.
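A test like that does not need a loaded machine. Below is a small sketch with a stand-in process that takes a few seconds to exit after SIGTERM, which is roughly how a busy nrpe behaves. The slow_daemon function and the 3 second delay are made up for illustration; this is not nrpe itself.

```shell
#!/bin/sh
# Stand-in for a slow nrpe (hypothetical): on SIGTERM it takes about
# three more seconds before it actually exits.
slow_daemon() {
    trap 'sleep 3; exit 0' TERM
    while :; do sleep 1; done
}

slow_daemon &
pid=$!
sleep 1                      # give the stand-in time to install its trap

kill -TERM "$pid"
sleep 1                      # the one-second wait described in the post

# kill -0 only checks whether the pid still exists
if kill -0 "$pid" 2>/dev/null; then
    result="still running"   # a start attempt now would collide with it
else
    result="exited"
fi
echo "one second after SIGTERM: $result"

wait "$pid"                  # let the stand-in finish before leaving
```

If the stand-in behaves as intended, the check one second after SIGTERM should find the process still alive, which is the window in which the init script removes the pid file and tries to start a second nrpe.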
Thanks