Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
This issue is reproducible in FreeBSD under moderate load during a restart of Tomcat 8 running via jsvc.
The old jsvc controller process exits after the new process is started, deleting the new pid file with it. As a result, the jsvc starter process fails with a timeout since it is waiting on the pid file to be created, which never happens. The Tomcat process itself is started without a pid file.
Example (35659/35660 are the old jsvc processes, 56362/56363 are the new jsvc processes):
2017-11-08 09:36:59 35660 jsvc debug: Daemon destroyed successfully
2017-11-08 09:36:59 35660 jsvc debug: Calling System.exit(0)
2017-11-08 09:36:59 56362 jsvc debug: Switching umask back to 022 from 077
(((/var/run/tomcat8.pid is written by 56362 here)))
2017-11-08 09:36:59 56363 jsvc debug: Using specific JVM in /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
2017-11-08 09:36:59 56363 jsvc debug: Attemtping to load library /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
(((/var/run/tomcat8.pid is deleted by 35659 here)))
2017-11-08 09:36:59 35659 jsvc debug: Service shut down
2017-11-08 09:36:59 56363 jsvc debug: JVM library /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so loaded
2017-11-08 09:36:59 56363 jsvc debug: JVM library entry point found (0x019DE640)
The restart script eventually times out:
>/usr/local/etc/rc.d/tomcat8 restart
Stopping tomcat8.
Waiting for PIDS: 35660.
Starting tomcat8.
/usr/local/etc/rc.d/tomcat8: WARNING: failed to start tomcat8
No PID file:
>ls -l /var/run/tomcat8.pid
ls: /var/run/tomcat8.pid: No such file or directory
Yet Tomcat is running:
>ps ax|grep java|grep -v grep
56362 - Is 0:00.00 /usr/local/bin/jsvc -java-home /usr/local/openjdk8 -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
56363 - I 0:57.25 /usr/local/bin/jsvc -java-home /usr/local/openjdk8 -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
The issue is that the pidfile contains the PID of the child, but it is deleted by the parent process (the controller) in the run_controller function, which looks like:
static int run_controller(arg_data *args, home_data *data, uid_t uid, gid_t gid)
. . .
    waitpid(pid, &status, 0);
    unlink(args->pidf);
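If the controller process is paged out (which happens often, because it is dormant while inside waitpid), a considerable amount of time can pass between the child terminating and the call to unlink(args->pidf). During that window a newly started jsvc instance can write its own pid file to the same path, and the old controller then deletes it.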
The issue can be reproduced reliably by adding sleep(1); before unlink(args->pidf).
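One possible way to narrow the race described above is to have the controller delete the pid file only when it still records the PID of the child it was waiting on. The sketch below is illustrative only and is not the fix that was actually applied to jsvc; the helper name remove_pidfile_if_owned and its return-value convention are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of jsvc: remove the pid file only if it
 * still records the pid of the child this controller was waiting on.
 * Returns 1 if the file was removed, 0 if it was left alone (it holds a
 * different pid or no number at all), and -1 if it could not be opened
 * or removed. */
static int remove_pidfile_if_owned(const char *path, pid_t expected)
{
    FILE *fp = fopen(path, "r");
    long recorded = -1;
    int matched;

    if (fp == NULL)
        return -1;                  /* already gone or unreadable */
    matched = fscanf(fp, "%ld", &recorded);
    fclose(fp);

    if (matched != 1 || (pid_t)recorded != expected)
        return 0;                   /* written by a newer instance: keep it */
    return unlink(path) == 0 ? 1 : -1;
}
```

In run_controller, a call such as remove_pidfile_if_owned(args->pidf, pid) would stand in for the unconditional unlink(args->pidf), assuming pid is the child PID passed to waitpid. The check and the unlink are still two separate steps, so a small window remains; this only prevents the case shown in the log, where the file already belongs to the new instance by the time the old controller wakes up.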