Runtime Watchdog
Note
This feature has been available since Cloe runtime version 0.16.
The Cloe runtime can activate a watchdog that reacts when a simulation state
exceeds a configured timeout. The watchdog is configured in the engine section
of the stack file and has the following defaults:
    engine:
      watchdog:
        mode: off
        default_timeout: 90000
        state_timeouts:
          CONNECT: 300000
          ABORT: 90000
          STOP: 300000
          DISCONNECT: 600000
/engine/watchdog/mode
The following modes are available:
off
    The watchdog is disabled (the default).
log
    When a timeout occurs, the watchdog logs a critical message, but does nothing else:

        Watchdog timeout of 90000 ms exceeded for state: X

    This can be useful if the log messages are continuously monitored, as the orchestrator may be better suited to provide customizable reactions.
abort
    When a timeout occurs, the watchdog logs a message and then pushes an ABORT interrupt. This will result in an orderly shutdown but will not work if the state that caused the timeout never returns.
kill
    When a timeout occurs, the program is killed. None of the plugins will be given the opportunity to clean up, so this may result in output files that are only partially written or processes that are still running in the background.
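For the log mode, an external monitor has to recognize the watchdog message in the log stream. The following is a minimal sketch of such a check in Python. It assumes only the message format quoted above; the function name parse_watchdog_timeout is hypothetical, and nothing is assumed about what precedes the message on the line (timestamp, log level, logger name).

```python
import re
from typing import Optional, Tuple

# Matches the message text quoted in the description of the "log" mode.
# The pattern is searched anywhere in the line, so any prefix is ignored.
WATCHDOG_RE = re.compile(r"Watchdog timeout of (\d+) ms exceeded for state: (\w+)")


def parse_watchdog_timeout(line: str) -> Optional[Tuple[int, str]]:
    """Return (timeout_ms, state) if the line reports a watchdog timeout, else None."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

An orchestrator could call this function on every log line and choose its own reaction (restart, alert, collect diagnostics) whenever it returns a value.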
/engine/watchdog/default_timeout
The default timeout is used for each state unless a state-specific timeout is
set in /engine/watchdog/state_timeouts. This value is specified in
milliseconds, with zero indicating no timeout.
Note
The default timeout should generally be at least as long as the
polling interval (set in /engine/polling_interval), otherwise the watchdog
will trigger during normal operation.
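The relationship described in this note can be checked before a simulation is started. The sketch below is one possible pre-flight check, not part of Cloe. It assumes that the polling interval is expressed in the same unit (milliseconds) as the timeouts, and that the same reasoning applies to state-specific timeouts; both are assumptions, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def timeouts_below_polling_interval(
    default_timeout: int,
    state_timeouts: Dict[str, Optional[int]],
    polling_interval: int,
) -> List[str]:
    """Return the names of active timeouts that are shorter than the polling interval.

    Assumption: all three values use the same unit (milliseconds).
    """
    suspicious = []
    # Zero means "no timeout", so only positive values are compared.
    if 0 < default_timeout < polling_interval:
        suspicious.append("default_timeout")
    for state, timeout in state_timeouts.items():
        # None (null) falls back to the default, which is already checked above.
        if timeout is not None and 0 < timeout < polling_interval:
            suspicious.append(state)
    return suspicious
```

An empty result means that none of the configured timeouts is shorter than the polling interval.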
/engine/watchdog/state_timeouts
Not all states need the same amount of time. In particular, the CONNECT and DISCONNECT states may require I/O operations that can take an order of magnitude more time than other states in the simulation.
Each state can therefore be given a state-specific timeout. This can be either
null to use the default timeout, or a number of milliseconds. The following
case-sensitive states are available:
CONNECT
START
STEP_BEGIN
STEP_SIMULATORS
STEP_CONTROLLERS
STEP_END
PAUSE
RESUME
SUCCESS
FAIL
ABORT
STOP
RESET
KEEP_ALIVE
DISCONNECT
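How the default and the state-specific timeouts combine can be sketched in a few lines of Python. This illustrates the rules described above and is not the runtime's actual implementation. The function name effective_timeout is hypothetical, and treating a state-specific value of zero as "no timeout" is an assumption carried over from the description of default_timeout.

```python
from typing import Dict, Optional

# State names as listed above; matching is case-sensitive.
KNOWN_STATES = frozenset({
    "CONNECT", "START", "STEP_BEGIN", "STEP_SIMULATORS", "STEP_CONTROLLERS",
    "STEP_END", "PAUSE", "RESUME", "SUCCESS", "FAIL", "ABORT", "STOP",
    "RESET", "KEEP_ALIVE", "DISCONNECT",
})


def effective_timeout(
    state: str,
    default_timeout: int,
    state_timeouts: Dict[str, Optional[int]],
) -> Optional[int]:
    """Return the timeout in milliseconds for a state, or None if no timeout applies."""
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown state: {state!r}")
    timeout = state_timeouts.get(state)
    # A missing entry or an explicit null (None) falls back to the default.
    if timeout is None:
        timeout = default_timeout
    # Assumption: zero disables the timeout for state-specific values as well.
    return None if timeout == 0 else timeout
```

With the defaults shown at the top of this page, CONNECT resolves to 300000 ms, while a state without an entry, such as STEP_BEGIN, resolves to the default of 90000 ms.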
See System States for more information on the simulation states.