State Machine Communication: Prefer Requests Over Start/Stop Commands

The Problem

PLC programs often use state machines to control machine behavior. That is usually a good choice. A state machine makes the current operating mode visible and keeps transitions explicit.

The difficult part starts when one state machine must control another one.

Assume PLC 1 decides that a unit should run, and PLC 2 controls that unit. The simple solution is to send a start command from PLC 1 to PLC 2. Later, PLC 1 sends a stop command.

That works in a demo. It is weaker in a real machine.

Figure: Start command between two state machines

A start command is an event. It says that something happened at one point in time. It does not say that the sender is still alive. It does not say that the communication link is still healthy. It does not say that running is still wanted.

For continuous operation, the receiver usually needs more than an event. It needs to know the current intent of the sender.

Start/Stop Commands Are Fragile

One-shot commands are easy to understand, but they create hidden protocol requirements.

Consider this sequence:

1. PLC 1 sends start.
2. PLC 2 receives start and enters RUNNING.
3. PLC 1 loses power.
4. PLC 2 keeps running because no stop command arrives.

PLC 2 cannot distinguish these cases:

PLC 1 still wants the unit to run.
PLC 1 is stopped and cannot send anything.

The same problem appears when a network switch fails, an ADS connection is interrupted, or a task stops executing. The receiver may continue based on an old event.

You can make a start/stop protocol robust. But then it is no longer just start and stop. You need extra parts:

acknowledgement bits
command sequence numbers
state feedback
communication watchdogs
timeout handling
reset behavior
error diagnostics

Those parts are not wrong. In some systems they are required. But they are also work, and they are easy to implement inconsistently.

The core issue is simple:

A command describes an event. A request describes an active intent.

For continuous operation, active intent is often the better abstraction.

A Better Pattern: Cyclic Run Request

Instead of sending start once, the controlling state machine repeatedly sends a run request while it wants the controlled state machine to run.

The message is no longer:

Start now.

It becomes:

Keep running while this request continues to arrive.

Figure: Cyclic run request between two state machines

The controlled state machine uses a timeout. Every valid run request refreshes it. If requests stop arriving, the controlled state machine leaves RUNNING and moves to STOPPING or IDLE.

This is a watchdog pattern. With a one-shot start, the default after a communication failure may be “continue running”. With a cyclic run request, the default becomes “stop after the timeout”.

That is usually the safer operational behavior. It is not the same as functional safety, but it is a useful control-layer default.

How the Timeout Works

In TwinCAT, a TOF timer is a practical way to implement this pattern.

TOF is an off-delay timer. Its output Q stays TRUE for the configured time PT after the input IN becomes FALSE.

This is useful because the run request is normally only present for one PLC scan. The receiver sets an internal runRequested flag when a request arrives. The cyclic update method calls the timer and clears the flag again.

As long as new requests arrive before the off-delay expires, Q remains TRUE. If requests stop, Q becomes FALSE after the timeout.

Important rule:

Call the timer every scan cycle.

Do not call timers only inside selected states unless you have a clear reason. A timer function block updates Q and ET only when it is called, so a skipped call leaves its outputs frozen at their last values. That can make timeout diagnostics misleading.

TwinCAT Implementation

The example below shows the core pattern. Real code should add diagnostics, reset handling, and application-specific stop behavior.

First define an explicit enum for the state machine.

{attribute 'qualified_only'}
{attribute 'strict'}
TYPE MachineState :
(
    IDLE,
    PREPARING,
    RUNNING,
    STOPPING,
    ERROR
) DINT;
END_TYPE

The controller stores the current state, the one-scan request flag, and the off-delay timer.

FUNCTION_BLOCK MachineController
VAR
    state : MachineState := MachineState.IDLE;
    runRequested : BOOL;
    runWatchdog : TOF;
END_VAR
VAR CONSTANT
    RUN_REQUEST_TIMEOUT : TIME := T#100MS;
END_VAR

The controlling state machine calls requestRun() cyclically while operation is wanted.

METHOD requestRun

runRequested := TRUE;

The controlled state machine calls update() once per PLC scan.

METHOD update

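// Call the watchdog every scan, before the state logic. Q stays TRUE until
// RUN_REQUEST_TIMEOUT has elapsed without a new request.
runWatchdog(IN := runRequested, PT := RUN_REQUEST_TIMEOUT);
runRequested := FALSE; // Consume the request; requestRun() must set it again.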

CASE state OF

    MachineState.IDLE:
        IF runWatchdog.Q THEN
            state := MachineState.PREPARING;
        END_IF

    MachineState.PREPARING:
        state := MachineState.RUNNING;

    MachineState.RUNNING:
        IF NOT runWatchdog.Q THEN
            state := MachineState.STOPPING;
        END_IF

    MachineState.STOPPING:
        state := MachineState.IDLE;

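    MachineState.ERROR:
        ; // Stay here until explicit reset logic (not shown) leaves ERROR.

ELSE
    state := MachineState.ERROR; // Unexpected state value: fail into ERROR.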

END_CASE

The timer is called first. Then runRequested is cleared. If another function block wants this machine to keep running, it must call requestRun() again. A scan without a request is tolerated, but a new request must arrive before RUN_REQUEST_TIMEOUT elapses.

This makes the request edge-independent. The receiver does not care about a rising edge. It cares whether the request continues to be refreshed.
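The example leaves out reset handling and diagnostics. A minimal sketch of both could look like the following. The method names reset and isRequestActive are assumptions for illustration and are not part of the original function block.

```iecst
// Hypothetical additions to MachineController; names are illustrative.

METHOD reset
// Leave ERROR only on an explicit call, never automatically.
IF state = MachineState.ERROR THEN
    state := MachineState.IDLE;
END_IF

METHOD isRequestActive : BOOL
// TRUE while the off-delay watchdog still holds the last run request.
isRequestActive := runWatchdog.Q;
```

With this sketch, a request that is still being refreshed restarts the unit on the next update() after reset(). Whether that is acceptable, or whether the request must drop first, is an application decision.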

In a larger application, the caller might look like this:

IF productionState = ProductionState.RUNNING THEN
    machineController.requestRun();
END_IF

machineController.update();

Call order matters. The request should be made before update() in the same scan if both blocks run in the same task. If the request comes from another PLC or task, size the timeout accordingly.
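When the request comes from another PLC, the link may keep delivering the last received value after the sender has stopped. A plain BOOL would then stay TRUE and keep refreshing the watchdog. One way to guard against that is sketched below, assuming the sender also transmits a counter that it increments every cycle. The function block RemoteRunRequest, its variables, and the linkIn structure are hypothetical names for this example.

```iecst
// Sketch: accept a remote run request only while the sender's counter changes.
FUNCTION_BLOCK RemoteRunRequest
VAR_INPUT
    remoteRunRequest : BOOL;     // hypothetical link input: sender wants the unit to run
    remoteAliveCounter : UDINT;  // hypothetical link input: incremented by the sender every cycle
END_VAR
VAR_OUTPUT
    fresh : BOOL;                // TRUE in scans where a new, valid request arrived
END_VAR
VAR
    lastAliveCounter : UDINT;    // counter value seen in the previous scan
END_VAR

// A frozen counter means the data is stale, whatever the BOOL says.
fresh := remoteRunRequest AND (remoteAliveCounter <> lastAliveCounter);
lastAliveCounter := remoteAliveCounter;
```

The caller then forwards only fresh requests:

```iecst
// remoteRequest is an instance of RemoteRunRequest; linkIn is a hypothetical input structure.
remoteRequest(remoteRunRequest := linkIn.runRequest,
              remoteAliveCounter := linkIn.aliveCounter);

IF remoteRequest.fresh THEN
    machineController.requestRun();
END_IF

machineController.update();
```

If the receiver scans faster than the sender, fresh is TRUE only in some scans, which is exactly the case the off-delay timeout is meant to bridge. On the first scan, a stale non-zero counter can produce one spurious request. The watchdog expires again after the timeout if nothing follows.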

Choosing the Timeout

The timeout must be longer than the worst-case interval between valid requests.

Do not base it only on the nominal task cycle. Real systems have jitter, communication delays, task priority effects, and online changes. A short timeout can create false stops. A long timeout delays the reaction to a lost request.

Example for local logic in the same PLC task:

Parameter	Value
Sender task cycle	10 ms
Receiver task cycle	10 ms
Expected request interval	10 ms
Practical timeout	30 ms to 100 ms

For communication between PLCs, use more margin. The correct value depends on the communication mechanism, task cycle times, network load, and required stopping behavior.

As a rule of thumb, start with this question:

What is the longest acceptable time the receiver may continue running after the sender stops requesting operation?

Then check whether the communication path can reliably refresh the request within that time. If not, the architecture has a conflict. Increasing the timeout may hide nuisance stops, but it also increases running time after a failure.

That trade-off should be explicit.
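To make the budget visible in code, the lower bound for the timeout can be written as a sum of its contributions. This is a deliberately conservative sketch, not a validated formula. The function name and all inputs are assumptions, and the jitter and link delay values must come from measurements on the real system.

```iecst
// Sketch: conservative lower bound for the run-request timeout (hypothetical helper).
FUNCTION MinRunRequestTimeout : TIME
VAR_INPUT
    senderCycle : TIME;    // nominal interval between requests
    maxJitter : TIME;      // assumed worst-case sender jitter
    maxLinkDelay : TIME;   // assumed worst-case variation of the transport delay
    receiverCycle : TIME;  // the receiver samples the request once per scan
END_VAR

// Longest gap the receiver may observe between two valid requests.
MinRunRequestTimeout := senderCycle + maxJitter + maxLinkDelay + receiverCycle;
```

For the local example above, with an assumed jitter of 2 ms and no link delay, this gives 10 ms + 2 ms + 0 ms + 10 ms = 22 ms. The configured timeout should sit above this bound and below the longest acceptable running time after a lost request.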

Failure Modes

This pattern improves several common failure modes, but it does not remove the need for proper machine design.

Failure	Expected behavior
Controlling PLC stops	Request expires and receiver stops operationally
Communication link fails	Request expires and receiver stops operationally
Sender task has expected jitter	Timeout should tolerate it
Sender task hangs	Request expires if no new request is produced
Receiver logic has an internal error	Must be handled by separate error logic

The last row matters. A watchdog on the request does not fix bad receiver code. The receiver still needs internal diagnostics, error states, and deterministic stop behavior.

For example, STOPPING should not be a decorative state. It should actively bring the controlled unit into a defined condition. Depending on the machine, that may mean ramping down motion, closing valves, disabling outputs, or handing control to a lower-level drive function.
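A sketch of a supervised stop is shown below. It assumes a hypothetical standstill feedback signal and a hypothetical stop timeout of 2 s. Neither appears in the original example, and the actual stop actions depend on the machine.

```iecst
// Sketch: supervise STOPPING with feedback and its own timeout (hypothetical block).
FUNCTION_BLOCK StopSupervisor
VAR_INPUT
    stopping : BOOL;     // TRUE while the state machine is in STOPPING
    unitStopped : BOOL;  // hypothetical feedback: unit has reached its defined stop condition
END_VAR
VAR_OUTPUT
    done : BOOL;         // stop confirmed by feedback
    timedOut : BOOL;     // stop not confirmed within STOP_TIMEOUT
END_VAR
VAR
    stopTimer : TON;
END_VAR
VAR CONSTANT
    STOP_TIMEOUT : TIME := T#2S;  // assumed value, must be sized for the real unit
END_VAR

// The on-delay timer runs only while a stop is pending and not yet confirmed.
stopTimer(IN := stopping AND NOT unitStopped, PT := STOP_TIMEOUT);
done := stopping AND unitStopped;
timedOut := stopTimer.Q;
```

Called once per scan before the CASE statement, this lets the STOPPING branch move to IDLE when done is TRUE and to ERROR when timedOut is TRUE, instead of leaving STOPPING unconditionally after one scan.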

Conclusion

One-shot start and stop commands are simple, but they are often the wrong abstraction for continuous operation between state machines.

A start command says that an event happened. A cyclic run request says that operation is still wanted now.

For PLC state machines, that difference matters. A cyclic request with timeout supervision gives the receiver a clear rule:

Run while the request is refreshed. Stop when the request expires.

The pattern is small, testable, and easy to debug online. It still needs correct timeout selection, deterministic receiver logic, and proper diagnostics. It also does not replace functional safety.

Used in the right place, it reduces protocol complexity and gives the system a better default behavior when communication or the controlling state machine fails.