================================================================================
PLC STARTUP RACE CONDITION - FIX SUMMARY
================================================================================

ROOT CAUSE:
-----------
PLC2 crashed at startup when its Modbus TCP write callback to PLC1
(192.168.100.12:502) raised ConnectionRefusedError before PLC1 was ready.

Location: outputs/scenario_run/logic/plc2.py line 39
    if key in cbs:
        cbs[key]()  # <-- CRASHED HERE with Connection refused

SOLUTION:
---------
Added a safe retry wrapper to the PLC logic generator (tools/compile_ir.py).
The wrapper attempts the callback up to 30 times with a 0.2 s delay between
attempts (about 6 s in total) and never raises.

================================================================================
EXACT FILE CHANGES
================================================================================

FILE: tools/compile_ir.py
FUNCTION: render_plc_rules()
LINES: 17-46

CHANGE 1: Added import time (line 24)
------------------------------------------
+ lines.append("import time\n")

CHANGE 2: Added _safe_callback function (after line 28)
----------------------------------------------------------
+ lines.append("def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:\n")
+ lines.append("    \"\"\"Invoke callback with retry logic to handle startup race conditions.\"\"\"\n")
+ lines.append("    for attempt in range(retries):\n")
+ lines.append("        try:\n")
+ lines.append("            cb()\n")
+ lines.append("            return\n")
+ lines.append("        except Exception as e:\n")
+ lines.append("            if attempt == retries - 1:\n")
+ lines.append("                print(f\"WARNING: Callback failed after {retries} attempts: {e}\")\n")
+ lines.append("                return\n")
+ lines.append("            time.sleep(delay)\n\n\n")

CHANGE 3: Modified _write to use _safe_callback (line 46)
-----------------------------------------------------------
- lines.append("        cbs[key]()\n\n\n")
+ lines.append("        _safe_callback(cbs[key])\n\n\n")
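
Because the generator emits Python source as string fragments, the indentation
inside each fragment decides whether the output is valid. The sketch below
shows one way to check that, assuming a hypothetical helper
render_safe_callback() that collects the fragments from CHANGE 1 and CHANGE 2;
the real render_plc_rules() is not reproduced here.

```python
from typing import Dict, List


def render_safe_callback() -> List[str]:
    """Hypothetical helper: emit the wrapper as source fragments."""
    lines: List[str] = []
    lines.append("import time\n")
    lines.append("from typing import Callable\n\n\n")
    lines.append("def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:\n")
    lines.append("    \"\"\"Invoke callback with retry logic to handle startup race conditions.\"\"\"\n")
    lines.append("    for attempt in range(retries):\n")
    lines.append("        try:\n")
    lines.append("            cb()\n")
    lines.append("            return\n")
    lines.append("        except Exception as e:\n")
    lines.append("            if attempt == retries - 1:\n")
    lines.append("                print(f\"WARNING: Callback failed after {retries} attempts: {e}\")\n")
    lines.append("                return\n")
    lines.append("            time.sleep(delay)\n\n\n")
    return lines


source = "".join(render_safe_callback())

# compile() raises SyntaxError (or IndentationError) if a fragment is mis-indented.
code = compile(source, "<generated plc logic>", "exec")

# Execute the generated module in an isolated namespace and pull out the wrapper.
namespace: Dict[str, object] = {}
exec(code, namespace)
generated = namespace["_safe_callback"]
print(callable(generated))
```

Running a check like this in the build step would catch a broken fragment
before the generated file reaches a container.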

================================================================================
GENERATED CODE COMPARISON
================================================================================

BEFORE (plc2.py):
-----------------
from typing import Any, Callable, Dict

def _write(out_regs, cbs, key, value):
    if key not in out_regs:
        return
    cur = out_regs[key].get('value', None)
    if cur == value:
        return
    out_regs[key]['value'] = value
    if key in cbs:
        cbs[key]()  # <-- CRASHES

AFTER (plc2.py):
----------------
import time  # <-- ADDED
from typing import Any, Callable, Dict

def _safe_callback(cb, retries=30, delay=0.2):  # <-- ADDED
    """Invoke callback with retry logic to handle startup race conditions."""
    for attempt in range(retries):
        try:
            cb()
            return
        except Exception as e:
            if attempt == retries - 1:
                print(f"WARNING: Callback failed after {retries} attempts: {e}")
                return
            time.sleep(delay)

def _write(out_regs, cbs, key, value):
    if key not in out_regs:
        return
    cur = out_regs[key].get('value', None)
    if cur == value:
        return
    out_regs[key]['value'] = value
    if key in cbs:
        _safe_callback(cbs[key])  # <-- NOW SAFE

================================================================================
VALIDATION COMMANDS
================================================================================

1. Rebuild scenario:
   .venv/bin/python3 build_scenario.py --out outputs/scenario_run --overwrite

2. Verify fix is present:
   .venv/bin/python3 validate_fix.py

3. Check generated code:
   grep -A10 "_safe_callback" outputs/scenario_run/logic/plc2.py

4. Start ICS-SimLab:
   cd ~/projects/ICS-SimLab-main/curtin-ics-simlab
   sudo ./start.sh ~/projects/ics-simlab-config-gen_claude/outputs/scenario_run

5. Monitor PLC2 logs (NO crashes expected):
   sudo docker logs $(sudo docker ps | grep plc2 | awk '{print $NF}') -f

6. Stop:
   cd ~/projects/ICS-SimLab-main/curtin-ics-simlab && sudo ./stop.sh
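
Steps 2 and 3 both check that the generated file contains the fix. The contents
of validate_fix.py are not shown in this summary, so the following is only a
sketch of what such a structural check could look like, using the standard
ast module. The function name uses_safe_callback and the SAMPLE string are
hypothetical; in practice the source would be read from
outputs/scenario_run/logic/plc2.py.

```python
import ast


def uses_safe_callback(source: str) -> bool:
    """Return True if source defines _safe_callback and _write calls it."""
    tree = ast.parse(source)
    funcs = {
        node.name: node
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    if "_safe_callback" not in funcs or "_write" not in funcs:
        return False
    # Look for a direct call to _safe_callback anywhere inside _write.
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "_safe_callback"
        for node in ast.walk(funcs["_write"])
    )


# Hypothetical, trimmed-down stand-in for the generated plc2.py.
SAMPLE = '''
def _safe_callback(cb):
    cb()

def _write(out_regs, cbs, key, value):
    if key in cbs:
        _safe_callback(cbs[key])
'''

print(uses_safe_callback(SAMPLE))
```

Unlike the grep in step 3, a check on the parsed tree also fails when the
wrapper is defined but _write still calls cbs[key]() directly.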

================================================================================
EXPECTED BEHAVIOR
================================================================================

BEFORE FIX:
  PLC2 container crashes immediately with:
    Exception in thread Thread-1:
    ConnectionRefusedError: [Errno 111] Connection refused

AFTER FIX (Success):
  PLC2 container starts
  Retries silently for up to ~6 seconds while PLC1 starts
  Callbacks succeed once PLC1 accepts connections
  No crashes, no unhandled exceptions

AFTER FIX (PLC1 never starts):
  PLC2 container starts
  After ~6 seconds: WARNING: Callback failed after 30 attempts: <error>
  Container keeps running (no crash)
  Retries on the next write that changes the register value
  (_write returns early when the value is unchanged)

================================================================================
FILES CREATED
================================================================================

Modified:
  tools/compile_ir.py    (CRITICAL FIX)

New:
  build_scenario.py      (deterministic builder using correct venv)
  validate_fix.py        (validation script)
  test_simlab.sh         (interactive launcher)
  diagnose_runtime.sh    (diagnostic script)
  RUNTIME_FIX.md         (complete documentation)
  CHANGES.md             (detailed changes with diffs)
  DELIVERABLES.md        (comprehensive summary)
  QUICKSTART.txt         (this file)
  FIX_SUMMARY.txt        (exact changes)

================================================================================