ics-simlab-config-gen-claude/docs/FIX_SUMMARY.txt

================================================================================
PLC STARTUP RACE CONDITION - FIX SUMMARY
================================================================================
ROOT CAUSE:
-----------
PLC2 crashed at startup when its Modbus TCP write callback to PLC1
(192.168.100.12:502) raised ConnectionRefusedError before PLC1 was ready.
Location: outputs/scenario_run/logic/plc2.py line 39
    if key in cbs:
        cbs[key]()  # <-- CRASHED HERE with Connection refused
SOLUTION:
---------
Added a safe retry wrapper to the PLC logic generator (tools/compile_ir.py).
The generated code retries the callback up to 30 times with a 0.2s delay
between attempts (about 6s in total) and logs a warning instead of raising.
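The "6s total" figure is a rounded sleep budget. Because the wrapper skips time.sleep() after the final attempt, there are 29 sleeps, not 30, so the last attempt starts about 29 * 0.2s = 5.8s after the first, plus whatever time the refused connection attempts themselves take. A few lines of Python make the schedule explicit; attempt_offsets is a hypothetical helper for illustration only, not part of compile_ir.py.

```python
from typing import List


def attempt_offsets(retries: int = 30, delay: float = 0.2) -> List[float]:
    """Start time of each attempt relative to the first one, ignoring time spent inside cb()."""
    # The wrapper sleeps only *between* attempts, so attempt i starts after i sleeps.
    return [round(i * delay, 3) for i in range(retries)]


offsets = attempt_offsets()
print(len(offsets), offsets[-1])  # 30 5.8
```

In other words, PLC1 must be accepting connections within roughly 5.8s of PLC2's first write for the first write to get through.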
================================================================================
EXACT FILE CHANGES
================================================================================
FILE: tools/compile_ir.py
FUNCTION: render_plc_rules()
LINES: 17-46
CHANGE 1: Added import time (line 24)
------------------------------------------
+ lines.append("import time\n")
CHANGE 2: Added _safe_callback function (after line 28)
----------------------------------------------------------
+ lines.append("def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:\n")
+ lines.append("    \"\"\"Invoke callback with retry logic to handle startup race conditions.\"\"\"\n")
+ lines.append("    for attempt in range(retries):\n")
+ lines.append("        try:\n")
+ lines.append("            cb()\n")
+ lines.append("            return\n")
+ lines.append("        except Exception as e:\n")
+ lines.append("            if attempt == retries - 1:\n")
+ lines.append("                print(f\"WARNING: Callback failed after {retries} attempts: {e}\")\n")
+ lines.append("                return\n")
+ lines.append("            time.sleep(delay)\n\n\n")
CHANGE 3: Modified _write to use _safe_callback (line 46)
-----------------------------------------------------------
- lines.append("        cbs[key]()\n\n\n")
+ lines.append("        _safe_callback(cbs[key])\n\n\n")
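Because compile_ir.py emits the helper as string fragments, a lost indent or newline only surfaces when the generated plc*.py file is loaded inside the container. One way to catch that earlier is to compile the emitted text in a unit test. The sketch below assumes a hypothetical render_safe_callback_lines() that returns the fragments from CHANGE 2 plus the imports they need (it is not the real render_plc_rules()); it joins them, passes the text to the built-in compile(), and executes it into a scratch namespace.

```python
from typing import Any, Callable, Dict, List


def render_safe_callback_lines() -> List[str]:
    """Hypothetical helper: the fragments CHANGE 2 appends, plus the imports they rely on."""
    lines: List[str] = []
    lines.append("import time\n")
    lines.append("from typing import Callable\n\n\n")
    lines.append("def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:\n")
    lines.append("    \"\"\"Invoke callback with retry logic to handle startup race conditions.\"\"\"\n")
    lines.append("    for attempt in range(retries):\n")
    lines.append("        try:\n")
    lines.append("            cb()\n")
    lines.append("            return\n")
    lines.append("        except Exception as e:\n")
    lines.append("            if attempt == retries - 1:\n")
    lines.append("                print(f\"WARNING: Callback failed after {retries} attempts: {e}\")\n")
    lines.append("                return\n")
    lines.append("            time.sleep(delay)\n\n\n")
    return lines


def load_generated_helper() -> Callable[..., None]:
    """Compile the emitted text (SyntaxError/IndentationError if an indent was lost) and return the function."""
    source = "".join(render_safe_callback_lines())
    code = compile(source, "<generated plc helper>", "exec")
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace["_safe_callback"]


helper = load_generated_helper()
calls: List[int] = []
helper(lambda: calls.append(1))  # first attempt succeeds, so no sleep and no retry
print(len(calls))  # 1
```

Running this kind of check in the generator's own tests means a broken template fails at build time rather than when ICS-SimLab starts the PLC2 container.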
================================================================================
GENERATED CODE COMPARISON
================================================================================
BEFORE (plc2.py):
-----------------
from typing import Any, Callable, Dict


def _write(out_regs, cbs, key, value):
    if key not in out_regs:
        return
    cur = out_regs[key].get('value', None)
    if cur == value:
        return
    out_regs[key]['value'] = value
    if key in cbs:
        cbs[key]()  # <-- CRASHES
AFTER (plc2.py):
----------------
import time  # <-- ADDED
from typing import Any, Callable, Dict


def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:  # <-- ADDED
    """Invoke callback with retry logic to handle startup race conditions."""
    for attempt in range(retries):
        try:
            cb()
            return
        except Exception as e:
            if attempt == retries - 1:
                print(f"WARNING: Callback failed after {retries} attempts: {e}")
                return
            time.sleep(delay)


def _write(out_regs, cbs, key, value):
    if key not in out_regs:
        return
    cur = out_regs[key].get('value', None)
    if cur == value:
        return
    out_regs[key]['value'] = value
    if key in cbs:
        _safe_callback(cbs[key])  # <-- NOW SAFE
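One consequence of the AFTER code is worth spelling out: _write stores the new value in out_regs before invoking the callback, so if all retries fail, a later _write with the same value hits the `cur == value` early return and never calls the callback again. The sketch below reproduces this with copies of the generated helpers; the retry defaults are shrunk so it runs instantly, the out_regs/cbs layout is inferred from _write itself, and refused and "valve_cmd" are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, List


def _safe_callback(cb: Callable[[], None], retries: int = 3, delay: float = 0.0) -> None:
    """Same logic as the generated helper; defaults shrunk from 30 / 0.2s so the demo is instant."""
    for attempt in range(retries):
        try:
            cb()
            return
        except Exception as e:
            if attempt == retries - 1:
                print(f"WARNING: Callback failed after {retries} attempts: {e}")
                return
            time.sleep(delay)


def _write(out_regs: Dict[str, Dict[str, Any]], cbs: Dict[str, Callable[[], None]],
           key: str, value: Any) -> None:
    """Copied from the generated plc2.py (type hints and comments added)."""
    if key not in out_regs:
        return
    cur = out_regs[key].get('value', None)
    if cur == value:
        return  # unchanged value: the callback is never reached
    out_regs[key]['value'] = value  # stored locally even if the remote write then fails
    if key in cbs:
        _safe_callback(cbs[key])


attempts: List[int] = []


def refused() -> None:
    """Hypothetical callback standing in for a Modbus write to an unreachable PLC1."""
    attempts.append(1)
    raise ConnectionRefusedError(111, "Connection refused")


out_regs: Dict[str, Dict[str, Any]] = {"valve_cmd": {"value": 0}}
cbs = {"valve_cmd": refused}

_write(out_regs, cbs, "valve_cmd", 1)  # 3 failed attempts, then the WARNING line
_write(out_regs, cbs, "valve_cmd", 1)  # same value: early return, callback not invoked
print(len(attempts))  # 3
_write(out_regs, cbs, "valve_cmd", 0)  # value changed: a fresh retry cycle
print(len(attempts))  # 6
```

After the first call, PLC2's local register holds 1 even though PLC1 never received it, so the two PLCs can disagree until the value changes again.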
================================================================================
VALIDATION COMMANDS
================================================================================
1. Rebuild the scenario:
     .venv/bin/python3 build_scenario.py --out outputs/scenario_run --overwrite
2. Verify the fix is present:
     .venv/bin/python3 validate_fix.py
3. Check the generated code:
     grep -A10 "_safe_callback" outputs/scenario_run/logic/plc2.py
4. Start ICS-SimLab:
     cd ~/projects/ICS-SimLab-main/curtin-ics-simlab
     sudo ./start.sh ~/projects/ics-simlab-config-gen_claude/outputs/scenario_run
5. Monitor PLC2 logs (NO crashes expected):
     sudo docker logs $(sudo docker ps | grep plc2 | awk '{print $NF}') -f
6. Stop:
     cd ~/projects/ICS-SimLab-main/curtin-ics-simlab && sudo ./stop.sh
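The contents of validate_fix.py are not reproduced in this summary. As a rough stand-in for steps 2 and 3, a text check like the following confirms that a generated PLC file defines the wrapper and no longer dispatches callbacks directly. check_generated_plc and check_file are hypothetical names, and the checks are plain pattern matches with the standard re module, not a parse of the file.

```python
import re
from pathlib import Path
from typing import List


def check_generated_plc(source: str) -> List[str]:
    """Return the problems found in generated PLC logic source (an empty list means OK)."""
    problems: List[str] = []
    if "def _safe_callback(" not in source:
        problems.append("missing _safe_callback definition")
    if not re.search(r"^import time$", source, re.MULTILINE):
        problems.append("missing 'import time'")
    # A bare cbs[key]() call means the old, crash-prone dispatch is still emitted.
    if re.search(r"cbs\[key\]\(\)", source):
        problems.append("unwrapped cbs[key]() call still present")
    return problems


def check_file(path: str) -> List[str]:
    """Run the same checks on a file, e.g. outputs/scenario_run/logic/plc2.py."""
    return check_generated_plc(Path(path).read_text())


old_src = (
    "from typing import Any, Callable, Dict\n\n"
    "def _write(out_regs, cbs, key, value):\n"
    "    if key in cbs:\n"
    "        cbs[key]()\n"
)
new_src = (
    "import time\n\n"
    "def _safe_callback(cb, retries=30, delay=0.2):\n"
    "    pass\n\n"
    "def _write(out_regs, cbs, key, value):\n"
    "    if key in cbs:\n"
    "        _safe_callback(cbs[key])\n"
)
print(check_generated_plc(new_src))       # []
print(len(check_generated_plc(old_src)))  # 3
```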
================================================================================
EXPECTED BEHAVIOR
================================================================================
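BEFORE FIX:
  PLC2 container crashes immediately with:
    Exception in thread Thread-1:
    ConnectionRefusedError: [Errno 111] Connection refused
AFTER FIX (success):
  PLC2 container starts
  Callbacks are retried silently for up to ~6 seconds while PLC1 starts
  Once PLC1 accepts connections, the callback succeeds
  No crash, no uncaught exception
AFTER FIX (PLC1 never starts):
  PLC2 container starts
  After ~6 seconds (30 attempts), PLC2 prints:
    WARNING: Callback failed after 30 attempts: [Errno 111] Connection refused
  Container keeps running (no crash)
  Each failing write blocks the calling thread for up to ~6 seconds while it retries
  A new retry cycle starts only on the next write that changes the register
  value; _write returns early when the value is unchanged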
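The success case can be simulated without containers: a callback that refuses connections until a startup deadline passes stands in for PLC1 finishing its boot. This is a sketch only; FakePLC1 and its 0.5s startup time are hypothetical (not a measured PLC1 startup), and the delay is reduced to 0.05s so it runs quickly. The wrapper logic is the same as the generated _safe_callback.

```python
import time
from typing import Callable


def _safe_callback(cb: Callable[[], None], retries: int = 30, delay: float = 0.2) -> None:
    """Same logic as the generated helper."""
    for attempt in range(retries):
        try:
            cb()
            return
        except Exception as e:
            if attempt == retries - 1:
                print(f"WARNING: Callback failed after {retries} attempts: {e}")
                return
            time.sleep(delay)


class FakePLC1:
    """Hypothetical stand-in for PLC1: refuses writes until its startup delay has elapsed."""

    def __init__(self, startup_s: float) -> None:
        self.ready_at = time.monotonic() + startup_s
        self.attempts = 0
        self.writes = 0

    def write(self) -> None:
        self.attempts += 1
        if time.monotonic() < self.ready_at:
            raise ConnectionRefusedError(111, "Connection refused")
        self.writes += 1


plc1 = FakePLC1(startup_s=0.5)
# 29 sleeps x 0.05s = at least 1.45s of retry budget, comfortably longer than the 0.5s startup.
_safe_callback(plc1.write, retries=30, delay=0.05)
print(plc1.writes)  # 1
```

If startup_s were raised above the retry budget, the same run would end with the WARNING line and plc1.writes == 0, matching the "PLC1 never starts" case.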
================================================================================
FILES CREATED AND MODIFIED
================================================================================
Modified:
  tools/compile_ir.py      (CRITICAL FIX)
New:
  build_scenario.py        (deterministic builder using the correct venv)
  validate_fix.py          (validation script)
  test_simlab.sh           (interactive launcher)
  diagnose_runtime.sh      (diagnostic script)
  RUNTIME_FIX.md           (complete documentation)
  CHANGES.md               (detailed changes with diffs)
  DELIVERABLES.md          (comprehensive summary)
  QUICKSTART.txt
  FIX_SUMMARY.txt          (this file: exact changes)
================================================================================