[YOUR VOICE] The Claim
Chain-of-thought prompting is the default recommendation for complex LLM tasks. But spatial UI tasks (clicking a specific button, reading a specific label, enumerating visible elements) degrade when the model is asked to reason step by step: the reasoning itself introduces spatial hallucinations.
The Mechanism
MISSING: Experimental setup — same UI tasks with and without CoT prompting across multiple VLMs
MISSING: Specific failure patterns observed (coordinate drift during reasoning, element hallucination in enumeration, spatial confusion in multi-step CoT)
MISSING: The suppression technique used in Leith and its effect on accuracy
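While the Leith implementation details remain to be filled in above, the general shape of CoT suppression for spatial tasks can be sketched. The prompt wording, JSON coordinate schema, and `build_prompt` helper below are all hypothetical illustrations, not Leith's actual prompts:

```python
# Hypothetical sketch of CoT suppression for a spatial UI task.
# The exact wording and output schema are illustrative assumptions,
# not the prompts used in Leith.

def build_prompt(task: str, suppress_cot: bool) -> str:
    """Build a VLM prompt for a UI click task.

    With suppress_cot=True, the model is asked for a bare structured
    answer and explicitly told not to reason or describe the screen;
    with False, it is invited to think step by step first.
    """
    if suppress_cot:
        return (
            f"{task}\n"
            'Respond with ONLY a JSON object {"x": int, "y": int}. '
            "Do not explain your answer or describe the screen."
        )
    return (
        f"{task}\n"
        "Think step by step about the screen layout, then give the "
        'target coordinates as JSON {"x": int, "y": int}.'
    )


direct = build_prompt("Click the 'Save' button in the screenshot.", suppress_cot=True)
cot = build_prompt("Click the 'Save' button in the screenshot.", suppress_cot=False)
assert "ONLY" in direct and "step by step" in cot
```

The key design point is that suppression is not just omitting "think step by step" — it actively forbids free-form text, so the model has no channel in which to drift away from the pixels it was shown.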
The Evidence
MISSING: Comparative accuracy table — CoT-enabled vs CoT-suppressed across task types
MISSING: Example failure cases showing spatial hallucination during CoT
[YOUR VOICE] Implications
MISSING: When to use CoT and when to suppress it. The broader lesson about prompt engineering for spatial tasks.
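Assuming the eventual conclusion follows the claim in the opening section, the decision rule can be sketched as a simple dispatch: suppress CoT for perceptual/spatial operations, keep it for multi-step planning. The task-type names below are hypothetical placeholders:

```python
# Hypothetical heuristic encoding the article's claim: CoT degrades
# spatial/perceptual UI tasks but remains useful for planning.
# Task-type labels are illustrative, not from a real taxonomy.
SPATIAL_TASK_TYPES = {"click", "read_label", "enumerate_elements"}


def use_cot(task_type: str) -> bool:
    """Return True if step-by-step reasoning should be requested."""
    return task_type not in SPATIAL_TASK_TYPES


assert not use_cot("click")          # spatial: suppress CoT
assert use_cot("plan_workflow")      # planning: keep CoT
```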
Open Questions
- Is this a VLM architecture limitation or a training data gap?
- Do models fine-tuned on spatial tasks still exhibit this problem?
- What's the minimum reasoning the model needs to complete multi-step UI tasks without CoT?
Reference Documents
| Document | What it covers |
|---|---|
| Leith _docs/ | MISSING: CoT suppression implementation and results |
| Prompt engineering experiments | MISSING: Full experimental protocol |