CoT Suppression in Spatial UI Tasks

[YOUR VOICE] The Claim

Chain-of-thought prompting is the default recommendation for complex LLM tasks. But on spatial UI tasks — clicking a specific button, reading a specific label, enumerating visible elements — accuracy degrades when the model is asked to reason step by step. The intermediate reasoning introduces spatial hallucinations: the model talks itself into elements and coordinates that aren't on screen.
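To make the two conditions concrete, here is a minimal sketch of the prompt pair being compared. The wording and the `build_prompt` helper are illustrative assumptions, not the actual prompts used in the Leith experiments:

```python
# Hypothetical sketch of the two prompt conditions. The exact wording used in
# the Leith experiments is not reproduced here; these strings are assumptions
# for illustration only.

def build_prompt(task: str, suppress_cot: bool) -> str:
    """Build a UI-task prompt with or without chain-of-thought."""
    if suppress_cot:
        # Direct-answer condition: forbid intermediate reasoning.
        suffix = "Answer with the coordinates only, as 'x,y'. Do not explain."
    else:
        # CoT condition: ask the model to reason about the screen first.
        suffix = "Think step by step about the layout, then give the coordinates."
    return f"{task}\n{suffix}"

task = "Click the 'Submit' button in the screenshot."
cot_prompt = build_prompt(task, suppress_cot=False)
direct_prompt = build_prompt(task, suppress_cot=True)
```

The claim is that, on tasks like this, the second prompt outperforms the first precisely because it removes the reasoning step where spatial hallucinations appear.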


The Mechanism

MISSING — Experimental setup: same UI tasks with and without CoT prompting across multiple VLMs

MISSING — Specific failure patterns observed (coordinate drift during reasoning, element hallucination in enumeration, spatial confusion in multi-step CoT)

MISSING — The suppression technique used in Leith and its effect on accuracy


The Evidence

MISSING — Comparative accuracy table: CoT-enabled vs CoT-suppressed across task types

MISSING — Example failure cases showing spatial hallucination during CoT


[YOUR VOICE] Implications

MISSING — When to use CoT and when to suppress it. The broader lesson about prompt engineering for spatial tasks.


Open Questions

  • Is this a VLM architecture limitation or a training data gap?
  • Do models fine-tuned on spatial tasks still exhibit this problem?
  • What's the minimum reasoning the model needs to complete multi-step UI tasks without CoT?

Reference Documents

| Document | What it covers |
| --- | --- |
| Leith _docs/ | MISSING — CoT suppression implementation and results |
| Prompt engineering experiments | MISSING — Full experimental protocol |