Posts tagged with "benchmarking"

Benchmarking UI Detection on ScreenSpot-Pro

How we evaluated uitag against 1,581 annotations across 26 professional macOS applications — methodology, results, and what the numbers actually mean.

GUI-Specialized Apple Silicon VLM Matrix

Which vision-language models actually work for UI tasks on M-series chips — tested configurations, latency numbers, and the models worth your time.

Apple Silicon VLM Benchmark Roundup

A short public write-up of what we tested, what we found, and what you should run if you're doing local multimodal inference on M-series hardware.