Posts tagged with "benchmarking"
Benchmarking UI Detection on ScreenSpot-Pro
How we evaluated uitag against 1,581 annotations across 26 professional macOS applications — methodology, results, and what the numbers actually mean.
GUI-Specialized Apple Silicon VLM Matrix
Which vision-language models actually work for UI tasks on M-series chips — tested configurations, latency numbers, and the models worth your time.
Apple Silicon VLM Benchmark Roundup
A short public narrative covering what we tested, what we found, and what you should run if you're doing local multimodal inference on M-series hardware.