AI-Powered Computer Control for Windows

OmniParser Quick Start Guide πŸš€

OmniParser Quick Start Guide πŸš€

TL;DR - Just Use It! ✨

OmniParser is now fully automatic! No server management needed.

You: "Capture and parse the screen"
FlowVision: [Auto-starts server if needed]
FlowVision: [Returns parsed UI elements]

That’s it! 🎊


What Changed

Before ❌

# Terminal 1
cd T:\OmniParser\omnitool\omniparserserver
python -m omniparserserver --som_model_path ...
[Keep this terminal open forever]

# Terminal 2 - FlowVision
[Configure server URL]
[Use FlowVision]

After βœ…

# Just use FlowVision - it handles everything!
[Launch FlowVision]
[Use screen capture]
[Done!]

First Time Setup

One-Time: Install OmniParser

1. Clone OmniParser (if not already installed):

cd T:\
git clone https://github.com/microsoft/OmniParser.git

2. Install Python Dependencies:

cd T:\OmniParser
pip install -r requirements.txt

3. Download Model Weights:

huggingface-cli download microsoft/OmniParser-v2.0 --local-dir weights

Done! βœ“

Verify Installation

Check that these exist:

  • T:\OmniParser\omnitool\omniparserserver\omniparserserver.py βœ“
  • T:\OmniParser\weights\icon_detect\model.pt βœ“
  • T:\OmniParser\weights\icon_caption_florence\ βœ“

Usage

Basic Usage (Most Common)

Just use screen capture - FlowVision does the rest!

You: "Capture the current screen and tell me what you see"

First Time:

  • Takes ~15-20 seconds (server startup + model loading)
  • Shows status: β€œStarting local OmniParser server…”

After First Time:

  • Takes ~2-3 seconds (server already running)
  • Instant processing!

What Happens Automatically

1. You request screen capture
   ↓
2. FlowVision checks: Server running?
   ↓ No
3. FlowVision starts Python server
   ↓
4. Server loads models (~15 seconds)
   ↓
5. FlowVision verifies server ready
   ↓
6. Processes your screenshot
   ↓
7. Returns parsed UI elements
   ↓
8. Server stays running for next time!

Performance

Timing

First Request (Cold Start):

Server startup:  ~15 seconds
Processing:      ~3 seconds
──────────────────────────────
Total:          ~18 seconds

Subsequent Requests:

Processing:      ~3 seconds  βœ“

With GPU (if configured):

Processing:      ~1 second   βœ“βœ“

Making It Faster

Option 1: Keep FlowVision Running

  • Server stays active
  • All captures are fast (~3 seconds)

Option 2: Use GPU

  1. Edit LocalOmniParserManager.cs
  2. Change --device cpu to --device cuda
  3. Rebuild FlowVision
  4. Result: ~1 second per screenshot!

Troubleshooting

β€œOmniParser server not available”

Check installation:

# Verify these exist:
dir T:\OmniParser\omnitool\omniparserserver\omniparserserver.py
dir T:\OmniParser\weights\icon_detect\model.pt
dir T:\OmniParser\weights\icon_caption_florence

Check Python:

python --version  # Should be 3.12+
pip list | findstr torch  # Should show PyTorch

β€œServer starting but timing out”

First time takes longer - model loading can take 20-30 seconds.

  • Wait patiently
  • Check FlowVision logs for β€œOmniParser-Server” messages
  • Verify models downloaded correctly

β€œPort 8080 already in use”

Option 1: Stop other service

netstat -ano | findstr :8080
taskkill /PID <pid> /F

Option 2: Use different port

// In your code
LocalOmniParserManager.Configure(port: 8081);

Get Detailed Diagnostics

string info = LocalOmniParserManager.GetDiagnostics();
Console.WriteLine(info);

Shows:

  • Installation path
  • Server script status
  • Weights folder status
  • Server running status
  • Process status

Advanced Configuration

Custom OmniParser Location

If not at T:\OmniParser:

LocalOmniParserManager.Configure(
    omniParserPath: @"C:\MyPath\OmniParser"
);

Custom Python Executable

If using specific Python:

LocalOmniParserManager.Configure(
    pythonExe: @"C:\Python312\python.exe"
);

Using Conda Environment

If using conda:

LocalOmniParserManager.Configure(
    pythonExe: @"C:\Users\YourName\miniconda3\envs\omni\python.exe"
);

Custom Port

If port conflict:

LocalOmniParserManager.Configure(
    port: 8081
);

API Reference

Check Server Status

bool isRunning = await LocalOmniParserManager.IsServerRunningAsync();
if (isRunning) {
    Console.WriteLine("Server is ready!");
}

Ensure Server Running

bool started = await LocalOmniParserManager.EnsureServerRunningAsync();
if (started) {
    // Server is now running, proceed with capture
}

Stop Server

LocalOmniParserManager.StopServer();
// Server stopped (will auto-restart on next capture)

Get Diagnostics

string diagnostics = LocalOmniParserManager.GetDiagnostics();
Console.WriteLine(diagnostics);

Output:

OmniParser Diagnostics:
  Installation Path: T:\OmniParser
  Server Script: βœ“ Found
  Weights Folder: βœ“ Found
  Python Executable: python
  Server URL: http://127.0.0.1:8080
  Server Running: βœ“ Yes
  Process Active: βœ“ Yes

Tips & Tricks

1. First Use is Slow

  • Normal! Server startup + model loading
  • Takes ~15-20 seconds first time
  • Subsequent captures are fast (~2-3 seconds)

2. Keep Server Running

  • Server stays active between captures
  • No need to restart
  • Much faster for multiple captures

3. Check Logs

  • FlowVision logs show server output
  • Look for β€œOmniParser-Server” entries
  • Helps debug issues

4. GPU Acceleration

  • Edit LocalOmniParserManager.cs
  • Change --device cpu to --device cuda
  • Requires CUDA-capable GPU
  • ~1 second per screenshot!

5. Batch Processing

  • Server handles multiple requests
  • Process many screenshots efficiently
  • No per-request overhead

Common Workflows

Single Screenshot

You: "Capture and analyze the screen"
FlowVision: [Ensures server running]
FlowVision: [Captures and processes]
FlowVision: "I see: [UI elements]"

Multiple Screenshots

You: "Capture the screen"
FlowVision: [First: ~15s, starts server]

You: "Capture again"
FlowVision: [Fast: ~3s, server running]

You: "And again"
FlowVision: [Fast: ~3s, server running]

Different Applications

You: "Capture Chrome window"
FlowVision: [Processes Chrome UI]

You: "Now capture VS Code"
FlowVision: [Processes VS Code UI]

You: "Back to Chrome"
FlowVision: [Processes Chrome UI]

All fast after first one!


Status Messages

You’ll See:

Starting:

"Checking local OmniParser server..."
"Starting local OmniParser server..."
"Server started successfully in 15 seconds"

Running:

"Server already running"
"Processing screenshot..."
"Done!"

Errors:

"Failed to start server: [reason]"
"OmniParser not found at T:\OmniParser"
"Check installation and try again"

FAQ

Q: Do I need to start the server manually?

A: No! FlowVision starts it automatically.

Q: Will it restart if it crashes?

A: Yes! Next capture will restart it automatically.

Q: Can I use it offline?

A: Yes! Everything runs locally.

Q: Does it use GPU?

A: CPU by default. Edit code for GPU.

Q: Can I change the port?

A: Yes! Use LocalOmniParserManager.Configure(port: 8081)

Q: What if OmniParser is elsewhere?

A: Configure path: LocalOmniParserManager.Configure(omniParserPath: "C:\MyPath")

Q: How do I know if it’s working?

A: Check diagnostics: LocalOmniParserManager.GetDiagnostics()


Summary

What You Get ✨

βœ… Automatic - Server starts when needed βœ… Fast - ~3 seconds after first request βœ… Reliable - Auto-restart on failure βœ… Local - No cloud, works offline βœ… Private - All processing on your machine βœ… Simple - No manual management

What You Need πŸ“‹

βœ… OmniParser at T:\OmniParser βœ… Python 3.12+ with dependencies βœ… Model weights downloaded βœ… Port 8080 available (or configure different)

How to Use πŸš€

  1. Use FlowVision screen capture
  2. That’s it! βœ“

It just works! 🎊


Status: βœ… Ready to use Configuration: ❌ None needed (optional customization available)