OmniParser Quick Start Guide π
TL;DR - Just Use It! β¨
OmniParser is now fully automatic! No server management needed.
You: "Capture and parse the screen"
FlowVision: [Auto-starts server if needed]
FlowVision: [Returns parsed UI elements]
Thatβs it! π
What Changed
Before β
# Terminal 1
cd T:\OmniParser\omnitool\omniparserserver
python -m omniparserserver --som_model_path ...
[Keep this terminal open forever]
# Terminal 2 - FlowVision
[Configure server URL]
[Use FlowVision]
After β
# Just use FlowVision - it handles everything!
[Launch FlowVision]
[Use screen capture]
[Done!]
First Time Setup
One-Time: Install OmniParser
1. Clone OmniParser (if not already installed):
cd T:\
git clone https://github.com/microsoft/OmniParser.git
2. Install Python Dependencies:
cd T:\OmniParser
pip install -r requirements.txt
3. Download Model Weights:
huggingface-cli download microsoft/OmniParser-v2.0 --local-dir weights
Done! β
Verify Installation
Check that these exist:
T:\OmniParser\omnitool\omniparserserver\omniparserserver.pyβT:\OmniParser\weights\icon_detect\model.ptβT:\OmniParser\weights\icon_caption_florence\β
Usage
Basic Usage (Most Common)
Just use screen capture - FlowVision does the rest!
You: "Capture the current screen and tell me what you see"
First Time:
- Takes ~15-20 seconds (server startup + model loading)
- Shows status: βStarting local OmniParser serverβ¦β
After First Time:
- Takes ~2-3 seconds (server already running)
- Instant processing!
What Happens Automatically
1. You request screen capture
β
2. FlowVision checks: Server running?
β No
3. FlowVision starts Python server
β
4. Server loads models (~15 seconds)
β
5. FlowVision verifies server ready
β
6. Processes your screenshot
β
7. Returns parsed UI elements
β
8. Server stays running for next time!
Performance
Timing
First Request (Cold Start):
Server startup: ~15 seconds
Processing: ~3 seconds
ββββββββββββββββββββββββββββββ
Total: ~18 seconds
Subsequent Requests:
Processing: ~3 seconds β
With GPU (if configured):
Processing: ~1 second ββ
Making It Faster
Option 1: Keep FlowVision Running
- Server stays active
- All captures are fast (~3 seconds)
Option 2: Use GPU
- Edit
LocalOmniParserManager.cs - Change
--device cputo--device cuda - Rebuild FlowVision
- Result: ~1 second per screenshot!
Troubleshooting
βOmniParser server not availableβ
Check installation:
# Verify these exist:
dir T:\OmniParser\omnitool\omniparserserver\omniparserserver.py
dir T:\OmniParser\weights\icon_detect\model.pt
dir T:\OmniParser\weights\icon_caption_florence
Check Python:
python --version # Should be 3.12+
pip list | findstr torch # Should show PyTorch
βServer starting but timing outβ
First time takes longer - model loading can take 20-30 seconds.
- Wait patiently
- Check FlowVision logs for βOmniParser-Serverβ messages
- Verify models downloaded correctly
βPort 8080 already in useβ
Option 1: Stop other service
netstat -ano | findstr :8080
taskkill /PID <pid> /F
Option 2: Use different port
// In your code
LocalOmniParserManager.Configure(port: 8081);
Get Detailed Diagnostics
string info = LocalOmniParserManager.GetDiagnostics();
Console.WriteLine(info);
Shows:
- Installation path
- Server script status
- Weights folder status
- Server running status
- Process status
Advanced Configuration
Custom OmniParser Location
If not at T:\OmniParser:
LocalOmniParserManager.Configure(
omniParserPath: @"C:\MyPath\OmniParser"
);
Custom Python Executable
If using specific Python:
LocalOmniParserManager.Configure(
pythonExe: @"C:\Python312\python.exe"
);
Using Conda Environment
If using conda:
LocalOmniParserManager.Configure(
pythonExe: @"C:\Users\YourName\miniconda3\envs\omni\python.exe"
);
Custom Port
If port conflict:
LocalOmniParserManager.Configure(
port: 8081
);
API Reference
Check Server Status
bool isRunning = await LocalOmniParserManager.IsServerRunningAsync();
if (isRunning) {
Console.WriteLine("Server is ready!");
}
Ensure Server Running
bool started = await LocalOmniParserManager.EnsureServerRunningAsync();
if (started) {
// Server is now running, proceed with capture
}
Stop Server
LocalOmniParserManager.StopServer();
// Server stopped (will auto-restart on next capture)
Get Diagnostics
string diagnostics = LocalOmniParserManager.GetDiagnostics();
Console.WriteLine(diagnostics);
Output:
OmniParser Diagnostics:
Installation Path: T:\OmniParser
Server Script: β Found
Weights Folder: β Found
Python Executable: python
Server URL: http://127.0.0.1:8080
Server Running: β Yes
Process Active: β Yes
Tips & Tricks
1. First Use is Slow
- Normal! Server startup + model loading
- Takes ~15-20 seconds first time
- Subsequent captures are fast (~2-3 seconds)
2. Keep Server Running
- Server stays active between captures
- No need to restart
- Much faster for multiple captures
3. Check Logs
- FlowVision logs show server output
- Look for βOmniParser-Serverβ entries
- Helps debug issues
4. GPU Acceleration
- Edit LocalOmniParserManager.cs
- Change
--device cputo--device cuda - Requires CUDA-capable GPU
- ~1 second per screenshot!
5. Batch Processing
- Server handles multiple requests
- Process many screenshots efficiently
- No per-request overhead
Common Workflows
Single Screenshot
You: "Capture and analyze the screen"
FlowVision: [Ensures server running]
FlowVision: [Captures and processes]
FlowVision: "I see: [UI elements]"
Multiple Screenshots
You: "Capture the screen"
FlowVision: [First: ~15s, starts server]
You: "Capture again"
FlowVision: [Fast: ~3s, server running]
You: "And again"
FlowVision: [Fast: ~3s, server running]
Different Applications
You: "Capture Chrome window"
FlowVision: [Processes Chrome UI]
You: "Now capture VS Code"
FlowVision: [Processes VS Code UI]
You: "Back to Chrome"
FlowVision: [Processes Chrome UI]
All fast after first one!
Status Messages
Youβll See:
Starting:
"Checking local OmniParser server..."
"Starting local OmniParser server..."
"Server started successfully in 15 seconds"
Running:
"Server already running"
"Processing screenshot..."
"Done!"
Errors:
"Failed to start server: [reason]"
"OmniParser not found at T:\OmniParser"
"Check installation and try again"
FAQ
Q: Do I need to start the server manually?
A: No! FlowVision starts it automatically.
Q: Will it restart if it crashes?
A: Yes! Next capture will restart it automatically.
Q: Can I use it offline?
A: Yes! Everything runs locally.
Q: Does it use GPU?
A: CPU by default. Edit code for GPU.
Q: Can I change the port?
A: Yes! Use LocalOmniParserManager.Configure(port: 8081)
Q: What if OmniParser is elsewhere?
A: Configure path: LocalOmniParserManager.Configure(omniParserPath: "C:\MyPath")
Q: How do I know if itβs working?
A: Check diagnostics: LocalOmniParserManager.GetDiagnostics()
Summary
What You Get β¨
β Automatic - Server starts when needed β Fast - ~3 seconds after first request β Reliable - Auto-restart on failure β Local - No cloud, works offline β Private - All processing on your machine β Simple - No manual management
What You Need π
β
OmniParser at T:\OmniParser
β
Python 3.12+ with dependencies
β
Model weights downloaded
β
Port 8080 available (or configure different)
How to Use π
- Use FlowVision screen capture
- Thatβs it! β
It just works! π
Status: β Ready to use Configuration: β None needed (optional customization available)