OpenAGI Launches Lux: A Computer-Use Foundation Model
Lux marks a significant advancement in automated computer use models, achieving top scores on the Online Mind2Web benchmark.
The Evolution of Automated Systems
How do you turn slow, manual click work across browsers and desktops into a reliable, automated system that can use a computer at scale? Lux is the latest example of computer use agents moving from research demo to infrastructure. OpenAGI Foundation has released Lux, a foundation model that operates real desktops and browsers, scoring 83.6 on the Online Mind2Web benchmark, outperforming competitors like Google Gemini CUA at 69.0, OpenAI Operator at 61.3, and Anthropic Claude Sonnet 4 at 61.0.
What Lux Actually Does
Lux is not a chat model with a browser plugin; it transforms natural language goals into low-level actions such as clicks and key presses. It can drive browsers, editors, spreadsheets, email clients, and other desktop applications because it works on rendered UI, not on application-specific APIs.
From a developer’s perspective, Lux is accessible through the OpenAGI SDK and API console, addressing workloads that include software QA, deep research, social media management, online store operations, and bulk data entry. The agent can sequence dozens or hundreds of UI actions while aligning with a natural language task description.
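To make the developer workflow concrete, here is a minimal sketch of what a task submission might look like. The article does not document the OpenAGI SDK's actual schema, so every field name and value below (`model`, `mode`, `goal`, `max_steps`) is an assumption for illustration only:

```python
import json

# Hypothetical task payload; the real OpenAGI API schema is not
# documented in this article, so all field names are assumptions.
task_request = {
    "model": "lux",
    "mode": "actor",  # one of the three modes: actor, thinker, tasker
    "goal": "Export last month's orders from the store dashboard to CSV",
    "max_steps": 200,  # guard against runaway action sequences
}

# Serialize for submission to a hypothetical HTTP endpoint.
payload = json.dumps(task_request)
print(payload)
```

The key design point is that the input is a natural language goal plus control parameters, not application-specific API calls, matching Lux's rendered-UI approach.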
Three Execution Modes For Different Control Levels
Lux offers three execution modes that balance speed, autonomy, and control:
- Actor mode: Fast path aimed at clearly specified tasks, averaging 1 second per step.
- Thinker mode: Handles vague or multi-step goals, breaking down high-level instructions into smaller tasks.
- Tasker mode: Provides maximum determinism with an explicit Python list of steps, making it ideal for production workflows.
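Tasker mode's "explicit Python list of steps" might be structured as below. The real step schema is not shown in the article, so the action names (`open_url`, `click`, `type`, `key_press`) and field layout are illustrative assumptions; the validation helper shows why an explicit list aids determinism, since a malformed workflow can be rejected before any UI action runs:

```python
# Hypothetical Tasker-style step list; action names and fields are
# assumptions, not the documented OpenAGI schema.
steps = [
    {"action": "open_url", "target": "https://example.com/login"},
    {"action": "click", "target": "#email"},
    {"action": "type", "text": "user@example.com"},
    {"action": "click", "target": "#submit"},
]

ALLOWED_ACTIONS = {"open_url", "click", "type", "key_press"}

def validate_steps(step_list):
    """Fail fast on unknown actions so a production workflow
    errors before execution, not midway through a run."""
    for i, step in enumerate(step_list):
        if step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i}: unknown action {step.get('action')!r}")
    return step_list

validate_steps(steps)
```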
Benchmarks, Latency, And Cost
With a success rate of 83.6% on Online Mind2Web, Lux significantly outperforms its competitors. It processes each step in about 1 second, while OpenAI Operator takes around 3 seconds per step and is approximately 10 times more expensive per token. This efficiency is crucial for agents that may run hundreds of steps in a session.
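The per-step latency gap compounds over long sessions. Using only the figures quoted above (about 1 second per step for Lux versus about 3 seconds for OpenAI Operator), a 300-step session works out as follows:

```python
# Back-of-envelope session math from the per-step latencies quoted above.
steps = 300
lux_seconds = steps * 1        # ~1 s per step
operator_seconds = steps * 3   # ~3 s per step

saved_seconds = operator_seconds - lux_seconds
print(lux_seconds, operator_seconds, saved_seconds)  # 300 900 600
```

Ten minutes saved per 300-step session, before accounting for the roughly 10x per-token cost difference the article cites.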
Agentic Active Pre-training and Why OSGym Matters
Lux adopts a method termed Agentic Active Pre-training, which contrasts with traditional language model training that passively ingests data. Lux learns by acting in digital environments, refining its behavior through interactions rather than only minimizing prediction loss. This is supported by OSGym, an open-source data engine that runs many environments in parallel, generating the interaction data needed for robust agent training.
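The idea of many environments producing interaction traces in parallel can be sketched with a toy rollout loop. This is not OSGym's actual API (which the article does not detail); `ToyEnv`, `reset`, and `step` are stand-ins that only illustrate the pattern of parallel trajectory collection:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for an OSGym-style environment; the real OSGym
# interface is not shown in the article, so reset/step are assumptions.
class ToyEnv:
    def __init__(self, env_id):
        self.env_id = env_id
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return {"env": self.env_id, "screen": "initial"}

    def step(self, action):
        self.step_count += 1
        return {"env": self.env_id, "action": action, "t": self.step_count}

def rollout(env, n_steps=5):
    """Collect one short trajectory; in agentic pre-training these
    interaction traces, not static text, are the training signal."""
    env.reset()
    return [env.step("noop") for _ in range(n_steps)]

# Run several environments concurrently, mimicking a parallel data engine.
envs = [ToyEnv(i) for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    trajectories = list(pool.map(rollout, envs))
print(len(trajectories))  # 8
```

The design choice being illustrated: training data comes from acting, so throughput depends on how many environments can step concurrently, which is why a parallel data engine matters.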
Key Takeaways
- Lux operates full desktops and browsers, achieving top performance on the Online Mind2Web benchmark.
- It features three modes: Actor, Thinker, and Tasker, for versatile workflow handling.
- The model operates at around 1 second per step, offering significant cost savings over competitors.
- Training via Agentic Active Pre-training focuses on action understanding over static text consumption.
- OSGym enables extensive training capabilities for developing effective computer use agents.