Observee

VLMs and VLAs for Edge Computing
Version v0.0.2
Company: Observee Inc.
Email: contact@observee.com

Backed by Y Combinator

About Observee

Observee specializes in building and deploying Vision Language Models (VLMs) and Vision Language Action Models (VLAs) for edge computing environments. Our pipeline performs real-time frame selection and video processing to reduce latency and cost, enabling edge devices to operate as autonomous AI agents.
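As a minimal, hypothetical illustration of automated frame selection (not Observee's actual pipeline), the sketch below uses OpenCV to score consecutive frames by mean pixel difference and forwards only frames that change enough to be worth processing; `STREAM_URL` and `DIFF_THRESHOLD` are assumed values.

```python
# Illustrative frame-selection sketch (not Observee's pipeline): keep a
# frame only when it differs enough from the last kept frame, so fewer
# frames reach the downstream model.
import cv2

STREAM_URL = 0          # assumed source: webcam index or a stream URL
DIFF_THRESHOLD = 12.0   # assumed tuning knob: mean pixel difference

def select_frames(source=STREAM_URL, threshold=DIFF_THRESHOLD):
    cap = cv2.VideoCapture(source)
    last_kept = None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Forward the frame only if the scene changed noticeably.
            if last_kept is None or cv2.absdiff(gray, last_kept).mean() > threshold:
                last_kept = gray
                yield frame
    finally:
        cap.release()
```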

Each deployed model supports reinforcement learning loops through periodic fine-tuning and policy adaptation from environmental feedback, enabling continuous performance improvement over time (currently in Alpha).
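A hedged sketch of that loop pattern follows; every name here (`FeedbackLoop`, `model.update`, the buffer and interval sizes) is invented for illustration and is not taken from the Alpha implementation.

```python
# Illustrative feedback loop (hypothetical, not the Alpha implementation):
# collect environmental feedback per prediction, then periodically adapt
# the deployed model with the accumulated samples.
from collections import deque

class FeedbackLoop:
    def __init__(self, model, fine_tune_every=256):
        self.model = model
        self.buffer = deque(maxlen=4096)   # assumed buffer capacity
        self.fine_tune_every = fine_tune_every
        self.seen = 0

    def record(self, observation, action, reward):
        """Store one (observation, action, reward) sample from the environment."""
        self.buffer.append((observation, action, reward))
        self.seen += 1
        if self.seen % self.fine_tune_every == 0:
            self.fine_tune()

    def fine_tune(self):
        """Adapt the model on buffered feedback (e.g., a policy-gradient
        or supervised fine-tuning step)."""
        batch = list(self.buffer)
        self.model.update(batch)  # hypothetical model interface
```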

We offer models for edge devices with differing hardware and software constraints, along with hybrid online/offline inference capabilities.
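One common way to realize hybrid online/offline inference is a cloud-first call with an on-device fallback; the sketch below assumes a hypothetical `API_URL` endpoint and a `local_model` object with a `predict` method.

```python
# Hybrid online/offline inference sketch (API_URL and the local_model
# interface are assumptions for illustration).
import requests

API_URL = "https://api.example.com/v1/infer"  # hypothetical cloud endpoint

def infer(frame_bytes, local_model, timeout=2.0):
    """Prefer cloud inference; fall back to the on-device model when
    the network is unavailable or slow."""
    try:
        resp = requests.post(API_URL, data=frame_bytes, timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        # Offline path: run the edge-resident model instead.
        return local_model.predict(frame_bytes)
```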

VLM SDK

Our VLM SDK is available in beta. Please contact us to get access.

It supports RTSP, UDP, WebRTC, ONVIF, and HTTP video streams. Automated frame selection reduces costs, real-time video processing handles general-purpose vision tasks, and tool use enables complex tasks.
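To illustrate the tool-use pattern generically (this is not the beta SDK's API), the sketch below routes a model-emitted JSON tool call to a registered Python function; the registry, the `zoom_camera` tool, and the call format are all assumptions.

```python
# Generic tool-use dispatch sketch (not the SDK's API): the model emits a
# tool call as JSON, and the runtime routes it to a registered function.
import json

TOOLS = {}

def tool(fn):
    """Register a function so the model can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def zoom_camera(level: float) -> str:
    # Hypothetical example tool; a real deployment would drive hardware.
    return f"zoomed to {level}x"

def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted call like
    {"name": "zoom_camera", "arguments": {"level": 2.0}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call.get("arguments", {}))

print(dispatch('{"name": "zoom_camera", "arguments": {"level": 2.0}}'))
```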

Key Capabilities

VLA in Action: Drone Demo

Watch our first VLA model in action, demonstrating autonomous drone navigation and control:

Observee VLA Model - Autonomous Drone Navigation