Modern LLM agents often fail at simple monitoring tasks because they cannot manage how often to check in or how much context to carry between checks. Microsoft Research addresses this with SentinelStep, a mechanism that enables AI agents to complete long-running monitoring tasks. SentinelStep wraps the agent in a workflow with dynamic polling and careful context management, allowing agents to monitor conditions for hours or days.

How SentinelStep Works

The core challenge is choosing the polling frequency: polling too often wastes resources, while polling too rarely delays the response. SentinelStep makes an educated guess at the polling interval based on the task, then adjusts it dynamically based on observed behavior. A second challenge, context overflow, is handled by saving the agent state after the first check and reusing that saved state for each subsequent check, so the context does not grow with every poll.

Core Components

SentinelStep consists of three components: a set of actions that collect information, a condition that determines when the task is complete, and a polling interval. Given these components, the system behaves as follows: every [polling interval] do [actions] until [condition] is satisfied. All three components are defined and exposed in the co-planning interface of Magentic-UI, where users can accept the pre-filled plan or adjust it as needed.

Evaluation

SentinelBench, a suite of synthetic web environments for evaluating monitoring tasks, allows for repeatable experiments. Its scenarios include GitHub Watcher, Teams Monitor, and Flight Monitor. Initial tests show that SentinelStep improves reliability on longer tasks, with task reliability rising from 5.6% to 33.3% at 1 hour and to 38.9% at 2 hours.

Impact and Availability

SentinelStep is open source as part of Magentic-UI, which is available on GitHub or via pip install magentic-ui. Intended use, privacy considerations, and safety guidelines are described in the Magentic-UI Transparency Note.
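
The "every [polling interval] do [actions] until [condition] is satisfied" loop from Core Components can be sketched in a few lines of Python. This is a minimal illustration, not Magentic-UI's actual API: the names MonitorStep and run_monitor are hypothetical, and the double-or-halve interval rule is an assumed heuristic, since the description above says only that the interval is adjusted dynamically. Context management is modeled by snapshotting the context after the first check and starting every later check from a copy of that snapshot.

```python
import copy
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class MonitorStep:
    """Hypothetical container for the three components described above."""
    actions: Callable[[List[str]], Any]   # collects information; may append to the context
    condition: Callable[[Any], bool]      # True once the monitoring task is complete
    polling_interval: float               # initial guess, in seconds
    min_interval: float = 1.0
    max_interval: float = 3600.0


def run_monitor(
    step: MonitorStep,
    context: List[str],
    sleep: Callable[[float], None] = time.sleep,
    max_checks: int = 100,
) -> Tuple[Optional[Any], int]:
    """Poll until the condition holds; return (observation, number_of_checks)."""
    interval = step.polling_interval
    snapshot: Optional[List[str]] = None
    previous: Any = None

    for check in range(1, max_checks + 1):
        # Every check after the first starts from the saved snapshot,
        # so the working context does not grow from one poll to the next.
        working = context if snapshot is None else copy.deepcopy(snapshot)
        observation = step.actions(working)
        if snapshot is None:
            snapshot = copy.deepcopy(working)

        if step.condition(observation):
            return observation, check

        # Assumed heuristic: back off while nothing changes,
        # poll more often once the observation starts moving.
        if check > 1:
            if observation == previous:
                interval = min(interval * 2, step.max_interval)
            else:
                interval = max(interval / 2, step.min_interval)
        previous = observation
        sleep(interval)

    return None, max_checks  # gave up before the condition was met


if __name__ == "__main__":
    # Simulated "star count" readings stand in for a real web check.
    readings = iter([0, 0, 0, 5, 10])

    def check_stars(ctx: List[str]) -> int:
        value = next(readings)
        ctx.append(f"observed {value}")
        return value

    step = MonitorStep(actions=check_stars,
                       condition=lambda stars: stars >= 10,
                       polling_interval=10.0)
    waits: List[float] = []
    result = run_monitor(step, ["task: watch repo"], sleep=waits.append)
    print(result, waits)
```

Passing sleep as a parameter lets the loop be exercised without actually waiting, which is convenient for the kind of repeatable experiments that SentinelBench targets. A real agent would replace check_stars with browser or API actions and would likely let the model, rather than a fixed rule, propose the interval adjustments.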