One-click SLS Integration Center setup ingests OpenClaw logs (session audits + app logs) and delivers ready-to-use dashboards for security, cost, and ops monitoring.
You can use the Alibaba Cloud Simple Log Service (SLS) Integration Center to integrate OpenClaw AI Agent logs with one click. Combined with the built-in audit and observation dashboards, this provides an out-of-the-box closed loop for security audit and O&M observation.
1.OpenClaw Security Risks: Why Controlled Operation is Crucial
OpenClaw is one of the most closely followed open-source AI agent platforms of 2026. It allows large language models (LLMs) to directly operate file systems, execute shell commands, browse web pages, and send and receive messages, converting the model's inference capabilities into real system operations. This "autonomous execution" capability is its core value, and also its core threat.

1.1 Industry Security Incidents: Threats Are Not Assumptions, but Facts
In early 2026, multiple security vendors collectively disclosed a batch of OpenClaw-related vulnerabilities and incidents. The data is shocking:

A particularly illustrative case involves Summer Yue, the AI Alignment Director at Meta Super Intelligent Lab, a security expert with sharper instincts than the vast majority of users. She instructed OpenClaw to clean up her emails and explicitly required that no operation be taken without approval. However, as large volumes of data were processed, the LLM's context-window compression mechanism caused this critical safety instruction to be "forgotten," and a large number of emails were permanently deleted. Shouting "STOP" three times and running to unplug the network cable came too late.
1.2 Codebase Audit Findings: OpenClaw's Own Security Fix Frequency
Industry reports explain the external threat posture. An audit of the OpenClaw code repository reveals another dimension: the project itself fixes security issues at high frequency. By applying security-semantics analysis to the Git history and commit messages, you can quantify the scale and distribution of security-related code changes over a period of time and determine which layers the attack surface is concentrated in.
Filtering and categorizing the 14,254 commits made to OpenClaw in the most recent 60 days (2026-01-05 to 2026-03-05) yields 147 explicit security fixes, an average of about 2.45 per day, with the following threat-level distribution:
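The filtering step described above can be sketched as keyword classification over commit messages. The severity keyword sets below are illustrative assumptions only; the actual taxonomy behind the audit figures is not published.

```python
import re

# Illustrative severity keywords; treat these patterns as placeholders,
# not the real classification rules used in the audit.
SEVERITY_PATTERNS = [
    ("Critical", re.compile(r"\b(rce|remote code execution|auth bypass)\b", re.I)),
    ("High", re.compile(r"\b(injection|privilege escalation|sandbox escape)\b", re.I)),
    ("Medium", re.compile(r"\b(path traversal|denial of service|dos)\b", re.I)),
]

def classify_commit(message):
    """Return the first (highest) severity whose keywords match, else None."""
    for level, pattern in SEVERITY_PATTERNS:
        if pattern.search(message):
            return level
    return None

def severity_histogram(messages):
    """Count security-related commits per severity level."""
    counts = {}
    for msg in messages:
        level = classify_commit(msg)
        if level is not None:
            counts[level] = counts.get(level, 0) + 1
    return counts
```

Running such a histogram over the 60-day window is what produces the threat-level distribution shown below.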

Critical and High together total 50, about 34% of the explicit security fixes, indicating that high-severity issues are continuously being discovered and fixed within the observation window. By code-module distribution, threats are highly concentrated in the entry and execution layers:

The tools/ and gateway/ modules account for 61%, corresponding to the Agent's two main fronts: "who invokes" and "what can be executed."
In summary, these data points explain two things:
First, OpenClaw continuously invests in security fixes at the code level and responds promptly. Most security-related commits carry an identifiable threat type in the message, which facilitates tracing and review. This indicates the project already has sound practices around runtime security.
Second, the attack surface of an AI agent is naturally broad: the tool execution layer (tools/) and the gateway layer (gateway/) are precisely the price of "autonomous operation" and "multi-entry access." Static code audits can only cover submitted changes; they cannot exhaust runtime behavior variations, configuration combinations, or attack paths driven by external inputs.
1.3 Why Relying Only on Runtime Protection Is Not Enough
OpenClaw provides multiple lines of preventive control in its architecture: the tool policy pipeline makes policy decisions before invocation, owner-only encapsulation attaches permissions to sensitive operations, the loop detector flags sessions making no progress, and the command allowlist/denylist limits the set of executable commands. Under normal configuration, these mechanisms effectively reduce the attack surface. From a security engineering perspective, however, they are all execution-time validation within the same trust domain and carry the following inherent limitations:

Runtime protection is therefore like a "city wall": it blocks most known attack paths, but it cannot guarantee that the configuration never errs, nor cover unknown bypasses or logical misuse. A sound security architecture needs a complementary "sentry" that performs continuous observability and audit over the Agent's callers, token consumption, tool invocation sequences, and results.
2.The Three Pillars of Observability and the SLS Solution
Observability plays the role of that sentry: it uses logs, metrics, and traces to continuously observe Agent behavior, supports audit tracking and usage compliance, and leverages anomaly detection to answer "who is invoking, how much is being spent, and what exactly was done." This lets you discover issues early when policies fail or new types of attacks appear, and respond before the impact spreads.
2.1 Mapping of the Three Pillars in the AI Agent Scenario
Observability is built on the three pillars of Logs + Metrics + Traces. In the OpenClaw scenario, the correspondence between the three pillars and data sources, as well as the core questions each answers, are as follows:

The three pillars are indispensable. With only metrics, you cannot answer who, and why, caused costs to soar. With only session logs, you cannot perceive system health and abnormal inflection points from a global perspective. With only application runtime logs, you cannot see the Agent's business behavior and tool-calling sequence. Together, the three simultaneously support security audits, cost control, and O&M troubleshooting.
2.2 Why Choose SLS: Capabilities and Advantages
SLS is a mature foundation for observability. In the OpenClaw scenario, it has the following natural advantages:
● Powerful Data Integration capabilities, natively aligned with the OpenClaw technology stack
LoongCollector provides powerful OneAgent collection and natively supports both logs and the OpenTelemetry Protocol (OTLP). Because Agent session logs often carry model interaction contexts, they tend to be long, and LoongCollector collects long-text logs with high performance. It integrates with OpenClaw's built-in diagnostics-otel plugin with zero code modification, and metrics and traces are written directly to SLS via OTLP.
● Rich query and analysis and processing operators
Session logs use nested JSON (fields such as message.content, message.usage.cost, and message.toolName). SLS provides SQL + Structured Process Language (SPL) compute engines with rich parsing, filtering, and aggregation operators. You can create indexes on nested fields and run real-time analysis without additional extract, transform, load (ETL) processing.
● Security and compliance capabilities
RAM-based permission control, sensitive-data masking, and encrypted storage meet audit-tracking and compliance requirements. SLS holds the Network Security Dedicated Product Security Detection/Certification certificate (formerly the Sales License for Computer Information System Security Dedicated Products), making it suitable as an observability and audit foundation in classified-protection and industry compliance scenarios. Alerting channels support DingTalk, text messages, and email, facilitating timely response to security events and cost or anomaly alerts.
● Fully managed, pay-as-you-go, and auto-scaling
One-stop log analysis covers collection → storage → indexing → query → dashboards → alerting. Logstores and MetricStores are fully managed. For small Agent deployments, log volume is low and pay-as-you-go billing keeps costs down. When traffic increases, the service scales automatically, so you do not need to reserve capacity, perform manual scale-out, or run your own Elasticsearch or Prometheus.
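For intuition about the nested-field analysis described in the query-and-analysis point above, the dotted-path access that the engines perform on fields such as message.usage.cost can be mimicked in a few lines. This is a minimal Python sketch with an invented log line, not the SLS implementation:

```python
import json

def get_path(record, path, default=None):
    """Walk a dotted path such as 'message.usage.cost' through nested dicts."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# A session log line shaped like the fields named above (values invented).
line = '{"message": {"toolName": "exec", "usage": {"cost": 0.0042}}}'
record = json.loads(line)
cost = get_path(record, "message.usage.cost")
tool = get_path(record, "message.toolName")
```

In SLS itself, the same extraction happens at query time via indexes on the nested fields, with no preprocessing step.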
In short, SLS integrates OpenClaw observability data and supports scenarios such as audit, cost, anomaly detection, security compliance, and O&M, making it well suited as an observability and audit foundation for the controlled operation of OpenClaw.
Therefore, SLS introduces the OpenClaw one-stop access solution:
● You can configure collection paths and parsing methods with the wizard in the Integration Center. Configurations are automatically generated and applied, giving session logs, application logs, and OTLP telemetry a unified entry point and a unified project. One-stop integration significantly reduces the complexity and O&M cost of fragmented data sources.
● A single set of Session data can be used for security audits as well as cost and behavior analytics, meeting the requirements for multi-scenario reuse.
● Preset audit dashboards, cost dashboards, and operation metric dashboards enable an out-of-the-box closed loop for controlled operation observability.
3.Use SLS Integration Center for one-click integration
3.1 Prerequisites
SLS side:
● Activate SLS and create a Project (such as openclaw-observability).
● Ensure that LoongCollector is installed on the ECS instance or server.
3.2 Log Integration (using Session logs as an example)
Session logs are the core data source for security audits. They record every round of conversation, every tool call, and every token consumed.
Integration steps:
1.Create a Logstore and select the integration card.
2.Configure the machine group. We recommend that you use a custom ID-based machine group.
3.Auto-fill the built-in collection configuration.
About text file paths: The file path pre-filled by the one-click integration assumes the default installation path for a non-root user on a Linux host. If this does not match your environment, modify the path.
About log topic types: LoongCollector can automatically extract the topic and session_id from the file path. If you customized the file path so it no longer matches the pre-filled pattern, adjust the configuration accordingly.
About time parsing: By default, OpenClaw outputs log timestamps in UTC+0. If you customized the time zone, modify the time zone in the time-parsing plugin accordingly to avoid time mismatches.
4.Automatically generate built-in indexes and reports.
5.Integration verification and log formats
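The topic/session_id extraction from the file path mentioned in the notes above works roughly like the pattern match below. The directory layout in this sketch is a hypothetical example only; the configuration pre-filled by the console is authoritative for your install.

```python
import re

# Hypothetical layout for illustration; adapt the pattern to the
# actual OpenClaw log path on your host.
SESSION_PATH = re.compile(r"/sessions/(?P<session_id>[\w-]+)\.jsonl$")

def extract_session_id(path):
    """Return the session_id embedded in a log file path, or None."""
    m = SESSION_PATH.search(path)
    return m.group("session_id") if m else None
```

If your files live elsewhere, the same named-group approach applies; only the pattern changes.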
4.One-click Audit and Observability Solution
SLS provides preset dashboards for OpenClaw, covering four dimensions: security audit, cost analysis, behavior analytics, and operation metrics.
4.1 Security Audit Dashboard
The transparency of Agent behavior bears directly on system security and compliance risk, and abnormal behaviors often show signs before causing actual damage. The security audit dashboard is the core dashboard for the controlled operation of OpenClaw. It answers the core questions of what the Agent is doing, whether there are high-risk actions, and who is executing unauthorized operations, expanding across dimensions such as behavior overview, high-risk commands, prompt injection, and data leakage to provide complete capabilities for real-time behavior monitoring, threat detection, and post-event traceability.
Security audit statistics overview
The overview page focuses on counts of multi-dimensional high-risk operations within a specified time window, compressing OpenClaw's security posture into a readable single-screen risk snapshot. Seven metrics, such as high-risk command executions, outbound web requests, outbound command lines, outbound communication tools, sensitive file access, and prompt injections, are rendered side by side. Together with the comparison data, this helps the security team quickly judge whether the current risk level is abnormal without delving into details.
The count of high-risk operations following prompt injection events deserves particular attention. Ordinary high-risk operations may stem from legitimate task needs, whereas high-risk behavior triggered after an injection is a strong threat signal: it means the injected malicious instructions have actually driven the Agent to act. Even allowing for false positives, such signals should trigger the highest level of manual review rather than waiting for further confirmation. The "number of sessions with tool calls after injection" is therefore the highest-confidence threat signal in the entire overview; three such sessions often outrank hundreds of ordinary high-risk commands in priority.
The high-risk session table below aggregates risk counts across dimensions by session and automatically sorts sessions by composite risk score, placing those that most need manual intervention at the top. The security team does not need to screen logs one by one; they can start tracing from the highest-risk session, significantly compressing the window from discovery to response.
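The ranking logic described above can be pictured as a weighted sum per session. The weights below are invented for illustration (the dashboard's actual scoring formula is internal to SLS); post-injection tool calls are weighted heaviest, mirroring the overview's reading of that signal.

```python
# Invented weights for illustration; the dashboard's real formula is internal.
RISK_WEIGHTS = {
    "high_risk_commands": 3,
    "sensitive_file_access": 2,
    "prompt_injections": 5,
    "post_injection_tool_calls": 10,  # highest-confidence signal
}

def risk_score(session):
    """Composite risk: weighted sum of per-dimension counts."""
    return sum(session.get(key, 0) * w for key, w in RISK_WEIGHTS.items())

def rank_sessions(sessions):
    """Highest composite risk first, as in the high-risk session table."""
    return sorted(sessions, key=risk_score, reverse=True)
```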
Skills usage analysis
Skills usage analysis examines OpenClaw's capability boundaries from the perspective of the attack surface. Skills are OpenClaw's native capability-extension mechanism and the main entry point for malicious prompt injection. Users can inadvertently install a Skill that contains a security vulnerability or embedded malicious instructions, handing an attacker a controllable capability entry point. The invocation distribution of Skills is therefore not just usage statistics but an important basis for attack-path analysis.
The usage distribution pie chart helps the security team quickly establish a baseline understanding of Skills invocations: which Skills belong to high-frequency mainstream invocations, and which belong to low-frequency edge invocations. Once the proportion of an uncommon Skill suddenly rises, or a new Skill that has never been seen before appears, it often means that the Agent is being guided to an unexpected capability path, and intervention is needed for troubleshooting.
The newly added Skills table is particularly critical. Newly introduced Skills have not passed sufficient security assessment, and their permission boundaries and behavior patterns remain blind spots for the security team. Sorting in descending order by first invocation time lets you catch newly appearing Skills in the environment immediately and complete a review before they can be abused.
Important command invocation monitoring
One of OpenClaw's defining capabilities is autonomous execution of system commands, which also makes it an ideal springboard for attackers. Once the Agent suffers prompt injection or is controlled by a malicious Skill, the attacker can use the Agent's system access to perform destructive operations such as deleting files, escalating privileges, or exfiltrating data. Because the entire process is initiated as the Agent, it is extremely difficult to distinguish from normal task behavior.
The core value of important-command monitoring lies in establishing an independent observation layer outside runtime protection. OpenClaw's tool permission system enforces controls at the runtime layer, but policy misconfiguration, blurred permission boundaries, or uncovered edge scenarios can all let important commands slip through quietly. Because the observation layer runs independently of the protection mechanism, even if something is missed at runtime, important operations do not go completely undetected.
The timeline view is not just for counting; it helps the security team detect behavior patterns. An isolated important command and a dense burst of invocations within a short time carry completely different threat implications: the latter is a typical signature of an Agent systematically executing malicious instructions after being compromised, and requires immediate intervention. The fact table provides full traceability context, letting the security team move quickly from an abnormal signal to the specific session and original command.
Prompt injection detection
Prompt injection is the core attack method for driving an AI to execute harmful behavior. Whatever the attack path (direct user input, results returned by Skill invocations, or external data read by tools such as web_fetch and read), malicious instructions must ultimately be merged into the prompt to influence the Agent. The prompt is the final convergence point of all attack paths.
The distribution of injection sources helps judge the nature of the threat. Injections typed directly by a user are usually intentional, while injections carried in via toolResult often happen without the user's knowledge. For a personal-assistant Agent such as OpenClaw, indirect injection is the main threat: Skills the user installs or external content the user accesses can become injection carriers that are difficult for the user to detect and avoid.
The value of injection categorization lies in detecting attack intent, not just flagging anomalies. For the same injection event, ROLE_HIJACK and JAILBREAK mean the attacker is attempting to break the Agent's behavior boundaries, while HIDDEN_INSTRUCTION represents a more covert implantation method; the response priority and handling differ by type. Continuous observation of the categorization distribution also helps surface concentrated attempts against specific attack surfaces.
The Fact Table records the triggering tool, session context, and original content of each injection event. It supports the security team in quickly drilling down from categorization statistics to specific events, completing the closed loop from pattern detection to traceability response.
Sensitive data leakage detection
Data leakage in Agent scenarios is often not a single event but a behavior chain of multiple steps: the Agent is guided to read sensitive files, the content enters the model context, and exfiltration is completed via a subsequent tool call. Observing any single link in isolation makes the threat hard to judge; only by associating file access with outbound behavior can you reconstruct the complete intent of the attack.
Sensitive data leakage detection adopts a funnel analysis approach to narrow down noise layer by layer and precisely locate real threats. The first layer records all sensitive file access, categorizes them by five types of assets: SSH_KEY, ENV_FILE, CREDENTIALS, CONFIG_SECRET, and HISTORY, and establishes an access baseline. The second layer independently tracks outbound behaviors by channel (API_CALL, MESSAGE_SEND, WEB_ACCESS, EMAIL) to detect potential data exits. The third layer associates the two in the time dimension. If sensitive file access and outbound operations appear successively within a short time window in the same session, they are marked as high-priority exfiltration events.
The core value of this mechanism is causal localization rather than single-point alerting. An Agent reading an SSH_KEY is not necessarily a threat, nor is initiating an API_CALL. But if both occur in sequence within the same session at a minute-level interval, and the outbound parameters carry sensitive file content, the threat confidence rises significantly. The behavior-chain analysis table directly renders the time difference between access_time and outbound_time along with the complete invocation parameters, allowing the security team to reach a traceability judgment without manually joining logs.
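The third-layer association can be sketched as a window join on session and time. The 5-minute window and the field names in this sketch are assumptions for illustration; the dashboard's actual join is expressed in SLS queries.

```python
from datetime import datetime, timedelta

# Asset types and outbound channels from the funnel description above.
SENSITIVE_TYPES = {"SSH_KEY", "ENV_FILE", "CREDENTIALS", "CONFIG_SECRET", "HISTORY"}
OUTBOUND_CHANNELS = {"API_CALL", "MESSAGE_SEND", "WEB_ACCESS", "EMAIL"}

def find_exfil_chains(events, window=timedelta(minutes=5)):
    """Pair each sensitive-file access with outbound ops in the same session
    that follow within `window`. Window size and field names are assumptions."""
    chains = []
    for access in events:
        if access["type"] not in SENSITIVE_TYPES:
            continue
        for out in events:
            if (out["session"] == access["session"]
                    and out["type"] in OUTBOUND_CHANNELS
                    and access["time"] <= out["time"] <= access["time"] + window):
                chains.append((access, out))
    return chains
```

Each returned pair corresponds to one high-priority exfiltration candidate in the behavior-chain table.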
4.2 Token Analysis Dashboard
Token consumption is directly tied to operating costs, and its fluctuations are often an early signal of system anomalies (such as context expansion caused by prompt injection). The Token Analysis Dashboard revolves around the core questions of where the money goes, whether it is spent reasonably, and whether anything is abnormal, expanding across dimensions such as overall overview, per-model trends, and sessions to provide usage monitoring, cost analysis, and anomaly discovery.
About fee data: The fee (cost) field in the dashboard comes from usage.cost in OpenClaw. Taking the Qwen3.5-Plus model as an example, for Model Studio API call pricing, see https://www.alibabacloud.com/help/en/model-studio/models
The model cost configuration in .openclaw is:
{
  "id": "qwen3.5-plus",
  "name": "Qwen3.5 Plus",
  "cost": {
    "input": 0.8,      // taken from the lowest-tier input price
    "output": 4.8,     // taken from the lowest-tier output price
    "cacheRead": 0.4,  // estimated as half of the input price
    "cacheWrite": 0
  }
}
OpenClaw does not natively support tiered billing, and its cacheRead + cacheWrite computation logic cannot be kept consistent with the provider's. It only estimates the per-invocation fee as inputTokens × input + outputTokens × output + .... Therefore, the dashboard fee should be treated as a reference baseline for cost estimation rather than an accurate bill. For models without a cost configuration, the fee column displays 0.
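The linear estimate described above can be written out directly. The per-million-token unit in this sketch is an assumption; confirm the actual unit against the provider's price sheet.

```python
def estimate_cost(usage, cost, per_tokens=1_000_000):
    """Linear fee estimate: tokens times unit price per token class.
    `per_tokens` (prices assumed per million tokens) is an assumption."""
    return (usage.get("inputTokens", 0) * cost.get("input", 0)
            + usage.get("outputTokens", 0) * cost.get("output", 0)
            + usage.get("cacheReadTokens", 0) * cost.get("cacheRead", 0)
            + usage.get("cacheWriteTokens", 0) * cost.get("cacheWrite", 0)
            ) / per_tokens

# The cost block from the .openclaw configuration above.
qwen_cost = {"input": 0.8, "output": 4.8, "cacheRead": 0.4, "cacheWrite": 0}
```

Because the formula is linear and untier­ed, it systematically over- or under-estimates whenever the provider applies tiered or cached pricing, which is exactly why the dashboard fee is a baseline, not a bill.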
4.2.1 Overall Overview and Model Distribution
The top of the dashboard provides a day-over-day comparison of overall tokens and fees: today vs. yesterday usage (unit: 10,000 tokens), today vs. yesterday fees (unit: CNY), and the day-over-day ratio. This makes it easy to judge whether usage or fees surged on the current day. The day-over-day comparison is the first signal of cost anomalies: if the ratio exceeds a preset threshold (such as ±30%), it usually means prompt expansion, recursive invocation, or abnormal sessions have occurred, and you can drill down to troubleshoot immediately.
4.2.2 Consumption Trend by Provider / Model (Time Series)
The two time-series charts, Model Tokens Trend and Model Fee Trend (relative 1 week), share a timeline and legend, displaying each model's token consumption and fee changes over time, colored by model. Pay particular attention to token surges: these are often not just a cost issue but a threat signal for security and stability. Prompt injection maliciously padding the context, tool calls falling into an infinite loop, or sessions expanding continuously without triggering loop detection all appear as a steep rise in one of the curves. Because the charts distinguish models by color, a model switch shows up directly as a change in color composition, letting you confirm the switch time and the models involved without extra inference and judge whether the change was expected.
4.2.3 Top Consumption by Session and by Host/Pod (Column Chart)
The column charts form a 2×2 layout, answering "who is spending money, and which machine or container is spending it" from the session and host (or pod, in container scenarios) dimensions, tying consumption to specific responsible entities:
● Top Tokens By Session / Top Cost By Session: total tokens and fees per session over the past week, sorted in descending order. In practice, an Agent's cost distribution is often long-tailed: a few sessions account for the vast majority of consumption, and identifying these head sessions is the first step of cost optimization.
● Top Tokens By Host / Top Cost By Host: tokens and fees aggregated by host (instance) or pod, for cost analysis and threat localization in multi-instance deployments. In enterprise environments, a host or pod usually maps to a specific team, line of business, or user; combined with asset ownership, you can map consumption data to responsible parties. This supports cost allocation and lets you quickly pinpoint a potentially compromised user or runaway session when an instance's consumption is abnormal.
4.2.4 Model Tokens Details Table (Cost Details)
The Model Tokens Details table (relative 1 week) lists, per model: totalTokens, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, and the corresponding totalCost, inputCost, outputCost, cacheReadCost, and cacheWriteCost. It supports sorting and filtering and directly answers which model spent the most and what the input/output proportions are. The ratio of inputTokens to outputTokens reflects the Agent's interaction pattern: a high input proportion suggests prompt or context redundancy, while a high output proportion may mean the model generated a large amount of unhelpful content. The cacheReadTokens proportion visualizes the benefit of the cache policy (the higher the proportion, the lower the actual bill), providing a quantitative basis for prompt engineering and cache tuning.
4.3 Behavior Analytics Dashboard
The behavior analytics dashboard takes the session as its basic unit, records and categorizes the full runtime behavior of OpenClaw, and answers the basic but critical question of what the Agent did within the current time window.
Session Statistics
The count cards at the top break down tool calls by behavior type into dimensions such as command execution, background processes, web requests, communication tools, and file reads/writes, providing a quick snapshot of the overall behavior mix. Call exceptions are listed separately so that system stability can be assessed at a glance.
The session statistics table below expands at session granularity, recording each session's call volume across the behavior dimensions. In the screenshot, the session in the first row reaches 1,925 total tool calls, including 1,364 command executions and 561 file reads/writes, an order of magnitude above the other sessions. Such abnormally active sessions are usually worth reviewing first. The table is sorted by last active time; combined with the per-dimension call distribution, you can quickly spot sessions with abnormal behavior patterns.
Tool Calling Volume Statistics and Error Analysis
Tool calls are the Agent's sole channel for interacting with the outside world, so changes in call volume and error rate directly reflect the Agent's running health. The tool-call timeline displays the call frequency of each period, colored by tool type. Abnormal spikes are the first entry point for troubleshooting; combined with changes in the tool-type mix, you can quickly determine which kind of operation drove the surge. The error-rate trend chart shares the timeline with the call-volume chart. The error-rate peak does not necessarily coincide with the call-volume peak, and the time offset between the two often reveals the true source of a problem: whether a class of tools failed continuously in a specific period, or a particular job introduced an abnormal calling pattern.
The full tool-call log provides the invocation parameters, execution status, and returned content of each call, supporting rapid drill-down from a trend anomaly to the specific failed call to locate the root cause.
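The shared-timeline reading above boils down to bucketing calls and computing volume and error rate per bucket, so the two peaks can be compared. A minimal sketch, with the (timestamp, ok) event shape assumed for illustration:

```python
from collections import defaultdict

def error_rate_by_bucket(calls, bucket_seconds=60):
    """Map each (unix_ts, ok) call onto a time bucket and return
    {bucket: (call_volume, error_rate)} so volume peaks and
    error-rate peaks can be compared on the same timeline."""
    totals, errors = defaultdict(int), defaultdict(int)
    for ts, ok in calls:
        bucket = int(ts // bucket_seconds)
        totals[bucket] += 1
        if not ok:
            errors[bucket] += 1
    return {b: (totals[b], errors[b] / totals[b]) for b in totals}
```

A bucket with modest volume but a high error rate is exactly the kind of offset peak the trend charts are meant to surface.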
External Interaction
External interaction records all external behaviors initiated by the Agent at runtime, including API calls, web page access, message sending, and outbound email, rendered by session, tool name, and interaction type.
For the Agent, external interaction is both a necessary means of completing its job and a potential threat outlet. Recording external interactions in full helps the team understand the Agent's actual capability boundaries and usage habits, and provides complete behavioral context when anomalies occur, supporting cross-tool and cross-session association analysis and traceability.
5.Custom Observable Data Exploration
The built-in dashboards provide audit and observation views for common dimensions. In actual security operations, however, a dashboard is usually the starting point for discovering issues rather than the end point. When the audit dashboard flags a high-risk session, the token trend chart shows an abnormal spike, or a runtime-metric alert fires, you often need to drill down from the statistical overview to specific events, reconstruct the complete behavior chain, and confirm the root cause. The SLS query and analysis engine provides flexible custom exploration capabilities for this process.
5.1 Log Data Model: The Foundation of Custom Analysis
The prerequisite for custom exploration is understanding the data structure. The SLS ingestion solution has pre-built indexes based on audit analysis requirements, so you can query directly without additional configuration. The following two types of logs constitute the core data sources for custom analysis:
Session Log — Records the complete business behavior of the Agent. It is the main basis for security audit and cost analysis.

Runtime Log — Records the runtime status of the gateway and each subsystem. It is the data foundation for troubleshooting and system health analysis.

5.2 Session-level Drill Down: From High-risk Session to Complete Behavior Chain
Typical scenario: The "High-risk Session" list in the audit dashboard marks a high-risk Session. The security team needs to reconstruct the complete interaction process of this session to confirm whether the threat is real.
In a multi-instance deployment, the logs of every OpenClaw instance are written centrally to the same SLS Logstore. The first step of custom exploration is to isolate by session ID, narrowing the view to a single session to clarify who triggered which requests and when, which tools were invoked, and how the model responded. This provides a clear boundary for compliance evidence.
After filtering down to the session, you can use the SLS context preview feature to reconstruct the complete behavior chain within the session in its original order: user input, model inference, tool-call requests, and tool execution results are visible at a glance. This capability is particularly critical in audit scenarios: it not only helps detect abnormal invocation sequences (such as a sensitive file read followed immediately by an exfiltration operation) but also provides a complete context view for reproducing security events and retaining evidence.
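Outside the console, the same reconstruction amounts to a filter on session ID plus a stable sort on timestamp. The field names in this sketch (session_id, ts, event) are illustrative, not the exact log schema:

```python
def session_timeline(logs, session_id):
    """Replay one session's entries in original order by timestamp.
    Field names are illustrative placeholders."""
    entries = [e for e in logs if e.get("session_id") == session_id]
    return sorted(entries, key=lambda e: e["ts"])
```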
5.3 Runtime Troubleshooting: Keyword Retrieval and Aggregation and Analysis
Typical scenario: The runtime metric-based alerting dashboard prompts a sudden increase in Error Rate. You need to quickly locate the faulty module and root cause from the massive Runtime Logs.
SLS supports combining full-text indexing with structured field retrieval; together with a time range, you can narrow the troubleshooting scope layer by layer. The typical path has two steps: first narrow the scope, then quantify the distribution.
Step 1: Filter layer by layer to lock onto the issue
Filter by log level: use _meta.logLevelName: ERROR, _meta.logLevelName: WARN, or _meta.logLevelName: FATAL to pull all error and warning logs and focus attention on anomalous activity.
Drill down by subsystem: overlay field conditions on the error logs, such as 0.subsystem: plugins, to narrow the scope to a specific subsystem. As shown in the figure below, two filtering steps quickly locate the error log where the diagnostics-otel plugin failed to load.
Step 2: SQL Aggregation, Quantify Global Distribution
Keyword filtering locates individual events; SQL aggregation elevates single logs into a global statistical view. For example, grouping and aggregating on the subsystem field intuitively renders each subsystem's error distribution, quickly surfaces concentrated anomalies, and points the way for further troubleshooting.
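In plain code, that aggregation is just a group-by count over the filtered error logs (in SLS it would be expressed as an SQL GROUP BY on the subsystem field):

```python
from collections import Counter

def errors_by_subsystem(error_logs):
    """Count filtered error logs per subsystem, the equivalent of
    GROUP BY subsystem over the error-level records."""
    return Counter(e.get("subsystem", "unknown") for e in error_logs)
```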
6.Multi-data Source Filter Interaction: The Troubleshooting Closed Loop from Anomaly Discovery to Root Cause Localization
The previous sections covered data ingestion, built-in dashboards, and custom exploration of observability data. In real O&M and audit work, observability data is not used in isolation; it follows a fixed collaboration pattern, narrowing the scope layer by layer while each source corroborates the others:

OpenTelemetry (OTel) metrics → application logs (error context) → session audit logs (complete behavior chain). The typical troubleshooting path is as follows: OTel metrics detect an anomaly (such as a latency spike, token surge, or error-rate spike); you then locate the error details in the application logs within the corresponding time window (webhook timeout, authentication failure, or gateway exception); finally, you drill down into the session audit logs to reconstruct the session's complete tool-call sequence, model interactions, and cost, confirm the root cause, and retain audit evidence.
7.Summary
To answer "Is your OpenClaw really under control?", you need to answer several questions at once: who is triggering invocations, how much money is being spent, what operations were performed, and whether the behavior is traceable and auditable.
Industry security reports and OpenClaw's own codebase audit indicate that the attack surface of AI agents is naturally broad: 147 security patches landed within 60 days, with the tools/ and gateway/ modules accounting for 61% of them. Runtime protection is indispensable, but protection alone is not enough to claim control; you must build a continuous observability system that answers the questions above with data.
This article showed how to use the SLS Integration Center to ingest OpenClaw observability data (session audit logs and application logs) with one click, and how to achieve out-of-the-box security audit, cost monitoring, and operational observation through the built-in dashboards. The value of an observability system is not limited to detecting problems; it lies in continuously bringing the Agent's operational status into a quantifiable, traceable management framework. This is the necessary path for AI agents to move from "usable" to "trustworthy."