MIT and a stack of academic collaborators reviewed documentation for 30 widely used “agentic” AI systems. TLDR: a lot of capability, not a lot of paperwork.
Across eight disclosure categories, most systems provide little to no public detail on risks, safety testing, or governance. The gaps are basic. In many cases, it’s unclear whether you can monitor an agent’s step-by-step execution at all—which makes accountability, and post-mortems, harder than they should be.
One number worth noting: 12 of the 30 agents offer no usage monitoring, or notify you only once you hit a rate limit.
Most agents also don’t identify themselves as AI by default. No watermarking. No clear signaling to users or third parties. And they don’t necessarily honor standard web conventions like robots.txt.
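For a sense of what honoring robots.txt involves, here is a minimal sketch using Python's standard-library parser. It is not taken from the study or from any of the reviewed agents; the agent name `ExampleAgent`, the rules, and the URLs are all hypothetical, and the rules are parsed from an inline string so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAgent
Disallow: /private/

User-agent: *
Disallow:
"""

# Hypothetical name the agent would announce in its User-Agent header.
AGENT_NAME = "ExampleAgent"


def may_fetch(url: str, robots_txt: str = ROBOTS_TXT, agent: str = AGENT_NAME) -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/report"):
        print(url, "allowed" if may_fetch(url) else "blocked")
```

In practice an agent would download the site's real robots.txt before acting and send a User-Agent header that matches the name it checks against. Whether any given product does so is exactly the kind of detail the study found undocumented.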
Stopping them can be its own problem. The study notes that some tools—including Alibaba’s MobileAgent, HubSpot’s Breeze, IBM’s watsonx, and n8n automations—lack documented ways to halt a single agent once it’s running. In some cases, the only option is to stop everything.
Important caveat: this isn’t red-team testing. The study is based primarily on publicly available documentation, which means it’s measuring what companies disclose—not necessarily everything that exists behind the scenes.
The researchers did reach out to the companies involved. About a quarter responded, but only 3 of 30 provided substantive comments. Perplexity, for its part, disputed parts of the report, calling out “significant factual inaccuracies.”
There are bright spots. The paper highlights OpenAI’s ChatGPT Agent as one example of traceability, using cryptographic signatures on browser requests to track behavior.
But zoom out, and the pattern is consistent: limited disclosure, unclear monitoring, and uneven control mechanisms—at a moment when these systems are starting to show up in real workflows.

Read more at ZDNet.
