A wave of AI “agent” adoption is colliding with growing reliability and trust issues across core developer platforms. Anthropic’s Claude ecosystem saw fresh benchmarking claims (BioMysteryBench) alongside user backlash over outages, regressions in Claude Code, silent model removals, and feature changes that appear to move code execution from the Pro plan to higher-priced tiers. Billing and policy opacity amplified concerns after reports of a bug-related overcharge and refused refunds, while critics questioned the transparency of Mythos verification. In parallel, GitHub faced multiple incidents and workflow-breaking UX changes, underscoring how AI-driven growth is straining availability, predictability, and customer confidence.
AI agents are being integrated into developer workflows at scale, but reliability, billing, and transparency issues are eroding developer trust and complicating operational planning. Tech professionals must account for availability, predictable pricing, and verification risks when adopting agentic AI in production.
Dossier last updated: 2026-05-11 05:11:09
The European Commission is in talks with OpenAI and Anthropic over their AI models
Inti Landauro / Reuters: The European Commission is in ongoing discussions with U.S. AI giants OpenAI and Anthropic to gain access to their latest AI models, a spokesperson said on Monday; OpenAI is “proactively offering” access.
Users report that Anthropic's Claude has repeatedly refused to add AGPLv3 licenses to projects, returning an "API Error: Output blocked by content filtering policy." Multiple reproductions and prior reports suggest the behavior affects several users. Complainants worry this may be an intentional or opaque policy decision by Anthropic, especially troubling given expectations that Claude was likely trained on AGPL-licensed code. Affected developers say they may pivot to alternatives (e.g., OpenAI Codex) to avoid future restrictions and cite transparency concerns. The issue matters because license-related filtering from code-assist AIs could disrupt open-source workflows and raise questions about training, filtering, and platform trust.
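For anyone trying to reproduce the report, a minimal sketch using the Anthropic Python SDK is shown below. The model id and prompt are illustrative assumptions, and the quoted filter message comes from Claude Code rather than the raw API, so the error may surface differently at the API layer.

```python
# Minimal reproduction sketch. Assumptions: the model id and prompt are
# illustrative; the "Output blocked by content filtering policy" string
# users quote comes from Claude Code, so the raw API may report it
# differently.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

try:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # hypothetical model choice
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": "Add the full AGPLv3 license text as a LICENSE file "
                       "for this project.",
        }],
    )
    print(msg.content[0].text[:200])
except anthropic.APIError as e:
    # Log the error verbatim to check whether it matches the reported
    # content-filtering refusal.
    print(f"API error: {e}")
```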
A V2EX thread reports that OpenAI's Codex can continue executing a complex goal even when a user's usage quota is down to its last 1%, effectively "finishing the map" before stopping. Users compared behavior across models: Codex and Anthropic Claude reportedly finish the ongoing task, while some say antigravity (unnamed provider) stops immediately when quota ends. Replies conflict on whether the final 1% is charged against future replenished quota or weekly limits; several users tested and reported it does not deduct from restored quota but may still count toward weekly caps. This matters for developers relying on quota billing and predictable task termination.
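The billing ambiguity suggests a defensive client-side pattern: check remaining quota before dispatching a long task and refuse to start below a threshold, rather than relying on the provider’s end-of-quota behavior. A minimal sketch follows; the quota helper is a hypothetical placeholder, since the thread does not confirm any quota-inspection API.

```python
# Defensive sketch: only dispatch long-running agent tasks when enough
# quota remains. get_remaining_quota() is a hypothetical placeholder;
# no quota-inspection endpoint is confirmed in the thread.
MIN_QUOTA_FRACTION = 0.05  # refuse to start a long task below 5% remaining

def get_remaining_quota() -> float:
    # Hypothetical: replace with your provider's account/usage API.
    return 0.01

def dispatch(goal: str, run_task) -> None:
    remaining = get_remaining_quota()
    if remaining < MIN_QUOTA_FRACTION:
        # Fail fast instead of gambling on whether the provider lets the
        # task "finish the map" or cuts it off mid-run.
        raise RuntimeError(f"only {remaining:.0%} quota left; deferring task")
    run_task(goal)
```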
Anthropic’s Mythos AI reportedly analyzed the curl codebase under a Linux Foundation program, prompting attention from curl’s lead developer after initial access delays. Mythos’s scan covered roughly 178K lines across src/ and lib/, joining prior AI tools (AISLE, Zeropath, OpenAI Codex Security) and developer-assist bots (GitHub Copilot, Augment) that have already driven dozens of bug fixes and CVEs in curl. The curl team uses AI findings alongside static analysis, fuzzing, and human review to prioritize security. The episode highlights both the potency of advanced AI in surfacing vulnerabilities and the practical workflow and governance questions around controlled access, responsible disclosure, and how maintainers incorporate AI-generated reports.
Anthropic’s Mythos—a powerful code-auditing AI—flagged a real vulnerability in the curl project during a limited access program tied to Project Glasswing and Linux Foundation channels. The curl lead developer was offered access but received delayed direct access; instead, an intermediary ran Mythos scans and produced a report. Mythos joined other AI security tools in analyzing curl and reportedly found at least one notable issue, underscoring both the model’s capability to surface real-world software flaws and the tensions around restricted releases. This episode matters because it highlights how advanced AI can accelerate vulnerability discovery, influence responsible disclosure, and shape who gets early access to potent security tools.
A developer released adamsreview, a Claude Code plugin that orchestrates multi-stage, multi-agent PR reviews with up to seven parallel sub-agent lenses (correctness, security, UX, etc.), deduplication, validation gates, persistent JSON artifacts, and an automated fix loop that re-reviews and reverts regressions before committing. It integrates with Claude Code (Max recommended) and optionally Codex CLI via an --ensemble mode, and exposes six commands—review, codex-review, add, walkthrough, fix, and promote—to run, merge, import, interact with, and auto-apply fixes. The tool aims to catch more real bugs and reduce false positives compared with built-in review tools, keeping review artifacts and command scripts in a plugin layout for easy installation and workflow integration. This matters because it automates and hardens code review with AI-driven, auditable pipelines that can fit into existing PR workflows.
A developer released adamsreview, a Claude Code plugin that runs multi-stage, multi-agent pull request reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments. Packaged as six slash commands (review, codex-review, add, promote, walkthrough, fix), it stores state on disk to clear context between stages, uses AskUserQuestion for guided human validation, and dispatches per-fix-group agents that re-review changes before committing. The author reports it catches more real bugs with fewer false positives than built-in Claude review tools and competitors like CodeRabbit and Greptile, and it works on a standard Claude Code subscription (Max recommended). Source code is on GitHub and the author seeks feedback from Claude Code users and pro devs.
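The core pattern both summaries describe, apply a fix, re-review the resulting diff, and revert if the re-review surfaces a regression, can be sketched independently of the plugin. This is a minimal illustration with hypothetical `review()` and `apply_fix()` agents, not adamsreview’s actual code.

```python
# Sketch of the review -> fix -> re-review -> revert-on-regression loop
# that adamsreview reportedly automates. The review and fix helpers are
# hypothetical stand-ins, not the plugin's real implementation.
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout

def review(diff: str) -> list[str]:
    # Hypothetical: ask a reviewing agent for a list of findings.
    return []

def apply_fix(finding: str) -> None:
    # Hypothetical: ask a fixing agent to edit files for one finding.
    ...

def fix_loop(findings: list[str]) -> None:
    for finding in findings:
        checkpoint = git("rev-parse", "HEAD").strip()
        apply_fix(finding)
        git("add", "-A")
        git("commit", "-m", f"fix: {finding[:50]}")
        # Re-review only the new commit; revert it if new issues appear.
        new_diff = git("diff", f"{checkpoint}..HEAD")
        if review(new_diff):
            git("revert", "--no-edit", "HEAD")
```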
A Reddit user asked whether Anthropic’s Claude is meaningfully superior to OpenAI’s Codex/ChatGPT, noting Claude’s stronger revenue as evidence and asking if that spells trouble for OpenAI. The post seeks community perspectives rather than hard benchmarks, with implied comparisons around code generation, conversational capability, and commercial adoption. This matters because vendor claims, revenue trends, and perceived quality influence developer choice, enterprise contracts, and the competitive dynamics among major AI model providers. The discussion highlights how market signals (revenue, positioning) and real-world technical differences (model accuracy, safety, latency, integrations) both shape platform selection and the broader AI tools market.
A user reports that OpenAI Codex stopped working after returning to China and that VPNs haven't restored access, seeking help. They also shared a detailed shot-by-shot script and prompt for a POV short video featuring two cats—an efficient orange cat and a naive cow-patterned cat—performing a fast-paced countertop routine that escalates into a fizzy mishap, a comic scuffle, and a wounded-but-resilient service finish. The post mixes a technical access problem (blocked AI/code assistant service) with creative production details (camera POV, characters, scene beats), highlighting both a tooling/accessibility gap for creators and a ready-made content plan that depends on tools like Codex for scripting or generation. This matters for creators relying on remote AI services and platform availability.
A solo developer reports diving deep into AI, spending at least 12 hours daily on hands-on experimentation rather than just reading tutorials. They’re working with models, workflows, agents and a tool called Hermes, iterating through frequent breakages, remote fixes and reinstallations. Two mentors, @Reboottttttt and @superdan, provided guidance, troubleshooting and test environments that enabled practical learning. The author highlights the addictive, game-like nature of building scripts and automation that actually perform tasks, and credits mentorship and iterative tinkering for rapid skill development. This reflects grassroots, maker-style adoption of AI tooling and the role of community support in upskilling.
Anthropic’s research preview Mythos reportedly generates working exploits for Firefox’s JavaScript shell (SpiderMonkey) in 72.4% of trials, a dramatic jump from under 1% with prior models like Opus. The article warns this undermines decades of security built on layered sandboxes—browser JS engines, browser process sandboxes and OS app sandboxes—by making vulnerability discovery and exploit generation much easier. The author argues Mythos’ size and cost limit broad deployment now, but similar capabilities are already appearing in far smaller models (e.g., Gemma 4) and will spread as hardware and software optimizations improve. That trajectory poses a growing cybersecurity risk to the internet’s foundational defenses. Key players: Anthropic, Mythos, Gemma 4, Opus.
Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts
Hermes Agent has climbed to the top spot in OpenRouter's global token metrics over the past 24 hours, surpassing competitors such as Anthropic's Claude Code and OpenClaw. The Reddit-posted screenshot highlights Hermes Agent as the most used model by token volume, signaling rapid adoption or a burst of traffic through OpenRouter’s API gateway. This matters because token-based rankings reflect real usage patterns across apps and integrations that rely on OpenRouter to access multiple LLMs; a surge can indicate shifting developer or user preferences and affect downstream billing, latency, and ecosystem visibility for models. Observers and developers may watch whether this is a sustained trend or a short-lived spike.
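For context, OpenRouter’s token rankings aggregate traffic through its OpenAI-compatible gateway; a typical request looks like the sketch below. The Hermes model slug is an assumption, so check OpenRouter’s model list for the exact id.

```python
# Sketch of a request through OpenRouter's OpenAI-compatible gateway,
# the kind of traffic its token rankings aggregate. The model slug is
# an assumed example; look up the real id at https://openrouter.ai/models.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="nousresearch/hermes-agent",  # assumed slug for Hermes Agent
    messages=[{"role": "user", "content": "Summarize today's changes."}],
)
print(resp.choices[0].message.content)
```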
METR’s updated “time horizon” graph showing frontier models handling longer software-development tasks sparked panic online after Mythos and other models appeared to “break” the curve. The author argues the alarm is misplaced: METR measures 50% success on narrowly defined coding tasks, not reliable or general intelligence, and higher thresholds (80–95%) leave significant headroom. Recent gains likely stem from integrating symbolic tools, interpreters and verification harnesses rather than pure model scaling, so the chart doesn’t prove indefinite progress from more compute or parameter scaling. Broader benchmarks like ECI place Mythos roughly on trend with GPT-5.4, underscoring that the graph’s apparent leap is less momentous than some tweets suggested.
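To ground the threshold argument: METR’s horizon is read off a logistic fit of success probability against log task length, so raising the success threshold necessarily shortens the horizon. A sketch of the construction follows, paraphrasing the published methodology; the slope parameter and the worked number are illustrative.

```latex
% Logistic fit of success probability against log task length t,
% with t_{50} the 50% time horizon and beta the fitted slope:
\[
  p(t) \;=\; \sigma\big(\beta(\log t_{50} - \log t)\big),
  \qquad
  t_q \;=\; t_{50}\,\exp\!\Big(-\tfrac{1}{\beta}\log\tfrac{q}{1-q}\Big).
\]
% At q = 0.5 the logit term vanishes, so t_q = t_{50}. At q = 0.8 with
% an illustrative beta = 1, t_q = t_{50} * exp(-log 4) = 0.25 * t_{50}:
% the 80% horizon is only a quarter of the headline 50% horizon.
```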
Companies are exploring Anthropic's Claude Enterprise for workplace use, with decision-makers focused on two main concerns: disabling Claude Code and managing Claude Co-Work to prevent accidental edits or exposure of confidential shared files. The post asks whether organizations have adopted Claude, how they controlled features and permissions, and what governance, access controls, or training were used to protect sensitive data. This matters because enterprise adoption hinges on secure configuration, granular access management, and integration with existing data protection policies. Practical experiences around admin controls, audit logs, role-based permissions, and data handling guarantees would guide procurement and deployment.
A Reddit thread asks readers to imagine a world where agentic AI security is a non-issue, prompting discussion about how development priorities, regulation, and investment would shift. Participants debate what would change if safeguards, misuse risks, and containment failures were effectively solved—highlighting faster product innovation, reallocated resources toward capability growth, and altered incentive structures for startups and incumbents. The conversation touches on responsibility, oversight, and whether trust in agentic systems would spur widespread deployment across industries. This matters because agentic AI—autonomous systems that plan and act—poses distinct technical and policy challenges; if perceived risks fall away, adoption accelerates with big implications for security, ethics, and competitive dynamics in the tech sector.
An image titled “Made with Claude: Evolution of Intelligence” shared on Reddit showcases artwork generated by Anthropic’s Claude model. The post highlights Claude’s capability to produce creative visuals, underscoring growing competition among generative AI offerings from companies like Anthropic and OpenAI. This matters because visual-generation strengths are becoming a key differentiator for AI platforms as they expand beyond text, influencing product positioning, developer adoption, and content workflows across startups and larger tech firms. The share on a prominent AI subreddit signals community interest and public testing, which can accelerate feature iteration, integration into creative tools, and debates about provenance, copyright, and responsible use.
@Areskapitalon: The reason is that model companies cannot actually capture all of the efficiency gains and profit that AI unlocks by replacing labor. They can only capture the leading-edge premium on frontier models, and earning that premium requires piling on enormous amounts of extra compute; Anthropic's recent difficulties are a case in point. The players most likely to capture that profit are the incumbent SaaS platform companies.