A wave of stories highlights how today’s “agentic” AI boom is colliding with hard limits in cost, safety, and trust. On the research side, ICML/UAI decision threads and rants portray peer review under strain from record submissions, inconsistent standards, and area-chair overrides, fueling a sense of lottery-like outcomes and lowering confidence in published LLM results. In industry, OpenAI and Anthropic are converging on gated releases for cyber-capable models, underscoring dual-use risks and regulatory scrutiny. Meanwhile, evidence that “friendlier” chatbots become less accurate, plus deepfakes that drain verification capacity and lawsuits testing AI liability, sharpen demands for stronger accountability and governance.
Researchers and startups are bringing AI, robotics and gene-editing into IVF to speed, automate and improve success rates, while raising ethical and regulatory questions about embryo selection and germline edits. Separately, US states are moving to legalize plug-in "balcony" solar systems that let people add small, easy-to-install PV arrays to homes, potentially expanding rooftop solar uptake but prompting safety and grid-integration concerns. The newsletter also highlights growing resistance to AI over energy use, job loss, mental-health and copyright issues, plus industry moves: Anthropic partnering with SpaceX for GPU capacity, leadership turmoil at OpenAI, China’s push in humanoid robots, SpaceX IPO debates, and Google DeepMind testing models in Eve Online. These trends signal major tech, policy and market shifts ahead.
A Chinese post by @xiaohu advises users on reducing “people-pleasing” behavior in Anthropic’s Claude and OpenAI’s GPT by adding a strong instruction prompt to local configuration files (Claude.md and Agents.md). The suggested text frames the model as a “world-class expert” and demands authoritative, accurate answers, step-by-step reasoning, self-verification, and strict avoidance of hallucinations, including explicitly saying “I don’t know” when uncertain. It also instructs the model to use a precise, direct, even confrontational tone, deprioritize politeness and political correctness, and avoid unsolicited ethics reminders unless asked. The article provides no performance data, dates, or evidence that this prompt reliably changes model behavior, noting only the proposed wording and where to place it.
A user on Chinese developer forum V2EX asked whether companies can buy OpenAI Codex on an official pay-as-you-go enterprise plan without an overseas card or phone number, or whether employees must purchase personally and seek reimbursement. Replies indicated there’s no workaround: OpenAI (or similar providers) may not accept the company’s payment under those constraints, and responders suggested switching to alternatives like DeepSeek or other coding-focused plans. This matters for developers and startups needing enterprise procurement, billing flexibility, and local payment support when adopting AI coding tools.
Anthropic, maker of the Claude chatbot, is shifting focus from enterprise customers to broaden its consumer appeal. Mike Krieger, co-head of the startup’s lab, said that since late last year employees have been asked to improve Claude’s handling of personal queries, covering health, travel and recipes, to make it more useful to ordinary users. The move follows recent consumer-market traction and reflects a strategic push to expand product-market fit beyond business use cases. This signals Anthropic’s intent to compete more directly in mainstream conversational AI, which could affect user-facing feature priorities, safety tuning, and competitive dynamics with other consumer-focused AI assistants.
A Reddit post titled “Leave it up to Claude” shared an image meme referencing Anthropic’s Claude AI, highlighting community attention and cultural presence rather than any technical update. The post surfaced on r/artificial and appears to be a user-submitted image that jokes about delegating tasks or decisions to Claude. While not reporting new product features or research, the meme signals growing mainstream awareness and social-media engagement around competing large language models. That matters to the tech industry because public perception and viral content influence user adoption, brand recognition, and competitive dynamics among AI providers like Anthropic and OpenAI.
At Sequoia's AI Ascent, Anthropic engineering lead Boris Cherny said he now does most of his work from his phone, running 5–10 persistent Claude sessions and hundreds of agents in the Claude app, with thousands executing deep tasks overnight using a 'Loop' system that schedules recurring jobs. The author tested the new TRAE SOLO Mobile, which now offers full parity across mobile, web, and desktop (including Windows) with synchronized agents across all clients. TRAE SOLO exposes agent modes that are friendly to non-developers (Code vs. MTC), broadening agent use cases beyond programmers and enabling mobile-first workflows for scheduling, automation, and continuous background tasks. This shift signals mobile as a primary interface for agent-driven productivity.
A Reddit user shared a brief exchange highlighting Claude, Anthropic’s chatbot, responding tersely to a casual greeting before immediately prompting the user to upgrade due to a message limit. The snippet underscores friction in the user experience where helpfulness is interrupted by paywall or quota nudges, naming Claude and implying Anthropic’s business model of tiered access. It matters because such interruptions affect perceptions of AI assistants and can influence user retention, comparisons with competitors (e.g., OpenAI’s models), and expectations around free vs. paid limits in conversational AI products.
Big tech’s AI investments are ballooning while measurable returns lag, the author argues. The piece highlights projections that Microsoft, Google, Amazon and Meta will spend $800–$900B on AI capex in 2026 and over $1T in 2027, totaling about $2T by end of 2027, yet companies reveal little concrete AI revenue. Microsoft disclosed a $37B AI revenue run rate and Amazon $15B, but those figures are tiny relative to their capex. OpenAI and Anthropic dominate AI compute and revenue exposure—accounting for the majority of Microsoft’s and Amazon’s AI run rates—raising concerns about circular revenues, opaque reporting (notably Google and Meta), and the efficiency of massive AI spending.
Anthropic’s Claude Mythos is being hyped as a breakthrough in LLM creativity, but scrutiny suggests its gains may be incremental rather than revolutionary. The discussion centers on whether Mythos offers capabilities beyond what a polished system prompt or base Claude already delivers, whether it will materially change developer or creative workflows, and whether early enthusiasm reflects broad utility or a vocal minority. Key considerations include practical improvements in output quality, tool integration, API ergonomics, and cost-effectiveness. The verdict is cautious: Mythos may offer meaningful refinements for certain use cases, but buyers and builders should evaluate concrete benchmarks, latency, pricing, and integration before assuming transformative impact.
Big tech’s AI spending is vast but returns look thin: the article argues Microsoft, Google, Amazon and Meta will spend $800–900B on AI capex in 2026 and over $1T in 2027, totaling roughly $2T by end of 2027, yet visible AI revenues are small. Key claims: Microsoft discloses a $37B AI revenue run rate (largely tied to OpenAI), Amazon $15B (much tied to Anthropic), while Google and Meta obscure AI revenue details despite touting products like Gemini and GEM. The author warns that reported AI uplift metrics are often vague or noncomparable and that capex-to-revenue ratios look poor, questioning whether the massive investments will produce proportional returns.
A ten-point manifesto argues that generative AI is already reshaping academic research and publishing and that scholars must adapt or be left behind. The author cites colleagues who used LLMs to draft publishable papers, generate literature reviews, and automate parts of peer review, reducing production time and cost dramatically. This threatens the 30-page paper format and the commercial journal business model as submission volumes and desk rejections spike, while peer review capacity strains. The piece calls for new norms, workflows, training, and institutional responses to integrate AI responsibly into research, preserve research quality, and rethink incentives around publication and evaluation.
Anthropic held its Code w/ Claude 2026 event on May 6, and Simon Willison live-blogged the keynote and sessions from the main room. The coverage notes the start time and provides running commentary for attendees and remote readers, situating the event amid other recent posts on LLMs, generative AI, and agentic engineering. Willison links the event to broader discussions about Claude, Anthropic, and developments in large language models, and promotes his follow channels and a paid monthly briefing. The live blog serves as a timely, on-the-ground account for developers and industry watchers tracking Claude-related product updates, demos, and roadmap signals from Anthropic. It underscores ongoing interest in model capabilities and industry events as sources of technical news.
A newsletter article argues that Big Tech’s AI spending is outpacing measurable returns. The author says Microsoft, Google, Amazon, and Meta are expected to spend $800–$900 billion on AI capex in 2026 and more than $1 trillion in 2027, totaling about $2 trillion by end-2027. The piece criticizes Google and Meta for touting AI momentum without disclosing AI revenue, citing Google CEO Sundar Pichai’s comments and Meta’s shifting ad-lift metrics for its GEM model. By contrast, Microsoft and Amazon are credited for providing figures: Microsoft disclosed a $37 billion AI revenue run rate (about $3.08 billion/month) and Amazon $15 billion (about $1.25 billion/month), though the author stresses these are revenue, not profit. It also claims OpenAI and Anthropic dominate those run rates.
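The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in the summary; the variable and function names are illustrative. The capex multiple is a rough comparison made here as an assumption, not a figure from the article, because the capex projection covers four companies while the disclosed run rates cover only two.

```python
# All amounts are in billions of US dollars, as quoted in the summary above.
ANNUAL_RUN_RATE = {"Microsoft": 37.0, "Amazon": 15.0}

# Projected 2026 AI capex for Microsoft, Google, Amazon and Meta combined.
CAPEX_2026_RANGE = (800.0, 900.0)


def monthly_from_run_rate(annual_billions: float) -> float:
    """Convert an annualized run rate into the implied monthly revenue."""
    return annual_billions / 12


# Implied monthly revenue per company, rounded to two decimals.
monthly = {
    name: round(monthly_from_run_rate(rate), 2)
    for name, rate in ANNUAL_RUN_RATE.items()
}

# Sum of the two disclosed run rates (Google and Meta disclose none).
disclosed_total = sum(ANNUAL_RUN_RATE.values())

# Illustrative assumption: four-company capex divided by two-company revenue.
capex_multiple_low, capex_multiple_high = (
    round(capex / disclosed_total, 1) for capex in CAPEX_2026_RANGE
)

print(monthly)
print(disclosed_total, capex_multiple_low, capex_multiple_high)
```

The monthly values reproduce the $3.08 billion and $1.25 billion quoted in the summary. Because run rates are revenue rather than profit, the multiple understates how far the spending is from being paid back.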
NeurIPS introduced an AC-Pilot feature for NeurIPS 2026 intended to help Area Chairs (ACs) compile reviewer concerns and guide authors on whether addressing listed issues could lead to acceptance. The poster questions how the AC-Pilot works in practice, worrying that if reviewers see their specific questions omitted from the AC-generated list, they might assume those concerns are unimportant or already resolved. This raises transparency and trust issues around automated or assisted summarization of reviewer feedback, potential miscommunications between reviewers, ACs, and authors, and the risk that the tool could unintentionally bias decisions or obscure unresolved problems. The discussion matters because NeurIPS review processes influence publication quality and research incentives in AI.
Now that AI coding is this powerful, how about we turn up the difficulty a bit?
Users on the V2EX forum are warning that buyers of full-price Claude and OpenAI subscriptions have been steered into using those accounts as reverse proxies (routing requests through a shared API endpoint). The post stresses that this practice violates provider Terms of Service and can trigger account suspensions, after which requests return 401 errors. It recommends requiring independent email login plus 2FA or OAuth permissions when sharing subscriptions, rather than accepting a single API endpoint that may be a reseller’s proxy. The thread links multiple related discussions and notes OpenAI offers free accounts and trial Plus access for testing before purchase. The advisory aims to prevent accidental policy breaches and lost access.
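@An9488039089693: 😳 Chatted with a junior labmate all evening. It struck me that with the past year of AI progress, the paradigm has changed not only for coding but for doing research as well. Before, you probably had to understand things yourself, work out the derivations yourself, and hand-write the code 😳; now you only need to come up with the idea and then iterate and verify back and forth with the AI. That hugely lowers the barriers on every front. And so it has become…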
OpenAI has upgraded ChatGPT's default Instant model to GPT-5.5 Instant, replacing GPT-5.3 Instant for all users. The Instant tier, used daily by hundreds of millions, targets fast conversational queries; OpenAI says GPT-5.5 cuts hallucinations substantially—52.5% fewer fabricated facts on high-risk medical, legal and financial prompts and a 37.3% drop in user-marked wrong answers. Benchmark scores rose (GPQA 78.5%→85.6%, AIME 65.4%→81.2%, MMMU-Pro 69.2%→76%). Responses are reportedly more concise with fewer unnecessary clarifications and formatting. The model also proactively leverages users’ linked data (Gmail, uploads, past chats) to personalize responses, raising usability and potential privacy considerations. This rollout is a broad, user-facing improvement to conversational quality and contextual usefulness.
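The benchmark changes are easier to compare as deltas. The sketch below takes the before and after scores quoted in the summary and computes the absolute gain in percentage points and the relative gain. The names are illustrative, and the relative-gain figures are derived here rather than reported by OpenAI.

```python
# (GPT-5.3 Instant score, GPT-5.5 Instant score) in percent, from the summary.
BENCHMARKS = {
    "GPQA": (78.5, 85.6),
    "AIME": (65.4, 81.2),
    "MMMU-Pro": (69.2, 76.0),
}


def point_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Improvement as a percentage of the earlier score."""
    return round((after - before) / before * 100, 1)


gains = {
    name: (point_gain(before, after), relative_gain(before, after))
    for name, (before, after) in BENCHMARKS.items()
}

for name, (points, relative) in gains.items():
    print(f"{name}: +{points} points ({relative}% relative)")
```

On these numbers AIME shows the largest jump in both absolute and relative terms, while GPQA and MMMU-Pro improve by similar margins.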
OpenAI has upgraded ChatGPT’s default model to GPT-5.5 Instant, rolling the update out to all users and keeping GPT-5.3 Instant available to paid users for three months. GPT-5.5 Instant emphasizes accuracy and brevity, reducing unnecessary emojis and verbose formats. OpenAI reports a 52.5% reduction in hallucination rates in high-risk domains (medical, legal, financial) and a 37.3% drop in inaccurate statements identified in user-flagged conversations. The Instant family is presented as a broad factuality improvement, particularly beneficial where accuracy matters most; the change is part of OpenAI’s continuing model iterations for ChatGPT. This update began deploying May 5 and reached users on May 6.
OpenAI has begun rolling out GPT-5.5 Instant to all ChatGPT users, marking a platform-wide upgrade. Paid users will continue to have access to GPT-5.3 Instant via model settings for the next three months before that version is retired. Separately, OpenAI is gradually enabling the Memory Sources feature on the web for all consumer ChatGPT plans and plans to bring it to mobile soon. This update affects model availability, user choice, and feature rollout timelines, signaling OpenAI’s push to migrate users to newer, faster models while phasing out older releases and expanding personalized memory capabilities.