Major publishers are increasingly blocking the Internet Archive’s Wayback Machine crawler and limiting API access, arguing that archived news can be repurposed for scraping and AI training, especially for paywalled content. Investigations found about two dozen prominent news sites, plus Reddit, restricting the ia_archiverbot crawler, while outlets like The Guardian and Financial Times are selectively filtering article URLs. Digital rights groups such as the EFF warn that the move won’t meaningfully stop AI scraping but will degrade a critical public record used for journalism, research, and legal evidence, especially when articles are edited or removed. The dispute lands amid broader legal pressure on the Archive, raising fears that the web is becoming effectively “unarchivable.”
Blocking the Wayback Machine reduces access to archived news and web records that journalists, researchers, and engineers rely on for verification, provenance, and reproducibility. Tech professionals building datasets, compliance tools, or integrity checks face degraded sources and legal uncertainty when public web history is incomplete.
Dossier last updated: 2026-05-10 03:58:03
The Internet Archive has launched a new foundation based in Switzerland to expand its global mission of preserving digital knowledge. The move establishes a European legal and operational hub intended to improve international governance, fundraising, and collaborations, in a jurisdiction perceived as favorable for cultural preservation and data stewardship. The Internet Archive and its leadership position the new Swiss foundation as complementary to the U.S.-based nonprofit. This matters because a European presence can ease cross-border partnerships, help address legal and copyright complexities, and reassure international donors and partners about the stewardship and governance of large digital collections. The step may influence how digital archives are maintained and governed globally.
A blog post reports that George Orwell’s 1939 review of Bertrand Russell’s book “Power: A New Social Analysis” had become difficult to find online and was no longer appearing in search results. The author says they previously learned from the piece, describing it as more than a standard book review, and recently tried to locate it again. After “sleuthing,” they recovered a copy via the Internet Archive and note that the review was originally published in The Adelphi in January 1939. The post also points readers to a scanned version of the original publication hosted on the Internet Archive. The item matters as an example of digital preservation and the role of archives in restoring access to historical texts.
Two hundred journalists publicly praised the Internet Archive and its Wayback Machine for preserving news, documents and rare books, signing a letter defending the service as major outlets debate restricting archival access. Signatories — including Rachel Maddow, Justin Bank, Ashley Belanger, Annalee Newitz and other reporters — cited daily use for fact-checking, tracing policy and terms-of-service changes, reconstructing altered press releases, and accessing hard-to-find scholarly works. They argue the Archive is an essential public record and forensic tool that prevents “memory-holing” of online content, likening its role to national libraries and urging protection of broad, impartial access to digital history. The endorsements spotlight tensions between publishers’ rights and public-interest archiving.
About 200 journalists have publicly praised the Internet Archive and its Wayback Machine for preserving news, historical materials, and rare books, urging support as some media outlets debate blocking archival copies of their work. Signatories — including Rachel Maddow, Justin Bank, Ashley Belanger, Annalee Newitz and other reporters — describe using the Archive daily to fact-check, recover altered press releases, compare past website versions, and access otherwise rare research materials. They argue the Archive functions as a critical public ledger and research tool that protects the historical record against revision, censorship, and link rot, making it essential for accountability reporting and cultural preservation.
The Internet Archive’s Wayback Machine is facing financial and legal pressures that threaten its ability to preserve web history. The nonprofit Internet Archive, which stores billions of web pages, books, and multimedia files, has become a vital resource for researchers, journalists, and developers; rising costs, server maintenance, and the risk of copyright litigation now imperil its operations. Key players include the Internet Archive, its founder Brewster Kahle, libraries and academic partners, and rights holders involved in lawsuits. The outcome matters for digital preservation, public access to historical internet content, and the dependability of a shared web heritage relied on by tech companies and the broader internet ecosystem.
The Internet Archive’s Wayback Machine, crucial for journalists, researchers, and public accountability, is increasingly being blocked or limited by major news organizations and platforms. USA Today Co., The New York Times, Reddit, and about 23 major sites in all now restrict the ia_archiverbot crawler or otherwise limit access to archived content; The Guardian excludes its content from the Archive API. Publishers say the measures target scraping and potential AI misuse of content, while journalists and advocates argue these moves undermine historical preservation and reporting. More than 100 journalists, backed by groups like the EFF, have rallied to defend the Wayback Machine’s role in safeguarding the web’s record, highlighting risks to fact-checking, research, and public-interest journalism.
The Internet Archive’s Wayback Machine faces growing pushback as major news organizations such as USA Today Co. and The New York Times, along with Reddit, restrict or block its crawler amid concerns about scraping and potential misuse by AI companies. Analysis from Originality AI found 23 major sites blocking ia_archiverbot; others, like The Guardian, limit API access or filter archived content. Journalists and advocacy groups including the EFF have rallied in support, collecting over 100 signatories, from Rachel Maddow to independent reporters, who argue the Wayback Machine is vital for preserving journalistic records, fact-checking, and public-interest research. The dispute highlights tensions between publishers protecting their content and broader concerns over public access, archiving, and AI training.
A Hacker News post is resurfacing PBS NOVA’s 1998 web feature “Terror in Space” (pbs.org), drawing discussion about the Mir space station and how older PBS content remains accessible online. Commenters point to related reporting, including Bryan Burrough’s book “Dragonfly: NASA and the Crisis Aboard Mir,” and compare risks in other programs such as Skylab, which reentered on July 11, 1979. A key thread focuses on availability: users ask where to watch the episode, and another provides an Internet Archive link (archive.org/details/TerrorinSpace), noting torrents for broader NOVA seasons. Others remark on the site’s late-1990s design, praising its simplicity while criticizing navigation choices. The post highlights ongoing interest in spaceflight history and digital preservation of public media.
The Internet Archive’s Wayback Machine is being blocked by at least 23 major news sites, endangering long-term access to journalistic records. Publishers, including top outlets not named in the excerpt, have used robots.txt or other measures to prevent archival crawls, limiting the Archive’s ability to preserve articles and the public record. This matters because the Wayback Machine serves researchers, journalists, and courts by providing historical snapshots; blocking reduces transparency, hinders fact-checking, and risks loss of context when articles change or are removed. The dispute highlights tensions between publishers’ control of content and public-interest archiving, raising policy and technical questions about digital preservation, copyright, and the responsibilities of platforms and news organizations.
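Mechanically, a robots.txt block is a per-user-agent rule that a crawler checks before fetching a page. The sketch below uses Python's standard urllib.robotparser to show how such a rule singles out one crawler while leaving others untouched. The robots.txt contents and the news.example.com URL are hypothetical illustrations, not copied from any real publisher; the ia_archiverbot token is taken from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style a publisher might use to block one
# archiving crawler while allowing all other agents. Not a real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without a network call to read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://news.example.com/2026/05/some-article"
    for agent in ("ia_archiverbot", "SomeOtherBot"):
        print(agent, crawler_allowed(SAMPLE_ROBOTS_TXT, agent, article))
    # prints:
    # ia_archiverbot False
    # SomeOtherBot True
```

Compliance is voluntary: robots.txt only blocks crawlers that choose to honor it, which is one reason critics argue that blocking an archive's well-behaved crawler does little to stop scrapers that ignore the file.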
Chicago music collector Aadam Jacobs has allowed volunteers from the Internet Archive to digitize roughly 2,500 of his more than 10,000 cassette concert recordings to preserve them before the tapes degrade. The uploads include rare early performances — notably a 1989 Nirvana show — and previously unknown or hard-to-find tapes from artists such as Sonic Youth, R.E.M., Phish, Liz Phair, Pavement, Neutral Milk Hotel and numerous punk bands. Volunteers use vintage cassette decks to transfer audio, then clean, tag and catalog the files for public access. The project highlights community-driven digital preservation and the Internet Archive’s role in making cultural audio artifacts accessible online.
Chicago collector Aadam Jacobs has donated digitization rights for thousands of concert cassette tapes to the Internet Archive, where volunteers have posted roughly 2,500 recordings so far. The collection includes rare early performances by Nirvana (1989), Sonic Youth, R.E.M., Phish, Liz Phair, Pavement, Neutral Milk Hotel and others. Volunteers use vintage cassette decks to transfer tapes to digital files, then clean, organize and label tracks — sometimes identifying forgotten song titles from obscure punk bands. The project preserves aging analog media, broadens public access to historically significant live performances, and highlights how volunteer-driven digitization can rescue cultural audio before it degrades.
Chicago-based concert recorder Aadam Jacobs is making thousands of rare live recordings available via the Internet Archive, according to the article. Jacobs, described as a longtime music superfan, has taped shows he attended since the 1980s and has accumulated more than 10,000 recordings on tape. The move puts a large, previously personal collection into a public digital repository where listeners can stream or access the material online. If fully uploaded, the collection could expand the Internet Archive’s live music holdings and preserve performances that may not exist in official releases. Beyond the scale of Jacobs’ archive and its arrival on the Internet Archive, the article offers few details, such as specific artists, dates, or upload timelines.
The Internet Archive's Wayback Machine, the web's leading archiving service, faces an existential threat after a high-profile legal battle challenged its ability to store and serve copies of web content. The dispute centers on copyright holders and publishers seeking to restrict archiving of news and paywalled articles; a recent court case and takedown threats could force the Archive to alter its operations or remove cached pages. Key players include the Internet Archive, major publishers, and US courts; the outcome will affect researchers, journalists, and developers who rely on historical web snapshots. It could reshape digital preservation, access to public records, and how platforms balance copyright with the public interest in archiving.
Kate Knibbs / Wired: Originality AI: 23 major news websites and Reddit block the Internet Archive's crawler; journalists and advocacy groups sign a letter supporting the Archive. As major news outlets cut off the Wayback Machine, journalists and advocacy groups are rallying to protect the Internet Archive's vast collection of web pages.
The Internet Archive’s Wayback Machine is facing growing restrictions as major news organizations, including USA Today Co. and The New York Times, along with Reddit, block or limit its crawler, ia_archiverbot, and some publishers restrict API access. Originality AI found 23 major news sites blocking the bot; outlets cite concerns about scraping, commercial misuse, and AI training. Journalists and advocacy groups (EFF, Fight for the Future) rallied, collecting 100+ journalist signatures urging preservation of the web archive, which they argue is vital for reporting, fact-checking, and preserving digital history. The dispute matters because limiting archiving erodes journalistic transparency, public recordkeeping, and researchers’ ability to audit historical web content amid rising anxiety over AI data use.
The Internet Archive’s Wayback Machine, home to over a trillion archived web pages used by journalists, researchers, and courts, is losing access to major publishers after The New York Times and others began blocking its crawlers to prevent AI scraping. The move, driven by publisher concerns about AI models being trained on copyrighted news, risks erasing a decades-long historical record of how stories originally appeared online. The article argues that archiving and searchable copying have established fair-use precedent (citing past cases like Google Books) and that nonprofit preservation serves a transformative, public-interest purpose distinct from commercial AI training. It warns that cutting off the Archive to control AI access would harm future research and the public record.