Anna’s Archive, the open-source search engine for several large shadow libraries, has claimed it scraped Spotify’s music catalogue at scale, acquiring metadata for around 256 million tracks and archiving roughly 86 million songs in total.
According to a blog post published by the group, the collection spans more than 15 million artists and over 58 million albums, with the total dataset weighing in at just under 300TB. Anna’s Archive says it intends to make the files publicly available for download in stages, beginning with the most popular music.
“A while ago, we discovered a way to scrape Spotify at scale,” the group wrote. “We saw a role for us here to build a music archive primarily aimed at preservation.” While Spotify does not represent all the music ever recorded, the group described the platform as “a great start” for building what it calls a long-term cultural archive.
The group estimates that the 86 million songs it has already archived account for roughly 99.6 percent of all listens on Spotify, despite representing only about 37 percent of the platform’s total catalogue. Millions of additional tracks, it says, remain to be archived.
Anna’s Archive has historically focused on text-based material such as books, journals, and research papers, arguing that written works offer the highest information density. However, the group says its broader mission of “preserving humanity’s knowledge and culture” applies equally to music and other media formats.
The project’s legal status is not in doubt: scraping, hosting, and distributing copyrighted music without permission is a clear violation of intellectual property laws in most jurisdictions. Anna’s Archive does not dispute this, positioning its work instead as a response to what it sees as fragile, incomplete, or commercially biased cultural preservation efforts.
The group argues that existing music collections, both physical and digital, tend to over-represent popular artists or rely on extremely large, high-fidelity files that are impractical to store at scale. By contrast, it claims its archive prioritises breadth, efficiency, and long-term accessibility. It also says the metadata it has compiled is the largest publicly available music metadata database to date.
The music files will be released gradually, ordered by popularity, and made available to anyone with sufficient storage capacity. Whether the archive stays accessible for long, or becomes the target of swift legal action, remains to be seen.