Los Angeles city departments collectively manage more than 14 petabytes of digital assets, and a growing share of that storage is being eaten up by duplicate image files — scanned photographs, permit documents, architectural drawings, and surveillance stills that exist in two, three, or sometimes a dozen identical copies across disconnected servers. The problem has quietly compounded for years, but 2026 is the year the bill is starting to come due.
The timing matters because three overlapping pressures have converged at once. Mayor Karen Bass's ongoing housing emergency directive has forced the Los Angeles Housing Department to digitize thousands of inspection records and code-enforcement photographs going back to the early 2000s. The city's 2028 Olympic infrastructure buildout has added another layer of project documentation across agencies. And a $47 million cloud-migration contract awarded to the city's Information Technology Agency in early 2025 has exposed, for the first time, exactly how much redundant data is being shuffled into expensive cloud tiers rather than cleaned up first.
The Scale of the Problem Across LA's Institutions
The duplication issue is not unique to city government. The Los Angeles Public Library system, which operates 73 branches including the Central Library on West 5th Street downtown, has been digitizing its photo collection since 2019 under a California State Library grant. Archivists there have identified duplicate-image rates running as high as 30 percent in certain donated collections, according to records the library has published in its annual digital preservation reports. In the worst intake batches, that means roughly three in ten image files have to go through a manual or automated review before they can be correctly catalogued.
At the UCLA Library's digital collections division in Westwood, the challenge is similar. The institution stores tens of millions of archival images, and its metadata team has publicly described deduplication as one of the top five ongoing operational costs in its digital infrastructure budget. File-comparison software licenses alone can run between $8,000 and $25,000 annually per institutional deployment, depending on the volume of assets being scanned.
For the city's own IT operations, the math gets starker. Standard cloud storage pricing, even at government-negotiated rates, runs roughly $20 to $23 per terabyte per month for warm-tier access. If even five percent of the city's 14-petabyte footprint consists of direct duplicates, that is 700 terabytes of redundant data costing an estimated $14,000 to $16,100 every single month for no functional benefit. Over a fiscal year, that comes to roughly $168,000 to $193,000, approaching $200,000 in wasted expenditure before labor costs are factored in.
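The arithmetic in the preceding paragraph can be reproduced in a few lines. The sketch below is illustrative only: it assumes decimal units (1 petabyte = 1,000 terabytes) and a flat per-terabyte monthly rate, and the function name `duplicate_storage_cost` is a hypothetical helper, not part of any city system.

```python
def duplicate_storage_cost(total_pb, duplicate_pct, rate_low, rate_high):
    """Estimate the cost of storing duplicate data.

    total_pb      -- total footprint in petabytes (assumes 1 PB = 1,000 TB)
    duplicate_pct -- share of the footprint that is duplicated, in percent
    rate_low/high -- storage price range in dollars per terabyte per month
    """
    total_tb = total_pb * 1000
    duplicate_tb = total_tb * duplicate_pct / 100
    monthly = (duplicate_tb * rate_low, duplicate_tb * rate_high)
    annual = (monthly[0] * 12, monthly[1] * 12)
    return {"duplicate_tb": duplicate_tb, "monthly": monthly, "annual": annual}


# Figures quoted in the article: 14 PB, 5 percent duplicates, $20 to $23 per TB-month.
estimate = duplicate_storage_cost(14, 5, 20, 23)
print(f"Duplicate data: {estimate['duplicate_tb']:,.0f} TB")
print(f"Monthly cost:   ${estimate['monthly'][0]:,.0f} to ${estimate['monthly'][1]:,.0f}")
print(f"Annual cost:    ${estimate['annual'][0]:,.0f} to ${estimate['annual'][1]:,.0f}")
```

Changing the duplicate percentage shows how sensitive the total is to that single assumption: at 10 percent, every figure simply doubles.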
What Deduplication Actually Requires — and What Comes Next
Automated deduplication tools, which compute a cryptographic hash of each image file and flag identical matches, have been commercially available for years. The problem in large public institutions is rarely the technology. It is the workflow. Files digitized in different years, by different contractors, and stored in different departmental systems often carry different filenames, different metadata tags, and different embedded timestamps, even when the underlying image is pixel-for-pixel identical. A different filename alone does not affect a file's hash, but any difference in embedded metadata or timestamps changes the file's bytes and therefore its hash, so these copies slip past exact-match tools unless the software hashes the decoded image data rather than the whole file. Perceptual hashing, a technique that matches near-duplicate images rather than exact copies, adds another layer of complexity and processing cost.
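The difference between exact and perceptual matching can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not any agency's or vendor's actual workflow: it uses SHA-256 for the exact match and a basic "average hash" as a stand-in for perceptual hashing, and it runs on tiny synthetic grayscale grids rather than decoded image files. The function names and the three test images are hypothetical.

```python
import hashlib


def pixel_sha256(pixels):
    """Exact fingerprint: SHA-256 over the raw grayscale values (0-255)."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(pixels, hash_size=8):
    """Simple perceptual fingerprint (average hash) returned as an integer.

    Assumes the image height and width are multiples of hash_size.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size
    cells = []
    # Shrink the image to hash_size x hash_size by averaging each block.
    for r in range(hash_size):
        for c in range(hash_size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 images, hypothetical stand-ins for decoded scans.
original = [[x * 16 for x in range(16)] for _ in range(16)]   # left-to-right gradient
rescan = [[value + 3 for value in row] for row in original]   # same picture, slightly brighter
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]  # top-to-bottom gradient

print("Exact hashes match:", pixel_sha256(original) == pixel_sha256(rescan))
print("Perceptual distance, rescan:   ",
      hamming_distance(average_hash(original), average_hash(rescan)))
print("Perceptual distance, unrelated:",
      hamming_distance(average_hash(original), average_hash(unrelated)))
```

In a workflow along these lines, two images would be treated as near-duplicates when the Hamming distance falls below some threshold. That threshold has to be tuned per collection, and setting it is part of the added complexity and processing cost noted above.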
The Los Angeles County Metropolitan Transportation Authority, which manages its own substantial archive of engineering drawings and construction photographs related to ongoing projects like the Eastside Gold Line extension and the West Santa Ana Branch, began piloting a perceptual-hash deduplication workflow in late 2024. The agency has not publicly disclosed results, but the project is listed in its fiscal year 2026 technology initiative disclosures filed with the county.
For smaller cultural organizations along the Wilshire Corridor — including several museums preparing digital exhibitions tied to the 2028 Games — the practical advice from digital preservation consultants is consistent: audit before you migrate. Running a deduplication pass on existing local storage before committing assets to a cloud contract can cut storage costs by 15 to 40 percent, depending on how old the collection is and how many digitization rounds it has been through. The window to do that cheaply is now, before multi-year cloud contracts lock in inflated storage volumes that take years to renegotiate down.
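A pre-migration audit of the kind described above can be approximated with a short script. The sketch below is one possible approach, not a consultant-endorsed tool: it walks a local directory, groups files by size, hashes only the files that share a size with another file, and reports how many bytes are exact duplicates. The function names and the returned dictionary layout are assumptions made for illustration. It detects byte-identical copies only, so copies whose embedded metadata differs would need a pixel-level or perceptual comparison.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root):
    """Report total bytes and exact-duplicate bytes under a directory tree."""
    by_size = defaultdict(list)
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            total_bytes += size
            by_size[size].append(path)

    # Files with a unique size cannot be duplicates, so only hash the rest.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_sha256(path))].append(path)

    groups = []
    redundant_bytes = 0
    for (size, _digest), paths in by_digest.items():
        if len(paths) > 1:
            groups.append(sorted(paths))
            redundant_bytes += size * (len(paths) - 1)  # keep one copy of each

    return {
        "total_bytes": total_bytes,
        "redundant_bytes": redundant_bytes,
        "duplicate_groups": groups,
    }


def redundant_share(report):
    """Fraction of stored bytes that are exact duplicates (0.0 if empty)."""
    if report["total_bytes"] == 0:
        return 0.0
    return report["redundant_bytes"] / report["total_bytes"]
```

Calling `audit_duplicates` on an archive directory and passing the result to `redundant_share` gives a rough lower bound on the storage that could be reclaimed before migration. The script only reports; deciding which copy to keep remains a cataloguing decision.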