Los Angeles is sitting on a digital storage problem that nobody wants to own. Across city departments, public libraries, and cultural institutions stretching from Boyle Heights to Brentwood, duplicate images have accumulated for years on government and institutional servers: redundant photos, scanned documents, and archival assets that drain storage budgets, slow retrieval systems, and make public records requests slower and more expensive to fulfill.
The issue has grown more urgent in 2026 because of a fixed deadline. With the 2028 Olympic Games less than two years out, agencies including the Los Angeles Department of Transportation and the Bureau of Engineering are under pressure to modernize data infrastructure to support real-time event coordination and media operations. Some estimates from municipal IT circles place redundancy rates above 30 percent in older departmental systems, and duplicate image files at that scale are both a cost and a liability when response times matter.
Where the Backlog Lives
The Los Angeles Public Library system, which operates 73 branch locations and maintains the digitized California Historical Society Photo Collection through its Central Library on Fifth Street downtown, is one of the most visible pressure points. Librarians and archivists have long flagged that digitization drives conducted between 2015 and 2022 produced large volumes of near-duplicate scans — slightly different exposures or resolutions of the same physical image — without a consistent protocol for selecting a canonical version and retiring the rest.
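What a consistent selection protocol might look like can be sketched in a few lines of Python. The rule below (keep the scan with the most pixels, let the newer scan win ties, and route every other copy to human review rather than deleting it) is a hypothetical illustration, as are the `Scan` record, the `pick_canonical` function, and the asset IDs; none of it reflects an actual LAPL policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scan:
    asset_id: str        # hypothetical identifier scheme
    width: int           # pixels
    height: int          # pixels
    scanned_on: date


def pick_canonical(group: list[Scan]) -> tuple[Scan, list[Scan]]:
    """Rank near-duplicate scans of one physical item.

    Hypothetical rule: most pixels wins; the newer scan breaks ties.
    The remaining scans are returned for archivist review, not deleted.
    """
    if not group:
        raise ValueError("group must contain at least one scan")
    ranked = sorted(group, key=lambda s: (s.width * s.height, s.scanned_on), reverse=True)
    return ranked[0], ranked[1:]


group = [
    Scan("photo-0001-a", 3000, 2000, date(2016, 3, 1)),
    Scan("photo-0001-b", 6000, 4000, date(2019, 7, 15)),
    Scan("photo-0001-c", 6000, 4000, date(2021, 2, 9)),
]
keep, review = pick_canonical(group)
print(keep.asset_id)                 # photo-0001-c
print([s.asset_id for s in review])  # ['photo-0001-b', 'photo-0001-a']
```

Writing the rule down as code is mostly a way of forcing the policy question into the open: whichever criteria an archive chooses, every department applying them gets the same answer for the same group of scans.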
The Getty Research Institute in Brentwood faces a parallel challenge at a larger scale. Its digital asset management system holds hundreds of thousands of images from international acquisitions, and staff have described in published institutional reports the difficulty of deduplicating without disrupting the metadata chains that tie images to provenance records, rights clearances, and loan agreements.
At the municipal level, the Mayor's Office of Innovation has not yet issued a unified citywide standard for image deduplication; the office operates under Mayor Karen Bass's housing emergency framework, which has already stretched city IT resources. That gap in standards matters. Without a shared protocol, individual departments make incompatible choices, and the problem compounds.
The Decisions That Will Define the Next 18 Months
Three forks in the road are coming fast. First, the city must decide whether to contract a single enterprise deduplication platform or allow departments to procure their own tools. Enterprise licensing for image management software at municipal scale typically runs between $400,000 and $1.2 million annually depending on storage volume — a range that matters when the city's technology budget is already absorbing costs from Olympic infrastructure upgrades along the Crenshaw/LAX Metro line corridor.
Second, cultural institutions need to choose between hash-based and perceptual matching. Hash-based deduplication identifies exact binary duplicates: it is fast and cheap, but it misses near-duplicates. Perceptual matching catches visually similar images even when resolution, exposure, or file format differ, which is critical for archives but computationally heavier and more expensive. The Los Angeles County Museum of Art, which announced a major digital collections expansion in late 2024, is understood to be evaluating perceptual tools, though no contract has been publicly awarded.
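To make the trade-off concrete, here is a minimal Python sketch. It assumes images have already been decoded and downscaled to 8x8 grayscale grids (in practice an imaging library would handle that step). It hashes the raw pixel bytes with SHA-256 as a stand-in for exact file matching, and uses average hash, one simple perceptual-hashing technique, for the perceptual side. The function names, sample grids, and the 5-bit threshold are hypothetical, and this is not presented as the method any Los Angeles institution uses.

```python
import hashlib

Grid = list[list[int]]  # grayscale values 0-255, assumed already downscaled to 8x8


def exact_digest(pixels: Grid) -> str:
    """Hash-based key: SHA-256 of the raw bytes; matches only identical data."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


def average_hash(pixels: Grid) -> int:
    """Perceptual key (average hash): one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_bits: int = 5) -> bool:
    # max_bits is a hypothetical tuning threshold, not an established standard.
    return hamming(average_hash(a), average_hash(b)) <= max_bits


# Hypothetical scans: a gradient, a brighter re-scan of it, and its tonal inverse.
scan_a = [[(r * 8 + c) * 3 for c in range(8)] for r in range(8)]
scan_b = [[min(255, p + 10) for p in row] for row in scan_a]
scan_c = [[189 - p for p in row] for row in scan_a]

print(exact_digest(scan_a) == exact_digest(scan_b))  # False: bytes differ
print(is_near_duplicate(scan_a, scan_b))             # True: same bright/dark layout
print(is_near_duplicate(scan_a, scan_c))             # False: layout reversed
```

A uniform brightening, like a re-scan at a different exposure, changes every byte but leaves the above/below-average pattern intact, which is why the perceptual check still pairs the two scans while the exact hash does not. More robust perceptual variants exist, such as DCT-based hashes, at a higher compute cost.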
Third — and most consequential — is the question of human review. Automated deduplication tools can flag candidates for deletion, but archivists and legal staff must still sign off before any image tied to a public record or rights agreement is retired. That labor cost is real. At current rates for certified archivists in Los Angeles County, sustained review work runs roughly $75 to $95 per hour, and a backlog of even 500,000 flagged image pairs represents a significant staffing commitment.
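The scale of that commitment can be roughed out with simple arithmetic. The article does not say how long each review takes, so this sketch assumes a hypothetical one minute per flagged pair and a 2,080-hour full-time year (40 hours × 52 weeks); only the 500,000-pair backlog and the $75 to $95 hourly range come from the text.

```python
PAIRS = 500_000             # backlog size cited in the article
RATES = (75.0, 95.0)        # article's hourly range, in dollars
MINUTES_PER_PAIR = 1.0      # hypothetical: not stated in the article
HOURS_PER_FTE_YEAR = 2_080  # assumption: 40 hours x 52 weeks


def review_hours(pairs: int, minutes_per_pair: float) -> float:
    """Total archivist hours needed to review every flagged pair."""
    return pairs * minutes_per_pair / 60


def review_cost(pairs: int, minutes_per_pair: float, hourly_rate: float) -> float:
    # Multiply before dividing so round-number cases stay exact in floating point.
    return pairs * minutes_per_pair * hourly_rate / 60


hours = review_hours(PAIRS, MINUTES_PER_PAIR)
print(f"{hours:,.0f} hours, about {hours / HOURS_PER_FTE_YEAR:.1f} full-time years")
for rate in RATES:
    print(f"at ${rate:.0f}/hour: ${review_cost(PAIRS, MINUTES_PER_PAIR, rate):,.0f}")
```

Under those assumptions the backlog comes to roughly 8,333 hours, about four full-time archivist-years, or about $625,000 to $792,000 in labor. Every additional minute per pair scales those figures linearly.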
The practical path forward requires the Mayor's Office of Innovation to publish written guidance before the end of the third quarter of 2026 — ideally a tiered framework that lets small branch libraries use lightweight hash tools while larger institutions with complex provenance requirements adopt perceptual systems under separate funding. Institutions waiting for that guidance should document their current redundancy rates now, building the evidence base for budget requests. Departments that skip that step going into 2027 planning cycles will find themselves arguing for resources without the numbers to back them up.
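One way to start documenting redundancy rates is an exact-duplicate scan of a storage share. The sketch below defines the redundancy rate as the share of image files that are byte-identical copies of another file; that definition, the extension list, and the function names are assumptions, not a city standard. Because it only catches exact copies, the number it produces is a lower bound, and near-duplicate scans would need a perceptual pass on top.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path

# Assumption: which extensions count as images will vary by department.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large TIFF masters are never loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_redundancy_rate(root: Path) -> float:
    """Share of image files under root that duplicate another file byte for byte.

    This is a lower bound: near-duplicate scans are not detected.
    """
    counts = Counter(
        file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every file beyond the first in each identical group is redundant.
    return (total - len(counts)) / total


def demo() -> float:
    """Build a throwaway folder with 4 images, 2 of them redundant copies."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        (root / "a.jpg").write_bytes(b"scan-1")
        (root / "b.JPG").write_bytes(b"scan-1")
        (root / "sub" / "c.tif").write_bytes(b"scan-1")
        (root / "d.png").write_bytes(b"scan-2")
        (root / "notes.txt").write_bytes(b"scan-1")  # ignored: not an image
        return exact_redundancy_rate(root)


print(demo())  # 0.5
```

Run against a real share, the same function yields a defensible floor figure for a budget request, and the digest counts it builds also identify which files to examine first.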