Los Angeles city departments collectively store an estimated tens of millions of digital image files across fragmented servers, and a growing share of that data is redundant — duplicate photos clogging infrastructure that officials want reserved for 2028 Olympic operations and emergency management. The duplication problem isn't abstract: it translates directly into dollars, server hours, and delayed workflows at agencies already stretched by housing emergency response and wildfire preparedness demands.
The issue has sharpened this fiscal year because the Mayor's Office of Innovation and the city's Information Technology Agency have both flagged digital storage costs as a line item that has grown well beyond original projections. Storage inefficiency is one of the clearest and most measurable contributors. Industry benchmarks suggest that unmanaged media libraries in large municipal systems can carry duplicate-file rates of 30 to 40 percent, meaning roughly three to four in ten images on a given server are exact or near-exact copies that consume space without serving any unique purpose.
What the Data Actually Shows
The Los Angeles City Archives, housed at 555 Ramirez Street in Downtown Los Angeles, manages physical and digital records for dozens of departments. Its digital holdings have grown sharply since the COVID-era push to digitize paper records, and staff there have publicly acknowledged the difficulty of deduplicating at scale without a centralized platform. The Getty Center in Brentwood, though a private institution, has spent years developing internal deduplication protocols for its image collections and has shared its methodology with municipal partners, a model that city IT staff have cited in planning documents as a reference point.
On the housing and homelessness side, the Los Angeles Homeless Services Authority, which runs data collection for the annual Point-in-Time homeless count, stores thousands of photographs documenting encampment conditions, client intake, and outreach fieldwork. Those images feed into reporting under Mayor Bass's Inside Safe program, which has conducted more than 200 operations since its December 2022 launch. Field teams using mobile devices routinely upload the same images multiple times across different platforms, creating chains of duplicates that LAHSA's own technology staff have had to reconcile manually.
The financial stakes are concrete. Cloud storage for large image files, particularly high-resolution JPEGs and RAW formats used in official documentation, runs roughly $0.023 per gigabyte per month on standard municipal cloud contracts. A library of 10 million images averaging 8 megabytes each represents approximately 80 terabytes of raw data. If 35 percent of that is duplicate, the city is paying to store roughly 28 terabytes (28,000 gigabytes) of files it doesn't need. At $0.023 per gigabyte, that comes to about $644 a month, or roughly $7,700 a year, purely on redundant copies, and backup replication, which typically doubles the effective storage bill, pushes the figure past $15,000 annually.
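The storage arithmetic above can be reproduced with a short calculation. This is a sketch under stated assumptions: decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB), a flat $0.023 per gigabyte-month rate with no tiered or negotiated discounts, and a hypothetical function name, `redundant_storage_cost`, chosen only for illustration. Setting `replication_factor=2` models the backup doubling mentioned above.

```python
# Back-of-envelope cost of storing duplicate images.
# Assumptions (illustrative, not taken from any city contract): decimal units
# (1 TB = 1,000 GB, 1 GB = 1,000 MB) and a flat per-GB monthly price.


def redundant_storage_cost(
    image_count: int,
    avg_size_mb: float,
    duplicate_share: float,
    price_per_gb_month: float,
    replication_factor: float = 1.0,
) -> dict:
    """Estimate storage consumed by duplicate files and what it costs."""
    total_gb = image_count * avg_size_mb / 1_000
    duplicate_gb = total_gb * duplicate_share
    # Backups multiply every stored byte, duplicates included.
    monthly_usd = duplicate_gb * price_per_gb_month * replication_factor
    return {
        "total_tb": total_gb / 1_000,
        "duplicate_tb": duplicate_gb / 1_000,
        "monthly_usd": monthly_usd,
        "annual_usd": monthly_usd * 12,
    }


if __name__ == "__main__":
    # Compare a single copy with one backup replica.
    for copies in (1, 2):
        est = redundant_storage_cost(10_000_000, 8, 0.35, 0.023, copies)
        print(
            f"replication x{copies}: {est['duplicate_tb']:.0f} TB duplicate, "
            f"${est['monthly_usd']:,.0f}/month, ${est['annual_usd']:,.0f}/year"
        )
```

With the article's inputs, a single copy works out to about $644 a month ($7,728 a year), and one backup replica doubles that to about $1,288 a month ($15,456 a year).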
What Comes Next for City Systems
The pressure to act is sharpest in the context of 2028 Olympics infrastructure planning. LA28, the organizing committee operating out of offices in Downtown Los Angeles, is coordinating with city agencies on a unified digital asset management framework that needs to be operational well before the July 2028 games. Duplicate image data in legacy city systems creates a migration liability — the more redundant files that exist when data gets transferred to new platforms, the higher the labor cost of cleaning it up on the back end.
Deduplication software has matured considerably. Tools that use perceptual hashing — a technique that identifies visually similar images even when file names or metadata differ — can now scan and flag millions of images in hours rather than weeks. Several vendors are actively pitching the city's ITA procurement office, according to public contract solicitation records posted to the city's online vendor portal.
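Perceptual hashing comes in several forms. The sketch below uses difference hashing (dHash), one common variant, as an illustration; it does not describe any particular vendor's product. It assumes images have already been decoded into 2-D lists of grayscale values from 0 to 255, and the function names and the 5-bit match threshold are hypothetical choices. Each image is shrunk to a 9-by-8 grid by box averaging, and each of the resulting 64 bits records whether a cell is brighter than its right-hand neighbour. Because the fingerprint depends only on coarse brightness patterns, a re-encoded or slightly brightened copy tends to land on the same or a nearby hash even when its bytes, file name, and metadata differ.

```python
# Difference hash (dHash): one common perceptual-hashing technique.
# Assumes images are already decoded to 2-D lists of grayscale ints (0-255);
# a real pipeline would decode JPEG/RAW files with an imaging library first.
from typing import Dict, List, Tuple

Gray = List[List[int]]


def resize_box(img: Gray, width: int, height: int) -> List[List[float]]:
    """Shrink an image by averaging each block of source pixels."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Return a hash_size * hash_size bit fingerprint of horizontal gradients."""
    small = resize_box(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(
    images: Dict[str, Gray], max_distance: int = 5
) -> List[Tuple[str, str]]:
    """Flag image pairs whose hashes differ by at most max_distance bits.

    Compares every pair, which is O(n^2): fine for a sketch, not for millions.
    """
    hashes = {name: dhash(img) for name, img in images.items()}
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    original = [[x * 8 for x in range(32)] for _ in range(32)]
    brighter = [[min(255, v + 5) for v in row] for row in original]
    mirrored = [list(reversed(row)) for row in original]
    found = near_duplicates(
        {"original": original, "brighter": brighter, "mirrored": mirrored}
    )
    print(found)  # -> [('brighter', 'original')]
```

The brightened copy matches the original exactly, while the mirrored image does not, since dHash is not designed to be mirror-invariant. The pairwise loop in `near_duplicates` is quadratic, so tools working across millions of files typically index the hashes (for example in a BK-tree) and compare only candidate pairs.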
For Angelenos paying attention to how city technology budgets get spent, the practical takeaway is this: the deduplication problem is solvable, the tools exist, and the longer the cleanup is deferred, the more expensive the eventual fix becomes. With wildfire season adding a surge of aerial and ground-level documentation images to city servers every summer, July is not a bad time to start counting the cost of copies nobody asked for.