Los Angeles city departments are estimated to store tens of millions of digital image files across municipal servers, and a significant share of that archive consists of exact or near-exact duplicates. IT administrators at agencies ranging from the Bureau of Street Services to the Department of Building and Safety have been quietly wrestling with the problem for years. The duplication problem is not abstract. Redundant files cost real money to store and real staff time to sort through, and in a city preparing to host the 2028 Summer Olympics while managing a declared homelessness emergency, neither resource is in surplus.
The urgency has sharpened in 2026 partly because of scale. The city's 2028 Olympic Infrastructure Office, headquartered near the Coliseum in South Los Angeles, has been onboarding contractors at a rapid pace since late 2024, each firm uploading progress photos, site surveys, and permitting imagery into shared project management systems. Without a standardized deduplication protocol, the same aerial photograph of the Sepulveda Basin sports complex can end up in six separate folders across three platforms, with each copy consuming storage the city pays for under enterprise licensing agreements.
What the Data Actually Shows
Industry benchmarks from enterprise data management firms suggest that between 20 and 40 percent of files in large unmanaged digital repositories are duplicates or near-duplicates. Even the conservative end of that range means one file in five is redundant. Apply that to a department like the Los Angeles Department of City Planning, which has been digitizing permit records and environmental review documents, many of them image-heavy, since its citywide digitization push began in earnest around 2019, and the redundant volume adds up quickly.
Cloud storage pricing compounds the issue. Standard enterprise cloud storage tiers used by California government agencies typically run between $0.02 and $0.05 per gigabyte per month under state procurement contracts. A municipal archive carrying 500 terabytes of redundant image data, a figure that is plausible for a city the size of Los Angeles, would pay roughly $120,000 to $300,000 a year at those rates just to store copies of files already held elsewhere on the same system.
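The arithmetic behind that estimate is simple enough to check directly. The sketch below assumes decimal units (1 TB = 1,000 GB) and flat per-gigabyte pricing at the two ends of the quoted range; actual state contract rates, storage tiers, and billing units may differ, and the 500-terabyte volume is the hypothetical figure from the paragraph above.

```python
# Back-of-the-envelope annual cost of storing redundant data.
# Assumptions: decimal units (1 TB = 1,000 GB) and a flat per-GB monthly rate;
# real contracts may use tiered pricing or binary units.

def annual_storage_cost(terabytes: float, price_per_gb_month: float) -> float:
    """Return the yearly cost in dollars of storing `terabytes` of data."""
    gigabytes = terabytes * 1_000
    return gigabytes * price_per_gb_month * 12  # 12 billing months per year

redundant_tb = 500  # hypothetical redundant volume from the example above
for rate in (0.02, 0.05):
    cost = annual_storage_cost(redundant_tb, rate)
    print(f"${rate:.2f}/GB-month -> ${cost:,.0f} per year")
```

At 500 terabytes this works out to about $120,000 a year at the low rate and about $300,000 at the high rate.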
The Los Angeles Housing Department, one of the city agencies involved in responding to the homelessness emergency Mayor Karen Bass declared in December 2022, may be particularly exposed. Case workers and inspection teams in neighborhoods from Skid Row to Van Nuys generate field photographs daily: before-and-after shots of encampment clearances, interim housing unit inspections, and shelter capacity documentation. Those images flow into at least three separate tracking platforms the department uses, according to publicly available procurement records, so duplication rates in that workflow could exceed the industry average.
Fixing It: Tools, Timelines, and Trade-offs
Automated duplicate image detection tools, software that uses hashing or pixel-level comparison algorithms to identify redundant files, have dropped sharply in price over the past five years. A cryptographic hash such as SHA-256 flags only byte-identical copies; a perceptual hash fingerprints what an image looks like, so it can also match resized or recompressed versions of the same photo. Several open-source options exist, such as dupeGuru and Czkawka, and commercial platforms marketed to government clients now offer deduplication as a standard feature rather than an add-on. The Los Angeles County Metropolitan Transportation Authority (Metro), which operates bus and rail across Los Angeles County and manages its own substantial surveillance and infrastructure image archive, began piloting one such system in fiscal year 2024-25, according to publicly available board agenda documents.
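To illustrate how perceptual hashing works, here is a minimal sketch of one common technique, the difference hash (dHash). It is not the method of any particular vendor or of Metro's pilot. It assumes images have already been decoded into grids of grayscale values from 0 to 255; real tools decode JPEG or TIFF files and resize them with proper filtering. The helper names and the synthetic test images are hypothetical.

```python
# Minimal difference-hash (dHash) sketch for near-duplicate image detection.
# Assumption: each image is a 2D list of grayscale values (0-255).

def _resize_nearest(pixels, width, height):
    """Downsample a 2D grayscale grid using nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]

def dhash(pixels, hash_size=8):
    """Return a (hash_size * hash_size)-bit integer encoding brightness gradients."""
    small = _resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # One bit per neighbouring pair: 1 if brightness rises left to right.
            bits = (bits << 1) | (1 if right > left else 0)
    return bits

def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")

# Synthetic 64x64 "images": a gradient, a uniformly brightened copy,
# and a mirrored version that is visually a different picture.
original = [[c * 3 + r // 8 for c in range(64)] for r in range(64)]
brighter = [[min(255, p + 20) for p in row] for row in original]
mirrored = [row[::-1] for row in original]

print(hamming(dhash(original), dhash(brighter)))  # 0: flagged as a near-duplicate
print(hamming(dhash(original), dhash(mirrored)))  # 64: every bit differs
```

Because dHash records only whether brightness rises or falls between neighboring pixels, a uniformly brightened copy yields the same hash while a genuinely different image lands far away. In practice, tools treat hashes within a small Hamming distance, a threshold tuned per archive, as near-duplicates.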
The practical path forward for most departments involves three steps: running a full repository audit to establish a baseline duplication rate, implementing hash-based detection on any new image ingest pipeline, and setting a retention policy that designates a single canonical copy of each image with clear metadata. None of this is technically complicated. The obstacle is almost always budget priority and staff time — both of which, in Los Angeles right now, are consumed by more visible crises.
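Those three steps can be sketched with Python's standard library alone. This version catches only byte-identical files via SHA-256 content hashes, defines the baseline duplication rate as the share of image files that are extra copies, and uses a hypothetical "shortest path wins" rule to choose the canonical copy; a real retention policy would more likely key on metadata such as the system of record or upload date. The function names and file extensions are illustrative assumptions, not any department's actual tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: which file types count as images; adjust per archive.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(root):
    """Step 1: group byte-identical image files under `root`; return groups and rate."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[sha256_of(path)].append(path)
    total = sum(len(paths) for paths in groups.values())
    redundant = total - len(groups)  # every copy beyond the first in each group
    rate = redundant / total if total else 0.0
    return dict(groups), rate

def is_duplicate_on_ingest(path, known_hashes):
    """Step 2: check a new upload against content hashes already in the archive."""
    return sha256_of(path) in known_hashes

def canonical_copies(groups):
    """Step 3 (hypothetical rule): keep the shortest path, ties broken alphabetically."""
    return {
        digest: min(paths, key=lambda p: (len(str(p)), str(p)))
        for digest, paths in groups.items()
    }
```

A department could run `groups, rate = audit("/path/to/archive")` once to establish its baseline, then keep `set(groups)` as the hash index for ingest checks. Near-duplicates, such as the same photo exported at two resolutions, would still need a perceptual hash layered on top.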
For residents and watchdog groups tracking city spending through tools like the LA City Controller's open data portal, the duplicate image problem is a useful proxy for a broader question about how well municipal data infrastructure has kept pace with the city's ambitions. The 2028 deadline is fixed. The storage bills are not going down on their own.