Los Angeles city departments collectively manage tens of millions of digital image files, and a growing share of that inventory is redundant — the same photograph stored two, five, sometimes a dozen times across disconnected servers. The problem has a name in the data management world: duplicate image proliferation. And in LA, it is costing real money.
The timing matters because the city is mid-sprint on several technology-heavy initiatives. The 2028 Olympic organizing committee, LA28, is building out digital asset management systems to handle promotional content, venue photography, and media rights files. The mayor's housing office is cataloguing property conditions as part of the Bass administration's housing emergency declaration. Both efforts depend on clean, deduplicated image libraries, and both are running into the same wall.
What the Data Actually Shows
Cloud storage pricing gives the problem a dollar value. Amazon Web Services' S3 Standard storage tier, widely used by public-sector contractors in California, runs approximately $0.023 per gigabyte per month as of mid-2026. A single high-resolution image from a modern DSLR or drone can clock in at 25 to 50 megabytes. Multiply that by millions of duplicates sitting idle and the monthly bill adds up fast. The duplicates are not rare, either: industry benchmarks from data management firms suggest that between 30 and 40 percent of unmanaged digital asset libraries consist of exact or near-exact duplicate files.
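To put rough numbers on that, here is a back-of-envelope sketch in Python. The per-gigabyte rate and the file sizes come from the figures above. The count of one million duplicate files is a hypothetical chosen purely for illustration, not a measured figure for any city department. The calculation also assumes decimal units (1 GB = 1,000 MB) and a flat rate with no tiering, request, or transfer charges.

```python
# Back-of-envelope storage cost for duplicate images.
# Rate and file sizes are the article's figures; the duplicate count
# is a HYPOTHETICAL illustration, not a measured number.

S3_RATE_PER_GB_MONTH = 0.023  # USD per GB per month (approximate)
MB_PER_GB = 1000              # assumption: decimal units


def monthly_storage_cost(file_count, avg_size_mb, rate=S3_RATE_PER_GB_MONTH):
    """Return the monthly cost in USD of storing file_count files."""
    total_gb = file_count * avg_size_mb / MB_PER_GB
    return total_gb * rate


HYPOTHETICAL_DUPLICATES = 1_000_000  # assumption for illustration only

low = monthly_storage_cost(HYPOTHETICAL_DUPLICATES, 25)   # 25 MB per image
high = monthly_storage_cost(HYPOTHETICAL_DUPLICATES, 50)  # 50 MB per image

print(f"Monthly: ${low:,.0f} to ${high:,.0f}")
print(f"Yearly:  ${low * 12:,.0f} to ${high * 12:,.0f}")
```

Under those assumptions, one million redundant files cost roughly $575 to $1,150 a month, or about $6,900 to $13,800 a year, before any charges for requests, data transfer, or backups.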
The Los Angeles County Museum of Art on Wilshire Boulevard undertook a digital collection audit in 2024 and found the exercise required dedicated staff time over several months just to flag redundant files in its public image catalog. The Getty Center in Brentwood, which manages one of the largest art image repositories in the western United States, has invested in proprietary deduplication tooling as part of its broader digital infrastructure work. Neither institution has published a specific dollar figure tied to the cleanup, but the labor and licensing costs associated with large-scale deduplication projects routinely run into six figures for collections of comparable size.
For city government, the stakes are higher. The Los Angeles Housing Department, operating under the Mayor's emergency housing directives, has been photographing properties across South LA, Boyle Heights, and the San Fernando Valley to document conditions tied to code enforcement and interim housing programs. Field teams using mobile devices upload images through multiple apps and platforms, and without a centralized deduplication protocol, the same property can end up photographed and stored dozens of times under different file names.
The AI Connection — and the 2028 Deadline
Duplicate images are more than a storage expense. They distort machine learning models. When AI systems used for property assessment, crowd management planning at venues like SoFi Stadium in Inglewood, or permitting workflows are trained on heavily duplicated image datasets, the models can overweight the repeated visual patterns and underperform on edge cases. If copies of the same image also land in both the training and evaluation sets, measured accuracy looks better than it really is. Data scientists refer to this as training data contamination.
LA28 has a hard deadline. The Olympic Games open on July 14, 2028, less than 24 months away, with the Paralympic Games following in August. The organizing committee's technology partners are already building the content pipelines that will handle hundreds of thousands of venue, athlete, and event images. Getting deduplication protocols in place before that library scales is significantly cheaper than cleaning it up afterward.
Perceptual hashing is the standard technical solution. These algorithms generate a compact fingerprint for each image based on its visual content rather than its file metadata, so that visually similar images produce fingerprints differing in only a few bits. That catches duplicates even when file names, formats, or compression levels differ. Open-source libraries like ImageHash and commercial platforms from vendors such as Cloudinary offer this at scale. Los Angeles–based digital production firms in Culver City and the broader Hollywood tech corridor have been quietly building deduplication workflows into their post-production pipelines for the past three years, driven partly by the economics of streaming content delivery.
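The idea is easier to see in miniature. The sketch below is a simplified, pure-Python version of one common perceptual hash, the "average hash," and is not the code any vendor or library actually ships. It assumes an image has already been decoded into a grid of grayscale values from 0 to 255. The function names, the tiny synthetic test images, and the 5-bit match threshold are all hypothetical choices made for illustration. A real pipeline would more likely lean on a library such as ImageHash.

```python
# Simplified average-hash sketch (illustrative, not production code).
# Input: a 2D list of grayscale values (0-255) whose height and width
# are multiples of hash_size.

def average_hash(pixels, hash_size=8):
    """Return a hash_size*hash_size-bit fingerprint as an integer."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size

    # Step 1: shrink the image to hash_size x hash_size by block averaging.
    cells = []
    for row in range(hash_size):
        for col in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(row * block_h, (row + 1) * block_h)
                for x in range(col * block_w, (col + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))

    # Step 2: each bit records whether a cell is brighter than the mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic 16x16 test images (hypothetical stand-ins for real photos).
original = [[x * 16 for x in range(16)] for _ in range(16)]      # left-to-right gradient
brighter = [[min(255, v + 10) for v in row] for row in original]  # same scene, re-exposed
different = [[y * 16 for _ in range(16)] for y in range(16)]      # top-to-bottom gradient

MATCH_THRESHOLD = 5  # assumption: at most 5 differing bits counts as a duplicate

same_scene = hamming_distance(average_hash(original), average_hash(brighter))
other_scene = hamming_distance(average_hash(original), average_hash(different))

print("brighter copy is duplicate:", same_scene <= MATCH_THRESHOLD)
print("different image is duplicate:", other_scene <= MATCH_THRESHOLD)
```

Because each bit encodes brightness relative to the image's own average, a uniformly brightened copy yields the same fingerprint, while a visually different image differs in many bits. The threshold is the main tuning decision: set it too low and recompressed copies slip through, too high and distinct photos of similar scenes get merged.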
For city agencies and nonprofits still running manual processes, the practical first step is an audit. Cataloguing what exists across Google Drive folders, SharePoint instances, and legacy on-premises servers is unglamorous work, but every duplicate identified before a major AI training run or a public-facing digital launch is one less error baked into the system downstream.
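A first-pass audit does not need specialized tooling. The sketch below uses only the Python standard library and assumes the files have already been copied or synced to a local folder. It groups files by a SHA-256 digest of their contents, so byte-identical copies are caught regardless of file name. The function names are hypothetical, and this approach finds exact duplicates only. Resized or recompressed copies would still need perceptual hashing.

```python
# Exact-duplicate audit sketch using only the standard library.
# Assumes the files to audit are reachable under one local folder.
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Map each content digest to the sorted paths that share it.

    Only digests shared by two or more files are returned.
    """
    by_digest = {}
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            by_digest.setdefault(file_digest(path), []).append(path)
    return {
        digest: sorted(paths)
        for digest, paths in by_digest.items()
        if len(paths) > 1
    }


def reclaimable_bytes(duplicate_groups):
    """Bytes freed if one copy per group is kept and the rest removed."""
    total = 0
    for paths in duplicate_groups.values():
        total += os.path.getsize(paths[0]) * (len(paths) - 1)
    return total
```

Calling find_exact_duplicates on an exported folder returns the groups of identical files, and reclaimable_bytes turns those groups into a storage figure that can be priced at the cloud rate cited earlier.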