The Daily Los Angeles

Los Angeles news, every day


LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

City agencies, nonprofits, and cultural institutions across Los Angeles are grappling with bloated digital libraries where duplicate image files are quietly eating storage budgets and slowing down AI-driven projects tied to the 2028 Olympics and the ongoing homelessness response.

By Los Angeles News Desk · Published 4 July 2026, 12:16 pm


Photo: Belle Co on Pexels

Los Angeles city departments collectively manage tens of millions of digital image files, and a growing share of that inventory is redundant — the same photograph stored two, five, sometimes a dozen times across disconnected servers. The problem has a name in the data management world: duplicate image proliferation. And in LA, it is costing real money.

The timing matters because the city is mid-sprint on several technology-heavy initiatives. The 2028 Olympic organizing committee, LA28, is building out digital asset management systems to handle promotional content, venue photography, and media rights files. The mayor's housing office is cataloguing property conditions as part of the Bass administration's housing emergency declaration. Both efforts depend on clean, deduplicated image libraries, and both are running into the same wall.

What the Data Actually Shows

Cloud storage pricing gives the problem a dollar value. Amazon Web Services S3 standard storage, widely used by public-sector contractors in California, runs approximately $0.023 per gigabyte per month as of mid-2026. A single high-resolution image from a modern DSLR or drone can clock in at 25 to 50 megabytes. Multiply that by millions of duplicates sitting idle and the monthly bill becomes a recurring line item. Industry benchmarks from data management firms suggest that between 30 and 40 percent of unmanaged digital asset libraries consist of exact or near-exact duplicate files.
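To put rough numbers on that, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the 10-million-image library is a hypothetical figure, not a count reported by any agency, and the function name is invented for this example. The price, file sizes, and duplicate shares are the ones quoted above, and gigabytes are treated as 1,000 megabytes for simplicity.

```python
# Back-of-the-envelope estimate of what duplicate images cost to store.
# The rate, file sizes and duplicate shares come from the figures quoted
# in the article; the library size is a HYPOTHETICAL illustration.

S3_PRICE_PER_GB_MONTH = 0.023  # USD, approximate S3 standard rate cited above


def duplicate_storage_cost(total_images, avg_size_mb, duplicate_share,
                           price_per_gb=S3_PRICE_PER_GB_MONTH):
    """Return the estimated monthly cost (USD) of storing the duplicates."""
    duplicate_count = total_images * duplicate_share
    duplicate_gb = duplicate_count * avg_size_mb / 1000  # assumes 1 GB = 1,000 MB
    return duplicate_gb * price_per_gb


# Hypothetical library of 10 million images.
low = duplicate_storage_cost(10_000_000, avg_size_mb=25, duplicate_share=0.30)
high = duplicate_storage_cost(10_000_000, avg_size_mb=50, duplicate_share=0.40)

print(f"Low estimate:  ${low:,.0f} per month")   # about $1,725
print(f"High estimate: ${high:,.0f} per month")  # about $4,600
```

Under those assumptions the duplicates alone would cost somewhere between roughly $1,725 and $4,600 a month in raw storage, before backups, data transfer, or staff time are counted.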

The Los Angeles County Museum of Art on Wilshire Boulevard undertook a digital collection audit in 2024 and found the exercise required dedicated staff time over several months just to flag redundant files in its public image catalog. The Getty Center in Brentwood, which manages one of the largest art image repositories in the western United States, has invested in proprietary deduplication tooling as part of its broader digital infrastructure work. Neither institution has published a specific dollar figure tied to the cleanup, but the labor and licensing costs associated with large-scale deduplication projects routinely run into six figures for collections of comparable size.

For city government, the stakes are higher. The Los Angeles Housing Department, operating under the Mayor's emergency housing directives, has been photographing properties across South LA, Boyle Heights, and the San Fernando Valley to document conditions tied to code enforcement and interim housing programs. Field teams using mobile devices upload images through multiple apps and platforms, and without a centralized deduplication protocol, the same property can end up photographed and stored dozens of times under different file names.

The AI Connection — and the 2028 Deadline

Duplicate images are more than a storage expense. They distort machine learning models. When AI systems used for property assessment, crowd management planning at venues like SoFi Stadium in Inglewood, or permitting workflows are trained on image datasets that contain heavy duplication, the models can overweight certain visual patterns and underperform on edge cases. When copies of the same image land in both the training set and the evaluation set, measured accuracy is inflated as well, a problem data scientists refer to as training data contamination.

LA28 has a hard deadline. The Olympic Games open on July 14, 2028, just over 24 months away, with the Paralympic Games to follow. The organizing committee's technology partners are already building the content pipelines that will handle hundreds of thousands of venue, athlete, and event images. Getting deduplication protocols in place before that library scales is significantly cheaper than cleaning it up afterward.

Perceptual hashing is the standard technical solution — algorithms that generate a fingerprint for each image based on visual content rather than file metadata, catching duplicates even when file names, formats, or compression levels differ. Tools including open-source libraries like ImageHash and commercial platforms from vendors such as Cloudinary offer this at scale. Los Angeles–based digital production firms in Culver City and the broader Hollywood tech corridor have been quietly building deduplication workflows into their post-production pipelines for the past three years, driven partly by the economics of streaming content delivery.
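For readers who want to see the idea in miniature, here is a simplified sketch of one perceptual hashing scheme, usually called an average hash. It is not the ImageHash or Cloudinary implementation. It works on a plain grid of grayscale values rather than a real image file so that it runs with nothing but the Python standard library, and the function names and the distance threshold of 5 are assumptions made for this example.

```python
# Simplified average hash: shrink the image to an 8x8 grid of brightness
# averages, then record one bit per cell (brighter than the mean or not).
# Visually similar images produce fingerprints that differ in few bits.

def average_hash(pixels, hash_size=8):
    """pixels: 2D list of grayscale values, at least hash_size in each dimension."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        y0, y1 = row * height // hash_size, (row + 1) * height // hash_size
        for col in range(hash_size):
            x0, x1 = col * width // hash_size, (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=5):
    # max_distance is an illustrative threshold, not an industry standard.
    return hamming_distance(hash_a, hash_b) <= max_distance


# Synthetic test images: a left-to-right gradient, a slightly brighter copy,
# a half-resolution copy, and an unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[x * 8 + 5 for x in range(32)] for _ in range(32)]
smaller = [[x * 16 for x in range(16)] for _ in range(16)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

print(hamming_distance(average_hash(original), average_hash(brighter)))   # 0
print(hamming_distance(average_hash(original), average_hash(smaller)))    # 0
print(hamming_distance(average_hash(original), average_hash(different)))  # 32
```

The brighter and downsized copies yield the same fingerprint as the original even though none of their bytes match, while the unrelated image differs in half of its 64 bits. Production tools use more robust variants, but the comparison step works the same way.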

For city agencies and nonprofits still running manual processes, the practical first step is an audit. Cataloguing what exists — across Google Drive folders, SharePoint instances, and legacy on-premise servers — is unglamorous work, but every duplicate identified before a major AI training run or a public-facing digital launch is one less error baked into the system downstream.
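A first-pass audit of that kind can be as simple as the sketch below, which walks a folder and groups files whose contents are byte-for-byte identical. It assumes the files have already been synced or exported to a local directory, and the function names are invented for this example. It catches only exact copies; resized or re-compressed versions need the perceptual approach described above.

```python
# First-pass audit: find byte-identical files under a folder by comparing
# SHA-256 digests of their contents. File names are ignored entirely.

import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return a list of groups; each group is a list of paths with identical content."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates` on an exported folder returns one list per set of identical files, which gives staff a concrete worklist before any files are deleted.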


About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
