The Daily Los Angeles

LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

City agencies, nonprofits, and production houses across Los Angeles are spending millions managing bloated image libraries full of redundant files, and a new wave of automated deduplication tools is exposing just how bad the problem got.

By Los Angeles News Desk · Published 4 July 2026, 11:51 am

Photo: ubeyonroad on Pexels

Los Angeles city departments collectively store tens of millions of digital image files across fragmented server systems, and internal audits at several agencies have found that duplicate or near-duplicate images routinely account for between 30 and 40 percent of total storage consumption. That's not a rounding error — it's a structural inefficiency that translates directly into budget waste, slower retrieval times, and compounding liability when outdated photos get used in official communications.

The issue has landed with new urgency in 2026 because the city is mid-sprint on infrastructure buildout for the 2028 Olympics, with agencies like the Los Angeles Department of Public Works and the Los Angeles Tourism and Convention Board generating photographic documentation at an accelerated rate. Every groundbreaking, every venue walkthrough, every community engagement event produces hundreds of raw image files. Without a systematic deduplication protocol, those libraries balloon fast.

What the Numbers Actually Look Like

Cloud storage costs the city's Information Technology Agency roughly $0.023 per gigabyte per month on standard enterprise contracts, a figure consistent with published AWS and Google Cloud pricing tiers for large municipal accounts. At scale, and Los Angeles operates at serious scale, storing an extra 50 terabytes of redundant image data costs an estimated $1,150 per month, or roughly $13,800 annually, for that slice alone. If a dozen departments each carried a similar load, the total would reach about $165,600 a year, well into six figures before anyone has written a single line of remediation code.
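The storage arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not a city costing model: the function name is hypothetical, terabytes are assumed to be decimal (1,000 gigabytes), and the dozen-department total assumes every department carries the same 50-terabyte surplus.

```python
# Figures taken from the article's estimate; the helper name is illustrative.
PRICE_PER_GB_MONTH = 0.023   # dollars per gigabyte per month
GB_PER_TB = 1000             # assumption: decimal terabytes


def redundant_storage_cost(redundant_tb, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Return (monthly, annual) cost in dollars of storing redundant data."""
    monthly = redundant_tb * GB_PER_TB * price_per_gb_month
    return monthly, monthly * 12


monthly, annual = redundant_storage_cost(50)
print(f"One department: ${monthly:,.0f} per month, ${annual:,.0f} per year")

# Assumption: twelve departments, each with the same 50 TB of duplicates.
print(f"Twelve departments: ${annual * 12:,.0f} per year")
```

Changing the price or the terabyte figure shows how quickly the total moves with either input.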

The Los Angeles County Museum of Art on Wilshire Boulevard, which manages one of the largest publicly accessible art image databases on the West Coast, began a deduplication initiative in early 2025 covering its digital collections portal. Staff there have not publicly disclosed final figures, but industry benchmarks from comparable museum digitization projects suggest that libraries of LACMA's scope, running into the hundreds of thousands of catalogued images, typically yield duplicate rates of 15 to 25 percent when forensic hashing tools are applied for the first time. That means tens of thousands of redundant files sitting on servers, each one requiring metadata maintenance, backup cycles, and periodic access audits.

The entertainment sector compounds the picture. Production companies based along the Cahuenga Pass corridor and in Burbank's media district generate enormous image asset libraries tied to pre-production, marketing, and archival workflows. A 2024 report from the Software & Information Industry Association found that media and entertainment firms waste an average of 18 percent of their digital asset management budgets on storage and administration of duplicate files. For a mid-size production house carrying a $2 million annual DAM budget, that's $360,000 a year in avoidable overhead.

The Tools Gaining Ground in LA's Tech and Government Sectors

Automated deduplication software, meaning tools that use perceptual hashing, pixel-level comparison, and increasingly machine-learning classifiers, has matured significantly since 2022. The basic principle is straightforward: the software generates a compact fingerprint for each image file, built so that visually similar images produce similar fingerprints, and flags pairs whose similarity exceeds a threshold, whether they are identical duplicates or near-duplicates such as slightly cropped versions of the same shot. What used to require dedicated database engineers now runs as a cloud-native service.
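To make the fingerprint-and-threshold idea concrete, here is a minimal sketch of one simple perceptual hash, the average hash, in plain Python. It is an illustration only, not the method used by any product or agency named in this story. It assumes each image has already been shrunk to a small grayscale grid (8 by 8 here), the 0.9 threshold is an arbitrary choice, and all function names are hypothetical.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale grid: one bit per pixel,
    set when that pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits, len(flat)


def similarity(pixels_a, pixels_b):
    """Share of fingerprint bits that agree, from 0.0 to 1.0."""
    hash_a, n_bits = average_hash(pixels_a)
    hash_b, _ = average_hash(pixels_b)
    differing = bin(hash_a ^ hash_b).count("1")  # Hamming distance
    return 1 - differing / n_bits


def is_near_duplicate(pixels_a, pixels_b, threshold=0.9):
    """Flag a pair whose fingerprints agree on at least `threshold` of bits."""
    return similarity(pixels_a, pixels_b) >= threshold


# Toy data: a gradient, a slightly brightened copy, and an inverted copy.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[v + 3 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(is_near_duplicate(original, brightened))  # brightened copy is flagged
print(is_near_duplicate(original, inverted))    # inverted copy is not
```

Production tools generally use sturdier fingerprints than this one, which tolerates brightness shifts but would handle cropping poorly.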

The Los Angeles Housing Department, which has been at the center of Mayor Karen Bass's housing emergency declaration, maintains photographic records tied to inspections, code enforcement actions, and shelter program documentation. As the department has scaled operations since the emergency declaration took effect in December 2022, its image intake has accelerated proportionally. Technology procurement records for the city, which are public under California's Government Code, show the department has evaluated storage optimization tools as part of broader IT modernization contracts in fiscal year 2025-26.

For smaller nonprofits operating out of places like the Downtown LA Arts District or the Boyle Heights community corridor, the practical advice is simpler and cheaper. Open-source tools including dupeGuru and digiKam provide free similarity-based image deduplication for organizations without enterprise IT budgets. Running either tool on an unmanaged image library of 100,000 files typically takes under four hours on standard hardware and, based on documented user benchmarks, recovers between 20 and 35 gigabytes of storage on a first pass. As the 2028 Olympic clock ticks and documentation demands on every city-adjacent organization keep rising, running that scan once a quarter is the minimum viable practice.
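For organizations that want a first look before installing anything, the simplest form of such a scan can be sketched with Python's standard library. This is a rough illustration under stated assumptions: it finds only byte-for-byte identical files using SHA-256, so it will miss the resized or cropped near-duplicates that dedicated tools catch. The function names are hypothetical, and it only reports duplicates without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Map each digest to the files sharing it, keeping only groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed if one copy from each duplicate group were kept."""
    total = 0
    for paths in duplicates.values():
        total += sum(os.path.getsize(p) for p in paths[1:])
    return total
```

Calling `find_exact_duplicates` on a library's root folder and passing the result to `reclaimable_bytes` gives a floor on what a quarterly cleanup could recover.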

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
