The Daily Los Angeles

LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

City agencies, museums, and entertainment studios across Los Angeles are sitting on billions of redundant digital files, and the price of doing nothing keeps climbing.

By Los Angeles News Desk · Published 4 July 2026, 11:45 am

Los Angeles city departments likely store 40 to 60 percent of their digital image archives as duplicate or near-duplicate files, an estimate based on digital asset management industry benchmarks applied to municipal systems of comparable scale. That redundancy translates directly into wasted server capacity, inflated licensing costs, and slower emergency-response workflows at a moment when the city can least afford any of them.

The timing matters. With the 2028 Summer Olympics infrastructure buildout accelerating across venues from SoFi Stadium in Inglewood to the Los Angeles Memorial Coliseum in Exposition Park, city agencies, contractors, and media partners are generating photographic and video documentation at an unprecedented rate. Construction progress photos, aerial drone surveys, accessibility compliance images — every project layer adds to a digital pile that nobody is systematically cleaning up.

The Scale of the Problem Across LA's Institutions

The Los Angeles County Museum of Art, which holds one of the largest digitized art collections on the West Coast, has publicly noted the challenge of managing millions of catalogued image records without producing redundant entries across multiple database migrations. The Getty Center in Brentwood, which oversees the Getty Conservation Institute's extensive photographic documentation of cultural heritage sites, operates internal deduplication protocols as a matter of standard practice. Many smaller institutions do not.

The Los Angeles Public Library system, which spans 73 locations from the Central Library on West Fifth Street downtown to the Exposition Park branch, digitized roughly 3 million photographs as part of its California Historical Society partnership over the past decade. Digital archivists working with collections of this size routinely report duplicate rates between 15 and 35 percent following bulk ingestion events, when images scanned from physical originals get uploaded in multiple batches without real-time deduplication checks.

For the entertainment industry, the numbers scale into genuinely staggering territory. A single major studio production generates between 500 gigabytes and 2 terabytes of still photography per shoot day. Post-production and publicity workflows at companies operating out of the Paramount Pictures lot in Hollywood or Warner Bros. in Burbank can involve the same image asset passing through five or six separate internal systems — each one potentially storing its own copy. Industry estimates from digital asset management consultancies suggest that storage redundancy across a mid-size studio's annual slate can account for 20 to 30 percent of total cloud storage spend.

What Deduplication Actually Costs — and Saves

Cloud storage pricing gives the problem concrete dollar values. Amazon Web Services S3 standard storage runs approximately $0.023 per gigabyte per month as of mid-2026. An organization sitting on 500 terabytes of image data — not unusual for a major city department or production company — pays roughly $11,500 monthly just for storage. If 25 percent of that is duplicates, that's nearly $3,000 a month spent storing files that add zero informational value.
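The arithmetic behind those figures can be checked with a short sketch. It assumes a single flat rate of $0.023 per gigabyte and decimal terabytes (1,000 gigabytes each); real cloud bills use tiered pricing and add request and transfer fees, so this is illustrative only. The function names are made up for this example.

```python
# Illustrative only: the price and archive size are the round figures quoted above.
PRICE_PER_GB_MONTH = 0.023  # approximate flat rate, assumed to apply to every gigabyte
GB_PER_TB = 1000            # decimal terabytes, an assumption


def monthly_storage_cost(terabytes: float, price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Return the monthly bill in dollars at a flat per-gigabyte rate."""
    return terabytes * GB_PER_TB * price_per_gb


def duplicate_cost(terabytes: float, duplicate_share: float) -> float:
    """Return the monthly dollars spent on the duplicated share of an archive."""
    return monthly_storage_cost(terabytes) * duplicate_share


total = monthly_storage_cost(500)
wasted = duplicate_cost(500, 0.25)
print(f"Total: ${total:,.0f}/month, duplicates: ${wasted:,.0f}/month")
```

For a 500-terabyte archive with a 25 percent duplicate share, this works out to about $11,500 and $2,875 a month, matching the figures above.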

Deduplication software licensing typically runs between $8,000 and $40,000 annually for enterprise-grade tools, depending on dataset size and indexing complexity. The math on return on investment closes fast. Most organizations that run systematic deduplication audits recover costs within six to nine months, according to published case studies from vendors including Cloudinary and Bynder, both of which have clients in the Los Angeles market.
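A rough break-even check for the 500-terabyte example is sketched below. It assumes every duplicate is actually deleted and that storage savings are the only benefit; staff time, migration costs, and tiered pricing are ignored. The `payback_months` helper and the $2,875 monthly savings input are illustrative, not drawn from any vendor's case study.

```python
def payback_months(annual_license: float, monthly_savings: float) -> float:
    """Months of storage savings needed to cover one year of licensing."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return annual_license / monthly_savings


# Hypothetical input: $2,875 a month is 25 percent of an $11,500 storage bill
# (500 TB at about $0.023 per gigabyte).
MONTHLY_SAVINGS = 2875.0

for license_cost in (8_000, 40_000):
    months = payback_months(license_cost, MONTHLY_SAVINGS)
    print(f"${license_cost:,} license: {months:.1f} months to break even")
```

Under these assumptions the break-even point ranges from about 2.8 months at the low end of the licensing range to about 13.9 months at the high end, so where a given organization lands depends heavily on its archive size and duplicate rate.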

The urgency extends beyond budgets. The Los Angeles Fire Department's after-action documentation from the January 2025 Palisades and Eaton fires produced an enormous volume of aerial and ground-level imagery. Duplicate and mislabeled files in emergency archives slow down insurance processing, legal discovery, and future preparedness planning — all of which are active concerns as wildfire season 2026 approaches a city still rebuilding.

For any organization in Los Angeles sitting on unaudited image archives, the practical path forward starts with a storage audit using hash-based comparison tools, which flag exact duplicates, followed by perceptual hashing software that catches near-duplicates — slightly cropped or recolored versions of the same image. Several digital asset management firms maintain offices in Culver City and West Hollywood. The technology is mature. The main obstacle, consistently, is organizational will to schedule the work before a storage bill or a compliance deadline forces the issue.
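The two-stage approach described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the first part groups byte-identical files by SHA-256 digest, and the second shows one simple form of perceptual hashing (an "average hash") on a small grayscale grid. Shrinking a real photo down to that grid needs an imaging library and is left out; all function names here are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid's mean.

    `pixels` is a small grayscale grid (for example 8x8) that an imaging
    library would produce by shrinking the photo; that step is omitted here.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Toy 2x2 grids standing in for shrunken images (made-up values).
original = [[10, 200], [200, 10]]
brightened = [[20, 210], [210, 20]]  # same picture, uniformly lighter
inverted = [[200, 10], [10, 200]]    # a different picture

print(hamming_distance(average_hash(original), average_hash(brightened)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

In the toy example the brightened copy hashes identically to the original (distance 0), while the different image differs in all four bits. In practice a small distance threshold flags likely near-duplicates for human review, and an average hash copes better with brightness or color shifts than with cropping, which is one reason production tools use more robust variants.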

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
