The Daily Los Angeles


LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Show Why It's a Growing Problem

City agencies, cultural institutions, and Olympic planners are grappling with billions of redundant image files eating up storage budgets and slowing down critical infrastructure projects.

By Los Angeles News Desk · Published 4 July 2026, 12:00 pm

Photo: RDNE Stock project on Pexels

Los Angeles city agencies collectively stored an estimated 4.2 petabytes of digital imagery across municipal servers as of the most recent audit cycle — and duplicate image files account for somewhere between 30 and 40 percent of that total, according to figures presented to the city's Information Technology Agency during a budget review earlier this year. That's roughly 1.5 petabytes of redundant data that costs real money to maintain, back up, and secure.

The problem isn't abstract. With the 2028 Olympics infrastructure buildout accelerating across venues from SoFi Stadium in Inglewood to the Los Angeles Memorial Coliseum on Figueroa Street, the volume of project documentation photos, drone survey images, and architectural renderings flowing through city systems has spiked sharply. Construction monitoring alone generates thousands of timestamped photographs weekly. Without systematic duplicate-detection and replacement workflows, storage costs balloon and file retrieval slows to a crawl at exactly the wrong moment.

The Scale of the Problem Across City Systems

The Los Angeles County Metropolitan Transportation Authority — Metro — manages one of the most image-heavy operational datasets in the region, pulling in surveillance footage, infrastructure inspection photos, and public communications assets around the clock across 93 rail stations and hundreds of bus lines. The agency's IT procurement records show it renewed its enterprise storage contracts through fiscal year 2026-27, a deal that reflects ongoing pressure to keep pace with data growth rather than reduce it through deduplication.

The Los Angeles Public Library system, which operates 73 branch locations including the landmark Central Library on West 5th Street in Downtown, digitized roughly 3 million historical photographs and maps under its Digital Collections program. Librarians working with that archive have flagged that multiple scanning campaigns over successive years — often conducted by different vendors without coordinated metadata standards — left thousands of near-identical image files in the system. Some items were scanned three or four times at different resolutions, with no automated process to consolidate them into a single canonical file with resolution variants.

The practical consequence is slower search performance, higher cloud storage invoices, and archivists who spend hours on manual file triage rather than public programming. Cloud storage pricing for institutions at that scale typically runs between $0.02 and $0.05 per gigabyte per month for standard-tier access. At those rates, even a conservative 500-terabyte reduction in redundant files (about 500,000 gigabytes) saves $10,000 to $25,000 a month, or roughly $120,000 to $300,000 a year. That is meaningful for a single department and larger still when multiplied across a dozen.
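The savings estimate is simple multiplication, and it is easy to misplace a factor of 12 between monthly and annual figures. The sketch below reproduces it under stated assumptions: decimal units (1 TB = 1,000 GB) and a flat per-gigabyte monthly rate. The helper name `storage_savings` is hypothetical, and real invoices also involve tiered pricing, backup copies, and retrieval fees that this ignores.

```python
# Hypothetical helper: rough storage savings from removing redundant files.
# Assumes decimal units (1 TB = 1,000 GB) and a flat per-GB monthly rate;
# ignores tiered pricing, egress, backup copies, and early-deletion fees.

GB_PER_TB = 1_000


def storage_savings(terabytes_removed: float, price_per_gb_month: float) -> tuple[float, float]:
    """Return (monthly, annual) savings in dollars for a flat per-GB rate."""
    monthly = terabytes_removed * GB_PER_TB * price_per_gb_month
    return monthly, monthly * 12


# The 500 TB reduction at the low and high ends of the quoted price range.
for rate in (0.02, 0.05):
    monthly, annual = storage_savings(500, rate)
    print(f"${rate:.2f}/GB-month: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

At $0.02 this works out to $10,000 a month ($120,000 a year); at $0.05, $25,000 a month ($300,000 a year).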

What Deduplication Actually Looks Like in Practice

The city's Bureau of Engineering, headquartered on South Spring Street, has been piloting perceptual hashing tools — software that assigns a fingerprint to each image and flags visually identical or near-identical files regardless of filename or metadata — as part of its broader document management overhaul. Perceptual hashing can process a library of one million images in under two hours on mid-range server hardware, flagging duplicates for human review before any deletion occurs.
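The Bureau's specific tools are not named, so the sketch below illustrates the general idea using average hashing ("aHash"), one common and simple perceptual hash. It assumes each image has already been converted to grayscale and shrunk to an 8x8 grid of 0 to 255 values (normally done with an imaging library; that step is omitted). Because the fingerprint comes from pixel brightness alone, a renamed or re-tagged copy still matches. The function names and the 5-bit match threshold are illustrative assumptions, not a standard setting.

```python
# Minimal average-hash ("aHash") sketch, one common perceptual hashing scheme.
# Assumes images are already grayscale and downscaled to an 8x8 grid of 0-255
# values; filenames and metadata play no part in the fingerprint.

def average_hash(pixels: list[list[int]]) -> int:
    """Return a 64-bit fingerprint: each bit is 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def looks_duplicate(a: int, b: int, threshold: int = 5) -> bool:
    """Flag a pair for human review if fingerprints differ in at most `threshold` bits (assumed cutoff)."""
    return hamming_distance(a, b) <= threshold


# Usage: a synthetic gradient, a slightly brighter "rescan", and an unrelated (inverted) image.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
rescanned = [[min(255, v + 3) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(looks_duplicate(average_hash(original), average_hash(rescanned)))  # near-identical
print(looks_duplicate(average_hash(original), average_hash(inverted)))   # different image
```

A uniform brightness shift leaves every pixel on the same side of the mean, so the rescan gets an identical fingerprint, while the inverted image differs in all 64 bits. A flag here is only a candidate for review, not a deletion decision.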

The workflow matters as much as the technology. Effective duplicate-image replacement programs require a three-stage process: automated flagging, human confirmation for legally sensitive or historically significant files, and then replacement with a single master version plus clearly labeled derivatives. Skipping the middle step — human confirmation — is where agencies run into trouble, accidentally purging unique images that merely resembled others.
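The three stages can be sketched as a small pipeline that never deletes anything on its own: it groups similar fingerprints, asks a reviewer to confirm each group, and only then proposes one master file with the rest kept as labeled derivatives. Everything here is a hypothetical illustration: `ImageRecord`, the `confirm` callback, the greedy grouping, and the rule "highest pixel count becomes the master" are assumptions, not any agency's actual system. It is also stricter than the minimum described above, since it routes every flagged group, not only sensitive ones, to a human.

```python
# Hypothetical three-stage sketch: flag -> human confirmation -> master + derivatives.
# All names and the "largest resolution wins" rule are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ImageRecord:
    path: str
    fingerprint: int  # perceptual hash computed upstream
    width: int
    height: int


@dataclass
class ConsolidationPlan:
    master: ImageRecord
    derivatives: list[ImageRecord] = field(default_factory=list)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def flag_groups(records: list[ImageRecord], threshold: int = 5) -> list[list[ImageRecord]]:
    """Stage 1: greedily group records within `threshold` bits of a group's first member."""
    groups: list[list[ImageRecord]] = []
    for rec in records:
        for group in groups:
            if hamming(group[0].fingerprint, rec.fingerprint) <= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return [g for g in groups if len(g) > 1]  # singletons are unique images


def plan_consolidation(
    group: list[ImageRecord], confirm: Callable[[list[ImageRecord]], bool]
) -> Optional[ConsolidationPlan]:
    """Stages 2 and 3: only a confirmed group gets a plan; the largest file becomes the master."""
    if not confirm(group):
        return None  # reviewer judged these to be distinct images; keep all of them
    ordered = sorted(group, key=lambda r: r.width * r.height, reverse=True)
    return ConsolidationPlan(master=ordered[0], derivatives=ordered[1:])


# Usage with made-up file paths and fingerprints.
scans = [
    ImageRecord("archive/1931_map_a.tif", 0b1011, 4000, 3000),
    ImageRecord("archive/1931_map_b.jpg", 0b1010, 1200, 900),
    ImageRecord("archive/1950_photo.tif", 0xFFFF_0000, 3000, 2000),
]
for group in flag_groups(scans):
    plan = plan_consolidation(group, confirm=lambda g: True)  # stand-in for a reviewer
    if plan:
        print(plan.master.path, [d.path for d in plan.derivatives])
```

Keeping the output as a plan, rather than acting on it, is the design point: a rejected group returns `None` and every file in it survives, which is the safeguard against purging unique images that merely resembled others.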

For the Getty Conservation Institute at the Getty Center in Brentwood, which manages an internationally significant photographic archive, the stakes of getting this wrong are particularly high. The institute has spoken publicly about the complexity of managing large-scale digitization without triggering irreversible data loss.

For city planners, Olympic venue coordinators, and cultural institutions across Los Angeles, the immediate practical step is the same: commission a storage audit before the next budget cycle closes in the fall. Identify which departments lack deduplication protocols, price out the tooling against projected storage growth, and build the human review step into the workflow from day one. The 2028 deadline is closer than it looks, and the image files are only going to keep coming.


About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
