The Daily Los Angeles

Los Angeles news, every day

LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

From city hall records to wildfire damage files, redundant image data is quietly consuming millions in storage budgets across Los Angeles municipal systems.

By Los Angeles News Desk · Published 4 July 2026, 12:00 pm

Photo: Juan Sebastian Vasquez Delgado on Pexels

Los Angeles city departments collectively stored an estimated 4.7 petabytes of digital image data across municipal servers as of the most recent inventory conducted by the city's Information Technology Agency in early 2026 — and IT administrators working inside those systems say a significant share of that load is duplicate or near-duplicate files that serve no archival purpose. The problem is sprawling, expensive, and gaining new urgency as the city races to build out digital infrastructure ahead of the 2028 Summer Olympics.

The timing matters for reasons beyond Olympics prep. Mayor Karen Bass declared a housing emergency in 2023, and the programs that followed — from temporary shelter documentation to fire-damage assessment photography after the January 2025 Palisades and Eaton fires — generated enormous volumes of photographic records in a compressed period. When field workers, contractors, and city inspectors each upload versions of the same site photograph through different portals, the duplication compounds fast. The Los Angeles Housing Department alone operates at least three separate intake systems that can receive image uploads simultaneously, according to department workflow documentation reviewed by The Daily Los Angeles.

The Storage Bill Nobody Wants to Talk About

Cloud storage is not free. The city's contract with enterprise cloud providers, a line item inside the ITA's annual budget submitted to the City Council, runs to tens of millions of dollars per fiscal year. Industry benchmarks from the Storage Networking Industry Association suggest that between 20 and 40 percent of unmanaged enterprise image repositories consist of duplicates. Apply even the low end of that range to LA's 4.7-petabyte figure and the redundant data burden comes to roughly 940 terabytes. At standard enterprise cloud rates of roughly $20 to $23 per terabyte per month, that is about $19,000 to $22,000 every month to store files that serve no archival purpose.
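For readers who want to check the arithmetic, the following is a rough sketch in Python that uses only the figures quoted above. It assumes decimal units (1 petabyte = 1,000 terabytes) and a flat per-terabyte rate with no tiering or discounts; the function name is illustrative, and the result covers only the 4.7-petabyte city figure.

```python
# Back-of-envelope estimate using only the figures quoted in the article.
# Assumption: decimal units, so 4.7 petabytes = 4,700 terabytes.
TOTAL_TB = 4700


def wasted_spend_per_year(duplicate_share, rate_per_tb_month):
    """Annual cost of storing duplicates at a flat per-terabyte monthly rate."""
    duplicate_tb = TOTAL_TB * duplicate_share
    return duplicate_tb * rate_per_tb_month * 12


low = wasted_spend_per_year(0.20, 20)   # 20 percent duplicates at $20 per TB
high = wasted_spend_per_year(0.40, 23)  # 40 percent duplicates at $23 per TB

print(f"${low:,.0f} to ${high:,.0f} per year")
# prints: $225,600 to $518,880 per year
```

The range is wide because both inputs are estimates: the duplicate share is an industry benchmark rather than a measurement of the city's own repositories, and actual contract rates may differ from list prices.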

The Los Angeles County Metropolitan Transportation Authority faces a parallel version of this problem. Metro's project documentation for the Crenshaw/LAX Line extension and the ongoing Purple Line work through Westwood generated photographic inspection records across dozens of contractor teams. Without a centralised deduplication protocol, the same tunnel segment can appear in dozens of separately filed photo sets. Metro's technology division has been piloting image-hash matching tools since late 2025, though the agency has not publicly released findings from that pilot.

At the city level, the Department of Public Works maintains a GIS-linked photo archive that covers everything from pothole documentation on Vermont Avenue to post-storm drainage surveys in the Sepulveda Basin. Public records requests filed by The Daily Los Angeles in May 2026 confirmed that more than 900,000 individual image files were added to the archive in 2025 alone. No automated deduplication layer was active on that repository at the time of the request.

What Deduplication Actually Costs — and Saves

The technology to solve this is not exotic. Perceptual hashing algorithms, which generate a numeric fingerprint for each image and flag near-identical matches, are commercially available through cloud services including Google Cloud Vision and AWS Rekognition, as well as open-source tools like ImageMagick. A mid-sized municipal deployment handling roughly 100 terabytes of image data typically costs between $80,000 and $200,000 for initial implementation, according to procurement data from comparable government technology projects in Denver and Chicago published in 2025.
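To make the fingerprint idea concrete, here is a minimal sketch of an average hash, one of the simplest perceptual hashes. It is an illustration only, not the method used by any vendor or agency named in this story. It assumes each image has already been decoded, converted to grayscale, and shrunk to an 8x8 grid (a real pipeline would use an imaging library for that step), and the distance threshold of 5 is an arbitrary illustrative value.

```python
def average_hash(pixels):
    """Return a 64-bit fingerprint for an 8x8 grid of grayscale values (0-255).

    Each bit records whether one pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=5):
    """Treat two grids as near-duplicates if their fingerprints barely differ.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test data: a gradient, a slightly brighter copy, and an inverted copy.
gradient = [[row * 32 + col * 4 for col in range(8)] for row in range(8)]
brighter = [[min(255, value + 10) for value in row] for row in gradient]
inverted = [[255 - value for value in row] for row in gradient]

print(is_near_duplicate(gradient, brighter))  # True
print(is_near_duplicate(gradient, inverted))  # False
```

Because the fingerprint depends on each pixel's brightness relative to the image's own average, a uniformly brightened or recompressed copy tends to land on the same or a nearby hash, while a genuinely different picture does not.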

The ITA has flagged digital asset rationalisation as a priority in its FY2026-27 strategic plan, a document posted to the city's data portal at data.lacity.org. The plan does not specify a deduplication budget line, but it references a broader data hygiene initiative tied to the city's preparations for the influx of media and operational photography during Olympic and Paralympic events in 2028. Los Angeles Memorial Coliseum, SoFi Stadium in Inglewood, and dozens of satellite venues will each generate their own photographic documentation workflows over the next 24 months.

For city departments looking to act before a formal ITA rollout, the practical path is straightforward: conduct a hash-based audit of existing repositories before the end of the 2026 calendar year, establish a single image intake portal per department, and write deduplication requirements into any new vendor contracts. The alternative is paying cloud vendors to store the same photograph of a cracked sidewalk on Figueroa Street four times over — which, scaled across a city of four million people, is exactly what is already happening.
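A first-pass hash-based audit can be sketched in a few lines of Python using only the standard library. The version below is a simplified illustration under stated assumptions, not a description of any city system: it finds byte-for-byte identical files only, by grouping them on their SHA-256 digest, and the function names are hypothetical. Resized or recompressed copies would need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by digest and return groups with 2+ members."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def reclaimable_bytes(duplicate_groups):
    """Bytes freed if every group kept one copy and dropped the rest."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicate_groups
    )
```

Pointing `find_exact_duplicates` at a repository's root folder returns lists of matching paths, and `reclaimable_bytes` turns those lists into a storage figure a department could put in front of a budget officer.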

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
