LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story
From city hall records to wildfire damage files, redundant image data is quietly consuming millions in storage budgets across Los Angeles municipal systems.

Los Angeles city departments collectively stored an estimated 4.7 petabytes of digital image data across municipal servers as of the most recent inventory conducted by the city's Information Technology Agency in early 2026 — and IT administrators working inside those systems say a significant share of that load is duplicate or near-duplicate files that serve no archival purpose. The problem is sprawling, expensive, and gaining new urgency as the city races to build out digital infrastructure ahead of the 2028 Summer Olympics.
The timing matters for reasons beyond Olympics prep. Mayor Karen Bass declared a housing emergency in 2023, and the programs that followed — from temporary shelter documentation to fire-damage assessment photography after the January 2025 Palisades and Eaton fires — generated enormous volumes of photographic records in a compressed period. When field workers, contractors, and city inspectors each upload versions of the same site photograph through different portals, the duplication compounds fast. The Los Angeles Housing Department alone operates at least three separate intake systems that can receive image uploads simultaneously, according to department workflow documentation reviewed by The Daily Los Angeles.
Cloud storage is not free. The city's contract with enterprise cloud providers, a line item inside the ITA's annual budget submitted to the City Council, runs to tens of millions of dollars per fiscal year. Industry benchmarks from the Storage Networking Industry Association suggest that between 20 and 40 percent of unmanaged enterprise image repositories consist of duplicates. Apply even the low end of that range to LA's 4.7-petabyte figure and the redundant data burden comes to roughly 940 terabytes. At standard enterprise cloud rates of roughly $20 to $23 per terabyte per month, the math on wasted spend becomes uncomfortable quickly.
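For readers who want to check that arithmetic, here is a rough back-of-envelope sketch using only the figures quoted above. It assumes decimal units (1 petabyte = 1,000 terabytes) and counts raw storage charges only; the function name `wasted_spend` is illustrative and does not come from any city document.

```python
# Back-of-envelope estimate built from the figures quoted in the article.
# Assumption: 1 PB = 1,000 TB (decimal units); raw storage charges only.
TOTAL_TB = 4700  # 4.7 petabytes of municipal image data


def wasted_spend(total_tb, duplicate_percent, usd_per_tb_month):
    """Return (duplicate TB, monthly USD, annual USD) for one scenario."""
    duplicate_tb = total_tb * duplicate_percent / 100
    monthly_usd = duplicate_tb * usd_per_tb_month
    return duplicate_tb, monthly_usd, monthly_usd * 12


# Low end: 20 percent duplicates at $20 per TB per month.
low = wasted_spend(TOTAL_TB, 20, 20)
# High end: 40 percent duplicates at $23 per TB per month.
high = wasted_spend(TOTAL_TB, 40, 23)

for label, (tb, monthly, annual) in (("low", low), ("high", high)):
    print(f"{label}: {tb:,.0f} TB duplicated, "
          f"${monthly:,.0f} per month, ${annual:,.0f} per year")
```

Under those assumptions the duplicate load lands between about 940 and 1,880 terabytes, or roughly $226,000 to $519,000 per year in storage charges alone.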
The Los Angeles County Metropolitan Transportation Authority faces a parallel version of this problem. Metro's project documentation for the Crenshaw/LAX Line extension and the ongoing Purple Line work through Westwood generated photographic inspection records across dozens of contractor teams. Without a centralised deduplication protocol, the same tunnel segment can appear in dozens of separately filed photo sets. Metro's technology division has been piloting image-hash matching tools since late 2025, though the agency has not publicly released findings from that pilot.
At the city level, the Department of Public Works maintains a GIS-linked photo archive that covers everything from pothole documentation on Vermont Avenue to post-storm drainage surveys in the Sepulveda Basin. Public records requests filed by The Daily Los Angeles in May 2026 confirmed the archive exceeded 900,000 individual image files added in 2025 alone — a single calendar year. No automated deduplication layer was active on that repository at the time of the request.
The technology to solve this is not exotic. Perceptual hashing algorithms, which generate a numeric fingerprint for each image and flag near-identical matches, are commercially available from vendors including Google Cloud Vision and AWS Rekognition, as well as open-source tools like ImageMagick. A mid-sized municipal deployment handling roughly 100 terabytes of image data typically runs between $80,000 and $200,000 for initial implementation, according to procurement data from comparable government technology projects in Denver and Chicago published in 2025.
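To make the idea concrete, here is a minimal sketch of one of the simplest perceptual hashes, the average hash. It is an illustration only and not the method used by any vendor named above. It assumes each image has already been decoded and shrunk to a small grayscale grid (8 by 8 is common); the function names and the distance threshold of 5 are hypothetical choices.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale grid (rows of 0-255 values).

    Each pixel contributes one bit: 1 if it is brighter than the grid's
    mean brightness, 0 otherwise.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Flag two grids as near-duplicates if their hashes differ by few bits.

    max_distance is an assumed threshold; a real deployment would tune it.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# A synthetic 8x8 gradient and a slightly brighter copy of it.
original = [[row * 32 + col * 4 for col in range(8)] for row in range(8)]
brighter = [[value + 3 for value in row] for row in original]
print(is_near_duplicate(original, brighter))
```

Because the fingerprint depends on each pixel's brightness relative to the image average, a uniformly brightened copy produces the same hash, which is why this family of techniques can catch re-exported versions of the same photograph.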
The ITA has flagged digital asset rationalisation as a priority in its FY2026-27 strategic plan, a document posted to the city's data portal at data.lacity.org. The plan does not specify a deduplication budget line, but it references a broader data hygiene initiative tied to the city's preparations for the influx of media and operational photography during Olympic and Paralympic events in 2028. Los Angeles Memorial Coliseum, SoFi Stadium in Inglewood, and dozens of satellite venues will each generate their own photographic documentation workflows over the next 24 months.
For city departments looking to act before a formal ITA rollout, the practical path is straightforward: conduct a hash-based audit of existing repositories before the end of the 2026 calendar year, establish a single image intake portal per department, and write deduplication requirements into any new vendor contracts. The alternative is paying cloud vendors to store the same photograph of a cracked sidewalk on Figueroa Street four times over — which, scaled across a city of four million people, is exactly what is already happening.
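As a rough illustration of what a hash-based audit involves, the sketch below groups byte-identical image files by their SHA-256 digest using only Python's standard library. The function names, the list of file extensions, and the example path are assumptions for illustration, not anything drawn from the ITA plan.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root, extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Map each digest to the image files sharing it, keeping only groups of 2+.

    The extension list is an assumed default; adjust it to the repository.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical archive location; replace with a real directory.
    for digest, paths in find_exact_duplicates("./photo_archive").items():
        print(digest[:12], [str(p) for p in paths])
```

An audit like this only catches files that are identical byte for byte. Copies that were resized or re-compressed on upload would need perceptual matching instead.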
Published by The Daily Los Angeles


