L.A.'s Digital Archives Are Riddled With Duplicate Images — and the Numbers Reveal a Costly Problem
City agencies and cultural institutions are sitting on millions of redundant files, and the storage bills are climbing fast.

Los Angeles city departments and major cultural institutions are collectively storing tens of millions of duplicate digital images across fragmented server systems. IT administrators and archivists say the problem has ballooned quietly for years and is now generating measurable financial waste at a time when municipal budgets are already stretched.
The issue cuts across nearly every corner of city government and the broader civic infrastructure. From the Los Angeles County Museum of Art's digitization drives on Wilshire Boulevard to the Bureau of Engineering's aerial photo libraries downtown, redundant image files occupy server space that costs real money — and, in some cases, creates compliance headaches for departments required to maintain accurate public records.
The timing matters. Mayor Karen Bass's ongoing housing emergency declaration has pushed city departments to digitize inspection records, permitting photos, and encampment documentation at an accelerated pace since early 2023. More images flowing in faster, with less coordination between departments, means the duplication problem compounds monthly.
Industry benchmarks from enterprise storage research firms suggest that between 20 and 40 percent of files held in large organizational image repositories are exact or near-exact duplicates. For a city the scale of Los Angeles — which manages data across more than 40 distinct departments — that range translates into a significant portion of storage capacity doing nothing but running up costs.
Cloud storage pricing for enterprise customers generally runs between $0.02 and $0.05 per gigabyte per month depending on tier and provider. A single high-resolution aerial photograph of a Los Angeles neighborhood can exceed 500 megabytes. Multiply that across the thousands of inspection and survey images the Department of Building and Safety alone processes in a given quarter, and redundant copies accumulate into terabytes of billable waste.
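The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough illustration, not a city figure: the function name, the 10,000-image count, and the 1 GB = 1,000 MB conversion are assumptions made for the example, while the 500-megabyte file size, the 20 to 40 percent duplicate share, and the $0.02 to $0.05 price range come from the figures cited above.

```python
def monthly_waste_usd(image_count, avg_image_mb, duplicate_share, price_per_gb_month):
    """Estimate the monthly cost of storing duplicate images.

    All inputs are supplied by the caller; nothing here is measured data.
    """
    total_gb = image_count * avg_image_mb / 1000  # assumes 1 GB = 1,000 MB
    redundant_gb = total_gb * duplicate_share
    return redundant_gb * price_per_gb_month


# Hypothetical repository: 10,000 images averaging 500 MB each.
low = monthly_waste_usd(10_000, 500, duplicate_share=0.20, price_per_gb_month=0.02)
high = monthly_waste_usd(10_000, 500, duplicate_share=0.40, price_per_gb_month=0.05)
print(f"Estimated duplicate-storage cost: ${low:,.2f} to ${high:,.2f} per month")
```

Swapping in a department's actual image counts and its contracted storage rate would turn the placeholder range into a usable estimate.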
The Los Angeles City Archives, housed near City Hall East on Alameda Street, manages historical photograph collections that have been partially digitized over the past decade. Archivists working on those collections have long flagged that batch-scanning processes frequently generate multiple file versions — originals, derivatives, thumbnails, re-exports — that end up catalogued inconsistently or not catalogued at all, making deduplication difficult without specialized software tools.
The Getty Center, while a private institution, faces a version of the same challenge at a larger scale. Its digital asset management systems hold millions of image records related to collections, conservation photography, and public programming. The Getty has invested in commercial deduplication tooling, but even well-resourced organizations report that legacy data migrations — moving images from older systems to newer platforms — routinely reintroduce duplicates that analysts must then manually review.
Los Angeles's 2028 Summer Olympics infrastructure push is adding fresh urgency. City departments responsible for venues in Exposition Park, along the Crenshaw corridor, and at the Los Angeles Memorial Coliseum are generating construction-progress photo documentation at a scale they have not handled before. Project managers at the Bureau of Engineering have been flagging since late 2025 that image-management protocols need updating before those files multiply into unmanageable archives.
Deduplication software, which hashes image files to identify bit-for-bit or perceptual matches, can reduce redundant storage by 15 to 30 percent in typical municipal deployments, according to published case studies from comparable large U.S. cities. The catch is implementation cost. Licensing fees for enterprise-grade platforms can run from $50,000 to several hundred thousand dollars annually depending on the size of the repository, a figure that would require a dedicated IT budget line the city has not yet publicly allocated.
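For readers curious what hash-based matching looks like in practice, here is a minimal Python sketch of the bit-for-bit case only. It is not how any commercial platform works; the function names `sha256_of` and `find_exact_duplicates` are invented for this illustration, and SHA-256 is simply one common choice of hash. Perceptual matching, which catches resized or re-exported copies, requires decoding the images and is not shown.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of files under `root` whose contents are byte-identical."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Expensive second pass: hash only files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates` on a directory returns a list of groups, each holding two or more paths to identical files. Grouping by size first is a design choice that avoids hashing files that have no possible match.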
For departments operating under existing budget constraints, the practical near-term step is a storage audit — a manual or semi-automated review of file directories to identify the worst duplication clusters. The Los Angeles County Office of Digital Services has previously recommended that all county departments conduct annual data inventories, and city agencies would do well to follow suit before the Olympics documentation surge arrives in earnest. Every terabyte of redundant image data cleared now is one fewer invoice to justify next fiscal year.
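A semi-automated audit of the kind described above could start from a simple file inventory. The Python sketch below assumes a department already has a list of (path, size in bytes, content hash) records; the function name `rank_duplication_clusters`, the record layout, and the rule of treating the alphabetically first path as the copy to keep are all assumptions made for illustration, not a recommended policy.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rank_duplication_clusters(inventory, top_n=5):
    """Rank directories by bytes held in redundant copies.

    `inventory` is an iterable of (path, size_bytes, content_hash) tuples.
    Returns up to `top_n` (directory, redundant_bytes) pairs, largest first.
    """
    by_hash = defaultdict(list)
    for path, size, digest in inventory:
        by_hash[digest].append((path, size))

    wasted = defaultdict(int)
    for copies in by_hash.values():
        copies.sort()  # assumption: the alphabetically first path is the one kept
        for path, size in copies[1:]:
            wasted[str(PurePosixPath(path).parent)] += size

    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

The output is a short list of the directories where cleanup would free the most space, which gives reviewers a place to begin before any software purchase is considered.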
Published by The Daily Los Angeles


