LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damning Story
City departments, film commissions, and Olympic planners are sitting on millions of redundant digital files, and the storage bill is climbing fast.

Los Angeles city agencies collectively manage more than 14 terabytes of duplicate image files across municipal servers, according to a storage audit conducted by the city's Information Technology Agency (ITA) earlier this year. That redundant data is costing taxpayers an estimated $2.3 million annually in unnecessary cloud storage fees. The problem has been building for years, but the accelerating pace of 2028 Olympic infrastructure documentation projects and the ongoing digitization of wildfire preparedness maps have pushed it to a breaking point.
Duplicate image replacement, the process of identifying, cataloging, and systematically removing or consolidating redundant digital files, sounds like back-office housekeeping. It isn't. For a city preparing to host the world in two years, running a homelessness response operation that depends on real-time mapping data, and managing post-fire rebuilding documentation across the Palisades and Altadena corridors, clean data architecture is infrastructure. Every duplicated aerial survey image of the San Fernando Valley, and every redundant permit photo uploaded twice to the Bureau of Engineering's system, is a small drag on a machine that increasingly cannot afford it.
The Los Angeles Housing Department, which has been processing thousands of rapid-rehousing applications under Mayor Karen Bass's homelessness emergency declaration, has accumulated duplicate intake photographs in its case management database at a rate that internal reviews flagged as problematic as early as January 2025. Case workers uploading images from site visits at shelter facilities, including ones near Skid Row and along the Vermont Avenue corridor, have repeatedly submitted the same images under different file names, a workflow error that compounds across thousands of active cases.
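Because the problem described here is the same image saved under different file names, even a plain content hash can catch the byte-identical copies: the digest depends only on the file's bytes, not its name. The following is a minimal sketch, assuming uploads are available as (file name, raw bytes) pairs. The function and file names are hypothetical and do not reflect how the Housing Department's case management system actually works.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_exact_duplicates(files: Iterable[Tuple[str, bytes]]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Hypothetical helper for illustration; input is (file name, raw bytes) pairs.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for name, data in files:
        # SHA-256 depends only on the bytes, so renaming a file does not change it.
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file name.
    return [names for names in by_digest.values() if len(names) > 1]


# Hypothetical uploads: the first two are the same photo under different names.
uploads = [
    ("site_visit_0412.jpg", b"\xff\xd8 photo A"),
    ("IMG_2291.jpg", b"\xff\xd8 photo A"),
    ("intake_front.jpg", b"\xff\xd8 photo B"),
]
print(find_exact_duplicates(uploads))  # [['site_visit_0412.jpg', 'IMG_2291.jpg']]
```

A content hash only catches exact copies. An image that has been re-saved, resized, or cropped produces different bytes and a different digest, which is why near-duplicates need a perceptual approach.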
The Los Angeles 2028 organizing committee's venue documentation teams face a parallel problem. As construction and retrofitting work advances at SoFi Stadium in Inglewood and the Los Angeles Memorial Coliseum on Figueroa Street, photographic progress records are being captured by multiple contractors using different naming conventions and upload protocols. A single site inspection can generate three to five sets of near-identical images stored in separate vendor systems, with no automated deduplication running between them. At current contractor billing rates for data management services, that redundancy adds measurable cost to an already scrutinized budget.
The Los Angeles County Department of Regional Planning ran a pilot deduplication program in the spring of 2025 across its GIS image library, which holds aerial and street-level photographs used for zoning decisions and environmental review. The pilot identified roughly 31 percent of stored images as duplicates or near-duplicates — a figure that, extrapolated across all county image holdings, suggests hundreds of thousands of redundant files. The department has not yet funded a full rollout of deduplication software across all its systems.
Perceptual hashing tools, which generate a compact digital fingerprint for each image and flag near-identical matches even when file names differ, have been commercially available since the early 2010s. Several major media organizations and federal agencies adopted them at scale years ago. Licensing enterprise-grade deduplication software for a municipal deployment typically costs between $80,000 and $250,000 annually, depending on volume. That is roughly 3 to 11 percent of the $2.3 million storage overhead the ITA audit identified.
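Perceptual hashing comes in several variants. The sketch below uses a simple difference hash (dHash), one common choice, and it is not the method any vendor or city agency is known to use. It assumes each image has already been decoded into a grayscale grid of 0 to 255 values. The function names and the 10-bit match threshold are hypothetical choices for illustration.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels, 0-255, row-major (assumed pre-decoded)


def resize_nearest(pixels: Grid, width: int, height: int) -> Grid:
    """Downscale a grayscale grid by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per left-to-right brightness comparison."""
    # A (hash_size + 1) x hash_size thumbnail gives hash_size comparisons per row,
    # so the default produces a 64-bit fingerprint.
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    # max_distance is a hypothetical tuning value, not an industry standard.
    return hamming(dhash(a), dhash(b)) <= max_distance


# A horizontal gradient, a uniformly brightened copy, and a mirrored version.
gradient = [[x * 4 for x in range(64)] for _ in range(64)]
brighter = [[min(255, v + 20) for v in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(is_near_duplicate(gradient, brighter))  # True
print(is_near_duplicate(gradient, mirrored))  # False
```

Because dHash records only whether each pixel is brighter than its neighbour, a uniform brightness change leaves the fingerprint unchanged, while a mirrored image flips every comparison. Production tools typically add proper image decoding, better resampling, and an index for comparing millions of fingerprints efficiently.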
The Los Angeles Public Library's digital collections team at the Central Library on West Fifth Street began using automated deduplication workflows for its historical photograph archive in late 2024, processing more than 400,000 images from the California History collection. Librarians there reduced their active storage footprint by roughly 18 percent within six months, freeing server capacity that was then reallocated to public digital access programs.
For residents and small organizations interacting with city digital systems, whether submitting permit applications, uploading documentation for housing assistance through the Inside Safe program, or contributing to community wildfire mapping efforts, the practical advice is straightforward: use consistent file naming conventions and check for existing uploads before submitting. On the city's end, the ITA has indicated that a procurement review for deduplication tools is expected to move forward before the end of the 2026-27 fiscal year on June 30, 2027. Whether the budget allocation follows the audit findings is a question city council members on the Technology and Innovation Committee will face when the ITA's full report lands on their desks this fall.
Published by The Daily Los Angeles


