The Daily Los Angeles

Los Angeles news, every day

LA's Digital Archives Are Drowning in Duplicate Images — And the Numbers Tell the Story

From city hall databases to the 2028 Olympics planning files, duplicated visual assets are costing Los Angeles agencies avoidable storage fees and staff hours.

By Los Angeles News Desk · Published 4 July 2026, 11:45 am

Los Angeles city departments collectively store tens of millions of digital image files, by some estimates, across fragmented servers, and a growing share of that data is redundant: the same photograph, rendering, or scan saved twice, three times, or dozens of times under different filenames. The problem has a name, duplicate image accumulation, and for a city already burning through infrastructure budgets ahead of the 2028 Summer Olympics, it carries a real dollar cost.

The timing matters because several major LA agencies are mid-migration. The Bureau of Engineering is consolidating project documentation onto a unified cloud platform, the Department of City Planning is digitising decades of permit records, and the LA 2028 organising committee is building out media asset libraries that will eventually serve thousands of credentialed journalists and broadcast partners. Each of those workflows is generating duplicate files at scale — and without automated deduplication, those files compound storage costs month over month.

What the Data Shows

Industry benchmarks from enterprise data management firms (not figures specific to LA's own audits, which have not been made public) suggest that between 20 and 30 percent of files in large unmanaged digital asset repositories are exact or near-exact duplicates. Apply even the low end of that range to a city archive running into the hundreds of terabytes and the wasted storage becomes material. Commercial cloud storage for municipal governments typically runs between $0.02 and $0.05 per gigabyte per month under standard enterprise contracts. At 20 percent duplication across a 500-terabyte archive, that is roughly 100 terabytes, or about 100,000 gigabytes, of avoidable storage, which translates to somewhere between $2,000 and $5,000 a month in unnecessary cloud fees, before accounting for bandwidth and retrieval costs.
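The arithmetic behind that range can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above; the function name is illustrative, and it assumes decimal units (1 terabyte = 1,000 gigabytes), which is how cloud providers commonly bill.

```python
def monthly_duplicate_cost(archive_tb, duplicate_share, price_per_gb_month):
    """Estimated monthly storage spend on duplicate files, in dollars."""
    # Assumption: decimal units, so 1 TB = 1,000 GB.
    duplicate_gb = archive_tb * 1000 * duplicate_share
    return duplicate_gb * price_per_gb_month


# Figures from the article: 500 TB archive, 20 percent duplication,
# $0.02 to $0.05 per gigabyte per month.
low = monthly_duplicate_cost(500, 0.20, 0.02)
high = monthly_duplicate_cost(500, 0.20, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per month")
```

Swapping in the 30 percent end of the benchmark range, or a larger archive, scales the result linearly. Bandwidth and retrieval charges are left out here, as they are in the estimate above.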

The Los Angeles Public Library's Photo Collection, housed at the Central Library on West Fifth Street in Downtown, digitised more than 100,000 images as part of a long-running preservation initiative. Librarians working on that project have described, in publicly available program documentation, the challenge of ingesting donated collections where the same print appears in multiple donor batches. Manual deduplication at that scale is slow. Automated hash-matching tools, which compute a numerical fingerprint from each file's contents and compare the results, can cut review time by more than 60 percent, according to digital preservation literature published by the Library of Congress.
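To make the fingerprint idea concrete, here is a minimal sketch of exact-duplicate detection using SHA-256 from Python's standard library. It is an illustration under stated assumptions, not a description of any tool the library or the city actually uses; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("some/archive/folder")` (a placeholder path) returns one list per set of identical files, regardless of filename. A scan like this only reports matches. It does not catch resized or re-encoded copies, and it deletes nothing.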

The city's Information Technology Agency, headquartered in the Civic Center complex near Temple Street, oversees data governance standards for most municipal departments. A 2024 citywide technology strategic plan, available on the ITA's public portal, identified data deduplication as a priority efficiency measure but did not attach a remediation timeline or budget line to image assets specifically.

The Olympics Deadline Is Concentrating Minds

The 2028 Games are focusing attention on duplicate assets in ways that routine IT audits have not. LA 2028 planners need media credential systems, venue maps, athlete headshots, and sponsor imagery to function without lag or redundancy errors. The organising committee's technology partners, which include major cloud vendors, are reportedly building deduplication into the asset pipeline from the start, a lesson drawn partly from the experience of prior host cities where archive bloat caused search failures during high-traffic accreditation windows.

Closer to home, the city's ongoing homelessness response infrastructure — coordinated through the LA Homeless Services Authority on South San Pedro Street in the Arts District — relies on case management systems that include photographic documentation of encampment sites and shelter capacity. Duplicate images in those records have created data-matching errors in at least some internal workflows, according to descriptions in LAHSA audit summaries published between 2023 and 2025, though the authority has not publicly quantified the scope.

For agencies that have not yet built deduplication into their file intake processes, the practical path forward involves three steps: running a hash-based scan to identify exact duplicates, using perceptual hashing tools to flag near-duplicates such as cropped or lightly edited versions of the same image, and establishing a retention policy before deleting anything. The last step matters most. In public agencies, records deletion carries legal exposure, and a file that looks like a throwaway duplicate may carry its own chain-of-custody significance. Getting the numbers right means knowing which duplicates are truly redundant — and which ones are doing a job that just isn't visible yet.
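The second of those steps, perceptual hashing, can be sketched with a simple difference hash. This is one common technique, shown here as an assumption rather than a recommendation of a specific tool. To stay self-contained, the sketch takes an image as a grid of grayscale values (rows of integers from 0 to 255); a real pipeline would first decode files with an image library and would typically average pixels when shrinking rather than sampling them.

```python
def shrink(pixels, width, height):
    """Downsample a grayscale grid by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per adjacent pixel pair, set if left > right."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 64x64 test images: a left-to-right gradient, a slightly
# brightened copy, and a mirrored version.
original = [[x * 4 for x in range(64)] for _ in range(64)]
brightened = [[min(255, v + 10) for v in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(hamming(dhash(original), dhash(brightened)))  # small distance: likely near-duplicate
print(hamming(dhash(original), dhash(mirrored)))    # large distance: different image
```

A low Hamming distance flags a pair for human review; the cut-off is a tuning choice that depends on the collection. Consistent with the retention point above, a sketch like this should only produce a list of candidates, never delete files on its own.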

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
