The Daily Los Angeles

Los Angeles news, every day

LA's Digital Archives Are Riddled With Duplicate Images — and the Numbers Tell a Costly Story

From city hall records to the LAPL photo collection, redundant image files are quietly draining storage budgets and slowing down the public systems Angelenos rely on.

By Los Angeles News Desk · Published 4 July 2026, 11:36 am

Photo: Cristiane Doffini / Pexels

Los Angeles city agencies and public institutions collectively hold tens of millions of digital image files across fragmented server systems — and a growing share of those files are exact or near-exact duplicates eating up expensive storage capacity that taxpayers fund. The problem is measurable, documented in internal IT audits, and getting worse as agencies digitize backlogs ahead of the 2028 Olympics infrastructure push.

The issue is pressing because the city is in the middle of an unprecedented digitization sprint. The Los Angeles Public Library's Central Library on West Fifth Street in downtown has been scanning historical photo collections since 2022. The Los Angeles County Department of Arts and Culture has its own digital asset management program. The Mayor's Office of Budget and Innovation has flagged data infrastructure costs in consecutive fiscal years. When duplicate images go undetected and unremediated, every copied file sits on hardware that costs real money to lease, power, and maintain.

What the Data Actually Shows

Industry benchmarks from enterprise storage research consistently put duplicate and redundant file rates in large municipal archives between 20 and 40 percent of total stored data. For every 10 terabytes a city department holds, in other words, as many as 4 terabytes may be functionally identical copies of files that already exist elsewhere on the same system. Commercial cloud storage ran between $0.02 and $0.023 per gigabyte per month on major platforms as of early 2026, or roughly $240 to $276 per terabyte per year. At those rates, redundancy becomes a five-figure annual cost once duplicates total about 40 terabytes and a six-figure cost at roughly 400 terabytes.
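The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustrative calculation only: the function name and the sample archive sizes are assumptions made for the example, and it uses a flat per-gigabyte rate with decimal terabytes (1 TB = 1,000 GB), ignoring retrieval fees, tiered pricing, and replication.

```python
# Illustrative only: flat-rate cost of storing duplicate files for one year.
# The function name and the sample archive sizes are hypothetical.
GB_PER_TB = 1000  # decimal terabytes


def annual_duplicate_cost(total_tb, duplicate_rate, usd_per_gb_month):
    """Return the yearly spend (USD) attributable to duplicate files."""
    duplicate_gb = total_tb * GB_PER_TB * duplicate_rate
    return duplicate_gb * usd_per_gb_month * 12


for total_tb in (10, 100, 1000):
    # Best case: 20% duplicates at $0.02; worst case: 40% at $0.023.
    low = annual_duplicate_cost(total_tb, 0.20, 0.02)
    high = annual_duplicate_cost(total_tb, 0.40, 0.023)
    print(f"{total_tb:>5} TB held: ${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions, a 10-terabyte archive wastes roughly $480 to $1,104 a year, while a 1,000-terabyte archive wastes roughly $48,000 to $110,400.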

The Los Angeles City Archives, housed in the Piper Technical Center on Ramirez Street near Union Station, manages records for dozens of departments, including Planning, Public Works, and the City Clerk. The archive began a phased migration to hybrid cloud storage in fiscal year 2024-25. Duplicate image detection, the process of algorithmically identifying files that share identical or near-identical pixel data, hash values, or metadata, was listed as a line-item requirement in the migration contract scope. Whether that deduplication work has been completed on schedule is not publicly confirmed in documents reviewed for this article.
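The simplest form of the detection described above, matching files by hash value, can be sketched as follows. This is a minimal illustration and not the method specified in any city contract: the function names are hypothetical, and it catches only byte-for-byte identical files, not rescans or re-encoded copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group every file under `root` by SHA-256 digest.

    Returns a list of groups; each group is a list of two or more paths
    whose contents are byte-for-byte identical.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates` on a folder of scans returns the sets of identical files. Deciding what to do with each set remains a records-management question, not a technical one.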

The LAPL's Digital Collections portal, accessible at digitallibrary.lapl.org, currently indexes more than 180,000 digitized photographs, maps, and documents from the Los Angeles Herald Examiner collection alone. Librarians and archivists have acknowledged publicly, in presentations at the California Library Association's 2025 annual conference in Long Beach, that deduplication workflows remain a known gap in large-scale digitization projects statewide — particularly when scanning batches are uploaded by multiple contractors without centralized hash-checking protocols.

Why Automated Detection Is Harder Than It Sounds

Deduplication is not simply a matter of deleting obvious copies. Municipal records often require version histories. A photograph scanned at 300 DPI and again at 600 DPI produces two files that are related but technically distinct, and archivists must decide which version becomes canonical before the lower-resolution copy can be flagged for removal. Metadata attached to each scan, such as date of digitization, operator ID, and chain-of-custody notes, may itself carry legal significance in records subject to California Public Records Act requests.
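To see why a 300 DPI scan and a 600 DPI scan defeat a plain file hash yet can still be matched, consider a simple perceptual technique known as an average hash. The sketch below is a toy version in pure Python operating on synthetic grayscale grids; real scans would need an imaging library to decode them. All names are hypothetical, and the "highest DPI wins" rule is an assumption made for the example, not an archival standard.

```python
def average_hash(pixels, size=8):
    """Average hash of a grayscale image given as rows of 0-255 ints.

    The image is reduced to a size x size grid of block averages; each
    bit records whether a block is brighter than the grid's mean.
    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return sum(1 << i for i, v in enumerate(cells) if v > mean)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def upscale_2x(pixels):
    """Crude stand-in for rescanning at double the resolution."""
    return [[v for v in row for _ in range(2)] for row in pixels for _ in range(2)]


def choose_canonical(scans):
    """Assumed policy: highest DPI becomes canonical; the rest are only
    flagged for archivist review, never deleted automatically."""
    ranked = sorted(scans, key=lambda s: s["dpi"], reverse=True)
    return ranked[0], ranked[1:]


# Synthetic 16x16 "300 DPI" scan: a left-to-right brightness gradient.
scan_300 = [[x * 16 for x in range(16)] for _ in range(16)]
scan_600 = upscale_2x(scan_300)            # same picture, 32x32 pixels
other = [row[::-1] for row in scan_300]    # mirrored, a different picture

print(hamming(average_hash(scan_300), average_hash(scan_600)))  # 0
print(hamming(average_hash(scan_300), average_hash(other)))     # 64
```

The two resolutions hash identically even though their files differ in size and bytes, while the mirrored image differs in all 64 bits. In practice a small distance threshold, rather than an exact match, would be used to tolerate scanner noise, and the metadata of every flagged copy would be preserved alongside the canonical file.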

The City of Los Angeles Information Technology Agency, which oversees citywide enterprise systems under its contract framework, has a data governance working group that is expected to publish updated deduplication standards before the end of calendar year 2026. Those standards will apply to all departments migrating to the city's consolidated GovCloud environment, a project with a multi-year timeline tied partly to readiness benchmarks for the 2028 Games.

For members of the public who submit Public Records Act requests and receive image-heavy document packages, the practical consequences of poor deduplication are simple: bloated PDF downloads, slower portal response times, and occasional version confusion when duplicate files carry conflicting metadata. The City Clerk's online records portal at cityclerk.lacity.gov logged more than 2.1 million document page views in fiscal year 2024-25, according to the department's annual report. Even marginal efficiency gains from cleaned-up image libraries translate into faster load times at that volume. Agencies that complete deduplication audits before the end of 2026, when the updated standards are expected, will be better positioned when Olympic-related public record requests accelerate in 2027.

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.