The Daily Los Angeles

Los Angeles news, every day

How Los Angeles's Digital Archives Ended Up Flooded With Duplicate Images — And What's Being Done About It

From the city's rapid COVID-era digitization push to the 2028 Olympics content boom, L.A.'s public and cultural institutions are now grappling with a metadata mess years in the making.

By Los Angeles News Desk · Published 4 July 2026, 12:06 pm

Photo: ubeyonroad on Pexels

Los Angeles city agencies and cultural institutions are sitting on digital image libraries riddled with duplicates — some collections carrying redundancy rates that eat up significant storage budgets and make public records searches unreliable. The problem didn't happen overnight, and understanding how it developed matters now because several major institutions are mid-contract on remediation projects that will shape how the city manages visual assets through the 2028 Summer Olympics.

The roots go back to roughly 2020 and 2021, when the pandemic forced a scramble. The Los Angeles County Department of Arts and Culture, the Los Angeles Public Library system, and city planning offices all accelerated digitization timelines to keep collections accessible during facility closures. Scanning vendors were hired quickly, quality-control protocols were compressed, and batches of images entered databases without consistent hash-checking — the standard technical process that flags when an identical file already exists in a system. The result was layer upon layer of near-identical images stored under slightly different filenames, different metadata tags, or in separate departmental silos that never talked to each other.
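As a rough illustration of the hash-checking step described above, the sketch below groups files whose contents are byte-for-byte identical by comparing SHA-256 digests. The file names and the in-memory `archive` dictionary are hypothetical stand-ins, not any institution's actual system.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        # Identical bytes always produce the same SHA-256 digest,
        # regardless of filename or catalog metadata.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical archive: two departments hold the same scan under different names.
archive = {
    "lapl_0001.tif": b"scan-bytes-A",
    "planning_copy_of_0001.tif": b"scan-bytes-A",
    "lapl_0002.tif": b"scan-bytes-B",
}

duplicate_groups = find_exact_duplicates(archive)
print(duplicate_groups)
```

A check like this only catches exact copies. A rescanned, resized, or re-saved version of the same photograph has different bytes and therefore a different digest.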

A Problem That Compounded Over Time

The Los Angeles Public Library's Central Library on West Fifth Street in Downtown L.A. holds one of the city's largest photographic archives. The Los Angeles Times Photographic Archive, donated to UCLA's Charles E. Young Research Library in Westwood, itself runs to more than two million prints and negatives. Cross-institutional digitization partnerships between the two institutions meant images sometimes entered both systems independently, with no automated reconciliation step built into early workflow agreements.

By 2023, the Getty Research Institute in Brentwood and the Autry Museum of the American West near Griffith Park were both working through similar internal audits, according to publicly available grant documentation filed with the Institute of Museum and Library Services. The IMLS awarded several California institutions funding under its National Leadership Grants for Libraries program in fiscal year 2023, with metadata remediation listed as an eligible project category. The duplication problem is not unique to Los Angeles — the Digital Public Library of America has noted widespread inconsistency in how contributing institutions handle image deduplication — but the scale here is amplified by the city's sheer volume of legacy collections.

The economic stakes are real. Cloud storage, though getting cheaper, still costs institutions money at scale. Amazon Web Services S3 standard storage ran approximately $0.023 per gigabyte per month as of early 2026. A collection carrying even 500,000 redundant high-resolution image files, each averaging 25 megabytes, translates to roughly 12.5 terabytes of unnecessary storage, and the associated compute costs for indexing and serving those files compound the waste. For publicly funded archives operating under tight budget cycles, that is not a trivial line item.
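The storage arithmetic above can be checked with a short back-of-the-envelope calculation. It assumes decimal units (1 GB = 1,000 MB), a flat rate at the quoted price, and storage charges only. Request, transfer, and compute costs are left out because the article gives no figures for them.

```python
REDUNDANT_FILES = 500_000
AVG_FILE_MB = 25
PRICE_PER_GB_MONTH = 0.023  # S3 standard figure quoted in the article

# Decimal units assumed: 1 GB = 1,000 MB and 1 TB = 1,000 GB.
redundant_gb = REDUNDANT_FILES * AVG_FILE_MB / 1_000
redundant_tb = redundant_gb / 1_000

# Storage charge only; indexing and serving costs are not modeled here.
monthly_cost = redundant_gb * PRICE_PER_GB_MONTH
annual_cost = monthly_cost * 12

print(f"{redundant_tb:.1f} TB, ${monthly_cost:,.2f}/month, ${annual_cost:,.2f}/year")
# prints: 12.5 TB, $287.50/month, $3,450.00/year
```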

Why 2026 Is the Inflection Point

The urgency is sharpened by two converging pressures. First, the Mayor's office has pushed digital infrastructure improvements as part of the Karen Bass administration's broader housing and city services modernization agenda, which includes open-data commitments that require cleaner, more searchable public image records. Second, the 2028 Olympic and Paralympic Games, with venues stretching from SoFi Stadium in Inglewood to the Sepulveda Basin Sports Complex in the San Fernando Valley, are generating a historic wave of new photography, renderings, and archival pulls. Content managers working with LA28, the organizing committee headquartered in Downtown Los Angeles, have flagged that inheriting a duplicated-asset environment would create significant rights-clearance complications on top of the storage problem.

Several institutions are now piloting perceptual hashing tools — software that identifies visually similar images even when file metadata differs — as a more robust solution than simple filename or checksum matching. The approach has been used successfully by news wire services and stock photography platforms for years, and the technology has become cheaper to run at scale. For Angelenos who rely on public archives, the practical outcome of getting this right is faster, more accurate search results and a more trustworthy public record. The harder work — reconciling rights, correcting mislabeled metadata, and deciding what actually gets deleted versus archived offline — is still ahead.
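To make the idea of perceptual hashing concrete, here is a minimal sketch of one simple variant, the average hash. The article does not say which tools the institutions are piloting, so this is an illustration of the general technique only. The function names are hypothetical, and a plain 2D list of grayscale values stands in for a decoded image, which a real pipeline would load with an imaging library.

```python
def average_hash(pixels, hash_size=8):
    """Return a (hash_size * hash_size)-bit average hash as an int.

    pixels: rectangular 2D list of grayscale values (0-255), at least
    hash_size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        y0, y1 = row * height // hash_size, (row + 1) * height // hash_size
        for col in range(hash_size):
            x0, x1 = col * width // hash_size, (col + 1) * width // hash_size
            # Shrink the image by averaging each block of pixels.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean, else 0.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 16x16 test images.
original = [[x * 16 for x in range(16)] for _ in range(16)]    # left-to-right gradient
brightened = [[v + 10 for v in row] for row in original]       # same picture, brighter
different = [[y * 16 for _ in range(16)] for y in range(16)]   # top-to-bottom gradient

print(hamming_distance(average_hash(original), average_hash(brightened)))  # 0
print(hamming_distance(average_hash(original), average_hash(different)))   # 32
```

The brightened copy differs from the original in every pixel value, yet its hash is identical, while the unrelated image differs in half of the 64 bits. In practice a small distance threshold, chosen by testing on the collection in question, decides what counts as a likely duplicate for human review.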

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
