The Daily Los Angeles

Los Angeles news, every day


LA's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damaging Story

From city hall servers to the LACMA digital vault, redundant image files are costing Los Angeles agencies millions in storage and slowing the 2028 Olympic infrastructure documentation push.

By Los Angeles News Desk · Published 4 July 2026, 11:35 am


Photo: Darya Sannikova on Pexels

An estimated 40 to 60 percent of the image files held in Los Angeles city agencies' digital repositories are duplicates, according to an internal audit framework circulated earlier this year among IT directors at the Department of Public Works and the Bureau of Engineering. The redundancy problem, long dismissed as a minor administrative nuisance, has ballooned into a quantifiable budget drain as the city races to digitize construction records ahead of the 2028 Summer Olympics.

The timing could not be worse. The Mayor's Office has committed to a fully searchable digital archive of every infrastructure project tied to Olympic venue construction, covering sites from the Sepulveda Basin Sports Complex in Van Nuys to the renovation corridors around Exposition Park near USC. When image files are duplicated across shared drives, project managers spend hours reconciling versions, and storage licensing costs compound. Commercial cloud storage for government contracts runs roughly $0.023 per gigabyte per month under standard enterprise agreements — a figure that multiplies fast when the same aerial survey photograph of the SoFi Stadium access roads gets saved seventeen times across five departments.

What Duplicate Images Actually Cost

The problem is systemic rather than accidental. City departments — among them the Los Angeles Department of City Planning and the Bureau of Street Services — operate on separate content management systems that were never designed to talk to each other. A drone photograph taken above the Metro Purple Line extension corridor in Koreatown, for instance, might enter the system through the Bureau of Engineering, get forwarded to City Planning, attached to a permit file, and then re-uploaded by a contractor's liaison, each step creating a new stored instance with a slightly different filename.

Digital asset managers who work with large municipal systems generally cite a rule of thumb: every terabyte of unresolved duplicate image data costs between $1,200 and $2,000 a year once storage, backup cycles, and staff retrieval time are counted. For a city the size of Los Angeles, with digitization projects running simultaneously at the Los Angeles Public Library's Central Library on West 5th Street downtown and the Getty Conservation Institute in Brentwood, the aggregate exposure across linked institutions runs into seven figures.
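That rule of thumb lends itself to a quick back-of-the-envelope estimate. The sketch below is illustrative only: the function names and the 100 TB example volume are assumptions, not figures from any city audit. It also works backward from the seven-figure claim to the volume of duplicate data that claim would imply.

```python
# Rule-of-thumb range quoted above, in dollars per duplicate terabyte per year.
COST_PER_TB_LOW = 1_200
COST_PER_TB_HIGH = 2_000


def annual_cost_range(duplicate_tb):
    """Return (low, high) yearly cost in dollars for a volume of duplicate data."""
    return duplicate_tb * COST_PER_TB_LOW, duplicate_tb * COST_PER_TB_HIGH


def duplicate_tb_for_cost(target_dollars):
    """Return (min_tb, max_tb) of duplicate data that would cost target_dollars a year."""
    return target_dollars / COST_PER_TB_HIGH, target_dollars / COST_PER_TB_LOW


# 100 TB is a hypothetical volume chosen only to show the calculation.
low, high = annual_cost_range(100)
print(f"100 TB of duplicates: ${low:,} to ${high:,} per year")

min_tb, max_tb = duplicate_tb_for_cost(1_000_000)
print(f"$1 million a year implies about {min_tb:.0f} to {max_tb:.0f} TB of duplicates")
```

Under the quoted range, a seven-figure annual bill corresponds to at least several hundred terabytes of duplicate image data across the institutions counted.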

The Los Angeles County Museum of Art launched a deduplication audit of its own digital collections in March 2026, targeting approximately 1.2 million image assets in its online database. The project, handled in-house by LACMA's digital infrastructure team on Wilshire Boulevard, identified preliminary duplication rates of around 34 percent in its photographic archives. That is lower than the 40 to 60 percent estimated for city agencies, but it still represents hundreds of thousands of redundant files that slow public search tools and consume licensed storage allocations.
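The LACMA figures imply a rough count of redundant files. A minimal sketch of that arithmetic follows, using the approximate inputs reported above; the helper name is hypothetical, and the result is only as precise as the preliminary 34 percent rate.

```python
def redundant_files(total_assets, duplication_rate):
    """Estimate how many files are redundant copies at a given duplication rate."""
    return round(total_assets * duplication_rate)


total = 1_200_000  # approximate asset count reported for the audit
rate = 0.34        # preliminary duplication rate reported for the audit

duplicates = redundant_files(total, rate)
print(duplicates)          # 408000
print(total - duplicates)  # 792000 files that would remain after cleanup
```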

The 2028 Deadline Is the Forcing Function

Deduplication software has existed for years, but adoption inside government has been slow. Perceptual hashing, a technique that identifies visually identical or near-identical images even when filenames differ, can process roughly 500,000 images per hour on mid-tier server hardware. At that rate, a backlog in the tens of millions of files needs somewhere between 20 and 200 hours of continuous hashing on a single server. Once flagged matches have to be reviewed and merged, that means a dedicated cleanup project running several weeks, not an afternoon.
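For readers curious how perceptual hashing works, here is a minimal sketch of one simple variant, the average hash. It is not drawn from any agency's tooling; the function names and the distance threshold are assumptions for illustration. It operates on a small grid of grayscale values and assumes the image has already been decoded and downscaled, a step that needs an image library and is left out.

```python
def average_hash(pixels):
    """Return an integer hash for a small grayscale grid (rows of 0-255 values).

    Assumes the image has already been decoded and downscaled (for example
    to 8x8); that step needs an image library and is omitted here.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the grid's mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=2):
    """max_distance is an illustrative threshold, not a recommended setting."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# Toy 4x4 "images": a gradient, a uniformly brighter copy, and an inverted one.
original = [[(row * 4 + col) * 16 for col in range(4)] for row in range(4)]
brighter = [[value + 10 for value in row] for row in original]
inverted = [[240 - value for value in row] for row in original]

print(is_near_duplicate(average_hash(original), average_hash(brighter)))  # True
print(is_near_duplicate(average_hash(original), average_hash(inverted)))  # False
```

Because the hash depends on each pixel's brightness relative to the image's own mean, a uniformly brightened copy produces the same bits, while a visually different image does not. Filenames never enter the comparison.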

The practical pressure is the Olympic documentation mandate. The city's target is to have all venue-related construction imagery catalogued and cross-referenced by January 2027, giving project managers an 18-month buffer before the Games open in July 2028. Agencies that do not resolve their duplicate image problems before that cataloguing deadline will face reconciliation work mid-project — precisely when construction timelines are most compressed.

For city departments starting the process now, digital asset specialists recommend beginning with the highest-volume intake points: contractor submission portals and shared Dropbox or SharePoint folders, which generate the bulk of redundant uploads. Establishing a single ingest point with automated hash-checking at upload — rather than trying to clean archives retroactively — reduces new duplication to near zero from the date of implementation. The window to set that up before the January 2027 deadline is closing. There are roughly 26 weeks left.
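A single ingest point with hash-checking at upload could look something like the sketch below. It is a simplified illustration, not a description of any city system: the class name and the in-memory index are hypothetical, and it uses an exact SHA-256 content hash from Python's standard library, which catches byte-identical re-uploads under new filenames but not resized or recompressed copies. Those would need a perceptual hash on top.

```python
import hashlib


class IngestPoint:
    """Hypothetical single upload gate that rejects byte-identical files."""

    def __init__(self):
        # Maps content digest -> filename of the first accepted copy.
        # A real system would keep this index in a shared database.
        self._seen = {}

    def submit(self, filename, data):
        """Return (accepted, canonical_filename) for an uploaded file."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            # Same bytes already stored: point the uploader at the original.
            return False, self._seen[digest]
        self._seen[digest] = filename
        return True, filename


portal = IngestPoint()
print(portal.submit("survey_0412.jpg", b"example image bytes"))
# (True, 'survey_0412.jpg')
print(portal.submit("survey_0412 (copy).jpg", b"example image bytes"))
# (False, 'survey_0412.jpg')
print(portal.submit("survey_0413.jpg", b"different image bytes"))
# (True, 'survey_0413.jpg')
```

Checking at the point of upload means a contractor's re-submission is matched to the stored original before a second copy is ever written.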


About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.
