The Daily Los Angeles

Los Angeles news, every day

LA's Digital Archives Are Riddled With Duplicate Images — and the Numbers Are Staggering

A quiet data crisis is costing Los Angeles institutions millions in storage fees and staff hours as duplicate photographs and scanned records pile up across city systems.

By Los Angeles News Desk · Published 4 July 2026, 12:00 pm

Photo by Giona Mason on Pexels

Los Angeles city agencies and cultural institutions collectively hold tens of millions of digitized images across fragmented databases — and a growing share of that archive is redundant. A review of public records requests, departmental budget filings, and technology procurement documents shows that duplicate image bloat has become a measurable, expensive problem for municipal and nonprofit institutions from Boyle Heights to Westwood.

The issue has sharpened in 2026 because three developments have each triggered a large new wave of photo intake: the 2028 Olympic infrastructure build-out, Mayor Karen Bass's ongoing homelessness emergency declaration, and post-wildfire documentation in Pacific Palisades. The organizations absorbing that intake include the Los Angeles Department of Public Works, the Los Angeles Housing Department, and the Getty Research Institute. When intake volumes spike without automated deduplication in place, redundancy compounds fast.

What the Data Actually Shows

Industry-standard audits of large municipal digital asset systems typically find that between 20 and 40 percent of stored image files are exact or near-exact duplicates, according to published research from the Digital Preservation Coalition, a UK-based nonprofit whose findings are widely cited in U.S. archival practice. The Bureau of Engineering alone processed more than 180,000 permit-related photo attachments in fiscal year 2024-25, according to city budget supplemental documents. At even a 25 percent duplication rate, that would mean roughly 45,000 redundant files from a single bureau sitting on servers that cost real money to maintain.

Cloud storage pricing on platforms commonly used by public agencies runs roughly $0.023 per gigabyte per month on standard tiers as of mid-2026. A single high-resolution image from a city inspection or construction site documentation shoot can run 8 to 12 megabytes. Do the arithmetic across hundreds of thousands of duplicated files and the primary storage bill for a large agency runs from hundreds to low thousands of dollars a year. That figure multiplies with every backup and retention copy of the same files, and it does not count staff time spent manually sorting records.
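A rough sketch of that arithmetic, using the figures quoted above. The 10-megabyte average is simply the midpoint of the 8-to-12 range, the `copies` parameter is a hypothetical stand-in for backup and replica copies, and the function names are illustrative only. It covers the storage line item, not staff time.

```python
def redundant_files(total_files: int, duplication_rate: float) -> int:
    """Estimate how many stored files are duplicates."""
    return round(total_files * duplication_rate)


def annual_storage_cost(n_files: int, avg_mb: float,
                        usd_per_gb_month: float, copies: int = 1) -> float:
    """Yearly cost of storing n_files, assuming 1 GB = 1,000 MB.

    `copies` is an assumed multiplier for backup/replica copies.
    """
    gigabytes = n_files * avg_mb / 1000
    return gigabytes * usd_per_gb_month * 12 * copies


# Bureau of Engineering example: 180,000 attachments, 25% duplicated.
dupes = redundant_files(180_000, 0.25)
print(dupes)                                              # 45000
print(f"${annual_storage_cost(dupes, 10, 0.023):,.2f}")   # $124.20
```

The result scales linearly with file count, file size, and the number of copies kept, so an agency can substitute its own inventory numbers.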

The Los Angeles County Museum of Art, which completed a major digital collection overhaul in 2023, and the UCLA Library's digital collections unit in Westwood have both invested in hash-based deduplication software. The technology computes a fingerprint from each image file's contents and flags any files that share a fingerprint as duplicates. The process is not foolproof: images that have been slightly cropped, resaved at different compression levels, or watermarked will generate different hash values even if they look identical to a human viewer. That gap between exact-duplicate detection and near-duplicate detection is where the harder, more expensive problem lives.
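A minimal sketch of the two detection styles described above, not the software either institution uses. Exact matching here relies on SHA-256 from Python's standard library. The near-duplicate half uses a toy "average hash" on a tiny grayscale grid, a technique the article does not name and that is shown only as one common approach. File names and pixel values are made up for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents hash to the same digest."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[fingerprint(data)].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is
    brighter than the image's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical in-memory "files"; real use would read bytes from disk.
files = {
    "inspection_001.jpg": b"\xff\xd8 site photo bytes",
    "inspection_001_copy.jpg": b"\xff\xd8 site photo bytes",
    "inspection_001_resaved.jpg": b"\xff\xd8 site photo bytes!",
}
# The resaved file differs by one byte, so exact hashing misses it.
print(find_exact_duplicates(files))
# [['inspection_001.jpg', 'inspection_001_copy.jpg']]

# Two 2x2 grayscale grids: same scene, slightly different pixel values.
original = [[10, 200], [220, 30]]
resaved = [[12, 198], [221, 28]]
print(hamming(average_hash(original), average_hash(resaved)))  # 0
```

A small Hamming distance between perceptual hashes suggests a near-duplicate, but the cutoff is a judgment call, which is part of why this class of detection is costlier to run and review.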

Why Los Angeles Has a Particular Exposure

Three converging forces make this a Los Angeles-specific pressure point right now. First, the city's roughly 45,000 unhoused residents, a figure drawn from the 2024 Greater Los Angeles Homeless Count conducted by LAHSA, have been the subject of intensive photo documentation tied to outreach, encampment clearing orders, and housing placement records. Those records are managed through multiple overlapping city and county systems that do not share a unified database standard. Second, post-wildfire insurance and rebuilding documentation in Pacific Palisades and Altadena has generated a secondary flood of property images entering both private and public systems simultaneously. Third, the Los Angeles 2028 organizing committee and the city's Olympic infrastructure contractors are generating construction-progress photo logs at dozens of venues, including SoFi Stadium in Inglewood and the new Sepulveda Basin recreation corridor in Van Nuys.

None of those three streams is being deduplicated against the others.

For city agencies and nonprofits looking to get ahead of the problem, archival technologists recommend conducting a baseline audit with off-the-shelf tools, such as the open-source digiKam or Adobe's proprietary Bridge, before committing to enterprise solutions. The Los Angeles City Archives office on South Spring Street downtown is among the public-sector units that have begun preliminary vendor conversations, though no contract has been publicly awarded as of this filing. Agencies that delay face compounding costs: the longer duplicate files sit unaddressed, the more deeply they become embedded in backup cycles, metadata records, and cross-referenced case files, which makes cleanup more labor-intensive with every passing fiscal quarter.

About this article

Published by The Daily Los Angeles

This article was produced by The Daily Los Angeles editorial desk and covers news in Los Angeles. See our editorial standards for how we use AI.