Los Angeles has spent millions converting decades of paper permits, planning documents, and public art records into digital form — but a growing headache has emerged inside those databases: thousands of duplicate images clogging the city's archival systems, slowing retrieval times, and inflating storage costs at a moment when municipal budgets are already stretched thin.
The problem is not cosmetic. Agencies and institutions managing everything from building inspection photos in Boyle Heights to the cultural asset records in the Los Angeles County Museum of Art's digital collections are dealing with redundant image files that were uploaded multiple times across incompatible legacy platforms. Technologists working with the city say the duplication compounds every time a new system is layered on top of an old one, which in Los Angeles has happened repeatedly since the early 2000s.
The urgency is sharpened by the 2028 Olympics. The city is under pressure to present a coherent, searchable public-facing digital infrastructure by the time athletes and visitors arrive. Duplicate image records inside venue planning databases and public-space documentation systems are a known obstacle to that goal.
What L.A. Is Actually Doing About It
The Bureau of Engineering, which manages infrastructure documentation for projects across the city, has been piloting a deduplication protocol since early 2026. The pilot targets image libraries tied to streetscape and public works projects along the Crenshaw Corridor and in the Olympic Boulevard construction zone near USC. The protocol uses perceptual hashing, a technique that derives a compact fingerprint from an image's visual content rather than from its file name or raw bytes, so that renamed, resized, or recompressed copies of the same picture produce matching or nearly matching fingerprints. Redundant uploads are flagged before they are permanently indexed.
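The Bureau of Engineering has not said which perceptual-hashing variant its protocol uses. As an illustration only, here is a minimal sketch of one widely used variant, difference hashing (dHash), under stated assumptions: images arrive already decoded as grayscale pixel grids, downscaling is crude nearest-neighbor sampling, and the function names and the 10-bit match threshold are hypothetical choices, not the city's. Each image is shrunk to a 9-by-8 grid, each pixel is compared with its right-hand neighbor to produce 64 bits, and two images count as near-duplicates when their fingerprints differ in only a few bits.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 pixel values.
Image = List[List[int]]


def resize(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbor downscale (a crude stand-in for proper resampling)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Image, hash_size: int = 8) -> int:
    """Difference hash: one bit per pixel, set when it is brighter than its right neighbor."""
    small = resize(img, hash_size + 1, hash_size)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # a 64-bit fingerprint when hash_size == 8


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, threshold: int = 10) -> bool:
    """Flag two images as likely duplicates; the threshold is an assumed tuning value."""
    return hamming(dhash(a), dhash(b)) <= threshold


# Example: a left-to-right gradient, a brightened copy, and a mirrored version.
gradient = [[x * 4 for x in range(64)] for _ in range(64)]
brighter = [[min(255, v + 20) for v in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]
print(is_near_duplicate(gradient, brighter))  # brightened copy is flagged
print(is_near_duplicate(gradient, mirrored))  # mirrored image is not
```

Because the fingerprint depends only on relative brightness between neighboring pixels, a uniformly brightened copy typically hashes identically or nearly so, while the mirrored gradient differs in every bit. A production pipeline would decode files with an imaging library and index the hashes for fast lookup instead of comparing every pair of images.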
The Los Angeles Public Library's digital collections division, based at the Central Library on West 5th Street, has separately contracted with a records management vendor to audit its Historic Photographs Collection, which contains more than 750,000 digitized images. Staff found that a significant portion of the archive consisted of near-duplicate or exact-duplicate entries, according to a 2025 internal review summary made available to The Daily Los Angeles. The library declined to provide a precise figure for how many duplicates were found, citing the ongoing nature of the audit.
The Getty Research Institute in Brentwood, which maintains one of the largest art-documentation digital repositories in the western United States, completed a similar deduplication sweep across its online collections in 2024. The Getty's approach, which relied on AI-assisted image clustering, has since been referenced by at least three other major U.S. cultural institutions as a model worth adapting.
How Other Cities Are Handling the Same Problem
London's Victoria and Albert Museum completed a system-wide image deduplication project across its 1.2-million-object digital catalogue in late 2024, reducing redundant files by roughly 18 percent and cutting cloud storage costs by an estimated £340,000 annually, according to figures the museum published in its 2024-25 annual report. Amsterdam's Rijksmuseum, which opened its Rijksstudio platform to the public in 2013, built deduplication logic directly into its upload architecture from the start, a structural choice Los Angeles's older agencies never had the chance to make.
New York City's Department of Records and Information Services, which manages millions of municipal photographs at its archives in lower Manhattan, is dealing with a version of the same problem. The city allocated $2.1 million in its fiscal year 2026 budget for records modernization, a portion of which covers image deduplication work, according to the New York City Office of Management and Budget's published spending plan.
Los Angeles has not published a comparable standalone budget line for image deduplication work. Costs are distributed across individual departmental IT budgets, making a citywide figure difficult to establish from public records alone.
For residents and researchers, the practical effect shows up in search results. A query for construction photos from the Metro K Line expansion, for example, can return multiple versions of the same image with different file names — a frustration that slows down journalists, historians, and planners alike.
City technology officials have said publicly that a unified digital asset management strategy is part of the Information Technology Agency's longer-term roadmap, though no specific implementation date has been announced. For departments like the Bureau of Engineering and the Public Library, the work is happening now, file by file, ahead of whatever unified system eventually arrives.