Los Angeles city agencies collectively store an estimated 40 to 60 percent of their digital image libraries as duplicate or near-duplicate files, according to digital asset management audits conducted across several municipal departments in the past 18 months. The redundancy is expensive, embarrassing, and — with the 2028 Olympics infrastructure buildout generating new visual documentation daily — getting worse.
The problem is not unique to government. But in a city spending billions on venue upgrades, homeless shelter construction, and post-wildfire recovery, all of which generate photographic records, duplicated image data is a measurable drain on storage budgets and staff time that administrators are only beginning to quantify. The city of Los Angeles spent approximately $4.2 million on cloud storage contracts for municipal digital assets in fiscal year 2025-26, a figure that digital records managers say could drop significantly with systematic deduplication.
Where the Numbers Come From
The Los Angeles Public Library's digital collections unit, headquartered on West 5th Street in Downtown, began a full image audit in March 2026. Staff catalogers found that roughly one in three image files in the system's historical photography archive had at least one duplicate — sometimes filed under different metadata tags, sometimes uploaded twice during separate digitization campaigns years apart. The library's digital archive now holds more than 1.1 million image assets, and the audit is about 60 percent complete.
The Los Angeles County Department of Arts and Culture, which administers the county's public art registry and documents installations throughout a county of 88 incorporated cities, ran a similar internal review in late 2025. Program staff flagged more than 14,000 duplicate image pairs among roughly 200,000 total files, a duplication rate of around seven percent. That is lower than at city agencies, but it still means on the order of 14,000 redundant files consuming server capacity.
The Metropolitan Transportation Authority's problem is harder to measure. Its construction documentation teams, which generate visual records for the Wilshire/Crenshaw and East San Fernando Valley transit lines, have been uploading progress photos from multiple contractors under overlapping naming conventions since 2023. Without a unified asset management platform, the same site photograph can exist under four or five different file names across three separate shared drives. The MTA has not publicly released duplication counts for its full project portfolio, but a digital records framework proposal submitted to the MTA board in February 2026 described the problem as systemic.
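Because the copies differ only in name and location, comparing file contents rather than file names is enough to catch them. A minimal sketch, assuming the shared drives are mounted as local directories: hash each file's bytes with SHA-256 and group paths that share a digest. The function names are hypothetical and not part of any MTA system, and this approach finds only byte-identical copies; a photo that was resized or re-encoded produces a different digest.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in 1 MiB chunks so large images don't load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(roots: list[Path]) -> dict[str, list[Path]]:
    """Group every file under the given roots by content hash; keep groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file():
                groups[sha256_of(path)].append(path)
    # File names play no part in the grouping, only the bytes do.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_exact_duplicates([Path("/mnt/drive_a"), Path("/mnt/drive_b")])` (placeholder mount points) would return one list of paths per duplicated photo, however many names it was saved under.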
The Olympic Clock Is Already Running
The 2028 Games add an urgency that budget conversations alone could not. LA28, the organizing committee based in Century City, is coordinating with venues from the Rose Bowl in Pasadena to the Long Beach Arena, and each site generates its own photography workflow. Without standardized deduplication protocols built in from the start, the organizing committee's archive could grow to a scale that makes retroactive cleanup prohibitively expensive. London organizers documented that lesson after 2012, when the British Olympic Association reported spending additional six-figure sums sorting and deduplicating its Games archive after the event.
The practical cost per gigabyte is small in isolation. Enterprise cloud storage runs roughly $0.023 per gigabyte per month on standard tiers. But across an institution holding 50 terabytes of image data — not unusual for a major cultural or infrastructure agency — duplicate files in the 40 percent range translate to roughly 20 terabytes of avoidable storage, amounting to around $460 per month, or more than $5,500 per year, before staff labor costs for managing bloated catalogs are factored in.
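The storage figures above can be reproduced with a short calculation. This sketch assumes decimal units (1 TB = 1,000 GB) and a single flat rate of $0.023 per gigabyte per month; actual bills vary with tiered pricing, region, and request or retrieval fees, none of which are modeled here. The function name is illustrative.

```python
def duplicate_storage_cost(total_tb: float, duplicate_share: float,
                           usd_per_gb_month: float = 0.023) -> tuple[float, float, float]:
    """Return (duplicate terabytes, monthly USD, annual USD) at a flat storage rate.

    Assumes 1 TB = 1,000 GB and ignores tiering, request, and egress charges.
    """
    duplicate_tb = total_tb * duplicate_share
    duplicate_gb = duplicate_tb * 1_000
    monthly_usd = duplicate_gb * usd_per_gb_month
    return duplicate_tb, monthly_usd, monthly_usd * 12


tb, monthly, annual = duplicate_storage_cost(50, 0.40)
print(f"{tb:.0f} TB duplicated: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

For the 50-terabyte, 40 percent scenario this works out to 20 TB, about $460 a month, and about $5,520 a year, matching the estimate in the text before any labor costs.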
Software tools for automated duplicate detection, including perceptual hashing programs that flag visually identical or near-identical images even when file names, sizes, or formats differ, range from free open-source options to enterprise licenses costing around $8,000 to $15,000 a year for large institutional deployments. Several vendors have held talks with both the LA County Library system and the Department of Arts and Culture, though no contracts have been publicly announced.
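Perceptual hashing works by reducing each image to a short fingerprint of its overall light-and-dark pattern, so copies that were resized or slightly re-exposed still produce matching fingerprints. The sketch below implements average hash (aHash), one of the simplest perceptual hashes; it is not the method of any vendor mentioned here. It assumes images are already decoded into grayscale 2-D lists of 0 to 255 values (real tools decode files with an image library and often use more robust DCT-based variants), and the match threshold of 5 differing bits is an illustrative assumption.

```python
def downsample(pixels: list[list[int]], size: int = 8) -> list[list[float]]:
    """Block-average a grayscale image into a size x size grid (assumes both dims >= size)."""
    h, w = len(pixels), len(pixels[0])
    grid = []
    for gy in range(size):
        y0, y1 = gy * h // size, (gy + 1) * h // size
        row = []
        for gx in range(size):
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """One bit per grid cell: 1 if the cell is brighter than the mean of all cells."""
    cells = [v for row in downsample(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def likely_duplicates(h1: int, h2: int, threshold: int = 5) -> bool:
    """Treat hashes within `threshold` differing bits as the same picture (assumed cutoff)."""
    return hamming(h1, h2) <= threshold
```

With this sketch, a 64x64 left-to-right gradient, a brightened copy of it, and a 32x32 version of it all hash identically, while the same gradient running top to bottom differs in half its 64 bits. That tolerance to resizing and exposure changes is what lets perceptual tools catch near-duplicates that name- or byte-based comparisons miss.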
For agencies, the practical next step is an inventory-first approach: catalog what exists before buying new storage or migrating to new platforms. The LA Public Library's March audit, expected to conclude by September 2026, is being watched by at least three other city departments as a proof-of-concept. If the numbers hold, the case for citywide deduplication standards will write itself.