LA's Digital Archives Are Riddled With Duplicate Images — And the Numbers Are Staggering
A growing problem in the city's public records, housing databases, and Olympic planning files is costing agencies time and money they can't afford to waste.

Los Angeles city agencies collectively manage tens of millions of digital image files across dozens of databases, and a significant share of them are exact or near-exact duplicates — a data hygiene crisis that IT auditors, records managers, and project planners say is quietly inflating storage costs and slowing down critical work ahead of the 2028 Summer Olympics.
The issue has come into sharper focus this year because of the sheer scale of parallel digitization efforts underway across the city. Mayor Karen Bass's housing emergency declaration triggered an accelerated push to digitize permitting documents, site surveys, and inspection photographs across the Los Angeles Housing Department. The Los Angeles Homeless Services Authority has been building out its Homeless Management Information System with field-collected images. The LA 2028 organizing committee is assembling venue documentation packages. Each effort is run by a separate team with separate storage infrastructure, and each has been uploading imagery with little or no deduplication in place.
Storage analysts who work with municipal governments estimate that duplicate image files typically account for between 20 and 40 percent of total image storage in large public-sector databases — a range that, applied to a city the size of Los Angeles, translates to significant wasted expenditure. The city's Information Technology Agency budget for fiscal year 2025-26 was set at roughly $180 million, according to the city's published budget documents. Even modest redundancy in that infrastructure carries a measurable price tag.
Within the Los Angeles Department of Building and Safety, permit files tied to the post-January 2025 wildfire rebuilding effort in the Palisades and Altadena corridors have been particularly prone to duplication. Inspectors upload photos from mobile devices in the field; supervisors sometimes re-upload the same images through desktop portals; contractors submit overlapping documentation packages. The result is file trees with three, four, or five copies of the same JPEG sitting in different subdirectories, all flagged as active records.
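For readers curious how exact copies like these are typically found, the usual approach is to hash each file's contents and group files whose hashes match, regardless of name or folder. The Python sketch below is illustrative only: the function names are hypothetical, and nothing here reflects tooling the department actually uses.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` by content hash; return only groups with 2+ files.

    Hypothetical helper: file names and subdirectories are ignored, so the
    same JPEG stored in several folders lands in a single group.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A content hash only catches byte-for-byte copies. A photo that has been resized or re-compressed on upload produces a different hash and would not be grouped.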
The Los Angeles County Office of the Assessor, which maintains a separate but related database of property images for the roughly 2.6 million parcels it tracks countywide, began a deduplication project in late 2024 using perceptual hashing — a technique that identifies visually similar images even when file names and metadata differ. Early internal benchmarks from that project, described in a county technology services report published in March 2025, found that approximately 28 percent of residential exterior photographs in one test dataset were redundant.
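The county report's exact method is not described here, but one of the simplest forms of perceptual hashing is the "average hash." The Python sketch below is a minimal illustration under stated assumptions: the image is assumed to be already decoded into a grid of grayscale values (0 to 255), and the function names are hypothetical. Production systems generally rely on an imaging library and more robust hash variants.

```python
def average_hash(pixels, hash_size=8):
    """Compute a simple average hash from a 2D grid of grayscale values.

    The image is reduced to hash_size x hash_size cells by block averaging;
    each cell contributes one bit: 1 if brighter than the overall mean, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = max((row + 1) * height // hash_size, top + 1)
        for col in range(hash_size):
            left = col * width // hash_size
            right = max((col + 1) * width // hash_size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes (smaller means more similar)."""
    return bin(hash_a ^ hash_b).count("1")
```

Two images are treated as near-duplicates when the distance between their hashes falls below a chosen threshold. That threshold is a tuning decision: set it too loose and distinct photos of similar houses may be merged, too strict and re-compressed copies slip through.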
The practical consequences extend beyond storage bills. At LA Metro, which is managing infrastructure documentation for new and upgraded transit lines feeding into Olympic venues like SoFi Stadium in Inglewood and the LA Memorial Coliseum in Exposition Park, duplicate images create version-control problems. Engineers pulling site photographs for construction review can't always confirm whether they have the most current image or an older copy that was re-uploaded.
For the housing emergency response, the stakes are more immediate. The Los Angeles Housing Department's Rent Stabilization division uses photographic evidence in tenant complaint cases. Duplicate or mislabeled images have, in some instances, complicated case timelines — a problem acknowledged in a city controller's office performance audit released in February 2026, which flagged records management inconsistencies in the department's digital filing systems without specifying image duplication as the sole cause.
Deduplication software licenses for enterprise-scale municipal use typically run between $50,000 and $250,000 annually depending on dataset size and vendor, according to published pricing from vendors including Veritas and Aparavi. Several Los Angeles agencies are currently evaluating procurement options, according to city contract filings posted on the Controller's Open Data portal.
The city's Information Technology Agency is expected to present a consolidated digital asset management framework to the City Council's Budget and Finance Committee before the end of calendar year 2026. Agencies hoping to bring their image databases into compliance before the Olympics will need procurement decisions finalized by early 2027 at the latest — leaving roughly six months to act before construction documentation timelines become too compressed to allow a full audit and cleanup pass.
Published by The Daily Los Angeles


