Online genealogical treasure trove was built using advanced machine learning technology and is the smartest of its kind
TEL AVIV, Israel & LEHI, Utah--(BUSINESS WIRE)--MyHeritage, the leading global service for discovering your past and empowering your future, announced today the publication of a huge collection of historical U.S. city directories that has been two years in the making. The collection was produced by MyHeritage from 25,000 public U.S. city directories published between 1860 and 1960. It comprises 545 million aggregated records that have been automatically consolidated from 1.3 billion records. This addition grows the total size of MyHeritage’s historical record database to 11.9 billion records.
MyHeritage teams applied innovative technologies to produce this collection and make it as useful and easy to use as possible. The city directories in this collection were published by cities and towns all over the U.S., and each directory is formatted differently. To overcome the formatting differences and unify the structures, MyHeritage corrected errors in the Optical Character Recognition of the scanned directory pages, and then employed several advanced technologies, including Record Extraction, Named Entity Recognition, and Conditional Random Fields, to parse the data. By training a machine learning model to parse raw free-text records into names, occupations, and addresses, the company produced a searchable, structured index of valuable historical information.
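For readers curious what turning a raw directory line into structured fields can look like, here is a minimal sketch. It is not MyHeritage's pipeline: it swaps the trained Conditional Random Fields model for a single hand-written regular expression, and the entry layout ("Surname Given, occupation, h address"), the residence markers, and all names in the code are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class DirectoryEntry(NamedTuple):
    """Structured fields pulled from one directory line (hypothetical schema)."""
    surname: str
    given_names: str
    occupation: str
    address: str


# Assumed layout: "Surname Given [Initial], occupation, h address".
# The optional single letter before the address (h, r, or b) stands in for
# residence markers; real directories vary widely from this pattern.
ENTRY_PATTERN = re.compile(
    r"^(?P<surname>[A-Z][A-Za-z'\-]+)\s+"
    r"(?P<given>[A-Z][A-Za-z. ]*?)\s*,\s*"
    r"(?P<occupation>[^,]+?)\s*,\s*"
    r"(?:[hrb]\s+)?(?P<address>.+)$"
)


def parse_entry(line: str) -> Optional[DirectoryEntry]:
    """Return the parsed entry, or None if the line does not fit the assumed layout."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return DirectoryEntry(
        surname=match["surname"],
        given_names=match["given"].strip(),
        occupation=match["occupation"].strip(),
        address=match["address"].strip(),
    )


if __name__ == "__main__":
    print(parse_entry("Smith John A, carpenter, h 12 Main St"))
```

A fixed pattern like this breaks as soon as a directory uses a different layout, which is the formatting problem the release describes and the reason a learned sequence model is a better fit than hand-written rules.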
As an important resource for family history research, city directories can provide fascinating new discoveries for anyone exploring their family history in mid-19th to mid-20th century America. The records contain valuable insights on everyday American life spanning the time period from the Civil War to the Civil Rights Movement. Cities in the United States have been producing and distributing directories since the 1700s, providing an up-to-date resource to help residents find and contact local individuals and businesses. The city directories provide a wealth of information regarding family life during those years, listing names, residences, occupations, and relationships between individuals. Thanks to their exceptional level of detail, city directories can also provide a viable alternative to U.S. census records during non-census years, and can fill in the gaps in situations where census records were lost or destroyed. In 1921, a fire at the U.S. Department of Commerce destroyed most of the records from the 1890 census. Despite the loss of the records in the fire, much of the data can be reconstructed using the 1890 city directories on MyHeritage, which consist of directory books from 344 cities across the country, including 88 of the 100 most populated cities during that year.
“We are harnessing new technologies to make family history research more accessible than ever before,” said Tal Erlichman, Director of Product Management at MyHeritage. “The use of machine learning to process the city directory records highlights the major strides MyHeritage is making in digitizing global historical records.”
MyHeritage automatically consolidated multiple entries for the same individual into one robust record that includes data from all the years an individual lived at the same address. This makes it easy to track changing life circumstances over the years. Users may be able to see more easily when their ancestors changed professions, married, divorced, or were widowed. MyHeritage also automatically inferred approximate dates for such life events. Inferred dates contribute to improved matching between family trees and historical records on MyHeritage.
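The consolidation idea can be sketched in a few lines. This is an illustrative simplification, not MyHeritage's method: it assumes entries for the same person carry exactly the same name and address strings, whereas real matching would have to tolerate spelling and OCR variation. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class YearlyEntry:
    """One person's listing in one directory year (hypothetical schema)."""
    year: int
    name: str
    address: str
    occupation: str


@dataclass
class ConsolidatedRecord:
    """All yearly listings for one person at one address."""
    name: str
    address: str
    occupations_by_year: Dict[int, str] = field(default_factory=dict)

    @property
    def year_range(self) -> Tuple[int, int]:
        years = sorted(self.occupations_by_year)
        return years[0], years[-1]


def consolidate(entries: List[YearlyEntry]) -> List[ConsolidatedRecord]:
    """Group entries by exact (name, address) match; an assumption for this sketch."""
    grouped: Dict[Tuple[str, str], ConsolidatedRecord] = {}
    for entry in entries:
        key = (entry.name, entry.address)
        record = grouped.setdefault(key, ConsolidatedRecord(entry.name, entry.address))
        record.occupations_by_year[entry.year] = entry.occupation
    return list(grouped.values())


def infer_occupation_changes(record: ConsolidatedRecord) -> List[Tuple[int, int, str, str]]:
    """Return (last_year_old, first_year_new, old, new) for each change.

    The change is only known to fall somewhere between the two listed years,
    so the result is an approximate window rather than an exact date.
    """
    changes = []
    years = sorted(record.occupations_by_year)
    for prev_year, next_year in zip(years, years[1:]):
        old = record.occupations_by_year[prev_year]
        new = record.occupations_by_year[next_year]
        if old != new:
            changes.append((prev_year, next_year, old, new))
    return changes


if __name__ == "__main__":
    sample = [
        YearlyEntry(1890, "John Smith", "12 Main St", "carpenter"),
        YearlyEntry(1891, "John Smith", "12 Main St", "carpenter"),
        YearlyEntry(1893, "John Smith", "12 Main St", "foreman"),
    ]
    for rec in consolidate(sample):
        print(rec.name, rec.year_range, infer_occupation_changes(rec))
```

The same windowing approach could in principle apply to other listed attributes, such as a change in a spouse's name, but which signals MyHeritage actually uses is not stated in this release.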
MyHeritage is currently indexing thousands of additional U.S. city directories that will be added to the collection in the coming months. This addition will include directories dating back to the late 18th century, as well as a large and unique set of directories from the late 20th century.
The online collection of U.S. city directories is now available on SuperSearch™, MyHeritage’s search engine for historical records. Searching the collection is free. A subscription is required to view the full records and to access Record Matches.
Search the new U.S. City Directories collection.
MyHeritage is the leading global discovery platform for exploring family history and gaining valuable health insights. With billions of historical records and family tree profiles, and with sophisticated matching technologies that work across all its assets, MyHeritage allows users to discover their past and empower their future. Launched in 2016, MyHeritage DNA has become one of the world’s largest consumer DNA databases, with 3.9 million people. As the world’s leading global service that combines family history and DNA testing for genealogy and health, MyHeritage is uniquely positioned to offer users a meaningful discovery experience that unites their past, present, and future. Available in 42 languages, MyHeritage is the most popular DNA test and family history service in Europe. www.myheritage.com