The Canadian Research Knowledge Network (CRKN) and Library and Archives Canada (LAC) have partnered on a pilot project to improve access to LAC’s materials in the Héritage collection through Intelligent Character Recognition (ICR). The pilot project will process a subset of the RG 10 collection, “Records relating to Indian Affairs,” with the Transkribus ICR software developed by READ-COOP. This project will improve access to a highly used set of records and mark the first step towards our long-term aspiration to make the Héritage collection full-text searchable.
ICR uses artificial intelligence models to identify characters in both digitized handwritten and printed text. Because approximately 60% of the Héritage collection is handwritten, CRKN staff identified the Transkribus ICR software as the best option for launching this pilot project. Initial tests showed that Transkribus transcribed a sample of handwritten text with an error rate of only 5–7%, a rate that further model training can reduce.
CRKN and LAC collaboratively selected the RG 10 sub-collection “Records relating to Indian Affairs” for the pilot project. This choice reflects our organizations’ commitment to making Indigenous records more accessible, as well as the high usage rate of the material, particularly among claims researchers working for Indigenous communities. In recognition that the collection contains sensitive materials, CRKN and LAC will undertake an assessment of the transcribed content to ensure that we handle its access appropriately.
“Making the critical materials in the Héritage collection more discoverable to researchers has long been a goal of CRKN. We are thrilled to begin working towards this goal alongside LAC and READ-COOP,” said Ken Hernden, Preservation and Access Committee Chair, and University Archivist and Associate University Librarian at Queen’s University. “This pilot project will not only strengthen the functionality of Héritage, but will allow us to gain a deeper understanding of the materials in the RG 10 collection and explore ways to provide access to these materials for researchers working with Indigenous communities.”
“LAC is committed to reconciliation and to building strong, diverse and ongoing partnerships that serve LAC's 2030 Vision to make our collections better known and more accessible to a broader, more diverse audience. By working with CRKN, this innovative collaboration will see two million handwritten pages transcribed and made keyword searchable. Researchers all over the country will now be able to search ‘records relating to Indian Affairs’ within the RG 10 collection,” said Johanna Smith, Director General, Outreach and Engagement, Library and Archives Canada.
CRKN has become a member of READ-COOP as part of this project and will benefit from lower fees on ICR processing while supporting the development of handwritten text recognition technology. As this technology and other artificial intelligence-based tools develop, CRKN looks forward to implementing them to enhance our capacity, improve the research process for users, and support our infrastructure.
CRKN would like to thank the CRKN Preservation and Access Committee, the National Claims Research Directors, and Maxime Gohier, Professor at Université du Québec à Rimouski, for their assistance in launching this pilot project. CRKN and LAC have engaged with the National Centre for Truth and Reconciliation about this project and are committed to sharing its outcomes with them. CRKN and LAC have also engaged with the Union of B.C. Indian Chiefs (UBCIC) and look forward to connecting with other groups working with this material.
For more information on the Transkribus pilot project, please contact:
Francesca Brzezicki
Heritage Engagement Officer, CRKN
fbrzezicki@crkn.ca
Media Relations
Library and Archives Canada
819-994-4589
media@bac-lac.gc.ca