The current Canadiana infrastructure provides trusted, publicly available online access to over 65 million pages (150 TB of data and growing) of digitized Canadian historical and cultural information dating from the 16th century to the present. However, this infrastructure is outdated and does not take advantage of recent digital advancements. To address this, CRKN is re-engineering and optimizing the Canadiana infrastructure to enrich access to, and engagement with, the wealth of information held in the Canadiana Collections.
Existing vs. Future Canadiana Infrastructure
Features and Capabilities | Existing Infrastructure | Modernized Infrastructure |
Access content at no cost | yes | yes |
Read metadata online | yes | yes |
Download PDFs | partial | yes |
Bulk download content | partial | yes |
View robust, respectful metadata | partial | yes |
Download metadata | no | yes |
Apply Traditional Knowledge Notices or Labels | no | yes |
Engage with semantically enriched full-text | no | yes |
Integrate with high performance computing systems (e.g., Globus) | no | yes |
Manage content with granular access controls | no | yes |
Query corpus via the IIIF API | no | yes |
Search, read, or download full-text | no | yes |
Undertake intelligent search inclusive of non-textual materials | no | yes |
Canadiana Development Objectives
- API access to the underlying data corpus of 65 million images of digitized content (150 TB of data and growing) to facilitate machine access and large-scale research
- Integration with the Globus service to facilitate content sharing with HPC and ARC systems supported by the Digital Research Alliance and regional compute centres in Canada
Example use case: As a computer scientist interested in using machine learning and natural language processing to uncover colonial patterns from Canada’s earliest decades to the present, Tai decides to work with the rebuilt Canadiana dataset. Tai downloads the relevant data to their Alliance HPC environment via the Canadiana-Globus integration. Drawing on their experience training AI to annotate full-text documents in novel ways, Tai can then further refine the already-annotated full-text and contribute improved content back to CRKN for wider community benefit.
Example use case: Nidal is a digital humanities scholar studying the earliest effects of climate change in Canada. They require access to 19th- and mid-20th-century solar magnetogram data contained in the Canadiana collections. Thanks to the modernized Canadiana’s robust metadata, combined with API and Globus access to the data corpus, Nidal can download the relevant data and perform programmatic analysis on these datasets that accounts for changes in measurement standards over time. Nidal is then able to identify historical baselines, which, when combined with recent climate data, can help inform current climate change mitigation strategies.
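As a rough illustration of the machine access described above, the sketch below walks a manifest shaped like the IIIF Presentation API 3.0 and collects the image URL for each page. It is a minimal sketch under stated assumptions: the manifest is hand-written sample data, the `example.org` URLs are placeholders, and `page_image_urls` is a hypothetical helper name. None of this is taken from the Canadiana API itself.

```python
from typing import Iterator

# Hypothetical, hand-written manifest following the IIIF Presentation API 3.0
# shape (Manifest -> Canvas -> AnnotationPage -> Annotation -> body).
# All identifiers and URLs are placeholders, not real Canadiana resources.
SAMPLE_MANIFEST = {
    "id": "https://example.org/iiif/doc-1/manifest",
    "type": "Manifest",
    "label": {"en": ["Sample digitized document"]},
    "items": [
        {
            "id": "https://example.org/iiif/doc-1/canvas/1",
            "type": "Canvas",
            "items": [{
                "type": "AnnotationPage",
                "items": [{
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": "https://example.org/iiif/doc-1-p1/full/max/0/default.jpg",
                        "type": "Image",
                    },
                }],
            }],
        },
        {
            "id": "https://example.org/iiif/doc-1/canvas/2",
            "type": "Canvas",
            "items": [{
                "type": "AnnotationPage",
                "items": [{
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": "https://example.org/iiif/doc-1-p2/full/max/0/default.jpg",
                        "type": "Image",
                    },
                }],
            }],
        },
    ],
}


def page_image_urls(manifest: dict) -> Iterator[str]:
    """Yield the image URL of every painting annotation, in page order."""
    for canvas in manifest.get("items", []):
        for annotation_page in canvas.get("items", []):
            for annotation in annotation_page.get("items", []):
                # Only "painting" annotations carry the page image itself.
                if annotation.get("motivation") != "painting":
                    continue
                body = annotation.get("body", {})
                if body.get("type") == "Image" and "id" in body:
                    yield body["id"]


urls = list(page_image_urls(SAMPLE_MANIFEST))
for url in urls:
    print(url)
```

A researcher working at scale would likely fetch many such manifests and pass the resulting URLs to a download or transfer step; that part is left out here because it depends on service details not given in this document.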
- Enhanced quality of the data corpus
  - Transcription: Using AI techniques, images of handwritten text and other hard-to-read content will be transcribed into full-text
  - Semantic enrichment: Using AI techniques, the full-text will be enriched by identifying and tagging entities such as proper nouns and geographical areas
Example use case: As a Cultural Studies professor studying Black intellectual history in Canada, Cameron requires robust search capabilities to uncover historical details that have traditionally remained missing from metadata and finding aids pertaining to Canada’s past. Making use of the rebuilt platform’s intelligent search capabilities, Cameron can query the collection in deeply nuanced ways, save and return to searches, and reveal previously hidden information. Making use of Canadiana’s IIIF (International Image Interoperability Framework) features, Cameron can then create a digital exhibition of information pertaining to Black intellectual history and link this content with other relevant archives and collections.
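For readers unfamiliar with IIIF, the sketch below shows how an image request is typically addressed under the IIIF Image API, whose URL pattern is `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. This is an illustrative sketch only: the base URL and identifier are placeholders, and `iiif_image_url` is a hypothetical helper, not part of any Canadiana tooling.

```python
from urllib.parse import quote


def iiif_image_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API request URL from its path components."""
    # The identifier must be percent-encoded so that any "/" inside it
    # is not mistaken for a path separator.
    safe_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{safe_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder server and identifier; request the full page scaled to 600 px wide.
url = iiif_image_url("https://example.org/iiif", "collection/item 1", size="600,")
print(url)
```

Because the region and size are part of the URL, an exhibition page can embed a cropped or scaled view of a document without copying the image, which is what makes linking content across archives practical.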
- Improved engagement and usability of the corpus
  - Embedded tools to enhance interaction with the data corpus
  - Interoperability with other linked open datasets, Digital Humanities tools, and repositories
Example use case: As a digital humanities scholar studying how social movements are formed through and against government discourses, Rowan needs to be able to aggregate information in the Canadiana collections with other Canadian research collections. Thanks to the rebuilt Canadiana’s linked data capabilities, Rowan can analyze details about social movements and social identity formation across diverse datasets. In their open access publication on the findings, Rowan can also use these linked data capabilities to link directly to the primary sources they cite.
- Content access and management capacity
  - Command-line and GUI access for a wider range of users to enable member institutions to add research collections to Canadiana
  - Traditional Knowledge Notices and Labels to support principles and practices of data sovereignty for First Nations, Inuit and Métis peoples
Example use case: Awena is a member of a First Nations community and is searching for historical documentation of their Nation’s cultural practices. Upon locating materials in Canadiana pertaining to their specific First Nation, Awena collaborates with their community to identify and set appropriate protocols for access to these materials. Making use of the rebuilt Canadiana’s authorization functionality, Awena implements the community-defined labels and protocols for the materials.
Example use case: Quinn is a librarian responsible for historical digital collections at a CRKN member institution. They have a digitized newspaper collection, more than 15 years old, running on an obsolete platform with no migration path, limited metadata, and poor OCR. Using the new API and collection management tools, and with approval from CRKN staff, Quinn can easily upload the collection into Canadiana. The collection is now part of the Canadiana corpus and benefits from new infrastructure features as well as AI-powered improved transcription and metadata enrichment. It also benefits from permanent preservation and from the access API protocols. The library can incorporate it into its discovery systems and/or current digital collections platforms.
Open Science Infrastructure for Canad(ian): Digital Collections of the Future (DCoF) Project
From climate change to socio-economic disparity, the problems Canadians face today originate in our shared history. The potential solutions to these challenges may lie there too, but only if researchers can access, analyze, and reveal the forces that have shaped contemporary Canadian life.
To support the innovative research that will be enabled by the modernized Canadiana infrastructure, CRKN is partnering with its members on a Canada Foundation for Innovation (CFI) – Innovation Fund application. Led by the University of Ottawa and Dr. Constance Crompton, Canada Research Chair in Digital Humanities, this CFI project will enable world-class research on Canada’s past to empower a better future for all Canadians.