Digging Deeper into Document Repositories
By Mark Ciotola
First published on February 27, 2019
- Students will learn methods for digging deeper into more examples and types of digital document and record repositories.
There are additional types of repositories that will be of value.
Libraries can often get you past paywalls for journals and digital books. They may also have research help guides for particular subjects. Universities often have their own collections of digitized primary sources. Some of the documents in each collection may not be available anywhere else.
Historical societies hold local, original historical sources, and many have digitized at least part of their collections.
Community libraries often have their own sources of local original materials, and their collections often contain some exotic original sources. Community libraries can also be a way to get past paywalls or obtain books from university libraries (such as through the Link+ service). The collections and services of community libraries can vary tremendously.
Journals typically contain secondary sources, but may also have partial or full reproductions of original sources.
Some newspapers keep files of historical information on topics of interest. They may keep images of old issues online or on microfilm.
Museums often have extensive collections of original sources (albeit often in artifact form). What is less well known is that many museums also operate as research institutes and may have extensive libraries and subject files.
- Students will learn about databases. They will review spreadsheets as a metaphor for data tables and relational databases.
- Students will create a database using a simple tool such as SQLite and learn how to perform queries.
- Students will perform queries on actual historical databases.
- Students will learn how to gain enhanced access to online repositories of historical documents and information.
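To make the spreadsheet metaphor concrete, here is a minimal sketch using Python's built-in sqlite3 module to create a small SQLite database and query it. The table name, column names, and sample rows are hypothetical placeholders invented for illustration; they do not come from any real historical database. Each table row plays the role of a spreadsheet row, and each column the role of a spreadsheet column.

```python
import sqlite3

# An in-memory database; pass a filename instead to keep the data on disk.
conn = sqlite3.connect(":memory:")

# A table is like one spreadsheet sheet: columns are fields, rows are records.
conn.execute(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        year INTEGER,
        repository TEXT
    )
    """
)

# Placeholder records, invented purely for illustration.
sample_rows = [
    ("Sample letter A", 1850, "University library"),
    ("Sample ledger B", 1872, "Historical society"),
    ("Sample map C", 1901, "Museum"),
]
conn.executemany(
    "INSERT INTO documents (title, year, repository) VALUES (?, ?, ?)",
    sample_rows,
)
conn.commit()

# A query: every document dated before 1900, oldest first.
query = "SELECT title, year FROM documents WHERE year < ? ORDER BY year"
results = conn.execute(query, (1900,)).fetchall()

for title, year in results:
    print(year, title)
```

The `?` placeholders let SQLite insert the values safely, which is generally preferable to building the query text by hand.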
Text parsing and processing
Once you find relevant documents, you might need a faster way to search for content of particular interest than reading everything. Text parsing is a way to search for particular terms or fragments. It can get much more sophisticated than simply typing a search term. Parsing involves searching and sometimes changing text in an automated manner. For example, one may wish to search a collection of ancient documents for a particular person’s name, for a certain period, while omitting another person’s name.
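The search described above (one name, within a period, excluding another name) might be sketched in Python as follows. The collection, the names "Person A" and "Person B", the years, and the function `find_documents` are all hypothetical and exist only to show the shape of the filter.

```python
# Hypothetical collection: each document has a year and its text.
documents = [
    {"year": 1850, "text": "Person A met Person B at the harbor."},
    {"year": 1855, "text": "Person A wrote home about the harvest."},
    {"year": 1920, "text": "Person A is remembered by the town."},
    {"year": 1852, "text": "Person C sold the mill."},
]


def find_documents(docs, include, exclude, start_year, end_year):
    """Return documents in the period that mention `include` but not `exclude`."""
    matches = []
    for doc in docs:
        in_period = start_year <= doc["year"] <= end_year
        if in_period and include in doc["text"] and exclude not in doc["text"]:
            matches.append(doc)
    return matches


hits = find_documents(documents, "Person A", "Person B", 1840, 1860)
for doc in hits:
    print(doc["year"], doc["text"])
```

This version uses plain substring matching, so it is sensitive to spelling and capitalization. That limitation is the reason more flexible search expressions are often needed.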
Parsing requires the text to be in a computer-readable form. Optical Character Recognition (OCR) software can convert images containing text into searchable text documents. There are many considerations in parsing. For example, are there different spellings of that person’s name? Is the capitalization of the name inconsistent? Is that person known by nicknames or abbreviations?
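One common way to handle variant spellings, inconsistent capitalization, and nicknames is a regular expression. The sketch below uses Python's standard re module. The name variants and the sample text are invented for illustration, and a real project would need a variant list drawn from the sources themselves.

```python
import re

# Hypothetical example: spellings and nicknames of one invented name.
name_pattern = re.compile(
    r"\b(?:[ck]ath?erine|kate|cathy)\b",
    re.IGNORECASE,  # treat "KATE", "Kate", and "kate" alike
)

sample_text = (
    "Catherine arrived in May. KATHERINE signed the deed. "
    "Her brother called her Kate. Katerine is a rarer spelling. "
    "The word 'skate' should not match."
)

# findall() returns every non-overlapping match, in order of appearance.
found = name_pattern.findall(sample_text)
print(found)
```

The `\b` word boundaries keep the pattern from matching inside longer words, and the optional `h?` covers spellings with and without that letter.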
There are many tools for parsing. The most common is the simple find, or find & replace, command in word processors, text editors, and many other applications. So you do not necessarily need to write your own program for this. However, you may have to become skilled at writing expressions to find exactly what you want.
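The same kind of expression used in a find & replace dialog can also be run from a short script. The following sketch uses Python's re module (not the only option; many text editors accept similar expressions) to expand the abbreviation "Wm." to "William". The sample sentence is invented for illustration.

```python
import re

# Hypothetical transcription with an abbreviated given name.
original = "Wm. Smith sold land to Wm. Jones; wm. Smith kept the mill."

# \b anchors the match at the start of a word; \. matches a literal period.
pattern = re.compile(r"\bWm\.", re.IGNORECASE)

# sub() is the programmatic equivalent of "replace all".
expanded = pattern.sub("William", original)
print(expanded)

# subn() also reports how many replacements were made, which helps
# when checking that an expression matched what you expected.
_, count = pattern.subn("William", original)
print(count)
```

Checking the replacement count against a manual spot check is a simple way to catch an expression that matches too much or too little.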
- SQLite database system to create, edit and store tables and records.
- Students will write a brief Perl program to parse a sample document.