Digging Deeper into Document Repositories

Objectives

Students will learn methods for exploring additional types of digital document and record repositories.

More Repositories

Several additional types of repositories are valuable for historical research.

University Libraries

University libraries can often get you past paywalls for journals and digital books. They may also have research help guides for particular subjects. Universities often have their own collections of digitized primary sources, and some of the documents in each collection may not be available anywhere else.

Historical Societies

Historical societies hold local, original historical sources and have often digitized at least part of their collections.

Community Libraries

Community libraries often have their own sources of local original materials, and their collections often contain some exotic original sources. Community libraries can also be a way to get past paywalls or obtain books from university libraries (such as through the Link+ service). The collections and services of community libraries can vary tremendously.

Journals

Journals typically contain secondary sources, but may also have partial or full reproductions of original sources.

Newspapers

Some newspapers keep files of historical information on topics of interest. They may keep images of old issues online or on microfilm.

Museums

Museums often have extensive collections of original sources (albeit often in artifact form). What is less well known is that many museums also operate as research institutes and may have extensive libraries and subject files.

Activities

Students will learn about databases. They will review spreadsheets as a metaphor for data tables and relational databases.
Students will create a database using a simple tool such as SQLite and learn how to perform queries.
Students will perform queries on actual historical databases.
Students will learn how to gain enhanced access to online repositories of historical documents and information.
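
As a starting point for the database activity, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are invented for illustration and do not come from any real repository. Each row plays the role of a spreadsheet row, and each column the role of a spreadsheet column.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with one made-up table of documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, title TEXT, year INTEGER, repository TEXT)"
    )
    # Invented sample records, for practice only.
    rows = [
        ("Town council minutes", 1887, "Historical society"),
        ("Harbor survey map", 1902, "University library"),
        ("Letter to the editor", 1887, "Newspaper archive"),
    ]
    conn.executemany(
        "INSERT INTO documents (title, year, repository) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def titles_from_year(conn, year):
    """Return the titles of all documents from a given year, sorted by title."""
    cur = conn.execute(
        "SELECT title FROM documents WHERE year = ? ORDER BY title", (year,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_sample_db()
    print(titles_from_year(conn, 1887))
```

The query uses a ? placeholder rather than pasting the year into the SQL string, which is the usual way to pass values safely to a query.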

Text Parsing and Processing

Once you find relevant documents, you may need a faster way to locate content of particular interest than reading everything. Text parsing is a way to search for particular terms or fragments, and it can get much more sophisticated than simply typing a search term. Parsing involves searching, and sometimes changing, text in an automated manner. For example, one may wish to search a collection of ancient documents for a particular person’s name, within a certain period, while omitting documents that mention another person’s name.
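
The kind of search described above might be sketched as follows in Python. This is only an illustration under simple assumptions: each document is a dictionary with hypothetical id, year, and text fields, and the names and years are invented.

```python
import re


def find_matching(docs, include_name, exclude_name, first_year, last_year):
    """Return ids of documents in the period that mention one name but not the other."""
    include = re.compile(r"\b" + re.escape(include_name) + r"\b")
    exclude = re.compile(r"\b" + re.escape(exclude_name) + r"\b")
    hits = []
    for doc in docs:
        in_period = first_year <= doc["year"] <= last_year
        text = doc["text"]
        if in_period and include.search(text) and not exclude.search(text):
            hits.append(doc["id"])
    return hits


# Invented sample documents, for practice only.
SAMPLE_DOCS = [
    {"id": "A", "year": 1850, "text": "Amos Reed signed the deed."},
    {"id": "B", "year": 1851, "text": "Amos Reed met Jonas Hale at the mill."},
    {"id": "C", "year": 1875, "text": "Amos Reed sold the mill."},
    {"id": "D", "year": 1852, "text": "Jonas Hale bought a horse."},
]

if __name__ == "__main__":
    print(find_matching(SAMPLE_DOCS, "Amos Reed", "Jonas Hale", 1845, 1860))
```

Only document A passes all three conditions: B mentions the excluded name, C falls outside the period, and D never mentions the name being sought.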

Parsing requires the text to be in a computer-readable form. Optical Character Recognition (OCR) software can convert images containing text into searchable text documents. Parsing also requires careful thought about how the text may vary. For example, are there different spellings of that person’s name? Is the capitalization of the name inconsistent? Is that person known by nicknames or abbreviations?
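
One possible way to handle these variations is to list the known forms of a name and search for all of them at once, ignoring capitalization. The sketch below assumes Python and uses an invented list of variants for one hypothetical person; a real project would build the list from the sources themselves.

```python
import re

# Hypothetical spellings, abbreviations, and nicknames for one person.
NAME_VARIANTS = ["William Smyth", "William Smith", "Wm. Smith", "Bill Smith"]


def build_name_pattern(variants):
    """Compile one case-insensitive pattern that matches any listed variant."""
    alternatives = "|".join(re.escape(v) for v in variants)
    # The lookarounds keep a variant from matching inside a longer word,
    # so "Bill Smith" does not match within "Bill Smithson".
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def find_name_mentions(text, variants):
    """Return every mention of the person, exactly as it appears in the text."""
    return [m.group(0) for m in build_name_pattern(variants).finditer(text)]


if __name__ == "__main__":
    sample = "WILLIAM SMYTH arrived. Later wm. smith and Bill Smith left."
    print(find_name_mentions(sample, NAME_VARIANTS))
```

OCR output often contains misread letters and irregular spacing, so a pattern like this will still miss some mentions; it narrows the reading rather than replacing it.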

There are many tools for parsing. The most common is the simple find, or find & replace, command in word processors, text editors, and many other applications. So you do not necessarily need to write your own program for this. However, you may have to become skilled at writing expressions to find exactly what you want.
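
The same find & replace idea can be scripted when there are too many documents to edit by hand. The following is a rough sketch in Python; the abbreviation table is a made-up example, and the function name is hypothetical.

```python
import re

# Invented mapping from abbreviated first names to full forms.
ABBREVIATIONS = {"Wm.": "William", "Jas.": "James"}


def expand_abbreviations(text, mapping):
    """Replace each abbreviation with its full form wherever it starts a word."""
    for abbr, full in mapping.items():
        # re.escape makes the period literal; \b requires a word start.
        pattern = r"\b" + re.escape(abbr)
        text = re.sub(pattern, full, text)
    return text


if __name__ == "__main__":
    print(expand_abbreviations("Jas. Reed and Wm. Hale", ABBREVIATIONS))
```

Most text editors accept a similar expression in their find & replace dialog when a regular expression option is turned on, so the pattern can be tested there before it is used in a program.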

Resources

SQLite database system to create, edit and store tables and records.

Leveling Up

Students will write a brief Perl program to parse a sample document.


Digital History