Best Practices for Digitizing and Describing Cultural Heritage Materials

From AUCWiki
Revision as of 14:53, 25 February 2013 by Cfrunyon

Scope

Best Practices for Digitizing and Describing Cultural Heritage Materials outlines digital capture specifications and the application of qualified Dublin Core metadata elements to describe digital content for inclusion in the Rare Books and Special Collections Digital Library (RBSCDL). [1]

The RBSCDL, established in Fall 2011, provides a platform for Rare Books and Special Collections Library collection managers to publish born-digital materials and digital surrogates of cultural heritage materials. Resources from the University Archives, Rare and Special Books, Archives and Manuscripts Collections, Oral History Interviews, Regional Architecture Collections, and Photography Collections may be included in the Rare Books and Special Collections Digital Library. Further, the digital library supports collaborative digitization and digital collection projects. The library's collection policies and content are managed by the Rare Books and Special Collections Library, and technical support is supplied by University Academic Computing Technologies. The RBSCDL is powered by CONTENTdm, proprietary software developed by OCLC. [2] [3]

Selection

The Rare Books and Special Collections Digital Library relies on a number of criteria to prioritize digitization projects that meet the requirements outlined in the Collection Development Policy. We take into account the following:

  • Researcher and instructor requests
  • Popularity of the material
  • Fragility of the resource
  • Rarity and uniqueness of the collection
  • Availability of digitization equipment and personnel

Many of the digital collections housed in the Rare Books and Special Collections Digital Library are representative samples of the full contents of archival collections. Criteria for inclusion in the Rare Books and Special Collections Digital Library include the following:

  • Topical relevance of the material
  • Uniqueness of the resource
  • Visual or aural clarity of the item

All digital assets are reviewed and selected for inclusion in the Rare Books and Special Collections Digital Library by Rare Books and Special Collections Library personnel in accordance with the United States Copyright Act of 1976.

Preservation

The condition of all materials is considered prior to digitization. Treatments are undertaken by Conservation Services staff as required.

Capture Specifications

Capture specifications are based on the following:

  • American Society of Media Photographers Digital Photography Best Practices and Workflow [4]
  • NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials [5]
  • Technical Guidelines for Digitizing Cultural Heritage Materials published by the Federal Agencies Digitization Guidelines Initiative Still Image Working Group [6]
  • UVa Library Internal Production Digitization Standards [7]

These capture specifications produce preservation master image and sound assets that are uncompressed and uncorrected. Delivery master files, which are assets that have been color-corrected, cropped, edited, or otherwise altered, are preserved along with the preservation master files. Delivery files are derived from the delivery master files and may be compressed or otherwise altered to promote access.

Physical Objects

Photographing specifications for three-dimensional objects, including fragile books and art, are based on the Technical Guidelines for Digitizing Cultural Heritage Materials published by the Federal Agencies Digitization Guidelines Initiative Still Image Working Group. [6]

Preservation Master
Resolution | Bit Depth | Mode | Compression | Format | Edits
16 megapixel | 24-bit | RGB | Lossy | image/jpeg | None
Delivery Master
Resolution | Bit Depth | Mode | Compression | Format | Edits
16 megapixel | 24-bit | RGB | Lossy | image/jpeg | Cropping, color correction
Delivery
Resolution | Bit Depth | Mode | Compression | Format | Edits
16 megapixel | 24-bit | RGB | Lossy | image/jpeg | None

Sound

Capture specifications for sound, including interviews and other audio, are based on the UVa Library Internal Production Digitization Standards. [7]

Preservation Master
Sample Rate | Bit Depth or Bit Rate | Compression | Format | Edits
44.1 kHz | 16-bit | None | audio/wav | None
Delivery Master
Sample Rate | Bit Depth or Bit Rate | Compression | Format | Edits
44.1 kHz | 16-bit | None | audio/wav | Cropping, normalizing sound wave
Delivery
Sample Rate | Bit Depth or Bit Rate | Compression | Format | Edits
????? kHz | 128 kbps | Lossy | audio/mpeg (.mp3) | None
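The gap between the preservation and delivery sound specifications is easiest to see as a data rate. A minimal Python sketch of the arithmetic for uncompressed PCM audio (sample rate × bit depth × channels) follows; the channel counts are assumptions, because the specifications do not say whether recordings are mono or stereo, and WAV header overhead is ignored.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bit_depth * channels


def megabytes_per_minute(bitrate_bps: float) -> float:
    """Storage needed for one minute of audio, in megabytes (10**6 bytes)."""
    return bitrate_bps * 60 / 8 / 1_000_000


# Preservation master: 44.1 kHz, 16-bit. The channel count is an assumption.
for channels in (1, 2):
    rate = pcm_bitrate_bps(44_100, 16, channels)
    print(f"{channels} channel(s): {rate / 1000:.1f} kbps, "
          f"{megabytes_per_minute(rate):.1f} MB per minute")

# Delivery file: 128 kbps MP3, as listed in the specification.
print(f"MP3 delivery: {megabytes_per_minute(128_000):.2f} MB per minute")
```

With stereo capture, one minute of the preservation master comes to roughly 10.6 MB, against about 0.96 MB for the 128 kbps delivery MP3.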

Moving Images

Capture specifications for moving images, including films and videos, are based on best practices as they become apparent. The professional digital preservation community has not yet achieved a consensus on the best wrappers for storing uncompressed video bit streams. [8]

Preservation Master
Compression | Format | Example Content | Edits
None | video/mpeg | Film and video | None
None | AVCHD Video File and CPI | Interview, event | None
Delivery Master
Compression | Format | Example Content | Edits
None | video/mpeg | Film and video | Cutting
Delivery
Compression | Format | Example Content | Edits
Lossy | video/mp4 | Film and video | H.264 encoding
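The specification does not say which tool performs the H.264 encoding for the delivery file. As an assumption, the Python sketch below assembles an FFmpeg command (libx264 video, AAC audio, MP4 container) that would turn a delivery master into a delivery file. The file names and the CRF value are illustrative, not part of the RBSCDL specification, and the command is only printed, not run.

```python
import shlex


def h264_delivery_command(master_path: str, delivery_path: str, crf: int = 23) -> list[str]:
    """Build an FFmpeg argument list that transcodes a delivery master to H.264/MP4.

    Assumes FFmpeg is installed with libx264. crf=23 is libx264's default,
    not a value taken from the RBSCDL specification.
    """
    return [
        "ffmpeg",
        "-i", master_path,            # delivery master, e.g. video/mpeg
        "-c:v", "libx264",            # H.264 video encoding
        "-crf", str(crf),             # constant-quality setting (lower = higher quality)
        "-pix_fmt", "yuv420p",        # widely playable chroma subsampling
        "-c:a", "aac",                # AAC audio, the usual companion to H.264 in MP4
        "-movflags", "+faststart",    # move the index to the front for web playback
        delivery_path,
    ]


# Hypothetical file names
cmd = h264_delivery_command("interview-01.mpg", "interview-01.mp4")
print(shlex.join(cmd))
# To actually encode: subprocess.run(cmd, check=True)
```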

Still Images

Scanning specifications for two-dimensional objects, including photos, slides, film, transparencies, negatives, books, papers, records, and visual art, are based on the Technical Guidelines for Digitizing Cultural Heritage Materials published by the Federal Agencies Digitization Guidelines Initiative Still Image Working Group. [6]

Preservation Master
Size | Resolution | Compression | Dimensions | Bit Depth | Mode | Format | Edits
35 mm | 4000 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | None
4 x 5 in.; 10 x 13 cm.; 100 x 130 mm. | 1200 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | None
8 x 10 in.; 203 x 254 mm. | 800 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | None
Letter; Legal; A4; A3 | 600 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | None
Delivery Master
Size | Resolution | Compression | Dimensions | Bit Depth | Mode | Format | Edits
35 mm | 4000 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | Cropping, color correction, image rotation
4 x 5 in.; 10 x 13 cm.; 100 x 130 mm. | 1200 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | Cropping, color correction, image rotation
8 x 10 in.; 203 x 254 mm. | 800 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | Cropping, color correction, image rotation
Letter; Legal; A4; A3 | 600 ppi | None | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/tiff | Cropping, color correction, image rotation
Delivery
Size | Resolution | Compression | Dimensions | Bit Depth | Mode | Format | Edits
35 mm | 4000 ppi | Lossless | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/jp2 | None
4 x 5 in.; 100 x 130 mm. | 1200 ppi | Lossless | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/jp2, application/pdf [9] | OCR [9]
8 x 10 in.; 203 x 254 mm. | 800 ppi | Lossless | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/jp2, application/pdf [9] | OCR [9]
Letter; Legal; A4; A3 | 600 ppi | Lossless | Sized to match original, no magnification or reduction. | 24-bit | RGB | image/jp2, application/pdf [9] | OCR [9]
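Because every still image is scanned at original size, its pixel dimensions follow directly from the physical size and the resolution: pixels = inches × ppi, or millimetres ÷ 25.4 × ppi. The Python sketch below shows that arithmetic and the raw size of a 24-bit RGB image (3 bytes per pixel, ignoring TIFF headers and metadata). Treating a 35 mm frame as a 36 x 24 mm image area is an assumption.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel dimensions of a 1:1 scan (no magnification or reduction)."""
    return round(width_in * ppi), round(height_in * ppi)


def pixel_dimensions_mm(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Same calculation for sizes given in millimetres."""
    return pixel_dimensions(width_mm / MM_PER_INCH, height_mm / MM_PER_INCH, ppi)


def uncompressed_rgb_bytes(width_px: int, height_px: int, bit_depth: int = 24) -> int:
    """Raw pixel data size in bytes, ignoring file headers and metadata."""
    return width_px * height_px * bit_depth // 8


# 35 mm frame, assumed to be 36 x 24 mm, at 4000 ppi
print(pixel_dimensions_mm(36, 24, 4000))      # (5669, 3780)

# Letter-size page (8.5 x 11 in.) at 600 ppi
w, h = pixel_dimensions(8.5, 11, 600)
print(w, h, uncompressed_rgb_bytes(w, h))     # 5100 6600 100980000
```

A letter-size preservation master therefore holds about 101 MB of raw pixel data before any TIFF overhead.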

File Names

Each digital image is given a unique file name based on the following elements:

  • Collection ID/Call Number
    • -Series
      • -Subseries
        • -Folder
          • -Item Number
            • _Page Number/Description

Use an underscore only when digitizing a compound object, for example an item with pages, such as a book, or a two-sided photograph. Separate all other elements with a hyphen.
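The naming rule can be sketched in Python as follows: hierarchy elements are joined with hyphens, and an underscore appears only before the page number or description of a compound object. The function name and the sample identifiers are hypothetical, and skipping empty elements (for collections that lack a level) is an assumption.

```python
from typing import Optional


def build_file_name(*elements: str, page: Optional[str] = None) -> str:
    """Build a file name (without extension) from hierarchy elements.

    Elements are joined with hyphens; an underscore is added only when a
    page number or description is given, i.e. for compound objects.
    Empty elements are skipped (an assumption, not a documented rule).
    """
    parts = [e for e in elements if e]
    for part in parts + ([page] if page else []):
        if "_" in part:
            raise ValueError(f"underscore is reserved for compound objects: {part!r}")
    base = "-".join(parts)
    return f"{base}_{page}" if page else base


# Hypothetical single object: collection, series, subseries, item
print(build_file_name("coll42", "01", "02", "0005"))              # coll42-01-02-0005
# Hypothetical compound object: same item, page 001
print(build_file_name("coll42", "01", "02", "0005", page="001"))  # coll42-01-02-0005_001
```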

Single Objects

Use for any item you want to display in a single record, like photographs, maps, architectural drawings, oral histories, and films.

Photographs
Collection ID - Series Number - Subseries Number - Item Number

Examples of collections that use this file naming schema:

Architectural Records
Collection ID - Series Number - Subseries Number - Item Number

Examples of collections that use this file naming schema:

Compound Objects

Use for two or more objects you want to display in a single record, like books, multi-page documents, and two-sided images like postcards.
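Because the underscore marks where the page part of a compound object's file name begins, files can be grouped into compound objects by the text before it. A minimal Python sketch, using hypothetical file names; pages sort as plain text, so it assumes zero-padded page numbers. For the postcard pattern, where the side number precedes the underscore, the side number would have to be stripped first, which this sketch does not do.

```python
from collections import defaultdict


def group_compound_objects(file_names: list[str]) -> dict[str, list[str]]:
    """Group files by the part of the name before the first underscore.

    Files without an underscore are single objects and form a group of one.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in file_names:
        stem = name.rsplit(".", 1)[0]        # drop the file extension
        item_id, _, _page = stem.partition("_")
        groups[item_id].append(name)
    return {item_id: sorted(names) for item_id, names in groups.items()}


files = [  # hypothetical file names
    "coll42-01-02-0005_002.tif",
    "coll42-01-02-0005_001.tif",
    "coll42-01-02-0006.tif",
]
for item_id, pages in group_compound_objects(files).items():
    print(item_id, pages)
```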

Books
 Call Number - Item Number _Page Number/Description

Examples of collections that use this file naming schema:

Postcards
 Collection ID - Series Number - Subseries Number - Item Number - Side Number _ Side Description

Examples of collections that use this file naming schema:

Records
 Collection ID - Record Group ID - Record Subgroup ID - Series Number - Subseries Number - Item Number/Date _Page Number

Examples of collections that use this file naming schema:

Directory Structure

All Rare Books and Special Collections Digital Library materials are stored on the RBSCL server. In-progress collections are stored on the RBSCLprojects server. Files are stored according to the directory structure outlined below. For collections that do not contain compound objects, omit the Compound Object level in the hierarchy.

  • Collection
    • Preservation Master
      • Compound Object
        • Item
    • Delivery Master
      • Compound Object
        • Item
    • Delivery
      • Compound Object
        • Item
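The hierarchy above can be expressed as a path-building function. In this Python sketch the mount point, collection name, and file names are hypothetical, since the document names the servers but not their paths; the Compound Object level is dropped when no compound object is given.

```python
from pathlib import PurePosixPath
from typing import Optional

VERSIONS = ("Preservation Master", "Delivery Master", "Delivery")


def item_path(root: str, collection: str, version: str, item: str,
              compound_object: Optional[str] = None) -> PurePosixPath:
    """Return the storage path for one file following the directory hierarchy.

    The Compound Object level is omitted when compound_object is None.
    """
    if version not in VERSIONS:
        raise ValueError(f"unknown version folder: {version!r}")
    parts = [collection, version]
    if compound_object is not None:
        parts.append(compound_object)
    parts.append(item)
    return PurePosixPath(root, *parts)


# Hypothetical mount point and names
print(item_path("/mnt/rbscl", "coll42", "Delivery",
                "coll42-01-02-0005_001.jp2", compound_object="coll42-01-02-0005"))
print(item_path("/mnt/rbscl", "coll42", "Preservation Master", "coll42-01-02-0006.tif"))
```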

Quality Assurance

It is important during the digitization process to verify the quality of each captured item.

Still Images

All scans must be checked for the following:

Visual clarity
The image must be clear. Common issues with clarity include moving the scanner or image during the capture process, resulting in a blurry scan. Another common mistake is scanning an item with the scanner lid up. All images must be checked for these and similar issues.
Accuracy
Each scan must be checked against the original item to ensure that the photograph, book, drawing, map, or other item has been fully captured. Additionally, the image's file name must be checked for accuracy. If the file name is incorrect, it should be corrected before proceeding.

Scans that have been checked for visual clarity and accuracy should be entered into a spreadsheet to indicate that the item has been approved and is ready to be edited.
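The document does not specify the spreadsheet's layout. As one possibility, the Python sketch below records each checked scan as a CSV row that a spreadsheet program can open; the column names are hypothetical, and a scan counts as approved only when it passes both checks.

```python
import csv
import io
from datetime import date

# Hypothetical column layout for the QA spreadsheet
FIELDS = ["file_name", "visual_clarity", "accuracy", "approved", "checked_on"]


def qa_row(file_name: str, clear: bool, accurate: bool, checked_on: date) -> dict:
    """One spreadsheet row; approved only if both checks pass."""
    return {
        "file_name": file_name,
        "visual_clarity": "pass" if clear else "fail",
        "accuracy": "pass" if accurate else "fail",
        "approved": "yes" if (clear and accurate) else "no",
        "checked_on": checked_on.isoformat(),   # YYYY-MM-DD
    }


buffer = io.StringIO()   # stands in for a real .csv file
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(qa_row("coll42-01-02-0005_001", True, True, date(2013, 2, 25)))
writer.writerow(qa_row("coll42-01-02-0005_002", False, True, date(2013, 2, 25)))
print(buffer.getvalue())
```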

Metadata

Description

The descriptive metadata scheme used for describing cultural heritage materials is derived from the fifteen terms of the Dublin Core Metadata Element Set, Version 1.1 and their refining properties defined in DCMI Metadata Terms. Best Practices for CONTENTdm and other OAI-PMH compliant Repositories: Creating Shareable Metadata, authored by the CONTENTdm Metadata Working Group, was also consulted. [10] [11] [12]

Name/Label Element [13] Scope Note Controlled Vocabulary? [14] Required?
Title Title
  • Capitalize the first word of the title and enter in lowercase all other words except proper nouns.
  • Do not include initial articles, unless an integral part of the title and/or provided by the creator as the true title. This allows the system to index by title alphabetically. Optionally, you may add the title containing the article to the Alternative Title field.
  • If the item does not have a title, such as an un-cataloged photograph, assign one that is brief but descriptive. Bring out the unique qualities of an item.
Alternative Title Title-Alternative
  • Use for other titles the resource may be known by, alternate spellings, transliterations, MARC title statement, etc.
  • Capitalize the first word of the title, and enter in lowercase all other words except proper nouns.
  • Do not move initial articles to the end.
Creator Creator
  • Enter the name(s) of entities responsible for the creation of the item.
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • Use MARC Relator terms to generate a different label for the field. For a photograph collection, you might label the field Photographer. [15]
  • List multiple entries separated by a semicolon in alphabetical order.
Contributor Contributor
  • Enter the name(s) of entities that made significant contributions to the item.
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • Use MARC Relator terms to generate a different label for the field. For a collection of content acquired by another party, you might use Collector. [15]
  • List multiple entries separated by a semicolon in alphabetical order.
Advisor Contributor
  • Enter the name(s) of the thesis advisor(s).
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • Use MARC Relator terms to generate a different label for the field. For an advisor or professor, you might use Thesis Advisor. [15]
  • List multiple entries separated by a semicolon in alphabetical order.
Department Contributor
  • Enter the name(s) of the sponsoring department(s) at AUC.
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • Use MARC Relator terms to generate a different label for the field. For a sponsoring department, you might use Sponsor. [15]
  • List multiple entries separated by a semicolon in alphabetical order.
Editor Contributor
  • Enter the name(s) of the editor(s).
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • Use MARC Relator terms to generate a different label for the field. For a compiled work, you might use Compiler. [15]
  • List multiple entries separated by a semicolon in alphabetical order.
Publisher Publisher
  • Enter the name(s) of the publisher(s).
  • Look up names in the Library of Congress Name Authority File. [15]
  • If an authorized name is not found in the Library of Congress Name Authority File, create one based on RDA rules. [17]
  • List multiple entries separated by a semicolon in alphabetical order.
Abstract Description-Abstract
  • Enter the abstract or similar information (introduction, overview, scope, etc.) as transcribed directly from the resource.
Table of Contents Description-Table of Contents
  • Enter the table of contents as transcribed directly from the resource.
  • Separate Table of Contents items with a semicolon.
Description Description
  • Enter supplemental descriptive information such as a free text summary.
  • End field with a period.
  • Avoid simply restating the title.
  • When transcribing, identify the source of the information using one or a combination of the following options:
    • Photographer's description: ____.
    • Translation of photographer's description: _____.
    • Text in image: _____.
    • Translation of text in image: _____.
  • Enter the names of people in pictures using the following: Pictured from left to right: _____.
    • When the writing is unintelligible, enter illegible between brackets, e.g. [illegible].
  • If an exact date or date range is unknown for an item, you may note circa dates or the absence of a date in the description, e.g. This photograph is undated. This photograph was taken circa 1990. Be sure to enter approximate date ranges in the Date Created field. For example, if a photograph was taken circa 1924, enter 1920-1930 in the Date Created field. Be as specific as possible when entering date ranges.
Name Subject
  • Use to record a person, family or corporate body that is the subject of the item being described. For example, enter the names of people pictured in a photograph in this field. Do not include the name of the creator unless he/she is also the subject of the item.
  • Look up names in the Library of Congress Name Authority File or the Getty Union List of Artist Names. [15] [16]
  • If an authorized name is not found in the Library of Congress Name Authority File or the Getty Union List of Artist Names, create one based on RDA rules. [17]
  • List multiple entries separated by a semicolon in alphabetical order.
Topic Subject
  • Use Library of Congress Subject Headings to record specific topical information about the item and its context within the collection. When describing items that are graphic in nature, such as photographs, also use the Subject field. [15]
  • Include terms for places in this field only if the object being described relates directly to the location.
  • Do not include names. Record names in the Name field.
  • Exclude geographic headings and subdivisions if that information can be appropriately included in the Location (coverage.spatial) field. Exceptions exist for items such as maps and newspapers, in cases where the main topic of the item is a geographic place.
  • List multiple entries separated by a semicolon in alphabetical order.
Subject Subject
  • Use this field to inventory the items depicted in an image, e.g. Protest posters, Monuments & memorials, etc.
  • Look up terms in the Library of Congress Thesaurus for Graphic Materials. Be sure to use subject terms, not genre/format terms. [18]
  • List multiple entries separated by a semicolon in alphabetical order.
Architectural Detail Subject
  • Look up terms in the Getty Art and Architecture Thesaurus. [19]
  • List multiple entries separated by a semicolon in alphabetical order.
Keyword Subject
  • Use this field to enter creator/contributor/donor-supplied keywords.
  • List multiple entries separated by a semicolon in alphabetical order.
Location Coverage-Spatial
  • Enter each place as its own term, in order from specific to general.
  • Look up terms in the Getty Thesaurus of Geographic Names. [20]
  • Do not include “World” unless the item is specifically world-related, like a world map.
  • Only include names of geographic places like cities, counties, countries, etc., and omit named places such as Tahrir Square or Matḥaf al-Miṣrī. Record named places in the Topic element instead.
  • List multiple entries separated by a semicolon in order of decreasing specificity.
Date Created Date-Created
  • Format the date according to the ISO 8601 standard:
    • YYYY-MM-DD
    • YYYY-MM
    • YYYY
    • YYYY-YYYY
  • If the date is entirely unknown, do not enter text like “undated.” Instead, estimate a date range, e.g. 1900-1999.
  • List multiple entries separated by a semicolon in chronological order.
Date Submitted Date-Submitted
  • Use for collections with community-submitted items, e.g. University on the Square: Documenting Egypt's 21st Century Revolution.
  • Format the date according to the ISO 8601 standard:
    • YYYY-MM-DD
    • YYYY-MM
    • YYYY
    • YYYY-YYYY
  • If the date is entirely unknown, do not enter text like “undated.” Instead, estimate a date range, e.g. 1900-1999.
  • List multiple entries separated by a semicolon in chronological order.
Type Type
  • Enter the characteristic and general type of content of the resource.
  • Look up terms using the Dublin Core Metadata Initiative Type Vocabulary. [21]
Format Format
  • Enter the Internet Media Type. [22]
Extent Format-Extent
  • Enter dimensions of the original as W x H with the unit of measurement, using the metric system (e.g. mm. or cm.) or digital image measurement (e.g. px.).
  • Abbreviate units, such as px. or mm. End each unit of measurement with a period.
  • Include moving image and sound recording run times in HH:MM:SS format, using leading zeros when necessary.
  • Include the number of pages when applicable, in ### p. format. End p with a period.
  • List multiple entries separated by a semicolon.
Medium Format-Medium
  • Enter original format or medium information in this field.
  • Look up terms in the Getty Art and Architecture Thesaurus. [19]
  • List multiple entries separated by a semicolon.
Language Language
  • Enter the ISO 639-1 Language Code for items with linguistic content. [15]
  • Leave this field blank if there is no linguistic content.
  • List multiple entries separated by a semicolon in alphabetical order.
Identifier Identifier
  • Enter a unique identifier.
  • The identifier should be the same as the digital file name and correspond with the physical collection.
Original Identifier Identifier
  • Enter the original file name if different from identifier.
Collection Relation
  • Enter the title of the digital collection to which the resource belongs.
Is Part Of Relation-Is Part Of
  • Enter the physical container.
Source Source
  • Enter the file, series, collection, repository, and institution with which the resource is related.
  • List multiple entries separated by a semicolon in order of decreasing specificity.
License Rights-License
  • Enter copyright information.
Access Rights Rights-Access Rights
  • Enter a statement about requesting permissions or reproductions.
Acknowledgements Description
  • Include granting and funding agencies, donors, etc.
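Several scope notes above are mechanical enough to check automatically: the allowed date forms, semicolon-separated entries in alphabetical or chronological order, and HH:MM:SS run times. The Python sketch below is one way to do so, not an RBSCDL tool. Ranges use the document's hyphenated YYYY-YYYY convention, and the decade rule in `circa_range` is an assumption that merely reproduces the "circa 1924, enter 1920-1930" example.

```python
import re
from datetime import date

# The four date forms listed for Date Created and Date Submitted.
_FULL = re.compile(r"\d{4}-\d{2}-\d{2}")    # YYYY-MM-DD
_MONTH = re.compile(r"(\d{4})-(\d{2})")     # YYYY-MM
_YEAR = re.compile(r"\d{4}")                # YYYY
_RANGE = re.compile(r"(\d{4})-(\d{4})")     # YYYY-YYYY


def is_valid_date(value: str) -> bool:
    """True if value has an allowed form and denotes a real date or range."""
    if _FULL.fullmatch(value):
        try:
            date.fromisoformat(value)   # rejects e.g. 2013-02-30
            return True
        except ValueError:
            return False
    if m := _MONTH.fullmatch(value):
        return 1 <= int(m.group(2)) <= 12
    if m := _RANGE.fullmatch(value):
        return int(m.group(1)) <= int(m.group(2))
    return bool(_YEAR.fullmatch(value))


def split_entries(field: str) -> list[str]:
    """Split a multi-value field on semicolons and trim whitespace."""
    return [e.strip() for e in field.split(";") if e.strip()]


def is_alphabetical(field: str) -> bool:
    entries = split_entries(field)
    return entries == sorted(entries, key=str.casefold)


def is_chronological(field: str) -> bool:
    # Simplification: strings in these forms mostly sort chronologically as text.
    entries = split_entries(field)
    return all(map(is_valid_date, entries)) and entries == sorted(entries)


def circa_range(year: int) -> str:
    """Decade range for a circa year (assumed rule matching the example)."""
    start = year // 10 * 10
    return f"{start}-{start + 10}"


def run_time(total_seconds: int) -> str:
    """HH:MM:SS with leading zeros, for Extent run times."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(is_valid_date("1900-1999"), is_valid_date("undated"))   # True False
print(is_alphabetical("Adams, Jane; Smith, John"))             # True
print(circa_range(1924), run_time(3725))                       # 1920-1930 01:02:05
```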

Preservation

Name/Label Element [13] Scope Note Controlled Vocabulary? [14] Required?
File Name None
  • Name of the file, automatically created by concatenating the identifier and the file extension.
  • This hidden field will be added at the end of the project by the Digital Collections Archivist.
File Extension None
  • Enter the file extension of the delivery file.
  • This is a hidden field.
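The File Name concatenation is simple, but pairing it with the Format field helps keep extensions consistent. In this Python sketch the media-type-to-extension table is a hypothetical lookup covering the formats in the capture specifications, and normalizing extensions to lowercase is an assumption.

```python
# Hypothetical lookup from the Format field (Internet Media Type) to an extension.
EXTENSIONS = {
    "image/tiff": "tif",
    "image/jp2": "jp2",
    "application/pdf": "pdf",
    "audio/wav": "wav",
    "audio/mpeg": "mp3",
    "video/mp4": "mp4",
}


def file_name(identifier: str, extension: str) -> str:
    """Concatenate the Identifier and the file extension into the File Name."""
    ext = extension.lstrip(".").lower()
    if not ext:
        raise ValueError("file extension is empty")
    return f"{identifier}.{ext}"


print(file_name("coll42-01-02-0005", EXTENSIONS["image/jp2"]))   # coll42-01-02-0005.jp2
```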

Workflow

Name/Label Element Scope Note Required?
Date Entered None
  • Enter the date, in YYYY-MM-DD format, that the metadata record is created.
Date Reviewed-Student None
  • Enter the date, in YYYY-MM-DD format, that the metadata record is reviewed by a student employee.
Date Reviewed-Curator None
  • Enter the date, in YYYY-MM-DD format, that the metadata record is reviewed by an RBSCL collection curator.
Date Selected None
  • Enter the date, in YYYY-MM-DD format, that the item is selected for inclusion in the digital collection.
Date Reviewed-Archivist None
  • Enter the date, in YYYY-MM-DD format, that the metadata record is reviewed by the RBSCL Digital Collections Archivist.
Date Uploaded None
  • Enter the date, in YYYY-MM-DD format, that the metadata record and digital files are uploaded to the RBSCDL.
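Every workflow field takes a YYYY-MM-DD date, which is exactly what Python's `date.isoformat()` produces. The sketch below holds the six workflow dates in a hypothetical dataclass (the attribute names are illustrative equivalents of the labels above) and renders only the steps completed so far.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class WorkflowDates:
    """Hypothetical container for the workflow fields; None means not done yet."""
    date_entered: Optional[date] = None
    date_reviewed_student: Optional[date] = None
    date_reviewed_curator: Optional[date] = None
    date_selected: Optional[date] = None
    date_reviewed_archivist: Optional[date] = None
    date_uploaded: Optional[date] = None

    def as_strings(self) -> dict:
        """Completed steps rendered as YYYY-MM-DD strings."""
        return {f.name: getattr(self, f.name).isoformat()
                for f in fields(self) if getattr(self, f.name) is not None}


record = WorkflowDates(date_entered=date(2013, 2, 25))
record.date_reviewed_student = date.today()   # stamp a step as it is completed
print(record.as_strings())
```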

Properties

Configuration of CONTENTdm collection and administrative fields. [2]

Collection Fields

Field name DC map Data type Large Search Hide Required Vocab
Title Title Text
Alternative Title Title-Alternative Text
Creator Creator Text
Contributor Contributors Text
Publisher Publisher Text
Description Description-Abstract Text
Description Description-Table Of Contents Text
Description Description Text
Name Subject Text
Topic Subject Text
Subject Subject Text
Architectural Detail Subject Text
Location Coverage-Spatial Text
Date Created Date-Created Date
Date Submitted Date-Submitted Date
Type Type Text
Format Format Text
Extent Format-Extent Text
Medium Format-Medium Text
Language Language Text
Identifier Identifier Text
Original Identifier Identifier Text
Collection Relation Text
Source Source Text
License Rights-License Text
Access Rights Rights-Access Rights Text
Acknowledgements Description Text
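To illustrate what the DC map column means, the Python sketch below (not CONTENTdm's own export) writes a record as simple XML using the Dublin Core element namespace and the DCMI Metadata Terms namespace, so that a qualified element such as Coverage-Spatial becomes dcterms:spatial. The wrapper element, the partial field list, the choice of which fields are multi-valued, and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# Partial, illustrative map from field names to DCMI properties.
FIELD_MAP = {
    "Title": (DC, "title"),
    "Alternative Title": (DCTERMS, "alternative"),
    "Creator": (DC, "creator"),
    "Location": (DCTERMS, "spatial"),
    "Date Created": (DCTERMS, "created"),
    "Extent": (DCTERMS, "extent"),
    "Medium": (DCTERMS, "medium"),
    "Language": (DC, "language"),
    "Identifier": (DC, "identifier"),
    "License": (DCTERMS, "license"),
    "Access Rights": (DCTERMS, "accessRights"),
}
# Fields whose semicolon-separated entries become separate elements (assumption).
MULTI_VALUED = {"Creator", "Location", "Date Created", "Extent", "Medium", "Language"}


def to_dc_xml(record: dict[str, str]) -> str:
    root = ET.Element("record")   # arbitrary wrapper element
    for field, value in record.items():
        ns, prop = FIELD_MAP[field]
        entries = value.split(";") if field in MULTI_VALUED else [value]
        for entry in (e.strip() for e in entries):
            if entry:
                ET.SubElement(root, f"{{{ns}}}{prop}").text = entry
    return ET.tostring(root, encoding="unicode")


record = {  # hypothetical record
    "Title": "Protest posters in Tahrir Square",
    "Location": "Cairo; Egypt",
    "Date Created": "2011-02",
    "Language": "ar; en",
}
print(to_dc_xml(record))
```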

Administrative Fields

Field name DC map Data type Search Hide
Archival file None Text
OCLC number None Text
Date issued Date-issued Date
Date modified Date-modified Date
CONTENTdm number None Text
CONTENTdm file name None Text

More Information

References

  1. Rare Books and Special Collections Digital Library
  2. CONTENTdm
  3. OCLC
  4. Digital Photography Best Practices and Workflow
  5. The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials
  6. Technical Guidelines for Digitizing Cultural Heritage Materials
  7. UVa Library Internal Production Digitization Standards
  8. Whither Digital Video Preservation?
  9. For images with text only.
  10. Dublin Core Metadata Element Set, Version 1.1
  11. DCMI Metadata Terms
  12. Best Practices for CONTENTdm and other OAI-PMH compliant Repositories: Creating Shareable Metadata
  13. Dublin Core metadata element map for collections in the RBSCDL.
  14. Content is derived from an internationally accepted or locally created list of approved terms.
  15. Library of Congress Authorities
  16. Getty Union List of Artist Names
  17. Resource Description & Access
  18. Library of Congress Thesaurus for Graphic Materials
  19. Getty Art and Architecture Thesaurus
  20. Getty Thesaurus of Geographic Names
  21. Dublin Core Metadata Initiative Type Vocabulary
  22. Internet Media Types