Automated re-typesetting, indexing and content enhancement for scanned marriage registers

For much of England and Wales marriage registers began to be kept in 1537. The marriage details were recorded locally, and in longhand, until 1st July 1837, when central records began. All registers were kept in the local parish church. In the period from 1896 to 1922 an attempt was made, by the Ph...

Bibliographic Details
Main Author: Brailsford, David F.
Format: Conference or Workshop Item
Published: 2009
Subjects:
Online Access: https://eprints.nottingham.ac.uk/28117/
_version_ 1848793510562496512
author Brailsford, David F.
building Nottingham Research Data Repository
collection Online Access
description For much of England and Wales, marriage registers began to be kept in 1537. The marriage details were recorded locally, and in longhand, until 1st July 1837, when central records began. All registers were kept in the local parish church. In the period from 1896 to 1922 an attempt was made, by the Phillimore company of London, using volunteer help, to transcribe marriage registers for as many English parishes as possible and to have them printed. This paper describes an experiment in the automated re-typesetting of Volume 2 of the 15-volume Phillimore series relating to the county of Derbyshire. The source material was plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volume. The aim of the experiment was to avoid labour-intensive page-by-page rebuilding with tools such as Acrobat Capture. Instead, it proved possible to capitalise on the regular, tabular structure of the Register pages as a means of automating the re-typesetting process, using UNIX troff software and its tbl preprocessor. A series of simple software tools helped to bring about the OCR-to-troff transformation. However, the re-typesetting of the text was not just an end in itself but, additionally, a step on the way to content enhancement and content repurposing. This included the indexing of the marriage entries and their potential transformation into XML and GEDCOM notations. The experiment has shown, for highly regular material, that the efforts of one programmer, with suitable low-level tools, can be far more effective than attempting to recreate the printed material using WYSIWYG software.
first_indexed 2025-11-14T19:01:27Z
format Conference or Workshop Item
id nottingham-28117
institution University of Nottingham Malaysia Campus
institution_category Local University
last_indexed 2025-11-14T19:01:27Z
publishDate 2009
recordtype eprints
repository_type Digital Repository
spelling nottingham-28117 2020-05-04T20:26:10Z https://eprints.nottingham.ac.uk/28117/ Brailsford, David F. (2009) Automated re-typesetting, indexing and content enhancement for scanned marriage registers. In: ACM Symposium on Document Engineering (DocEng '09), 15-18 Sept 2009, Munich, Germany. 2009-09 Conference or Workshop Item PeerReviewed http://dl.acm.org/citation.cfm?doid=1600193.1600202
title Automated re-typesetting, indexing and content enhancement for scanned marriage registers
topic Re-typesetting
GEDCOM
OCR
troff
genealogy
hyperlinking
indexing
url https://eprints.nottingham.ac.uk/28117/