Reconstituting typeset Marriage Registers using simple software tools

Bibliographic Details
Main Author: Brailsford, David F.
Format: Article
Published: Springer-Verlag 2012
Subjects: Re-Typesetting; OCR; Troff; Parsing; Genealogy; Hyperlinking; Indexing
Online Access: https://eprints.nottingham.ac.uk/28126/
building Nottingham Research Data Repository
collection Online Access
description In a world of fully integrated software applications, which can seem daunting to develop and to maintain, it is sometimes useful to recall that a system of loosely linked software components can provide surprisingly powerful and flexible methods for software development. This paper describes a project which aims to retypeset a series of volumes from the Phillimore Marriage Registers, first published in England around the turn of the last century. The source material is plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volumes. The regular, tabular structure of the Register pages allows the re-typesetting process to be automated. The UNIX troff software and its tbl preprocessor are used for the typesetting itself, but a series of simple awk-based software tools, all of them parsers and code generators of one sort or another, carries out the OCR-to-troff transformation. By re-parsing the generated troff codes it is possible to produce a surname index as a supplement to the retypeset volume. Moreover, this second-stage parsing has been invaluable in discovering subtle ‘typos’ in the automatically generated material. With small adjustments to this parser it would be possible to output the complete marriage entries in standard XML or GEDCOM notations.
first_indexed 2025-11-14T19:01:29Z
id nottingham-28126
institution University of Nottingham Malaysia Campus
institution_category Local University
recordtype eprints
repository_type Digital Repository
citation Brailsford, David F. (2012) Reconstituting typeset Marriage Registers using simple software tools. Computer Science - Research and Development, 27 (2). pp. 113-126. ISSN 1865-2042
date 2012-05-01
status PeerReviewed
doi 10.1007/s00450-010-0145-x
publisher_url http://link.springer.com/article/10.1007/s00450-010-0145-x
title Reconstituting typeset Marriage Registers using simple software tools
topic Re-Typesetting
OCR
Troff
Parsing
Genealogy
Hyperlinking
Indexing
url https://eprints.nottingham.ac.uk/28126/
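
The two-stage pipeline outlined in the description (parse OCR text, generate tbl/troff code, then re-parse that code to build a surname index) can be illustrated with a small sketch. This is not the paper's awk tooling: it is written in Python, and the one-line entry layout, the "surname is the last word" rule, and the function names `parse_entry`, `to_tbl` and `surname_index` are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical OCR line layout (an assumption, not the registers' real format):
#   "<groom> & <bride> <day> <Mon>. <year>"
ENTRY = re.compile(
    r"^(?P<groom>.+?)\s*&\s*(?P<bride>.+?)\s+"
    r"(?P<date>\d{1,2}\s+\w+\.?\s+\d{4})$"
)

def parse_entry(line):
    """Stage 1 parser: return a dict of fields, or None if the line does not fit."""
    m = ENTRY.match(line.strip())
    return m.groupdict() if m else None

def to_tbl(entries):
    """Stage 1 generator: emit a tbl table with '@' as the column separator."""
    lines = [".TS", "tab(@);", "l l r."]
    for e in entries:
        lines.append(f"{e['groom']}@{e['bride']}@{e['date']}")
    lines.append(".TE")
    return "\n".join(lines)

def surname_index(tbl_source):
    """Stage 2: re-parse the generated tbl rows, mapping surname -> row numbers.

    Assumes the surname is the last word of each name (illustrative only).
    """
    index = defaultdict(list)
    state = "outside"
    row = 0
    for line in tbl_source.splitlines():
        if line == ".TS":
            state = "format"
        elif line == ".TE":
            state = "outside"
        elif state == "format":
            # The tbl format section ends with a line terminated by '.'
            if line.endswith("."):
                state = "data"
        elif state == "data":
            row += 1
            # A row without exactly three fields raises ValueError here,
            # which is one way a second pass can expose malformed output.
            groom, bride, _date = line.split("@")
            for name in (groom, bride):
                index[name.split()[-1]].append(row)
    return dict(index)

entries = [parse_entry("John Smith & Mary Jones 4 Jan. 1701")]
tbl_source = to_tbl(entries)
index = surname_index(tbl_source)
```

Keeping the stages separate means the second pass reads only the generated tbl text, so it checks what will actually be typeset rather than the intermediate data structures.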