Automated re-typesetting, indexing and content enhancement for scanned marriage registers
For much of England and Wales marriage registers began to be kept in 1537. The marriage details were recorded locally, and in longhand, until 1st July 1837, when central records began. All registers were kept in the local parish church. In the period from 1896 to 1922 an attempt was made, by the Ph...
| Main Author: | Brailsford, David F. |
|---|---|
| Format: | Conference or Workshop Item |
| Published: | 2009 |
| Subjects: | Re-typesetting; GEDCOM; OCR; troff; genealogy; hyperlinking; indexing |
| Online Access: | https://eprints.nottingham.ac.uk/28117/ |
| _version_ | 1848793510562496512 |
|---|---|
| author | Brailsford, David F. |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | For much of England and Wales marriage registers began to be kept in 1537. The marriage details were recorded locally, and in longhand, until 1st July 1837, when central records began. All registers were kept in the local parish church.
In the period from 1896 to 1922 an attempt was made, by the Phillimore company of London, using volunteer help, to transcribe marriage registers for as many English parishes as possible and to have them printed.
This paper describes an experiment in the automated retypesetting of Volume 2 of the 15-volume Phillimore series relating to the county of Derbyshire. The source material was plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volume.
The aim of the experiment was to avoid labour-intensive page-by-page rebuilding with tools such as Acrobat Capture. Instead, it proved possible to capitalise on the regular, tabular structure of the Register pages as a means of automating the re-typesetting process, using UNIX troff software and its tbl preprocessor. A series of simple software tools helped to bring about the OCR-to-troff transformation.
However, the re-typesetting of the text was not just an end in itself but, additionally, a step on the way to content enhancement and content repurposing. This included the indexing of the marriage entries and their potential transformation into XML and GEDCOM notations. The experiment has shown, for highly regular material, that the efforts of one programmer, with suitable low-level tools, can be far more effective than attempting to recreate the printed material using WYSIWYG software. |
| first_indexed | 2025-11-14T19:01:27Z |
| format | Conference or Workshop Item |
| id | nottingham-28117 |
| institution | University of Nottingham Malaysia Campus |
| institution_category | Local University |
| last_indexed | 2025-11-14T19:01:27Z |
| publishDate | 2009 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | nottingham-28117 2020-05-04T20:26:10Z https://eprints.nottingham.ac.uk/28117/ 2009-09 Conference or Workshop Item PeerReviewed Brailsford, David F. (2009) Automated re-typesetting, indexing and content enhancement for scanned marriage registers. In: ACM Symposium on Document Engineering (DocEng '09), 15-18 Sept 2009, Munich, Germany. http://dl.acm.org/citation.cfm?doid=1600193.1600202 |
| title | Automated re-typesetting, indexing and content enhancement for scanned marriage registers |
| topic | Re-typesetting GEDCOM OCR troff genealogy hyperlinking indexing. |
| url | https://eprints.nottingham.ac.uk/28117/ |
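
The abstract above mentions an OCR-to-troff transformation that exploits the regular, tabular structure of the register pages. The record does not show the paper's actual tools, so what follows is only a minimal sketch of the general idea, written in Python rather than whatever the author used. The OCR line shape (`groom & bride .. .. date`), the three-column layout, and the names `parse_entry` and `to_tbl` are all assumptions made for illustration. The `.TS`/`.TE` delimiters, the `tab()` option and the column-format line are standard tbl input syntax.

```python
import re

# ASSUMPTION: each OCR'd register entry looks like
#   "John Smith & Mary Brown .. .. 12 May 1701"
# i.e. two names joined by "&", a run of leader dots, then a date.
# The real Phillimore page layout may well differ.
ENTRY = re.compile(
    r"^(?P<groom>.+?)\s*&\s*(?P<bride>.+?)\s*(?:\.\s*)+"
    r"(?P<date>\d{1,2}\s+[A-Za-z]+\.?\s+\d{4})\s*$"
)


def parse_entry(line):
    """Return (groom, bride, date) for a matching line, else None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return (match.group("groom"), match.group("bride"), match.group("date"))


def to_tbl(entries):
    """Render parsed entries as tbl input for troff.

    '@' is used as the column separator (declared via tab(@)), on the
    assumption that it never occurs in the register text itself.
    """
    lines = [".TS", "tab(@);", "l l r."]  # two left-aligned columns, one right-aligned
    for groom, bride, date in entries:
        lines.append("@".join((groom, bride, date)))
    lines.append(".TE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ocr_lines = [
        "John Smith & Mary Brown .. .. 12 May 1701",
        "page header that does not match",
    ]
    parsed = [e for e in map(parse_entry, ocr_lines) if e is not None]
    print(to_tbl(parsed), end="")
```

Lines that do not fit the assumed pattern are simply skipped here; a real pipeline would more plausibly log them for manual correction, since OCR errors are exactly what breaks such patterns. The tuples produced by `parse_entry` could equally feed an index or an XML or GEDCOM writer, which is the kind of repurposing the abstract describes.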