A Transcript Notation for Genealogy (Updated 18 Feb 2003)

The following notation provides a way to transcribe missing, unclear, and alternative text. It was adapted from the "regular expression" notation used in programming.

Genealogical sources are replete with incomplete and unclear information. Written documents often contain smudged or faded characters. Engravings and script can often be difficult to read or interpret.

The conventional way to denote transcription problems is to use editorial comments enclosed in brackets. This is a flexible technique because there are few restrictions on the words or phrases used.

Unfortunately, the conventional notation has not been formally defined. Furthermore, the notation is not concise and can be disruptive for information in tabular form. Because the wording in conventional notes is arbitrary, and because it uses a postfix format, the annotation can be easily misinterpreted. Lastly, missing or illegible characters in the source require transcription "placeholders" which have not been defined as part of the notation.

To overcome some of the problems associated with conventional notation, a set of predefined symbols is recommended. The symbols described below have been chosen to avoid conflicts with characters commonly found in source text. The bracket notation has been preserved for "legacy support" and to permit detailed descriptions if necessary.

The symbols are defined as follows:

Symbol              Meaning
? (question mark)   An unclear character or expression
* (asterisk)        A series of two or more unclear characters
^ (caret)           A missing character or expression
# (octothorpe)      A series of two or more missing characters, or a missing entry
@ (at)              A literal character or expression
, (comma)           Alternative text separator
{} (braces)         An expression or expression group
[] (brackets)       A traditional annotation of the data

The notation for alternative characters and expressions permits unclear text to be transcribed more accurately. In cases where the text is very unclear, the ? or * symbols should be used. In cases where the text has a limited number of interpretations, the alternative notation should be used.

The difference between the asterisk and the octothorpe is subtle but important. An octothorpe indicates that data was never recorded or has faded to the point of invisibility. An asterisk indicates that the data is present but cannot be interpreted by the researcher.
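
Because the notation was adapted from regular expressions, the bare symbols can be translated back into a search pattern. The following Python sketch is only an illustration, not part of the notation: the function name to_regex and the candidate readings in the comments are hypothetical, and brace expressions are deliberately not handled. It assumes that ? and ^ each stand for exactly one character, that * and # each stand for two or more, and that a symbol followed by @ is literal. For matching purposes the asterisk and the octothorpe behave identically; the distinction between them records why the text is unavailable, not how many characters it may cover.

```python
import re

# Assumed mapping from bare notation symbols to regex fragments.
WILDCARDS = {"?": ".", "^": ".", "*": ".{2,}", "#": ".{2,}"}


def to_regex(transcript):
    """Compile a transcript containing bare symbols into a regex (braces unsupported)."""
    pieces = []
    i = 0
    while i < len(transcript):
        ch = transcript[i]
        followed_by_at = transcript[i + 1:i + 2] == "@"
        if ch in WILDCARDS and followed_by_at:
            # A symbol followed by @ is literal, e.g. "?@" is a real question mark.
            pieces.append(re.escape(ch))
            i += 2
        elif ch in WILDCARDS:
            pieces.append(WILDCARDS[ch])
            i += 1
        else:
            pieces.append(re.escape(ch))
            i += 1
    return re.compile("".join(pieces))


# Hypothetical candidate readings checked against transcripts.
print(bool(to_regex("Lord of Alt???cham").fullmatch("Lord of Altrincham")))  # True
print(bool(to_regex("Lord of Alt#cham").fullmatch("Lord of Altxcham")))      # False
print(bool(to_regex("What shall we do?@").fullmatch("What shall we do?")))   # True
```

A pattern built this way could be used to test candidate names from an index against a partly legible transcript.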

Text enclosed in braces denotes an "expression". The enclosed text is presumed to be literal, and may include any character other than a left or right brace.

An expression not followed immediately by a '?', '@', or '^' signifies that the enclosed text consists of alternative characters or phrases. Alternative characters are denoted by simply enclosing the characters within braces. Alternative text is denoted by separating each alternative using a comma. Alternatives are ordered left to right by preference (i.e. the further right an alternative is, the less likely it is to be correct).
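
The rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name alternatives is hypothetical, it receives only the text between the braces, and it assumes the expression is not followed by '?', '@', or '^'. A body containing a comma is treated as alternative text; any other body is treated as a set of alternative characters.

```python
def alternatives(body):
    """Return the alternatives in a brace expression, most likely first."""
    if "," in body:
        # Alternative text: each comma-separated phrase is one alternative.
        return [part.strip() for part in body.split(",")]
    # Alternative characters: each enclosed character is one alternative.
    return list(body)


print(alternatives("ae"))           # ['a', 'e']
print(alternatives("Maria, Myra"))  # ['Maria', 'Myra']
```

The returned list keeps the left-to-right order of the source expression, so its first element is the preferred reading.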

An expression followed immediately by a question mark indicates that the entire preceding expression is very unclear to the transcriber. An expression followed by the question mark may not contain alternatives.

An expression followed immediately by an at sign indicates that the entire preceding expression is a literal transcript of the source text, even though it contains irregular spelling or syntax. An expression followed by the at sign may not contain alternatives. Because the transcribed text is presumed to be literal, this syntax is seldom needed. Nevertheless, it can be used to emphasize the literal nature of the text (equivalent to the traditional use of "[sic]").

An expression followed immediately by a caret indicates that the entire preceding expression is missing, but may have been the text within the expression. An expression followed by the caret may contain alternatives.
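
The four kinds of expression described above differ only in the symbol that immediately follows the closing brace. As a rough sketch, a regular expression can find each expression and report its kind. The names EXPRESSION, KINDS, and classify are hypothetical and chosen only for this illustration; the labels "alternatives", "unclear", "literal", and "missing" are shorthand for the descriptions above.

```python
import re

# A brace expression (no nested braces) and its optional trailing symbol.
EXPRESSION = re.compile(r"\{([^{}]*)\}([?@^]?)")

# Shorthand labels for the meaning of each trailing symbol.
KINDS = {"": "alternatives", "?": "unclear", "@": "literal", "^": "missing"}


def classify(text):
    """Return an (enclosed text, kind) pair for each brace expression in text."""
    return [(m.group(1), KINDS[m.group(2)]) for m in EXPRESSION.finditer(text)]


print(classify("Harold {Spiltz}?"))            # [('Spiltz', 'unclear')]
print(classify("one {nation, country}^ unto")) # [('nation, country', 'missing')]
print(classify("Cath{ae}rine"))                # [('ae', 'alternatives')]
```

The sketch does not check the restriction that expressions followed by a question mark or an at sign may not contain alternatives; a stricter tool would need to reject such input.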

Examples:

Text                          Description
Lord of Alt???cham            A series of three unclear characters within the title
Lord of Alt#cham              Two or more missing characters within the title
Cath{ae}rine                  Most likely "Catharine", but possibly "Catherine"
Harold {Spiltz}?              The entire surname is unclear, but may be "Spiltz"
{Maria, Myra} Johnson         Given name most likely Maria, but possibly Myra
What shall we do?@            The original text "What shall we do?" (the question mark is literal)
John Sl^no^ski                Two missing characters within the surname
21 Aug 18{836}5               Year most likely 1885, but possibly 1835 or 1865
John {Fittsgerald}@           Irregular surname spelling transcribed verbatim
{CK}ath{ae}rin{ea}            Legal, but such excessive use is discouraged
Take the #@24 train.          The original text "Take the #24 train." (the octothorpe is literal)
#                             A missing/blank entry from a form-based document
one {nation, country}^ unto   An entire word is missing, but was probably "nation" or "country"
for the su#cane harvest       A series of two or more missing characters within one word
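
The alternative expressions in these examples can be expanded mechanically into the readings they stand for. The Python sketch below is an illustration only: the names PLAIN_EXPRESSION and readings are hypothetical, and only expressions not followed by '?', '@', or '^' are expanded. When a transcript contains several expressions, the combined readings are listed in a simple left-to-right order, which is an assumption of this sketch and not a ranking defined by the notation.

```python
import itertools
import re

# A brace expression that is NOT followed by '?', '@', or '^'.
PLAIN_EXPRESSION = re.compile(r"\{([^{}]*)\}(?![?@^])")


def readings(text):
    """Expand every alternative expression in text into its possible readings."""
    # re.split with one capturing group alternates: literal, body, literal, ...
    parts = PLAIN_EXPRESSION.split(text)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])  # literal text between expressions
        elif "," in part:
            choices.append([p.strip() for p in part.split(",")])  # alternative text
        else:
            choices.append(list(part))  # alternative characters
    return ["".join(combo) for combo in itertools.product(*choices)]


print(readings("Cath{ae}rine"))     # ['Catharine', 'Catherine']
print(readings("21 Aug 18{836}5"))  # ['21 Aug 1885', '21 Aug 1835', '21 Aug 1865']
print(len(readings("{CK}ath{ae}rin{ea}")))  # 8
```

The last line shows why heavy use of alternatives is discouraged: three two-way choices in one name already yield eight candidate spellings.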

Note: The notation for alternative characters and expressions is only applicable to text transcribed from a single source, not alternative data from multiple sources.

Please address any questions or comments about this notation by email to .


 
 
© 2001-2003 Software Renovation Corporation. All rights reserved.