A New Transliteration of the Hebrew Bible

Transliterations are useful to those who are not intimately acquainted with the complex orthography of Biblical Hebrew. They are also common in scholarly publications, where producing them can be a source of frustration, so being able to copy them from a reliable source is helpful. When done well, transliterations are also useful to those who research Biblical Hebrew orthography. Ideally, a transliteration allows one to recreate the precise orthography, excluding cantillation marks, of the Hebrew original. Ideally, it also resolves numerous ambiguities inherent in the Masoretic and pre-Masoretic orthography. That is, maximally useful transliterations identify when consonants serve as matres lectionis and when the letter aleph is quiescent, distinguish between dagesh lene and dagesh forte, distinguish between qamets and qamets-hatuph, distinguish between vocal shewa and silent shewa, and identify syllable boundaries. Transliterating the entire Hebrew Bible by hand would be a laborious and error-prone process. However, previous purely algorithmic efforts at transliterating have not been entirely successful. There is a fundamental problem with a purely algorithmic approach: in some cases identical surface forms require that Masoretic ambiguities be resolved differently, resulting in different transliterations. To avoid these problems, we have produced transliterations of the entire Hebrew Bible using a semi-automatic process. Our algorithm takes the Westminster Leningrad Codex and the Westminster Hebrew Morphology as its inputs, uses them to group together words that are guaranteed to have identical transliterations, and relies heavily on the rules for syllable structure to identify possible transliterations of those words. Crucially, the algorithm knows its limitations. In most cases it produces a single, correct transliteration; in other cases it produces multiple options for the correct transliteration.
A human resolved the remaining ambiguities by selecting the correct transliteration from among the options provided by the algorithm. In a handful of cases, mostly involving orthographic anomalies in Codex Leningradensis, the algorithm could not identify a single plausible transliteration, and the human supplied it. We used a variety of algorithmic methods to identify possible mistakes and inconsistencies and so keep the quality of the transliterations high. We believe the result represents a significant advance over the previous transliterations we have seen. We are now releasing the transliterations freely for non-commercial use.
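
The semi-automatic workflow described above can be illustrated with a minimal sketch. It is not our actual implementation: the grouping key (surface form plus morphology tag), the function names, the ASCII placeholder forms, and the toy candidate table are all assumptions made for illustration, and the syllable-structure rules that generate the candidates are replaced here by a simple lookup.

```python
from collections import defaultdict

def group_tokens(tokens):
    """Group token positions by (surface form, morphology tag).

    Assumption: words sharing both values receive the same transliteration,
    so each group needs to be handled only once.
    """
    groups = defaultdict(list)
    for position, (surface, morph) in enumerate(tokens):
        groups[(surface, morph)].append(position)
    return dict(groups)

def triage(groups, candidates_for):
    """Sort groups by how many candidate transliterations they have.

    One candidate: resolved automatically.
    Several candidates: a human chooses among them.
    No candidate: a human supplies the transliteration.
    """
    resolved, ambiguous, unresolved = {}, {}, []
    for key in groups:
        options = candidates_for(key)
        if len(options) == 1:
            resolved[key] = options[0]
        elif len(options) > 1:
            ambiguous[key] = options
        else:
            unresolved.append(key)
    return resolved, ambiguous, unresolved

# Hypothetical stand-in for the syllable-structure analysis.
# The forms and transliterations are ASCII placeholders, not real data.
TOY_CANDIDATES = {
    ("HKMH", "noun"): ["hokma"],
    ("HKMH", "verb"): ["hakema", "hokma"],
    ("XYZ", "noun"): [],
}

def toy_candidates(key):
    return TOY_CANDIDATES.get(key, [])

tokens = [("HKMH", "noun"), ("HKMH", "verb"), ("HKMH", "noun"), ("XYZ", "noun")]
groups = group_tokens(tokens)
resolved, ambiguous, unresolved = triage(groups, toy_candidates)
```

In this toy run the same surface form appears under two morphology tags and ends up in two groups, one resolved automatically and one left for the human to decide, while the anomalous form with no candidate is set aside for manual transliteration.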