Computers and The Bible
Computers can find patterns hidden in obscure recesses of biblical literature. Knowing what patterns to look for, however, still requires human intelligence. What these patterns mean, once they are identified, is also a matter that requires human, rather than computer, intelligence.
Perhaps someday the computer itself will be able to decide what patterns to look for and what the patterns mean. This is surely not beyond imagining. But we are not there yet. On the other hand, who would have imagined, when computers were invented, that they would be a useful tool in biblical studies? We have already come a long way.
Is a text the work of one author? How did the language itself develop over time? Will certain patterns help us to date the text or to unravel the layers of tradition within it? What was the original text underlying an ancient translation?
These are the kinds of questions we are beginning to ask our computers. We do not yet have clear answers to all these questions. But we have taken the first modest steps that are a necessary prerequisite for asking meaningful questions—and for finding meaningful answers.
The first step, which has now been largely accomplished, is to put the Bible on the computer, so that it will be, as we say in the trade, machine readable. This is not as easy as it sounds.
In the case of an English text, such as Shakespeare’s sonnets, the text can simply be typed into the computer. However, our English texts of the Bible are not the texts we want to analyze. We want to analyze the original language of the text. That language is Hebrew (since we are focusing on the Old Testament, while others work on Greek, Aramaic, Latin and other texts).
When we encode text on a computer, whether in English or another language, and store it in the computer’s memory, each letter (or other symbol) is assigned a machine-readable value. The computer does not “understand” English any more than it understands Japanese, Hebrew or Greek. So just as it analyzes and stores English texts, it can store Hebrew, Greek or even Akkadian texts, each of them encoded with different systems of symbols. Just as the computer recognizes the values given to symbols for the English language, it will recognize values for symbols in other languages.
Keyboards now exist for each of these languages. There is even a special keyboard that combines English with either Greek characters or Hebrew characters.
Of course, since the computer does not really know Hebrew any more than English, the Hebrew text can actually be recorded with a system of agreed symbols, based on the English alphabet rather than the Hebrew alphabet. At least four systems for doing this have been developed. Existing resemblances between Hebrew and English can be employed so that one easily becomes accustomed to writing Hebrew with English characters; Hebrew resh is recorded as R, beth as B, samekh as S, and so on. Other Hebrew letters are given graphically similar or even arbitrary symbols: shin becomes “$,” sin becomes “&,” aleph becomes “)” and tet becomes “+.” In this system of transliteration (in which Hebrew characters are transliterated into English) the first verse of the Bible looks like this:1
BR)$YT BR) )LHYM )T H$MYM W)T H)RC
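The consonantal transliteration described above amounts to a simple character-for-character substitution. The following Python sketch illustrates the idea; the article names only a few of the assignments (resh, beth, samekh, shin, sin, aleph, tet), and the remaining mappings here follow the Michigan-Claremont convention as I understand it, so treat the full table as an assumption for illustration.

```python
# Toy consonantal transliteration: each Hebrew letter is assigned one
# ASCII symbol. Mappings beyond those named in the article follow the
# Michigan-Claremont-style convention and are assumptions here.
CONSONANTS = {
    'א': ')', 'ב': 'B', 'ג': 'G', 'ד': 'D', 'ה': 'H', 'ו': 'W',
    'ז': 'Z', 'ח': 'X', 'ט': '+', 'י': 'Y', 'כ': 'K', 'ך': 'K',
    'ל': 'L', 'מ': 'M', 'ם': 'M', 'נ': 'N', 'ן': 'N', 'ס': 'S',
    'ע': '(', 'פ': 'P', 'ף': 'P', 'צ': 'C', 'ץ': 'C', 'ק': 'Q',
    'ר': 'R', 'ש': '$', 'ת': 'T',
}

def transliterate(hebrew: str) -> str:
    """Replace each Hebrew consonant with its ASCII code; keep spaces."""
    return ''.join(CONSONANTS.get(ch, ch) for ch in hebrew)

print(transliterate('בראשית ברא אלהים את השמים ואת הארץ'))
# BR)$YT BR) )LHYM )T H$MYM W)T H)RC
```

Note that the final forms of kaf, mem, nun, pe and tsade collapse into the same symbols as their ordinary forms in this sketch; an encoding that must round-trip back to Hebrew would keep them distinct.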
Because Hebrew is written in a consonantal alphabet, in which most of the vowels are omitted, the text printed above omits the vowels. It is as if the text, read in equivalent English, looked like this:
N TH BGNNG GD CRTD HVN ND RTH (In the beginning God created heaven and earth.)
Unless you know what vowels to supply, you—or the computer—can’t read the text accurately. In the ninth and tenth centuries A.D., Jewish scribes known as Masoretes developed a system for indicating vowels with signs placed mostly under, but sometimes between or above, the Hebrew letters. The Masoretes also fixed the final form of the biblical text. The text they produced in this way is known as the Masoretic text. It is the traditional Jewish text. It can be vocalized precisely. (It is sometimes called a vocalized or a pointed text, that is to say, “pointed” with the vowel signs.)
A full recording of the biblical text must include these vowel signs as well as the Hebrew letters. Accordingly, a different computer symbol must be assigned to each vowel sign.
Three further sets of information are recorded together with the biblical text because these are also an inseparable part of the traditional Masoretic biblical text. The first are cantillation signs, which guide the musical chanting of the Bible in the synagogue. These signs also indicate the grammatical relation between words (that is, whether two or more words are combined in reading or whether one has to pause between them). The cantillation signs appear under and above the Hebrew consonants and are encoded by means of numbers in the system we are using as an example here.
The Masoretic text sometimes indicates that a word is to be read (Qere) differently from the way it is written (Ktiv). The Masoretic text also contains other peculiarities: Some letters are written irregularly, or particularly large or small. Each of these peculiarities must be given encodable signs.
Finally, we want to be able to locate each word in the text, so we must enable the computer to identify the chapter, book and verse in which it appears. Thus, the first verse of the Bible (Genesis 1:1) can be recorded as GN 01 01.
The following shows how the first verse of Genesis looks in computer language with all of the applicable signs:
GN 01 01 B.:R")$I73YT B.FRF74) ):ELOHI92YM )"71T HA$.FMA73YIM W:)"71T HF)F75REC
Three other transliteration systems for encoding the Bible similar to this have been developed, so we now have four “copies” of the Hebrew Bible in machine-readable form, with all the necessary components—consonants, vowels, cantillation signs, some Masoretic notes and references and indication of book, chapter and verse. These systems for encoding the text were developed simultaneously by several individuals in different centers because a lack of communication and/or cooperation between them prevented a combined effort. However, to some extent this turned out to be beneficial: A computer program written to disclose mechanical encoding errors was able to compare the different encodings typed by these individuals. As a result, we now have a “corrected,” machine-readable text of the Bible in various computer centers around the world.
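The proofreading idea described above, comparing independently typed encodings to expose typing errors, can be sketched in a few lines of Python. The two verse strings below reuse the article's transliterated example, with one deliberate typo introduced for demonstration.

```python
# Sketch of cross-checking two independent encodings of the same verse:
# wherever they disagree, at least one of them contains a typing error.
def disagreements(enc_a: str, enc_b: str):
    """Compare two encodings word by word; return mismatched positions."""
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(enc_a.split(), enc_b.split()))
            if a != b]

first  = 'BR)$YT BR) )LHYM )T H$MYM W)T H)RC'
second = 'BR)$YT BR) )LHYM )T H$MYM W)T H)RK'   # deliberate typo: C vs. K
print(disagreements(first, second))
# [(6, 'H)RC', 'H)RK')]
```

A real comparison program must also handle the case where the encodings differ in word count, which simple pairing with `zip` would silently truncate.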
But when we speak of a corrected computer text of the Bible, we are naturally led to ask: What is a “correct” text of the Hebrew Bible? The answer, unfortunately, is that there is no absolutely correct text. This is true of all ancient texts that have been transmitted by generations of scribes, each copying the text from previous generations. We have focused on the Masoretic text, but there are a number of others. Moreover, there are some variations in copies of the Masoretic text. Most are minor, but they do differ in hundreds of small details. So a decision has to be made on which manuscript or edition of the Masoretic text to use.
One of the most highly regarded is a particular modern edition based on one source only, since that one source (manuscript) represents an existing text and not a modern scholarly concoction. This edition, the so-called Biblia Hebraica Stuttgartensia (Stuttgart, West Germany, 1967–1977; last printing, 1984), is based on the codex Leningrad B19A, which dates from 1009 A.D. Three computer texts are based on this edition, while the fourth2 is based on an independent reading of the codex Leningrad itself.
Each of these four encodings of the biblical text was typed by a typist, letter after letter, vowel after vowel, etc. But this system is now outdated. Today it is much more efficient to use a so-called optical character reader (OCR), which can scan the biblical text from a printed edition, page after page. Scanning Hebrew is not easy, because of the complicated combination of consonant-vowel-cantillation signs, but even with the necessary proofreading this procedure is now the most efficient one.
Simply putting the Bible on the computer is not enough for analytical purposes, however. Obviously, we want to be able to question the computer about all occurrences in the text of a given word or root. To do this, the computer must be able to recognize, for example, that “buy” and “bought” are forms of the same word. A similar problem is presented by various forms of the verb “to be” (am, are, is, etc.). The only solution to this problem is to instruct the computer about the connection between “be” and “is,” “buy” and “bought,” etc. This can be done in various ways, all of which require providing the computer with “dictionary words” for all “text words.” Thus, if a text contains the word “bought,” the computer needs to be given its dictionary word, “buy.” The dictionary word, or head word in computer jargon, is called a “lemma.” The lemma is what actually appears in the dictionary. Several computer centers now have computer files that provide the dictionary (or lemma) word for the text words in the Hebrew Bible. The creation of this file is called lemmatization. On the basis of this lemmatization, the computer is now able to group all the Hebrew words that belong together, even if the specific text words would be listed in different places if arranged alphabetically.
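The mapping from text words to lemmas can be pictured as a lookup table. This toy Python sketch uses the article's English examples; a real lemmatization file covers every word form in the Hebrew Bible and must resolve ambiguous forms by context.

```python
# Toy lemmatization: map each text word to its dictionary head word
# (lemma), so that inflected forms of the same word group together.
from collections import defaultdict

LEMMAS = {
    'bought': 'buy', 'buys': 'buy', 'buy': 'buy',
    'am': 'be', 'is': 'be', 'are': 'be', 'was': 'be',
}

def group_by_lemma(text_words):
    """Group text words under their lemmas; unknown words stand alone."""
    groups = defaultdict(list)
    for word in text_words:
        groups[LEMMAS.get(word, word)].append(word)
    return dict(groups)

print(group_by_lemma(['bought', 'is', 'buys', 'was']))
# {'buy': ['bought', 'buys'], 'be': ['is', 'was']}
```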
In Hebrew, lemmatization is particularly difficult because linguistic elements are often prefixed to the word. A he (h) at the beginning of a word means “the”; a beth (b) means “in” or “on”; lamed (l) means “to” or “for”; mem (m) means “from”; a waw (v) means “and.” In Hebrew, the phrase “and in his house” is one word. The word for house is beyt. The “and” and “in” are rendered by prefixes; the “his,” by a suffix. Thus “and in his house” is in Hebrew the one word u-be-beyt-o. The computer must be instructed how to break down this single word into its constituent elements.
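The decomposition of u-be-beyt-o can be sketched as follows. Note a large simplification: the input here arrives with its elements already marked by hyphens, whereas a real morphological analyzer must segment the raw, unmarked string and choose among competing analyses. The prefix and suffix tables reuse the glosses given in the article; the extra entries are illustrative assumptions.

```python
# Toy decomposition of a hyphen-marked Hebrew word into its elements:
# prefixes ("and," "in," etc.), the stem, and a pronominal suffix.
PREFIXES = {'u': 'and', 'be': 'in', 'le': 'to/for', 'mi': 'from', 'ha': 'the'}
SUFFIXES = {'o': 'his', 'i': 'my'}

def decompose(word: str):
    """Split a form like 'u-be-beyt-o' into (prefixes, stem, suffixes)."""
    parts = word.split('-')
    prefixes = [(p, PREFIXES[p]) for p in parts if p in PREFIXES]
    stem = [p for p in parts if p not in PREFIXES and p not in SUFFIXES]
    suffixes = [(p, SUFFIXES[p]) for p in parts if p in SUFFIXES]
    return prefixes, stem, suffixes

print(decompose('u-be-beyt-o'))
# ([('u', 'and'), ('be', 'in')], ['beyt'], [('o', 'his')])
```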
Finally, the computer file must contain a full grammatical analysis of each word, including its status as a noun, verb, adverb, etc., and details as to the gender, number and verbal forms, etc.
Four different grammatical analyses have now been created for the Hebrew Bible.3
The biblical text, together with the lemmatization and grammatical analysis, as well as other kinds of information, forms one large collection of related computer units (files) often called a database or databank. The material in the databank can be used and accessed in different ways, using part or all of the information.
What, then, can we do with the computerized Bible? One thing we can do is make lists of all the forms of words occurring in the Bible. We can also extend this search to the “dictionary forms” of all words occurring in the text. Thus, when represented in English, all Hebrew forms of “to go” (including “went”) can be easily collected, together with the appropriate references.
However, this type of information is already available in printed books called concordances, so there is no need to use computers. A concordance (to any text of the Bible, in either Hebrew, Greek or English) lists all occurrences of a given word in all books of the Bible, together with the surrounding words (“context”) and with references to the biblical passages (book, chapter, verse). In this way one can find all passages in which all forms of “to buy” (including “bought”) occur in the English translation of the Bible, and the same applies to concordances of the Hebrew text.
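A concordance of the kind just described, each occurrence of a word with its reference and surrounding context, is straightforward to generate once the text is machine readable. In this Python sketch the verses are invented English stand-ins, keyed by the book-chapter-verse references described earlier.

```python
# Minimal keyword-in-context concordance: for each occurrence of the
# target word, record the verse reference and the surrounding words.
def concordance(verses, target, window=2):
    """verses: list of (reference, text) pairs. Return (ref, context) hits."""
    hits = []
    for ref, text in verses:
        words = text.lower().split()
        for i, word in enumerate(words):
            if word == target:
                context = ' '.join(words[max(0, i - window): i + window + 1])
                hits.append((ref, context))
    return hits

sample = [
    ('GN 01 01', 'In the beginning God created the heaven and the earth'),
    ('GN 01 02', 'And the earth was without form and void'),
]
print(concordance(sample, 'earth'))
# [('GN 01 01', 'and the earth'), ('GN 01 02', 'and the earth was without')]
```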
But printed books are static; once published, they cannot be changed to suit the needs of different types of research. Computerized databases, on the other hand, are dynamic, and can be manipulated to obtain different types of information.
For example, conventional concordances refer to the Bible as a whole. With a computer we can produce partial concordances, enabling the researcher to study separately the vocabulary of an individual biblical book, or any combination of biblical books, such as two prophets, or the language of the “late” biblical books (such as Chronicles, Ezra-Nehemiah and Ecclesiastes). It is also possible to study specific parts of several books, such as the books that may have been influenced by—or rewritten in the spirit of—the Book of Deuteronomy (the so-called Deuteronomistic sections of the Bible).
With a computer we can also study combinations of words such as the words “king” and “queen” occurring in the same verse. We can make a grammatical analysis of the text, separating the elements of Hebrew words with prefixes and suffixes. For example, we can study all the pronominal suffixes, such as “my” (as in beyt-i, “my house”), or all prepositional prefixes that are part of the main word in Hebrew. We can easily collect and therefore study construct nouns (nouns used as adjectives) and verb forms, both rare and common.
In this way, we can begin to trace the history of biblical language and describe its characteristics at various stages and in various books.
Computer-assisted analysis also opens up new vistas in the study of the text itself. Thanks to computers, we can compare the Masoretic text with texts preserved in the Dead Sea Scrolls, as well as other well-known versions of the biblical text. We already have machine-readable texts of the Septuagint, the Greek translations of the Bible made in Alexandria in the third and second centuries B.C. We also have machine-readable texts of several Aramaic translations (Targums), of the Syriac version (the Peshitta), and of the principal Latin translation (the Vulgate). These versions are studied in their own right, but they can also be studied in conjunction with the text of the Hebrew Bible. The differences in the versions are often of considerable significance; computers enable us to study the patterns of these differences.
I am associated with a project that is comparing the Septuagint with the Hebrew text. Known as CATSS (Computer Assisted Tools for Septuagint Studies), the project is directed jointly by Robert A. Kraft of the University of Pennsylvania and myself at the Hebrew University of Jerusalem. Westminster Theological Seminary in Philadelphia, focusing on the Hebrew text of the Bible, joins our efforts.
The Septuagint is a collection of the oldest preserved Greek translations of the Bible. Among other things, we are studying the translation techniques of the Septuagint. For the first time, it is possible to talk precisely, instead of in generalizations. We can now provide exact statistics on the relative literalness of the various translations of biblical books. We can determine with statistical precision how literal a translation of, say, 2 Kings is, or how free a translation of Isaiah is. Based on such studies, we will be able in some instances to reconstruct with considerable confidence many elements of the original Hebrew text from which the Greek translator worked. We can independently study all the elements of the Masoretic text that are not present in the Septuagint translations and, conversely, all elements of the Septuagint translations that are not found in the Masoretic text. We can also begin to study the patterns that are in these apparent omissions and additions.
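One simple statistic of the kind such literalness studies use is translation consistency: how often the translator renders a given Hebrew word with his single most common Greek equivalent. The Python sketch below illustrates the idea; the aligned word pairs are invented placeholders, not real CATSS data, and actual measures of literalness combine several criteria of this sort.

```python
# Toy "literalness" statistic: the share of aligned Hebrew-Greek word
# pairs in which the translator used his most common equivalent for
# that Hebrew word. Higher values suggest a more literal translation.
from collections import Counter, defaultdict

def consistency(aligned_pairs):
    """aligned_pairs: list of (hebrew_lemma, greek_lemma) tuples."""
    by_hebrew = defaultdict(Counter)
    for heb, grk in aligned_pairs:
        by_hebrew[heb][grk] += 1
    consistent = sum(c.most_common(1)[0][1] for c in by_hebrew.values())
    return consistent / len(aligned_pairs)

pairs = [('dbr', 'logos'), ('dbr', 'logos'), ('dbr', 'rhema'),
         ('mlk', 'basileus')]
print(consistency(pairs))  # 0.75
```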
Several scholars associated with the Technion in Haifa, Israel, especially Yehuda Radday, have used the computerized texts of the Bible to investigate whether a biblical book is or is not the work of one author. Critical biblical scholars have long claimed that the Pentateuch, Joshua, Judges, Isaiah, Jeremiah and other books were written by more than one author. Radday and scholars like him believe that it is possible to formulate objective criteria by which to judge the unity of biblical books. The technique involves using a long list of general linguistic criteria to compare the similarity or dissimilarity of a biblical book’s different sections in these respects. How successful this will be is still an open question.
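A crude version of the kind of criterion such authorship studies employ is to compare the relative frequencies of common function words across two sections of a text. The marker list, sample sentences and distance measure in this Python sketch are invented for illustration and are not Radday's actual criteria, which are far more elaborate and statistically controlled.

```python
# Toy stylometric comparison: profile two text sections by the relative
# frequency of a few function words, then measure how far apart the
# profiles are. Large distances hint at (but do not prove) different hands.
MARKERS = ['and', 'the', 'of', 'in', 'that']

def profile(text, markers=MARKERS):
    """Relative frequency of each marker word in the text."""
    words = text.lower().split()
    return [words.count(m) / len(words) for m in markers]

def distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

section_a = 'and the word of the lord came in that day'
section_b = 'the people walked and the land rested in peace'
print(distance(profile(section_a), profile(section_b)))
```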
While computer-assisted research will change biblical research in some respects, these changes will nevertheless be limited. The new computer tools will be of special importance for the study of the Bible’s text and of its language. They will also assist in authorship studies. The principal task of biblical scholarship, however, will remain the interpretation of the text (exegesis). Today, this is largely a noncomputer function, but who knows what the computer future may hold.
Endnotes
The four computer texts of the Bible produced so far are (1) the Michigan-Claremont text of Parunak and Whitaker (prepared with a grant from the Packard Foundation, and used by the CATSS project), (2) the text of Francis Andersen and Dean Forbes, (3) the CATAB text of Weil and (4) the text produced by R. Ferdinand Poswick (Maredsous, Belgium: Centre Informatique et Bible). Almost all other texts somehow derive from one of these four texts. Each of the four groups also produced or is producing its own grammatical analyses, two of which have not been shared with others. The material produced by the Michigan-Claremont group will be in the public domain. Furthermore, two different versions of the so-called Mikrah text are now commercially available. This text derives from both the Michigan-Claremont and the Maredsous text. In addition to these, the Werkgroep Informatika of the Free University in Amsterdam, Netherlands, is preparing its own grammatical analysis, including syntactical features not recorded in the other analyses.