Roget’s Thesaurus
of
English words and phrases classified and arranged so as to facilitate the expression of ideas and assist in literary composition
by Peter Mark Roget
An electronic thesaurus derived from the version of Roget's Thesaurus published in 1911.
MICRA, Inc. makes no representation that the original 1911 printed work on which this is based is now in the public domain in any particular country. However, MICRA, Inc. makes no proprietary claims regarding this electronic version of the 1911 thesaurus. If the 1911 work is currently public domain, this electronic version can also be treated as public domain.
Note that this version of Thesaurus-1911 has been supplemented with over 1,000 words not present in the original 1911 edition, but many modern words are still missing. About 1500 verbs (out of 6500) which can be found in an 80,000-word spell-checker are absent from this work. The deficiency of nouns is probably much worse, especially on technical topics. Of 40,000 unique words contained in the original text, 12,000 are not recognized by a spell-checker. Most of these are foreign words (primarily Latin), and many are obsolete. In this version, these words are marked as such by comments in square brackets. Although this version has been proof-read, there are doubtless numerous residual transcription errors, some of which may be obvious even without reference to the original text. We will be grateful if any of these are brought to our attention; the corrections will appear in subsequent versions.
The original arrangement has also been modified slightly in several places, in particular by splitting one entry into two. A version of the 1911 thesaurus which is almost identical to the original (with only a small number of additions to the original work) has also been prepared by MICRA, Inc., and also carries no restrictions from MICRA. Copies of that version or this one may be purchased for $40.00 from MICRA, Inc., or from the Austin Code Works, Austin, Texas.
Occasional references to numbers starting with "@" are the embryonic beginnings of a reorganized version, mentioned below. A few comments are also included within curly brackets {}.
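For readers who want the plain word lists without these editorial annotations, the following is a minimal Python sketch of one way to strip them. The exact form of the "@" references is not specified here, so the pattern used (an "@" followed by digits, optionally with dotted sub-numbers) is an assumption, as are the function name and the sample text, which is invented for illustration and not quoted from the thesaurus.

```python
import re

# Comments enclosed in curly brackets, together with any whitespace before them.
CURLY_COMMENT_RE = re.compile(r"\s*\{[^{}]*\}")

# ASSUMPTION: an "@" reference is "@" followed by digits, optionally with
# dotted sub-numbers (e.g. "@12" or "@12.3"). The real layout may differ.
AT_REF_RE = re.compile(r"\s*@\d+(?:\.\d+)*")


def strip_annotations(text):
    """Remove curly-bracket comments and "@" references from a piece of text."""
    text = CURLY_COMMENT_RE.sub("", text)
    text = AT_REF_RE.sub("", text)
    return text


# Invented sample text, for illustration only.
sample = "being {editor's note}, entity @12.3; life"
print(strip_annotations(sample))
```

Nested curly brackets are not handled; if they occur in the file, the pattern would need to be extended.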
The following additional differences will be noted between this version and the original edition of the printed 1911 thesaurus:
(1) the space-saving abbreviations in the original, using hyphens to represent common words, prefixes or suffixes, have been expanded into the full words or phrases.
(2) the side-by-side format for words and their opposites has been abandoned. Words are listed in order of their entry number.
(3) each main entry (1035 entries) has a pound sign "#" in front of the number to facilitate computerized search.
(4) Greek words and phrases are transliterated and included between brackets in the format <gr/greek word/gr>.
(5) where italics occurred in the original, italics are used in the Microsoft Word format file. In the plain ASCII file, this formatting is lost.
(6) in the original book, words which were obsolete (in 1911) were marked with a dagger. In this version, those words are marked with a vertical bar ("|").
Some words which were still current in 1911 but are no longer found in a current college-sized dictionary (presently obsolete words), or which are no longer used in the specific sense indicated, have been marked with a bar followed by an exclamation point ("|!"). However, this marking process has only just begun, and only a small portion of the words which are now obsolete have been so marked. Most, though not all, of the foreign-language phrases are now obsolete. The "obsolete" notation [obs3] indicates that the preceding word (or some word in the preceding phrase) is not recognized by the word processor's spelling checker and is also either NOT in a modern college-sized dictionary or is noted there as being "ARCHAIC".
(7) This file contains only the main body of the thesaurus. Neither the outline nor the index is contained here. The outline, with an overview of the organization of the concepts, is contained in a separate file, "outline.doc", on the distribution disk.
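The markers described in items (3), (4) and (6) lend themselves to simple computerized search. The following is a minimal Python sketch of how they might be recognized, not a definitive parser. It assumes that a main entry starts a line with "#" followed by digits and that markers appear inline in the text; the function names and the sample line are invented for illustration and are not taken from the thesaurus.

```python
import re

# Main entry: a pound sign "#" in front of the entry number.
# ASSUMPTION: the "#" is the first non-blank character on the line.
ENTRY_RE = re.compile(r"#(\d+)")

# Transliterated Greek in the format <gr/greek word/gr>.
GREEK_RE = re.compile(r"<gr/(.*?)/gr>")

# "|!" marks a word that was current in 1911 but is obsolete now.
OBS_NOW_RE = re.compile(r"\|!")

# A bar NOT followed by "!" marks a word already obsolete in 1911.
OBS_1911_RE = re.compile(r"\|(?!!)")


def entry_number(line):
    """Return the main-entry number if the line starts an entry, else None."""
    match = ENTRY_RE.match(line.lstrip())
    return int(match.group(1)) if match else None


def greek_phrases(line):
    """Return the transliterated Greek phrases found in the line."""
    return GREEK_RE.findall(line)


def count_markers(line):
    """Count the obsolescence markers that appear in the line."""
    return {
        "obsolete_in_1911": len(OBS_1911_RE.findall(line)),
        "obsolete_now": len(OBS_NOW_RE.findall(line)),
        "obs3": line.count("[obs3]"),
    }


# Invented sample line, for illustration only.
sample = "#1. Existence. N. being, ens|, esse|!, <gr/to on/gr>, subsistence [obs3]."
print(entry_number(sample))
print(greek_phrases(sample))
print(count_markers(sample))
```

The two bar patterns are kept separate so that "|!" is never also counted as a plain "|".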
This first edition of this supplemented 1911 thesaurus (June 1991) is very much less complete than the latest editions of commercial thesauri, and is probably not suitable for use as an adjunct to word-processing programs, but it has no proprietary claims attached to it by MICRA, Inc., and does not contain any material published commercially after 1911.
Future (copyrighted) versions of this thesaurus are planned, which will be reorganized in a hierarchical fashion to maximize the ability to take advantage of inheritance of semantic characteristics from higher categories. The objective is to create a database of words organized by semantic categories, suitable for use in natural-language understanding programs. This is a very small-scale project, which will not be competitive with large academic or commercial efforts such as the CYC project, but is intended to provide a convenient resource for experimentation in natural-language processing for individuals or small groups. Anyone who is currently engaged in or contemplating a similar thesaurus or dictionary project, who would be willing to collaborate on this project, is encouraged to contact us, so that unnecessary duplication of effort can be avoided. We would also appreciate being notified of typos, errors, or omissions in any version. Send inquiries or comments to:
Patrick Cassidy
MICRA Inc.
735 Belvidere Ave.
Plainfield, NJ 07062-2054
voice: (908) 668-5252
fax: (908) 668-5904
(If no one answers, please leave a message.)