PLEASE NOTE: THIS IS AN OLD VERSION. The current version is linked from The Complete Lojban Language.
The gismu were created through the following process:
- 1)
- At least one word was found in each of the six source
languages (Chinese, English, Hindi, Spanish, Russian, Arabic)
corresponding to the proposed gismu. This word was rendered
into Lojban phonetics rather liberally: consonant clusters
consisting of a stop and the corresponding fricative were
simplified to just the fricative (``tc'' became ``c'', ``dj''
became ``j'') and non-Lojban vowels were mapped onto Lojban
ones. Furthermore, morphological endings were dropped. The
same mapping rules were applied to all six languages for the
sake of consistency.
- 2)
- All possible gismu forms were matched against the six
source-language forms. The matches were scored as
follows:
- 2a)
- If three or more letters were the same in the proposed
gismu and the source-language word, and appeared in the same
order, the score was equal to the number of letters that were
the same. Intervening letters, if any, did not matter.
- 2b)
- If exactly two letters were the same in the proposed
gismu and the source-language word, and either the two
letters were consecutive in both words, or were separated by
a single letter in both words, the score was 2. Letters in
reversed order got no score.
- 2c)
- Otherwise, the score was 0.
- 3)
- The scores were divided by the length of the
source-language word in its Lojbanized form, and then
multiplied by a weighting value specific to each language,
reflecting the proportional number of first-language and
second-language speakers of the language. (Second-language
speakers were reckoned at half their actual numbers.) The
weights were chosen to sum to 1.00. The sum of the weighted
scores was the total score for the proposed gismu form.
- 4)
- Any gismu forms that conflicted with existing gismu were
removed. Obviously, being identical with an existing gismu
constitutes a conflict. In addition, a proposed gismu that
was identical to an existing gismu except for the final vowel
was considered a conflict, since two such gismu would have
identical 4-letter rafsi.
- More subtly: If the proposed gismu was identical to an
existing gismu except for a single consonant, and the
consonant was ``too similar'' based on the following table,
then the proposed gismu was rejected.
proposed gismu    existing gismu
b                 p, v
c                 j, s
d                 t
f                 p, v
g                 k, x
j                 c, z
k                 g, x
l                 r
m                 n
n                 m
p                 b, f
r                 l
s                 c, z
t                 d
v                 b, f
x                 g, k
z                 j, s
See Section 4 for an example.
- 5)
- The gismu form with the highest score usually became the
actual gismu. Sometimes a lower-scoring form was used to
provide a better rafsi. A few gismu were changed in error as
a result of transcription blunders (for example, the gismu
``gismu'' should have been ``gicmu'', but it's too late to
fix it now).
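The matching rules in steps 2a through 2c can be sketched in Python as follows. This is one reading of the rules, not the program actually used to make the gismu: rule 2a is interpreted as the length of the longest common subsequence, and rule 2b is checked whenever fewer than three letters match in order. The function names and the test strings are hypothetical; the strings are not real Lojbanized words.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b
    (letters in the same order, intervening letters ignored)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def has_two_letter_match(candidate: str, source: str) -> bool:
    """Rule 2b: the same two letters, in the same order, either adjacent
    in both words (gap 1) or separated by one letter in both (gap 2)."""
    for gap in (1, 2):
        source_pairs = {
            (source[i], source[i + gap]) for i in range(len(source) - gap)
        }
        for i in range(len(candidate) - gap):
            if (candidate[i], candidate[i + gap]) in source_pairs:
                return True
    return False


def match_score(candidate: str, source: str) -> int:
    """Raw score of a proposed gismu form against one source-language word."""
    common = lcs_length(candidate, source)
    if common >= 3:                               # rule 2a
        return common
    if has_two_letter_match(candidate, source):   # rule 2b
        return 2
    return 0                                      # rule 2c


# Made-up strings, chosen only to exercise each rule.
print(match_score("abcde", "axcye"))  # 3: a, c, e appear in the same order
print(match_score("abcde", "xxabx"))  # 2: "ab" is adjacent in both words
print(match_score("abcde", "azczz"))  # 2: a and c are one letter apart in both
print(match_score("abcde", "bazzz"))  # 0: a and b appear in reversed order
```

The raw score returned here is what step 3 then divides by the length of the Lojbanized source word and multiplies by the language weight.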
The language weights used to make most of the gismu were as
follows:
Chinese 0.36
English 0.21
Hindi 0.16
Spanish 0.11
Russian 0.09
Arabic 0.07
reflecting 1985 number-of-speakers data. A few gismu were made
much later using updated weights:
Chinese 0.347
Hindi 0.196
English 0.160
Spanish 0.123
Russian 0.089
Arabic 0.085
(English and Hindi switched places due to demographic changes.)
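The weighting in step 3 amounts to a short calculation, sketched below in Python under the assumption that a raw matching score and a Lojbanized word length are already known for each language. The two weight tables are the ones given above; the function name `total_score` and the sample scores and lengths are hypothetical, chosen only to show the arithmetic.

```python
import math

WEIGHTS_1985 = {
    "Chinese": 0.36, "English": 0.21, "Hindi": 0.16,
    "Spanish": 0.11, "Russian": 0.09, "Arabic": 0.07,
}
WEIGHTS_UPDATED = {
    "Chinese": 0.347, "Hindi": 0.196, "English": 0.160,
    "Spanish": 0.123, "Russian": 0.089, "Arabic": 0.085,
}


def total_score(raw_scores, source_lengths, weights):
    """Sum over languages of weight * (raw score / Lojbanized word length)."""
    return sum(
        weights[lang] * raw_scores[lang] / source_lengths[lang]
        for lang in weights
    )


# Both weight sets sum to 1.00, as the text states.
assert math.isclose(sum(WEIGHTS_1985.values()), 1.0)
assert math.isclose(sum(WEIGHTS_UPDATED.values()), 1.0)

# Hypothetical candidate: matches 3 of 4 letters of the Chinese word and
# 2 of 5 letters of the English word, and nothing in the other languages.
raw = {"Chinese": 3, "English": 2, "Hindi": 0,
       "Spanish": 0, "Russian": 0, "Arabic": 0}
lengths = {"Chinese": 4, "English": 5, "Hindi": 5,
           "Spanish": 5, "Russian": 5, "Arabic": 5}

print(round(total_score(raw, lengths, WEIGHTS_1985), 3))     # 0.354
print(round(total_score(raw, lengths, WEIGHTS_UPDATED), 3))  # 0.324
```

Because the weights sum to 1.00 and each per-language ratio is at most 1, the total score of any candidate lies between 0 and 1.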
Note that the stressed vowel of the gismu was considered
sufficiently distinctive that two or more gismu may differ only
in this vowel; as an extreme example, ``bradi'', ``bredi'',
``bridi'', and ``brodi'' (but fortunately not ``brudi'') are
all existing gismu.
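The conflict rules of step 4, together with the note on the stressed vowel, can be sketched as a small Python check. This is an illustration under assumptions, not the original procedure: the function name `conflicts` is hypothetical, the similarity table is transcribed from step 4, and the proposed forms ``brade'', ``pradi'', ``bladi'', and ``kradi'' are made up for the example and are not claimed to be real gismu.

```python
# "Too similar" consonants from step 4: proposed consonant -> existing consonants.
SIMILAR = {
    "b": "pv", "c": "js", "d": "t", "f": "pv", "g": "kx", "j": "cz",
    "k": "gx", "l": "r", "m": "n", "n": "m", "p": "bf", "r": "l",
    "s": "cz", "t": "d", "v": "bf", "x": "gk", "z": "js",
}


def conflicts(proposed: str, existing: str) -> bool:
    """Return True if a proposed five-letter gismu form conflicts with an
    existing gismu under the rules of step 4."""
    if len(proposed) != 5 or len(existing) != 5:
        raise ValueError("gismu forms are five letters long")
    if proposed == existing:
        return True
    differing = [i for i in range(5) if proposed[i] != existing[i]]
    if len(differing) != 1:
        return False
    i = differing[0]
    if i == 4:
        # Only the final vowel differs: the 4-letter rafsi would be identical.
        return True
    # A single differing consonant conflicts only if the pair is in the table.
    # A differing stressed vowel is not in the table, so it never conflicts.
    return existing[i] in SIMILAR.get(proposed[i], "")


print(conflicts("bredi", "bradi"))  # False: only the stressed vowel differs
print(conflicts("brade", "bradi"))  # True: only the final vowel differs
print(conflicts("pradi", "bradi"))  # True: p is too similar to b
print(conflicts("bladi", "bradi"))  # True: l is too similar to r
print(conflicts("kradi", "bradi"))  # False: k and b are not in the table together
```

In a full run, a proposed form would be tested against every existing gismu and removed if any single comparison reported a conflict.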