The cmavo ``na'a'' (of selma'o BY) is a universal shift-word cancel: it
returns the interpretation of lerfu words to the default of lower-case Lojban
with no specific font. It is more general than ``lo'a'', which changes the
alphabet only, potentially leaving font and case shifts in place.
Several sections at the end of this chapter contain tables of proposed lerfu
word assignments for various languages.
Many languages that make use of the Latin alphabet add special
marks to some of the lerfu they use. French, for example, uses three
accent marks above vowels, called (in English) ``acute'', ``grave'', and
``circumflex''. Likewise, German uses a mark called ``umlaut''; a mark which
looks the same is also used in French, but with a different name and meaning.
These marks may be considered lerfu, and each has a corresponding lerfu
word in Lojban. So far, no problem. But the marks appear over lerfu,
whereas the words must be spoken (or written) either before or after the
lerfu word representing the basic lerfu. Typewriters (for mechanical reasons)
and the computer programs that emulate them usually require their users to
type the accent mark before the basic lerfu, whereas in speech the accent
mark is often pronounced afterwards (for example, in German ``a umlaut'' is
preferred to ``umlaut a'').
Lojban cannot settle this question by fiat. Either it must be left up to
default interpretation depending on the language in question, or the
lerfu-word compounding cmavo ``tei'' (of selma'o TEI) and ``foi'' (of selma'o
FOI) must be used. These cmavo are always used in pairs; any number of lerfu
words may appear between them, and the whole is treated as a single compound
lerfu word. The French word ``été'', with acute accent marks on both ``e'' lerfu,
could be spelled as:
6.1) tei .ebu .akut. bu foi ty. tei .akut. bu .ebu foi
( ``e'' acute ) ``t'' ( acute ``e'' )
and it does not matter whether ``akut. bu'' appears before or after ``.ebu'';
the ``tei ... foi'' grouping guarantees that the acute accent is associated
with the correct lerfu. Of course, the level of precision represented by
Example 6.1 would rarely be required: it might be needed by a Lojban-speaker
when spelling out a French word for exact transcription by another
Lojban-speaker who did not know French.
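The grouping behavior of ``tei ... foi'' can be sketched in a few lines of Python. This is an illustration only, not part of any Lojban parser: the helper names `lerfu_words` and `compounds` are hypothetical, and the tokenizer assumes that words are separated by spaces and that ``bu'' attaches to the word immediately before it.

```python
from collections import Counter

def lerfu_words(text):
    """Split text on spaces, attaching each "bu" to the word before it."""
    words = []
    for token in text.split():
        if token == "bu" and words:
            words[-1] += " bu"
        else:
            words.append(token)
    return words

def compounds(text):
    """Return one Counter per letter; a tei ... foi group counts as one letter."""
    result, group = [], None
    for word in lerfu_words(text):
        if word == "tei":
            group = Counter()          # open a compound lerfu word
        elif word == "foi":
            result.append(group)       # close it and keep it as one unit
            group = None
        elif group is not None:
            group[word] += 1           # inside tei ... foi: order is ignored
        else:
            result.append(Counter([word]))
    return result

# Example 6.1: the two accented "e" lerfu are spelled in opposite orders.
spelled = compounds("tei .ebu .akut. bu foi ty. tei .akut. bu .ebu foi")
same_letter = spelled[0] == spelled[2]
```

Because each group is stored without regard to order, `same_letter` holds for Example 6.1: both compounds contain the same base lerfu and the same accent mark.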
This system breaks down in languages which use more than one accent mark
on a single lerfu; some other convention must be used for showing
which accent marks are written where in that case. The obvious convention
is to represent the mark nearest the basic lerfu by the lerfu word closest
to the word representing the basic lerfu. Any remaining ambiguities must
be resolved by further conventions not yet established.
Some languages, like Swedish and Finnish, consider certain accented lerfu
to be completely distinct from their unaccented equivalents, but Lojban does
not make a formal distinction, since the printed characters look the same
whether they are reckoned as separate letters or not. In addition, some
languages consider certain 2-letter combinations (like ``ll'' and ``ch''
in Spanish) to be letters; this may be represented by enclosing the
combination in ``tei ... foi''.
In addition, when discussing a specific language, it is permissible to make
up new lerfu words, as long as they are either explained locally or well
understood from context: thus Spanish ``ll'' or Croatian ``lj'' could be called
``libu'', but that usage would not necessarily be universally understood.
Section 19 contains a table of proposed lerfu words for some common accent
marks.
Lojban does not have punctuation marks as such: the denpa bu and the
slaka bu are really a part of the alphabet. Other languages, however,
use punctuation marks extensively. As yet, Lojban does not have any
words for these punctuation marks, but a mechanism exists for devising
them: the cmavo ``lau'' of selma'o LAU. ``lau'' must always be followed by a
BY word; the interpretation of the BY word is changed from a lerfu to a
punctuation mark. Typically, this BY word would be a name or brivla with
a ``bu'' suffix.
Why is ``lau'' necessary at all? Why not just use a ``bu''-marked word and
announce that it is always to be interpreted as a punctuation mark?
Primarily to avoid ambiguity. The ``bu'' mechanism is extremely open-ended,
and it is easy for Lojban users to make up ``bu'' words without bothering to
explain what they mean. Using the ``lau'' cmavo flags at least the most
important of such nonce lerfu words as having a special function:
punctuation. (Exactly the same argument applies to the use of ``zai'' to
signal an alphabet shift or ``ce'a'' to signal a font shift.)
Since different alphabets require different punctuation marks, the
interpretation of a ``lau''-marked lerfu word is affected by the current
alphabet shift and the current font shift.
Chinese characters (``han4zi4'' in Chinese, ``kanji'' in Japanese) represent
an entirely different approach to writing from alphabets or syllabaries.
(A syllabary, such as Japanese hiragana or Amharic writing, has one
lerfu for each syllable of the spoken language.) Very roughly, Chinese
characters represent single elements of meaning; also very roughly,
they represent single syllables of spoken Chinese. There is in principle
no limit to the number of Chinese characters that can exist, and many
thousands are in regular use.
It is hopeless for Lojban, with its limited lerfu and shift words,
to create an alphabet which will match this diversity. However, there
are various possible ways around the problem.
First, both Chinese and Japanese have standard Latin-alphabet representations,
known as ``pinyin'' for Chinese and ``romaji'' for Japanese, and these can be
used. Thus, the word ``han4zi4'' is conventionally written with two characters,
but it may be spelled out as:
8.1) .y'y.bu .abu ny. vo zy. .ibu vo
``h'' ``a'' ``n'' 4 ``z'' ``i'' 4
The cmavo ``vo'' is the Lojban digit ``4''. It is grammatical to intersperse
digits (of selma'o PA) into a string of lerfu words; as long as the first
cmavo is a lerfu word, the whole will be interpreted as a string of lerfu
words. In Chinese, the digits can be used to represent tones. Pinyin is more
usually written using accent marks, the mechanism for which was explained
in Section 6.
The Japanese company named ``Mitsubishi'' in English is spelled the same way
in romaji, and could be spelled out in Lojban thus:
8.2) my. .ibu ty. sy. .ubu by. .ibu sy. .y'y.bu .ibu
``m'' ``i'' ``t'' ``s'' ``u'' ``b'' ``i'' ``s'' ``h'' ``i''
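As a rough sketch of this first approach, the following Python function spells a romanized word letter by letter. The function names are hypothetical, and the lerfu-word table covers only the letters of the Lojban alphabet plus ``h'' (as ``.y'y.bu'', following Examples 8.1 and 8.2); other Latin letters are deliberately left unhandled here. The digit list is the ordinary set of decimal digit cmavo of selma'o PA.

```python
CONSONANTS = "bcdfgjklmnprstvxz"
VOWELS = "aeiou"
# Decimal digit cmavo of selma'o PA, indexed by value.
DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]

def lerfu_word(ch):
    """Return the lerfu word (or digit cmavo) for one character."""
    if ch in CONSONANTS:
        return ch + "y."
    if ch in VOWELS:
        return "." + ch + "bu"
    if ch == "h":
        return ".y'y.bu"
    if ch == "y":
        return ".y.bu"
    if ch in "0123456789":
        return DIGITS[int(ch)]
    raise ValueError(f"no lerfu word in this sketch for {ch!r}")

def spell(word):
    """Spell a romanized word as a string of lerfu words and digits."""
    word = word.lower()
    if not word or word[0] in "0123456789":
        # The string counts as lerfu words only if it starts with one.
        raise ValueError("a lerfu string must begin with a lerfu word")
    return " ".join(lerfu_word(ch) for ch in word)

pinyin = spell("han4zi4")
romaji = spell("Mitsubishi")
```

Under these assumptions, `pinyin` and `romaji` reproduce the Lojban lines of Examples 8.1 and 8.2 respectively.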
Alternatively, a really ambitious Lojbanist could assign lerfu words to
the individual strokes used to write Chinese characters (there are about
seven or eight of them if you are a flexible human being, or about 40 if
you are a rigid computer program), and then represent each character with a
``tei'', the stroke lerfu words in the order of writing (which is standardized
for each character), and a ``foi''. No one has as yet attempted this project.
So far, lerfu words have only appeared in Lojban text when spelling out
words. There are several other grammatical uses of lerfu words within
Lojban. In each case, a single lerfu word or more than one may be used.
Therefore, the term ``lerfu string'' is introduced: it is short for ``sequence
of one or more lerfu words''.
A lerfu string may be used as a pro-sumti (a sumti which refers to some
previous sumti), just like the pro-sumti ``ko'a'', ``ko'e'', and so on:
9.1) .abu prami by.
A loves B
In Example 9.1, ``.abu'' and ``by.'' represent specific sumti, but which sumti
they represent must be inferred from context.
Alternatively, lerfu strings may be assigned by ``goi'', the regular pro-sumti
assignment cmavo:
9.2) le gerku goi gy. cu xekri .i gy. klama le zdani
The dog, or G, is black. G goes to the house.
There is a special rule that sometimes makes lerfu strings more advantageous
than the regular pro-sumti cmavo. If no assignment can be found for a
lerfu string (especially a single lerfu word), it can be assumed to refer
to the most recent sumti whose name or description begins in Lojban with
that lerfu. So Example 9.2 can be rephrased:
9.3) le gerku cu xekri. .i gy. klama le zdani
The dog is black. G goes to the house.
(A less literal English translation would use ``D'' for ``dog'' instead.)
Here is an example using two names and longer lerfu strings:
9.4) la stivn. mark. djonz. merko
.i la .aleksandr. paliitc. kuzNIETsyf. rusko
.i symyjy. tavla .abupyky. bau la lojban.
Steven Mark Jones is-American.
Alexander Pavlovitch Kuznetsov is-Russian.
SMJ talks-to APK in Lojban.
Perhaps Alexander's name should be given as ``ru'o.abupyky'' instead.
What about
9.5) .abu dunda by. cy.
A gives B C
Does this mean that A gives B to C? No. ``by. cy.'' is a single lerfu
string, although written as two words, and represents a single pro-sumti.
The true interpretation is that A gives BC to someone unspecified.
To solve this problem, we need to introduce the elidable terminator
``boi'' (of selma'o BOI). This cmavo is used to terminate lerfu strings
and also strings of numerals; it is required when two of these
appear in a row, as here. (The other reason to use ``boi'' is to attach
a free modifier --- subscript, parenthesis, or what have you --- to a
lerfu string.) The correct version is:
9.6) .abu [boi] dunda by. boi cy. [boi]
A gives B to C
where the two occurrences of ``boi'' in brackets are elidable, but the
remaining occurrence is not. Likewise:
9.7) xy. boi ro [boi] prenu cu prami
X all persons loves.
X loves everybody.
requires the first ``boi'' to separate the lerfu string ``xy.'' from the
digit string ``ro''.
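A minimal Python sketch of where ``boi'' cannot be elided follows. It assumes the sentence has already been divided into terms, each tagged by hand as a lerfu string, a numeral string, or anything else; the function name `join_terms` and the tagging scheme are hypothetical, and no real parsing is attempted.

```python
def join_terms(terms):
    """Join (kind, text) terms, inserting "boi" only where it is required.

    kind is "lerfu", "number", or "other"; each lerfu or number term is one
    complete string, possibly of several words.
    """
    out = []
    previous_kind = None
    for kind, text in terms:
        stringlike = ("lerfu", "number")
        if kind in stringlike and previous_kind in stringlike:
            # Two lerfu/numeral strings in a row would otherwise merge.
            out.append("boi")
        out.append(text)
        previous_kind = kind
    return " ".join(out)

gives = join_terms([("lerfu", ".abu"), ("other", "dunda"),
                    ("lerfu", "by."), ("lerfu", "cy.")])
loves = join_terms([("lerfu", "xy."), ("number", "ro"),
                    ("other", "prenu"), ("other", "cu"), ("other", "prami")])
```

The results correspond to Examples 9.6 and 9.7 with every elidable ``boi'' left out and only the required one kept.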
The rules of Section 9 make it impossible to use unmarked lerfu words
to refer to lerfu themselves. In the sentence:
10.1) .abu cu lerfu
A is-a-letteral.
the hearer would try to find what previous sumti ``.abu'' refers to. The
solution to this problem makes use of the cmavo ``me'o'' of selma'o LI, which
makes a lerfu string into a sumti representing that very string of lerfu.
This use of ``me'o'' is a special case of its mathematical use, which is to
introduce a mathematical expression used literally rather than for its value.
10.2) me'o .abu cu lerfu
the-expression ``a'' is-a-letteral.
Now we can translate Example 1.1 into Lojban:
10.3) dei vasru vo lerfu
po'u me'o .ebu
this-sentence contains four letterals
which-are the-expression ``e''.
This sentence contains four ``e''s.
Since the Lojban sentence has only four ``e'' lerfu rather than fourteen,
the translation is not a literal one, but Example 10.3 is a Lojban truth
just as Example 1.1 is an English truth. Coincidentally, the colloquial
English translation of Example 10.3 is also true!
The reader might be tempted to use quotation with ``lu ... li'u'' instead of
``me'o'', producing:
10.4) lu .abu li'u cu lerfu
[quote] .abu [unquote] is-a-letteral.
(The single-word quote ``zo'' cannot be used, because ``.abu'' is a compound
cmavo.) But Example 10.4 is false, because it says:
10.5) The word ``.abu'' is a letteral
which is not the case; rather, the thing symbolized by the word ``.abu'' is
a letteral. In Lojban, that would be:
10.6) la'e lu .abu li'u cu lerfu
The-referent-of [quote] .abu [unquote] is-a-letteral.
which is correct.
This chapter is not about Lojban mathematics, which is explained in
Chapter 18, so the mathematical uses of lerfu strings will be listed and
exemplified but not explained.
A lerfu string as mathematical variable:
11.1) li .abu du li by. su'i cy.
the-number a equals the-number b plus c
a = b + c
A lerfu string as function name (preceded by ``ma'o'' of selma'o MAhO):
11.2) li .y.bu du li ma'o fy. boi xy.
the-number y equals the-number the-function f of x
y = f(x)
Note the ``boi'' here to separate the lerfu strings ``fy'' and ``xy''.
A lerfu string as selbri (followed by a cmavo of selma'o MOI):
11.3) le vi ratcu ny.moi le'i mi ratcu
the here rat is-nth-of the-set-of my rats
This rat is my Nth rat.
A lerfu string as utterance ordinal (followed by a cmavo of selma'o MAI):
11.4) ny.mai
Nthly
A lerfu string as subscript (preceded by ``xi'' of selma'o XI):
11.5) xy. xi ky.
x sub k
A lerfu string as quantifier (enclosed in ``vei ... ve'o'' parentheses):
11.6) vei ny. [ve'o] lo prenu
( ``n'' ) persons
The parentheses are required because ``ny. lo prenu'' would be two separate
sumti, ``ny.'' and ``lo prenu''. In general, any mathematical expression
other than a simple number must be in parentheses when used as a quantifier;
the right parenthesis mark, the cmavo ``ve'o'', can usually be elided.
All the examples above have exhibited single lerfu words rather than lerfu
strings, in accordance with the conventions of ordinary mathematics. A
longer lerfu string would still be treated as a single variable or function
name: in Lojban, ``.abu by. cy.'' is not the multiplication ``a x b x c'' but
is the variable ``abc''. (Of course, a local convention could exist that made
the value of a variable like ``abc'', with a multi-lerfu-word name, equal to
the values of the variables ``a'', ``b'', and ``c'' multiplied together.)
There is a special rule about shift words in mathematical text: shifts
within mathematical expressions do not affect lerfu words appearing outside
mathematical expressions, and vice versa.
12. Acronyms
An acronym is a name constructed of lerfu. English examples are ``DNA'',
``NATO'', ``CIA''. In English, some of these are spelled out (like ``DNA'' and
``CIA'') and others are pronounced more or less as if they were ordinary
English words (like ``NATO''). Some acronyms fluctuate between the two
pronunciations: ``SQL'' may be ``ess cue ell'' or ``sequel''.
In Lojban, a name can be almost any sequence of sounds that ends in a
consonant and is followed by a pause. The easiest way to Lojbanize
acronym names is to glue the lerfu words together, using ``''' wherever
two vowels would come together (pauses are illegal in names) and adding
a final consonant:
12.1) la dyny'abub. .i la ny'abuty'obub.
.i la cy'ibu'abub. .i la sykybulyl.
.i la .ibubymym. .i la ny'ybucyc.
DNA. NATO.
CIA. SQL.
IBM. NYC.
There is no fixed convention for assigning the final consonant. In
Example 12.1, the last consonant of the lerfu string has been replicated
into final position.
Some compression can be done by leaving out ``bu'' after vowel lerfu words
(except for ``.y.bu'', wherein the ``bu'' cannot be omitted without ambiguity).
Compression is moderately important because it's hard to say long names
without introducing an involuntary (and illegal) pause:
12.2) la dyny'am. .i la ny'aty'om.
.i la cy'i'am. .i la sykybulym.
.i la .ibymym. .i la ny'ybucym.
DNA. NATO.
CIA. SQL.
IBM. NYC.
In Example 12.2, the final consonant ``m'' stands for ``merko'', indicating
the source culture of these acronyms.
Another approach, which some may find easier to say and which is compatible
with older versions of the language that did not have a ``''' character, is
to use the consonant ``z'' instead of ``''':
12.3) la dynyzaz. .i la nyzatyzoz.
.i la cyzizaz. .i la sykybulyz.
.i la .ibymyz. .i la nyzybucyz.
DNA. NATO.
CIA. SQL.
IBM. NYC.
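The three naming schemes of Examples 12.1 through 12.3 can be sketched as a single Python function. This is only an illustration under stated assumptions: the name `acronym_name` and its parameters are hypothetical, only the letters of the Lojban alphabet plus ``q'' (as ``ky.bu'', following the ``SQL'' examples) are handled, and when no final consonant is supplied the last consonant of the glued string is replicated, as in Example 12.1.

```python
CONSONANTS = "bcdfgjklmnprstvxz"
VOWELS = "aeiou"

def acronym_name(acronym, compress=False, glue="'", final=None):
    """Glue lerfu words into a single name ending in a consonant and a pause."""
    if not acronym:
        raise ValueError("empty acronym")
    parts = []
    for ch in acronym.lower():
        if ch in CONSONANTS:
            parts.append(ch + "y")
        elif ch in VOWELS:
            # Compression drops "bu" after vowel lerfu words.
            parts.append(ch if compress else ch + "bu")
        elif ch == "y":
            parts.append("ybu")    # "bu" is never dropped from .y.bu
        elif ch == "q":
            parts.append("kybu")   # ky.bu, as in the SQL examples
        else:
            raise ValueError(f"no lerfu word in this sketch for {ch!r}")
    name = parts[0]
    for part in parts[1:]:
        # Every part ends in a vowel, so a vowel-initial part needs glue.
        name += (glue if part[0] in VOWELS + "y" else "") + part
    if final is None:
        final = next(c for c in reversed(name) if c in CONSONANTS)
    name += final
    # A vowel-initial name is written with a leading pause.
    return ("." if name[0] in VOWELS + "y" else "") + name + "."

plain = [acronym_name(a) for a in ("DNA", "NATO", "IBM")]
merko = [acronym_name(a, compress=True, final="m") for a in ("CIA", "SQL")]
zed = [acronym_name(a, compress=True, glue="z", final="z") for a in ("NYC",)]
```

The `compress` flag corresponds to Example 12.2, and passing ``z'' as both `glue` and `final` corresponds to Example 12.3.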
One more alternative to these lengthy names is to use the lerfu string itself
prefixed with ``me'', the cmavo that makes sumti into selbri:
12.4) la me dy ny. .abu
that-named what-pertains-to ``d'' ``n'' ``a''
This works because ``la'', the cmavo that normally introduces names used
as sumti, may also be used before a predicate to indicate that the
predicate is a (meaningful) name:
12.5) la cribe cu ciska
that-named ``Bear'' writes
Bear is a writer
Example 12.5 does not of course refer to a bear (``le cribe'' or ``lo cribe'')
but to something else, probably a person, named ``Bear''. Similarly,
``me dy ny. .abu'' is a predicate which can be used as a name, producing
a kind of acronym which can have pauses between the individual lerfu words.
Since the first application of computers to non-numerical information,
character sets have existed, mapping numbers (called ``character codes'') into
selected lerfu, digits, and punctuation marks (collectively called
``characters''). Historically, these character sets have only
covered the English alphabet and a few selected punctuation marks.
International efforts are now underway to create a unified character set
that can represent essentially all the characters in essentially all the
world's writing systems. Lojban can take advantage of these encoding
schemes by using the cmavo ``se'e'' (of selma'o BY). This cmavo is
conventionally followed by digit cmavo of selma'o PA representing the
character code, and the whole string indicates a single character in some
computerized character set:
13.1) me'o se'ecixa cu lerfu
la .asycy'i'is. loi merko rupnu
the-expression [code] 36 is-a-letteral
in-set ASCII
for-the-mass-of American currency-units.
The character code 36 in ASCII represents
American dollars.
``$'' represents American dollars.
Understanding Example 13.1 depends on knowing the value in the ASCII
character set (one of the simplest and oldest) of the ``$'' character.
Therefore, the ``se'e'' convention is only intelligible to those who know
the underlying character set. For precisely specifying a particular
character, however, it has the advantages of unambiguity and (relative)
cultural neutrality, and therefore Lojban provides a means for those
with access to descriptions of such character sets to take advantage
of them.
As another example, the Unicode character set (also known as ISO 10646)
represents the international symbol of peace, an inverted trident in a
circle, using the base-16 value 262E. In a suitable context, a Lojbanist
may say:
13.2) me'o se'erexarerei sinxa le ka panpi
the-expression [code] 262E is-a-sign-of
the quality-of being-at-peace
When a ``se'e'' string appears in running discourse, some metalinguistic
convention must specify which character set is in use and whether the
number is base 10 (as in Example 13.1), base 16 (as in Example 13.2), or
some other base.
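As a sketch of how such a string might be generated, the following Python function builds a ``se'e'' word from a character's code. It assumes the character set is Unicode (which agrees with ASCII for codes below 128) and takes the base as an explicit argument; the function name `se_e` is hypothetical. The words for the hexadecimal digits A through F (``dau'', ``fei'', ``gai'', ``jau'', ``rei'', ``vai'') are the cmavo of selma'o PA used for base-16 numbers, which this chapter does not otherwise discuss.

```python
# Digit cmavo of selma'o PA for values 0 through 15.
DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so",
          "dau", "fei", "gai", "jau", "rei", "vai"]

def se_e(char, base=10):
    """Return "se'e" followed by the digits of char's code in the given base."""
    if base not in (10, 16):
        raise ValueError("this sketch handles only base 10 and base 16")
    code = ord(char)               # Unicode code point of the character
    digits = []
    while True:
        code, digit = divmod(code, base)
        digits.append(DIGITS[digit])
        if code == 0:
            break
    return "se'e" + "".join(reversed(digits))

dollar = se_e("$")                 # decimal code, as in Example 13.1
peace = se_e("\u262e", base=16)    # hexadecimal code, as in Example 13.2
```

The output alone does not record which base was used, which is exactly why the convention has to be agreed on outside the ``se'e'' string itself.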
cmavo   selma'o   meaning
bu      BU        makes previous word into a lerfu word
ga'e    BY        upper case shift
to'a    BY        lower case shift
tau     LAU       case-shift next lerfu word only
lo'a    BY        Latin/Lojban alphabet shift
ge'o    BY        Greek alphabet shift
je'o    BY        Hebrew alphabet shift
jo'o    BY        Arabic alphabet shift
ru'o    BY        Cyrillic alphabet shift
se'e    BY        following digits are a character code
na'a    BY        cancel all shifts
zai     LAU       following lerfu word specifies alphabet
ce'a    LAU       following lerfu word specifies font
lau     LAU       following lerfu word is punctuation
tei     TEI       start compound lerfu word
foi     FOI       end compound lerfu word
Note that LAU cmavo must be followed by a BY cmavo or the equivalent,
where ``equivalent'' means: either any Lojban word followed by ``bu'', another
LAU cmavo (and its required sequel), or a ``tei ... foi'' compound cmavo.
The following sections contain tables of proposed lerfu words for some of
the standard alphabets supported by the Lojban lerfu system. The first
column of each list is the lerfu (actually, a Latin-alphabet name sufficient
to identify it). The second column is the proposed name-based lerfu word,
and the third column is the proposed lerfu word in the system based on using
the cmavo of selma'o BY with a shift word.
These tables are not meant to be authoritative (several authorities within
the Lojban community have niggled over them extensively, disagreeing with
each other and sometimes with themselves). They provide a working basis
until actual usage is available, rather than a final resolution of lerfu word
problems. Probably the system presented here will evolve somewhat before
settling down into a final, conventional form.
For Latin-alphabet lerfu words, see Section 2 (for Lojban) and Section 5
(for non-Lojban Latin-alphabet lerfu).
16. Proposed lerfu words for the Greek alphabet
alpha       .alfas. bu        .abu
beta        .betas. bu        by
gamma       .gamas. bu        gy
delta       .deltas. bu       dy
epsilon     .Epsilon. bu      .ebu
zeta        .zetas. bu        zy
eta         .etas. bu         .e'ebu
theta       .tetas. bu        ty. bu
iota        .iotas. bu        .ibu
kappa       .kapas. bu        ky
lambda      .lymdas. bu       ly
mu          .mus. bu          my
nu          .nus. bu          ny
xi          .ksis. bu         ksis. bu
omicron     .Omikron. bu      .obu
pi          .pis. bu          py
rho         .ros. bu          ry
sigma       .sigmas. bu       sy
tau         .taus. bu         ty
upsilon     .Upsilon. bu      .ubu
phi         .fis. bu          py. bu
chi         .xis. bu          ky. bu
psi         .psis. bu         psis. bu
omega       .omegas. bu       .o'obu
rough       .dasei,as. bu     .y'y
smooth      .psiles. bu       xutla bu
17. Proposed lerfu words for the Cyrillic alphabet
The second column in this listing is based on the historical names
of the letters in Old Church Slavonic. Only those letters used in Russian
are shown; other languages require more letters which can be devised as
needed.
a           .azys. bu         .abu
b           .bukys. bu        by
v           .vedis. bu        vy
g           .glagolis. bu     gy
d           .dobros. bu       dy
e           .iestys. bu       .ebu
zh          .jivet. bu        jy
z           .zemlias. bu      zy
i           .ije,is. bu       .ibu
short i     .itord. bu
k           .kakos. bu        ky
l           .liudi,ies. bu    ly
m           .myslites. bu     my
n           .naciys. bu       ny
o           .onys. bu         .obu
p           .pokois. bu       py
r           .riytsis. bu      ry
s           .slovos. bu       sy
t           .tvriydos. bu     ty
u           .ukys. bu         .ubu
f           .friytys. bu      fy
kh          .xerys. bu        xy
ts          .tsis. bu         tsys. bu
ch          .tcriyviys. bu    tcys. bu
sh          .cas. bu          cy
shch        .ctas. bu         ctcys. bu
hard sign   .ier. bu          jdari bu
yeri        .ierys. bu        .y.bu
soft sign   .ieriys. bu       ranti bu
reversed e  .ecarn. bu
yu          .ius. bu          .iubu
ya          .ias. bu          .iabu
18. Proposed lerfu words for the Hebrew alphabet
aleph       .alef. bu         .alef. bu
bet         .bet. bu          by
gimel       .gimel. bu        gy
daled       .daled. bu        dy
he          .xex. bu          .y'y
vav         .vav. bu          vy
zayin       .zai,in. bu       zy
khet        .xet. bu          xy. bu
tet         .tet. bu          ty. bu
yud         .iud. bu          .iud. bu
kaf         .kaf. bu          ky
lamed       .LYmed. bu        ly
mem         .mem. bu          my
nun         .nun. bu          ny
samekh      .samex. bu        samex. bu
ayin        .ai,in. bu        .ai,in. bu
pe          .pex. bu          py
tzadi       .tsadik. bu       tsadik. bu
quf         .kuf. bu          ky. bu
resh        .rec. bu          ry
shin        .cin. bu          cy
sin         .sin. bu          sy
taf         .taf. bu          ty
dagesh      .daGEC. bu        daGEC. bu
hiriq       .xirik. bu        .ibu
tzeirekh    .tseirex. bu      .eibu
segol       .seGOL. bu        .ebu
qubbutz     .kubuts. bu       .ubu
qamatz      .kamats. bu       .abu
patach      .patax. bu        .a'abu
sheva       .cyVAS. bu        .y.bu
kholem      .xolem. bu        .obu
shuruq      .curuk. bu        .u'ubu
19. Proposed lerfu words for some accent marks and multiple letters
This list is intended to be suggestive, not complete: there are lerfu such as
Polish ``dark'' l and Maltese h-bar that do not yet have symbols.
acute             .akut. bu
                  or .pritygal. bu [pritu galtu]
grave             .grav. bu
                  or .zulgal. bu [zunle galtu]
circumflex        .cirkumfleks. bu
                  or .midgal. bu [midju galtu]
tilde             .tildes. bu
macron            .makron. bu
breve             .brevis. bu
over-dot          .garmoc. bu [gapru mokca]
umlaut/trema      .relmoc. bu [re mokca]
over-ring         .garjin. bu [gapru djine]
cedilla           .seDIlys. bu
double-acute      .re'akut. bu [re akut.]
ogonek            .ogoniek. bu
hacek             .xatcek. bu
ligatured fi      tei fy. .ibu foi
Danish/Latin ae   tei .abu .ebu foi
Dutch ij          tei .ibu jy. foi
German es-zed     tei sy. zy. foi
20. Proposed lerfu words for radio communication
There is a set of English words which are used, by international agreement,
as lerfu words (for the English alphabet) over the radio, or in noisy
situations where the utmost clarity is required. Formally they are known
as the ``ICAO Phonetic Alphabet'', and are used even in non-English-speaking
countries.
This table presents the standard English spellings and proposed Lojban
versions. The Lojbanizations are not straightforward renderings of the
English sounds, but make some concessions both to the English spellings
of the words and to the Lojban pronunciations of the lerfu (thus
``carlis. bu'', not ``tcarlis. bu'').
Alfa .alfas. bu
Bravo .bravos. bu
Charlie .carlis. bu
Delta .deltas. bu
Echo .ekos. bu
Foxtrot .fokstrot. bu
Golf .golf. bu
Hotel .xoTEL. bu
India .indias. bu
Juliet .juliet. bu
Kilo .kilos. bu
Lima .limas. bu
Mike .maik. bu
November .novembr. bu
Oscar .oskar. bu
Papa .paPAS. bu
Quebec .keBEK. bu
Romeo .romios. bu
Sierra .sieras. bu
Tango .tangos. bu
Uniform .Uniform. bu
Victor .viktas. bu
Whiskey .uiskis. bu
X-ray .eksreis. bu
Yankee .iankis. bu
Zulu .zulus. bu