Chapter 4. Linguistic Issues pertaining to Lojban

bauske se casnu sera'a le lojbo
1. What is the Sapir-Whorf hypothesis?
2. Lojban sentences do not have unique interpretations; how can Lojban be said to be unambiguous?
3. If the meaning of a particular tanru cannot be completely understood from understanding the component parts, a separate dictionary entry is needed for every possible tanru, making the Lojban dictionary infinitely long. How can this be avoided?
4. The Lojban gismu seem to have been chosen at random, without regard to any sort of semantic theory. Why was this done?
5. tanru like nixli ckule, analogous to English girls' school, are so open-ended in sense that there is no way to block such far-fetched interpretations as 'a school intended to train girls between the ages of 6 and 10 to play the bassoon', which is patently absurd. What is the proper interpretation of tanru?
6. Lojban claims to be unambiguous, but many constructs have vague meanings, and the meanings of the gismu themselves are quite sparsely specified. On the other hand, Lojban forces precision on speakers where it is not wanted and where natural-language speakers can easily avoid it. Is this appropriate to a culturally neutral, unambiguous language?
7. Why are Lojban tanru involving more than two components always left-grouping (in the absence of a marker word), when right-grouping structure is much more natural to human languages?
8. Why are there so many cmavo, and why are many of them so similar? Wouldn't this make Lojban hard to understand at a cocktail party (or a similar noisy environment)?
9. Does Lojban have transformations, as are commonly assumed to exist for natural languages?
10. Lojban connectives cannot be used to correctly translate English If you water it, it will grow, because material implication is too weak and the special causal connectives, which connect assertions, are too strong. What can be done instead?
11. How can Lojban logical connectives be used in imperative sentences? Logical connectives work properly only on complete sentences, and of those, only those which actually assert something.
12. Since tense is optional in Lojban, how is a mixture of tensed and untensed sentences to be interpreted?
13. What theory underlies the choice of place structures?
14. How do borrowed words (fu'ivla) enter Lojban without compromising its stability and unambiguity?
15. The Lojban phonological system is hard to use for English-speakers (to say nothing of Japanese-speakers), due to the large numbers of consonant clusters. How can a language be culturally neutral when it is difficult to pronounce?
16. Lojban words resemble their English cognates, but unsystematically so. Does this really aid learning, or does it make learning more difficult?
17. What is the process by which the gismu were devised? What are some examples of the process and its results?
18. What is the standard word order of Lojban?
19. Lojban claims to be culturally neutral. But many of its conceptual distinctions, for example the color set, are clearly biased towards particular languages. There is a word for 'brown', which is a color not used in Chinese (although a word exists, it is rare); on the other hand, there is only one word for 'blue', although Russian-speakers convey the range of English 'blue' with two words. How can Lojban be prevented from splintering into dialects which differ in such points?
20. Lojban is supposed to be intended as a test of the Sapir-Whorf hypothesis in its negative form: "structural features of language make a difference in our awareness of the relations between ideas." Is this simply another way of saying "Distinctions are more likely to be noticed if structurally marked"? If so, this is trivially true.
21. How can 'ease of thought' be measured? Measuring facility with predicate logic may not be enough to establish 'ease of thought'.
22. What scientific and linguistic interest can Lojban possibly have, since it is entirely a product of conscious design? Its study will only reveal what its designers already put in the language -- which surely tells us little about language as it actually occurs among humans.
23. Why is Lojban a useful testbed for experimental linguistics? What are some plausible experiments that could be done using Lojban?
24. How is Lojban useful as a tool of linguistic analysis?
25. What sociolinguistic applications would Lojban have?

1. What is the Sapir-Whorf hypothesis?

The Sapir-Whorf hypothesis is the notion that the language you speak affects the way you think. Most people who have learnt a foreign language, or have grown up speaking two languages, will be familiar with this idea, having found themselves thinking and speaking in one language or the other because something is easier to say in that language. One of the main ideas behind the Loglan/Lojban project was to create a language which is both highly expressive and as culturally neutral as possible, then see what people from different cultures do with it.

To give an example, in most European languages tense and gender are very important, and need to be made explicit in most utterances -- you can say "She goes", "It went", "He'll go" and so on; but just to say "She/he/it go", with no particular gender or time in mind, sounds strange. In Chinese, on the other hand, ta1 qu4 ('he/she/it go') is perfectly normal. In Lojban there are plenty of words to show the time of an action, its length, how it happens and so on -- but you don't have to use any of them. If you really wanted to, you could say:

le ninmu puzuze'udi'i klama
the female-human past-long-time-distance-long-time-interval-regularly go
A long time ago, for a long time, she went regularly.

But you can equally well say:

klama
[someone/something] go 

Notice that you can translate the first example into English (more-or-less) but the second one just won't go into English, or most European languages. If you speak a European language and this strikes you as odd, you may have just witnessed a Sapir-Whorf effect! Understanding the potential for Sapir-Whorf effects may lead to better inter-cultural understanding, promoting communication and peace.

It is known that people's ideas and ways of thinking change somewhat when they learn a foreign language. It is not known whether this change is due to exposure to a different culture, or even just getting outside of one's own culture. It is also not known how much (if any) of the change is due to the nature of the language, as opposed to the cultural associations.

The Sapir-Whorf hypothesis was important in linguistics in the 1950s, but interest fell off afterwards, partially because testing it properly was so difficult. Lojban allows a new approach to such testing. Obviously, if a culture-independent language could be taught to groups of people, the effects of language could more easily be separated from those of culture.

Unique features of Lojban remove constraints on language in the areas of logic, ambiguity, and expressive power, opening up areas of thought that have not been easily accessible to human language before. Meanwhile, the formal rigidity of the language definition allows speakers to carefully control their expressions (and perhaps therefore their thought processes). This gives some measure of predictive power that can be used in designing and preparing for actual Sapir-Whorf experiments.

One of the prerequisites of a Sapir-Whorf experiment is an international body of Lojban speakers. We need to be able to teach Lojban to subjects who know only their native (non-English) tongue, and we need to know in advance the difficulties that people from each language and culture will have in learning Lojban. Thus, the Lojban community is actively reaching out to speakers of languages other than English.

Lojban does not need to prove or disprove the Sapir-Whorf hypothesis in order to be successful. However, if evidence is produced supporting the Sapir-Whorf effect, Lojban will likely be perceived as an outstanding tool of analytical and creative thought.


2. Lojban sentences do not have unique interpretations; how can Lojban be said to be unambiguous?

The sense in which Lojban is said to be unambiguous is not a simple one, and some amplification of the basic claim is necessary. Ambiguity can be judged on four levels: the phonological-graphical, the morphological, the syntactic, and the semantic.

Lojban is audio-visually isomorphic: the writing system has a grapheme for every phoneme and vice versa, and there are no supra-segmental phonemes (such as tones or pitch) which are not represented in the writing system. Lojban's phonology contains significant pauses that affect word boundaries, and allows pauses between any two words. The optional written representation for pause is a period, although required pauses can be unambiguously identified in written text from the morphological rules alone. Lojban also uses stress significantly, and again there is a written representation (capitalization of the affected vowel or syllable), which is omitted in most text, where the morphological default of penultimate stress applies.

Lojban is morphologically unambiguous in two senses: a string of phonemes (including explicit pause and stress information) can be broken up into words in only one way, and each compound word can be converted to and from its constituents in only one way.

The syntactic unambiguity of Lojban has been established by the use of a LALR(1) parser generator which, given a series of simple pre-parser operations, produces a unique parse for every Lojban text that follows its grammatical norms. In addition, the existence of a defined 'phrase structure rule' grammar underlying the language (and tested via the parser generator) guarantees that there are no sentences where distinct deep structures generate isomorphic surface structures.

The claim for semantic unambiguity is only a limited one. Lojban contains several constructs which are explicitly ambiguous semantically. The most important of these are Lojban tanru (so-called 'metaphors') and Lojban names. Names are ambiguous in almost any language, and Lojban is no better; a name simply must be resolved in context, and the only final authority for the meaning of a name is the user of the name. tanru are further discussed in later questions. However, the semantics of the root predicates of Lojban (gismu), and of its function words (cmavo), are explicitly defined, with a unique, though at times broad, sense.


3. If the meaning of a particular tanru cannot be completely understood from understanding the component parts, a separate dictionary entry is needed for every possible tanru, making the Lojban dictionary infinitely long. How can this be avoided?

tanru are binary combinations of predicates, such that the second predicate is the 'head' and the first predicate is a modifier for that head. The meaning of the tanru is the meaning of its head, with the additional information that there is some unspecified relationship between the head and the modifier.

tanru are the basis of compound words in Lojban. However, a compound word has a single defined meaning whereas the meaning of a tanru is explicitly ambiguous. Lojban tanru are not as free as English figures of speech; they are 'analytic', meaning that the components of the tanru do not themselves assume a figurative sense. Only the connection between them is unstated.

Most of the constructs of Lojban are semantically unambiguous, and there are semantically unambiguous ways (such as with relative clauses) to paraphrase the meaning of any tanru. For example, slasi mlatu ('plastic-cat') might be paraphrased in ways that translate to 'cat that is made from plastic' or 'cat which eats plastic' or various other interpretations, just as in English. However, the single (compound) word derived from this tanru, slasymlatu, has exactly one meaning from among the interpretations, which could be looked up in a dictionary (if someone had found the word useful enough to formally submit it for inclusion). There is no law compelling the creation of such a word, however, and there is even an 'escape mechanism' allowing a speaker to indicate that a particular instance of a 'nonce' compound word is 'non-standard' (has not been checked against a dictionary or other standard), and may have a meaning based on an unusual interpretation of the underlying tanru.


4. The Lojban gismu seem to have been chosen at random, without regard to any sort of semantic theory. Why was this done?

Lojban content words are built up from a list of around 1350 root words (gismu), which are not necessarily to be taken as semantically simple themselves. Lojban does not claim to exhibit a complete and comprehensive semantic theory which hierarchically partitions the entire semantic space of human discourse. Furthermore, while some gismu were chosen because they seem to be semantic primitives, many others (e.g. nanmu, meaning 'human male') are plainly not.

Rather, the 1350-odd root words blanket semantic space, in the sense that everything human beings talk about can be built up using appropriate tanru. This claim is being tested in actual usage, and root words can still be added if necessary (after careful consideration), if genuine gaps are found. For the most part, the few gaps which have been recognized (about 20 words have been added in the past decade) reflect the completing of semantic sets. It is no longer permitted for language users to create new gismu root words (in the standard form of the language, at least); newly coined words must fall recognizably outside the highly regulated gismu morphological space. (A specific and separate morphological structure is reserved for coined words, usually borrowings, and a marker is available to indicate that a word is a 'nonce' coinage rather than an established 'dictionary word'.)

Lojban's empirically derived word list is similar to that of Basic English, which replaces the whole English vocabulary with English-normal compounds built from about 800 root words. Lojban and Basic English both allow for the adoption of technical terms from other languages to cover things like plant and animal names, food names, and names of chemical compounds.


5. tanru like nixli ckule, analogous to English girls' school, are so open-ended in sense that there is no way to block such far-fetched interpretations as 'a school intended to train girls between the ages of 6 and 10 to play the bassoon', which is patently absurd. What is the proper interpretation of tanru?

The Lojban tanru nixli ckule ('girl type of school') cannot mean, out of context, 'school intended to train girls between 6 and 10 years of age to play the bassoon', although if such a school existed it could certainly be called a nixli ckule. This interpretation can be rejected as implausible because it involves additional restrictive information. The undefined relationship between nixli and ckule cannot drag in additional information 'by the hair', as it were. Instead, this intricate interpretation would require a larger tanru incorporating nixli ckule as one of its components, or else a non-tanru construct, probably involving a Lojban relative clause. By way of comparison, such interpretations as 'school containing girls', 'school whose students are girls', and 'school to train persons to behave like girls' are plausible with minimal context, because these renderings do not involve additional restriction.


6. Lojban claims to be unambiguous, but many constructs have vague meanings, and the meanings of the gismu themselves are quite sparsely specified. On the other hand, Lojban forces precision on speakers where it is not wanted and where natural-language speakers can easily avoid it. Is this appropriate to a culturally neutral, unambiguous language?

Lojban's avoidance of ambiguity does not mean an avoidance of vagueness. A Lojban aphorism states that the price of infinite precision is infinite verbosity, as indeed Wilkins' Philosophical Language (1668) illustrates. Lojban's allowable vagueness permits useful sentences to be not much longer than their natural-language counterparts.

There are many ways to omit information in Lojban, and it is up to the listener to reconstruct what was meant, just as in natural languages. In each construct, there are specific required and optional components. Unlike in English, omitting an optional component explicitly and unambiguously flags an ellipsis. Furthermore, the listener has a clear way of querying any of this elliptically omitted information.

There are also some categories which are necessary in Lojban and not in other languages. For example, Lojban requires the speaker, whenever referring to objects, to specify whether the objects are considered as individuals, as a mass, or as a (set theoretic) set. Likewise, logical relations are made explicit: there can be no neutrality in Lojban about inclusive vs. exclusive or, which are no more closely related semantically than any other pair of logical connectives.

These properties are a product of Lojban's fundamental design, which was chosen to emphasize a highly distinctive and non-natural syntax (that of formal first-order predicate logic) embedded in a language with the same expressive power as natural languages. Through the appearance of this one highly unusual feature, the intent of the Loglan/Lojban Project has been to maximize one difference between Lojban and natural languages without compromising speakability and learnability. This difference could then be investigated by considering whether the use of first-order predicate logic as a syntactic base has aided fluent Lojban speakers in the use of this logic as a reasoning tool.

Lojban gismu roots are defined rather abstractly, in order to cover as large a segment of closely related semantic space as possible. These broad (but not really vague) concepts can then be restricted using tanru and other constructs to any arbitrary degree necessary for clarity. Communicating the meaning of a gismu (or any other Lojban word) is a problem of teaching and lexicography. The concepts are defined as predicate relationships among various arguments, and various experimental approaches have been explored to determine the best methods of conveying these meanings.


7. Why are Lojban tanru involving more than two components always left-grouping (in the absence of a marker word), when right-grouping structure is much more natural to human languages?

Lojban is predominantly a left-grouping language. By default, all structures are left-grouping, with right-grouping available when marked by a particle. Since the head of most constructs appears on the left, left-grouping structures tend to favor the speaker. Nothing spoken needs to be revised to add more information. When the head is on the right, as in the case of tanru, left-grouping may seem counterintuitive, as it requires the listener to retain the entire structure in mind until the head is found. However, left-grouping was retained even in tanru for the sake of simplicity.

Experience has shown, however, that Lojban's left-grouping structure is not a major problem for language learners. Indeed, many longer English metaphors translate directly into Lojban using simple left-grouping structures.
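
The default grouping rule can be pictured as a left fold over the component words. The following is only a minimal sketch, not part of any official parser: the function names are hypothetical, the three-word tanru is an arbitrary illustration built from words used elsewhere in this chapter, and marked right-grouping is not modeled.

```python
from functools import reduce
from typing import List, Tuple, Union

# A tanru is either a single word or a (modifier, head) pair.
Tanru = Union[str, Tuple["Tanru", "Tanru"]]


def group_left(words: List[str]) -> Tanru:
    """Group words pairwise from the left: [a, b, c] -> ((a, b), c)."""
    if not words:
        raise ValueError("a tanru needs at least one word")
    return reduce(lambda modifier, head: (modifier, head), words)


def head_of(tanru: Tanru) -> str:
    """The head of a tanru is the head of its second (rightmost) component."""
    return tanru if isinstance(tanru, str) else head_of(tanru[1])


if __name__ == "__main__":
    tree = group_left(["slasi", "mlatu", "ckule"])
    print(tree)           # (('slasi', 'mlatu'), 'ckule')
    print(head_of(tree))  # ckule
```

Note that the listener only learns the overall head when the last word arrives, which is the memory burden the answer above describes.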


8. Why are there so many cmavo, and why are many of them so similar? Wouldn't this make Lojban hard to understand at a cocktail party (or a similar noisy environment)?

One of the recurrent difficulties with all forms of Loglan, including Lojban, is the tendency to fill up the available space of structure words, making words of similar function hard to distinguish in noisy environments. This has happened because of the concern that Lojban allow the speaker to be as precise as they choose to be in using Lojban grammar, without privileging one language group's choices of what to express grammatically over another's. The phonological revisions made when Lojban split from Institute Loglan allowed for many more structure words, but once again the list has almost entirely filled.

In some cases, notably the digits 0-9, an effort has been made to separate them phonologically. The vocatives (including the words used for communication protocol, e.g. over the radio) are also maximally separated phonologically. Many other cmavo are based on shortened forms of corresponding gismu roots, however, and are not maximally separated.

A variety of ways to say "Huh?" have been added to the language, partially alleviating the difficulty. These question words can be used to specify the type of word that was expected, or the part of the relationship that was not understood by the listener.


9. Does Lojban have transformations, as are commonly assumed to exist for natural languages?

Yes, in the sense that there are several alternative surface structures that have the same semantics and therefore, presumably, the same deep structure. What Lojban does not have is identical surface structures with differing deep structures (leading to syntactic ambiguity), so a surface-structure-only grammar is sufficient to develop an adequate parsing for every text. Knowledge of transformations is required only to get the semantics right.


10. Lojban connectives cannot be used to correctly translate English If you water it, it will grow, because material implication is too weak and the special causal connectives, which connect assertions, are too strong. What can be done instead?

The English sentence If you water it, it will grow looks superficially like a Lojban na.a connection (material implication), but it actually has causal connotations not present in na.a. Therefore, a proper translation must involve the notion of cause. Neither the Lojban coordinating causal conjunction nor the two correlative subordinating causal conjunctions (one of which subordinates the cause and the other the effect) will serve, since these require that either the cause, or the effect, or both be asserted. Instead, the correct translation of the English involves 'cause' as a predicate, and might be paraphrased "The event of your watering it is a cause of the event of its future growing." (roda zo'u lenu do jacysabji da cu rinka lenu da ba banro)
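
To see why material implication is too weak here, it may help to tabulate it. The sketch below is illustrative only (the function and variable names are invented for this example); it shows that the implication counts as true in every case where the plant is not watered, whatever happens to the plant, which is why it cannot carry the causal sense of the English.

```python
def implies(antecedent: bool, consequent: bool) -> bool:
    """Material implication: false only when the antecedent is true and the consequent false."""
    return (not antecedent) or consequent


if __name__ == "__main__":
    print("watered  grows   watered-implies-grows")
    for watered in (True, False):
        for grows in (True, False):
            print(f"{watered!s:8} {grows!s:7} {implies(watered, grows)}")
```

Three of the four rows come out true, including both rows in which no watering takes place.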


11. How can Lojban logical connectives be used in imperative sentences? Logical connectives work properly only on complete sentences, and of those, only those which actually assert something.

There is a special imperative pronoun ko. This is a second person pronoun logically equivalent to do, the normal Lojban word for 'you', but conveying an imperative sense. Thus, an imperative can be understood as commanding the listener to make true the assertion which results when ko is replaced by do.

For example, ko sisti ('Stop!') is logically equivalent to do sisti ('you stop'), and pragmatically may be understood as "Make 'do sisti' true!". This allows logical connection to be used in imperatives without loss of clarity or generality; the logical connection applies to the assertion which is in effect embedded in the imperative.

So ko sisti .ija mi ceclygau would seem to mean "Stop or I'll shoot", but actually means "bring about a situation whereby, if you don't stop, I'll shoot" -- not quite the same thing. The sense of "stop or I'll shoot" is properly conveyed by the phrase .i do bazi sisti .ijonai mi ba ceclygau, "Either you will stop immediately, or I will shoot."

A minor advantage of this style of imperative is that tensed imperatives like ko ba klama ('Come in-the-future!') become straightforward.
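
The reading of ko described above amounts to a simple substitution. As a rough sketch only (the function name is hypothetical, and splitting on whitespace is a simplifying assumption that ignores real Lojban word segmentation), the assertion embedded in an imperative can be recovered like this:

```python
def embedded_assertion(imperative: str) -> str:
    """Return the assertion the listener is asked to make true, by replacing ko with do."""
    return " ".join("do" if word == "ko" else word for word in imperative.split())


if __name__ == "__main__":
    print(embedded_assertion("ko sisti"))     # do sisti
    print(embedded_assertion("ko ba klama"))  # do ba klama
```

Any logical connective or tense in the sentence passes through unchanged, so it applies to the embedded assertion rather than to the command itself.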


12. Since tense is optional in Lojban, how is a mixture of tensed and untensed sentences to be interpreted?

Lojban tense, like other incidental modifiers of a predication, tends to be contextually 'sticky'. Once specified in connected discourse, to whatever degree of precision seems appropriate, tense need not be respecified in each sentence. In narration, this assumption is modified to the extent that each sentence is assumed to refer to a slightly later time than the previous sentence, although with explicit tense markers it is possible to tell a story in reversed or scrambled time order. Therefore, each predication does have a tense, which is implicit if not necessarily explicit.
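
The 'sticky' behavior can be sketched as carrying the most recent explicit tense forward through a discourse. This is an illustration under stated assumptions, not a description of any real Lojban tool: the function name and the (tense, predication) representation are invented here, and the narrative convention of each sentence being slightly later than the last is not modeled.

```python
from typing import List, Optional, Tuple

Sentence = Tuple[Optional[str], str]  # (explicit tense or None, predication)


def resolve_tenses(discourse: List[Sentence]) -> List[Sentence]:
    """Give each untensed sentence the most recently specified tense, if any."""
    current: Optional[str] = None
    resolved: List[Sentence] = []
    for tense, predication in discourse:
        if tense is not None:
            current = tense  # an explicit tense replaces the sticky one
        resolved.append((current, predication))
    return resolved


if __name__ == "__main__":
    story = [("pu", "mi klama"), (None, "mi cadzu"), ("ba", "mi cliva"), (None, "mi litru")]
    for tense, predication in resolve_tenses(story):
        print(tense, predication)
```

A sentence that precedes any explicit tense stays unresolved (None) in this sketch; in practice context outside the text would supply it.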


13. What theory underlies the choice of place structures?

Very little. Place structures are empirically derived, like the gismu list itself, and present a far more difficult problem; therefore, they were standardized rather late in the history of the project. There is no sufficiently complete and general case theory that allows the construction of a priori place structures for the large variety of predicates that exist in the real world.

The current place structures of Lojban represent a three-way compromise: fewer places are easier to learn; more places make for more concision (arguments not represented in the place structure may be added, but must be marked with appropriate case tags); the presence of an argument in the place structure makes a metaphysical claim that it is required for the predication to be meaningful.

This last point requires some explanation. For example, the predicate klama ('come, go') has five places: the actor, the destination, the origin, the route, and the means. Lojban therefore claims that anything not involving these five notions (whether specified in a particular sentence or not) is not an instance of klama. The predicate cliva ('leave') has the same places except for the destination; it is not necessary to be going anywhere in particular for cliva to hold. litru ('travel') has neither origin nor destination, merely the actor, the route, and the means. The predicate cadzu ('walk') involves a walker, a surface to walk on, and a means of walking (typically legs). One may walk without an origin or a destination (in circles, for instance); but walking in circles is not considered in Lojban to be 'going', as there is nowhere one is going to.

For describing the act of walking from somewhere to somewhere, the tanru cadzu klama or the corresponding lujvo dzukla would be appropriate. The tanru cadzu cliva and cadzu litru may be similarly analyzed.
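
The comparison among these four predicates can be laid out as data. The sketch below simply records the roles as paraphrased in this answer; the role labels, their ordering, and the helper function are illustrative assumptions and should not be read as the official place-structure definitions.

```python
from typing import Dict, List, Tuple

# Roles as described in the text above (labels are informal paraphrases).
PLACE_ROLES: Dict[str, Tuple[str, ...]] = {
    "klama": ("actor", "destination", "origin", "route", "means"),
    "cliva": ("actor", "origin", "route", "means"),
    "litru": ("actor", "route", "means"),
    "cadzu": ("walker", "surface", "means of walking"),
}


def roles_missing_from(base: str, other: str) -> List[str]:
    """Roles that the predicate `base` requires but `other` does not."""
    return [role for role in PLACE_ROLES[base] if role not in PLACE_ROLES[other]]


if __name__ == "__main__":
    print(roles_missing_from("klama", "cliva"))  # ['destination']
    print(roles_missing_from("klama", "litru"))  # ['destination', 'origin']
```

Seen this way, choosing klama over litru is a claim that an origin and a destination exist, even if the sentence leaves them unstated.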


14. How do borrowed words (fu'ivla) enter Lojban without compromising its stability and unambiguity?

There are four stages of borrowing in Lojban, as words become more and more modified (but shorter and easier to use). Stage 1 is the use of a foreign name quoted as it stands with the cmavo la'o. For example, me la'o ly. spaghetti .ly. is a predicate with the place structure "x1 is a quantity of spaghetti." The foreign term is entirely unchanged. However, it requires five extra syllables and two pauses, and its pronunciation is unspecified.

Stage 2 involves changing the foreign name to a Lojbanized name: me la spagetis. This saves three syllables and one pause over Stage 1, and has a definite Lojban pronunciation. However, it is still awkward to use repeatedly.

Even so, one of these expedients is often quite sufficient when a word is needed quickly in conversation. This can make it easier to get by without a full command of the Lojban vocabulary -- especially this early in the history of the language, when a wide range of vocabulary is yet to be devised.

Where a little more universality is desired, the word to be borrowed must be Lojbanized into one of several permitted forms. A rafsi is then attached to the beginning of the Lojbanized form, using an r or l to ensure that the resulting word doesn't fall apart. The result is a Stage 3 fu'ivla such as cidjrspageti, a true brivla (predicate word) rather than a phrase.

The rafsi used in a Stage 3 fu'ivla categorizes or limits its meaning; otherwise a word having several different jargon meanings in other languages would require the word-inventor to choose which single meaning should be assigned to the fu'ivla. (fu'ivla, like other brivla, are not permitted to have more than one definition.) Stage 3 borrowings are at present the most common kind of fu'ivla.

Finally, Stage 4 fu'ivla do not have any rafsi classifier, and are used where a fu'ivla has become so common or so important that it must be made as short as possible. The Stage 4 fu'ivla for 'spaghetti' is spageti; however, most Stage 4 words require a much greater distortion of the original form of the word. Stage 4 fu'ivla have to pass several careful morphological tests to eliminate confusion with existing words and phrases, and cannot easily be devised during conversation.


15. The Lojban phonological system is hard to use for English-speakers (to say nothing of Japanese-speakers), due to the large numbers of consonant clusters. How can a language be culturally neutral when it is difficult to pronounce?

Lojban phonology is carefully restricted. There are only 4 falling and 10 rising diphthongs, and the rising diphthongs are used only in names and in paralinguistic grunts representing emotions. All 25 possible vowel combinations are used, but they are separated by a voiceless vocalic glide written with an apostrophe, thus preventing diphthongization. English-speakers think of this glide as /h/, but even speakers of languages like French, which has no /h/, can manage this sound intervocalically.
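
The counts in this paragraph can be checked by enumeration. The following sketch assumes the five vowels a, e, i, o, u and the usual lists of diphthongs (falling ai, ei, oi, au; rising i or u followed by any vowel); those specific lists are not given in the paragraph above and are supplied here as background, so treat them as assumptions of the example.

```python
VOWELS = "aeiou"

# Every ordered pair of vowels, separated by the apostrophe glide.
SEPARATED_PAIRS = [f"{first}'{second}" for first in VOWELS for second in VOWELS]

# Assumed diphthong inventories (not listed in the text above).
FALLING = ["ai", "ei", "oi", "au"]
RISING = [glide + vowel for glide in "iu" for vowel in VOWELS]

if __name__ == "__main__":
    print(len(SEPARATED_PAIRS), "separated pairs, e.g.", SEPARATED_PAIRS[:5])
    print(len(FALLING), "falling:", FALLING)
    print(len(RISING), "rising:", RISING)
```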

Consonant clusters are carefully controlled as well. Only 48 selected clusters are permitted initially; some of these, such as ml and mr, do not appear in English, but are still possible to English-speakers with a bit of practice. Medial consonant clusters are also restricted, to prevent mixed voiced-unvoiced clusters, and other hard-to-handle combinations. The Lojban sound /y/, IPA [@], is used to separate 'bad' medial clusters wherever the morphology rules would otherwise produce them. (ASCII IPA transliteration uses Evan Kirshenbaum's scheme.)

The difficulties with the variety of permitted initial sounds are not as great as one might think. Initial consonant clusters occur only in content words (predicates) and names. These words are seldom spoken in isolation; rather, they are expressed in a speech stream with a rhythmic stress pattern, typically preceded by words that end with a vowel. The unambiguous morphology allows the words to be broken apart even if run together at a very high speech rate. Meanwhile, though, the final vowel of the preceding word serves to buffer the cluster, allowing it to be pronounced as a much easier medial cluster. Thus le mlatu ('the cat'), while officially pronounced /le,mla,tu/, can be pronounced as the easier /lem,la,tu/ with no confusion to the listener.

In addition, the buffering sound, IPA [I] (the i of English bit), is explicitly reserved for insertion at any point into a Lojban word where the speaker requires it for ease of pronunciation. The word mlatu may be pronounced /mIlatu/ by those who cannot manage ml, and nothing else need be changed. This sound is 'stripped' by the listener before any further linguistic processing is done.
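
The 'stripping' step is simple enough to sketch. This example assumes a phonemic transcription in which the capital letter I stands only for the buffering sound, as in /mIlatu/ above; the function name is hypothetical.

```python
BUFFER_SOUND = "I"  # assumed transcription symbol for the buffer vowel


def strip_buffer(transcription: str) -> str:
    """Remove every buffer vowel before any further processing of the word."""
    return transcription.replace(BUFFER_SOUND, "")


if __name__ == "__main__":
    print(strip_buffer("mIlatu"))  # mlatu
    print(strip_buffer("mlatu"))   # mlatu
```

Because the buffer sound is reserved and belongs to no Lojban word, removing it cannot merge two different words into one.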


16. Lojban words resemble their English cognates, but unsystematically so. Does this really aid learning, or does it make learning more difficult?

Lojban words only faintly resemble their English cognates. Most Lojban words are fairly equal mixtures of English and Chinese, with lesser influences from Spanish, Hindi, Russian, and Arabic.

There is no proven claim that the Lojban word-making algorithm has any meaningful correlation with learnability of the words. Informal 'engineering tests' were conducted early in the Loglan Project, leading to the selection of the current algorithm, but these tests have never been documented or subjected to review. The Logical Language Group has proposed formal tests of the algorithm, and has instrumented its vocabulary teaching software to allow data to be gathered that can confirm or refute this hypothesis. Gathering this data may incidentally provide insights into the vocabulary learning process, enabling Lojban to serve as a test bed for research in second language acquisition.

In any event, the word-making algorithm used for Lojban has the clear benefit of ensuring that phonemes occur in the language in rough proportion to their occurrence in the source natural languages, and in patterns and orders similar to those in the source languages. (Thus the first syllable of Lojban gismu most frequently ends in /n/, reflecting the high frequency of syllable-ending /n/ in Chinese.) The result is a language that is much more pleasant-sounding than, for example, randomly chosen phoneme strings, while having at least some claim to being free of the European cultural bias found in the roots of most other constructed languages.


17. What is the process by which the gismu were devised? What are some examples of the process and its results?

An appropriate term was chosen from each of the six languages used in the process: Chinese, English, Hindi, Spanish, Russian, Arabic. (Natural language forms below are given in this order, labeled by single letters.) In some cases, two different terms from one or another language were used, to see which would have the higher score. The terms were then converted to Lojban phonology, with the affricates reduced to their corresponding fricatives, to avoid a false match between the stop segment of a source-language affricate and a Lojban stop. Morphological affixes were removed, also to avoid false matches.

To score a candidate gismu, it was compared with each of the six source-language words to produce six raw scores. The raw score for a particular source word is roughly the number of letters it has in common (in the same order) with the candidate gismu. (If there are two or fewer letters in common, special rules apply.) The raw score was then divided by the length of the source-language word, and multiplied by a weight reflecting the relative number of speakers of the source language at the time the candidate gismu was devised. The sum of the six adjusted scores is the final score for that candidate gismu, typically expressed as a percentage.
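
The count of letters in common "in the same order" behaves like the length of a longest common subsequence. The following is a minimal Python sketch under that assumption, not the procedure actually used by the designers. The function name raw_score is hypothetical, and the special rules for matches of two or fewer letters are not given here and are deliberately not modeled. The demonstration words are taken from the mamta, ciblu, and cukta examples in this answer.

```python
def raw_score(candidate: str, source: str) -> int:
    """Return the length of the longest common subsequence of two words.

    Assumption: this stands in for 'letters in common, in the same order'.
    The special rules for matches of two or fewer letters are NOT modeled.
    """
    # prev[j] is the LCS length of the candidate letters seen so far
    # and the first j letters of the source word.
    prev = [0] * (len(source) + 1)
    for c in candidate:
        curr = [0]
        for j, s in enumerate(source, start=1):
            if c == s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Candidate gismu compared with Lojbanized source words from the text.
for candidate, source in [("mamta", "mata"), ("ciblu", "ciue"), ("cukta", "pustak")]:
    print(candidate, source, raw_score(candidate, source))
```

For these three pairs the sketch agrees with the raw scores quoted in the examples (4, 3, and 3). It would not necessarily agree where the unmodeled special rules come into play.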

For example, the candidate mamta for 'mother' was built from the following six source words, in Lojbanized form: C ma, E mam, H mata, S mam, R mat, A am. (The English forms are based on American English pronunciation and lexis.) The raw scores are 2, 3, 4, 3, 3, and 2 respectively, leading to a final score of 100%. Given the metrics of Lojban design, this word cannot be improved on!

At the other end of the spectrum is ciblu, the Lojban gismu for 'blood'. The source words here were C ciue, E blad, H rakt, S sangr, R krof, A dam. The raw scores are 3, 2, 0, 0, 0, 0 respectively; the adjusted scores are 0.27, 0.105, 0, 0, 0, 0; and the final score is 37.5%. This is quite low, since it reflects only the Chinese and English sources, but was still superior to all other possibilities devised.
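
The arithmetic behind these two examples can be sketched in Python. This is an illustration under stated assumptions rather than the original scoring program. The weights for Chinese (0.36) and English (0.21) are inferred from the adjusted scores quoted for ciblu (0.27 = 3/4 x 0.36 and 0.105 = 2/4 x 0.21). The other four weights are assumptions chosen so that all six sum to 1, which the 100% score for mamta requires. The names WEIGHTS and final_score are hypothetical, and the raw scores are taken from the text as given.

```python
# Language weights. C and E are inferred from the ciblu figures in the text;
# H, S, R, and A are ASSUMED values chosen so that the six weights sum to 1.
WEIGHTS = {"C": 0.36, "E": 0.21, "H": 0.16, "S": 0.11, "R": 0.09, "A": 0.07}


def final_score(sources: dict) -> float:
    """Sum of (raw score / source word length) * language weight.

    `sources` maps a language letter to (Lojbanized source word, raw score).
    """
    total = 0.0
    for lang, (word, raw) in sources.items():
        total += raw / len(word) * WEIGHTS[lang]
    return total


# Source words and raw scores exactly as quoted in the text.
mamta = {"C": ("ma", 2), "E": ("mam", 3), "H": ("mata", 4),
         "S": ("mam", 3), "R": ("mat", 3), "A": ("am", 2)}
ciblu = {"C": ("ciue", 3), "E": ("blad", 2), "H": ("rakt", 0),
         "S": ("sangr", 0), "R": ("krof", 0), "A": ("dam", 0)}

print(f"mamta: {final_score(mamta):.1%}")
print(f"ciblu: {final_score(ciblu):.1%}")
```

Because every source word for mamta is matched in full, each ratio is 1 and the final score is simply the sum of the weights. For ciblu, only the Chinese and English terms contribute.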

The gismu cukta, meaning 'book', works out considerably better. Its sources are C cu, E buk, H pustak, S libr, R knik, A kitab, with raw scores of 2, 2, 3, 0, 0, 3 and a final score of 57.2%. Several different language families blended nicely to form this word.

Other gismu show similar patterns. prenu, for example, meaning 'person', is a mixture of Chinese ren and the 'person' root used in English, Spanish, and Russian. vanju, for 'wine' (English wine is Lojbanized as vain), matches well with every source language except Arabic, and even with French vin (which would Lojbanize as van), although French is not a source language at all.

Finally, there is jmive, the gismu for '(a)live', which owes its form to a transcription error: the English source word was not Lojbanized to laiv but left as live. That this form has been retained in use shows that the etymology of Lojban gismu is now of secondary importance: retaining a stable vocabulary has become more important for the success of the language than fine-tuning its recognizability.


18. What is the standard word order of Lojban?

Lojban is only secondarily a 'word order' language at all. Primarily, it is a particle language. Using a standard word order allows many of the particles to be 'elided' (dropped) in common cases. However, even the standard unmarked word order is by no means fixed; the principal requirement is that at least one argument precede the predicate, but it is perfectly all right for all of the arguments to do so, leading to an SOV word order rather than the currently canonical SVO (subject-verb-object): the two orders are equally unmarked syntactically. VSO order is expressible using only one extra particle. In two-argument predicates, OSV, OVS, and VOS are also possible with only one particle, and various even more scrambled orders (when more than two-place predicates are involved) can also be achieved.


19. Lojban claims to be culturally neutral. But many of its conceptual distinctions, for example the color set, are clearly biased towards particular languages. There is a word for 'brown', which is a color not used in Chinese (although a word exists, it is rare); on the other hand, there is only one word for 'blue', although Russian-speakers convey the range of English 'blue' with two words. How can Lojban be prevented from splintering into dialects which differ in such points?

To some extent, such splitting is inevitable and already exists in natural languages. Some English-speakers may use the color term 'aqua' in their idiolect, whereas others lump that color with 'blue', and still others with 'green'. Understanding is still possible, perhaps with some effort. The Lojban community will have to work out such problems for itself; there are sufficient clarifying mechanisms to resolve differences in idiolect or style between individuals. The unambiguous syntax and other constraints defined in the language prescription should make such differences much more easily resolvable than, say, the differences between two dialects of English.

The prescriptive phase of Lojban is not intended to solve all problems (especially all semantic problems) but merely to provide enough structure to get a linguistic community started. After that, the language will be allowed to evolve naturally, and will probably creolize a bit in some cultures. Observing the creolization of such a highly prescribed constructed language will undoubtedly reveal much about the nature of the processes involved.


20. Lojban is supposed to be intended as a test of the Sapir-Whorf hypothesis in its negative form: "structural features of language make a difference in our awareness of the relations between ideas." Is this simply another way of saying "Distinctions are more likely to be noticed if structurally marked"? If so, this is trivially true.

A better paraphrase might be "Unmarked features are more likely to be used, and therefore will tend to constitute the backgrounded features of the language". By making the unmarked features those which are most unlike natural-language features, a new set of thought habits will be created (if Sapir-Whorf is true) which will be measurably different from that of non-Lojban-speakers. If Sapir-Whorf is false, which is the null hypothesis for Lojban purposes, no such distinctions in thought habits will be detectable.

Further elaboration of Loglan/Lojban Project thinking about Sapir-Whorf has led to an alternate formulation: "The constraints imposed by structural features of language impose corresponding constraints on thought patterns." In attempting to achieve cultural neutrality, Lojban has been designed to minimize many structural constraints found in natural languages (such as word order, and the structural distinctions between noun, verb, and adjective). If Sapir-Whorf is true, this should result in measurable broadening in thought patterns (which may be manifested as increased creativity or ability to see relationships between superficially unrelated concepts). Again, the null hypothesis is that no measurable distinction will exist.


21. How can 'ease of thought' be measured? Measuring facility with predicate logic may not be enough to establish 'ease of thought'.

Perhaps not. However, the Sapir-Whorf hypothesis may be confirmed if experiments show that Lojban-speakers have a greater facility with predicate logic than non-Lojban-speakers. That would indicate that (natural) language limits thought in ways that Lojban-speakers can bypass. This form of test is admittedly not free of its own difficulties, which have been discussed elsewhere.


22. What scientific and linguistic interest can Lojban possibly have, since it is entirely a product of conscious design? Its study will only reveal what its designers already put in the language -- which surely tells us little about language as it actually occurs among humans.

Any language is a highly complex system -- even an artificial language, as long as it is non-trivial. (This certainly holds true for Lojban!) In such a system, the interaction of the design features displays properties that are more than the sum of the parts. For example, it is possible that all language is merely a system composed of neurons releasing neurotransmitters. Biochemistry may eventually devise a complete explanation for the neuronal process (including its genetic components), and we may then say we "know the design principles of the system." But we won't know the system, because the complexity of those neuronal interactions is so great that knowing the pieces does not give a total understanding of the system. This indeed may be what defines the concept of 'system'.

Knowing all the prescribed rules of an artificial language does not tell you how that artificial language is used communicatively. Consider this question: Given multiple ways of communicating the same idea, do users of the language choose particular forms over others, and why? This is similar to a question commonly asked about natural languages.

However, a simpler system, which can be understood more fully, may serve as an excellent model for a less well understood, more complex system. Thus the simpler system could be examined for parallels to hypotheses about the more complex system. Examination of the simpler system may suggest properties to look for in the more complex system, or it may even suggest hypotheses that can be tested in the more complex system. Constructing simplified models is after all how much of science -- including a good deal of linguistics -- is conducted.

A hot topic in parts of the Lojban community is whether the language has, or should have, an underlying semantic theory. If one exists, it is certainly not as developed or prescribed as the syntactic design and theory of the language. Eliminating syntactic ambiguity, however, does allow a more direct examination of semantic ambiguities, including the properties of modification and restriction, resolution of anaphora, and identification of ellipses. Any semantic theories proposed for natural language can be looked at in terms of semantic usage in the simpler Lojban system.

If Lojban is taken as a model of a natural language, it seems likely that any theory that is not true of Lojban is at the very least doubtful with regard to natural language, which would allow partial verification of such theories. If the theory is demonstrably true of natural language, then you have found evidence that Lojban is in some way unnatural. You would then need to explain which of the (fully-known) design features of Lojban causes this unnaturalness. Given the counterexample of Lojban, that design feature is not a feature of natural languages; studying an artificial language would thus demonstrate something about natural language.

As another example, pragmatic effects can be more easily recognized in the simpler Lojban system, and can be clearly identified as pragmatic. Thus, insights about pragmatic effects may be more visible in Lojban -- and those insights would then be tested in natural languages.


23. Why is Lojban a useful testbed for experimental linguistics? What are some plausible experiments that could be done using Lojban?

As just discussed, a simple system is easier to perform experiments on than a more complex system. There are fewer variables, and if the system is 'designed', some things that are uncontrolled variables in complex systems are in effect tuneable constants in the simple, carefully-designed system. You can then rerun the experiment with minor changes to explore the effects of those variables.

Experimental linguistics of this kind is virtually unthinkable using natural languages. The Sapir-Whorf hypothesis is not really testable, since we can't control any pertinent variables in natural language, and we don't know what features of a language might be decisive in a culture. Sapir-Whorf may be more testable when you can reduce or even control the variables with a language like Lojban.

Lojban is a predicate language, with no distinct nouns, verbs, or adjectives. What are the linguistic (communicative) properties of such a system? The answer has been partially explored through symbolic logic. But do people, when thinking linguistically, mimic in any way the processes of formal logic? What effects would a formal-logic-based language have on those linguistic thinking processes? Is the resulting language susceptible to the same analysis as natural language, in terms of the various formal systems that have been developed by linguists over the past few decades?

Computational natural language processing usually involves converting natural language to some kind of predicate form, from which deductions can be made; so the usefulness of predicate logic as a tool for such analysis is already accepted. But how does one identify the logical deductions that a human being makes from a natural language statement? By thinking in Lojban, one is already thinking using predicate logic structures, so that the deduction process is much plainer.

If Lojban is shown by experiment to have the systemic properties of a natural language, and is easier to implement in computational linguistics research problems, it serves as a tool to bridge those two disciplines, leading to more rapid and effective natural language processing. But only if it is tried. Even if it proves less than ideal, the study of natural language using computational linguistic techniques and a Lojban-based tool can be instructive in ways not accessible using any natural language.

Lojban could also be used to study language acquisition. Take even a few children during the critical period of language learning and teach them this artificial language (at the same time as they learn their traditional language). Do they become truly bilingual? If they are as fluently communicative in the artificial language as they are in their natural language, then the artificial language is a suitable model of language; it becomes as real a language as any other. In that case, any theory of language that cannot encompass the features of the artificial language is inadequate. You could perform a series of experiments with ever more exotic artificial languages (obviously you would need new speakers for each test). Sooner or later, either the model breaks, and the artificial language is no longer acquirable by children or communicative as a language; or the theory breaks, and you've learned where to look for improvements in the theory.

With only natural languages, you have to devise theories based on the available data, and then look in other natural languages for confirmation or refutation. But this isn't the optimal kind of experimentation, because you really cannot plan the experiment or control the variables. (The other language may have the same apparent feature through a totally different process that you won't recognize, because you aren't looking for it.)

Lojban has a feature that is designed to explore a less-understood aspect of language -- the direct expression of emotion. Lojban allows expressive communication of emotions in words without suprasegmental features such as intonation. In this it presumably differs from all natural languages, though not entirely, since many languages have a limited set of attitude indicators in the form of interjections. Can human beings manipulate the symbols of emotion in the same way they manipulate the comparable symbols of non-emotional expression? There is a whole range of experimental questions raised by this design element, probably the most 'unnatural' element of Lojban's design.

A language like Lojban is an ideal test bed for experimentation, because it is flexible; you can evolve slightly different versions of the language very easily by simply changing some features. Delete a particular construct from the prescription, and do not teach it to a child. Does the child develop that construct anyway, by analogy to other known languages, or does the child successfully adapt to whatever other processes you've designed into the language instead of the omitted construct? Investigating questions like these through Lojban can help us significantly advance our understanding of language.


24. How is Lojban useful as a tool of linguistic analysis?

Here is an example, based on the 1991 Scientific American Library book The Science of Words (George A. Miller, Scientific American Library Series, New York: W H Freeman, 1996).

Miller notes that Nootka (a Pacific Northwest language) has the single word inikwihl'minih'isit meaning the equivalent of the entire English sentence "Several small fires were burning in the house." Here is a Lojban sentence closely paralleling the English:

so'i cmalu fagri   pu ca'o     jelca     ne'i     le zdani
Many small fires   were-then   burning   within   the house

But here is the sentence as a single word (though not with the same structure as Nootka):

zdane'ikemcmafagyso'ikemprununjelca
house-inside-type-of-small-fire-many-type-of-previous-burning

In fact, according to Miller the Nootka word breaks down as:

inikw-      ihl-           'minih-   'is-         'it
fire/burn   in-the-house   plural    diminutive   past-tense

This order is also readily expressible in Lojban:

fagykemyzdanerso'icmapru
fire-type-of-house-inside-many-small-past-event

In either case, the Lojban more accurately tracks the semantics of the Nootka, demonstrating the inadequacy of the English. The Nootka word as broken down did not require two separate semantic elements for 'fire' and 'burn' as did the English version, and the English translation used the more complicated tense 'were-burning' instead of the simpler, and presumably more accurate 'burned'. It is clear that in translating the word-sentence into English, considerable vagueness is introduced.

Lojban cannot, of course, express everything in the natural form of any language whatsoever. Lojban has a less-marked syntactic word order, and expressing other orders requires marking particles that would not be found in the source language. Thus there is a tradeoff between precise semantic and syntactic representations.

Still, this example suggests that, as a predicate language, Lojban is a much more effective tool for studying both the forms and semantics of other languages than is English, which has its own cultural, syntactic and semantic complexities to complicate the analysis. This is especially true for analysis by linguists who are not native speakers of English -- if there is any place where there is a justification for an international, culturally minimalist language, it is when linguists from different native language backgrounds try to perform and communicate their linguistic analyses.


25. What sociolinguistic applications would Lojban have?

A language that begins with a highly detailed prescription and then is allowed to drift naturally is an ideal test bed for examining the processes of language change. In the case of an artificial language like Lojban, as the speaking community in each culture grows, you can observe how the language changes in contact with the speakers' native languages. Because of the speed of learning, artificial languages should tend to show effects more quickly (by being mastered to a communicative level more quickly). Anecdotal evidence about Esperanto supports this idea.

Does this mean that the conclusions are completely replicable to natural language evolutionary processes? Obviously not. But again, we are performing experiments with a model, somewhat idealized, of a natural language. Unlike a purely theoretical model (as all linguistic theories must inherently be), Lojban is a model that can be experimented with using live speakers. Provided that we understand the model as it evolves, that understanding approximates an understanding of natural language more closely as time goes on.

Furthermore, most numerically and sociologically important languages have some degree, more or less, of prescription. Indeed, some natural languages, such as modern Hebrew, formal Swahili, Mandarin Chinese, and modern standard Arabic, in certain ways resemble artificial languages, though they are regarded with more interest by linguists. A predominantly prescribed language would seem an especially effective tool for studying the effects of prescription on language development and use.

Such studies may aid in first-language education as well as second-language acquisition. They may also aid in analyzing the development of different registers (usages based on social class and situation) of a single language: such registers often arise partly as reactions to prescriptive environments that constrain language use.

None of these scientific applications of Lojban inherently requires a large fluent body of speakers, or any solely-native speaker of that tongue. If any of the less scientific applications of Lojban provide it with a speaker base, the nature of Lojban's usefulness as a model will change. New applications, not really predictable as yet, will turn up, aided by our doubtless increased understanding of language. But the model of language constituted by Lojban, even if well understood, will then no longer be as simple; and new logical languages and other experimental linguistic tools will need to be developed to take the next step.