Semantic issues in a prescriptive word composition theory.

			    Nick Nicholas.


In this essay, I consider semantic issues in the formulation of a
prescriptive theory for word compound meaning, and to what extent they
run counter to prescription.

The language in which the formulation has occurred is Lojban. Lojban is
an artificial language, the offshoot of an earlier project, Loglan
(Brown 1960). The declared aim of the Loglan project was to test the
Sapir-Whorf hypothesis on a language based on symbolic logic. Lojban
supporters are motivated by a more diffuse range of reasons, amongst
which is apparently a desire for unambiguous communication. The language
has been under development for the past five years, and is nearing
publication.

The main unambiguities claimed for the language (Cowan 1991) are
phonological-graphical, morphological, and syntactic:

"The claim for semantic unambiguity is a limited one only. Lojban
contains several constructs which are explicitly ambiguous
semantically.  The most ambiguous of these are Lojban tanru (so-called
'metaphors') and Lojban names. [...] tanru are binary combinations of
predicates, such that the second predicate is the 'head' and the first
predicate is a modifier for that head. The meaning of the tanru is the
meaning of its head, with the additional information that there is some
unspecified relationship between the head and the modifier.

tanru are the basis of compound words in Lojban. However, a compound
word has a single defined meaning whereas the meaning of a tanru is
explicitly ambiguous, Lojban tanru are not as free as English figures
of speech; they are 'analytic', meaning that the components of tanru do
not themselves assume a figurative sense. Only the connection between
them is unstated." (Cowan 1991:22)

The above can be taken as a kind of manifesto of the Lojban attitude to
compounding. The components, the morphologically primitive predicates
(gismu), are held to have an 'unambiguous' meaning by virtue of the
relationship they posit between their arguments. Thus botpi 'bottle' is
defined as:

x1 is a bottle/jar/urn/flask/closable container for x2, made of
material x3 with lid x4

and, in veridical use of the predicate, the presence of all arguments
is entailed. If one asserts that X is a bottle, it is entailed that it
is a bottle of something, with some lid. If it lacks either, it is not
a botpi, and cannot be called lo botpi (it can be called le botpi,
where le is the nonveridical determiner, corresponding somewhat to a
definite article: "that which I describe as a bottle".)

This attitude to sense clearly predates any awareness of prototype
semantics, and the insistence on positing necessary conditions for
sense may prove unworkable. If so, language users will presumably use
the words as they are and ignore the entailments. The language definers
have belatedly recognised this by introducing the particle zi'o to
'undefine' a predicate argument, but this will probably not be enough.

Notwithstanding the problems inherent to 'unambiguous' base predicates,
it is the manner in which they combine that I wish to deal with. As we saw,
head and modifier are considered unambiguous, but the manner in which
they are related is not. Thus gerku zdani, "dog house", is as ambiguous
as it is in English, and can have a variety of denotations: a house
housing dogs, a house that is also a dog, a dog-shaped house, a house
owned by dogs, etc. But once a compound word is formed based on this
pair, like gerzda, "doghouse", it is supposed to entail only one of
these possible head-modifier relations, and to have a 'unique'
meaning.

While talk of unambiguous meanings seems a chimera, the claims being
made are not impractical, when properly reinterpreted. The predicate
definitions constrain their senses; in a tanru, the different ways the
head can relate to the modifier expand the resulting sense; fixing a
single relationship constrains sense sufficiently for there to be a
single well-defined set of predicate arguments, without implying that
there is a single sense or even a single denotation to the resulting
compound.

The Lojbanic propensity for disambiguation has manifested itself in the
formulation of rules, according to which the compound predicate can be
derived from the component predicates. Though this venture was
discouraged by the chief language engineer as premature, some
preliminary guidelines were outlined by Jim Carter, and formalised by
myself (Nicholas 1993). Further language usage, and considerable debate
on the issue, have shown that, while these guidelines are useful in
deriving compound predicates and augmenting Lojban vocabulary, there
are many semantic provisos to be borne in mind while using them. I will
list the main such problems below.

It is worth mentioning that a similar attempt to formalise word
compound meaning has occurred in a better-known artificial language,
Esperanto (see Schubert 1993). To what extent has actual language
usage conformed to the prescriptive ideal, and were there any major
gaps in the prescriptive model that usage has had to work its way
around?

The prescribed rules (based on descriptive analysis of a corpus, to a
much greater extent than the analogous work in Lojban) have encouraged
productive use of word forms not used in the early language, most
notably in the elision (where the rules allow it) of the suffix -ec-,
"-ness". Thus laboremeco is already considered an archaism for
laboremo, "industriousness", and compounds like rugxo are gaining
ground over compounds like rugxeco ("redness"). Arguably the community
has picked up and creatively exploited these rules. This does not mean
that they are universally adhered to.

The first reason is calques. For example, according to the rules,
korekta means not "correct" but "corrective". Schubert claims that "for
speakers not yet too familiar with the inherent regularity of
Esperanto, the temptation is strong to use korekta as the English
adjective correct or its French, German, etc., equivalents." (Schubert
1993:329), and the Full Illustrated Dictionary of Esperanto lists this
sense of korekta as evitinda (to be avoided). Though it runs against
the prescriptive norm, the use of korekta in this sense is so
prevalent that it would be irresponsible for a descriptive account of
the language not to list it as its primary meaning. This, I think, has two
lessons for Lojbanists: firstly, languages are not closed systems
(though by virtue of its morphology and philosophy Lojban will probably
be more closed than Esperanto), and any internally generated account of
the language is still subject to disruption from outside forces.
Secondly, lexemes in speakers' native languages fulfil certain
functional needs; if an artificial language community feels that need
must be met, the word will be calqued across, no matter what the
prescriptivists say. This has been the case for korekta, and I suspect
it may often prove the case for Lojban speakers, given the
inflexibility of the rules involved.

The second reason is idiomatisation. Disregarding early calques like
respondeca ("responsible"), this is most pertinent in Esperanto for
compounds where meaning can no longer be compositionally accounted for.
In some cases, the compound is a hyponym of the meaning obtained
componentially.  Thus a lernejo is not just a "place of learning", but
specifically a school; "this meaning has been defined by convention.
The full compositional meaning can be actualised when a speaker so
wishes, but special means are needed for this." (Schubert 1993:354)
(The reading obtained, I would add, would be highly marked.) Or, the
compound may have a figurative sense, as in the old calque disvolvigxi
"to unwind" for "to evolve". (Lojban, being used by a literal-minded,
logicist community, has an abhorrence for such compounds.) These
semantic shifts may also be amenable to a functionalist analysis, where
the de facto readings are somehow more useful, as taxonomically more
basic, than the prescribed readings of the compound. A pragmatics of
word compounding in lexically productive languages --- what pragmatic
circumstances motivate the coining of new lexemes as opposed to the
continuing use of periphrases --- would yield some interesting results
here. In any case, we can see that, again, if speakers feel they need
an expression for a certain concept, or if connotations and the
interaction of semantic fields are potent enough, then the meaning of a
compound will shift from its original, componential sense; the metaphor
underlying the compound will die. This process, some believe (Schubert
quotes Ferdinand de Saussure's discussion of artificial languages in
this light), is part of creolisation, and is inevitable when a language
gains a wide enough speaker community.

Semantic problems

(see Appendix for the prescriptive rules discussed.)

1. If the modifier describes an argument of the head, would a place for
this argument in the compound predicate be redundant? For example,
given that one says la monrePOS. zdani la spot. "Mon Repos houses
Spot", does it make sense for gerzda "doghouse" to still have a
thematic role for the entity housed (la monrePOS. gerzda la spot. "Mon
Repos dog-houses Spot")? There are intuitive reasons to say no: he is
the (*van) driver of a van would indicate a straightforward equivalence
between modifier and complement. Is this the case for "doghouse"
though? The 'transformation' of the English compound to noun with
complement is "house for dogs", where "dogs" is clearly a hypernym of
"Spot". This difference in denotation indicates that "Mon Repos
dog-houses Spot" is not conveying redundant information, in that the
identity of the dog is unspecified by the compound.

But if the proper meaning of "doghouse" is "house for dogs", shouldn't
the denotation of the "dog" thematic role be that of a mass noun,
rather than a proper name? And shouldn't the usage of gerzda with a
specific dog be blocked (Aronoff 1976:43) by the use of zdani? (The
former gives the identical relational information, merely imposing the
selectional restriction that the entity housed be a dog.) In other
words, is there any motivation for the predicate gerzda to mean "X is
the doghouse of dog Y", when Y seems to be properly a mass noun, and
when zdani is doing the same job? Opinion in the Lojban community is
divided. The conclusion I arrived at in formulating the guidelines was
that most of the time there was no such motivation, but what takes the
place of Y, if anything, is unclear. If Y is also dogs, then it is
redundant (*X is a doghouse for dogs); if it is a hyponym for dog,
while still a mass noun, it might not be (X is a doghouse for St
Bernards), though it can still be blocked by the analogous X houses St
Bernards.

2. The resulting compound argument order is counterintuitive. Typically
the first argument in Lojban corresponds to the subject, and the second
to the direct object. This doesn't work out when the compound is an
action verb (agentive causative) and the base predicate is an
instrumental state verb (using Fillmore's 1968 Case Grammar
terminology: see Cook 1989:15). Thus vreji "X is a record of event Y in
medium Z" has the causative veirgau "agent W makes X a record of event
Y in medium Z", rather than the expected "agent W records Y in medium Z
as X" ('X' is still the object, in case grammar terms).  Similarly, the
agentive of galfi "process X modifies Y into Z" is gafygau "agent W
uses process X to modify Y into Z" rather than "W modifies Y into Z
using process X". With an increasing number of base predicates being
defined as state rather than action verbs (lacking an agent thematic
role), this discrepancy is likely to cause confusion in usage, compared
to non-instrumentals (cf. fengu "X is angry about Y" and fegygau "W
makes X angry about Y", or glare "X is hot by standard Y" and glagau "W
makes X hot (heats X) by standard Y".)
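
The ordering discrepancy can be made concrete with a minimal sketch. It
assumes only what the examples above state: the guideline order puts the
agent first and then keeps the base predicate's arguments in their
original order, whereas the order an English speaker might expect moves
the base predicate's first argument to the end. The function names and
the role labels are hypothetical, chosen for illustration.

```python
def causative_places(base_places, agent="W"):
    """Order produced by the guidelines: agent, then the base places unchanged."""
    return [agent] + list(base_places)


def expected_places(base_places, agent="W"):
    """Order an English speaker might expect: the base x1 is moved to the end."""
    first, *rest = base_places
    return [agent] + rest + [first]


# vreji "X is a record of event Y in medium Z" (labels are illustrative only)
vreji = ["X (record)", "Y (event)", "Z (medium)"]

print(causative_places(vreji))  # ['W', 'X (record)', 'Y (event)', 'Z (medium)']
print(expected_places(vreji))   # ['W', 'Y (event)', 'Z (medium)', 'X (record)']
```

For non-instrumental bases such as fengu or glare the guideline order
already matches the expected reading, which is why the discrepancy only
shows up with predicates like vreji and galfi.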

3. The decision on which arguments should be eliminated as "irrelevant
to the definition" is arbitrary and non-compositional, and has drawn
criticism as an indirect means of calquing: by positing such
"irrelevancies", users are said to be matching word meaning to an
extraneous (English) model. An example of this is laurba'u for "to
bellow"; the decision to semantically restrict "loud utter" to "bellow"
is externally motivated. While this is a valid criticism, in the light
of Esperanto's experience with calques, and the impoverishment of
Lojban's native resources, I think such coinages are inevitable.
Unfortunately, they will foreseeably also cause much confusion. For
example, in considering the arguments for posydji "to want something"
(ponse djica "own want"), I have discarded as irrelevant the argument
of ponse denoting law of ownership, while John Cowan has discarded the
argument of djica denoting motive for desire. Leaving all such
arguments in is unworkable, since the compound's arguments would
proliferate unmanageably; but setting rules for which arguments to leave
out seems to me implausible.

4. The decision on which arguments to eliminate as redundant is also
somewhat arbitrary: it is a matter of deciding which thematic roles
should be coindexed. For posydji, the subject of wanting needn't be
coindexed with the subject of owning. It would be possible for the
predicate to have the form "X wants for Y to own Z", rather than "X
wants to own Z". A case could be made, however, based on some notion of
iconicity, that a shorter expression like the compound should express a
simpler or more frequent concept (by Zipfian metrics). The sentential
paraphrase can then be reserved for the more general concept, with more
frame roles. An analogy could be made with the English constructs She
wants to own it (for She wants herself to own it) versus She wants him
to own it. This would mean that as much coindexing as possible would be
encouraged between component predicate arguments.
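
The two candidate readings of posydji can be laid out as a small sketch.
The role labels below are simplified and hypothetical, not the official
place structures, and the function is only one way of modelling the
choice: the modifier ponse is treated as the predicate of the wanted
event, so its roles stand in for that event in the head's list. The law
of ownership is dropped as irrelevant, following my own choice reported
under point 3.

```python
# Simplified, hypothetical role labels for the two component predicates.
DJICA = ["wanter", "wanted_event", "motive"]  # djica "want"
PONSE = ["owner", "owned", "law"]             # ponse "own"


def posydji_places(coindex_owner_with_wanter, irrelevant=()):
    """Roles of the compound under one coindexing choice (a sketch only)."""
    ponse_roles = list(PONSE)
    if coindex_owner_with_wanter:
        # "X wants to own Z": owner and wanter share a denotatum,
        # so the separate owner role is redundant.
        ponse_roles.remove("owner")
    places = []
    for role in DJICA:
        if role == "wanted_event":
            # The ponse roles replace the sentential complement of djica.
            places.extend(ponse_roles)
        else:
            places.append(role)
    return [role for role in places if role not in irrelevant]


print(posydji_places(True, irrelevant={"law"}))
# ['wanter', 'owned', 'motive']
print(posydji_places(False, irrelevant={"law"}))
# ['wanter', 'owner', 'owned', 'motive']
```

Maximal coindexing yields the shorter list, which is the outcome the
iconicity argument above would favour for the compound.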

5. There is a concern (pointed out by Mark Shoulson) that language
users are unnecessarily coining compounds to match the semantic map of
English, when a language as lexically restricted as Lojban (about 1350
morphologically primitive stems) should be content with a less finely
divided semantic taxonomy of the world. For example, Shoulson has
criticised my coining of djabeipre "waiter" (cidja bevri prenu "food
carry person"), arguing that in most contexts the stem bevri "carrier"
is sufficient. While this is a perceptive observation, native semantic
maps tend to be firmly entrenched in speakers' minds. Esperanto
borrowed its own maps from German and Russian (Waringhien 1959), and
consolidated them with a large corpus of writing.  Lojban can avail
itself of neither kind of resource. It remains to be seen to what
extent Lojban can find, let alone enforce, a native semantic map,
founded on its morphological primitives.

6. Because of the analytic nature of some of Lojban's predicates, there
is frequently free choice as to which compound component is the head,
and which the modifier. For example, in they killed each other,
"killed" would be taken as the head, and "each other" as a complement
or modifier, whether it appeared as a noun phrase, a prefix, or an
adverb. But Lojban predicate simxu "mutual" is so defined ("set of
participants X are reciprocally involved in activity Y"), that it could
be taken as the head. simcatra "kill each other" can be transformed to
the phrase simxu lenu catra "are mutual in that they kill". This
atypical free choice extends to several modifiers: milxe "mildly",
carmi "intense", cmalu "small", mabla "derogative", and so on.

In fact, it has become something of a shibboleth to note which element
a speaker uses as the head in compounds. Most language users make the
natural language modifier the modifier in their compounds, even if they
are aware that a Lojban 'deep structure' analysis has it as the head. A
minority though (most prominently Jorge Llambias and Jim Carter) do
not. Thus the former has used kakymli "clear one's throat" instead of
the expected mlikafke, from kafke milxe "cough mild, mild in coughing"
rather than milxe kafke "mildly cough". The question here is what
properly constitutes a head. Should the judgement be only syntactic/
morphological (in which case kafke can be both head and modifier
according to individual preference), or should it be semantically
motivated? (Which raises the problem of which semantic metalanguage one
should use for the judgement. Lojban metalanguage needn't have any
validity outside Lojban, and any metalinguistic decision seems entirely
arbitrary.)

7. Lojban compounds are not always as compositional as the outline
below would have it. Indeed, the merit of the compositional scheme
given is that it works as often as it does, not that it can predict the
behaviour of all conceivable compounds. The approach typically taken
towards compounds like gusfu'i "photocopy" (gusni fukpi "light copy")
is that they elide other morphemes in their derivation. Were these
morphemes present, they would give a compositional derivation --- one,
namely, in which the modifier would be an argument of the head, or in
which the modifier and head both describe the same denotatum. It is
worth asking how constructive such an account would be, when the
current compound, while not detailing how light relates to copying, is
sufficiently evocative of its sense. (Though the Chinese calque fragu'i
"laser" from frati gusni "reaction light" may show this evocation is a
matter of degree.)

The current system also has no account of compounds where the modifier
is a modal argument of the head predicate, such as jboselsku "Lojban
writings" (lojbo selsku "Lojban expression"). Again, a strict
compositionalist approach would posit some elided predicate which would
make the modifier a proper argument of the head (like seke lojbo pilno
cusku "PASSIVE ((Lojban use) express)"); again, the descriptive
adequacy of such an account is questionable.  Since modal arguments
don't present a problem for compositional accounts not as strict as
Lojban's (like Esperanto's), the system clearly needs some broadening
to treat such cases explicitly, instead of purporting to reduce away
all possible ambiguities.

The 'elision' account is also invoked to treat cases where heads or
modifiers are used unmodified, where the rules would predict
passivisation phenomena (where the first argument of the predicate
involved is exchanged with another argument). Two examples that have
drawn much comment are le'avla "loan word" (lebna valsi "borrow word")
and xekskapi "dark-skinned" (xekri skapi "black skin"). It has been
argued that the former should be selyle'avla, from selylebna valsi
"borrowed word", since loan words are both borrowed and words, but not
both borrowers and words. I believe that lebna valsi is as evocative a
compound as gusni fukpi, and there is no overriding need to saddle the
language with longer than necessary compounds, simply to satisfy
demands of an overstrict compositionality. I have accepted the
criticism made to me, though, that xekskapi is 'incorrect', in that the
head is not describing the denotatum, as is expected in Lojban. Someone
dark-skinned is not skin, but someone with skin. This has led me to
recommend against xekskapi in favour of xekselskapi ("black skinned")
in my guidelines, and to frown on unmodified heads more than unmodified
modifiers. It remains to be seen whether the prescriptive machinery of
the language will enforce this restriction; from my experience,
prescriptivism seems to be a potent force in artificial language
communities, whose hold on any model of their language is always
tenuous.

Appendix: The rules for Lojban compounds (lujvo)

The guidelines I have formulated in Nicholas (1993) can be summarised
as follows:

The argument set for a compound is a subset of the union of arguments
of its component predicates.

Arguments can be eliminated from this set for conveying redundant
information (having the same denotatum as some other argument), or
irrelevant information (which is taken as contrary to the definition of
the compound's new concept.)

As an example of the former, the predicate for gerzda does not have
both a place for the entity housed (x2 of zdani "house") and a place
(thematic role) for the dog (x1 of gerku "dog"), since they are
presumed to have the same denotatum. As an example of the latter,
laurba'u "to bellow" (from cladu bacru "loud utter") does not have a
place for the location at which the bellower is loud (x2 of cladu),
since a bellower in New York is still "loud-uttering", even if she is
quiet relative to an observation point in Melbourne.

There are two interpretations of the relation between head and
modifier.  Either head and modifier are predicates both describing the
denotatum (eg.  balsoi "warrior" from banli sonci "great soldier"), or
the modifier describes an argument of the head predicate (eg. gerzda,
where gerku "dog" is an argument of zdani "house"). As a special case
of the latter, the modifier may be the predicate of a sentential
complement of the head (eg. ctigau "feed" from citka gasnu "eat act").

The arguments of any component predicate should appear in the same
order in the compound predicate, though they may be interleaved with
the arguments of other component predicates.
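
The guidelines above can be sketched as a short procedure. This is an
illustration under stated assumptions, not the formulation in Nicholas
(1993): the function names are hypothetical, the place labels are
simplified (the "breed" place of gerku is not discussed in this essay),
and listing the head's places before the modifier's is a choice made
here for simplicity, since the guidelines also allow interleaving. Which
places count as redundant or irrelevant is supplied by the user; as the
essay argues, that judgement is not itself rule-governed.

```python
def compound_places(head, modifier, coindexed=(), irrelevant=()):
    """Sketch of a compound's place list under the guidelines.

    head, modifier: place labels of the two component predicates.
    coindexed: (kept, redundant) pairs judged to share a denotatum.
    irrelevant: places judged irrelevant to the new concept.
    """
    dropped = set(irrelevant) | {redundant for _kept, redundant in coindexed}
    # Head places first, then modifier places (an assumption of this sketch).
    return [place for place in list(head) + list(modifier) if place not in dropped]


def preserves_order(component, compound):
    """Check that a component's surviving places keep their relative order."""
    positions = [compound.index(place) for place in component if place in compound]
    return positions == sorted(positions)


# Simplified, hypothetical labels: z = zdani "house", g = gerku "dog".
zdani = ["z1 house", "z2 occupant"]
gerku = ["g1 dog", "g2 breed"]

# The dog (g1) is presumed to have the same denotatum as the occupant (z2).
gerzda = compound_places(zdani, gerku, coindexed=[("z2 occupant", "g1 dog")])
print(gerzda)  # ['z1 house', 'z2 occupant', 'g2 breed']
```

The result is a subset of the union of the component places, and
preserves_order confirms that each component's surviving arguments keep
their original relative order.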

Bibliography

Aronoff, M. 1976. Word Formation in Generative Grammar. Linguistic
Inquiry Monograph 1. Cambridge (Mass.): MIT Press.

Brown, J. C. 1960. Loglan. Scientific American 202(6). 53-63.

Cook, W. A. 1989. Case Grammar Theory. Washington (DC): Georgetown
University Press.

Cowan, J. 1991. Response to Arnold Zwicky's review of 'Loglan 1':
Loglan and Lojban: A Linguist's Questions and an Amateur's Answers.
ju'i lobypli 14.  21-29.

Dasgupta, P. 1993. Idiomaticity and Esperanto texts: an empirical
study.  Linguistics 31. 367-386.

Fanselow, G. 1988. Word Structure and Argument Inheritance: How Much is
Semantics? In The Contribution of Word-Structure-Theories to the Study
of Word Formation. Linguistische Studien Reihe A Arbeitsberichte 179.
Berlin:  Akademie der Wissenschaften der DDR. 31-52.

Nicholas, N. 1993. Doing the belenu blues: Lujvo place structure paper,
Version 2.  Lojban FTP Server (casper.cs.yale.edu).

Schubert, K. 1993. Semantic compositionality: Esperanto word formation
for language technology. Linguistics 31. 311-365.

Waringhien, G. 1959. Lingvo kaj Vivo: Esperantologiaj Eseoj. La Laguna