Thursday, April 19, 2012

Robin Lakoff's introduction (2004) to Language and Woman's Place (1975)

In 2004, a new annotated edition of Robin Lakoff's Language and Woman's Place came out. In addition to the original text, this new edition includes a number of essays on the topics raised in the book, as well as a new introduction by Lakoff herself.

The Frustration of a Chomskyan

The introduction is interesting for several reasons. One of them is that it contains a personal account of the ideas that were splitting transformational grammar apart in the era of Aspects:
By the late 1960s it had become clear to several of us that Chomsky's linguistic revolution wasn't the revolution in which we had enlisted. Chomsky had promised us a theory that would make language a "window into the mind." But within standard transformational theory that possibility could be realized only to a very limited degree, if at all. While investigators could use their minds as interpretive instruments—to judge the grammaticality or semantic similarity of sentences—they were not permitted to investigate meaning, much less a speaker's intention in uttering a sentence in a particular form, or the effect of that utterance on the addressee.
Consequently, Lakoff and others got the idea of starting from meaning representations of a certain form and then deriving syntactic form out of that. To the extent that it ever was a single or coherent theory, this new framework is what we now call "generative semantics":
We devised rules and representations to relate externally accessible linguistic forms to mental states—for example, desires, assumptions, and personal identities—while retaining the Chomskyan belief in the primacy of the syntactic component of the grammar. Deep structure got deeper, wider, and more complex.
"Deep structure got deeper"—what a wonderful summary. Remember also George Lakoff's comment that he and his gang just wanted to be "good little comskyans," quoted in The Linguistics Wars. How wrong they both were—about what they were doing, and how much linguistics would change.

Grammar Unlimited?

Interestingly, the orthodox Chomskyan argument against this augmented view of grammar was, in Lakoff's rendering:
If you followed generative semantics to its logical conclusion, everything speakers know about the world would have to be included within the transformational component, which therefore would become infinite. [...]
Not necessarily, said the generative semanticists. The linguistic grammar need only include those aspects of the extralinguistic world that have direct bearing on grammatical form: just a small subset of everything. [...] But we still had to answer, at least to our own satisfaction, the question that these claims raised: What parts of our psychological and social reality did require linguistic encoding, in at least some languages?
Lakoff's point in the essay is of course that gender is one of the important variables of linguistic expression—not just "in a few 'exotic' languages (Japanese, Dyirbal, Arawak, and Koasati)," but in solid, run-of-the-mill English as well.

But at the same time, the quote points right at the big taboo of linguistics, whether or not Lakoff herself intends it to do so: Once we admit that syntax can't be isolated from meaning, the floodgates are open to seeing that there really isn't any such thing as a "language" at all; the difference between syntax and anthropology is really more about differences in interests than about anything inherent in the "object of study."

Is Linguistics Linguistics?

Lakoff is also aware that something in transformational grammar and generative semantics was holding them back from saying anything intelligent about discourse structure on a larger level:
Linguists spoke on occasion of "structure above (or beyond) the sentence level," but mostly about how it couldn't be done. When we attempted it, we thought of larger units as concatenations of sentences: S+S+S ..., rather than as structures with rules of their own, wholes different from the sum of their parts.
While I think that this has something to do with the game of showing-and-hiding that linguistics necessarily entails, Lakoff seems to attribute it more to the fact that different tools fit different situations:
While a generation ago, "structure above the sentence level" had the status of the basilisk (mythical and toxic), now it is an accepted area of linguistics [...] These analyses made it clear that discourse should be understood not as concatenations of S's, but as language directed toward particular interactive and psychological purposes.
So now we're OK, the message seems to be; we just had to realize that a different set of concepts was needed (turn-taking, politeness, power, identity, etc.).

While I agree that conversation analysis has something new and interesting to say about language use, I also think that there's something genuinely wrong about saying that it does the same thing as Chomskyan grammar, only with a different tool. It's not just a shift of attention or of measuring equipment; it's a shift of standards, mindset, ethics, and goals.

Not that there's anything wrong with either hermeneutics or mathematics—it's just that they are never going to be unified into a single methodology. There is a tension between the two pictures of what counts as data and valid argument, a tension that Lakoff's book drew attention to, and I don't think it's one that can or should be resolved.

Wednesday, April 18, 2012

Blutner: "Some Aspects of Optimality in Natural Language" (2000)

This paper by Reinhard Blutner is the one that introduced the idea of "bidirectional" optimality theory. Bidirectional optimality theory is bidirectional because it doesn't just optimize an input parse or an output expression, but does both at the same time. The whole speech situation thus defines a little game, and Blutner defines an equilibrium concept for such speaking/hearing games.

The Phenomenon in Focus

The observation that drives Blutner's idea is that non-standard forms tend to designate non-standard referents. So the straightforward formulation I killed him designates a stereotypical killing-event, while I caused him to die designates an atypical one (p. 9).

This can be explained economically if we assume that ambiguity is more "costly" than wordiness. The pairing (s, t')—i.e., unmarked sentence with marked meaning—is then suboptimal because it makes the unmarked sentence ambiguous. It also makes sense information-theoretically, because you want to reserve the short sentences for the frequently occurring referents.

Blutner's Idea

It seems that Blutner wants to arrive at this conclusion, too, but from a different angle. Imagine that we still only have two signals and two interpretations, and we put them into a table like this:


         t          t'
s     –m, –m     –m, +m
s'    +m, –m     +m, +m

Here, being marked is taken to be bad, so we can imagine that people try to choose cells with as many occurrences of "–m" as possible. This means that the top left cell is the best one, corresponding to a pairing of unmarked form with unmarked meaning.

We accordingly take the whole first row and first column out of the game, since they have now been coupled with something. We are then left with a reduced subgame:


         t'
s'    +m, +m

In this subgame, both players have but a single option, so this obviously becomes optimal. We thus couple the marked form with the marked meaning, and we're done.

The same little game could of course be played with more signals and more meanings. We would then couple forms to meanings in increasing order of  markedness, until one of the sets had been exhausted. We could also play it with a single meaning and two forms, as his fury/furiosity example points out.
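Just to make the coupling procedure concrete, here is a minimal Python sketch of it. The numeric utilities (simply the number of unmarked features in each cell) and the name greedy_coupling are my own illustration, not anything from Blutner's paper:

def greedy_coupling(u, forms, meanings):
    """Repeatedly pair the best remaining (form, meaning) cell and remove
    its row and column, until the forms or the meanings run out."""
    forms, meanings = list(forms), list(meanings)
    pairs = []
    while forms and meanings:
        f, m = max(((f, m) for f in forms for m in meanings), key=lambda fm: u[fm])
        pairs.append((f, m))
        forms.remove(f)
        meanings.remove(m)
    return pairs

# The two-by-two markedness game: a cell's utility is its number of unmarked features.
u = {('s', 't'): 2, ('s', "t'"): 1, ("s'", 't'): 1, ("s'", "t'"): 0}
print(greedy_coupling(u, ['s', "s'"], ['t', "t'"]))
# [('s', 't'), ("s'", "t'")]: unmarked with unmarked first, then marked with marked

On the 2-by-2 game this reproduces the pairing described above; with more forms and meanings it just keeps peeling off rows and columns in order of decreasing utility.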

A More Formal Version of The Idea


In an attempt to capture the dynamics of this process, Blutner suggests the following definition (p. 11), here in a slightly reformulated version:
(s,t) is super-optimal if t is a candidate reading of s, and if
Q: There is no other pair (s',t) that satisfies I as well as u(s',t) ≥ u(s,t).
I: There is no other pair (s,t') that satisfies Q as well as u(s,t') ≥ u(s,t).
A different way of putting this is to translate it into an update procedure:

Let a table of numbers be given.

Randomly write crosses in some cells and nothing in others. This is your start configuration.

Then repeat the following loop:

    For each cell c:

        Find the set of competitors; these are the cells that
        are in the same row or the same column as c and are
        marked with a cross in the current configuration.

        If the number in c is larger than the number in all
        of these competing cells, give it a cross in the
        following configuration.

    If the last configuration is equal to the next, halt.

I realize this is not as romantic as a circular definition, but it is easier to apply. For instance, let's imagine we're starting with the following table:
 
8  9  7
3  2  1
5  6  4

If we start with a completely empty start configuration (no crosses in any cells), then we can apply the procedure 6 times before we arrive at the following configuration of crosses:


.  ×  .
.  .  ×
×  .  .
At this point, we have indeed reached a fixed point: No crossed cell can out-compete another crossed cell. Note that this is achieved by having the characteristic one-to-one mapping between forms and meanings.
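For what it's worth, here is a minimal Python sketch of the update procedure. Representing the table as a list of lists and a configuration as a set of index pairs is just one convenient choice, and the stipulation that a cell does not compete with itself is my own reading of the procedure (without it, no crossed cell could ever survive a pass):

def update(u, crossed):
    """One pass: a cell gets a cross in the next configuration iff its value
    is larger than that of every currently crossed cell in the same row or
    column (the cell itself does not count as its own competitor)."""
    n_rows, n_cols = len(u), len(u[0])
    nxt = set()
    for i in range(n_rows):
        for j in range(n_cols):
            competitors = [u[i][k] for k in range(n_cols) if k != j and (i, k) in crossed]
            competitors += [u[k][j] for k in range(n_rows) if k != i and (k, j) in crossed]
            if all(u[i][j] > v for v in competitors):
                nxt.add((i, j))
    return nxt

def fixed_point(u, start=frozenset()):
    """Iterate the update from a start configuration until nothing changes.
    Returns the final configuration and the number of passes performed."""
    config, passes = set(start), 0
    while True:
        nxt = update(u, config)
        passes += 1
        if nxt == config:
            return config, passes
        config = nxt

u = [[8, 9, 7],
     [3, 2, 1],
     [5, 6, 4]]
config, passes = fixed_point(u)
print(sorted(config), passes)   # [(0, 1), (1, 2), (2, 0)] 6  -- crosses on 9, 1, and 5

On the table above, starting from the empty configuration, this halts after six passes with crosses on 9, 1, and 5, which matches the configuration shown.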

Thursday, April 12, 2012

Hendriks, de Hoop, and de Swart in Journal of Logic, Language, and Information (2012)

In their brief introduction to a special issue on game theory and bidirectional optimality theory, Petra Hendriks, Helen de Hoop, and Henriëtte de Swart discuss the parallels between the two frameworks and their limits.

Handy References

The text contains the following references on bidirectional optimality theory:
In addition, they cite the following paper as pointing out "the connection between bidirectional Optimality Theory and Game Theory" (p. 2):

A Theoretical Point

Besides giving a general introduction to the field and its players, the three authors point to a possible theoretical shortcoming of both bidirectional optimality theory and game theory.

The problem they point out is that "these frameworks generally predict a one-to-one pairing of forms and meanings," which is not empirically true (p. 2). They illustrate this with the Dutch question Wie heeft Frank vermoord?, which is ambiguous between Who did Frank kill? and Who killed Frank?

More specifically, they note that while it is true that "marked forms go with marked meanings," the reverse does not hold: "unmarked forms can often be used to express unmarked as well as marked meanings" (p. 3). This claim is supported by a reference to a paper in Lingua, but I don't know exactly which example they have in mind.

A Note On That Point

Just on the face of it, there doesn't seem to be anything wrong, from the standpoint of microeconomics, with the many-to-many relation, but it does require a slightly more sophisticated model of the "cost" of an utterance. Consider for instance:
  • He is dead (+m) = "He is dead" (+m)
  • He is gone (–m) = "He is dead" (+m)
  • He is dead (+m) = *"He is gone" (–m)
  • He is gone (–m) = "He is gone" (–m)
I haven't done the math here, but this might be explained by pragmatic effects under the right assumptions: If all messages are equally cheap, then we should indeed expect a one-to-one correspondence to emerge; however, since one of the meanings is taboo, it will tend to increase the cost of whatever message gravitates towards it. This might sustain the incentive to use analogical references instead of direct (unambiguous) references.

Beate Hampe: "When down is not bad, and up is not good enough" (2005)

This thoughtful paper by Beate Hampe (see also From Perception to Meaning, 2005) is a nice empirical counterweight to some of the wildly speculative claims that are thrown around in cognitive linguistics. In this particular case, the issue is whether there is an inherent and global good/bad valence to the dichotomies up/down, in/out, on/off, front/back, etc.

The Theory
This claim has apparently been made most clearly by the Polish linguist Tomasz P. Krzeszowski. Krzeszowski himself sees the claim as a specific version of the "Invariance Principle," since it holds that up/down metaphors inherit the good/bad valences that standing and lying down have in our preconceptual lives.

This is a neat little fairytale about the genesis of meaning, but as usual, the empirical data spoils everything. The first, obvious sign comes from looking at bad things going up:
  • Unemployment is up. (bad)
  • Employment is up. (good)
Faced with such examples, one would have to say something like this: up and down have an inherent emotional value, but this emotional value is not very strong itself; a strongly laden context can thus pull the words in an "unnatural" direction. However, on average or in neutral contexts, up will have a weak tendency to lean towards positive emotional valence, and down a tendency towards negative.

The Counterevidence
However, this doesn't seem to be the case. Hampe has done a medium-sized corpus study of the constructions finish off, finish up, slow down and (the rare) slow up. For each occurrence, she looked for valence clues in the immediate context and categorized the example according to how positive it was. The result was, surprisingly, that up was much more likely to be used for negative purposes than down or off.

Now, it is of course a little unfortunate that she only looked at two verbs, and that one of her particle pairs (down/off) was not a pair of antonyms. A safer strategy would be to pick two antonymous verbs and two antonymous particles and then combine them in a table like the following:


           in          out
give     28.0 kk     27.3 kk
take     53.5 kk    118.0 kk

The numbers in this table are occurrence counts in millions ("kk"), estimated by Google searches for the exact phrases. This excludes, for instance, split uses like take me out (or, in general, verb + NP + particle). However, Hampe's study seems to have the same weakness.

Similar tables could be made for the following:
  • come/go in/out
  • come/leave in/out
  • push/pull on/off
  • break/make up/down
  • break/fix up/down
  • etc.
Other Sources of Evidence
Several empirical hypotheses could be tested against such data.

For instance, one could test whether there is a statistical tendency for some positive particle (e.g., in) to attach to the positive set of verbs (come, make, stand, keep) more than its negative counterpart (out) does. This could be done with respect to grammaticality or with respect to empirical counts. It would essentially amount to collapsing all of the tables into a single 2 × 2 table.

Another question would be whether the positive and negative contexts were distributed evenly across the two columns of any such table. In order to test this, one would have to develop a valence assessment method like Hampe's, preferably an automatic one. After having trained such a model, one could use Fisher's exact test on the contingency table consisting of positive/negative valence × positive/negative column.
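As a rough sketch of that last step, assuming SciPy is available (the counts below are invented purely for illustration):

from scipy.stats import fisher_exact

# Rows: positive vs. negative context valence (as judged by the trained model);
# columns: the "positive" particle (in) vs. the "negative" particle (out).
# These counts are made up for illustration only.
table = [[120,  80],    # positive contexts: in, out
         [ 60, 140]]    # negative contexts: in, out

odds_ratio, p_value = fisher_exact(table)
print(odds_ratio, p_value)

The same machinery would apply to the collapsed 2 × 2 verb/particle table from the previous paragraph.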

The automatic valence assessment might perhaps be achieved through semi-supervised learning. We can imagine starting from a valence function v0 defined on a small set of good and bad words:
  • v0(w) = 1 for w in some finite set G = {good, pleasant, improve, victory, truth, ...};
  • v0(w) = –1 for any w that is an antonym to a word in G;
  •  v0(w) = 0 for all other words w.
Assuming that good words co-occur with good words, this could be used to train a more fine-grained function, say, v1000. There are a number of problems with this assumption (it ignores rhetorical contrast effects as well as negation), but experiments would show whether it worked or not.
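A minimal sketch of what such a semi-supervised scheme could look like, under the same co-occurrence assumption (the toy corpus and seed words here are placeholders, not real data):

from collections import defaultdict

def train_valence(sentences, seeds, n_iters=20):
    """Propagate valence over sentence-level co-occurrence, assuming that
    good words tend to co-occur with good words. `seeds` maps a handful of
    words to +1/-1; all other words start at 0 and are repeatedly set to
    the mean valence of their co-occurring words (seed words stay clamped).
    Note: this deliberately ignores negation and rhetorical contrast."""
    vocab = {w for s in sentences for w in s}
    v = {w: float(seeds.get(w, 0.0)) for w in vocab}
    neighbours = defaultdict(set)
    for s in sentences:
        for w in s:
            neighbours[w].update(x for x in s if x != w)
    for _ in range(n_iters):
        new_v = {}
        for w in vocab:
            if w in seeds:
                new_v[w] = float(seeds[w])      # clamp the seed words
            elif neighbours[w]:
                new_v[w] = sum(v[x] for x in neighbours[w]) / len(neighbours[w])
            else:
                new_v[w] = 0.0
        v = new_v
    return v

# Toy corpus; in practice the sentences would come from a large corpus.
sentences = [["employment", "is", "up", "good"],
             ["unemployment", "is", "up", "bad"],
             ["the", "victory", "was", "pleasant"]]
v = train_valence(sentences, seeds={"good": 1, "pleasant": 1, "victory": 1, "bad": -1})
print(v["employment"], v["unemployment"])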

Wednesday, April 11, 2012

Helen de Hoop: "On the Interpretation of Stressed Pronouns" (2003)

This paper by Helen de Hoop is essentially about pronoun resolution. It focuses on two clues that can be used to narrow down the field of possible resolution candidates: contrastive stress and topic continuity.

Concepts
Contrastive stress has to do with the fact that changes in focus often come in explicitly marked forms:
  • Paul1 hit Jim2. Then he1 kicked him2. (= Paul kicked Jim)
  • Paul1 hit Jim2. Then HE2 kicked HIM1. (= Jim kicked Paul)
Topic continuity is a constraint known from centering theory. In simplified form, it says that pronouns refer to things that had high salience in the previous sentence. This explains differences like this one:
  • Paul1 is not around. He1 is talking to Alan2. He1 will be back later.
  • Paul1 is not around. He1 is talking to Alan2. *He2 will be back later.
Other clues exist as well (e.g., syntactic parallelism and more subtle salience differences), but these are the ones de Hoop discusses. I should note that centering does not actually fare very well when tested on corpus material.

Conclusions
In the last five pages of her paper, de Hoop formulates her observations about anaphor resolution in terms of an optimality-theoretic constraint system. This system consists of two constraints: one requiring all pronouns to refer to high-salience topics of the previous sentence, and one requiring stressed pronouns to be read in a way that introduces a shift of attention.

De Hoop claims that the Contrastive Stress constraint (S) is stronger than the Continuing Topic (T) constraint. This means that we get the following linear ordering on constraint violations:
None > only T > only S > both S and T.
So if the chosen reading violates, e.g., the constraint S, it must mean that no reading was available that violated nothing at all or only T.
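As a toy illustration of what "stronger" means here, strict domination can be implemented as a lexicographic comparison of violation counts; the candidate readings and violation profiles below are invented for illustration, not taken from de Hoop's paper:

def harmony_key(violations, ranking=("S", "T")):
    """Sort key implementing strict domination: readings are compared by their
    violations of the highest-ranked constraint first, then the next one."""
    return tuple(violations.get(c, 0) for c in ranking)

# Hypothetical readings of a stressed pronoun, with their constraint violations.
candidates = {
    "continue the old topic":  {"S": 1, "T": 0},  # stress without a shift violates S
    "shift to a new referent": {"S": 0, "T": 1},  # the shift violates T
}
best = min(candidates, key=lambda r: harmony_key(candidates[r]))
print(best)  # "shift to a new referent": violating only T beats violating only S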

I don't find her argument for ordering the constraints this way entirely convincing, though. She gives examples that violate T but not S, and examples that violate both constraints, but she does not exclude the possibility that other examples might violate S but not T. If such counterexamples exist, they will look more or less like the following:
  • John1 hit Paul2. Then HE1 kicked Paul2.
I don't know if one could find such an example in naturally occurring discourse, but I'm not entirely convinced of the opposite, either. And this strong claim, that such examples do not occur, is precisely the prediction that falls out of the optimality-theoretic analysis.