speaker as hearer

[Language post]

I’ve been impressed by the boldness of Fernández and Smith Cairns in devoting a chapter to “The Speaker” ahead of the chapter on “The Hearer” in their textbook Fundamentals of Psycholinguistics.

It’s one of the great fundamentals that there isn’t really a good model of how speech production works from a psycholinguistic perspective. The best established and most influential models of speech production certainly deal with linguistic units such as syllables or phonemes, but they don’t go any closer to articulation than that. These units serve as the input to whatever motor processes generate speech movements, but the motor processes themselves are generally treated as quite separate, if not trivial. (Fernández and Smith Cairns’s diagram at the start of the chapter has “articulatory system” well outside the box of interesting processes.) The ‘perception’ side is generally much better understood than the ‘production’ side of things, so tackling production ahead of perception/comprehension is an interesting step.

But more striking – there’s a whole section of the Speaker chapter devoted to Producing Speech After It Is Planned. So might this be a place to find new insights linking mentally represented symbols to articulation? Even tentatively, as befits an introductory text?

Well, no – the section is acoustic, not articulatory. Shame! There’s a head diagram with the articulators labelled, but the diagrams are waveforms and spectrograms, not x-ray pellet tracings or EPG outputs. Not even so much as a diagram of a mass on a spring to help the reader feel warm and fuzzy.

It’s a perfectly fine section on the acoustic properties of consonants and vowels, I should add, but it does make you wonder what they’re going to talk about in the “Hearer” chapter now that all this talk of sine waves and formant transitions is out of the way.

segment sceptics

Selected, in chronological order

• Paul (1886), according to Abercrombie (1991): “In contrast to Pike’s view that a stretch of speech has a natural segmentation is the view that it is an indissoluble continuum, with no natural boundaries within it. This view is at least a hundred years old. It is clearly stated, for example, by Hermann Paul in his Principien der Sprachgeschichte in 1886. The word, he says, is ‘eine continuirliche reihe von unendlich vielen lauten,’ ‘a continuous series of infinitely numerous sounds,’ as HA Strong translates it in Principles of the History of Language. … As he puts it, ‘… A word is not a united compound of a definite number of sounds, and alphabetical symbols do no more than bring out certain characteristic points of this series in an imperfect way.’” (Abercrombie 1991: 29-30)

• Twaddell (1935) – the phoneme is “a fiction, defined for the purpose of describing conveniently the phonological relations among the elements of a language, its forms,” p53; “it is meaningless to speak of ‘the third phoneme … of the form sudden’, or to speak of ‘an occurrence of a phoneme’. What occurs is not a phoneme, for the phoneme is defined as the term of a recurrent differential relation. What occurs is a phonetic fraction or a differentiated articulatory complex correlated to a micro-phoneme. A phoneme, accordingly, does not occur; it ‘exists’ in the somewhat peculiar sense of existence that a brother, qua brother, ‘exists’ – as a term of a relation,” p49.

• Firth (1935) – “It is all rather like arranging a baptism before the baby is born. In the end we may have to say that a set of phonemes is a set of letters. If the forms of a language are unambiguously symbolised by a notation scheme of letters and other written signs, then the word ‘phoneme’ may be used to describe a constituent letter-unit of such a notation scheme” (Firth 1957 [1935]: 21)

• Firth (1948) – on using literacy-inspired transcriptions as a basis for phonological analysis (from the 1930s onwards, the writings of JR Firth show him distancing himself from over-reliance on transcriptions in alphabetic notation for phonological analysis): “The linearity of our written language and the separate letters, words, and sentences into which our lines of print are divided still cause a good deal of confused thinking, due to the hypostatization of the symbols and their successive arrangement. The separateness of what some scholars call a phone or an allophone, and even the ‘separateness’ of the word, must be very carefully scrutinized” (Firth 1957 [1948]: 147).

• Ladefoged (1959) – quoted by Lüdtke (1969: 151): “The ultimate basis for the belief that speech is a sequence of discrete units is the existence of alphabetic writing. This system of analysing speech and reducing it to a convenient visual form has had a considerable influence on western thought about the nature of speech. But it is not the only possible, nor necessarily the most natural, form of segmentation.”

• Lyons (1962), commenting on Firth: “the practical advantages of phonemic description for typing and printing should not of course be allowed to influence the theory of phonological structure. It has been argued that phonemic theory has been built on the ‘hypostatisation’ of letters of the Roman alphabet: cf Firth, [‘Sounds and Prosodies,’ 1948], p134”

• Abercrombie (1965) is quoted by Lüdtke (1969: 151) as saying, “The phoneme … is not something which has a ‘real existence’.”

• Lüdtke (1969) – abstract, “the phoneme segment is not a natural item but a fictitious unit based on alphabetic writing”

• Householder (1971), summarised by Vachek (1989: 25): “[Householder] formulates the question whether, instead of postulating Chomskyan artificial underlying forms, it would not be more realistic to regard the graphical shapes of words as starting points from which the language user obtains their spoken, phonological shapes.”

• Linell (1982) – a whole book providing comprehensive, detailed coverage of the topic, Written Language Bias in Linguistics.

• Kelly and Local (1989) – the question of notation – aim to avoid doing phonetic transcription with the same symbols as are then used for doing phonological transcription/analysis.

• Abercrombie (1991) – “Segment, then, is the name of a fiction. It is a transient moment treated as if it was frozen in time, put together with other segments to form a ‘chain’ rather than a ‘stream’ of speech. Methodologically it is a very useful fiction. A segment, isolated from the flow of speech, can be taken out of its context, moved into other contexts, given a symbol to represent it, compared with segments from other languages, placed in systems of various sorts, singled out for special treatment in pronunciation teaching; and used in dialectology, speech therapy, the construction of orthographies. (The same is true, of course, of speech-sound and phone. They do not give rise, however, to the possibility of a word for the process, ‘segmentation.’)” (p30)

• Faber (1992) – “segmentation ability, rather than being a necessary precursor to the innovation of alphabetic writing, was a consequence of that innovation” (p112); “segmentation ability as a human skill may have been a direct result of (rather than an impetus to) the Greek development of alphabetic writing. Thus, the existence of alphabetic writing cannot be taken eo ipso as evidence for the cognitive naturalness of the segmentation that it reflects” (p127)

• Derwing (1992) – “the segment (or phoneme) may not be the natural, universal unit of speech segmentation after all, and that the orthographic norms of a given speech community may play a large role in fixing what the appropriate scope is for these discrete, repeated units into which the semi-continuous, infinitely varying physical speech wave is actually broken down.” p200

• Port & Leary (2005) – ‘Against formal phonology,’ Language 81

• Port (2006), ‘The graphical basis of phones and phonemes.’

• Ladefoged (2005) – “We should even consider whether consonants and vowels exist except as devices for writing down words … [they] are largely figments of our good scientific imaginations,” p186; “We also lose out in that our thinking about words and sounds is strongly influenced by writing. We imagine that the letters of the alphabet represent separate sounds instead of being just clever ways of artificially breaking up syllables,” p190; “the division of the syllable into vowels and consonants is not a natural one. Alphabets are scientific inventions, and not statements of real properties of words in our minds. … vowels and consonants are useful for describing the sounds of languages. But they may have no other existence,” p191; “The alphabet, which regards syllables as consisting of separate pieces such as vowels and consonants, … is a clever invention allowing us to write down words, rather than a discovery that words are composed of segment-size sounds,” p198.

• Silverman (2006) – p6, p11-13, and elsewhere.

• Lodge (2007): “There has been a long history of warnings against the notion of the phonological segment (eg Paul 1890, Kruszewski 1883, Baudouin de Courtenay 1927), as pointed out succinctly by Silverman (2006). Later the concept was criticised by Firthian prosodists (see Palmer 1970) and more recently reviewed by Bird & Klein (1990); the most recent exposé of the misguided acceptance of alphabetic segmentation in phonology can be found in Silverman (2006).”


anybody’s guess

Everyone blames phonological representations for language-related impairments, or deficits in phonology-related tasks like nonsense-word repetition. But what is a phonological representation? What do impaired phonological representations look like? In what specific ways do they differ from unimpaired representations, and how can you tell? What does it all mean?!

Munson (2006) in a commentary on Gathercole’s keynote article in Applied Psycholinguistics expatiates thus, and I can only concur:

Although there are many different perspectives on the factors that drive nonword repetition performance, we can all agree that the relationship between nonword repetition and word learning is due to the association of these constructs with phonological representations. The relevant question to ask, then, concerns the nature of phonological representations themselves. What are they? Textbook descriptions of these generally posit that they look something like the strings of symbols that we are taught to transcribe in phonetics classes. However, phonetic transcriptions, even narrow ones, are abstractions of the signals that are being transcribed. The level of detail that they code is ultimately related more to the perceptual abilities of the listener, the degrees of freedom in the symbol system, and a priori assumptions about the quantity of detail that is relevant for transcription than to the signal being transcribed and its associated phonological representation.

What, then, do “real” phonological representations encompass? What is being represented? The answer to that is anyone’s best guess. Representations themselves are latent variables. We can never see them, we can only posit them as explanations for the sensitivity that people have to variation and consistency in the speech signal in different tasks. (p578)

A welcome reality check in perhaps a slightly unexpected place, even though, of course, it still doesn’t solve the fundamental problem. Everybody’s preferred method for testing the true nature of implicit phonological representations is different, and inadequate to different degrees and in different ways – but given the nature of the concept of phonological representations itself, that is simply how it has to be.


Munson, B. (2006). Nonword repetition and levels of abstraction in phonological knowledge. Applied Psycholinguistics 27: 4

accommodation in action

I had this fantastic conversation in a shop this morning. The assistant came up saying, as if straight out of the North of England, “Alright love?” With [w] or [ʋ] for /r/ and a definite [ʊ] in love.

I said, “Do you have any liquorice?” With, it transpires, a serious [ɾ] for the /r/. (I needed to ask – really wasn’t interested in Bassett’s.)

He immediately switched and said, “Liquo[ɾ]ice?” Precisely as I’d said it. “Have a look over here.”

And was impeccably Scottish from then on.

So: well-adjusted Anglo? or Scot who just expected shoppers to be English?

back from baap

And what a fascinating time it was. I went with the expectation of finding out about lots of new ideas, and there were certainly a lot of new findings, new measurement methods, new and refined analyses.

But by far the most engaging sessions (I thought) were the ones that looked back to the early days of phonetics and linguistics. The phonetics crew at UCL have recently discovered some forgotten film reels dating right back to the 1920s, and took the opportunity to show the conference what this collection consisted of. The films showed everything from early x-ray images of the vocal tract, to the first machine which could recognise speech, to the exciting kymography techniques which feature so prominently in some of Firth’s papers. (Wikipedia on kymograph; in the 20s they also used the sensitive flame, described in Wikipedia in its application in the Rubens’ tube.)

There was also a fascinating account of the work that was done in Japan in the 1940s. Somehow the groundbreaking work from the Japanese labs had featured in some of the reading I did during my thesis (completely unconnected to the thesis itself, just like lots of the most interesting stuff I read in those years), but Michael Ashby and Kayoko Yanagisawa’s presentation of the London-Tokyo links also brought in some intriguing detective work as they tracked down the source of their collection of glass lantern slides, and threw light too on the development of the stylised “head diagram” used by everyone from Daniel Jones onwards for illustrating the articulators (see here, eg, p79 onwards).

Which got me thinking. On one hand, it was amazing how technologically advanced they all seemed to be in the early days – they had all sorts of innovative techniques for observing and imaging the production of speech, and they had no hesitation in making use of the newest technology available in order to apply it to questions of articulation and acoustics. That spirit, I think it’s fair to say, is still alive and well in phonetics, with people using all sorts of technologies to investigate different aspects of articulation (electropalatography, laryngoscopy, ultrasound, not to mention electromagnetism…), and so we continue to increment our knowledge of what goes on in the vocal tract when people speak.

On the other hand, a lot of the theoretical understandings were also in place about what speech means, or is, or does, in the context of human communication more broadly considered. Knowing what acoustic effects arise from air flowing across articulators arranging themselves in particular ways is one thing – knowing what contribution these sounds make in the enterprise of making each other understood, is a different matter. Yet for people like Firth and his direct intellectual descendants, their views on the phonological system (and other parts of the language system) grew out of the best understanding they had about phonetics, both in terms of their explicitly stated principles and to a large extent also in their descriptive and analytical practice.

Compare this to a talk I was at last week (not at BAAP) where a valiant attempt was made to integrate changing conjunctions of formant values into the generative understanding of what phonology is (ie, to allow phonological grammars to accommodate – even ‘predict’ – sound variation and change). I am tentatively, but increasingly, of the view that there is simply no way to validate the staples of the generative apparatus (is that a mixed metaphor?) on the basis of speech data. It may be possible to tweak a generative grammar so that it becomes something that can handle variation and change, but that’s what it becomes – it doesn’t start, from its first principles, with that capability. If you believe that “sounds”* can be decomposed into distinctive features, what aspects of the speech stream can you offer as evidence for such features? Increasingly, the defence that phonological features need not make reference to the speech stream by virtue of existing on an altogether different plane of being is unconvincing, particularly when it is coupled with an expressed wish to make allowances for phonetic variation within the phonological system.

In one of the presentations, Michael Ashby mentioned that the 1930s was the decade of international congresses (the first three ICPhS’s!) and commented, quite rightly, on what an exciting time it must have been, in terms of who was meeting who, and when, and what ideas influenced who, and the impacts of all of these developments right down to the present day. You can’t help feeling that even though the scientific study of speech sounds is so relatively young, we could be in danger of falling prey already to a sorry historical amnesia. Keep alive the sensitive flame of phonetics, the man said, but keep alive too the story of where we’ve come from, not just to make sense of the present, but to equip us for the future too! (Best read to a particularly jubilant trumpet fanfare, I would suggest.)


* Always bearing in mind Roy Harris’s immortal analogy, “To ask ‘How many sounds are there in this word?’ is to ask a nonsense question (for the same kind of reason as it is nonsense to ask how many movements it takes to stand up).” Precisely.

be-bop a doo baap

At the end of the week I’m off on a jolly to BAAP – the colloquium of the British Association of Academic Phoneticians, which I earnestly expect will be a lot of fun. All the papers on /r/! and articulation! and talk-in-interaction!

There’s actually a pretty impressive range of topics in the programme and it’s promising to be an extremely interesting time – lots of new, fresh findings, new technologies, and careful, detailed work on sometimes very subtle speech phenomena. Phonetics is so underrated, I say.

So while things like preparatory swotting up on vowel systems gobble up my time, and until I get back, feel free to talk amongst yourselves.