Fairbanks, Grant. Experimental Phonetics – T03

Effects of Delayed Auditory
Feedback Upon Articulation*

Grant Fairbanks
Newman Guttman**

An investigation of the influences of
delayed auditory feedback upon different
speech variables was reported
in a previous article (1). Four variables
were considered and the influences
were found to be dissimilar. The
results were interpreted as supporting
the conclusion that disturbed articulation
and increased duration are ‘direct
effects’ of time delay; that greater
sound pressure and higher fundamental
frequency are ‘indirect effects,’
evidencing ‘effort to maintain system
control’ and ‘resist experimental interference
with the response’; and that
articulatory disturbance is the ‘primary
effect (1).’ The nature of the
disturbance of articulation is the subject
of the present article. It constitutes
a second report of data from the
experiment and presents the results of
an attempt to make an orderly, primarily
phonetic description of a
speech display that, as observed earlier,
‘is often so chaotic that conventional
measurements do not describe
it (1).’ Whereas the first report considered
only the number of articulatory
errors, without regard to type,
and restricted the observations to a
sentence drawn from a longer sample,
the analysis reported here gave special
attention to the various types of errors
and extended the treatment to the
complete sample.

Procedure

Since the experiment has been described
elsewhere (1), only its general
plan need be reviewed. The subjects
were 16 young men, each of whom
read the same prose passage seven
times as follows: a free pre-experimental
reading without amplification
or earphones; five experimental readings
with amplification via earphones
and with time delays of 0, 0.1, 0.2, 0.4
and 0.8 sec.; a post-experimental reading
under the same conditions as the
0-sec. experimental reading. The performances
were recorded. A six-sentence
experimental passage was
used, of which the middle four sentences,
totaling 55 words, were studied
in the present analysis. (The measurements
reported earlier were confined
to the third of the four sentences, a
13-word sample.)

The overall articulatory accuracy
of each sample was estimated by
counting the number of correct words.
This measure was introduced and discussed
in the first report. ‘The procedure
was to listen to the recording
of each individual word as many
times as necessary and… tally it as
correct or incorrect… The standard
was that of acceptability of articulation
and pronunciation interpreted
liberally, and each word was presumed
correct unless clearly incorrect.
[When material was inserted between
words the practice was] to reduce
the count of correct words by one in
each such instance, regardless of the
amount of inserted material, except
when both of the text words bounding
the insertion were themselves incorrect
(1).’ The ratio of this count
to the total reading time furnished a
measure of articulatory efficiency in
correct words per second. This has
been termed correct word rate and
it also was discussed in the first report.
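
As an illustration only, the ratio just described may be sketched as follows. The function name and the sample figures are hypothetical and are not data from the experiment; the sketch assumes merely that the count of correct words and the total reading time in seconds are already in hand.

```python
def correct_word_rate(correct_words: int, reading_time_sec: float) -> float:
    """Correct words per second: count of correct words / total reading time.

    Both arguments are assumed to come from the listening procedure and the
    timing of the recording; the names are illustrative, not the authors'.
    """
    if reading_time_sec <= 0:
        raise ValueError("reading time must be positive")
    return correct_words / reading_time_sec


# Hypothetical reading: 44 of the 55 text words judged correct in 22.0 sec.
rate = correct_word_rate(44, 22.0)
print(rate)
```

A reading that is both slower and less accurate lowers the rate on both counts, which is why the measure serves as an index of articulatory efficiency rather than of accuracy alone.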

Errors of articulation were studied
by means of an independent procedure
which was performed at a different
time. The basis of the analysis was a
reference transcription of the passage
which showed the expected phonetic
output in an undisordered reading.
For the articulation of each element,
this constituted a standard ‘which
would be regarded as not having been
achieved by an obtained articulation
only when the latter was obviously
outside a wide region of acceptability
(1).’ This kind of standard implies
a large number of admissible alternatives
and a few illustrations will show
the level of specification employed.
In the words beyond the horizon, for
instance, the standard was as follows:

b
ɪ, i
j
ɑ, ɔ
n
d or omit
ð
ə, ʌ, ɪ, i
h or omit
ə, ɔ, o, ou
r
ɑɪ, aɪ
z
ṇ, ən

Examples of other alternatives are
[ʍ] or [w] in white, [ð] or [θ] in
with, omission of [d] in and its, ends
and finds, omission of one [l] in
people look.

A phonetic transcription of each
sample was made for comparison with
the reference. The practice was to
transcribe broadly by phrases as
spoken, recording stress patterns, and
the transcription served as a working
data sheet for the error analysis. All
phonetic deviations from the reference
were located and classified according
to major type of error, substitution,
omission or addition, plus a miscellaneous
category explained below. As
will appear, each instance was then
sub-classified according to various aspects
thought to be of interest. All
articulatory elements represented in
the reference transcription were potential
points of error, but when the
same type of deviation occurred in
two or more consecutive elements
(e.g., an omitted poly-phonemic
word), one instance of error was
noted and its length accounted for in
sub-classification. Different types occurring
in succession, however, were
counted as separate instances. Additions
to the expected output were
numerous, as will be seen, and all
change-points between elements were
considered as possible loci. Each addition
was tallied once, regardless of its
other characteristics. Thus, in the
general count, error refers to an instance
of error of a specified major
type, consisting of a run of one or
more elemental deviations of that
type.

Table 1. Group means for measures of correct words, duration and correct word rate. Lower
section converted from previous report (1).

* Pre-experimental
† Post-experimental
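
The counting rule for substitutions and omissions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure as actually carried out: it assumes each element of the reference transcription has already been labeled as correct (`None`) or with a major error type, and the label strings and function name are hypothetical. Additions, which were tallied once each at change-points between elements, are not modeled.

```python
from itertools import groupby


def count_error_instances(element_labels):
    """Collapse consecutive deviations of the same type into one instance.

    element_labels: one entry per reference element, either None (correct)
    or a string naming the major error type (hypothetical labels).
    Returns a list of (error_type, length_in_elements) tuples.
    """
    instances = []
    for label, run in groupby(element_labels):
        if label is not None:
            # A run of one or more like deviations counts as a single
            # instance; its length is kept for sub-classification.
            instances.append((label, len(list(run))))
    return instances


# Hypothetical labeling of six consecutive elements.
labels = [None, "substitution", "substitution", "omission", None, "omission"]
print(count_error_instances(labels))
```

In the hypothetical example the two adjacent substitutions form one instance of length two, while the omission that follows them is counted separately because it is of a different type.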

Results

General Articulatory Accuracy

The upper section of Table 1 presents
group means for percentage of correct
words, duration and correct word
rate. Duration is expressed as total
speaking time divided by number of
text words, 55, for purposes of intercomparison
and the values may be
interpreted as mean word period, including
pause time. The lower section
shows similar measures on the 13-word
sentence. Data for the pre- and post-experimental
readings are given at the
left as a matter of interest; their
resemblance to the 0-sec experimental
reading will be observed. The experimental
changes in the sentence have
been discussed thoroughly in the first
report and there is no evidence in any
of the measurements of the complete
passage which necessitates revision of
interpretation. Study of the means for
the undelayed performances suggests
that the proportion of correct words
in the passage as a whole appears to
have been somewhat higher basically
than in the one sentence studied earlier,
so that the whole set of passage
means is at a higher level. The duration
means for the passage are also
systematically larger than for the sentence,
presumably because it included
three between-sentence pauses. The
various values of correct word rate
are very similar for the two samples.
The major point, however, is that all
three measures of both samples varied
by large amounts in response to
changes in delay interval, with peak
disturbance invariably at 0.2 sec.

Effect of Delay Interval upon Type of Error

Table 2. Group means and standard deviations for total number of articulatory errors and errors
of types shown.

* Pre-experimental
† Post-experimental

The 80 experimental performances
yielded a total of 1,548 instances
of articulatory error. The
results of sorting these by delay interval
and type of error are given in
Table 2, which shows group means
and standard deviations. The change
in the mean number of total errors
(top row) with variation of time
delay evidences the afore-mentioned
increase with peak at 0.2 sec. The four
types of error also varied in this same
general way, but by relative amounts
that were not similar. In other words,
as the number of total errors varied,
the shape of the distribution according
to type did not remain the same. This
is most readily apparent in Figure 1,
which was prepared from the columns
of means in Table 2. It will be seen
that in the experimental condition
which involved no delay (labeled 0₂),
as well as in the other undelayed conditions,
the errors were preponderantly
substitutions and omissions,
about equally divided. When the articulatory
disturbance was at its peak
(0.2-sec delay), addition was the
most common error. Whereas the
number of omissions, for example,
approximately doubled from 0₂ to 0.2,
additions became 20 times as common.
Comparison of the profiles in Figure 1
indicates the nature of the change in
shape as the severity of the general
disturbance changed at the various
time delays. The data indicate not only
that severity of articulatory disturbance
varies with interval of delay and
that delay-induced errors of different
types are not equally numerous, but
that the number of occurrences of a
given type of error depends upon the
interval of delay at which the observation
is made. The most distinctive
characteristic of peak disturbance is
high incidence of additions.¹

Figure 1. Distributions of major types of
articulatory errors at different time delays.
0₁ and 0₂: pre- and post-experimental readings
with 0-sec delay. (Axes: mean number of errors
against interval of delay in sec.)

Table 3. Distributions of total substitutions at different time delays.

The errors of each type were subsorted
in various ways. The general
procedure was to tally the errors with
respect to two or more categories of a
given factor and the result was a number
of distributions among which were
some that appeared to have descriptive
utility. These are shown in the tables
which follow for the five conditions
of the experiment proper. Each entry
is the number of instances among all
16 subjects, the sum of the entries in
a given distribution corresponding to
the means of Table 2.

Substitutions

The results for errors
of this type are in Table 3. Of the 108
errors observed in the undelayed experimental
condition, 49 were substitutions.
As is shown, 39 of these were
voicing errors, or interchanges of consonant
cognates, which might be regarded
as instances of a minimal form
of substitution. The number of these
simple errors increased under time
delay, but at peak disturbance they
were outnumbered two to one by
more radical deviations. The latter
were so extremely varied and unusual
that attempts to sort them phonetically
were unproductive and it is
considered that this unconventionality,
which may readily be heard in casual
listening, is itself a distinctive attribute.
It is possible that most such substitutions
are to be interpreted as phonetic
anachronisms which are directly triggered
by delayed feedback and that
they have much the same significance
as the repetitive type of addition
which is discussed below.

Table 4. Distributions of total omissions at different time delays.

The second row of distributions in
Table 3 refers to the ‘length’ of the
instance of error in number of phonemes,
one of the sortings that were
carried out with all error types. It will
be seen that only one-phoneme instances
were observed in the undelayed
condition and that such errors
were also most common in all conditions.
The performances under delay,
however, yielded a number of poly-phonemic
substitutions, which averaged
about one per subject at the peak.

The remainder of the table shows
the results of sorting the substitutions
according to the stress of the syllables
in which they occurred.² Roughly
one-third of the errors in the undelayed
reading were in stressed syllables,
a number that probably differs
little from the proportion of such
syllables in the average reading. Under
time delay it will be seen that the distribution
shifted in relationship to
degree of disturbance, so that when
substitutions were most numerous,
more than one-half of them came in
syllables judged to have been stressed
in their respective spoken phrases. The
impression of stress in time delay is
that it is characteristically atypical
and that the speaker has sacrificed his
usual stress patterns as part of his
attempt to preserve phonetic accuracy.
The shift of the stress distribution
probably indicates that the speaker has
a tendency to ‘stress’ syllables in
which errors occur, in connection
with his effort to avoid them. The
auditory effect of the association of
substitution and stress is to emphasize
the phonetic unlawfulness of substitutions,
already alluded to, by increasing
their relative prominence.

Omissions

The sub-sorting of these
errors was unrewarding, as may be
seen in the two sets of distributions
shown in Table 4. In the undelayed
performances, 46 of the 49 instances
were mono-phonemic. This continued
to be the most frequent length of
error with time delay, as shown, although
substantial numbers of poly-phonemic
omissions occurred, constituting
about one-fifth of all instances
at 0.2 sec.

Table 5. Distributions of total repetitions at different time delays.

The practice followed in judging
stress was different from that followed
with other errors because of the nature
of omission. When a portion of
a syllable remained, the stress of the
fragment was judged; when entire
syllables were omitted, their expected
stress was estimated. It also will be
noted that the mixed category mentioned
above was necessary. The distributions
in Table 4 are similar in
general form throughout the five conditions
and there is no sign of any
important change as the total number
of omissions varied.

Repetitions

The characteristically
high incidence of additions to the normally
expected output was remarked
above. These were divided into repetitions
and insertions (non-repetitive
additions) and the results of sub-sorting
the two classes are shown in
Tables 5 and 6, respectively. The relative
numbers of each may be compared
in the top lines of the two tables.
Among the total of 449 additions
counted in the 64 performances under
delay, 312, or approximately 70%,
were classified as repetitions. This
classification was assigned conservatively,
and only when the phonetic
resemblance between a given addition
and the preceding utterance was unquestionable.
Delay-induced repetitions
are interpreted herein according
to a conception of speech action in
which the cueing stimuli for the serial
motor responses in speaking are taken
as supplied sequentially by the feedbacks
of the responses themselves (2).
A given response cues the next response,
etc. In this view of speech
control, when the feedback of one
response is delayed so that it coincides
with a second response for which it
is the stimulus, it will trigger a repetition
of the second response, if it
dominates the feedback complex during
the second response.

Table 6. Distributions of total insertions at different time delays.

The habitual incidence of repetitions
in the speech of a small number
of individuals, who have been studied
extensively for that reason in part,
and the comparative rarity of this
type of error in the oral reading of
unselected subjects provide special interest
in this type of error. While
reading the 55-word passage without
time delay, 14 of the subjects yielded
no repetitions. The other two repeated
once each, both times obviously
to correct errors of reading.
Under 0.2-sec delay, the same subjects
averaged 7.3 repetitions, produced at
a rate of one every four seconds. No
subject was free of repetitions, and
one man produced 18.

The two corrective repetitions observed
in the undelayed condition
were ‘with its past — path high above’
and ‘people look for — people look.’
Undoubtedly some of the delay-induced
repetitions have similar functions,
but many of them do not sound
purposeful in the usual sense; consider,
for example, ‘white light — light’ and
‘these take the — the shape.’ The first
row of distributions in Table 5 gives
the results of an attempt to make a
division that would bear on this point.
The basis of sorting was the accuracy
or inaccuracy of the first articulation
of the repeated portion. It was reasoned
that corrective repetitions would
tend to come within the class with inaccurate
first articulation (although
not all of that class are of that type)
and that the class with accurate first
articulation would be composed largely
of non-corrective repetitions (although
it would not include all of
them). The distributions show that
most of the instances were in the latter
class. This result is viewed as
support for the auditory impression
that delay-induced repetitions ‘sound
as if’ a large proportion of them are
direct and automatic responses to misinformation
in the feedback complex,
rather than corrections of other earlier
errors. This interpretation is supported
inferentially by the very numerous
errors of other types that were
not immediately followed by repetitions.

Table 7. Distributions of total miscellaneous errors at different time delays.
Columns: time delay (sec) | total | shifted juncture | slighting

The second section of Table 5 gives
the results of counting the number of
times the repeated portion was articulated
in each instance. This is pertinent
to the nature of the delay-induced
repetition. Let a train of speech
responses A, B, B′, C and their respective
feedbacks a, b, b′, c be assumed,
with B′ denoting an unintended repetition
of B. Let it also be assumed that
significant portions of the feedbacks
are delayed by a time interval such
that the delayed portion of a coincides
with B and that the train of responses
is adequately periodic so that the effective
delay is also one response
thereafter. B′ is interpreted as a response
to the delayed portion of a,
with a dominating the undelayed portion
of b during B. The delayed
portion of b then coincides with B′
and with the undelayed portion of b′.
Since b resembles b′ the feedback
complex during B′ triggers C. The
system does not repeat B again because
the first repetition temporarily restores
the normal phase relationships of output
and feedback, even though two
different versions of the same action
are involved. It will be understood
that speech action is sufficiently aperiodic
and output amplitudes sufficiently
variable that conditions such as
were assumed to produce B′ do not
prevail for long periods of time. If
this were not so, if the system continued
to make the effort to produce
its usual output, and if the delayed
feedback invariably dominated, then
it would be seen that a delay of feedbacks
by one response would yield
A, B, B′, C, C′, D, D′, etc.
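
The idealized case just described can be reproduced by a toy simulation. The sketch below is an illustration under the paragraph's own assumptions (strictly periodic responses, feedback delayed by exactly one response, and the delayed feedback always dominating); the function names are hypothetical, and the rule that each response is cued by whichever feedback dominates during the preceding response is a simplification of the model in reference (2), not a statement of it. An ASCII apostrophe stands in for the prime.

```python
def simulate(targets, n_steps, delayed_dominates=True):
    """Return indices into `targets` for the responses produced.

    The next response is the successor of the response whose feedback
    dominates during the current one: the previous response when the
    delayed feedback dominates, otherwise the current response.
    """
    produced = [0]  # the first response has no earlier feedback to contend with
    while len(produced) < n_steps:
        t = len(produced) - 1
        if delayed_dominates and t >= 1:
            cue = produced[t - 1]  # feedback delayed by one response
        else:
            cue = produced[t]      # normal, undelayed cueing
        nxt = cue + 1
        if nxt >= len(targets):
            break                  # ran out of intended responses
        produced.append(nxt)
    return produced


def label(targets, produced):
    """Mark a response with ' when it repeats the one just before it."""
    out = []
    for i, idx in enumerate(produced):
        repeated = i > 0 and produced[i - 1] == idx
        out.append(targets[idx] + ("'" if repeated else ""))
    return out


targets = ["A", "B", "C", "D"]
print(label(targets, simulate(targets, 7)))                           # delayed feedback dominates
print(label(targets, simulate(targets, 7, delayed_dominates=False)))  # normal cueing
```

Under these assumptions each response after the first is articulated twice and never three times, which is the property appealed to in the text: the repetition itself brings output and feedback back into step until the next response.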

Table 5 shows that almost all repetitions
were simple, two-articulation errors,
(B, B′), and that the remaining
few involved three articulations (B,
B′, B″). Longer repetitions have been
observed at other times, however;
they are uncommon, appear to be
person-linked, and give the impression
of wild and uncontrolled oscillation
of the vocal mechanism.

It was noted above that most substitutions
and omissions were mono-phonemic,
although longer errors became
more numerous as disturbance
increased. The situation is different
for repetitions, as the next distributions
in Table 5 show. First, poly-phonemic
errors outnumbered mono-phonemic
errors at both 0.2 and 0.4
sec. delay. Second, the 0.4-sec. delay
interval elicited about three-fifths of
all long (3-10 phonemes) repetitions.
At that delay, 30% of the errors were
long; at 0.2 sec., 12%. This is the only
classification of error anywhere in the
data which yielded strong indication
of peak incidence elsewhere than at
0.2 sec. The finding is consistent with
the suggestion made elsewhere that
‘units of speech control’ should not
be ‘identified with any of the conventional
units such as the phoneme (2).’

The final section of Table 5 presents
the stress classifications of repetitions.
In each instance the category was determined
by the stress of the repeated
portion at the time of its first articulation.
Only rarely did the stress of the
second or third articulations differ. It
will be seen that the first articulation
was stressed more often than unstressed,
an association similar to that
reported for substitutions. It seems
unlikely that this indicates that stressed
syllables are vulnerable to repetition
because they are stressed. As a matter
of fact, if there is any difference, an
unstressed syllable should be more
vulnerable because the level of its undelayed
auditory feedback is ordinarily
low (by definition) and more
subject to masking by the delayed
signal. As has been said above, it is
believed that heightened ‘stress’ is a
part of the effort to evade articulatory
error. The association of repetition
and obtained stress is interpreted
as reflecting a tendency for both to
occur at times when control of speech
action is most precarious.

Insertions

Table 6 is devoted to the
non-repetitive additions. The instances
ranged from complete words added to
the text to unexplainable, seemingly
random articulations within words. The
general tabulation indicates that
occurrence varied with time delay in
the familiar manner, with the average
subject producing about three instances
at the peak. As may be seen,
most insertions were mono-phonemic,
but longer instances became more frequent
as the delay interval was increased.
Unstressed insertions predominated
strongly, and with this type of
error, the stress of the error itself, or
of the syllable in which it occurred,
was noted. Most insertions occurred
between words in all experimental
conditions.

Miscellaneous Errors

Two kinds of
errors, neither of high frequency,
were combined in this category. The
first has been termed shifted juncture;
an example is [rɑunʼdɑrtʃ] for round
arch. The other was the sort of semi-omission
known as slighting. The total
number of these kinds of errors in all
conditions was 115, about as many as
the number of repetitions under 0.2-sec.
time delay. For the sake of completeness,
they are shown in Table 7,
but no sub-sorting was attempted.

Summary

Sixteen young men read a prose passage
five times each. The time delay
of amplified auditory feedback differed
at each reading, the values being
0, 0.1, 0.2, 0.4 and 0.8 sec. and the
performances were recorded. The
articulatory disturbances were analyzed
and described.

1. In agreement with the previous report,
the general effect of time delay
was to reduce the number of correct
words, increase the total reading time
and retard the correct word rate. Disturbance
was maximal when the delay
was 0.2 sec.

2. Severity of articulatory disturbance,
estimated by number of instances
of error, varied substantially
with both delay interval and type of
error; interaction of interval and type
was also large. In other words, delayed
auditory feedback not only induces
articulatory disturbances, but
selectively varies the number of disturbances
of certain types in relation
to the specific interval of delay.

3. The substitutions induced by delay
tended to involve improbable
phonetic elements, to be mono-phonemic,
and to occur in stressed syllables.
The latter relationship, which
apparently is based on a tendency to
increase vocal effort on syllables in
which errors occur, in an attempt to
avoid them, increases the listener's impression
that most substitutions are
phonetically unlawful.

4. Delay-induced omissions were
high in frequency of occurrence and
fairly substantial numbers of them
were poly-phonemic at the point of
peak disturbance, but otherwise they
were unremarkable.

5. High incidence of additions was
the most distinctive characteristic of
the peak disturbance and about 70%
of the additions were repetitive. The
repetitions were not predominantly of
the corrective type, such as those
heard occasionally in free speech, but
for the most part appeared to be unpurposeful
responses to stimuli in the
delayed feedback. Almost all repetitions
were simple double articulations
and it was pointed out that this would
be expected if the second articulation
temporarily restores the normal
output-feedback relationship. At 0.2
and 0.4 sec, poly-phonemic errors
were in the majority and the length
of the portion repeated varied directly
with the delay interval; short
errors peaked at 0.2 sec, long errors
at 0.4 sec. The first articulation of the
repeated portion of the utterance was
more frequently stressed than unstressed,
seemingly indicating that
both error and increased effort tended
to occur at times of precarious control.

6. Most of the insertions, or non-repetitive
additions, were mono-phonemic,
unstressed and occurred between
words.

7. Two other forms of error,
shifted juncture and slighting, varied
with time delay in the manner of the
other types, but were considerably
less common.

References

1. Fairbanks, G. Selective vocal effects of
delayed auditory feedback. JSHD, 20,
1955, 333-346.

2. Fairbanks, G. Systematic research in
experimental phonetics: 1. A theory of
the speech mechanism as a servosystem.
JSHD, 19, 1954, 133-139.

* Reprinted from the Journal of Speech and Hearing Research, Vol. 1, 1958, pp. 12-22.

** Grant Fairbanks (PhD., State University
of Iowa, 1936) is Professor of Speech at the
University of Illinois. Newman Guttman
(PhD., University of Illinois, 1954) is on
the research staff of the Bell Telephone
Laboratories. The investigation was supported
by the Research Board of the University
of Illinois.

¹ It had been planned to test the interaction
of delay interval and type of error
by analysis of variance, but the test appears
questionable in view of the heterogeneity of
variance (see standard deviations in Table
2). It is believed that the systematic differences
between the means are sufficiently
large for confident interpretation without
formal test.

² It was possible for an error, as defined,
to be polysyllabic, and all of the major
types included a few such instances. In some
of these the stress of the syllables was not
uniform, and these were termed ‘mixed’ in
sorting by stress. The category was not
needed for substitutions.