Fairbanks, Grant. Experimental Phonetics – T02

Selective Vocal Effects
Of Delayed Auditory Feedback *1

Grant Fairbanks **2

A conception of the speaking system
as an automatic control system of the
closed-cycle type has been outlined in
an earlier paper (4). A basic premise
of the formulation was mediation of
the production of the speech output
by auditory, proprioceptive and tactile
feedbacks. Because the mouth is
remote from the ear and because they
are linked outside the head by a gas,
the air-conduction auditory channel
is a convenient point for the experimenter
to intervene into the system,
modify the feedback, alter the system's
operation, and change the output.
Directly or by implication this
experimental method has been used
many times.

In experiments of this type the time
constant of the auditory feedback
pathway is one dimension that can be
controlled over a useful range, and the
resulting disturbances of output have
attracted considerable interest since
three-head magnetic recording systems
have become common. The disturbances,
which are often spectacular,
convincingly demonstrate the
closed-cycle operation of the system
and the significance of the auditory
feedback in particular. Although individual
differences in the disturbance
are large, its universality is remarkable.
It is important, however, that
the system does not break down completely
and that disturbance varies
both in amount and kind.

Under the experimental situation of
delayed auditory feedback the proprioceptive
and tactile feedbacks remain
available. The auditory feedback has
become a mixture which, although it
may be dominated by the experimentally
delayed, amplified signal,
contains significant fragments of undelayed
signal. The total quantity of
such fragments depends upon two relationships
between the mixed delayed
and undelayed feedbacks, phase and
relative sound pressure. With time delay
constant, both relationships change
continuously because of the aperiodicity
of speech, and the fragments of
undelayed feedback are irregular in
frequency of occurrence and variable
in duration. To such a potpourri of
stimuli the controlling elements of the
system bring to bear the capacity of
selective attention to its feedbacks, a
feature somewhat akin to the manual
over-ride in certain automatic machines,
which it uses with variable
effectiveness. Thus the stimuli are not
only basically complex and variable,
but are also to some degree controllable
in these respects by the subject.
The nature of the disturbed output,
furthermore, is often so chaotic
that conventional measurements do
not completely describe it. As a result,
the essentially simple situation is complicated
in detail and the results of it
difficult to interpret.

Since the dimension of feedback
that has been altered experimentally
is time, it would be expected that time
would be changed in the output. It
might be predicted that rate of output,
for example, would be slowed to accommodate
the system's relative sluggishness,
and this has been shown by
several investigators. In view of the
feedback mixture described above and
the ability of the system to change
its mode of operation, however, the
temporal accommodation is incomplete.
Probably the most significant
effect arises from the fact that the
delayed auditory feedback misinforms
the system about its success in effecting
and in ordering its intended output
units, thus impairing its basic
product. Among the various possibilities
for resistance is increased vocal
intensity, a common reaction to noise.
Furthermore, the act of resistance is
presumably accompanied by more
muscular tension in the effector, from
which increased vocal intensity and
heightened fundamental frequency
might ensue along the lines of well-known
relationships. From these considerations
it would be expected that
the output would reflect various kinds
of disturbances, some of them irreducible,
some consequent upon the
attempt at resistance, some the result
of accepting one form of disturbance
for the sake of avoiding another.

For the student of speech who experiences
delayed auditory feedback
there is no necessity to justify serious
experimental investigation. Studies
such as those of Black (2) and Peters
(9), and the present experiment, for
example, ultimately should contribute
to specification of the unit of speech
control (4). In combination with
other experimental variables, delayed
auditory feedback supplies a tool for
inquiry into the relative roles of the
various sensory feedbacks and their
interactions, conceivably with application
also to motor systems of the body
in which experimental control may
not so readily be contrived. The
similarity of the disturbances to non-laboratory
speech disorders is intriguing,
as Lee has pointed out (7). The
experimenter can cause a normal
speaker's output to become like that
of a ‘stutterer,’ or an individual with
a deafness, a defect of the peripheral
speech mechanism, or a lesion of the
central nervous system, a fact that,
when thoroughly understood, is likely
to have theoretical significance for
speech pathology.

In the present experiment the main
concern has been to explore the characteristics
of the disturbance as it
varies over representative intervals of
time delay, with particular reference
to the shapes of the curves and the
locations of peak disturbance in articulation,
duration, intensity and frequency.
The experiment was thought
of as a necessary mapping of new territory,
preliminary to more detailed
explorations of particular areas. An
attempt was made, therefore, to employ
conditions and develop measurements
that would facilitate the procedures
of later experiments as well
as suggest their directions.

Procedure

General Conditions. The basic plan
was to study speech output in response
to selected intervals of time
delay, each subject being required to
read the same prose material under
each of the auditory conditions. The
subjects were 16 young, male, lower-division
college students, chosen at
random from a pool of potential subjects
who were vocally mature, presented
no clinical-level speech deviations,
habitually spoke the General
American dialect, had no history of
hearing loss, and had not previously
experienced delayed auditory feedback.
The material read was the Rainbow
Passage (5, p. 168) in its usual
experimental form, including the 55-word,
four-sentence passage proper,
preceded by the 17-word initial sentence
and followed by the 26-word
final sentence.

In the basic experiment five auditory
conditions, all employing amplified
feedback, were used. In the
first condition the feedback was undelayed;
in the remaining four it was
delayed, respectively, by .1, .2, .4 and
.8 sec. The subjects were assigned at
random to four sub-groups of four
subjects each, and the order of the
delayed conditions was rotated among
these sub-groups after the manner of
a 4 x 4 diagonally systematic Latin
square, the order shown above being
used for the first sub-group. To these
five conditions a formal pre-experimental
condition, involving neither
amplification nor headset, and a post-experimental
condition, which repeated
the undelayed, amplified experimental
condition, were added.
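
The rotation of the delayed conditions can be illustrated with a minimal sketch. It assumes that "diagonally systematic" means each sub-group's order is a cyclic shift of the first sub-group's order; the function name is hypothetical and the actual orders used for sub-groups two through four are not stated in the text.

```python
def cyclic_latin_square(conditions):
    """Build an n x n Latin square in which each row is the previous row
    shifted one place to the left (assumed reading of 'diagonally systematic')."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Delays in seconds, in the order given for the first sub-group.
delays = [0.1, 0.2, 0.4, 0.8]
square = cyclic_latin_square(delays)

for group, order in enumerate(square, start=1):
    print(f"sub-group {group}: {order}")
```

In such a square every delay occupies every serial position exactly once across the four sub-groups, which is the property that balances order effects.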

Instructions were general, informal
and minimal, each subject merely being
instructed to ‘read as you usually
do.’ Prior to the experiment, in order
that he might become familiar with
the conditions, each subject read the
passage three times under the following
conditions in order: without headset;
with amplified undelayed feedback;
with amplified feedback delayed
by .25 sec. After the last of these
readings he was informed, in effect,
that ‘when this happens during the
experiment you should continue to
read as you usually do.’ The attempt

[Figure 1: block diagram. Labeled components: microphone, rec. amp., rec. head, tape, play. head, play. amp., atten., power amp., earphones; switch positions U and D.]

Figure 1. Arrangement of apparatus. Switch
positions: U, undelayed; D, delayed.

to do so, however, was not stressed as
an objective, and in all instances the
experimenter carefully refrained from
any specific reference to speech, i.e.,
to articulation, rate, etc., or to the fact
that time delay was an experimental
condition. Once the formal trials were
begun the subject proceeded through
the seven readings without further
instruction or interruption, reading in
response to cues from a signal light
with 10-15 secs. between readings.
The experiment was conducted in a
two-room laboratory, consisting of a
sound-treated chamber and associated
control room.

Apparatus. The instrumentation has
been described previously (6). Essentially
it consisted of a special three-head
tape recorder-reproducer with
continuously variable distance between
the recorder and playback
heads. A block diagram of the system
is shown in Figure 1, the components
being as follows: microphone, Altec
21B; earphones, Permoflux PDR-10 in
1505 cushions; recording and playback
amplifiers, Magnecord PT6-R
and PT6-P, respectively; recording
and playback heads, Magnecord 91-6016;
power amplifier, Goodell ATB-3,
16 watts; attenuator, Hewlett-Packard
350A; tape transport mechanism
(not shown), adapted Magnecord
PT6-A. The system provided
two matched, independently variable,
high-fidelity channels, supplying
either undelayed or delayed outputs.

The microphone, mounted on a
boom stand, was positioned six inches
from the mouth, approximately the
mouth-ear distance, and level with the
chin. The earphones, wired for conventional
‘two-eared’ listening, were
adjusted at the start of the experiment
and remained undisturbed throughout.
The reading passage was typed on a
card and placed at a convenient position
for the subject by means of a
wire holder fastened to the microphone
boom. Subjects were seated.
Recording tape speed was 15 ips.
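
Since the delay is produced by the tape path between the recording and playback heads, the spacing implied by each delay follows from the tape speed. The sketch below assumes the simple relation delay = path length / tape speed; the function name is hypothetical, and the real instrument may have differed in detail from this idealization.

```python
TAPE_SPEED_IPS = 15.0  # inches per second, as stated in the text


def head_spacing_inches(delay_sec, tape_speed_ips=TAPE_SPEED_IPS):
    """Tape path length between record and playback gaps for a given delay
    (assumes delay = path length / tape speed)."""
    return delay_sec * tape_speed_ips


# Delays used in the experiment, plus the .25 sec familiarization delay.
for delay in (0.1, 0.2, 0.25, 0.4, 0.8):
    print(f"{delay:.2f} sec -> {head_spacing_inches(delay):.2f} in. of tape")
```

Under this assumption the experimental delays correspond to tape paths between roughly an inch and a half and one foot.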

To avoid limiting as much as possible,
all amplifiers were operated well
below capacity. Particular attention
was given to preliminary establishment
of a recording gain that would
not over-modulate the tape at high
vocal intensities, as observed on the
VU meter of the recording amplifier,
and to a playback gain that met similar
standards, with acceptable signal-to-noise
ratios at both stages. Once
adjusted and matched, the two channels
remained unchanged throughout
the experiment for all subjects and
trials, as verified routinely by means
of test signals. With the system as
used the amplified auditory feedback
signal, delayed or undelayed, varied
essentially freely with the vocal output
over the usual wide dynamic
range of connected speech. The feedback
level thus was determined by
both vocal output and amplification,
while variation, between subjects, between
trials and within trials, was a
function of the vocal output. In short,
variations were subject-determined
and the amplified levels were linearly
related to the unamplified levels, exceeding
them by a constant amount.
The amount of the excess was specified
in the following manner.

Six experienced male observers with
normal hearing by audiogram were
placed in the experimental situation
described for the present experiment,
furnished with undelayed feedback
(switch in position U, Figure 1), and
given control over the attenuator. A
sound level meter (General Radio
729-B; flat setting) was placed within
viewing distance, its microphone located
three feet from the lips. The
observer sustained a specified vowel,
matched in pitch to an oscillator tone
of 130 cps, at a 60 db level according
to the sound level meter, the performance
being confirmed by the experimenter.
Under these conditions
the earphones presented amplified
feedback essentially in phase with the
unamplified feedback, but the two
differed qualitatively. By simultaneously
manipulating the attenuator
the observer could raise the amplified
signal until it masked the unamplified
signal, lower it below threshold, or adjust
it so that both could be heard.
Instructions were to vary attenuation
until the two feedbacks sounded
‘equally loud.’ Time was not limited.
The procedure was followed with
[i], [æ], [ɑ] and [u], yielding 24
determinations.¹ The mean setting was
regarded as a reference and the gain
measured. The delayed channel was
matched by adjusting the gain of the
playback amplifier with the tape in
motion. Either setting could be duplicated
at any time.

A gain of 30 db over the reference
was used for the present experiment.
This arbitrary amount was decided
upon after considerable trial and introspection
as appropriate for the purposes
of the study. In the opinion of
several experienced individuals this
value elicited substantial speech disturbances
in the delayed condition,
but did not approach the threshold of
discomfort so closely as to place a
ceiling on vocal intensity (estimated
from the loudness of shouts), while in
the undelayed condition it did not
cause speech to sound so loud to a
subject that he tended to lower vocal
intensity. The procedure provided for
internal checks upon such judgments
which, as will be seen, confirmed the
appropriateness of the choice.
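
For orientation, the 30 db gain can be restated as a sound-pressure ratio. This is a minimal sketch assuming the usual convention for pressure levels (20 times the common logarithm of the pressure ratio); the function name is hypothetical.

```python
def db_to_pressure_ratio(db):
    """Convert a level difference in db to a sound-pressure ratio,
    assuming the 20*log10 convention for pressure."""
    return 10 ** (db / 20)


gain_db = 30  # gain over the equal-loudness reference, as stated in the text
print(f"{gain_db} db corresponds to a pressure ratio of about "
      f"{db_to_pressure_ratio(gain_db):.1f}")
```

On that convention the amplified feedback exceeded the equal-loudness reference by a pressure factor of a little over thirty.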

Attention was called above to fragments
of unamplified, undelayed feedback
heard under conditions of delayed
feedback. The masking of the
former by the latter is some function
of the ratio of their levels, as well as
of their phase differences, and it is
suggested that it has an important
bearing upon interpretation of experimental
results and upon inter-comparison
of the results of different
experiments. Certain other experiments
have had other purposes and
have controlled feedback level in
other ways. Black (2), Atkinson (1),
Tiffany and Hanley (11) and Spilka
(10), to cite representative examples,
were concerned with sensation level
and held it approximately constant;
the first two experimenters employed
volume limiters, the other two varied
gain controls. Under those conditions
an increase in vocal intensity raises the
unamplified feedback level while the
amplification system limits the amplified
level; the amplification, in effect,
is inversely related to the original
vocal output. From this more or less
systematic change of an independent
variable it would be expected that
significant non-linearities of the response
would ensue, and minimization
of the possibility was regarded as important
in the present experiment.

Measurements. The sample selected
for measurement was the following
sentence: ‘There is, according to
legend, a boiling pot of gold at one
end.’ This sentence is the fourth of
six in the Rainbow Passage, comprising
the 52nd through 64th of the 98
words. Measurements were made of
articulation, duration, sound pressure
and fundamental frequency.

To facilitate study of articulation
the samples were re-recorded on disc
equipment (Presto 8DG turntable,
1-D head, 92-A amplifier). Two measurements
were made, the first of
which involved analysis of phonetic
errors. First, a ‘reference phonetic
transcription’ was made. This was
broad and lenient in the sense that it
provided considerable latitude among
articulations which would be regarded
as acceptable; thus nine of the 13
words had more than one acceptable
alternative, while two words had four
alternatives each. The intention was
to set up a standard which would be
regarded as not having been achieved
by an obtained articulation only when
the latter was obviously outside a
wide region of acceptability. A detailed
transcription of each sample
then was made, compared to the reference,
and instances of articulatory
error counted.

Since delayed feedback often results
in chaotic articulation it was necessary
to be arbitrary about definition
of instance of error. Each element
symbolized in the reference transcription
was regarded as a potential point
for such an instance, but when a series
of errors occurred in consecutive elements
only one instance was counted
for the series. Each change-point between
consecutive elements was also
regarded as a potential locus of error.
When extraneous material was inserted
at such a point one instance of
error was tallied, without regard to
the amount or nature of the inserted
material, the place of insertion, or the
error status of the elements which
bounded it. Each case of combined
error (e.g., repetition of a syllable in
which an element also was omitted)
was tallied only once. Instances of
error so defined are referred to below
as articulatory errors. Reliability was
assessed by carrying out the complete
procedure twice with seven 55-word
samples (the complete four-sentence
passage). The two counts were made
one month apart and included both
transcription and error analysis. Self-agreement
in identifying specific instances
of error ranged from 93 to
99%.
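
The counting rule for instances of error can be sketched as follows. The representation is an assumption made for illustration: each element of the reference transcription is marked as erroneous or not, a run of consecutive erroneous elements counts once, and each insertion point counts once. The sketch also assumes that an insertion does not split a run of element errors, which the text does not specify; the function name is hypothetical.

```python
from itertools import groupby


def count_error_instances(element_is_error, n_insertions=0):
    """Count instances of articulatory error.

    element_is_error: one boolean per element of the reference transcription.
    n_insertions: number of change-points at which extraneous material occurred.
    A series of errors in consecutive elements is tallied as one instance.
    """
    runs = sum(1 for is_error, _ in groupby(element_is_error) if is_error)
    return runs + n_insertions


# Hypothetical sample: errors on elements 2-3 and on element 5, plus one insertion.
flags = [False, True, True, False, True, False]
print(count_error_instances(flags, n_insertions=1))
```

Here the two consecutive element errors are counted as a single instance, so the hypothetical sample yields three instances rather than four.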

The second measure of articulation
involved a count of the number of
words in a given sample which were
judged to be ‘correctly’ articulated.
The procedure was to listen to the
recording of each individual word as
many times as necessary and, using
copies of the text, to tally it as correct
or incorrect without analysis of the
type of error. The standard was that
of acceptability of articulation and
pronunciation interpreted liberally,
and each word was presumed correct
unless clearly incorrect.

Although it involved listening to
more than 1400 words plus numerous
additions to the basic text, the method
proved to be reasonably rapid and
decisions could be made confidently
in most cases. The only necessity for
arbitrary practice arose in connection
with material inserted between text
words. It was decided to reduce the
count of correct words by one in each
such instance, regardless of the
amount of inserted material, except
when both of the text words bounding
the insertion were themselves incorrect.
The measure is hereinafter
termed correct words, and the number
of total words minus the number
of correct words as error words. For
estimation of reliability, advantage
was taken of the fact that the measure

Table 1. Group means for measures of articulation, duration, sound pressure and fundamental
frequency.

[Table 1 (values not reproduced). Means by time delay (sec.) under four sub-headings. Articulation: articulatory errors (N), error words (N), correct words (N). Duration: total sample (sec.), phonation/N phons., phonation/total sample. Relative sound pressure: mean peak (db). Fundamental frequency: mean (cps), S.D. (cps).]

[Figure 2: four curves plotted against time delay (sec.): articulatory errors (N), total duration (sec.), sound pressure (db), fundamental frequency (cps).]

Figure 2. Variation of number of articulatory
errors, total sentence duration,
mean sound pressure and mean fundamental
frequency with time delay.

was also being made in connection
with four other experiments involving
delayed feedback which followed the
present experiment. Each of the four
judges for those experiments repeated
the complete procedure a second time
after several weeks. The four samples
ranged from approximately 1200 to
8200 total text words plus insertions,
and covered a variety of time delays,
subjects, texts and types of speaking.
The self-reliability coefficients were
.84, .94, .96 and .98.

Measurements of duration and relative
sound pressure were made from
tracings of a graphic sound level recorder
(Sound Apparatus HPL-E,
0-50 db potentiometer) for which the
original tape recordings furnished input.
Fundamental frequency was
measured by means of oscillograms
made on a locally constructed instrument
similar to that described by
Cowan (3). The samples were re-recorded
at 78 rpm for the latter
purpose, using the disc equipment
mentioned above.

Results

Basic Measurements.² Group means
for representative measures of the
basic variables are presented in Table
1. The first row of means under each
sub-heading was used for the corresponding
curve in Figure 2, where
ordinate units were chosen to equate
the four ranges graphically so that the
shapes of the curves within the obtained
ranges might be intercompared;
the common abscissa is time delay.

In Table 1 it will be seen that instances
of articulatory error were
most numerous at a time delay of .2
sec, where they outnumbered those
for the undelayed condition approximately
3.5 times. As shown in Figure
2, the peak is prominent and the curve
is skewed toward the longer intervals
of delay. Number of error words
(Table 1) varied similarly, although
the peak is less prominent. Approximately
80% of the total words were
correct in the undelayed condition,
approximately 60% in the .2 sec. condition,
for an over-all reduction of
about one-fourth.

Variation of total sentence duration,
the second curve of Figure 2,
closely paralleled that for errors, the
peak at .2 sec. being in general agreement
with the finding of Black (2),
who reported maximum duration of
five-syllable phrases at .18 sec. The
maximum effect of time delay under
the present conditions was mean durational
increase of approximately 40%.
In order to explore the possibility that
time delay might operate differently
upon the durations of phonations and
pauses certain additional measurements
were made, of which those pertaining
to phonation are shown in Table 1.
There it will be noted that the mean
duration of phonations (i.e., uninterrupted
periods of phonation) varied
in a manner generally like that for
total sentence duration and that the
proportionate amounts of total time
devoted to phonation and pause did
not change radically with time delay,
but firm conclusions are not warranted
because the one-sentence
sample was undoubtedly too short to
furnish a good answer to this question.

The effect of time delay upon vocal
sound pressure was to produce a mean
increase of 10 to 12 db which, as
shown in Figure 2, did not change
substantially over the range of time
delays studied.³ The data indicate that
the vocal response to time delay is
quite different in this dimension from
what it is in articulation and duration,
the ‘peak,’ if it may so be called, being
very broad and not defined within
the range of the experimental delays.
The large, undifferentiated change
suggests a general vocal reaction to
delayed feedback, which may be interpreted
as reflecting the increased
muscular tension implicit in the effort
to maintain system control.

Although the elevated intensity was
not auditorily useful to the subject in
this particular experiment, beyond the
delayed pauses, a conditioned response
to interference may have been a factor.
It is not believed that the plateau
can be attributed to either vocal, instrumental,
or auditory limiting. The
subjects gave no appearance of extreme
effort, and informal trial by the
experimenter and others showed that
the obtained vocal levels were considerably
below maximum. The instrumental
precautions and the relationships
to the threshold of discomfort
have been reviewed above (Procedure).
Study of the individual
graphic pressure records revealed no
signs of over-modulation or of restriction
of peak variation.

Results for the remaining basic
measurement, fundamental frequency,
are shown in the last section of Table
1.⁴ The nature of the changes in central
tendency (‘pitch level’) may best
be appreciated in Figure 2, which
shows a curve similar to that for
sound pressure, namely, an elevated
plateau which changed little as a
function of delay interval. The elevation
is substantial, amounting to about
three and one-half semitones, and
could readily be perceived in the recordings.
The change is interpreted
as ensuing from increased muscular
tension in the vocal mechanism accompanying
the attempt to resist experimental
interference with the response,
and paralleling the changes in
sound pressure.
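
The size of an elevation of three and one-half semitones can be expressed as a frequency ratio. The sketch below assumes the equal-tempered definition of the semitone (twelve to the octave); the 120 cps baseline is a hypothetical value chosen only for illustration and is not taken from Table 1.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio for an interval in equal-tempered semitones."""
    return 2 ** (semitones / 12)


def semitones_between(f_low_cps, f_high_cps):
    """Interval in semitones between two frequencies in cps."""
    return 12 * math.log2(f_high_cps / f_low_cps)


ratio = semitones_to_ratio(3.5)
baseline_cps = 120.0  # hypothetical undelayed mean, for illustration only
print(f"ratio {ratio:.3f}: {baseline_cps:.0f} cps -> {baseline_cps * ratio:.1f} cps")
```

On this definition the elevation amounts to a rise of somewhat more than one-fifth in mean fundamental frequency.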

The last row of means in Table 1
shows that the mean standard deviation
of frequencies used was altered
little by time delay. This finding is
typical of the negative results of several
other standard measures of variability
(range, number of changes in
direction, inflection, etc.) that were
made and not shown. It should not be
concluded, however, that intonation
was unchanged by time delay. Plots
of frequency versus time were made
for each sample and disclosed that the
rather patterned curves for the undelayed
condition were radically
changed in the delayed conditions.
The intonational chaos was so great,
in fact, that careful study resulted in
no sensible hypotheses upon which to
base measurements to reveal its nature.
It was concluded that analysis of this
factor should await later experiments
designed for the specific purpose.

In view of the above findings it
would appear that articulatory errors
and increased duration may be regarded
as direct effects of delayed
auditory feedback, while increased
mean sound pressure and fundamental
frequency are indirect effects. It appears
that the direct effects are determined
by the phase relationships of
input and feedback. It also seems
proper to assume that the speaking
system operates on the basis of units
of speech control (4) which vary in
period. Thus when time delay is held
constant, the phase relationships,
changing with changes in unit period,
will become critical with a frequency
determined by the frequency of
occurrence of units which have periods
critically related to the fixed
time delay. In other words, the
amount of direct disturbance of the
system produced by a given time
delay will be proportional to the number
of units of a given period which
occur in a given series of units. The
idea is advanced, therefore, that disturbance
functions for the direct
effects, such as the upper two curves
of Figure 2, provide a basis for inferring
the general characteristics of
a frequency distribution of the periods
of the hypothetical units of speech
control. The skewed shape of the distribution
is plausible. Thus Verzeano
(12, 13), measuring only the time
parameter, defined a ‘unit’ of the
speech output as the series (of phonations
and pauses or of uninterrupted
phonation) produced between two

Table 2. Group means for measures of rate in number per second.

[Table 2 (values not reproduced). Rates in number per second of total words, articulatory errors, error words and correct words, by time delay (sec.).]

pauses of specified duration or greater
and derived a group of distributions
which all were skewed, although the
degree of skewness varied with the
defining pause. It is not, of course,
suggested that the unit of speech control
should be identified with the
Verzeano unit, or with any of the
usual phonatory, phonetic or linguistic
units. It seems more likely that the
unit of control, although related to
such units, is more fundamental, and
that it may consist, at any given time,
of one or a number of such conventional
units, or of a fraction of the
smallest phonetic unit. As for assignment
of abscissa values to the hypothetical
distribution, it is not believed
that operation of the system is sufficiently
well understood as yet to
permit it.⁵

It is interesting to conjecture, as
have Black (2) and Lee (8), about
possible relationships between the interval
of time delay that yields maximum
disturbance and the durations of
conventional speech output units such
as syllables, words, phonations, etc.,
about which there is some information.
The criterion of disturbance,
however, is not yet established, the
data are meager, and the necessary
qualifications are numerous. The mode
duration of such a unit is the proper
statistic for the hypothesis, and although
some means are available, they
have limited serviceability because of
the large skewness of the distributions.

[Figure 3: relative rate (R/RU) plotted against time delay (sec.) for articulatory errors, error words, total words and correct words.]

Figure 3. Variation of rate of articulatory
errors, error words, total words and correct
words with time delay. Ordinate is
rate/undelayed rate.

Measurements of Rate. The individual
measurements of articulation and
duration were used as a basis for
various measures of rate in units per
second, with results summarized by
the means presented in Table 2.⁶ The
sets of means were the basis for
Figure 3, where they have been
plotted to a common ordinate, relative
rate, using the mean rates for the
undelayed condition as respective constants.
The curve for rate of total
words follows immediately from the
durations, the measure being proportional
to the reciprocal of duration. It
shows the general retardation to 60-75%
of the usual rate of output. As
plotted in Figure 3 the curve represents
the effects upon the rate of any
text unit.
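
Because the text is fixed, the relative rate of total words reduces to the ratio of undelayed to delayed duration. A minimal sketch follows; the two durations are hypothetical values chosen so that the delayed reading is 40% longer, the approximate maximum increase reported above, and the function names are likewise hypothetical.

```python
N_WORDS = 13  # words in the measured sentence


def word_rate(n_words, duration_sec):
    """Rate of output in words per second."""
    return n_words / duration_sec


def relative_rate(rate, undelayed_rate):
    """Rate expressed as a proportion of the undelayed rate (R/RU)."""
    return rate / undelayed_rate


undelayed_sec = 4.0  # hypothetical undelayed duration
delayed_sec = 5.6    # hypothetical duration, 40% longer

r = relative_rate(word_rate(N_WORDS, delayed_sec), word_rate(N_WORDS, undelayed_sec))
print(f"relative rate of total words: {r:.2f}")
```

A 40% increase in duration thus corresponds to a relative rate of about .71, which lies within the 60-75% range stated in the text. The number of words cancels, which is why the same curve applies to any text unit.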

The upper curves of Figure 3 are
of interest not only because they show
the nature of changes in rate of error,
but also because they allow comparison
of the relative amounts of disturbance
in articulation and duration. It
was suggested above that disturbance
of articulation was probably the more
important, although both were termed
direct effects of time delay. In discussing
Figure 2 the similarities of the
two curves were noted, but it will be
remembered that the ranges were
equated in plotting, so that those
curves do not depict comparative
proportionate changes relative to the
undelayed condition. Both numerator
and denominator of the error rate
ratio are increased by time delay, and
if both were to increase by equal proportions
(e.g., twice as many errors
in double the time), the ratio would
remain unchanged, while departure
from the undelayed ratio would indicate
imbalance between the two forms
of disturbance.
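
The argument about numerator and denominator can be checked numerically. In the sketch below the counts and durations are hypothetical; the second case merely combines the approximate group figures reported above for the .2 sec. condition (errors about 3.5 times, duration about 1.4 times the undelayed values), and a ratio of group means need not equal the mean of individual ratios.

```python
def relative_error_rate(errors_u, duration_u, errors_d, duration_d):
    """Error rate under delay divided by the undelayed error rate."""
    return (errors_d / duration_d) / (errors_u / duration_u)


# Twice as many errors in double the time: the ratio is unchanged.
balanced = relative_error_rate(10, 4.0, 20, 8.0)

# Errors x3.5 with duration x1.4 (hypothetical counts built from those factors).
unbalanced = relative_error_rate(2, 4.0, 7, 5.6)

print(f"balanced: {balanced:.2f}, unbalanced: {unbalanced:.2f}")
```

The first case returns unity; the second returns about 2.5, the kind of departure from the undelayed ratio that indicates articulation was disturbed proportionately more than duration.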

The top curve of Figure 3 shows
that the rate with which instances of
error occurred increased for all delayed
conditions and that the effect
upon articulation was so relatively
powerful that the curve has the familiar
shape even though the concurrent
durational increases were such as to
tend to flatten it. The difference between
this curve and the curve for
total word rate in the same figure is
entirely in the numerator. It seems appropriate,
therefore, to suggest that,
of the two direct effects of delayed
auditory feedback, articulatory error
is primary and increased duration
secondary. Figure 3 also shows a
curve for rate of error words, which
is not inflected by delay interval, but
is greater than one for all delayed
conditions.

The final set of means in Table 2
and the lowest curve of Figure 3
relate to a measure of the efficiency
type, correct word rate, which indicates
disturbance inversely. Since
numerator and denominator do not
tend to cancel each other as in error
rate, the ratio becomes smaller if
number of correct words is reduced,
if duration is increased or if both occur,
and is weighted by both direct effects
of time delay. Thus it seemed to
have promise for prediction of total
disturbance, as might be judged by an
independent observer, for example. It
also accounted for an observation that
was made in study of individual recordings
and measurements: total disturbance
in a given sample involved
in many cases a seeming compromise
with the interference, in which duration
was ‘traded’ for articulation, or
vice versa, but in proportions that
varied from case to case (i.e., imperfect
intercorrelation of articulation
and duration). Apart from the effects
of delayed feedback, it was thought
that the concept might have some usefulness
in objective measurement of
speaking performance. The possibility
of application to this long-standing
problem was especially interesting,
because determination of the basic
quantities had been found to be relatively
easy and reliable, as shown
above.

The curve for correct word rate in
Figure 3 has the expected location
below that for total word rate. It is
inversely related to the error and
duration curves of Figure 2, and similar
in shape. As the measure has been
expressed for plotting in Figure 3,
the ordinate may be read as an index

Table 3. Representative mean differences between pre- and post-experimental undelayed conditions
and experimental amplified, undelayed condition.

[Table 3 (values not reproduced). Conditions: pre-experimental unamplified; post-experimental amplified. Measures: error words (N), sample duration (sec.), correct word rate (N/sec.), mean peak sound pressure (db), mean fundamental frequency (cps).]

of articulatory efficiency, interpreted
as indicating degree of achievement of
an arbitrary standard established by
the correct word rate in the undelayed
condition, or it may be subtracted
from 1.0 to yield a disturbance
measure. The ratio is applicable to any
text for which the standard of correct
word rate has been established, and
becomes sensitive to small articulatory
disturbances if the standard number
of correct words is taken as equal to
the total number of words. The duration
standard might be determined
empirically, predicted from existing
data or specified arbitrarily. Such a
ratio may be generalized as:
I = (Wc × Dn)/(Wt × Do),
where Wc is obtained number of
correct words, Dn is normal duration,
Wt is total number of words,
and Do is obtained duration, Dn/Wt
being a constant for the text. It will be
observed that inter-comparison of different
texts is possible by means of
this expression. The index is proposed
strictly as a device for objectifying
this aspect of the speech output, and
its validity in measuring goodness of
performance as judged by a listener
is not here in question. Study of that
problem and refinement of the measure
are subjects of a later report.
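
As a worked illustration of the index, the following Python sketch evaluates the ratio for one hypothetical sample. The function and argument names are introduced here for illustration, and the numbers are invented rather than taken from the tables. The standard number of correct words is taken as equal to the total number of words, as suggested above.

```python
def articulatory_efficiency_index(correct_words: int, normal_duration: float,
                                  total_words: int, obtained_duration: float) -> float:
    """I = (Wc * Dn) / (Wt * Do), with durations in seconds."""
    if total_words <= 0 or obtained_duration <= 0:
        raise ValueError("total_words and obtained_duration must be positive")
    return (correct_words * normal_duration) / (total_words * obtained_duration)


# Hypothetical 100-word text with a normal (standard) duration of 30 sec.
Wt, Dn = 100, 30.0
# Hypothetical delayed-feedback sample: 90 correct words in 50 sec.
Wc, Do = 90, 50.0

index = articulatory_efficiency_index(Wc, Dn, Wt, Do)
disturbance = 1.0 - index

print(f"index of articulatory efficiency: {index:.2f}")
print(f"disturbance measure (1.0 - I):    {disturbance:.2f}")
```

The same value results from dividing the obtained correct word rate, Wc/Do, by the standard rate, Wt/Dn, which is why a sample that meets the standard exactly yields an index of 1.0.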

Pre- and Post-experimental Conditions.

It has been mentioned above
(Procedure) that the five conditions
of the main experiment were preceded
and followed, respectively, by an
unamplified, undelayed condition and
an amplified, undelayed condition.
Measurements of these samples were
also made, with the general result that
all differences between them and the
samples from the amplified, undelayed
condition of the experiment were
small. Differences between means are
shown in Table 3 for representative
measures. The purpose of the pre-experimental
condition was to furnish
a control for the amplification factor,
particularly with respect to vocal intensity,
which would have been attenuated
had amplification in the undelayed
condition of the experiment
been excessive. The comparison does
not indicate that such occurred. The
post-experimental condition was included
to allow perseverative effects
of time delay, if any, to be revealed.
Although the possibility had no influence
upon design of the experiment,
the condition was simple to add. The
results, however, showed no meaningful
trends. Needless to say, these
negative findings are not to be interpreted
as indicating either that amplification
of auditory feedback has no
effect upon speech output or that
exposure to delayed auditory feedback
has no after-effect.

Summary

Subjects read a prose text under five
different stimulus conditions in which
the speech output was amplified by a
constant amount, delayed by 0, .1, .2,
.4 and .8 sec, respectively, returned
to the ears via earphones, and mixed
with the undelayed, unamplified,
lower-level auditory feedback. Measurements
of the responses yielded the
following results:

1. Delayed auditory feedback resulted
in various types of speech disturbance,
among which were increased
number of articulatory errors,
longer duration, greater sound pressure
and higher fundamental frequency.

2. The effect was found to be selective
within the range of time delays
employed. Disturbance curves for
articulation and duration were positively
skewed with prominent peaks
at .2 sec; for sound pressure and frequency
they were essentially uninflected
throughout the range. Disturbances
of articulation and duration
were interpreted as direct effects,
those of sound pressure and frequency
as indirect effects. The finding should
be investigated further at other ratios
of the amplified and unamplified feedback
levels.

3. It was suggested that the curves
for the direct effects might provide
a basis for inferring the general shape
of the distribution of a hypothetical
unit of speech control along a time
abscissa. The curves need extension
beyond the present range and more
precise definition at the short intervals
of time delay, particularly around
the point of peak disturbance.

4. Delayed auditory feedback increased
the rate of articulatory error,
indicating greater effect upon articulation
than upon duration and supporting
an interpretation of articulatory
disturbance as the primary effect of
time delay. This interpretation should
be tested in other experiments. More
detailed analysis of the articulatory
disturbances would appear to be fruitful.

5. Correct word rate was proposed
as a single, inverse measure of disturbance
that combines both direct
effects of delayed auditory feedback.
It was shown that the measure may
be generalized in the form of an index
of articulatory efficiency which is
convenient, reliable, and has possibilities
of larger usefulness as a measure
of speech.

References

1. Atkinson, C. J. Adaptation to delayed
side-tone. JSHD, 18, 1953, 386-391.

2. Black, J. W. The effect of delayed
side-tone upon vocal rate and intensity.
JSHD, 16, 1951, 56-60.

3. Cowan, M. Pitch and intensity characteristics
of stage speech. Arch. Speech, 1, 1936, Suppl.

4. Fairbanks, G. Systematic research in
experimental phonetics: 1. A theory of
the speech mechanism as a servosystem.
JSHD, 19, 1954, 133-139.

5. Fairbanks, G. Voice and Articulation
Drillbook. New York: Harper, 1940.

6. Fairbanks, G. and Jaeger, R. A device
for continuously variable time delay
of headset monitoring during magnetic
recording of speech. JSHD, 16, 1951,
162-164.

7. Lee, B. S. Artificial stutter. JSHD,
16, 1951, 53-55.

8. Lee, B. S. Effects of delayed speech
feedback. J. acoust. Soc. Amer., 22,
1950, 824-826.

9. Peters, R. W. The effect of changes
in side-tone delay and level upon rate
of oral reading of normal speakers.
JSHD, 19, 1954, 483-490.

10. Spilka, B. Some vocal effects of different
reading passages and time delays
in speech feedback. JSHD, 19,
1954, 37-47.

11. Tiffany, W. R. and Hanley, C. N.
Delayed speech feedback as a test for
auditory malingering. Science, 115, 1952,
59-60.

12. Verzeano, M. Time-patterns of speech
in normal subjects. JSHD, 15, 1950,
197-201.

13. Verzeano, M. Time-patterns of speech
in normal subjects. II. JSHD, 16, 1951,
346-350.

1* Reprinted from the Journal of Speech and Hearing Disorders, Vol. 20, 1955, pp. 333-46.

2** Grant Fairbanks (Ph.D., State University
of Iowa, 1936) is Professor of Speech
at the University of Illinois. The investigation
was supported by the Research Board
of the University of Illinois. Acknowledgement
is made to Newman Guttman, Dorothy
A. Huntington, Robert S. Brubaker,
Forrest M. Hull and Evan P. Jordan for assistance
in measurement.

31 Although the purpose was not to study
the self-loudness of speech, certain observations
are of interest. The range of attenuator
settings was compact; the point of
equal loudness was close to the point at
which the two feedbacks could not be distinguished,
and removal of five db attenuation
prevented confident detection
of the unamplified signal; at the point of
equal loudness connected speech sounded
lifelike.

42 Statistical treatment was rudimentary
throughout. Inspection of the raw data revealed
no necessity for more elaborate
analysis, considering the purposes of the
experiment. For extreme conditions, such
as zero and .2 sec. delay, the group distributions
were almost discrete for most
measures. From the trend standpoint every
one of the 16 subjects presented more articulatory
errors, longer duration, greater
sound pressure and higher fundamental frequency
in the .2 condition than in the undelayed
condition. The differences between
means, as will be shown, were absolutely
and proportionately large.

53 Points selected for measurement were
the graphic peaks, ignoring fluctuations
smaller than five db. The average record contained
approximately nine such points.
Measurements were made re the same arbitrary
reference for all samples; the measure
for each sample was the mean of the
peak measurements; the group means in
Table 1 have been expressed re the mean
for the undelayed condition.

64 Measurement was by consecutive time
units of .05 sec. The mean frequency of
each time unit was measured, yielding approximately
80 such measures for the average
sample. For each sample the mean and
standard deviation of these primary measures
were calculated. The means of Table
1 are group means of the sample measures.

75 For example, if instability is maximal
with phase displacement of 360°, the time
delay values of Figure 2 might be reasonable
approximations, but if the critical
phase is 180°, the mode would be in the
neighborhood of .4 sec. Provisionally, this
experimenter, viewing the unit of control
as most probably a relatively long bio-acoustic
cycle, inclines toward the latter
speculation, but is not yet prepared to defend
the position firmly with evidence or
theory.

86 The values are group means calculated
from primary rate measures of individual
samples, i.e., not calculated from the means
of Table 1.