Fairbanks, Grant. Experimental Phonetics – T14

Diphthong Formants and Their Movements *1

Anthony Holbrook
Grant Fairbanks **2

The experimental work reported here
has taken for its main concern the
acoustical characteristics of the five
common diphthongs. For convenience,
these will be symbolized herein as /eɪ/,
/ɑɪ/, /ɔɪ/, /ou/, /ɑu/. The dynamic
nature of diphthongs has long been
recognized by the practice of transcribing
them with double symbols
such as the foregoing. There has been
considerable variation in the symbols,
however, and while it is not the purpose
of this article to treat symbolic usage,
the variation suggests the desirability
of more acoustic data.

The methods of formal experimentation
have not been applied extensively
to the diphthong. Visible Speech, the
well-known book by Potter, Kopp, and
Green (9), contains numerous illustrative
spectrograms. Potter and Peterson
(10), in a paper concerned with graphic
methods of representing vowels, traced
the shifting formants of a single
speaker's diphthongs, as also did Joos
(2). Peterson and Coxe (8) measured
some utterances of /eɪ/ and /ou/ in
isolation, using themselves as subjects,
while the work of Lehiste and Peterson
(3) included some treatment of
diphthongs. In the pre-spectrographic
period Liddell (4, 5, 6) performed
wave-by-wave Fourier analyses of diphthongs
and, noting the extensive shifting
of spectrum, suggested (6) the term
polyphthong.

In the present experiment the general
procedure has been to collect samples
of the diphthongs as spoken in words
by General American speakers under
laboratory conditions, and to subject
these samples to spectrographic analysis.
In addition, it was convenient and
appropriate to include the diphthong-like
/ju/. Since it is generally conceded
that the first three formants of vowels
contribute the greatest part of the information,
these formants, and their
frequencies in particular, were of major
interest. The amplitude factor, however,
which has been given only limited
attention in the vowel studies, was also
measured. Special consideration has also
been given to variations of frequency
and amplitude as a function of time.
There was further interest in locating
the acoustical areas within which diphthongs
vary with reference to the vowel
system. To support this inquiry samples
of vowels were also collected from the
same subjects and measured with respect
to frequency and amplitude of
the first three formants.

Procedure

Collection of Samples

Diphthongs
and vowels to be analyzed were recorded
by 20 male General American
speakers, with no obvious dialectal or
clinical speech problems, who were
volunteers from introductory speech
classes at the University of Illinois. The
spoken samples were drawn from real
words of the language which were as
free as possible from the determining
effects of adjacent consonants. The consonant
/h/, which has negligible influence
upon following vowels, was used
with both diphthongs and vowels. The
diphthong samples were found in the
six surnames Hay, High, Hoy, Hoe,
Howe, Hugh. For the vowels, the
words of Peterson and Barney (7) were
used, namely, heed, hid, head, had, hod,
hawed, hood, who'd, hud, heard. Although
the effect of the /d/ is real
and large, it was believed that it might
be minimized by analyzing a portion as
close to the /h/ as possible after the
establishment of the steady state of the
vowel. Thus the samples of diphthongs
and vowels were roughly comparable,
since both were subjected to approximately
the same consonant influence.

In the collection of the diphthongs,
a method was sought whereby natural
utterances of the words would be obtained.
After various attempts it was
decided to have the subjects speak each
of the six surnames as the last word of
the sentence ‘My name is John Doe’ as
if responding to the question ‘What is
your name?’ The sentences were typed
in random order on a card and numbered.
At the time of the experiment
the subject was requested to study the
card, and it was explained that his
task was to read each line after the
experimenter spoke its number, and to
do so as if answering the question
mentioned. Each of the six sentences
was repeated three times in a random
order of 18 responses that was different
for each subject. The responses were
elicited one after the other without
stopping, at a rate of one every three
to five seconds. In the event of a reading
error, the number of the sentence
in question was repeated immediately.
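
The randomization described above can be sketched as follows. This is an illustration only, not the procedure actually used by the experimenters; the function name `response_order` and the per-subject seed are assumptions introduced here.

```python
import random

# The six surnames carrying the diphthongs, as listed in the text.
SURNAMES = ["Hay", "High", "Hoy", "Hoe", "Howe", "Hugh"]


def response_order(subject_seed, repetitions=3):
    """Return one subject's order of 18 responses (6 surnames x 3 repetitions).

    subject_seed is a hypothetical device for giving each subject a
    different, reproducible random order.
    """
    rng = random.Random(subject_seed)
    order = SURNAMES * repetitions
    rng.shuffle(order)
    return order


order = response_order(subject_seed=1)
print(len(order))  # 18 responses per subject
```

A different seed for each of the 20 subjects would, in this sketch, give each subject a different order.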

The vowels were collected in a different
manner, for it was desired that
the pronunciation of the words be
predictable, but that the words be
spoken in a brisk and natural manner.
To accomplish this, pronunciations
were established by placing each word
as the fourth member of a series of
rhyming words. The subject read from
a card showing lines of the rhyming
words. The lines were in random order
and numbered. As was the case with
the diphthongs, the lines were read by
number, and each subject read in a
different random order of 10. To facilitate
brisk utterance, a reading rhythm
was established by the use of a light
which flashed automatically at intervals
of 0.8 sec. This pacing light was placed
so as to be in the periphery of the
subject's vision. A practice card which
did not contain the experimental words
was presented to the subject, and he
was allowed to practice the rhythm
with the pacing light prior to the reading
of the vowel card. This procedure
caused the words to be spoken at a
rate fast enough for naturalness of utterance,
but to be, nevertheless, discrete.
The words say and end bounded
the four rhyming words and were used
to absorb any end-effects upon intonation
and stress. For example, the line
for /i/ was as follows:

say deed feed need heed end.

The entire procedure was recorded in
a sound-treated room by means of an
Altec M-11 microphone system and a
Magnecord M-90 tape recorder operated
at 15 ips, the recorder being
situated in a nearby control room.

Selection of Words for Analysis

With few exceptions, any one of the
three samples of a diphthong produced
by each subject would have served for
analysis in the sense of being representative.
To eliminate the exceptions
and formalize selection of one of the
three a modest judgmental procedure
was used. This involved three experienced
faculty members who listened
independently to the samples over high-quality
earphones. Each judge first
picked the two of the three that he
considered most representative, and
then chose between these two. The
sample preferred by a majority of the
three judges was selected. The judges
reported difficulty in making the judgments,
remarking on the similarity of
most sets, and also indicated that the
intelligibility of all samples was unquestionable.

Spectrographic Measurements

A Kay
Electric Company Sona-Graph was
used for the acoustical analysis of the
samples. The normal frequency range
of 8 000 cps full-scale was modified
locally according to instruction received
from the manufacturer so that
it was reduced to 3 500 cps. The frequency
calibration of this apparatus
was accomplished by the method and
equipment described by Fairbanks and
Grubb (1). The frequency scale was
calibrated with this equipment for each
section. Also, the frequencies of Formants
One, Two, and Three were
determined by the method outlined in
the above article. Essentially, this
method consisted of determining the
component of greatest amplitude in a
formant, or the point midway between
two components of equal amplitude.

The amplitude characteristics of the
spectrograph were determined by tones
which were recorded on the instrument
at nine levels and 23 frequencies covering
the range of interest. The measurements
provided a basis for constructing
a transparent template by means of
which the amplitudes could be read in
decibels directly from the sections. The
formant amplitude was defined as that
of the strongest component, or the
arithmetic mean of two equal components,
expressed in decibels above the
baseline of the section.
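
The two measurement rules just stated (formant frequency in the preceding paragraph, formant amplitude in this one) can be sketched as a single function. This is an illustration under stated assumptions, not the authors' template procedure: the name `measure_formant`, the tolerance `tol_db` used to decide that two components are "equal," and the choice of the two strongest components without checking that they are adjacent are all introduced here.

```python
def measure_formant(components, tol_db=0.5):
    """Return (frequency_cps, amplitude_db) for one formant.

    components: list of (frequency_cps, amplitude_db) pairs for the
    components lying within the formant, as read from a section.
    tol_db: hypothetical tolerance for treating two amplitudes as equal.
    """
    # Rank components from strongest to weakest.
    ranked = sorted(components, key=lambda c: c[1], reverse=True)
    f_top, a_top = ranked[0]
    if len(ranked) > 1 and abs(a_top - ranked[1][1]) <= tol_db:
        # Two equal components: frequency is the midpoint between them,
        # amplitude is their arithmetic mean.
        f_second, a_second = ranked[1]
        return (f_top + f_second) / 2.0, (a_top + a_second) / 2.0
    # Otherwise the strongest component defines both values.
    return float(f_top), float(a_top)
```

The component values passed to such a function would be the readings in decibels above the baseline of the section.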

Analysis of Diphthongs

In the spectrographic
analysis, the tape recordings
were played with the playback gain
adjusted to equate the maximum voltage
of the diphthongs as determined by the
tape recorder VU meter. The record
gain of the spectrograph was held constant
by means of a constant voltage
tone. The output of the tape recorder
was connected via a Hewlett-Packard
350A attenuator to a 500 ohm matching
transformer on the spectrograph. The
combination of 45 cps bandwidth and
HS frequency response was used, the
HS response imparting a positive slope
of about 6 db per octave to the sections.
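
As a rough guide to what a slope of about 6 db per octave implies for amplitudes read at different frequencies, the relative emphasis can be sketched as below. The reference frequency `ref_cps` is a hypothetical choice made only for illustration, and the exact slope of the instrument is not assumed to be precisely 6 db.

```python
import math


def hs_emphasis_db(freq_cps, ref_cps=100.0, slope_db_per_octave=6.0):
    """Approximate emphasis (db) at freq_cps relative to ref_cps.

    Assumes an idealized constant slope; ref_cps is a hypothetical
    reference frequency, not a value taken from the text.
    """
    octaves = math.log2(freq_cps / ref_cps)
    return slope_db_per_octave * octaves
```

Under this idealization a component two octaves above another would be raised about 12 db relative to it.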

The conventional spectrograms were
made with the playback gain of the
spectrograph adjusted to 10 db above
the calibrated level. In making the sections,
the playback gain was returned
to its calibrated level so that amplitude
measures could be made. Due to the
differences in the two circuits involved,
the resolving power of the sections was
essentially equal to that of the conventional
spectrograms despite the difference
in gain setting. At the calibrated
level both the maximum and minimum
signal-to-noise ratios encountered in the
experimental samples were measurable.

Following the making of each spectrogram,
before it was removed from
the instrument, the display was inspected
and the portions to be sectioned
were located. The sections were made
immediately thereafter. The first section
was made at the initial appearance
of the formants, and then a section was
located at the end of each formant.
Following this, additional sections were
made as needed to describe all of the
significant formant changes.

On all sections, frequency and amplitude
measures were made for the fundamental
and the three lowest formants.
The frequency measures were then
plotted on semilogarithmic paper, with
frequency on the ordinate and duration
on the abscissa, and connected by a
smooth line. The result was a characterization
of the frequency-time display
of the spectrogram.

Analysis of Vowels

The acoustic
analysis of the vowels was accomplished
in a manner similar to that of the
diphthongs. However, the method of
locating the portion to be sectioned was
somewhat different, since only one section
was made per vowel. In order to
avoid the effect of the final /d/, the
spectrograms were carefully inspected
and the location of the section established
in the steady state portion of the
vowel close to /h/. The spectrogram
was then removed, and the section was
made immediately thereafter. The sections
were then measured in the manner
described above with respect to frequency
and amplitude of the fundamental
and the three lowest formants.

Results

Durations of Diphthong Formants

Although the measurement of the
formant durations was limited to the
resolving power of the spectrograph,
the spectrograms of the diphthongs and
the time plots made from the sections
showed, in a given utterance, variations
in the times of onset and cessation of
the formants and in their total durations.
In addition to the expected
variability between subjects, these differences
appeared to be in some respects
phoneme-linked. It was observed that
in all utterances the onset of the first

Table 1. Mean time intervals between first measurements of Formant One in diphthongs and
final measurements of each formant. Values in sec.

[Table 1 body not recovered. Rows: diphthongs /eɪ/, /ɑɪ/, /ɔɪ/, /ou/, /ɑu/, /ju/; columns: Formants One, Two, Three.]

[Figure 1 image not recovered. Axes: formant against time (sec); panels: eɪ, ɑɪ, ɔɪ, ou, ɑu, ju.]

Figure 1. Mean durations and time locations
of the fundamental and three formants of
diphthongs. Fundamental persisted at least
0.04 sec beyond final measurement, as shown
by dashed extensions.

formant either coincided with or preceded
the onset of the other formants,
and this point in time was used as a
reference point. The relative starting
and ending times for all formants were
determined, and the mean ending times
are given in Table 1. The mean starting
times were all zero except for
Formant Three. The mean onset of
Formant Three occurred 0.01 sec after
the start of Formant One in /eɪ/, /ɑɪ/,
and /ɑu/; 0.04 sec in /ou/; and 0.08
sec in /ɔɪ/. The time measurements
are displayed graphically in Figure 1.

The formant durations of /eɪ/ and
/ou/ are seen to be somewhat shorter
than the corresponding ones of /ɑɪ/
and /ɑu/. The short durations of /eɪ/
and /ou/ correspond to the general conception
that they involve less articulatory
movement than do /ɑɪ/ and /ɑu/.
When all three formants are considered,
/eɪ/, /ou/, and /ju/ had the shortest
durations. For all diphthongs there was
simultaneous onset of Formants One
and Two and the fundamental, as can
be seen by the alignment of the bars
at the left of each diphthong. The
lowest two formants had about equal
total mean duration with the exception
of /ju/.

It may be seen in Figure 1 that in
/eɪ/ and /ɑɪ/ the mean ending times
for all three formants were about
equal internally. In /ou/ and /ɑu/
Formant Two is slightly shorter than
Formant One, and Formant Three is
considerably shorter than Formant
Two. The difference in the second
vowel elements is, therefore, discernible
in the durations of Formants Two and
Three, and especially in Formant
Three. In /eɪ/, /ɑɪ/, and /ɔɪ/ there is
only small variation within the diphthong
in the ending times of the
formants. Study of /ou/, /ɑu/, and
/ju/ reveals a decreased length of
Formant Three. In /ju/ Formant Two
was also abbreviated. It should be observed
that Formant Three is the most
variable in duration. The influence of
the back vowel is apparent in the
decreased durations of Formant Three.
To illustrate the point, a direct comparison
of /eɪ/ and /ou/ may be made
with respect to Formant Three. Its
duration equals that of Formant One in
/eɪ/, but in /ou/ it is only about two-fifths
of Formant One. The late start
of Formant Three in /ɔɪ/ also reflects
the back vowel influence.

In Figure 1 it will be noted that /ju/
has unique features. Although Formant
One compares favorably with the durations
of /eɪ/ and /ou/, Formant Two
is exceptionally short, 0.07 sec shorter
than Formant One. Formant Three
starts early, showing the front vowel

Table 2. Median frequencies of diphthong formants at sampling points. Medians based on
20 utterances with exceptions as noted in text. Values in cps.

[Table 2 body not recovered. Rows: diphthongs /eɪ/, /ɑɪ/, /ɔɪ/, /ou/, /ɑu/, /ju/ at each sampling point; columns: fundamental, Formant One, Formant Two, Formant Three.]

* One utterance.

influence, and ends early, reflecting the
back vowel influence.

The fundamental invariably outlasted
all formants. In Figure 1 the dashed
extension seen at the right of the fundamental
bar indicates that it persisted
for at least 0.04 sec beyond the longest
formant in all utterances. No attempt
was made to measure its duration past
this point.

Temporal Variations of Diphthong Formants

Formants One and Two
essentially coincided throughout their
durations, and this common time interval
was used as the unit duration of the
individual utterance and served as a
basis for locating sampling points to
characterize the diphthong formants.
In the case of /ju/ Formant Two was
abbreviated, and the time interval of
Formant One only was taken as the
unit duration.

Examination of the individual frequency
plots showed that changes of
all formants occurred for the most part
during this time interval and that three
more equally spaced points would follow
these interval changes adequately.
Three equidistant sampling points were
deemed adequate to describe the
changes of the frequency and amplitude
of the fundamental.

The measures at the five sampling
points described above were made in
the following manner. The first and
last sampling points were taken from
the frequency and amplitude measures
derived from the sections. If the section
was less than 0.02 sec away, the
section value was recorded. For the
second, third, and fourth sampling
points, the frequency was read from
the individual frequency-time plots to
the nearest 5 cps, and the amplitude

Table 3. Median amplitudes of diphthong formants at sampling points. Values in db re baseline
of section.

[Table 3 body not recovered. Rows: diphthongs /eɪ/, /ɑɪ/, /ɔɪ/, /ou/, /ɑu/, /ju/ at each sampling point; columns: fundamental, Formant One, Formant Two, Formant Three.]

[Figure 2 image not recovered. Axes: amplitude (db), frequency (kc), relative duration; panels: eɪ, ɑɪ, ɔɪ, ou, ɑu, ju.]

Figure 2. Temporal variations of median frequencies and amplitudes of formants during the
course of utterance of diphthongs. Amplitude in db re baseline of section.

was determined from the section data.
When a sampling point did not coincide
with a section, the amplitude value
was interpolated from adjacent sections.
These procedures resulted in 1800
sampling points for the three formants,
at most of which the amplitude was
sufficient for measurement. The median
frequencies and amplitudes of the three
formants are tabulated in Tables 2 and
3. Each entry is based on 20 subjects
except when the number was reduced
by inadequate amplitude. For example,
in Formant Two of /ju/ no measures
were available at 28 sampling points in
the last half of duration; in fact, only
one subject yielded a measure for
Formant Two at the last sampling
point. In Formant Three there was no
measurable amplitude at 197 sampling
points. Such points tended to occur in
those diphthongs having back vowels as
one or both elements. As an example,
at the first sampling point of Formant
Three in /ɔɪ/ the amplitudes of 13 utterances
were insufficient for measurement,
and at the fifth sampling point
of Formant Three there was no measurable
energy for /ou/ and /ju/ for any
subject.
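
The sampling scheme described above can be sketched in a few lines. The sketch assumes linear interpolation between adjacent sections, which the text does not specify, and the function names are introduced here for illustration. The count of 1800 follows from 6 diphthongs, 20 utterances of each, 5 sampling points, and 3 formants.

```python
def sampling_times(onset, offset, n_points=5):
    """Equally spaced sampling times across the unit duration (sec)."""
    step = (offset - onset) / (n_points - 1)
    return [onset + i * step for i in range(n_points)]


def interpolate_amplitude(t, sections):
    """Amplitude (db) at time t, interpolated between adjacent sections.

    sections: list of (time_sec, amplitude_db) pairs sorted by time,
    with distinct times. Linear interpolation is an assumption.
    """
    for (t0, a0), (t1, a1) in zip(sections, sections[1:]):
        if t0 <= t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    raise ValueError("sampling point lies outside the sectioned interval")


# 6 diphthongs x 20 utterances x 5 sampling points x 3 formants
total_points = 6 * 20 * 5 * 3
print(total_points)  # 1800
```

The section times and amplitudes supplied to `interpolate_amplitude` would be those measured on the individual utterance.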

The median frequencies and amplitudes
of the three formants are graphed
in Figure 2. The base line is relative
duration, equal graphically for all
diphthongs. Frequency is plotted logarithmically
along the diagonal, and
the height of the vertical line indicates
amplitude in decibels. These lines are
connected to show the changes that
occurred during the course of the
diphthong.

In /eɪ/ it will be observed that the
frequencies (F₁ and F₂) of Formants
One and Two started at about 550 and
2 000 cps, respectively, and, as the utterance
progressed, F₁ decreased to about
400 cps, while F₂ rose to about 2 200
cps. The amplitudes of the formants
varied, with the amplitude of Formant
One (A₁) increasing about 5 db during
the first half of the diphthong, which
was followed by a decrease of about 10
db by the end of the last half. The A₂
variation was similar in both location
and extent. The changes of both frequency
and amplitude occurred at a
fairly regular rate in /eɪ/, and F₁ and
F₂ diverged during the course of the
diphthong.

In a general way, changes similar to
those of /eɪ/ may be observed in the
other two /ɪ/ diphthongs. The lower
two formants in /ɑɪ/ and /ɔɪ/ started
at different frequency locations, and
the change in frequency accelerated in
the last half of utterance. Also, F₁ and
F₂ diverged during the course of the
diphthong as was the case with /eɪ/.
With respect to frequency direction,
all three of the /ɪ/ diphthongs were
similar. The changes in A₁ and A₂ of
/ɑɪ/ were like those in /eɪ/, but the
range of amplitude change was considerable,
being 13 db in both A₁ and
A₂ during the course of the formants.
In /ɔɪ/ F₁ started slightly lower than
that of /ɑɪ/ and fell only a few cycles,
while F₂ showed the largest change of
the diphthongs. A₁ increased and then
diminished; in contrast, A₂ diminished
at a regular rate across a rather large
range of 17 db. In /eɪ/, /ɑɪ/, and /ɔɪ/,
A₁ started with a relatively high amplitude,
increased slightly, and then diminished
to a smaller amplitude than that
at the start. In /ɔɪ/ A₂ diminished
throughout at a steady rate, in contrast
to the increase and decrease in /eɪ/
and /ɑɪ/.

The graph for /ou/ shows that Formant
One was similar to that of /eɪ/ in
both frequency and amplitude, but F₂
of /ou/ started much lower and went
steadily downward, while A₁ and A₂
increased slightly and then decreased,
the A₁ range being 12 db while that
of A₂ was 8 db. F₁ and F₂ of /ou/
maintained about the same ratio since
both formants demonstrated a fairly
steady rate of decline during the course
of the diphthong. The formants of
/ɑu/ varied similarly to those of /ou/.
However, F₂ descended more swiftly
than did F₁ so that the two formants
tended to converge. A₁ and A₂ rose
and then fell characteristically in /ɑu/.

The time course of /ju/ was unique
in that F₁ rose slightly while F₂ fell
across a wide frequency range and
terminated at the fourth sampling point.
The relative convergence of F₁ and F₂
was far greater than in /ɑu/. The
greatest variation of A₁ and A₂ may
be observed in /ju/. The A₁ variation
was 16 db, while that of A₂ was 18
db. In addition, it may be seen that
/ju/ also had the greatest A₁/A₂ ratio
across the sampling points.

Among the six diphthongs F₃ either
stayed relatively constant or tended to
follow F₂. With respect to amplitude,
the largest contrast of A₃ was between
/eɪ/ and /ou/. In fact, only at the
second and third sampling points did
the median of A₃ of /ou/ exceed zero
db. With the exception of /ɔɪ/, the
variations of A₃ during the course of
the diphthong were quite similar to
those of A₂. In /ɔɪ/, however, A₃
started weakly and tended to increase
throughout except at the very end. In
/eɪ/, /ɑɪ/, and /ɔɪ/ A₃ tended to remain
at a fairly high level throughout,
but in /ou/, /ɑu/, and /ju/, it diminished
to zero db by the end of utterance.
Consequently, it may be seen that
the /ɪ/ diphthongs were characterized
by a fall of F₁, a rise of F₂, and a strong
A₃ for most of the duration. In contrast,
the /u/ diphthongs were characterized
by a fall of both F₁ and F₂
and by a diminishing of A₃ toward the
end of utterance. Finally, /ju/ was
characterized by a rising F₁, a falling
F₂, and a decrease of A₂ and A₃.

The locations of F₁ and F₂, as would
be expected from past research, appear
to constitute the main defining features
of the six individual phonemes. The data
shown in Figure 2 indicate, however,
that the amplitudes and amplitude
variations of Formant Three are distinctive
for each of the phonemes, although
the frequencies of this formant would
not appear to be very distinctive for
the individual phonemes. With no single
exception at any sampling point A₁,
A₂, and A₃ decreased progressively in
that order. An inspection of the diphthongs
in Figure 2 will show that the
steepness of the decrease in amplitude
varied from phoneme to phoneme at
the respective sampling points. The
slope of the amplitude across the formants
is related to the vowel elements
of the diphthong. For example, the
slope from A₁ to A₃ was about 15 db
steeper in /ou/ than in /eɪ/ over about
the same frequency range. The slopes
remained fairly constant across the
duration in both /eɪ/ and /ou/. At the
final sampling point, the A₁/A₃ slope
was about 18 db in /eɪ/ as compared
with 26 db in /ou/. The slope of /ɑɪ/
is similar to that of /eɪ/, that is, slight.
This is in contrast to that of /ɑu/,
which starts with a slope similar to
/ɑɪ/, but ends with a slope of 24 db.
This latter fact illustrates the difference
in the influences of an open back vowel
and a relatively close back vowel. In
/ɔɪ/ the change in the slope was large.
The A₁/A₃ slope was 33 db at the
beginning where the first element is
a back vowel, but only 14 db at the
end where the second element is a
front vowel. In general, with respect
to the final sampling points, the A₁/A₃
slope was steep for the /u/ diphthongs
and for /ju/, with A₃ not being measurable
in all three cases. In contrast, a
more gradual slope was seen in the /ɪ/
diphthongs.

Although the fundamental was not
of particular interest in the experiment,
it will be noted (Tables 2 and 3) that
the changes with time are not only
regular, but also substantial. In fact,
the fundamental frequency typically
decreased about two and one-half tones
and amplitude about 5 db during the
course of the diphthong.
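
To give the fall in fundamental frequency a numerical sense, the interval can be converted to a frequency ratio. The sketch assumes that a "tone" is a whole tone of two equal-tempered semitones, which the text does not state; the starting frequency used in the example is hypothetical and is not a value from Table 2.

```python
def frequency_after_fall(start_cps, whole_tones):
    """Frequency reached after falling by the given number of whole tones.

    Assumes equal temperament, with one whole tone equal to two semitones.
    """
    semitones = 2.0 * whole_tones
    return start_cps * 2.0 ** (-semitones / 12.0)


# Hypothetical starting value of 120 cps, chosen only for illustration.
end_cps = frequency_after_fall(120.0, 2.5)
print(round(end_cps / 120.0, 3))  # ratio of final to initial frequency
```

Under these assumptions a fall of two and one-half tones leaves the fundamental at roughly three-quarters of its starting frequency.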

Frequencies of Formants One and Two in Diphthongs

Figure 3 shows a
coordinate plot of the frequency measurements
of Formants One and Two

[Figure 3 image not recovered. Axes: F₁ and F₂; areas: eɪ, ɑɪ, ɔɪ, ou, ɑu, ju.]

Figure 3. Frequencies (F₁ and F₂) of Formants One and Two as measured at all sampling
points in the various utterances of diphthongs.

for all of the individual samples. The
display is in a manner usual for the
vowels with F₂ and F₁ shown logarithmically
along the ordinate and abscissa,
respectively. The five sampling points
have been plotted without regard to
their time locations, and the solid lines
enclose the total scatters of the points
for the various diphthongs. It may be
seen that the areas overlap one another

[Figure 4 image not recovered. Axes: F₁ and F₂; pathways: eɪ, ɑɪ, ɔɪ, ou, ɑu, ju.]

Figure 4. Median frequencies (F₁ and F₂) of Formants One and Two at each sampling
point in diphthongs (dots), with directions of change shown by arrows. General areas
(dashed lines) from Figure 3.

as a rule rather than the exception. The
areas reveal the general locations of
the diphthongs and their relationships
to each other. For instance, the F₁
range for /ɑɪ/ is similar to that of /ɑu/,
while the F₂ ranges of /ɑɪ/ and /ɑu/
overlap in part and then extend in
opposite directions. In /eɪ/ and /ou/
the F₂ ranges are widely separated,
but they have the F₁ range largely in
common. It may be seen that /ɔɪ/ has
a very wide F₂ range, and that it
occupies a central position in both F₁
and F₂ with respect to the other diphthongs.

In /ɔɪ/ and /ou/ F₁ and F₂ could not
be distinguished as separate regions in
some of the samples. In Figure 3 these
values are represented by the points
placed along the 45° line at the bottom
of the figure. In their study of vowel
formants, Fairbanks and Grubb (1)
found that in some samples of /ɔ/ only
one formant could be identified in the
lower frequency range. It was reported
that such vowels were carefully studied
to make certain that the sectioning point
was representative; several sections were
taken, and the one-formant finding was
confirmed in every case. The same one-formant
characteristic was found in the
present experiment in six utterances of
/ɔɪ/ and in four of /ou/, the two diphthongs
having a portion of their areas
in common.

The large size of the /ju/ area suggests
that considerable variation is
permissible in F₁ and F₂, perhaps because
confusion with other phonemes
would be unlikely.

In Figure 4 the medians of the sampling
points for the several diphthongs
are shown against the background of
the areas which have been duplicated
from Figure 3. The points have been
connected and the directions of change
indicated by arrows at the final sampling
points. In the right center of the
figure, the /ɑɪ/ and /ɑu/ are seen to
start essentially from the same area and
travel in opposite directions. The course
of /eɪ/ is seen for the most part as
a decrease in F₁. On the other hand,
/ou/ shows a decrease of both F₁ and
F₂, the F₁/F₂ ratio being about 1.5 and
remaining fairly constant. A long upward
shift of F₂ characterizes /ɔɪ/,
while /ju/ is seen as essentially a downward
shift of F₂. The final measure of
/ju/ represents the obtained value of
F₁, but F₂ has been extrapolated, as
shown by the dashed line, since Formant
Two had a shorter duration than that
of Formant One.

The rate of frequency shift may be
estimated since the dots along the lines
of Figure 4 indicate equal time intervals.
In /ɑɪ/ it may be seen that the amount
of frequency change varied during the
four quarters of its duration. In the
first two quarters the change was small,
but there were large shifts during the
last two. A similar acceleration in frequency
change was apparent during the
last half of utterance in both /ɑu/ and
/ɔɪ/. For the three diphthongs above,
a basic vowel appears to be firmly
established in the first half, whereas
the latter half involves continuous and
accelerated change. In contrast, the rate
of frequency change in /eɪ/ and /ou/
is seen to be fairly regular throughout
with the exception of the third quarter
of /eɪ/. There is a notably small frequency
shift during the first quarter
of /ju/ with the most rapid rate of
change occurring during the second
quarter.
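
The estimate of rate described above amounts to comparing the distances between successive dots. A minimal sketch follows, assuming that distance is measured in the logarithmic F₁-F₂ plane of the figure; this measure and the function name `quarter_shifts` are assumptions made here, not the authors' stated method.

```python
import math


def quarter_shifts(points):
    """Fraction of the total frequency shift occurring in each quarter.

    points: five (F1, F2) pairs in cps at equally spaced sampling points.
    Distance is taken in log-frequency (octave) units, an assumption that
    matches the logarithmic scales of the plot.
    """
    steps = []
    for (f1a, f2a), (f1b, f2b) in zip(points, points[1:]):
        d1 = math.log2(f1b / f1a)
        d2 = math.log2(f2b / f2a)
        steps.append(math.hypot(d1, d2))
    total = sum(steps)
    if total == 0:
        raise ValueError("no frequency shift between sampling points")
    return [s / total for s in steps]
```

Applied to the medians of a diphthong, small fractions in the first two quarters followed by large ones in the last two would correspond to the acceleration noted for /ɑɪ/, /ɑu/, and /ɔɪ/.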

[Figure 5 image not recovered. Axes: F₁ and F₂; envelopes: eɪ, ɑɪ, ɔɪ, ou, ɑu, ju.]

Figure 5. Central regions of variation of frequencies (F₁ and F₂) of Formants One and Two
in diphthongs (see text). Dashed circles indicate locations of preferred vowels (1).

Diphthong Formants Relative to Vowel Formants

In Figure 5 the envelope
for each diphthong represents
a smoothed enclosure of the 10 samples
closest to the median at each sampling
point during the utterance. Each of the
envelopes may be interpreted as the
band within which the middle 50% of
the subjects varied during the course
of utterance. Except for /ju/, it will
be noticed that each envelope narrows
as time goes on. The diphthongs give
the impression of heading toward target
points with variations decreasing as the
terminal points are approached. This
is particularly true in /eɪ/ where the
terminal portion of the envelope is seen
to be quite small. In /ju/, a ‘rising diphthong,’
the glide portion also has the
smaller area and the base vowel the
larger, but the smaller area is seen at
the beginning rather than at the end.
The shapes of the envelopes suggest
somewhat less individual variability in
the glide portion than in the stressed
portion.
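
The selection of the 10 samples closest to the median at a sampling point can be sketched as follows. The text does not say how closeness was judged; the sketch assumes distance from the median point in the logarithmic F₁-F₂ plane, and the function name `central_samples` is introduced here.

```python
import math
from statistics import median


def central_samples(samples, keep=10):
    """Return the `keep` samples nearest the median (F1, F2) point.

    samples: list of (F1, F2) pairs in cps for one diphthong at one
    sampling point. Log-frequency distance is an assumption.
    """
    m1 = median(f1 for f1, _ in samples)
    m2 = median(f2 for _, f2 in samples)

    def distance(sample):
        return math.hypot(math.log2(sample[0] / m1),
                          math.log2(sample[1] / m2))

    return sorted(samples, key=distance)[:keep]
```

With 20 utterances per diphthong, keeping 10 corresponds to the middle 50% of the subjects at that sampling point.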

The small circles shown in Figure 5
indicate the locations of Fairbanks and
Grubb's (1) preferred vowel samples.
These circles may be considered as
representative of vowel areas since they
derive from a rigorous procedure of
selection and judgment. In the Fairbanks
and Grubb study nine General
American vowels were sustained by
seven skilled speakers at approximately
the same fundamental frequency. The
samples were recorded and self-approved
by each speaker as representative
of the intended vowel. The samples
were then presented twice to a group
of eight trained observers. The observers
attempted vowel identification
during the first presentation, and they
rated the samples on a scale of representativeness,
knowing the vowel
which each sample was intended to
represent during the second presentation.
An arbitrary category of identified
samples consisted of those samples correctly
identified by 75% or more of
the observers. Within the group of
identified samples of each vowel another
category of preferred samples
was defined. This last category was to
contain the most representative samples
from among the most readily identified
samples. From each set of identified
samples of a vowel the four with the
highest median ratings were chosen to
be the preferred samples.
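
The two-stage selection summarized above (identified samples, then preferred samples) can be sketched as below. The data layout, with a count of correct identifications and a list of ratings per sample, and the function name are hypothetical; only the 75% criterion, the eight observers, and the choice of the four highest median ratings come from the description.

```python
from statistics import median


def preferred_samples(samples, n_observers=8, threshold=0.75, n_preferred=4):
    """Return ids of the preferred samples of one vowel.

    samples: list of dicts with keys 'id', 'correct' (number of observers
    who identified the sample correctly), and 'ratings' (representativeness
    ratings). This layout is an assumption for illustration.
    """
    # Identified samples: correctly identified by 75% or more of observers.
    identified = [s for s in samples
                  if s["correct"] / n_observers >= threshold]
    # Preferred samples: the identified samples with highest median ratings.
    ranked = sorted(identified,
                    key=lambda s: median(s["ratings"]), reverse=True)
    return [s["id"] for s in ranked[:n_preferred]]
```

Ties in median rating are left in their original order in this sketch, since the summary does not say how they were resolved.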

The frequencies of Formants One
and Two of the vowels shown in Figure
5 are those of the preferred vowel
samples. In this figure it may be seen
that the starting area of /eɪ/ lies somewhat
above and slightly to the right
of /ɛ/, and its ending is just above
/ɪ/. At the bottom center of the figure,
the starting area of /ou/ is seen to be
slightly above /ɔ/. A starting area between
/æ/ and /ɑ/, perhaps in the
location of /a/, is common to /ɑɪ/ and
/ɑu/. It may also be seen that /ɑɪ/
passes through /æ/ and extends into
the starting area of /eɪ/. In the opposite
direction, /ɑu/ passes through /ɑ/ and
extends into the starting area of /ou/.
It should be noted also that /ɑu/ ends
squarely at the preferred area of /ɔ/.
In a general way the ending areas of
/ɑɪ/, /eɪ/, and /ɔɪ/ approximate the
F₂ value of /ɪ/, whereas the endings
of /ɑu/ and /ou/ approximate the F₂
value of /u/. There is a sort of rudimentary
triangle formed by the pathways
of /ɑɪ/ and /eɪ/ on the one hand,

Table 4. Median frequencies of vowel
formants. Values in cps.

Columns: vowel | F₀ | F₁ | F₂ | F₃

and of /ɑu/ and /ou/ on the other,
which are joined by the curved pathway
of /ɔɪ/. The area of /ɔɪ/ starts
near /ɔ/, passes through /ʌ/, and extends
slightly beyond /ɛ/ toward /ɪ/.

It will be observed that /ju/ starts between
/i/ and /ɪ/ and extends to the
/u/ area. Again, the dashed portion of
the envelope indicates extrapolation of
F₂, as was mentioned previously. The
results of the experiment show no
tendency for the second element of
/ju/ to reach the /u/ area in the F₁
and F₂ coordinate plane. In every case

Figure 6. Frequencies (F₁ and F₂) of Formants One and Two for individual vowel samples.

Formant Two subsided, and only
Formant One was measurable during
the final quarter of the utterance.

It is interesting to note that /eɪ/
appears to be essentially an extension
of /ɑɪ/. If one pronounces these two
diphthongs with /ɑɪ/ immediately
followed without interruption of phonation
by /eɪ/, the articulatory movement
may be perceived as being continuous.
A similar flowing movement
exists with the /ɑu/—/ou/ combination.
The major movement of the diphthongs
tends to occur during the last

Figure 7. Areas of frequencies (F₁ and F₂) of Formants One and Two for middle 50% of
vowel samples. Medians of all samples shown by points; means from Peterson and Barney (7)
shown by triangular points. General areas (dashed lines) from Figure 6.

half of the utterance, so that the first
half of the time is spent on the base
vowel. As was seen in Figure 4, one-half
of the durations of /ɑɪ/, /ɑu/, and
/ɔɪ/ were devoted to accomplishing
the first one-third of the frequency
shift.
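The timing statement above can be made concrete by expressing the shift accomplished at a given moment as a fraction of the total shift. The following is a minimal sketch, assuming a formant track sampled at equal time intervals. The function name and the track values are hypothetical; the track was constructed to reproduce the one-third figure and is not a measurement from the study.

```python
def fraction_of_shift(track, time_fraction):
    """Fraction of the total onset-to-end frequency shift reached
    at a given fraction of the duration.

    track: formant frequencies (cps) sampled at equal time steps.
    time_fraction: 0.0 (onset) to 1.0 (end); values between samples
    are linearly interpolated.
    """
    total = track[-1] - track[0]
    if total == 0:
        raise ValueError("track has no net frequency shift")
    pos = time_fraction * (len(track) - 1)
    i = int(pos)
    if i >= len(track) - 1:
        value = track[-1]
    else:
        value = track[i] + (pos - i) * (track[i + 1] - track[i])
    return (value - track[0]) / total

# Hypothetical F2 track (cps) for an /ɑɪ/-like glide: slow at first, faster later.
f2_track = [1200, 1300, 1400, 1500, 1700, 1900, 2100]
halfway = fraction_of_shift(f2_track, 0.5)
```

Here `halfway` is one-third: half of the duration has elapsed, but only 300 of the 900 cps of total shift have been covered.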

The off-glide is characterized by
shift of frequency which may not
achieve any specified combination of
F₁ and F₂ in the sense of a steady-state
vowel but may perhaps need only to
satisfy a general location of F₂ as the
utterance subsides. As has been seen
in connection with Figure 2, the amplitude
of Formant Three is strong at
the end of the /ɪ/ diphthongs, weak at
the end of the /u/ diphthongs. As will
be shown below in the discussion of
the vowel samples, the differences in
A₃ appear to characterize front and
back vowels as classes. Accordingly,
since the data on diphthongs show no
single terminal vowels to which the
different classes of diphthongs glide
when both F₁ and F₂ are considered,
the use of /i/ and /u/ as glide symbols,
as has sometimes been the practice,
might be considered to indicate the
polar changes of F₂ and A₃.

Frequencies of Formants One and Two in Vowels

In Table 4 the median
frequencies of the formants and the
fundamental are presented for each of
the 10 vowels studied. A graphic display
of the vowels may be seen in
Figure 6 wherein Formants One and
Two are plotted coordinately, with
solid boundary lines enclosing the 20
individual samples of each vowel. In
the upper left-hand corner the area
of /i/ may be seen to extend over a
relatively wide range in F₁, but the
F₂ range is somewhat restricted. A
larger variation of both F₁ and F₂ is
seen for /ɪ/. The areas of both /ɪ/
and /æ/ overlap the /ɛ/ area. In fact,
about 50% of the /æ/ area overlaps
into that of /ɛ/, whereas the /ɑ/ and
/ʌ/ areas overlap only slightly. The
/ɔ/ area is the only one which does not
overlap any of the other vowel areas.
The points located along the 45° line
at the bottom of the /ɔ/ area represent
samples similar to those mentioned
earlier in the discussion of /ɔɪ/ and
/ou/ in which there was only one low
frequency formant. The /ʊ/ area extends
into both /ɝ/ and /u/, and the
largest vowel area is that of /u/.

In Figure 7 the vowel areas have been
reproduced from Figure 6 and are represented
by the dashed lines. The
smaller areas within each vowel area
surround the 10 samples closest to the
median. The median vowel is indicated
in each area by a round point, and the
means of Peterson and Barney (7) are
shown by triangular points. It is immediately
apparent that these restricted
areas are mutually exclusive and generally
small. For the most part the
differences between the findings of
Peterson and Barney and those of this

Table 5. Median amplitudes of vowel formants.
Values in db re baseline of section.

Columns: vowel | A₀ | A₁ | A₂ | A₃ | A₄
Rows: /i/, /ɪ/, /ɛ/, /æ/, /ɑ/, /ɔ/, /ʊ/, /u/, /ʌ/, /ɝ/

Figure 8. Median frequencies and relative amplitudes of the fundamental and first three
formants of vowels. Frequency in kc; amplitude in db re baseline of section.

experiment were minor. The only difference
of consequence is in /æ/, which
is closer to /ɛ/ in the present experiment.
The explanation of this difference
may lie in the method of collecting
samples. In Peterson and Barney's experiment
the subject read the words one
at a time from a list, which may have
invited contrasts more marked than
those occurring with the procedure of
the present study.
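The restricted areas of Figure 7, described above as surrounding the 10 samples closest to the median, can be illustrated with a short sketch. The text does not say how closeness was measured, so two assumptions are made here: the median point is taken coordinate by coordinate, and distance is plain Euclidean distance in cps. The function name and the sample points are hypothetical.

```python
from math import hypot
from statistics import median

def closest_to_median(points, k=10):
    """Return the median point and the k samples nearest to it.

    points: list of (f1, f2) pairs in cps for one vowel.
    Assumptions of this sketch: coordinate-wise median, Euclidean
    distance in cps (the source does not state either choice).
    """
    med = (median(p[0] for p in points), median(p[1] for p in points))
    ranked = sorted(points, key=lambda p: hypot(p[0] - med[0], p[1] - med[1]))
    return med, ranked[:k]

# Five hypothetical (F1, F2) samples; k is reduced to 3 for the toy case.
toy_points = [(300, 2300), (280, 2250), (320, 2400), (260, 2200), (400, 2600)]
med, nearest = closest_to_median(toy_points, k=3)
```

A distance computed on a logarithmic or other perceptual frequency scale would be an equally plausible choice and could select a somewhat different subset.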

Spectra of Vowels

The median amplitudes
of Formants One, Two, and
Three and the fundamental may be seen
in Table 5. In Figure 8 the data in
Tables 4 and 5 were utilized for the
graphs in which frequency is displayed
logarithmically on the abscissa with the
amplitude in db on the ordinate. These
graphs show the essential features of
sections made on the sound spectrograph.

A study of Figure 8 shows that F₃
tends to be higher in the front vowels
than in the corresponding back vowels.
This finding is in agreement with earlier
work. Also, it should be noted that F₃
in /ɝ/ is very low and close to F₂, a
finding which also agrees with previous
reports. The fundamental frequency
does not seem to vary systematically,
nor does it vary over a large range.
The characteristics of F₁ and F₂ have
already been discussed in connection
with Figure 7. There is very little difference
from vowel to vowel with
respect to A₀ and A₁, the range being
5 db over all. It will be observed that
A₁ is greater than A₂, which in turn
is greater than A₃ in all vowels. It may
also be observed that A₂ tends to vary
inversely with F₂. In other words the
graph suggests that the amplitude of
a given formant tends generally to be
dependent upon its frequency. With
respect to A₃, however, the largest A₃
among the back vowels is of less amplitude
than the smallest A₃ of the
front vowels. With the exception of
/ɔ/, A₃ increases from the high to the
low vowels in both the back vowel
and the front vowel series. Since A₁
is relatively constant for all vowels, the
steepness of the slope of the spectrum
is primarily controlled by A₃. As a
result the back vowel spectra are steeper
than those of the corresponding front
vowels. A similar systematic difference
was observed in A₃ between the /ɪ/
diphthongs and the /u/ diphthongs.
Accordingly, the variations of A₃ during
the various diphthongs are seen to
be related to the differences in A₃ between
the vowels that are pertinent.
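The two amplitude relations stated above, that every front-vowel A₃ exceeds every back-vowel A₃ and that a weaker A₃ gives a steeper spectrum when A₁ is nearly constant, can be checked mechanically. The sketch below uses hypothetical amplitudes, not the values of Table 5, and takes the drop from A₁ to A₃ as a simple stand-in for spectral steepness, which is an assumption of the sketch.

```python
def a3_separates_classes(front_a3, back_a3):
    """True if every front-vowel A3 exceeds every back-vowel A3 (db)."""
    return max(back_a3.values()) < min(front_a3.values())

def level_drop(a1_db, a3_db):
    """Drop from Formant One to Formant Three in db.
    A larger drop is read here as a steeper spectrum."""
    return a1_db - a3_db

# Hypothetical amplitudes in db re baseline; NOT the values of Table 5.
A1 = 40  # treated as constant across vowels for the illustration
front_a3 = {"i": 20, "ɪ": 23, "ɛ": 25, "æ": 27}
back_a3 = {"u": 8, "ʊ": 11, "ɔ": 9, "ɑ": 15}

separated = a3_separates_classes(front_a3, back_a3)
steepest_front = max(level_drop(A1, a) for a in front_a3.values())
shallowest_back = min(level_drop(A1, a) for a in back_a3.values())
```

With these invented numbers even the shallowest back-vowel spectrum drops more than the steepest front-vowel spectrum, which is the pattern the text reports.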

Summary

Six diphthongs and 10 vowels were
spoken in words by 20 men, all native
General American speakers. By means
of the sound spectrograph the variations
of the frequencies and amplitudes of
the lower three formants were measured
with the following major findings.

Durations of the formants were
relatively short in /eɪ/, /ou/, and /ju/,
longer in /ɑɪ/, /ɔɪ/, and /ɑu/. Formants
One and Two coincided approximately
throughout.

The Formant One and Formant Two
frequencies, F₁ and F₂, diverged during
their course in /eɪ/, /ɑɪ/, and /ɔɪ/,
tended to maintain a constant ratio as
both lowered during /ou/, and converged
during /ɑu/ and /ju/. The
Formant One and Formant Two amplitudes,
A₁ and A₂, tended to increase
slightly at first and to decrease thereafter,
becoming lower at the end than
at the beginning. A₃ appeared to be
capable of differentiating classes of
diphthongs, being prominent in /ɪ/
diphthongs, weak in /u/ diphthongs.

Coordinate plots of F₁ and F₂ showed
extensive overlap of vowel areas, but
the areas were mutually exclusive when
the 10 samples closest to the median
were considered. The findings were in
general agreement with those of earlier
measurements. In some of the /ɔ/
samples only one formant was identified
in the lower frequency region. A₃ in
front vowels exceeded A₃ in back
vowels. Back vowel spectra were
steeper than those of corresponding
front vowels.

Summario in Interlingua

Sex diphthongos e dece vocals esseva
parlate in vocabulos per vinti masculos,
toto native General American parlators.
Per medio de le sono spectrograph le
variations de le frequentias e amplitudes
de le plus basse tres formants esseva
mesurate con le sequente major trovantes.

Durations de le formants esseva relativement
curte in /eɪ/, /ou/, e /ju/,
plus longe in /ɑɪ/, /ɔɪ/, e /ɑu/. Formants
Un e Duo coincide approximamente
in omne partes.

Le frequentias de Formant Un e
Formant Duo, F₁ e F₂, divergite durante
lor curso in /eɪ/, /ɑɪ/, e /ɔɪ/, tendite
mantener un constante ration como
ambes bassate durante /ou/, e convergite
durante /ɑu/ e /ju/. Le amplitudes
de Formant Un e Formant Duo, A₁ e
A₂, tendite augmentar legiermente al
principio e discrescer pois, deveniente
plus basse a le fin que al entraca. A₃
apparite esser capabile de differentiante
classes de diphthongos, essente prominente
in /ɪ/ diphthongos, debile in /u/
diphthongos.

Coordinate designantes de F₁ e F₂
monstrate extense duplication de vocal
areas, sed le areas esseva mutualmente
exclusive quando le dece exemplos le
plus proximine a le mediana esseva
considerate. Le trovantes esseva in
general concordantia con istes de ante
mesuration. In alicun de le /ɔ/ exemplos
sol un formant esseva identificate in
le inferior frequentia region. A₃ in
fronte vocals excedite A₃ in post vocals.
Post vocal spectra esseva plus precipitose
que illes de correspondente fronte
vocals.

References

1. Fairbanks, G., and Grubb, Patti, A
psychophysical investigation of vowel
formants. J. Speech Hearing Res., 4, 1961,
203-219.

2. Joos, M., Acoustic phonetics. Language
Monogr., 23 (suppl. to 24, 1948).

3. Lehiste, I., and Peterson, G. E., Transitions,
glides, and diphthongs. J. acoust.
Soc. Amer., 33, 1961, 268-277.

4. Liddell, M. H., The physical characteristics
of speech sound. Bull. No. 16,
Eng. Exper. Stat., Purdue Univ., 1924.

5. Liddell, M. H., The physical characteristics
of speech sound-II. Bull. No. 23,
Eng. Exper. Stat., Purdue Univ., 1925.

6. Liddell, M. H., The physical characteristics
of speech sound-III. Bull. No. 28,
Eng. Exper. Stat., Purdue Univ., 1927.

7. Peterson, G. E., and Barney, H. L.,
Control methods used in a study of the
vowels. J. acoust. Soc. Amer., 24, 1952,
175-184.

8. Peterson, G. E., and Coxe, M. S., The
vowels /e/ and /o/ in American speech.
Quart. J. Speech, 39, 1953, 33-41.

9. Potter, R. K., Kopp, G. A., and Green,
Harriet C., Visible Speech. New York:
Van Nostrand, 1947.

10. Potter, R. K., and Peterson, G. E., The
representation of vowels and their movements.
J. acoust. Soc. Amer., 20, 1948,
528-535.

1* Reprinted from the Journal of Speech and Hearing Research, Vol. 5, 1962, pp. 38-58.

2** Anthony Holbrook (Ph.D., University of
Illinois, 1958) is Assistant Professor of Speech,
Wayne State University. Grant Fairbanks
(Ph.D., University of Iowa, 1936) is Professor
of Speech, University of Illinois. This
article is based on the first author's Ph.D.
thesis.