Fairbanks, Grant. Experimental Phonetics – T13

A Psychophysical Investigation of Vowel Formants *1

Grant Fairbanks
Patti Grubb **2

Although acoustic vowels are specified
by combinations of formant frequencies,
it is commonly understood that
these frequencies vary considerably
from utterance to utterance. The investigations
of such variations have
provided useful information about
individual differences in speech and
about the range of vowel approximations
in the speech attempts that a
listener or a voice-operated device must
be prepared to accept. The experiment
reported here, however, has proceeded
in a different direction. It has taken for
its main purpose the study of the formant
structure of vowel samples that
meet high standards of identifiability
and judged representativeness under
controlled laboratory conditions.

Procedure

Collection of Samples

Vowels selected
for study were /i ɪ ɛ æ ʌ ɑ ɔ ʊ u/.
It will be recognized that they are exemplary
of the complete range of English
vowels, that all occur as vowels in
General American speech, and that each
may be produced without ambiguity in
the steady state. The speakers who furnished
samples were seven men, professors
from the Department of Speech at
the University of Illinois, ranging in age
from 34 to 57 years with a median age
of 43 years. All had been teachers for
many years, were experienced speakers
and habitual users of the General
American dialect. About one week before
formal procedure in the laboratory
a copy of the following statement was
given to each speaker and discussed with
him. It is quoted in full because it explains
the general rationale of the problem.

Explanation of the Problem for Speakers

This experiment is concerned with nine
vowels of the General American dialect
and we would like to ask you to produce
examples of each of these vowels. For
certain reasons we want these examples
produced in particular ways, and the
entire success of the experiment depends
upon your being able to do this. The job
will not be easy, and it is for this reason
that we are using as subjects only people
such as yourself who have studied speech
for many years and who have exceptional
personal vocal skills.

Essentially what we are trying to do is
to collect samples of each vowel that are
as nearly typical or representative of that
vowel as possible. More specifically, we
are interested in samples that depict the
central tendency of each vowel. In order
to make this as clear as possible, let us
imagine that we have collected a large
number of different words as spoken by
a large number of General American
speakers, each word including an example
of the vowel in question, and that we have
tape recordings of these words. Suppose
that we now cut out a short portion of
the vowel from each word, mix the
portions up in random order together with
similar samples from the other vowels, and
play them one at a time for a group of
observers who have been instructed to
identify each vowel. Now let us take out
into a separate group those examples of
the one particular vowel that we are
talking about that have been identified by
100 per cent of the observers. If we play
these and listen to them we will discover
that they are similar to each other but
not identical. They vary over a certain
range. It is the center point of that range
that we are interested in, and you can
regard that point as a target to shoot at
as you produce your examples. Another
way of putting the problem is to say what
we want you to do is to imagine the target
on the basis of your experience in listening
to speech, and then demonstrate what
the target is by producing a vowel of
your own that hits the target as you
imagine it.

You will understand from this, that we
are trying to get samples that are something
more than merely acceptable and
identifiable. It will also be clear that we
are not asking you to produce samples of
the way you personally, individually produce
any given vowel, but rather to think
about the entire population of General
American speakers. Further, we are not
asking you to exemplify the way in which
anyone thinks the vowels should be produced
in the sense of ‘standards’ or anything
of the sort. Instead we want the
central tendency as you hear it.

Each speaker came to the laboratory
individually and the complete procedure
for collection of samples was accomplished
with him in a single session.
The following instructions were read
aloud formally. Since they will be relied
upon for exposition of some of the procedure,
attention is invited to their details.

Instructions to Speakers

Do you have any questions about the
written explanation that I gave you earlier?
Would you mind summarizing it in your
own words so that we can make sure that
we understand each other?

Here is a list of the nine vowels that
we are going to work with, and a few
typical examples of words in which these
vowels occur in GA. Don't pay any
special attention to the words. They are
there simply to help identify the vowel
further; it would defeat our purpose if
you were to fix on the vowel of any
particular word as a model.

We are going to consider the vowels
one at a time, recording as we go. For
each vowel we will record and save two
examples that satisfy you and then go on
to the next vowel. After we have finished
all the vowels we will listen once more to
the examples that we have saved.

While you are producing the vowels
please sit here in this chair with the earphones
on and your mouth about one
foot from the microphone. Like this. The
course of the procedure will be controlled
by you and we will stay with each vowel
until you are satisfied. A flash of this red
light will signify that the apparatus is
ready. As soon after as you like you can
make your first attempt and I will be recording
a section of it as you do so. A
second flash of the light will indicate
recording is over and you will stop phonating.
As soon as you stop you will hear
the section played back to you over and
over through the earphones. Listen to this
as many times as you like. Try to decide
if it is what you are trying for or not.
Ask yourself if it is as good as you can
do or if you can do better. If you are
satisfied, say ‘Keep’ and go on to the
next attempt. If not, say ‘Discard’ and
repeat the attempt. In either case wait for
the red light before going on. Please be
very critical of what you produce and do
not accept any example unless you are
entirely satisfied that it depicts the central
tendency of the vowel as you hear it in
the language.

You will notice that the sections that
are recorded are short. Therefore, each
vowel that you produce will need to be
sustained for only a second or two. However,
use any duration longer than that
which you find comfortable or otherwise
desirable.

As far as loudness is concerned, keep
your various attempts approximately the
same, but don't make any special effort to
make them exactly equal.

We would like you to produce all the
vowels at as nearly the same pitch as possible,
matching a standard pitch that I will
give you from time to time in the earphones.
If you want this standard at any
other time before an attempt, please ask
for it. I will be able to hear you over the
microphone.

The recording referred to above was
accomplished on a system consisting of
an Altec 21-B microphone and a Magnecorder
PT-6 tape recorder operated
at 15 ips. A duplicate system, exclusive
of tape transport, was used for monitoring
by the speaker, who wore a binaural
(parallel) headset with Permoflux PDR-10
earphones over which he heard his
speech contemporaneously with production
and received instructions. As
will appear, he also heard and evaluated
his recorded product over the same system.
The level of reproduction was
adjusted to his preference during preliminary
trial. The pitch standard
referred to in the instructions was supplied
at 130 cps by an oscillator. It was
introduced into the earphones at the
start of each series of vowel attempts
and at such other times as requested by
the speaker, but matching was not
rigidly enforced. The equipment was
arranged in a two-room laboratory suitable
for this type of work.

Recording was done on loops previously
prepared, one for each sample.
Each loop was 30″ long and thus cycled
in about two seconds. The iron oxide
coating was scraped from all but 4.5″
of the loop, leaving a ‘live’ section proportional
to 0.3 sec. The signal light
mentioned above was controlled manually
by the experimenter to indicate to
the speaker the general period during
which the ‘live’ section of tape passed
the record head. The arrangement permitted
the experimenter to obtain an
isolated 0.3-sec sample from the middle
of a longer sustained sample. As soon
as each sample had been recorded and
approved by the speaker, according to
the procedure outlined in the instructions,
the process was repeated with the
next vowel, and so on until all 18 samples,
two for each of nine vowels, had
been collected. The typical individual
sample was preceded by a number of
attempts before final acceptance by the
speaker, and no speaker appeared to find
the task an easy one.
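
The loop timing described above follows directly from the tape speed. A minimal sketch of the arithmetic, using only the figures given in the text (15 ips, a 30-inch loop, a 4.5-inch live section); the function and variable names are illustrative and not part of the original procedure:

```python
# Figures stated in the text; names are illustrative only.
TAPE_SPEED_IPS = 15.0   # tape speed, inches per second
LOOP_LENGTH_IN = 30.0   # length of one complete loop, inches
LIVE_LENGTH_IN = 4.5    # section on which the oxide coating was left intact


def duration_sec(length_in: float, speed_ips: float = TAPE_SPEED_IPS) -> float:
    """Time needed for a given length of tape to pass the record head."""
    return length_in / speed_ips


loop_period = duration_sec(LOOP_LENGTH_IN)  # one full cycle of the loop
live_window = duration_sec(LIVE_LENGTH_IN)  # duration of the recorded sample

print(f"loop period: {loop_period:.1f} s, live window: {live_window:.1f} s")
```

The result agrees with the stated cycle of about two seconds and the 0.3-sec sample.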

As the two samples of each vowel
were secured the two recording loops
were spliced together into a larger loop
that displayed the pair of samples in the
order of recording with a pause of
about 0.8 sec between. After all samples
had been recorded the speaker
rested briefly and then listened to these
pairs of samples one at a time over the
identical system used in reproducing
the samples earlier. He was asked to
pick that one of the two which he considered
the more successful, and then
invited to attempt still another sample
if he believed he could produce an even
more successful sample. In such a case
the third sample was spliced into the
loop following the second sample, and
the speaker was asked to make first
and second choices from the three. He
was then invited to try yet another,
continuing as long as he wished. No
speaker went beyond the third sample.

The ultimate yield of the procedure
was a set of 126 samples for analysis,
each of seven speakers having produced
a pair of samples of each of nine vowels;
each sample had been approved by the
speaker after an unspecified number of
attempts, and the speaker had ranked
the two members of each pair.

Judgmental Procedures

In preparation
for this portion of the experiment
the original samples were re-recorded
on equipment equivalent to the original.
In the process the samples were roughly
equated in level, within a total range of
five decibels. They were ordered at
random and spliced into a continuous
stimulus tape, each sample being preceded
by a spoken item number and
followed by a 4-sec judgment interval.
The listening situation essentially duplicated
that used in the arrangement for
collecting samples, except that four
matched headsets, equivalent to that
used by the speakers, were paralleled to
accommodate four observers at a time,
seated facing away from each other in
the quiet room of the laboratory. Level
of reproduction was approximately 65
db above threshold.

The observers were eight young
adults, six of them men, trained in
phonetic analysis at the graduate level,
and experienced as experimental observers.
All spoke the General American
dialect natively and habitually. None
had a history of hearing loss.

Two types of procedure were used,
the first of which was attempted identification.
For this part a response sheet
was used which listed phonetic symbols
and key words for the nine vowels
across the top and item numbers down
the left, with rows of cells so that response
could be made by checking in
the appropriate column. The observers
were familiarized with the general nature
of the stimuli and with the use of
the response sheet by means of a short
practice tape consisting of vowel samples
similar to the stimuli.

The second judgmental situation was
designed to provide an estimate of the
degree to which each sample was representative
of the vowel intended, and
was carried on at later, separate sessions
with the same observers. The same
stimulus tape and general conditions of
administration were used. The response
sheet for this session showed each attempted
vowel by means of a phonetic
symbol after the item number, while
across the top were shown equidistant
points along a nine-point graphic rating
scale. The points were numbered 1
through 9 from left to right; above the
respective numbers were the words Inferior,
Very Poor, Poor, Below Average,
Average, Above Average, Good,
Very Good, Superior. Each item had a
corresponding row of nine spaces and
rating was recorded by checking. In
addition to the response sheet each observer
had a card on which the nine
vowel symbols were identified by code
numbers. Prior to each item the experimenter
spoke the code number of the
intended vowel, and this, together with
the phonetic symbol after the item number,
was the means of announcing the
intended vowel without articulating it.
The tape was stopped for judgment
after each sample was played. The observers
were invited to use as much
time as needed and to ‘judge each sample
in terms of the success with which
it represents the attempted vowel as
that vowel most frequently occurs in
General American speech.’ Before the
experiment a brief practice tape was
played which consisted of about 20
samples from the center of the stimulus
tape. These included at least one of each
vowel and, in the opinion of the experimenters,
at least one instance of both
extremes of representativeness so that
the span of the rating scale might thus
be illustrated.

Spectrographic Analysis

A Kay Electric
Company Sona-Graph, locally
modified to provide a full-scale range
of 3 500 cps, was used. The original tape
loops were reproduced on the original
recording equipment, and the first step

Table 1. Identification matrix. Rows: intended vowel. Columns: vowel identified as (each of the nine vowels), with totals.

with each sample was to make a conventional
spectrogram with the 300-cps
bandwidth and flat frequency response.
This was done to verify the essential
constancy of the sample and facilitate
selection of the portion to be analyzed
by means of the sectioner. The section
was made midway in the sample; the
45-cps bandwidth was used together
with a frequency response of positive
slope. The procedure for sectioning
was to make the first section at the
maximum level which would avoid
amplitude clipping of the strongest
formant, usually the first, and then to
resection as many times as necessary at
the same point, progressively increasing
the amplitude of the display until the
weakest formant, usually the third, had
been resolved. Decisions in such respects
were guided by the expectation of three
formants at certain general locations,
according to previous results. In over
half the cases the first section proved to
be adequate; two sufficed in most of
the others, but in some samples it was
necessary to make three. This practice
was conservative and multiple sections
were made in all cases that were not
completely unequivocal.

The frequency scale was calibrated
individually for each vowel section by

Table 2. Cumulative distributions of identification scores. Rows: intended vowel, with the combined total at the bottom. Columns: identification score (number correct).

sectioning the output of a generator (3)
which provided a complex test tone
having the harmonics of 200 cps. This
frequency was derived from an oscillator
and calibrated by Lissajous figure
against a highly stable 1 000-cps standard.
Although the drift of the frequency
scale of the spectrograph became measurable
only over periods of several
hours, it is believed that the labor involved
in this method of continuous recalibration
was repaid by increased precision.

Before any frequency measurements
of formants were made a jury of three
persons, all sophisticated in spectrographic
analysis, agreed upon the general
locations of major energy regions
and identified the three formants to be
measured. In a few cases where unanimity
failed the necessary decisions were
reached in consultation between the jury
and a fourth expert. Consideration was
given to some method of amplitude
weighting in the frequency measurement
of formants, such as that used by
Potter and Steinberg (5). However, it
was not proposed to make any inferences
with respect to pitch, the theory
of weights is not compelling in any case,
and the method is extremely laborious.
It was decided that it would be preferable
to make direct measurements at
points of maximum amplitude as recorded
on the section, with due regard
to the shaping characteristics of the
system. In most instances one component
was obviously most prominent, but
two adjacent components were equal
in amplitude in some formants, and here
the arithmetic mean was used.
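
The measurement rule just described can be sketched in a few lines. The representation of a formant region as (frequency, amplitude) pairs, the function name, and the example values are assumptions made for illustration; the shaping characteristics of the system, to which the text says due regard was given, are not modelled here.

```python
def formant_frequency(components):
    """Return the frequency (cps) assigned to one formant.

    components: (frequency_cps, amplitude) pairs for the harmonic
    components inside one formant region, in ascending frequency order.
    The frequency of the strongest component is returned. If exactly two
    adjacent components share the maximum amplitude, their arithmetic
    mean is returned. Any other tie falls back to the lowest tied
    component (an assumption; the text does not cover that case).
    """
    peak = max(amp for _, amp in components)
    tied = [i for i, (_, amp) in enumerate(components) if amp == peak]
    if len(tied) == 2 and tied[1] - tied[0] == 1:
        return (components[tied[0]][0] + components[tied[1]][0]) / 2
    return components[tied[0]][0]


# Hypothetical sections (harmonics of a 130-cps fundamental).
single_peak = [(130, 5), (260, 20), (390, 12)]
equal_pair = [(390, 10), (520, 18), (650, 18), (780, 9)]

print(formant_frequency(single_peak))  # strongest component
print(formant_frequency(equal_pair))   # mean of the two equal components
```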

Results

Identifications and Ratings

Table 1
summarizes the results of the identification
procedure. Each row shows the distribution
of 112 judgments, eight observers
times 14 samples. The entries
along the central diagonal may be interpreted
as signifying ‘correct’ identifications,
in the sense that they indicate
agreements between what the listener
heard and what the speaker intended. It
is plain that the samples were characterized
by generally high identifiability.
The over-all figure is 74%; individual
vowels range from 53 to 92%. On the
assumption that the speakers made equal
effort across the vowels, it seems likely
that the variations along the diagonal
signify differences in ease of vowel production.
The most readily identified
samples were those of /i/ and /u/,
which might be expected from the fact
that they represent poles of articulatory
position and formant combination.

Table 2 is based on the identification
scores of individual samples, that is,
number correct, the maximum score

Table 3. Cumulative distributions of median ratings. Rows: intended vowel, with the combined total at the bottom. Columns: median rating (1-to-9 scale).

being eight. A cumulative distribution
is shown for each vowel, and a given
entry is to be interpreted as showing the
number of samples having the indicated
score or better. For instance, 11 of the
14 samples of /ɛ/ had identification
scores of five or higher. The combined
distribution is shown along the bottom.
It will be noted that 106 of the 126
samples were identified by four or
more observers, 83 by six or more, and
45 by all eight. Special interest attaches
to the well-identified samples and in
particular, as will be seen, to those with
scores of six and higher. The samples
that were less frequently identified are
useful, however, for comparative purposes,
and Table 2 shows that the range
of identifiability was large enough to
provide this kind of contrast.

The results of the rating procedure
were used to derive for each sample a
median rating on the 1-to-9 scale, interpreted
as indicating the degree to
which the sample approached the central
tendency of the intended vowel,
that is, the extent of its representativeness.
Table 3 is concerned with these
data and shows cumulative distributions.
The general level of rating may be observed
to be high, with 88 samples
(70%) having medians at the midpoint
of the scale or higher, yet the range is
wide enough for discrimination. As
would be expected the relationship between
identification scores and median
ratings was substantial. For the 126
samples the product-moment correlation
coefficient was 0.76 between the
two. At the upper end of the distribution
of ratings, the region of greatest
interest for this study, it was notable
that identification was uniformly high.
For instance, the highest interval of
Table 3 contains 13 samples; 12 of these
were identified by all eight observers
and the remaining sample by seven.
However, at the upper end of the distribution
of identification scores the
range of median ratings was wide. The
45 samples identified by all observers
(Table 2, last column) were distributed
continuously from 8.5 down to 3.5 in
median rating. That is, in representativeness
some of these easily identified
samples were considered to be Very Good
to Superior, while others were
Below Average to Poor. This relationship
might be summarized by suggesting
that identifiability as a condition for
representativeness appears to be necessary
but not sufficient.
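
The two statistics used in this comparison can be sketched briefly. The ratings and scores below are invented for illustration and are not data from the experiment; only the procedure (the median of eight ratings per sample, and a product-moment correlation across samples) is taken from the text.

```python
from math import sqrt
from statistics import median


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of one sample by eight observers (1-to-9 scale).
ratings_one_sample = [8, 9, 8, 9, 9, 8, 9, 8]
median_rating = median(ratings_one_sample)  # half-points arise with eight raters

# Hypothetical identification scores (0 to 8) and median ratings, six samples.
scores = [8, 8, 7, 6, 4, 2]
medians = [8.5, 5.0, 6.5, 5.5, 4.0, 2.5]
r = pearson_r(scores, medians)

print(median_rating, round(r, 2))
```

In the invented data the second sample is identified by all eight observers yet rated only Average, which is the pattern the text describes: high identification does not guarantee a high rating.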

It will be remembered that each
speaker approved his own product, and
that in many instances a sample received
approval only after a string of attempts.
All 14 samples of a given vowel, therefore,
may be regarded as having been
screened by experts from a larger number
of vowel samples. Since they met
such a criterion of acceptability they
will be referred to as self-approved
samples.

Data from the identification procedure
were employed to form an arbitrary
category of identified samples, as
they will be termed, consisting of those
samples correctly identified by 75% or
more of the observers. In Table 2 these
are the 83 samples tabulated in the
column under the heading 6, where it
will be noted that the number varied
from four to 13 among the individual
vowels.

Within the group of identified samples
of each vowel a third set, preferred
samples, was defined. The intention
here was to pick the most representative
samples from among the most readily
identified samples. From each set of
identified samples of a vowel the four
with the highest median ratings were
selected. This number was also arbitrary,
constituting those above the 75th
percentile of the distribution of self-approved
samples. Ties in the fourth
rank occurred with /i/ and /ɛ/, and
were resolved by including five samples
in these cases only. In /ɪ/ only four
identified samples were available by the
criterion for that category, so in this
vowel the preferred samples included
all identified samples.
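
The three nested categories can be expressed as a short selection routine. This is a sketch under stated assumptions: the dictionary layout, the function names, and the example values are hypothetical. The criteria themselves (a score of six or more out of eight, then the four highest median ratings, with ties at the fourth rank included) follow the text.

```python
def identified(samples, min_score=6):
    """Samples correctly identified by 75% or more of eight observers."""
    return [s for s in samples if s["score"] >= min_score]


def preferred(samples, n=4, min_score=6):
    """The n identified samples with the highest median ratings.

    Samples tied with the n-th ranked sample are all included. If fewer
    than n identified samples exist, all of them are returned.
    """
    pool = sorted(identified(samples, min_score),
                  key=lambda s: s["median"], reverse=True)
    if len(pool) <= n:
        return pool
    cutoff = pool[n - 1]["median"]
    return [s for s in pool if s["median"] >= cutoff]


# Hypothetical self-approved samples of one vowel (not data from the study).
samples = [
    {"id": 1, "score": 8, "median": 8.5},
    {"id": 2, "score": 8, "median": 7.0},
    {"id": 3, "score": 7, "median": 7.0},
    {"id": 4, "score": 6, "median": 6.5},
    {"id": 5, "score": 6, "median": 6.5},
    {"id": 6, "score": 5, "median": 9.0},  # highly rated but not identified
    {"id": 7, "score": 8, "median": 4.0},  # identified but poorly rated
]

print([s["id"] for s in identified(samples)])
print([s["id"] for s in preferred(samples)])
```

In this invented set a tie at the fourth rank yields five preferred samples, as the text reports for two of the vowels.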

Formants One and Two

Table 4 is
devoted to mean frequencies and is

Table 4. Mean values of F1, F2, and F3 (cps) for self-approved, identified, and preferred
samples. Rows: formants one, two, and three, each for the three sets of samples. Columns: intended vowel.

Figure 1. Frequency areas of Formants One and Two for self-approved, identified, and preferred
samples of vowels. Values in cps.

presented mainly for reference. As explained
above, each set of self-approved
samples consisted of the 14 offered by
the speakers, the sets of identified
samples varied in number, and the sets
of preferred samples numbered four or
five.

The major findings are displayed in
Figure 1, where F1 and F2, the frequencies
of the lowest two formants,
are shown along abscissa and ordinate,
respectively, the scale units being equal.
This general arrangement for showing
the combinations of F1 and F2 will be
familiar from previous reports, although
it will be noted that here both coordinates
are logarithmic throughout.
For each vowel the self-approved area
was formed by plotting the 14 samples
as individual points (not shown) and
connecting the extreme points by
straight lines, reflex angles being
avoided. The general locations of the
areas are conventional, closely resembling,
for example, those shown by
Peterson and Barney (3). The areas are
smaller, however, than are usually obtained
for a group of speakers, with
correspondingly less overlapping, and
presumably this is attributable to the
use of expert subjects.
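
Connecting the extreme points by straight lines while avoiding reflex angles amounts to drawing the convex hull of the plotted points. The sketch below uses Andrew's monotone-chain algorithm, a standard technique chosen here for illustration and not something the original report describes. Because the coordinates of Figure 1 are logarithmic, the hull is computed on log-transformed frequencies. The sample values are hypothetical.

```python
import math


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices, counter-clockwise, by the monotone-chain method."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it would create a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical (F1, F2) values in cps for one vowel, not data from the study.
formant_points = [(500, 900), (700, 900), (700, 1200), (500, 1200), (600, 1050)]
log_points = [(math.log10(f1), math.log10(f2)) for f1, f2 in formant_points]
area_outline = convex_hull(log_points)

print(len(area_outline))  # the interior point is not a vertex of the outline
```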

It will be noted in Figure 1 that the
lower slanting edge of the /ɔ/ area is
along the line where F1 = F2. The six
samples produced by three speakers fall
on this line. The original spectrographic
sections of these samples clearly showed
a single concentration of energy in the
lower frequencies, with maximum
amplitude either in one prominent component
or in two adjacent components
of equal amplitude. These were in contrast
to the eight remaining samples of
/ɔ/, each of which showed two definite
formants. The possibility was considered
that the points of sectioning
might have been unrepresentative in
spite of the fact that their locations had
been determined by means of the preliminary
conventional spectrograms.
Consequently, each sample was sectioned
periodically at seven points, 0.04
sec apart. The original finding was confirmed
in every case. It would appear
that in these six one-formant samples of
/ɔ/ F1 and F2 were identical, or at
least so close that two formants were
not differentiated by the available components.
It is notable that all six were
among the identified samples, which
totalled 10 in the case of this vowel.
Since this close proximity of F1 and F2
was found only in /ɔ/, this vowel is
apparently the closest approach to a
vowel having only one distinctive
formant, that is, two resonators similarly
tuned, and may well be the limiting
case.

The identified areas in Figure 1 surround
the points corresponding to
those samples identified by 75% or
more of the observers. It will be observed
that the effects of imposing this
criterion are that the area for each
vowel has been reduced in size and that
the nine areas are now mutually exclusive.
This suggests strongly that when
a vowel attempt is successful in the
sense of identifiability, that is, is essentially
free of ambiguity as a sample
of the vowel in question, F1 and F2
may be sufficient to specify the vowel.
Reciprocally, the finding suggests that
fulfillment of F1 and F2 requirements in
combination would be likely to foster
successful identification of a vowel so
constituted. In other words, the data
support the idea that the identifiability
of an uttered vowel depends in part
upon the degree to which F1 and F2
approach the standards for that vowel.
This is by no means an original view of
the matter, but in support of it the
past evidence drawn from live speech
has been meager.

Study of Figure 1 will show that in
eight of the nine vowels, all but /u/,
the identified area is located near the
periphery of the self-approved area.
These locations vary from vowel to
vowel in an interesting manner. For
instance, to consider only the F1 dimension,
in /ɪ/ and /ɛ/ they are found
among the lower values, in /ɑ/ among
the higher values, in /u/ among the
medium values, etc. If such relative locations
within the larger areas are
studied in connection with the identification
matrix of Table 1, it will be
seen that in every vowel the location
tends to be away from the identified
area of that vowel for which misidentified
samples were most often mistaken.
This appears to be an important
finding and it may be illustrated by
reference to the general findings in the
case of two vowels. Table 1 shows that
self-approved samples of /ɔ/ were misidentified
29 times of which 27 were
as /ɑ/. Similarly, 42 of the 44 misidentifications
of the purported /ɑ/
samples were distributed between /ʌ/
and /ɔ/. In Figure 1 the relationships
between the self-approved and identified
areas for /ɔ/ and /ɑ/ are obviously
correlated with these misidentifications.
Thus, when the criterion of identifiability
was applied the vowel areas not
only became smaller and mutually exclusive,
but also were plausibly specific.

According to Table 1 the vowel most
often used in judgments was /ʌ/. Of
the 154 usages, 76 were correct in the
sense that they were applied to self-approved
samples of /ʌ/; 63 of the
remaining usages were applied to alleged
samples of /æ/, /ɑ/ or /u/, and
in each of the three /ʌ/ was the second
most frequent judgment. If Figure 1
is examined with this distribution in
mind and studied with regard to the
differences between the original self-approved
areas and the identified areas
of the four vowels, and their relative
locations, it becomes apparent that portions
of the self-approved areas of /æ/,
/ɑ/ and /u/ outside of their respective
identified areas approach the identified
area of /ʌ/, with /ɑ/, in fact, encroaching
upon it. In short, the ‘incorrect’
judgments of such vowel samples
were not incorrect in terms of F1 and
F2. As a whole this evidence supports
the general conclusion that when cues
for detection other than those residing
in the spectrum are held to a minimum,
as they were in the present experiment,
vowel identifiability is dependent upon
requirements for the combination of F1
and F2 that are comparatively rigid.
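
The counts quoted from Table 1 in the last two paragraphs can be turned into percentages for comparison with the range of 53 to 92% reported earlier. This is a worked check only: every count comes from the text, while the variable names are illustrative.

```python
JUDGMENTS_PER_VOWEL = 8 * 14  # eight observers times 14 samples = 112

# Correct identifications per intended vowel, from counts given in the text.
correct = {
    "ɔ": JUDGMENTS_PER_VOWEL - 29,  # 29 misidentifications reported
    "ɑ": JUDGMENTS_PER_VOWEL - 44,  # 44 misidentifications reported
    "ʌ": 76,                        # 76 correct usages reported
}

percent_correct = {
    vowel: round(100 * n / JUDGMENTS_PER_VOWEL, 1) for vowel, n in correct.items()
}

# Usages of /ʌ/ that were applied to samples of other intended vowels.
wedge_misapplied = 154 - 76

print(percent_correct, wedge_misapplied)
```

All three percentages fall inside the reported range, and 63 of the 78 misapplied usages of /ʌ/ are the ones the text attributes to three neighboring vowels.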

The identified areas of Figure 1 afford
information that bears on the
absolute versus relative vowel theories.
In review, the essential position of the
absolute theory is that a vowel is
characterized by a unique combination
of formant frequencies, in which combination
the absolute frequencies F1 and
F2, not the relation between them, are
the important data; the relative theory
proposes that the vowel is specified by
the ratio of frequencies, F2/F1, not by
their absolute values. Thus in the relative
theory the formant frequencies
may vary within vowel, given only that
their ratio remain essentially constant.
In terms of Figure 1 the discreteness
of the nine identified areas supports the
absolute theory, but does not as such
confute the relative theory. However,
the figure permits a graphic test of
the relative theory. Since its coordinates
are logarithmic and of equal scale, the
ratio F2/F1 is constant along any
straight line which ascends to the right
at an angle of 45° with the abscissa.
The relative theory is not valid where
such a line is common to more than one
vowel, and this is seen to be the case.
For example, the line for the ratio 2.5,
which originates in the lower left-hand
corner with 500/200, rises through the
identified areas for /u/, /ʊ/ and /æ/.
When the smaller preferred areas (see
below) are considered it will be noted
that /u/ and /ʊ/, /ɔ/ and /ɑ/, /ʌ/
and /æ/ are pairs of vowels with
similar F2/F1. However, Figure 1 shows
plainly that neither F1 nor F2 alone is
a complete specification; for example,
600 cps for F1 is common to three
identified areas, the vowels being differentiated
by F2. The importance of
absolute location is emphasized at several
places in Figure 1, but perhaps most
plainly by the compactness of the
measurements of F1 in /u/ and /i/.
Each of these vowels had 13 samples
in the identified group (Table 2). The
values of F1 ranged, respectively, from
233 to 300 and from 217 to 283 cps,
suggesting close adherence to a standard
frequency location. In summary, the
conclusions to be reached are that the
data are positive support for an absolute
theory and demonstrate that the relative
theory is not tenable as a complete
explanation.
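
The geometric property behind this graphic test can be checked numerically: on logarithmic axes of equal scale, every point with the same ratio F2/F1 lies on one straight line of unit slope, that is, at 45° to the abscissa. The sketch below uses the ratio 2.5 and the starting point 500/200 mentioned in the text; the remaining F1 values and the helper name are arbitrary choices for illustration.

```python
import math

RATIO = 2.5  # F2/F1, the example ratio used in the text

# The first F1 value gives the 500/200 starting point; the rest are arbitrary.
f1_values = [200.0, 300.0, 450.0, 700.0]
line = [(f1, RATIO * f1) for f1 in f1_values]


def loglog_slope(p, q):
    """Slope between two (F1, F2) points when both axes are logarithmic."""
    rise = math.log10(q[1]) - math.log10(p[1])
    run = math.log10(q[0]) - math.log10(p[0])
    return rise / run


slopes = [loglog_slope(line[i], line[i + 1]) for i in range(len(line) - 1)]

print(line[0], [round(s, 6) for s in slopes])
```

Since log F2 = log F1 + log 2.5, the slope is 1 regardless of the absolute frequencies. Two vowels whose areas are crossed by the same such line therefore share a ratio while differing in absolute location.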

Attention is now directed to the
preferred areas in Figure 1, where the
samples have been restricted to those
which the observers considered most
representative, according to criteria explained
above. In every vowel the preferred
area is seen to be very much
smaller than the self-approved area and,
in most cases, to be considerably smaller
than the identified area. The most extreme
instance of the latter relationship
is /u/; the main exception is /ɪ/, where
it will be recalled that the identified and
preferred samples were identical. Evidently
within-identification preference
is correlated with restriction of both
F1 and F2. It was noted above that most
of the identified areas tend toward extreme
locations within their respective
self-approved areas. The locations of
some of the preferred areas are even
more extreme in certain vowels, notably
/ɛ/, /æ/ and /ɑ/; in others they are
not. For instance, in /u/ identification
seems to depend upon low values of
both formants, but the extreme combinations
were not considered to be most
representative. The case of /ɔ/ is interesting
and instructive. As shown in
Figure 1, the criterion of identified
samples eliminated the four self-approved
samples with the highest values
of F2. The remaining 10 identified
samples included the six one-formant
samples discussed earlier, to be seen
along the 45° line in Figure 1. None
of these six, however, was among the
four top-rated samples that made up the
preferred group. Thus it might be said
that high identifiability of /ɔ/ seems to
obtain when F2 approaches F1, that is,
such a sample is not ambiguous, but that
samples in which the proximity is too
close seem to be less preferred.

The locations of the preferred areas
in Figure 1 emphasize even more
strongly that discreteness which was
remarked in discussion of the identified
areas. In general, as the criteria become
more exacting the areas not only shrink,
but seem to be drawing away from each
other, so that the samples judged to be
most representative of a vowel are closely
bunched and the different vowels are
spaced across the coordinate field. In
order to show this clearly Figure 2 displays
only the preferred areas. In view
of these results and the procedures from
which they came, the mean frequencies
of the formants for the preferred samples
as shown in Table 4 take on considerable
interest. It is suggested that
this set of means together with the areas
in Figure 2 provide the closest approach
so far to a standard model of the
General American vowel system.

[Image: plot of F1 against F2]

Figure 2. Frequency areas of Formants One and Two for preferred samples of vowels.
From Figure 1. Values in cps.

[Image: plot of F1 against F2, with points for Speaker A and Speaker B]

Figure 3. Personal vowel systems of two individual speakers shown in relation to frequency
areas of Formants One and Two for preferred samples of vowels. Areas from Figure 2.
Speaker A, highest over-all ratings; Speaker B, lowest over-all ratings.

Figure 3 was prepared in order to
illustrate personal vowel systems, or
individualistic configurations of formant
combinations, in their relationships to
the preferred areas. The samples of
Speaker A were highest among the
seven speakers in both identification
scores and ratings, while those of Speaker
B were lowest. Only one sample per
vowel for each speaker has been plotted
in Figure 3, this sample being that one
which the speaker himself ranked as
more representative. In contrast to the
foregoing discussion, which proceeded
from judgment groups to formant
measurements, attention now shifts to
the judgments that result from particular
combinations of formant frequencies.

As shown in Figure 3 the samples of
Speaker A are not especially instructive
except to illustrate close agreement between
speaker and observers; each was
among the preferred samples. But perhaps
this case is useful to exemplify
the fact that the central tendency of
the samples is not simply a statistical
fiction.

The samples of Speaker B as plotted
in Figure 3 are informative and repay
point-by-point study. Outstanding
general characteristics are, first, the
extremely small coordinate area used
for the entire vowel system other than
/ɔ/, about one-fourth of the figure's
total area; second, the extremely narrow
range (533-600 cps) within which the
F1 values of six samples are to be found;
third, the concentration of four samples
near /ʌ/. As a matter of fact, the key
vowel of this speaker's vowel system,
as Figure 3 shows vividly, is /ʌ/. His
intended sample of /ʌ/ was so identified
by seven of the eight observers,
had a median rating of 7.5 on the nine-point
scale of representativeness, and
was one of the four preferred samples
of /ʌ/. About it in Figure 3 are
clustered three other points, representing
attempts at /æ/, /ɑ/, and /u/. But
these three samples were identified as
/ʌ/ by eight, seven, and three observers,
respectively, and rated 1.0, 1.5, and 2.5
as examples of the vowels intended by
the speaker. The plotting of the sample
offered for /æ/, unanimously labelled
/ʌ/, is a conspicuous instance of the
untenability of the relative vowel
theory. The ratio F2/F1 for this sample
is 2.24, very close to the average of the
preferred samples of /æ/, which is 2.26.
Speaker B's sample of /ɔ/ was one of
the one-formant samples, the lowest in
frequency of the six; its identification
score was seven, but its median rating
was only 5.0. That is, it was easily
identified, but not considered highly
representative. As Figure 3 shows, the
point for this sample is distant from the
preferred area, but no ambiguity resulted
because the displacement of the
formants was not directed toward any
other vowel area. The sample of /u/
was identified by all observers and was
one of the preferred group. As may be
seen in Figure 3, the remaining three
samples, offered for /i/, /ɪ/ and /ɛ/,
were consistently displaced upward with
respect to F1 and confined within an
extremely narrow range of F2. Their
respective identification scores were 3,
0, and 4, and their median ratings were
4.0, 2.5, and 4.5. Distributions of identifications
were unremarkable except for
the /ɪ/ sample, which seven observers
identified as /u/. Since the point is very
close to the /ɛ/ area, as may be seen,
although the sample resembles /u/ in
F1, this constitutes a discrepancy between
formant combination and identification,
notably rare, which suggests
possible operation of identifying cues
other than those afforded by F1 and F2,
perhaps in the amplitude domain.
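The identification scores quoted above for Speaker B may be checked against the criterion given in the Summary, correct identification by 75% or more of the eight observers. The following is a minimal sketch; the function and variable names are assumptions introduced for illustration, and only the scores stated unambiguously in the text are included.

```python
OBSERVERS = 8      # size of the observer group
CRITERION = 0.75   # proportion required for the identified group


def meets_identification_criterion(correct: int, observers: int = OBSERVERS) -> bool:
    """True when the share of correct identifications reaches the criterion."""
    return correct / observers >= CRITERION


# Identification scores for Speaker B as reported in the text.
speaker_b_scores = {"ʌ": 7, "ɔ": 7, "i": 3, "ɪ": 0, "ɛ": 4}

identified = {
    vowel: meets_identification_criterion(score)
    for vowel, score in speaker_b_scores.items()
}

for vowel, ok in identified.items():
    print(vowel, "identified" if ok else "not identified")
```

With eight observers the criterion amounts to six or more correct identifications, so the samples of /ʌ/ and /ɔ/ qualify and those of /i/, /ɪ/ and /ɛ/ do not.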

Formant Three

Formant Three. The results of the
frequency measurements of the third
formant are summarized by the set of
means in Table 4. The individual values
of F3 were studied in relation to the
judgmental data in the same manner as
for F1 and F2. Plots of F3 with F1 and
with F2 showed the same progressive
narrowing of areas from self-approved
to identified to preferred samples as
has been reported above for the plots
of the two lower formants. However,
overlapping of ranges was very much
more extensive in the F3 dimension. In
general location, by either of the coordinate
combinations, the areas resembled
those plotted by Potter and
Peterson (4) for eight adults.

[Image: plot of F2 against F3]

Figure 4. Frequency areas of Formants Two
and Three for preferred samples of vowels.
Values in cps.

The nature of the results with the
third formant may be illustrated by
Figure 4, which shows the preferred
areas in a coordinate plot of F3 and F2.
Although this most extreme of the restrictions
has been accompanied by
considerable separation of areas, examination
of the areas will show that most
of this is attributable to F2. In fact, a
vertical line at about 2550 cps would
pass through all areas except that of
/i/. The areas overlap at three points,
in contrast to Figure 1, where even the
less powerful restriction to identifiable
samples resulted in separation of all
areas. The plot of F3 and F1 was even
less effective in resolving the vowels.
The shrinking of range with application
of the identified and preferred criteria
suggests that F3 gives some information;
for example, it may denote a subclass of
vowels, but in combination either with
F1 or with F2 alone the vowel is not
completely specified. The frequency of
the third formant may make its most
important contribution by participating
in the distinction between /i/ and the
non-/i/ vowels.

Summary

Nine General American vowels were
sustained by seven skilled speakers at
approximately the same fundamental
frequency. Steady-state samples, two
for each vowel, each 0.3 sec in duration,
were recorded, individually self-approved
by each speaker as representative
of the intended vowel, randomized,
approximately equated in level, and
presented twice to a group of eight
trained observers. At the first presentation
the observers attempted vowel
identification. At the second presentation
they rated the samples on a scale of
representativeness, knowing the vowel
which each sample was intended to
represent. The frequencies of the formants
were measured and studied in
relation to the judgments, with the
following major findings.

a. Coordinate plots of the first two
formants were conventional in their
general locations, but the influence of
the selective sampling could be observed
in the areas for the nine vowels. These
were smaller and overlapped less extensively
than the areas for unselected
samples reported in previous studies.

When the samples were restricted to
those which had been correctly identified
by 75% or more of the observers,
the areas were considerably smaller and
mutually exclusive, so that specification
of F1 and F2 differentiated any
area from all others. In most instances
the ratio F2/F1 for a given vowel had
a range in common with that for at least
one other vowel, which should not be
the case if the relative vowel theory
is a complete explanation.

When the identified samples were
further restricted to those judged to be
most representative of the respective
vowels, most of the areas became very
small. However, even with these small
areas adequate differentiation between
them was not given by either F1, F2
or F2/F1 alone.

b. Although associations between
self-approval, identifiability, and representativeness
were close, the relationships
were not perfect. Self-approval
by an expert did not insure identifiability.
Identifiability was not invariably
accompanied by high judged representativeness;
it seemed to be a necessary
but not sufficient condition
therefor. When a given vowel sample
was misidentified or judged to be unrepresentative,
a plausibly related deviation
of F1 or F2 was in most cases
present. Most of the atypical values of
F1 or F2 were reflected in the judgments.

c. Study of the third formant confirmed
the results of past investigations
to the effect that F3 is a much less
powerful determinant of acoustic vowel-ness
than either of the lower two
formants. As the judgmental restrictions
were applied, the ranges of F3
tended to decrease progressively, suggesting
that some information is contributed.
When the most rigorous
restriction was exerted, F3 was higher
for the samples of /i/ than for almost
all of the preferred samples of other
vowels, such vowels being left with a
common range of F3.

d. In some of the samples of /ɔ/,
only one formant could be distinguished
in the lower range. Although such one-formant
samples were readily identified,
they were not among the group
judged to be more representative.

References

1. Crandall, I. B., The sounds of speech.
Bell Syst. tech. J., 4, 1925, 586-626.

2. Koenig, W., A new frequency scale for
acoustic measurements. Bell Lab. Rec., 27,
1949, 299-301.

3. Peterson, G. E., and Barney, H. L., Control
methods used in a study of the vowels.
J. acoust. Soc. Amer., 24, 1952, 175-184.

4. Potter, R. K., and Peterson, G. E., The
representation of vowels and their movements.
J. acoust. Soc. Amer., 20, 1948, 528-535.

5. Potter, R. K., and Steinberg, J. C., Toward
the specification of speech. J. acoust.
Soc. Amer., 22, 1950, 807-820.

6. Stevens, S. S., Volkmann, J., and Newman,
E. B., A scale for the measurement
of the psychological magnitude pitch. J.
acoust. Soc. Amer., 8, 1937, 185-190.

* Reprinted from the Journal of Speech and Hearing Research, Vol. 4, 1961, pp. 203-19.

** Grant Fairbanks (Ph.D., University of
Iowa, 1936) is Professor of Speech, University
of Illinois. Patti Grubb (Ph.D., University
of Illinois, 1956) is Research Associate,
Laboratory of Neurological Research, College
of Medical Evangelists, Los Angeles. The
article is based on the Ph.D. dissertation of
Patti Grubb; the investigation was supported
by the Research Board of the University of
Illinois.

1 These procedures were independent of the
auditory judgments, of course, but when the
two types of data were brought together it
was interesting to note a strong impression
that auditory identifiability of a sample by
listeners is associated with visual identifiability
of the formants in a spectrum. This was
especially true of the preferred samples (see
below), in which the three formants were
almost invariably prominent and distinct.

2 Although the study of non-distinctive
energy regions was not an objective of the
experiment, certain observations may be of
interest. In most of the samples a concentration
was encountered in the 3000-3500 cps
range, a region that a number of previous
investigators have remarked. Additional regions
at lower frequencies, apparently characteristic
of individual voices, were also
found. Such regions tended to be reasonably
constant from sample to sample within
speaker, but to differ in location from speaker
to speaker. For instance, one speaker exhibited
two regions of this kind in most of his
samples, one at about 1250 cps and the
other at about 1650 cps.

3 It should not be overlooked that identification
here was of samples essentially alike in
duration, level, and fundamental frequency,
shorn of any assimilation cues. High specificity
of spectrum was required of a successful
sample, since this was the listener's sole
basis. This is obviously a different matter
from identification of a word, such as was a
feature of the procedure of Peterson and
Barney (3). Although word detection is based
in part on the vowel's spectrum, wide latitude
therein may be offset by the other cues that
are available.

4 Thus Figure 1 may not be exactly compared
to linear plots, or to those using the
mel scale of Stevens, Volkmann, and Newman
(6) or the pitch approximation scale of
Koenig (2).

5 Crandall (1) many years ago also failed
to resolve two formants for the same vowel
as produced by adult speakers.

6 It is interesting to note that imposition of
a criterion of 100% identifiability did not
result in any such large restriction of areas
in the experiment of Peterson and Barney
(3). As has already been mentioned, however,
in that experiment identification was essentially
word detection, and high word detection
does not necessarily mean that the vowel
will be identifiable in isolation.

7 In evaluating this evidence it should not
be overlooked that the present subjects were
exclusively men. As indicated by earlier investigations,
notably Potter and Steinberg (5)
and Peterson and Barney (3), the relative
theory may apply to vowels in a different
sense, namely, that the complete systems of
women and children are displaced upward
roughly along an equal ratio line in a plot
of F1 and F2, presumably because of smaller
physiological resonators. So interpreted the
relative theory may indeed be applicable as
a reference to the preservation of relations
between formant combinations in subgroup
or even individual systems of vowels.