
Fairbanks, Grant. Experimental Phonetics – T19

Recent Experimental Investigations of Vocal Pitch in Speech *1

Grant Fairbanks
State University of Iowa, Iowa City, Iowa
(Received January 7, 1940)

Among the researches which have come from
the Experimental Phonetics Laboratories
of the State University of Iowa, a large number
have utilized acoustic measurement as a phase
of experimental method. The present paper summarizes
four of the most recent investigations
which deal with different aspects of the same
general problem: the fundamental pitch of the
male voice during speech. The general purposes
of the report are to illustrate experimental approaches,
to present experimental findings, and
to indicate directions in which future research
may proceed. 12

All of the experiments described below utilized
the same technique of acoustic measurement, a
modified oscillograph, designed originally by
Metfessel 23 and improved successively by Simon, 34
Lewis and Tiffin, 45 and Cowan, 56 which permits
measurement of fundamental sound wave frequencies
from phonograph recordings by phono-photographic
means. Although period-by-period
measurements are possible with this instrument,
the most common procedure in speech studies
has been to make average measurements over
consecutive intervals of 0.038 sec. each. The
over-all error of this method of frequency measurement
is approximately 0.5 percent, or 0.04 tone.
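The correspondence between a percentage error in frequency and an error in pitch can be checked with a minimal sketch. It assumes that a "tone" is one sixth of an equal-tempered octave, which agrees with the relation given in the notes to this paper up to rounding of its constant (6/log10 2 ≈ 19.93). The function name is illustrative only.

```python
import math

TONES_PER_OCTAVE = 6  # assumption: one tone = one sixth of an octave


def interval_in_tones(f_low: float, f_high: float) -> float:
    """Extent in tones between two frequencies given in cycles per second."""
    return TONES_PER_OCTAVE * math.log2(f_high / f_low)


# A 0.5 percent error in frequency, expressed as a pitch interval.
print(round(interval_in_tones(100.0, 100.5), 3))  # 0.043
```

Because only the ratio of the two frequencies enters the computation, the result does not depend on the pitch level at which the error occurs.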

The first of the four experiments was concerned
with the pitch characteristics of the voice during
the simulation of specific emotional states. 67 Six
versatile amateur male actors served as subjects,
each reading the test passage, “There is no other
answer. You've asked me that question a
thousand times, and my reply has always been
the same. It always will be the same,” five times,
simulating in turn contempt, anger, fear, grief
and indifference. They repeated the simulations
until the experimenters judged that satisfactory
examples of the emotions had been secured from
each subject. High quality phonograph recordings
of all attempts were cut on lacquer disks.

In order to validate the simulations a method
of observer-identification was used. The recordings
of the 30 samples to be analyzed, six simulations
of each of the five emotions, were played in
random order before a group of 64 observers,
advanced students of speech. Five ambiguous
recordings were introduced into the random order
to prevent the observers from deducing that only
five different emotions were being studied. The
observers were provided with lists of 12 emotional
states: amusement, anger, astonishment, contempt,
doubt, elation, embarrassment, fear, grief,
indifference, jealousy, love. Their task was to
select from the list as each recording was played
the term which named the simulated emotion
most accurately. The gross loudness level was
kept as nearly constant as possible. The results
of this judgment procedure are shown in Fig. 1,
the identifications of all six simulations of each
emotion being grouped. It is readily apparent
that the simulations were highly satisfactory
examples of the emotions to be studied.

Differences of “pitch pattern” are illustrated
by the “pitch curves” of Fig. 2. For each emotion
the curve presented is that of the simulation
which was identified correctly by the largest
percentage of observers. The abscissa is time, one-second
intervals being indicated by vertical lines;
the ordinate is the equal-tempered musical scale
with horizontal lines marking the major triads. 78

image percentage of the observers | contempt | anger | fear | grief | indifference | love | elation | amusement | embarrassment | astonishment | jealousy | doubt

Fig. 1. Distributions of identifications of simulated
emotions by observers.

Notable in this figure are the few extremely wide
downward inflections in the simulation of contempt,
the generally wide, rapid inflections of
anger, the irregularity of the pitch changes in
fear, the consistent vibrato in grief, and the lack
of distinguishing features in indifference. Typical
variations of pitch level are revealed in Fig. 3,
which shows frequency distributions of the
measured pitches for the two subjects who were
generally most successful (left group) and least
successful (right group) in producing identifiable
simulations. The ordinate again is the musical
scale; the abscissa, in each distribution, is percentage
of the total measured pitches; labeled
horizontal lines indicate the respective medians.
It will be observed that marked differences obtain
between the various emotions, both in pitch
level and dispersion. For example, consider in the
left group the extent of more than one octave
which separates the medians for fear (225∼)
and indifference (101∼), and the wide dispersion
for contempt as compared to that for grief although
the medians of the latter two simulations
are only one tone apart. Viewing the group of
distributions at the right of Fig. 3, it is seen that
the pitch levels are by no means as clearly defined
as those of the other group. This is true
especially of anger and fear, and of grief and
indifference. It may be suggested that these
similarities contributed to the reduction in the
percentage of correct identifications. Another
interesting feature of this graph is that it shows
both subjects to employ total pitch ranges of
over three octaves in simulating the five emotions.
In five of the six subjects this was found to
be the case.

The most important results of the pitch
measurements are summarized in Table I, in
which the six subjects are considered as a group.
For each measure all values for all subjects were
grouped in a frequency distribution and the
measure computed without reference to the
individual subjects. Only in mean total pitch
range, in which one value is derived from each
simulation, is the number of cases six. 89 Consideration
of Table I, together with the evidence of
Figs. 2 and 3, will reveal the distinctive vocal
pitch characteristics of five simulated emotions.

image contempt | anger | fear | grief | indifference

Fig. 2. Pitch curves of typical emotional simulations. For each emotion is shown the curve of the simulation which
was identified correctly by the largest percentage of observers.

A second investigation 910 dealt primarily with
the relationship between the fundamental vocal
pitches used in speech and the range of pitches
which the larynx is capable of producing, i.e., the
maximum singing range. This study grew out of
the clinical observation that adjustment of the
pitch level is an important factor in vocal
therapy, especially in such cases as hoarseness,
breathiness, harshness and weak voice, in which
an improper pitch level frequently operates
causally.

Table I. Pitch measurements of five simulated emotions.

table columns: contempt | anger | fear | grief | indifference
table rows: median pitch level (∼) | mean total pitch range (tones) | mean extent of inflections (tones) | mean extent of shifts (tones) | mean rate of pitch change (tones/sec.) | mean no. of changes in direction per second during phonation

image contempt | anger | fear | grief | indifference

Fig. 3. Frequency distributions of pitches used in simulating the five emotions by the
two subjects who were most successful (left group) and least successful (right group) in
producing identifiable simulations.

Superior speakers served as subjects,
it being assumed that such persons approach the
use of pitch levels which permit their mechanisms
to function with maximum general efficiency in
speech. Six such adult male speakers were selected
according to the following procedure. (1)
Twenty-five college students were recommended
to the experimenters as having superior speech
and voice usage by staff members in the Department
of Speech at the State University of Iowa.
(2) These students were subjected to informal but
rigid scrutiny of their speaking and oral reading
abilities, and rejected if they deviated from
superiority in any particular. Eight students
survived this procedure. (3) Each of these eight
students then read the same 55-word factual
prose test passage as well as he was able four
times. Phonograph recordings were made. The
four trials by each subject then were ranked in
order of general excellence by seven trained observers
and the best recording of each group of
four thus was selected for further consideration.
No attempt was made at this point to compare
the subjects. (4) The method of paired comparisons
then was used to select the best six of the
eight. Each recording was paired with each other
recording and played before observers who
judged which of the two was superior, making
possible thus a comparative ranking of the readings
on the basis of general excellence.
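The paired-comparison step can be sketched as follows. The paper does not state how the observers' judgments were tallied; counting the number of pairings won by each recording is an assumption made here for illustration, as are the function name, the `prefer` callback, and the scores in the example.

```python
from itertools import combinations


def rank_by_paired_comparisons(items, prefer):
    """Rank items by the number of pairings each one wins.

    `prefer(a, b)` must return whichever of the two items was judged superior.
    Tallying by win count is an assumption, not the paper's stated rule.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):  # each item paired once with each other
        wins[prefer(a, b)] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)


# Hypothetical pooled judgments for eight recordings (invented scores).
scores = {"R1": 5, "R2": 8, "R3": 1, "R4": 7, "R5": 3, "R6": 6, "R7": 2, "R8": 4}
ranking = rank_by_paired_comparisons(
    list(scores), lambda a, b: a if scores[a] > scores[b] else b
)
print(ranking[:6])  # the six readings retained for measurement
```

With eight recordings this procedure requires 28 pairings.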

Partial results of measurement of the recordings
are shown in the frequency distributions of
Fig. 4. This figure is directly comparable to
Fig. 3, except that the new subscript system
mentioned above has been used. The solid vertical
line at the left of each distribution indicates
the total normal singing range and is surmounted
by a broken line showing the extension of the
range into falsetto. The figure presents several
interesting aspects. It is evident that these subjects
employed no frequencies in the upper
halves of their respective ranges, nor were any
frequencies measured which were higher than the
highest tones of the “normal” registers. 1011 At the
lower ends of the distributions it will be observed
that a significant number of frequencies are used
in speech which are lower than the lowest sustainable
tones. This is especially true of subjects C,
D and E. Such frequencies occur only momentarily
and are troublesome in the measurement
procedure since consecutive waves tend to vary
markedly in period length. Extreme conservatism
was used in measurement, however, all
ambiguous instances being omitted, and the frequencies
shown were without question present
in the speech.

image falsetto | C | D | B | A | F | E

Fig. 4. Frequency distributions of pitches used by six superior speakers during the oral
reading of factual prose. Solid vertical lines indicate normal singing ranges; broken lines
indicate extension of the ranges into falsetto. Medians shown by the horizontal lines
across the distributions.

In this regard, subject E presented
several frequencies of the order of 25-30∼ which
were omitted because they could not be measured
confidently, but which are readily perceptible in
the phonograph recording. This phenomenon appears
to be worthy of further investigation.
Another remarkable feature of Fig. 4 is the
proximity of the medians for the six subjects.

One object of the study was to compare various
clinical devices for calculation of the “optimum”
or “natural” pitch level. Without describing this
procedure in detail it may be stated in summary
that one method suggested by the above data
was discovered to be superior in accuracy, reliability
and convenience to the others which were
scrutinized. This method arose from the fact that
the mean ratio of the interval separating the
median pitch level from the lowest sustained
tone of the singing range to the total singing
range including falsetto was found to be 0.25,
with individual ratios varying from 0.21 to 0.28.
The natural level thus is predicted 25 percent
of the way up the total singing range including
falsetto. Using this method, the mean deviation
of the calculated natural pitch from the actual
median pitch was found to be 0.45 tone, with a
maximum deviation of 0.84 tone. When compared
to nine other devices this method predicted
the median more accurately 70 percent of the
time. Its reliability was demonstrated by repeating
the predictions for three of the subjects
on seven consecutive days, and for one of the
three subjects ten times on one day, the calculations
falling within ranges of 1.5, 1.0 and 0.13
tones for the respective subjects.
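The prediction rule described above may be sketched as follows, on the assumption that the 25 percent is taken along the pitch (logarithmic) scale in tones rather than along the frequency scale. The function name and the example singing range are hypothetical.

```python
import math


def predict_natural_pitch(lowest_hz: float, highest_hz: float,
                          fraction: float = 0.25) -> float:
    """Frequency lying `fraction` of the way up the total singing range.

    `lowest_hz` is the lowest sustained tone and `highest_hz` the highest
    tone including falsetto. Distances are measured in tones (six per octave).
    """
    range_tones = 6 * math.log2(highest_hz / lowest_hz)
    return lowest_hz * 2 ** (fraction * range_tones / 6)


# Hypothetical speaker with a three-octave range, 80 to 640 cycles per second.
predicted = predict_natural_pitch(80.0, 640.0)
print(round(predicted, 1))
```

The 0.25 default is the mean ratio reported in the text; the individual ratios of 0.21 to 0.28 could be passed as `fraction` to see how far the prediction moves.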

image falsetto | normal | higher pitch | lower pitch | highest pitch | lowest pitch | more flexible | less flexible | most flexible | least flexible | subject E

Fig. 5. Frequency distributions of pitches used by a superior speaker reading factual prose in a normal
manner, and in response to instructions to reread the passage at higher and lower pitch levels and with more
and less general pitch variability.

The same six superior speakers also participated
in another experiment 1112 in which
each subject, after listening to his recording
which was measured above (hereinafter referred
to as the “normal reading”), read the same
passage (1) at a higher pitch level, (2) at a lower
pitch level, (3) with more general variability of
pitch, and (4) with less general variability of
pitch. Extremes were avoided, but audition of
the phonograph recordings indicated that instructions
had been followed in all cases. The
chief purposes of the study were to validate
certain measures which had been assumed to be
indicative of pitch variability and to determine
whether a relationship obtains between pitch
level and pitch variability.

With respect to pitch level the findings are
indicated clearly in Table II, in which the group
medians, i.e., medians of composite frequency
distributions considering the six subjects as a
group, are ranked in descending order. Expressions
are given both in cycles per second and
tones above the zero reference frequency, 16.35∼.
Immediately observable is the fact that an increase
or decrease in variability tended to be
accompanied by a systematic change in pitch
level, although this should not, of course, be
interpreted to mean that it is impossible in a
given reading to reduce the variability and raise
the pitch level, or vice versa, simultaneously.

Table II. Composite median pitch levels.

table columns: cycles per second | tones above 16.35∼
table rows: higher pitched readings | more variable readings | normal readings | less variable readings | lower pitched readings

The trends in pitch level may be observed graphically
in Fig. 5, which presents frequency distributions
of the measured pitches for one subject.
This figure also shows distributions of
additional readings at still higher and lower
levels and with still more and less variability.
Although recorded for all subjects, these extremes
were measured only in the case of this one
speaker. The distributions on the first, second
and fourth ordinates are those considered in this
study. The additional recordings are of interest,
however, since they show that the relationship
between level and variability tends to prevail, at
least for this subject, with even greater departures
from the normal reading.
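The two forms of expression used for pitch level, cycles per second and tones above the zero reference frequency of 16.35∼, can be converted into one another with a short sketch. It assumes six tones to the octave; the function names are illustrative.

```python
import math

REFERENCE_HZ = 16.35  # zero reference frequency named in the text


def tones_above_reference(frequency_hz: float) -> float:
    """Pitch level in tones above the 16.35 cycles-per-second reference."""
    return 6 * math.log2(frequency_hz / REFERENCE_HZ)


def frequency_from_tones(tones: float) -> float:
    """Inverse conversion: pitch level in tones back to cycles per second."""
    return REFERENCE_HZ * 2 ** (tones / 6)


# Middle C at 261.6 cycles per second lies four octaves above the reference.
print(round(tones_above_reference(261.6), 2))
```

A level expressed in tones has the advantage that equal differences correspond to equal musical intervals wherever they fall on the scale.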

The measures which this study demonstrated
to be descriptive of general pitch variability in
speech are given in Table III. Here the readings
are arranged from left to right in ascending order
of pitch variability as indicated by the measures
within the table. From the regular progression of
these values it is seen generally that the higher
pitched readings tend to depart from the normal
readings in the direction of increased variability,
while the lower pitched readings evince an opposite
tendency. One reversal of this trend is
noted in the mean extent of pitch shifts in the
lower pitched reading, which exceeds the value
for the normal reading; in two other cases adjacent
values are identical. The second term in
the table, mean functional pitch range, refers to a
computation of the pitch range required to subtend
the median 90 percent of the pitches used.
It is employed instead of the total pitch range
since the latter tends to be influenced by a few
extreme frequencies (see Figs. 3, 4 and 5) and
hence is not a stable measure of dispersion. The
use of the median 90 percent is entirely arbitrary,
but probably is a suitable fraction for descriptive purposes.

Table III. Measurements descriptive of general pitch variability. Means of performances by superior speakers reading
factual prose in a normal manner, and in response to instructions to reread the passage at higher and lower pitch levels and
with more and less variability. Measures arranged from left to right in ascending order of variability. All values in tones.

table columns: less variable | lower pitched | normal | higher pitched | more variable
table rows: mean S.D. | mean functional pitch range* | mean extent of inflections | mean extent of phonations | mean extent of shifts

* Subtends the median 90 percent of the pitches used.
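A minimal sketch of the functional pitch range follows. It assumes that the "median 90 percent" is the central 90 percent of the measured pitches, with 5 percent excluded at each extreme, and it uses linear interpolation between ranked values; both choices are assumptions, since the text does not specify them. The names are illustrative.

```python
import math


def _percentile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    weight = pos - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def functional_pitch_range(frequencies_hz, coverage=0.90):
    """Range in tones subtending the central `coverage` share of the pitches."""
    tones = sorted(6 * math.log2(f / 16.35) for f in frequencies_hz)
    tail = (1 - coverage) / 2  # share excluded at each extreme
    return _percentile(tones, 1 - tail) - _percentile(tones, tail)


# Hypothetical measurements spaced one tone apart over 100 tones.
sample = [16.35 * 2 ** (t / 6) for t in range(101)]
print(round(functional_pitch_range(sample), 2))
```

Unlike the total pitch range, this measure is little affected when a few extreme frequencies are added to the sample.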

image the rainbow is a division of white light into many beautiful colors

Fig. 6. Superimposed pitch curves of the first four
seconds of the more variable and less variable readings of
one subject, indicated by solid and dotted lines, respectively.

It might be expected that a more rapid rate of
pitch change, i.e., more abrupt inflections, would
accompany increased general variability of pitch.
It was interesting to discover, therefore, that the
mean rate of pitch change was slower in all “instructed”
readings than in the normal readings.
This is explained by the fact that all instructions
tended to lengthen the mean duration of inflections,
as shown by another phase of the investigation
not reported in this paper. 1113 With the
exception of the normal readings, however, the
expectation that rate of pitch change is a differentiating
feature of different degrees of general
pitch variability is confirmed by the descending
rank order shown in Table IV.

Table IV.

table columns: mean rate of pitch change (tones per second)
table rows: normal reading | more variable reading | higher pitched reading | lower pitched reading | less variable reading

image adult | age groups

Fig. 7. Composite frequency distributions of pitches
used by 10-year-old, 14-year-old, 18-year-old and adult
males reading factual prose.

It might be predicted also that changes in direction of pitch
would become more frequent as general variability
is increased. The reverse was found to be
true, however. The number of changes in direction
of pitch per second was larger for the less
variable than for the more variable readings in
all computations, considering changes during
phonation only, during the total speaking time,
changes one semi-tone and greater in extent, or
less than one semi-tone in extent. Fig. 6, showing
superimposed pitch curves of the first four seconds
of the more variable and less variable
readings of one subject, portrays graphically
some of the features discussed above.
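The two measures discussed here, rate of pitch change and changes in direction per second, can be sketched for a single uninterrupted phonation as follows. The sketch assumes pitch values in tones sampled at equal intervals (0.038 sec. by default, as in the measurement procedure), treats each maximal rising or falling run as one inflection, and counts level stretches as part of the inflection in progress. These simplifications and all names are assumptions, not the procedure of the original study.

```python
STEP_SEC = 0.038  # measurement interval named in the text


def analyse_phonation(pitch_tones, step=STEP_SEC):
    """Return (mean rate of pitch change in tones/sec, direction changes/sec)."""
    inflections = []   # (extent in tones, duration in sec) for each run
    direction = 0      # +1 rising, -1 falling, 0 not yet established
    start = 0          # index at which the current inflection began
    changes = 0
    for i in range(1, len(pitch_tones)):
        diff = pitch_tones[i] - pitch_tones[i - 1]
        step_dir = (diff > 0) - (diff < 0)
        if step_dir == 0:
            continue  # level stretch: no new direction information
        if direction != 0 and step_dir != direction:
            # Turning point at sample i - 1 closes the current inflection.
            extent = abs(pitch_tones[i - 1] - pitch_tones[start])
            inflections.append((extent, (i - 1 - start) * step))
            start = i - 1
            changes += 1
        direction = step_dir
    if direction != 0:
        end = len(pitch_tones) - 1
        extent = abs(pitch_tones[end] - pitch_tones[start])
        inflections.append((extent, (end - start) * step))
    duration = (len(pitch_tones) - 1) * step
    rates = [extent / dur for extent, dur in inflections if dur > 0]
    mean_rate = sum(rates) / len(rates) if rates else 0.0
    changes_per_sec = changes / duration if duration > 0 else 0.0
    return mean_rate, changes_per_sec


# Hypothetical phonation: rise of two tones, then fall of two tones.
print(analyse_phonation([0, 1, 2, 1, 0], step=0.5))  # (2.0, 0.5)
```

Lengthening an inflection without changing its extent lowers the rate returned, which is the effect the text attributes to the instructed readings.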

The fourth study 1214 departed from consideration
of the adult voice to make an exploratory
investigation of the phenomenon of voice change
in the adolescent male. Three groups of six subjects
each were selected at the following age
levels: (1) pre-adolescent (10 years old), (2)
adolescent (14 years old), (3) post-adolescent (18
years old). In addition, the six superior adult
speakers (20-30 years old) who served as subjects
in the two experiments reviewed immediately
above were available for direct comparison, since
the same reading passage was used. The three
picked groups were selected on the basis of
chronological age, height, weight, intelligence,
reading comprehension and speaking ability to
approximate as closely as possible in all factors
the medians for their respective ages. General
aspects of pitch usage at the age levels studied
are shown in the composite frequency distributions
of Fig. 7, graphed from measurements of
the oral reading performances. One feature of
the graph is the fact that the median pitch levels
of the 10- and 14-year-old groups approach each
other in the neighborhood of C4 while those of
the 18-year-old and adult groups coincide closely
near C3, with a major difference of approximately
five tones between the 14- and 18-year-old pitch
levels. Apparently the most marked drop in vocal
pitch during adolescence occurs during the latter
four year period, and it is indicated that research
in that age range would be fruitful.

Consideration of the so-called adolescent
“voice breaks” formed another phase of the
investigation. Oscillograms showed their most remarkable
pitch characteristic to be the rapidity
of the change in fundamental frequency.

Fig. 8. Pitch changes during adolescent “voice breaks”
of 15-year-old male subject Z. The abscissae are wave number.

This change was found to be essentially different from
the inflectional type of frequency modulation in
which the period changes comparatively gradually,
being instead an abrupt change from wave
to wave. Twenty-five measurable breaks were
found in the performances of the 14-year-old
subjects and 20 at the 10-year level. At both ages
a few were found which presented such an aperiodic,
chaotic state that they could not be measured
with certainty, although their terminal
pitches were clearly identifiable. No breaks were
found in the voices of the 18-year-old or adult
subjects. 1315 Fig. 8 presents plots of voice breaks
from measurements of the periods of individual
sound waves, the abscissae being wave number.
Whenever possible the curves include ten waves
both before and after each break, since inspection
of the oscillograms showed that this number gave
an adequate indication of the terminal frequencies.
The dotted lines in the curves of the lower
row are examples of the chaotic periods mentioned
above, with the respective durations indicated.
Although the curves of this figure are
from an abnormal case of voice change (subject
Z, mentioned briefly below), and are for the
most part higher on the frequency scale than
those at the 10- and 14-year-levels, they represent
very satisfactorily the typical form of the breaks.

Measurement of the 45 instances found in the
speech of the 10- and 14-year-old subjects revealed,
first, that downward breaks were more
frequent than expected, although outnumbered
by upward breaks, and that no important or
consistent differences obtained between the
breaks at 10 years and those at 14 years. A second
finding was that the median and also the mode
extents of the breaks were close to one octave,
although individual extents ranged from four to
11 tones. It was discovered further that the downward
and upward breaks simply reversed themselves
over approximately the same range in the
pitch scale. Downward breaks occurred from
frequencies close to the median speaking pitch
levels for the 10- and 14-year-old subjects down
to the region of the medians for the 18-year-old
and adult subjects, while upward breaks occurred
from the latter region to the former.

image Z | Y

Fig. 9. Frequency distributions of pitches used during
the reading of factual prose by two abnormal cases of voice
change, 15-year-old male subject Z and 19-year-old male
subject Y.

In no subject
was a break discovered above the individual's
median; all were up from or down to the adult
male pitch level which probably is to be established
in a few years for these subjects, and which
adjustment may be assumed from the data to
be in progress at age 14. In this regard, the large
number of breaks found at 10 years indicates
that studies of voice during the early years also
may yield important information. An additional
finding of interest is that breaks in a given direction,
upward or downward, occur from frequencies
within a relatively narrow range. The 11
upward breaks of the 10-year-old subjects, for
example, all occur from frequencies within a
range of three tones.

In addition to the groups discussed above, two
atypical cases of voice change were studied.
Frequency distributions of the pitches used by
these subjects are shown in Fig. 9. Subject Z
was a 15-year-old male whose voice breaks were
more frequent and noticeable than in average
speech at that age; subject Y was a 19-year-old
male referred to the Speech Clinic because of the
lengthy persistence of his period of voice change.
The latter subject shows a clear-cut picture of
partial establishment of an adult pitch level
(Median = 124∼) with breaks up to and down
from the childhood level. His distribution in Fig.
9 is clearly bi-modal, the secondary peak indicating
the upper terminal pitches of his voice
breaks. In fact, he used two discrete ranges in
the sample of speech measured. He differs from
the 10- and 14-year-old subjects in that his most
frequently used pitch level (and also his median)
is approximately one octave lower than theirs,
his breaks thus occurring above rather than
below his median level. Fifteen-year-old subject
Z presents a different situation. The median
(F#3) and mode pitch levels shown in Fig. 9 are
almost exactly midway between the typical childhood
(C4) and adult (C3) levels with pitches
distributed above and below this point over the
extremely wide range of 3.5 octaves. As can be
seen in Fig. 8, this subject's voice breaks were by
no means as definitely localized as the others
studied. Chaotic breaks also were more frequent.
Further study may reveal that the differences
shown by these two subjects are typical of certain
stages of adolescent voice change.

Each of these four investigations of vocal
pitch is immediately suggestive of further researches,
some of which are under way at present.
Among these are measurements of different types
of performances, such as impromptu speaking
and oral reading of poetry, comparison of inferior
and superior performers, genetic studies of children's
voices, and descriptive investigations of
female pitch.

1* Reprinted from The Journal of the Acoustical Society of America, Vol. 11, 1940, pp. 457-66.

21 The last three of the studies reported are from Ph.D.
dissertations, completed under the direction of the author
in the Department of Speech, State University of Iowa.

32 M. Metfessel, “Technique for Objective Studies of the
Vocal Art,” Psychol. Monog. 36, 1-40 (1926).

43 C. T. Simon, “The Variability of Consecutive Wave
Lengths in Vocal and Instrumental Sounds,” Psychol.
Monog. 36, 41-83 (1926).

54 D. Lewis and J. Tiffin, “A Psychophysical Study of
Individual Differences in Speaking Ability,” Arch. Speech
1, 43-60 (1934).

65 M. Cowan, “Pitch and Intensity Characteristics of
Stage Speech,” Arch. Speech 1, suppl., 1-92 (1936).

76 G. Fairbanks and W. Pronovost, “An Experimental
Study of the Pitch Characteristics of the Voice During the
Expression of Emotion,” Speech Monog. 6, 87-104 (1939).

87 An A =440∼ scale is used. In this graph and in Fig. 3
middle C at 261.6∼ is labeled as C2. Now in use at the
Iowa Laboratories and employed in the other graphs of
this report is the new subscript system suggested by R.
W. Young, A Table Relating Frequency to Cents (C. G.
Conn Co., Elkhart, Indiana 1939), in which the zero
reference frequency of 16.35∼ proposed by H. Fletcher,
“Loudness, Pitch and Timbre of Musical Tones,” J. Acous.
Soc. Am. 6, 59-69 (1934), is designated by C0, middle C
thus becoming C4.

98 The median pitch level is the median fundamental frequency.
The total pitch range is the difference between the
highest and lowest fundamental frequencies measured in
a given sample and is expressed here in tones. It is computed,
as are the other measures involving pitch extent,
by means of the relation $N_{\text{tones}} = 19.92 \log_{10}(f_1/f_0)$, where $f_1$ is
the higher and $f_0$ the lower frequency. In the present studies
an inflection is defined as a frequency modulation, either
upward or downward, without interruption of phonation,
while the term shift refers to a change in pitch which takes
place between the terminal pitch of a given phonation and
the initial pitch of the subsequent phonation. The rate of
pitch change
is a measure of the rapidity with which frequency
is modulated per unit of time during inflections,
i.e., the relative “steepness” of the inflections. For any
inflection this is determined by dividing its extent in tones
by its duration in seconds. A change in direction of pitch
is a shift in frequency modulation from an upward
direction to a downward direction, or vice versa.

109 W. Pronovost, “An Experimental Study of the Habitual
and Natural Pitch Levels of Superior Speakers,” Ph.D.
diss., State University of Iowa, 1939.

1110 Some of the measured frequencies may have been
from falsetto phonations, however, since the two registers
overlap by an undetermined amount at the top of normal
register. Attempts were made to ascertain the lowest
falsetto tones, but abandoned because of the evident

1211 C. W. McIntosh, Jr., “A Study of the Relationship
Between Pitch Level and Pitch Variability in the Voices
of Superior Speakers.” Ph.D. diss., State University of
Iowa. 1939.

13 See note 12.

1412 E. T. Curry, “An Objective Study of the Pitch Characteristics
of the Adolescent Male Voice,” Ph.D. diss., State
University of Iowa, 1939.

1513 Such breaks sometimes occur in adult voices, however.