Fairbanks, Grant. Experimental Phonetics – T20

An Experimental Study of the Durational
Characteristics of the Voice during
the Expression of Emotion*

Grant Fairbanks
State University of Iowa
and
LeMar W. Hoaglin
Parsons Junior College

In a previous report (Fairbanks and Pronovost¹) objective data
were presented on the pitch characteristics of five vocally simulated
emotions. The present paper is concerned with the durational
features of the same simulations.

I. Experimental Procedure

Since the first publication included a detailed description of the
procedure, a summary only will be given here. The major aspects
were the following.

1. Five emotional states — contempt, anger, fear, grief, indifference — were
studied.

2. The same test passage was employed for simulation of all five
emotions. This passage, 27 words in length, and capable of assuming
various emotional meanings according to the vocal interpretation,
was:

“There is no other answer. You've asked me that question a
thousand times, and my reply has always been the same. It always
will be the same.”

3. Using this passage, six versatile, amateur male actors, 20-30
years of age, each simulated the five emotions in turn. High-quality
phonograph recordings were made.

4. In order to determine the degree to which the simulations
exemplified the emotions which they purported to portray, the 30
recordings, five readings by each of the six actors, were played in
random order, before a group of 64 young adults. These observers
did not know what emotions were being simulated, and, to prevent
them from deducing that only five different emotions were concerned,
additional ambiguous recordings were distributed throughout the
random order. As each recording was played the observers, individually,
selected from a list of 12 emotional states, containing the
five listed above, the term which seemed to name most accurately the
emotion being simulated. In view of the various precautions it is
believed that this identification task was certainly no easier and probably
more difficult than procedures employed previously in studies of
facial expression.

5. The judgments disclosed that most of the simulations were
highly satisfactory examples of the intended emotions. Considering
the six actors as a group, the percentages of correct identifications by
the observers were as follows: contempt, 84; anger, 78; fear, 66;
grief, 78; indifference, 88.

6. For the physical division of the experiment, both fundamental
frequency and duration measurements were made by means of a
modified oscillograph which permits sound-wave photography from
phonograph recordings and highly reliable measurements of these
two aspects of sound.²

Table I.
Measures of rate. Phonation only, phrases only, total speaking time.
Duration in seconds, rate in words per minute. All values are means.

II. Results³

Rate

Table I presents the data on three different concepts of
speaking rate. Using the values for contempt as examples, the table
may be interpreted as follows: First, the mean total duration of phonation
was 9.34 sec., i.e., the average simulation involved 9.34 sec. of
vocal cord vibration. The second row gives the mean duration per
word, an average of 0.35 sec. during the portrayals of contempt. The
mean rate, phonation only being considered, was 174 words per
minute. The next row adds pauses within phrases to the values of
the first row, a total phrasal duration of 11.22 sec., with a corresponding
rate in words per minute of 144. The third group of measures
concerns the total speaking time, the traditional method of calculating
rate, and shows that the average simulation of contempt was accomplished
in 14.03 sec., with a mean over-all rate of 116 words per
minute.
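The arithmetic behind the three rate measures can be sketched as follows. This is an illustration only, not the authors' computing procedure: the function name `words_per_minute` is an assumption, and the only figures used are the 27-word passage length and the three mean durations for contempt quoted above.

```python
WORDS_IN_PASSAGE = 27  # length of the test passage, as stated in the procedure


def words_per_minute(duration_sec: float, n_words: int = WORDS_IN_PASSAGE) -> float:
    """Rate in words per minute for a reading that lasts duration_sec seconds."""
    return n_words * 60.0 / duration_sec


# Mean durations for contempt, in seconds, as quoted in the text.
contempt_durations = {
    "phonation only": 9.34,
    "phrases only": 11.22,          # phonation plus pauses within phrases
    "total speaking time": 14.03,
}

rates = {label: words_per_minute(sec) for label, sec in contempt_durations.items()}
for label, wpm in rates.items():
    print(f"{label}: {wpm:.1f} words per minute")

# Mean duration per word, phonation only being considered.
mean_word_duration = contempt_durations["phonation only"] / WORDS_IN_PASSAGE
print(f"mean duration per word: {mean_word_duration:.2f} sec.")
```

Computed from the mean durations, the rates come to about 173, 144 and 115 words per minute, close to the reported 174, 144 and 116. Differences of this size would be expected if the tabled rates are means of rates computed separately for each reading, which is presumably the case, since the mean of several rates is not in general equal to the rate derived from the mean duration.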

Comparison of the emotions in this table shows that fear and
indifference are most rapid and approximately equal in rate, followed
closely by anger, and then by grief and contempt in that order. This
ranking remains consistent for all three types of computation. That
rate probably is one vocal symbol of emotional expression is indicated
by the magnitude of the differences shown. The mean duration of
the total speaking time in contempt, for example, is nearly twice that
in indifference. When the values in the last row are compared to
available data on the oral reading of factual prose it is seen that the
rate is somewhat slower than average in simulated contempt and
grief, and somewhat more rapid in anger, fear and indifference. Both
McIntosh's⁴ mean for a 55-word passage and Darley's⁵ median for a
300-word selection were 166 words per minute. Any such comparison
must be qualified, however, because of variations in the lengths of
the samples. It seems probable that factual readings of the 27-word
passage used in the present experiment would prove in calculation to
be slightly more rapid, on the average, than similar readings of longer
selections.

Table II.
Mean percentages of pause and phonation

Proportions of the Total Speaking Time Devoted to Phonation and Pause

That the division of the total speaking time into phonation
and pause is a distinctive feature of at least two emotions, grief
and contempt, is shown by the values given in Table II. In indifference
it will be observed that, of the total duration, 12 per cent was
devoted to pauses within phrases⁶ and 17 per cent to pauses between
phrases, a total pause percentage of 29. The balance, 71 per cent, is,
of course, the proportion devoted to phonation. Approximately these
same percentages prevail in all the emotions except grief and this has
also been the common finding in previous studies of oral reading of
factual prose. In the simulations of grief, however, the ratio of pause
to phonation was found to be 47 to 53, instead of approximately 30
to 70. This increase in the relative amount of pause time in grief is
seen from the table to occur both within and between phrases, although
it is most marked at the latter points. The striking fact may
be noted in passing that in three of the six individual simulations of
grief more than 50 per cent of the total speaking time was devoted
to silence. Apparently the pause time is a highly important aspect of
the slowness of rate in grief. This is not true of contempt, however.
Although, as was seen in Table I, the mean over-all rate is even
slower in this emotion than in grief, Table II reveals that the proportions
of pause and phonation are very similar to those found for anger,
fear and indifference.

Table III.
Number and duration of phonations and pauses. Duration in seconds
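The division of the total speaking time into phonation and the two kinds of pause can be sketched from the three duration measures of Table I. The sketch below is an illustration under stated assumptions: the function name `partition_percentages` is hypothetical, the "phrases only" duration is taken to be phonation plus pauses within phrases, and the remainder of the total speaking time is taken to be pauses between phrases. Only the mean durations for contempt quoted earlier are used.

```python
def partition_percentages(phonation_sec: float, phrases_sec: float, total_sec: float) -> dict:
    """Percentage of total speaking time given to phonation and to each kind of pause.

    Assumes phrases_sec = phonation + pauses within phrases, and
    total_sec = phrases_sec + pauses between phrases.
    """
    pause_within = phrases_sec - phonation_sec
    pause_between = total_sec - phrases_sec
    return {
        "phonation": 100.0 * phonation_sec / total_sec,
        "pause within phrases": 100.0 * pause_within / total_sec,
        "pause between phrases": 100.0 * pause_between / total_sec,
    }


# Mean durations for contempt in seconds, from the discussion of Table I.
contempt = partition_percentages(9.34, 11.22, 14.03)
for part, pct in contempt.items():
    print(f"{part}: {pct:.0f} per cent")

total_pause = contempt["pause within phrases"] + contempt["pause between phrases"]
print(f"total pause: {total_pause:.0f} per cent")
```

For contempt this gives roughly 67 per cent phonation and 33 per cent pause, in line with the statement that its proportions resemble the approximate 30 to 70 division of the more rapid emotions. These figures are derived from mean durations and need not agree exactly with Table II, which reports mean percentages.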

Duration of Phonations and Pauses

Table III compares the emotions
with respect to these aspects of duration. Both the mean and
median duration of phonations and of pauses within phrases were
computed, because several of the distributions were somewhat skewed.
Although for any given emotion these two measures may differ in
magnitude, either is seen to be satisfactory for comparative purposes.
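Why both measures are worth reporting for a skewed distribution may be seen in a small sketch. The durations below are hypothetical values chosen for illustration and are not data from the study; the variable names are likewise assumptions.

```python
from statistics import mean, median

# HYPOTHETICAL pause durations in seconds (not data from the study):
# most pauses are short, and one long pause skews the distribution.
pause_durations = [0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.95]

mean_duration = mean(pause_durations)
median_duration = median(pause_durations)

print(f"mean:   {mean_duration:.2f} sec.")
print(f"median: {median_duration:.2f} sec.")
```

The single long pause raises the mean to about 0.30 sec. while the median stays at 0.20 sec. The two measures differ in magnitude, yet either would serve to rank one set of readings against another so long as the same measure is used throughout.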

In duration of phonations, anger, fear, grief and indifference are
grouped closely, while the values for contempt are considerably
greater. With respect to the duration of pauses, within and between
phrases, however, both contempt and grief exceed the other emotions.
This is especially striking in the case of pauses between phrases,
where the mean duration of such pauses in grief is more than twice
that in anger, fear and indifference. Both contempt and grief exceed
the other emotions in number of phonations and pauses. These findings
are notable when considered in connection with other data discussed
above. In comparison to the other emotions both contempt
and grief are relatively slow in over-all rate, but it seems clear that
this slowness is accomplished in two different ways. In contempt a
relatively uniform slowing of the rate is found; phonations and pauses
both are prolonged, increased in number, and the proportionate division
of the total speaking time into pause and phonation is not disturbed.
But in simulations of grief the slowness appears to be primarily
a function of increased pause length, particularly between phrases,
without prolongation of phonations, although both increase in number;
the total time thus divides more equally into pause and phonation.⁷

Table IV.
Mean duration of inflections in seconds

Duration of Inflections

An inflection is a pitch modulation in a
given direction without interruption of phonation. Table IV, which
presents the mean durations of inflections, is included primarily to
show one additional characteristic of simulated contempt. In general,
the situation here is very similar to that of the mean duration of
phonations: contempt exceeds the other emotions, which, in turn,
are closely grouped. The one exception to this observation occurs in
the upward inflections of indifference. It is also noteworthy that indifference
is the only emotion in which the mean duration of upward
inflections is not substantially shorter than that of downward inflections.
These upward inflections may be an important feature of indifference;
it is recalled that in the study of the pitch characteristics
of the same simulations⁸ it was found that the mean extent of upward
inflections was smaller than that of downward inflections in all emotions,
but that this difference was very minor in the case of indifference.

III. Summary

Six male actors simulated five different emotional states vocally,
using the same prose passage in all simulations. The readings were
recorded phonographically and, by means of a rigid identification
technique, were shown to be typical of the intended emotions. Objective
measurements of the duration aspects of the simulations were
made by means of sound-wave photography; the following were the
major results.

1. Considering the data as a whole, anger, fear and indifference
were found to differ markedly from contempt and grief. All three
of the first group presented rapid rate and short duration of phonations
and pauses, but they did not differ importantly from each other
in any respect considered in the present study.

2. Contempt and grief, on the other hand, may be differentiated,
both from each other and from the other emotions. Although they
are similar in that slow rate is characteristic of both, contempt being
the slower of the two, the slowness of the latter is produced by
approximately equal prolongation of both phonations and pauses,
the ratio of pause time to phonation time remaining the same as in
factual reading and the more rapid emotions. The slow rate of grief,
however, is caused almost entirely by prolongation of pauses, particularly
between phrases; so marked is this effect that the total pause
time is almost equal to the total phonation time.

* Reprinted from Speech Monographs, Vol. 8, 1941, pp. 85-90.

1 Fairbanks, G. and Pronovost, W., “An Experimental Study of the Pitch
Characteristics of the Voice during the Expression of Emotion,” Speech
Monog., 6, (1939), 87-104.

2 The most recent modification and description of this instrument has been
made by Cowan, M., “Pitch and Intensity Characteristics of Stage Speech,”
Arch. Sp., 1, (1936), Suppl., 1-92.

3 No attempt is made in this report to consider variability among the six
actors. A subsequent paper will discuss such differences, both in pitch and
duration, as they may be related to variations in identifiability of the portrayals.

4 McIntosh, C. W., Jr., “A Study of the Relationship Between Pitch Level
and Pitch Variability in the Voices of Superior Speakers,” Ph.D. Dissertation,
State University of Iowa, 1939.

5 Darley, F. L., “A Normative Study of Oral Reading Rate,” M.A. Thesis,
State University of Iowa, 1940.

6 For purposes of analysis, the phrase was defined somewhat arbitrarily
in this study. Since almost all of the subjects divided the passage clearly into
four divisions with pauses after “answer,” “times,” and “same,” these limits
were used in all cases.

7 In any given sample of speech the number of pauses is, by definition,
equal to one less than the number of phonations. That this apparently is not
exactly true for the means in Table III is caused by rounding to one decimal
place in computing.

8 Fairbanks, G. and Pronovost, W., op. cit., 97.