Fairbanks, Grant. Experimental Phonetics – T12

Effects of Vocal Effort upon the Consonant-Vowel Ratio within the Syllable*

Grant Fairbanks and Murray S. Miron
Speech Research Laboratory, University of Illinois, Urbana, Illinois
(Received February 7, 1957)

Subjects spoke /s/-vowel-/s/ monosyllables at two syllabic levels which required a large contrast in effort
without extreme departure from average conversational effort. At the higher level the median within-syllable
consonant-vowel ratio was approximately -14 db; at the lower level (vowel 18 db down) it was approximately
-7 db. The ratio also varied with position of the consonant within the syllable, with vowel and with
sex of speaker; effort, vowel, and sex interacted. The variations were systematic and large enough to have
implications for intelligibility.

In connected speech the level of the average consonant
is substantially lower than that of the average
vowel. Among the data of Sacia and Beck [1] are measurements
of mean power for 15 consonants and 10 vowels
sampled from speech at conversational levels. The
mean consonant-vowel ratio is approximately -15 db,
and 14 of the consonant means are lower than any
vowel mean. There is reason to believe that this relationship
changes with the general level of live speech
in response to variations of vocal effort. Casual listening
shows that in shouting the disparity is large and that in
whispering it is small. Between these extremes, and in
general correspondence to the degree of effort, it would
seem likely that the consonant-vowel ratio should vary.
In view of the well-known contrasts which characterize
consonants and vowels as classes (e.g., “close” vs
“open”), it would not be expected that their levels
would change equally with effort. The purpose of the
present experiment was to explore the relationship
within the syllable and in the middle of the range.
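
As a rough numerical illustration of the quantity under study, a ratio of mean powers is expressed in db as ten times its common logarithm, so a ratio of about -15 db corresponds to a consonant power of roughly 3% of the vowel power. The sketch below only performs that conversion; the function names are illustrative and are not taken from the paper.

```python
import math

def power_ratio_db(consonant_power, vowel_power):
    """Consonant-vowel ratio in db from two mean powers in the same units."""
    return 10.0 * math.log10(consonant_power / vowel_power)

def db_to_power_ratio(ratio_db):
    """Inverse conversion: a db value back to a plain power ratio."""
    return 10.0 ** (ratio_db / 10.0)

if __name__ == "__main__":
    # -15 db is the approximate mean consonant-vowel ratio cited in the text.
    print(round(db_to_power_ratio(-15.0), 4))
```

For sound pressures rather than powers the multiplier would be 20 instead of 10, since power varies as the square of pressure.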

The problem is pertinent to the interpretation of
certain measurement procedures, e.g., specification of
signal-to-noise ratio. In practice such procedures usually
involve selective sampling of signal “peaks,”
which, by the nature of speech, confines the measurement
largely to the vowels. The importance of vocal
effort as a factor in the psychoacoustics of intelligibility
has been emphasized by Kryter. [2] A study of the effects
of extreme effort upon word intelligibility has recently
been reported by Pickett. [3]

Procedure

The general plan was to secure samples of monosyllables
spoken with two distinctly different degrees of
effort, both within the functional range, using maximum
syllabic intensity as the indicator and establishing
common targets for all subjects. After exploring typical
ranges it was decided to use a separation by 16 db as
the working requirement for variation of effort, with a
minimum separation by 12 db, and to fix the higher
target at an intensity level of approximately 70 db. [4]

It was decided to consider a single consonant, /s/.
This voiceless fricative provides a suitable contrast with
vowels in the features of articulation presumed to be
related to the experimental problem, is of interest because
of its high frequency of occurrence in the language,
and is in the mid-range of consonant power. [1] During
exploratory measurements it was found that /s/, when
combined with a vowel in a syllable and normally
articulated by typical speakers without special stress,
could be differentiated in almost all cases from the
adjacent vowel in sound pressure tracings made with
the equipment available. The consonant /s/ was somewhat
more reliable in this respect than other consonants
studied, but its modulation in response to the planned
experimental conditions appeared to be very representative
of voiceless fricatives as a class. In order to
study the effects of vowel and consonant position, the
following specific syllables were formed: /sis/, /sæs/,
/sʌs/, /sɑs/, /sus/. The exploratory work also made it
clear that the sex of the speaker should be considered.
Accordingly, two subgroups of subjects were drawn,
consisting of 10 men and 10 women, respectively, from
lower-division undergraduate university students. Equal
numbers of men and women were assigned to a counterbalanced
order of levels. The order of vowel syllables
within level was randomized for each subject.

The subject stood 12 in. from the microphone with
his forehead against a small positioning bar, and observed
a VU meter connected across the output of the
recording amplifier, the center of the meter face being
marked for a 4-db target range. The five monosyllables,
spelled and with diacritic marks, were presented on
cards. Instructions were to repeat the syllable until
stopped, attempting to peak the meter in the range.
The meter target was the same throughout, the difference
between levels being effected by a 16-db adjustment
in recording amplifier gain. The subject began on
a light flash signal and fixed his own repetition rate.
All utterances, however, were discrete and produced on
smooth expiratory movements. The average rate was
approximately two syllables per second. Care was
exercised to offer the subject no auditory models for
pronunciation. For this purpose lists of key words for
the vowels were shown to the subject, along with the
stimulus cards which were to be used in the experiment
proper. The complete experimental session was recorded
at 15 ips with a Magnecord M-90 tape recorder
and Altec M-11 microphone system.

At each of the 10 syllable-level combinations the
subject produced numerous samples in a sequence, as
explained. Five utterances from each sequence, a total
of 50 per subject, were selected for measurement. Two
judges trained in phonetics first screened each sequence
for phonemic accuracy and for approximate attainment
of the level target, as indicated by VU meter. Sound
pressure tracings of the acceptable utterances then
were made and inspected for identifiable maxima of all
three phonetic elements. The last five utterances which
met this requirement in a given sequence were
measured.

The output tracings were made with a Sound Apparatus
model HPL-E recorder. Frequency response was
essentially flat over the speech range and amplitude
calibration was verified. Because of the 16-db adjustment
of tape recording gain between the two syllabic
levels, the graphic amplitude was in the same general
range for all utterances. The recorder was set for “fast”
stylus, a nominal speed of about 360 db/sec with the
50-db potentiometer. In order to display the pressure
dynamics within the syllable more fully and enhance
identification of the consonant peaks, the speed was
raised, effectively, by reproducing the 15-ips tape recordings
at 7.5 ips. This halved all frequencies, reduced
output and doubled the time base, but had inconsequential
effect upon frequency response. Over the 125
to 6000 cps range the combined response of tape recorder
and level recorder was changed from ±1.5 db
to ±2.5 db by the half-speed playback, which was
small in comparison to the experimental changes, as
will be seen. The effect of this procedure upon measurements
of typical syllables was also studied. Differences
were all small and were such as might be expected from
the different effective velocities. In particular, no
systematic effect upon consonant-vowel ratio could be
discerned. Each of the 1000 experimental utterances
was measured at the maximum value of the ordinate
for each of the three phonetic elements.

Fig. 1. Curves for one syllable spoken by a typical subject at high and low syllabic levels.
[Image not reproduced; axes: relative sound pressure (db) against duration (sec); segments labeled s, æ.]
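
The consequences of the half-speed playback described in this section can be checked with simple scaling: frequencies are multiplied by the speed ratio, durations are divided by it, and the stylus speed, referred to the time scale of the original speech, is raised accordingly. The sketch below is illustrative only; the function names and defaults are assumptions chosen to match the 15-ips and 7.5-ips speeds named in the text.

```python
def half_speed_playback(freq_cps, duration_sec, record_ips=15.0, play_ips=7.5):
    """Return (frequency, duration) as they appear when the tape is replayed."""
    factor = play_ips / record_ips  # 0.5 for half-speed playback
    return freq_cps * factor, duration_sec / factor

def effective_writing_speed(stylus_db_per_sec, record_ips=15.0, play_ips=7.5):
    """Stylus speed referred to the time scale of the original speech."""
    return stylus_db_per_sec * record_ips / play_ips

if __name__ == "__main__":
    # Upper limit of the stated response range and a half-second syllable.
    print(half_speed_playback(6000.0, 0.5))
    # Nominal "fast" stylus speed quoted in the text.
    print(effective_writing_speed(360.0))
```

Under these assumptions the nominal 360 db/sec stylus behaves like one of 720 db/sec relative to the speech as originally spoken, which is the sense in which the speed was raised "effectively."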

General Effect of Syllabic Effort

Figure 1 displays a pair of tracings of sound pressure
records at the two levels, obtained for the syllable /sæs/
spoken by the same subject. The curves are typical of
the records and show the general nature of the results.
It will be noted that the syllables were reasonably symmetrical
in time, that decreases of pressure clearly
separated the three maxima at transition points, and
that the descending order of maximum pressure was
vowel, prevocalic consonant, postvocalic consonant. It
will also be observed that the consonant pressure was
closer to the vowel pressure in the low level syllable
than in the high level syllable, a difference in relationship
which is the main object of interest in the
experiment.

Table I. Median relative sound pressures of vowels and consonants
of /s/-vowel-/s/ syllables spoken at high and low syllabic
levels. Values expressed in db re median vowel of all high-level
utterances.
[Table body not reproduced; column headings: high level, low level, vowel, cons.; row labels: all utterances, prevocalic consonants, postvocalic consonants, /sis/, /sæs/, /sʌs/, /sɑs/, /sus/, men, women.]

The statistical treatment was based upon those
utterances by each subject which most closely approximated
the central tendency of the entire group of subjects
for a given level with respect to the sound pressure
of the vowel of the syllable. It will be recalled that at a
given level, either high or low, each subject offered
numerous utterances of each of the vowel syllables, and
that five were selected for measurement by a method
which has been described. The net yield, then, was 25
sets of syllabic measurements per subject per level, or
500 at each level for the 20 subjects. The two groups of
500 measurements of vowel pressure were distributed
and their composite medians calculated. The obtained
ratio of these medians was 17.6 db, which was considered
a good realization of the planned separation of
16 db. The utterances of each subject were then inspected.
For each vowel syllable, that one of the
utterances was selected for which the vowel pressure
most nearly approached the composite median for that
level. The measurements of that utterance furnished
individual entries for the statistical treatment, except
for one comparison which, as will be seen, employed all
utterances.
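
The selection rule described above can be sketched as follows: pool the vowel levels, take their median, and keep for each subject the utterance whose vowel level lies nearest to it. This is a simplified illustration, not the authors' procedure in detail (in the experiment the composite median was taken over all 500 measurements at a level and the choice was made separately for each vowel syllable). The numbers and names are hypothetical.

```python
from statistics import median

def composite_median(levels_by_subject):
    """Median vowel level over all utterances of all subjects at one level."""
    pooled = [x for levels in levels_by_subject.values() for x in levels]
    return median(pooled)

def nearest_utterance(levels_db, target_db):
    """Index of the utterance whose vowel level lies closest to the target."""
    return min(range(len(levels_db)), key=lambda i: abs(levels_db[i] - target_db))

# Hypothetical vowel levels (db) for five utterances each; not data from the study.
example = {
    "subject_a": [-1.5, 0.4, 2.0, -0.2, 1.1],
    "subject_b": [0.9, -0.6, 0.1, 1.8, -2.2],
}
target = composite_median(example)
chosen = {name: nearest_utterance(levels, target) for name, levels in example.items()}

if __name__ == "__main__":
    print(target, chosen)
```

The measurements of the chosen utterance (vowel and both consonants) would then serve as that subject's entry for the given syllable and level.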

Table I presents the medians for various groupings
of the data as specified, the value for the vowel at high
level being used as the reference. The top row shows the
variation when all syllables and subjects were pooled.
It will be seen that the median for the low-level vowels
was -17.7 db, and that the consonant medians at the
high and low levels were -13.8 and -24.9 db, respectively.
These four values are graphed at the extreme
left of Fig. 2, where the solid bars indicate vowels and
the open bars consonants. The values for the two levels
are aligned vertically. It will be noted that, while the
decrease in vowel pressure from high to low level was
17.7 db, the change in pressure of the consonants was
11.1 db. In other words, the reduction in vowel pressure
of that magnitude was accompanied by a reduction in
the consonant-vowel ratio of approximately 7 db. It
will also be observed in Fig. 2 and Table I that the
direction of this difference holds without exception for
all of the other subdivisions of the data which will be
discussed below. The consonant-vowel ratios for the
totals at the high and low levels, -13.8 and -7.2 db,
respectively, compare favorably to the data of Sacia
and Beck [1] for samples of connected speech. In that
study the ratio of /s/ to the mean power of the five
vowels in question was -12.5 db.
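
The arithmetic behind these figures can be retraced from the four pooled medians quoted in the text. Because all values are in db relative to the high-level vowel, the consonant-vowel ratio at each level is a simple difference. The sketch below uses only those four medians; the variable and function names are illustrative.

```python
# Pooled medians from the text, in db re the median high-level vowel.
VOWEL_DB = {"high": 0.0, "low": -17.7}
CONSONANT_DB = {"high": -13.8, "low": -24.9}

def cv_ratio(level):
    """Consonant-vowel ratio in db at one syllabic level."""
    return round(CONSONANT_DB[level] - VOWEL_DB[level], 1)

# Changes from low to high level, rounded to the precision of the source.
vowel_change = round(VOWEL_DB["high"] - VOWEL_DB["low"], 1)
consonant_change = round(CONSONANT_DB["high"] - CONSONANT_DB["low"], 1)
ratio_change = round(cv_ratio("low") - cv_ratio("high"), 1)

if __name__ == "__main__":
    print(cv_ratio("high"), cv_ratio("low"))
    print(vowel_change, consonant_change, ratio_change)
```

The difference between the two ratios, 6.6 db, is the quantity the text rounds to "approximately 7 db."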

Although the mechanisms of voiceless consonants
and vowels are different, the production of a consonant-vowel-consonant
syllable is a unitary event. Systematic
variation of intensity is a secondary characteristic of
the vowel which has been shown to be closely related
to the articulatory variation of mouth opening. [5] A
similar positive relationship may be observed as the
intensity of a given vowel is varied. As the intensity
of an isolated voiceless fricative consonant is raised,
however, a decrease of aperture would appear to be a
common feature, so that as the intensity requirement is
raised for syllables such as have been measured here,
the articulatory adjustments for consonant and vowel
should become increasingly incompatible with respect
to the mouth opening. Since auditory monitoring of
syllabic level undoubtedly is referred predominately to
the vowels, the necessary mouth opening for the vowel
would tend to be the primary consideration, and any
articulatory assimilation related to syllabic intensity
should be from vowel to consonant. It is suggested that
the finding of variation in consonant-vowel ratio in the
presence of variation in syllabic effort may be interpreted
as reflecting such assimilation with respect to
mouth opening. With strong effort, for example, the
full effect of the expiratory force would not be realized
by the voiceless consonants because of the assimilated
increase in aperture.

Fig. 2. Median relative sound pressures of vowels and consonants
of /s/-vowel-/s/ syllables spoken at high and low syllabic
levels. Solid bars, vowels; open bars, consonants; bars for both
levels aligned vertically for each grouping.
[Image not reproduced; ordinate: relative sound pressure (db); groupings: total, position, vowel, sex.]

Effect of Consonant Position

Prevocalic and postvocalic consonants are compared
in the second sections of Table I and Fig. 2. The data
show that the former exceeded the latter by about 2
db at both levels. Comparison of the two consonants
in each of the 1000 utterances showed a similar difference
in 697 instances. When the samples were subsorted
by level, vowel, sex, and the combinations thereof,
all of the 54 distributions showed the same direction of
preponderance. When they were sorted by subject into
20 distributions of 50 utterances each, the same relationship
was characteristic of 17 subjects, a division
which is significant at the 1% level by sign test. It is
concluded that the tendency is strong; there is no evidence
that its direction is essentially influenced by the
other factors studied.
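
The sign-test result for the division by subject can be checked with an exact binomial computation. The sketch below assumes that the three remaining subjects are counted as showing the opposite direction, with no ties, and that the test is two-sided; neither detail is stated in the text.

```python
from math import comb

def sign_test_two_sided(successes, trials):
    """Exact two-sided sign test against a chance probability of 0.5."""
    k = max(successes, trials - successes)  # work with the larger tail count
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2.0 * tail)

if __name__ == "__main__":
    # 17 of the 20 subjects showed the prevocalic consonant as the stronger.
    print(sign_test_two_sided(17, 20))
```

Under these assumptions the probability is below 0.01, in line with the 1% level reported.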

The greater sound pressure of /s/ in the prevocalic
position is in agreement with the variation of air pressure
during the syllable, as shown by the tracings of
Stetson. [6] During the course of measurement an interesting
durational tendency was noted. The time interval
between the prevocalic and postvocalic consonant
maxima did not change greatly with syllabic level; at
the low and high levels the mean intervals were 0.36
and 0.37 sec, respectively. Within that interval, at low
level, the vowel maximum was fairly well centered; the
respective C-V and V-C means were 0.17 and 0.19 sec.
At high level, however, the mean C-V interval was
0.15 sec and the V-C interval 0.22 sec, so that the syllable
became definitely asymmetrical.

Effect of Vowel

It may be seen in Fig. 2 that the median consonant
varied from syllable to syllable within level, the variation
with vowel being more pronounced at the high
level. The amount of consonant change which ensued
as a result of instructed change in syllabic effort, as
can be seen from the amount of vertical separation between
the open bars in the figure, differed strikingly
from vowel to vowel, vowel pressure essentially constant.
This separation became as large as 14 db in /sis/
and as small as 9 db in /sʌs/. The significance
of the variation was assessed by means of Friedman's
test which involves analysis of variance by ranks. [7] This
nonparametric method was considered preferable to
the conventional analysis because of the logarithmic
data. The analysis took the form of five vowel syllables
by 20 subjects. For a given vowel syllable and subject
the basic entry was the consonant-consonant ratio between
the high and low levels. The procedure involved
ranking the five such entries for each subject. The
degrees of freedom, which are calculated in this test,
were 3.9 and 74.1. The F was 9.24, which is significant
beyond the 0.1% level. Thus, the evidence appears to
indicate that the degree of modulation of consonant
pressure which accompanies changes in syllabic level is
associated with the vowel of the syllable.
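
The fractional degrees of freedom reported above agree numerically with one known F approximation for ranked data, in which the first value is (k - 1) - 2/m and the second is (m - 1) times the first, for k conditions ranked within each of m subjects. The sketch below assumes that form, commonly attributed to Kendall and Babington Smith, and it is not confirmed by the text as the exact computation used. The function names are illustrative and no tie correction is applied.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks

def f_degrees_of_freedom(m, k):
    """Assumed degrees of freedom for m subjects ranking k conditions."""
    df1 = (k - 1) - 2.0 / m
    return df1, (m - 1) * df1

def concordance_f(table):
    """Return (W, F) for a table with one row of k entries per subject."""
    m, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_sum = m * (k + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (k ** 3 - k))  # coefficient of concordance
    f = float("inf") if w >= 1.0 else (m - 1) * w / (1 - w)
    return w, f

if __name__ == "__main__":
    # Five vowel syllables ranked within each of 20 subjects, as in the text.
    print(f_degrees_of_freedom(20, 5))
```

With 20 subjects and five syllables this gives 3.9 and 74.1. If the same approximation was used, the reported F of 9.24 would correspond to a coefficient of concordance of about 0.33.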

With the syllables ordered as in Fig. 2, a conventional
arrangement of the vowels, the consonant variation at
high level may be characterized as W-shaped. This
pattern is a complete inversion of the systematic variation
of intensity and mouth opening which appears to
be “natural” to the vowels of these syllables, patterns
which are both M-shaped. [7] If, as suggested earlier,
articulatory assimilation from vowel to consonant is
present in syllables such as these, it would be expected
that the aperture of /s/ would change with the adjacent
vowel in a pattern that is also M-shaped, and that the
resulting variation in sound pressure would be inverse,
or have the W-shaped pattern shown in Fig. 2. For
example, if /s/ had a narrower aperture by assimilation
in /sis/ and /sus/ than in /sæs/ and /sɑs/, it would be
expected that the sound pressure of /s/ in the first two
syllables would exceed that in the last two, which is
what was found at the high level. It is also suggested
that articulatory assimilation from vowel to consonant
might have been operating differentially with respect
to force of expiration. In the present experiment the
condition that the five vowels be approximately equal
at a given level erased the natural M-shaped intensity
variation. It seems likely that a component of the
physiological activity by means of which that was
accomplished was varying expiratory force, particularly
at the high level. This would tend to be inversely related
to the natural pressures of the vowels, or to take a
W-shaped variation similar to that of the obtained
consonant pressures. The resulting differences in consonant
pressure from syllable to syllable at the high
level are consistent with either or both of the factors
mentioned, if assimilation is present.

Effect of the Speaker's Sex

The relative vowel and consonant pressures at the
two levels for the subgroups of men and women are
presented at the bottom of Table I, and the values are
plotted in the last subdivision of Fig. 2. Inspection will
show that when the subgroups were similar with respect
to vowels, in line with the major condition of the experiment,
the median consonants were different at
both levels. At high level the men exceeded the women
by about 2 db; at low level the women exceeded the
men by about the same amount. In other words, the
consonant range for the women was about 4 db smaller
than that of the men, and located within the latter
(see Fig. 2). This smaller range for the women obtained
without exception in the medians for both consonant
positions, for the different vowel syllables, and for the
position-vowel combinations. Within the two different
ranges, however, the internal variations with position
and with vowel were dissimilar at no point, and require
no qualification of the foregoing discussion of those
effects. The significance of the sex difference in range
was assessed by means of the Mann-Whitney test of
difference in location within rank order, [8] which yielded
a z of 1.74, significant at the 4% level for the one-sided
test.
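
The stated significance level can be checked from the normal deviate alone. The sketch below converts z to a one-sided tail probability with the complementary error function; it does not reproduce the Mann-Whitney computation itself, for which the individual ranges would be needed.

```python
import math

def upper_tail_probability(z):
    """One-sided probability of a standard normal deviate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    # z = 1.74 is the value reported in the text.
    print(round(upper_tail_probability(1.74), 4))
```

The result is close to 0.04, consistent with the 4% level quoted for the one-sided test.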

It will be observed that the variation of the consonant-vowel
ratio with syllabic effort thus was greater
for the women. Although revealed in the consonants as
shown, it is believed that the most probable basis for
the sex difference is that the syllabic levels of the experiment
constituted requirements of effort that were
relatively higher in the vowel ranges of the women than
in those of the men. There are two major reasons for
favoring this interpretation. First, there is evidence
that, at conversational level and with self-determined
effort, the vocal intensity of men exceeds that of
women, and that the difference is primarily in the low-frequency,
vowel-dominated part of the spectrum. [9] The
other reason comes from experiences during the preliminary
work which guided the choice of the high-level
target for the present experiment. Considerably higher
levels were explored, and the impression was formed
that men tend to exceed women in average upper limit
of output. If the interpretation is correct, the lower
consonant pressure of the women at high level may be
explained as a more pronounced effect of the vowel-to-consonant
assimilation of mouth opening previously
suggested. At low level the reversed relationship of men
and women may accompany greater expiratory effort
by the women, with consonant opening comparable to
that of the men, or it might reflect more efficient fricative
production by the women for equal expiratory
effort, because of their smaller mouth dimensions.

Summary and Discussion

For the voiceless fricative /s/ the consonant-vowel
ratio was found to vary within the syllable in response
to changes in the speaker's effort. When the effort was
strong, but not extreme, the obtained median ratio for
all measurements pooled was about -14 db. When the
effort was reduced, so that the vowel was about 18 db
lower, the ratio was about -7 db. Although the variation
of effort was large, all utterances were in a functionally
useful range, and no subject either “shouted”
or “whispered” at any time. The vowels and consonants
of all measured utterances satisfied a standard of
phonemic accuracy that was readily attained by all
subjects. The samples included no gross distortions of
articulation such as accompany extremes of effort in
either direction, and no tendency for articulation to be
systematically more accurate at either level was noticed.

The consonant-vowel ratio also varied, effort constant,
with position of consonant, vowel, and sex of
speaker; the extent, but not the direction, of the variation
with effort was dependent upon certain of these
variables, as has been shown in detail. When the effects
of the other variables are combined with that of effort,
the range of variation of the consonant-vowel ratio
was found to be substantial. For example, the median
ratio for prevocalic /s/ in a low-level syllable was
about 8 db smaller than for postvocalic /s/ in a high-level
syllable (see, e.g., the pair of utterances in Fig. 1);
the ratio was about 10 db smaller in low-level /sʌs/
than in high-level /sɑs/ (see Fig. 2). For the prevocalic
/s/ in /sʌs/, spoken by the women at low level, the
median ratio was -5.3 db; for the postvocalic /s/ in
/sɑs/, spoken by the women at high level, the median
was -18.1 db (from specific combinations not reported).

Insofar as /s/ is concerned, the within-syllable variations
of the consonant-vowel ratio are lawful and appear
to be large enough to be important in word intelligibility.
The extent to which they are present with other
consonants remains to be determined. Undoubtedly, the
variations are smaller with some consonants, and the
ratio may not vary significantly at all with others. The
characteristics of consonants, however, are such as to
make it unlikely that the average within-syllable ratio
for any consonant phoneme would vary systematically
in a manner opposite to /s/ in response to the conditions
studied. Therefore, it seems reasonable to suggest
that the findings of the experiment, although not
necessarily applicable to all consonants, have application
to the tendencies of consonants as a class, with
respect to the conditions under which the within-syllable
consonant-vowel ratio would be expected to
vary, and to the direction of the variation.

Other factors equal, a small consonant-vowel ratio
should favor intelligibility inherently. The experimental
work of Licklider [10] and of Davis et al. [10] has shown that
peak clipping and amplitude compression, both of which
reduce the consonant-vowel ratio, have advantages
which outweigh the effects of distortion under some
conditions. The change of effort required of speakers in
the present experiment was found to be accompanied
by a within-syllable change from low to high level that
may be thought of as an amplitude expansion of about
7 db. In a transmission system a speaker's effort to
increase his intelligibility by increasing his effort might
be unrewarding, if his effort improved the listener's
vowel-to-noise ratio (the usual criterion) only, without
adequately increasing the consonant-to-noise ratio. In
some systems the attempt would even defeat itself, e.g.,
if vowel intelligibility were degraded by amplitude distortion.
Within the common range of effort it is possible
that natural variation of the consonant-vowel ratio
may prove to be as important as variation of articulation.
In the complex of factors that modulate intelligibility
over the common range, it may be that the
variation of the ratio serves to counteract other systematic
effects. One example of such an effect is the
reduction of articulatory precision that usually accompanies
lessened vocal effort in untrained speakers, which
has long been observed by students of speech. Over the
middle range of effort the average speaker's net intelligibility
may be essentially constant for a given listener's
signal-to-noise ratio, as suggested by the curves of
Pickett. [3] A skilled speaker, however, capable of varying
vowel intensity without essential changes of articulatory
precision, should find the naturally smaller consonant-vowel
ratio at the lower vowel levels potentially
advantageous with a given speaker's signal-to-noise
ratio. For this reason, if the speaker is skilled and may
be stationed in quiet, the data indicate that the best
talker doctrine for extremely difficult conditions of
transmission and reception might be low-level speech.
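
The argument about vowel-to-noise and consonant-to-noise ratios can be made concrete with the pooled median ratios reported earlier (-13.8 db at high level, -7.2 db at low level). The sketch below assumes a listener's vowel-to-noise ratio held equal at both levels; the 10-db figure is hypothetical and chosen only for illustration.

```python
# Median consonant-vowel ratios for all utterances, from the text (db).
CV_RATIO_DB = {"high": -13.8, "low": -7.2}

def consonant_to_noise(vowel_to_noise_db, level):
    """Consonant-to-noise ratio implied by a given vowel-to-noise ratio."""
    return round(vowel_to_noise_db + CV_RATIO_DB[level], 1)

# Hypothetical vowel-to-noise ratio at the listener, the same for both levels.
VOWEL_TO_NOISE_DB = 10.0

high = consonant_to_noise(VOWEL_TO_NOISE_DB, "high")
low = consonant_to_noise(VOWEL_TO_NOISE_DB, "low")
advantage = round(low - high, 1)

if __name__ == "__main__":
    print(high, low, advantage)
```

With the vowel-to-noise ratio held fixed, the low-level syllables place the consonant about 6.6 db higher relative to the noise, which is the potential advantage the paragraph describes for a skilled speaker.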

In the experiment of Pickett the subjects, all men,
were required to produce degrees of effort that were
more extreme than either level used in the present
study, as has been mentioned. Intelligibility was determined,
and an important analysis of listener's errors in
the three portions of PB monosyllables was made. This
analysis showed the following: “Shouting degrades
primarily the intelligibility of the [prevocalic and post-vocalic]
parts of the syllable [and] weak vocal effort
degrades the intelligibility of all parts of the syllable.” [3]
The consistency with the direction of change of the
consonant-vowel ratio in response to more moderate
variations of effort is notable. Pickett also found that
postvocalic errors were more numerous than prevocalic
errors for all levels of effort, which agrees with the
asymmetry of sound pressure found in the present
experiment.

The findings support the general suggestion of
Kryter [2] that more study of the effect of vocal effort
upon intelligibility should be made, if acoustic data
are to be used for prediction. They imply that substantial,
systematic changes in the internal relationships
of the speech signal may accompany variations of
effort that are not exceptional. In particular, they suggest
that the common procedure of metering the vowels
of monosyllables deserves careful study in relation to
the speaker's original effort, and may have important
practical limitations.

Acknowledgments

The authors are indebted to the University Research
Board of the University of Illinois for support of the
investigation. They are also grateful to Anthony Holbrook
and Thomas H. Fay, Jr., for technical assistance.

* Reprinted from The Journal of the Acoustical Society of America, Vol. 29, 1957, pp. 621-26.

[1] C. F. Sacia and C. J. Beck, Bell System Tech. J. 5, 393 (1926).

[2] K. D. Kryter, J. Speech and Hearing Disorders 21, 208 (1956).

[3] J. M. Pickett, J. Acoust. Soc. Am. 28, 902 (1956).

[4] By GR 759-B meter in a rock wool-lined, 9 by 16 by 7 ft
laboratory. The separation required was somewhat less than the
syllabic range of average connected speech. The targets were
between Pickett's “low force” and “very loud” conditions, and
straddled his “normal force” condition (see reference 3).

[5] G. Fairbanks, Speech Monogr. 17, 390 (1950).

[6] R. H. Stetson, Motor Phonetics (Oberlin College, Oberlin,
1951), second edition.

[7] H. W. Walker and J. Lev, Statistical Inference (Henry Holt
and Company, New York, 1953).

[8] See reference 7. The entry for a given subject was the range in
db from the median low-level utterance to the median high-level
utterance.

[9] R. W. Benson and I. J. Hirsh, J. Acoust. Soc. Am. 25, 499
(1953). See also H. K. Dunn and S. D. White, J. Acoust. Soc. Am.
11, 278 (1940). The obtained differences in both cases were
about 3 db.

[10] J. C. R. Licklider, J. Acoust. Soc. Am. 18, 429 (1946);
H. Davis et al., Hearing Aids: An Experimental Study of Design
Objectives (Harvard University Press, Cambridge, 1947).