Fairbanks, Grant. Experimental Phonetics – T15

Test of Phonemic Differentiation: The Rhyme Test*

Grant Fairbanks
Speech Research Laboratory, University of Illinois, Urbana, Illinois
(Received March 10, 1958)

Materials are presented for a test of word identification in which the cues for response are confined to
the initial consonants and consonant-vowel transitions. Some preliminary results are discussed.

This article describes a method for testing a restricted
aspect of speech reception which may be
referred to as phonemic differentiation. The test was
motivated originally by need for experimental materials
in which: (1) the spoken word would be the stimulus
unit; (2) recognition of the word would be the response;
(3) the response would depend upon the initial consonant
and consonant-vowel transition; and (4) the
subject's task would bear valid relation to the discrimination
demands of real speech. In order that a
reasonably parsimonious explanation of variance would
be possible, it was desired that auditory-phonemic
factors weigh heavily in the score, and that linguistic
factors of higher order weigh lightly. Another requirement
was that several different, but reasonably comparable
forms be developed, so that subjects would
remain naive with respect to vocabulary through a
series of determinations. In addition, it was considered
necessary that the test be short, convenient to administer,
and suitable for groups.

Design

The Rhyme Test is of the completion type. The
stimulus words are drawn from the vocabulary of 250
common monosyllables shown in Table I. This vocabulary
consists of 50 sets of five rhyming words each. To
yield a single 50-item form, one word is drawn from
each set. In the table the sets are shown in a usable
random order that is the same for all forms. Various
forms are indicated within Table I, and will be discussed
below.
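
The drawing of a form can be sketched in a few lines of Python. This is only an illustration: the three rhyming sets below are placeholders rather than the contents of Table I, and the names VOCABULARY and draw_form are hypothetical.

```python
import random

# Placeholder vocabulary for illustration only. The real Table I holds
# 50 sets of five rhyming words; these three sets are NOT taken from it.
VOCABULARY = {
    "ot": ["hot", "got", "not", "pot", "lot"],
    "ay": ["may", "say", "way", "pay", "day"],
    "op": ["top", "hop", "pop", "cop", "mop"],
}


def draw_form(vocabulary, rng):
    """Return one stimulus word per rhyming set, keeping the set order fixed."""
    return [rng.choice(words) for words in vocabulary.values()]


form = draw_form(VOCABULARY, random.Random(0))
print(form)
```

Because the order of sets is fixed, every form drawn this way lines up with the same response sheet.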

Within a set the five rhyming words differ in the
initial consonant phoneme, in each case an element.
The words of a set are also spelled alike in the rhyming
portion (“stem”), and differ in the initial consonant
spelling, in each case a single letter. Each spelled stem
is unique. The subject's response sheet shows the 50
stems in order of stimulus, each preceded by a space
in which he enters one letter to complete the spelling of
the word (e.g., —ot, —ay, —op).¹ Copies of the same
response sheet serve for different forms. Care has been
taken to avoid spellings with variable pronunciation
(e.g., some-home, lose-pose-dose, etc.), both within the
stimulus words and among the possible response words
outside the list. Thus, each spelled stem denotes an
unmistakable class of words. Instructions are simple.
The subject is told, with a couple of illustrations, that
he will hear 50 words that are spelled in order on the
sheet, each with its first letter omitted, and that he is
to write in the letters as he hears the words. The
literacy demand is close to minimal, probably less than
that imposed by reading the alternatives in a multiple-choice
test.
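
Scoring of the completion response described above may be sketched as follows. The function name score_sheet and the three-item sheet are hypothetical; the sketch assumes an item is correct only when the entered letter, joined to the printed stem, spells the stimulus word.

```python
def score_sheet(stems, stimuli, letters):
    """Count items where the entered letter plus the stem spells the stimulus."""
    correct = 0
    for stem, word, letter in zip(stems, stimuli, letters):
        # Ignore case and stray spaces in the subject's entry.
        if letter.strip().lower() + stem == word:
            correct += 1
    return correct


# Hypothetical three-item sheet (a real form has 50 items).
stems = ["ot", "ay", "op"]
stimuli = ["hot", "may", "top"]
letters = ["h", "s", "T"]  # second entry spells "say", not "may"

score = score_sheet(stems, stimuli, letters)
print(f"{score} of {len(stems)} correct")
```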

In constructing the materials, the first consideration
was to pick sets that would provide, within the constraints
mentioned, a number of possible response words
for each stem, from which were to be selected five
stimulus words for each stem that rank high in familiarity,
five being the number of matched forms.

The total vocabulary of possible responses corresponding
to the 50 stems finally chosen, i.e., the English
words formed by one-letter additions to the stems, is
estimated as 536. The matrix shows a range of 6 to 16
response words among the 50 stems, with a median
of 11. When words of low stability in the working
language (regional, archaic, and obscene words, rare
slang, rare names) are excluded, the test taps a vocabulary
of about 475 words. From words of this kind it is
likely that the typical stem offers the average adult a
choice of 8 to 9 alternatives. For example, the first
stem would be likely to offer cot, dot, got, hot, lot, not,
pot, rot; in most cases jot, sot and tot would also be
expected, wot rarely. This is a larger number of alternatives
than is practical in a multiple-choice test. It will
be noted that the chance probability is determined by
the personal vocabulary of the subject, but remains
constant for him from form to form.

Selection of the stimulus words was guided by the
data of Thorndike and Lorge.² The cumulative distribution
is shown in Table II. According to that count, the
250 stimulus words are among the 9000 most frequent
words of the language, 200 are among the most
frequent 3000, and 112 are among the most frequent
1000. The 112 from the first 1000 are distributed among
46 of the rhyming sets; five sets consist entirely of such
words. Each of the 50 sets has at least three words
among the most common 4000.

The stimulus vocabulary involves 18 consonant
phonemes, and their distribution is given in Table III.
According to the data of French, Carter, and Koenig,³
these 18 consonants account for approximately 90% of
all consonant occurrences in the language. The remaining
seven are eliminated by the design; /ŋ/ and /ʒ/
occur only finally, while /θ/, /ð/, /ʃ/, /tʃ/ and /hw/
take two-letter spellings, which were excluded in the
interests of simplifying the response. The rank-difference
correlation between order of frequency of initial consonants
in the test (the order in Table III) and total
occurrences (all positions) in the language is 0.65, including
in the calculation the seven consonants absent

Table I. Stimulus vocabulary. Columns: five comparable forms,
matched in phonemic distributions and word familiarity. Italics:
Form RT-F, biased with high word familiarity. Asterisks: Form
RT-P, matched to consonant distribution of language.

Table II. Cumulative distribution of word frequency,
based on count of Thorndike and Lorge.²

Columns: Most common (× 1000) | N

Table III. Distributions of consonants and vowels in
the stimulus vocabulary. Descending orders.

Consonants: /s/, /t/, /b/, /m/, /l/, /p/, /r/, /w/, /k/, /h/, /f/, /d/, /n/, /g/, /ʤ/, /v/, /j/, /z/
Vowels: /ɪ/, /e/, /ɛ/, /ɑ/, /ɑɪ/, /ʌ/, /i/, /æ/, /ɔ/, /u/, /u/, /o/, /ɔɪ/

from the test. The distribution of the 13 vowels and
diphthongs in the stems is also shown in Table III.
The most ubiquitous vowel, /ə/, is excluded by restricting
the words to isolated monosyllables, and /ɚ/,
/ɝ/, /ɑu/, and /ju/ are also absent. Occurrences of /e/
and /ʌ/ are relatively high in the order, but the main
characteristic of the vowel distribution is that it is somewhat
flatter than in the language.
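
The rank-difference correlation cited above for the consonant frequencies is Spearman's coefficient. A minimal sketch of the textbook formula for untied ranks is given below; the function name and the example rankings are hypothetical. How ties were handled in the original calculation (for instance, among the seven absent consonants) is not stated, so the sketch does not attempt to reproduce the reported value.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman's rho for two untied rankings of the same n items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is exact only
    when neither ranking contains ties.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, n >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings of five items, for illustration only.
rho = rank_difference_correlation([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
print(round(rho, 2))
```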

The consonant-vowel transitions in the stimulus
vocabulary were also tabulated, but the diversity was
such that they need not be reported in detail. Among
the 250 words, 129 different CV combinations were
found, 63 of them occurring once only. The most
frequent are /bɛ/, /tɛ/, /se/, /me/, and /wɪ/, each
with five words. The total number of different combinations
is 55% of the possible number for the 18 consonants
and 13 vowels, and 31% of those possible in
English (using 23C x 18V and ignoring incompatibilities).
It is concluded that transitions are adequately
represented and that there are no undue biases. The
diversity, incidentally, has some interesting possibilities
for comparison. For example, /s/ is combined with 12
different vowels and diphthongs, /b/ with 11, /t/, /m/,
/k/, /h/ with 10 each. The eight sets of words with /ɪ/
involve 15 different initial consonants, the seven with
/e/ involve 14, etc.
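
The two percentages quoted above for CV combinations follow directly from the counts in the text, as this short check shows. The variable names are illustrative.

```python
observed = 129               # different CV combinations among the 250 words

within_test = 18 * 13        # 18 consonants x 13 vowels in the test = 234
in_english = 23 * 18         # 23C x 18V, ignoring incompatibilities = 414

pct_test = 100 * observed / within_test
pct_english = 100 * observed / in_english

print(f"{pct_test:.0f}% of test combinations")
print(f"{pct_english:.0f}% of English combinations")
```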

Table IV. VU readings of the recorded version.

Columns: VU | RT-1 | RT-2 | RT-3 | RT-4 | RT-5 | Total

Recorded Version and Results in Noise

A version of the stimulus materials has been recorded
on tape, with the author as speaker, and given some
experimental trial. The words were spoken in isolation,
clearly, naturally, and with average vocal effort.
Because of the effect of vocal effort upon the consonant-vowel
ratio,⁴ reasonable uniformity was considered important,
especially with a test of this kind. An attempt
to keep the effort uniform was facilitated through use of
a regular 5-sec respiratory cycle, established by light
flashes, with one word uttered at the beginning of the
expiratory phase of each cycle. Rhyming words were
spoken consecutively and the entire list was completed
without stopping. No metering was employed. The
recording was made on Magnecord M-90 equipment
at 7.5 ips; an Altec M-11 microphone system was
used. The distribution of VU readings for the total
vocabulary is shown in the last column of Table IV.
The measurements were of the master version and were
made with a Daven 911-B meter, reading the vowel
maxima rounded to the midpoints shown. The table
shows the distribution to be within a range of 5 db
with a mild negative skew. The middle 125 words are
within 1.5 db by interpolation.

The recorded test was studied in the following
manner. The master version was divided into five
stimulus tapes. The five words of each set were ordered
at random and one word assigned to each tape. The
same random order of sets was used for all tapes. The
subjects were 40 university students, divided into two
main groups of 20 (A and B), with each group divided
further into five subgroups of four. With Group A five
vowel-to-noise ratios, ranging from 2 to -6 db in 2-db
steps, were used to bracket the point of 50% identification.
For a given subgroup one tape was assigned to
each V/N ratio, and the five tapes administered in a
single descending series with brief rest between tapes.
Each subject thus was exposed to the complete vocabulary
once. Assignment of tapes to V/N ratios was
varied across the subgroups in a systematic Latin
square, so that all words and subjects were represented
equally at each ratio. The procedure with Group B was
identical, except that ratios of 15, 9, 5, 1, and -2 db
were used. These were chosen to span the upper part of
the identification range and overlap the values for
Group A, repeating one ratio, -2 db. All tapes were
administered at the same level, with the median vowel
about 65 db above threshold. Noise (0-20000 cps)
from a Grason-Stadler generator was adjusted re the
medians of the individual tapes. PDR-10 headsets were
used; presentation was by subgroups of four; the
method of response was as described earlier.
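
The counterbalancing of tapes against V/N ratios can be illustrated with a cyclic Latin square. The text specifies only "a systematic Latin square"; the cyclic construction below is an assumption, as are the function name and the numbering of tapes and subgroups. The ratios are those given for Group A.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square; row r is the sequence 0..n-1 rotated by r."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


# Group A ratios as given in the text: 2 to -6 db in 2-db steps.
RATIOS_DB = [2, 0, -2, -4, -6]

square = cyclic_latin_square(len(RATIOS_DB))
for subgroup, row in enumerate(square, start=1):
    plan = ", ".join(
        f"{ratio} db -> tape {tape + 1}" for ratio, tape in zip(RATIOS_DB, row)
    )
    print(f"Subgroup {subgroup}: {plan}")
```

Each row gives one subgroup's descending series; each column shows that every tape appears exactly once at every ratio, which is the property the design requires.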

The results are presented in Table V. The progressions
are regular, and the means from the two groups agree
very satisfactorily in the range of overlap. It will be
noted that from 9 db to -6 db, word recognition
declined in an approximately linear manner with vowel-to-noise
ratio. The slope is about 3% per db. This is
similar to that usually found with PB words in noise and
to that of gain functions with PBs in quiet, suggesting
that the heterogeneity of the Rhyme Test might be
equally suitable for discrimination functions. As would
be expected from the nature of the task, the scores
appear to be somewhat higher than those obtained
with PBs in noise, possibly around 15% over the range
studied. For 50% identification the difference should
be about 5 db. In interpreting Table V it should be
remembered that the ratios were established re the
vowels. The average consonant-vowel ratio was probably
around -15 db for the level of effort employed
by the speaker.

The procedure was not planned for detailed comparison
of the five random lists. At each V/N ratio,
each list was heard by a different subgroup. It is possible,
however, to make general comparisons by combining
the data for each list, since all lists appeared at all
ratios and were heard by all subjects. For the 20
subjects of Group A the lists ranged from 44 to 54%
around a mean of 48%; for Group B (larger ratios)
the range was 68 to 73% and the mean 71%. The grand
mean for all 40 subjects was 60%, with the lists ranging
from 57 to 64%. Variations of this size correspond to
changes of about 2 to 3 db in V/N ratio (see Table V).
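
The conversion from list differences to equivalent changes in V/N ratio can be checked with the approximate slope of 3% per db quoted earlier. This is a rough sketch: the function name is hypothetical, and the slope strictly applies only to the linear range of 9 to -6 db, whereas Group B also included 15 db.

```python
SLOPE_PCT_PER_DB = 3.0  # approximate slope quoted in the text


def db_equivalent(score_low, score_high, slope=SLOPE_PCT_PER_DB):
    """Convert a spread in percentage correct to an equivalent V/N change in db."""
    return (score_high - score_low) / slope


# Ranges of list means reported in the text (percent correct).
spreads = {
    "Group A": (44, 54),
    "Group B": (68, 73),
    "All 40 subjects": (57, 64),
}

for label, (low, high) in spreads.items():
    print(f"{label}: about {db_equivalent(low, high):.1f} db")
```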

The ratio -2 db was used with both groups and
yielded respective means of 49 and 51%. Thus it was
possible to develop a score for each word from the responses
of 8 subjects, at approximately the point of
50% identification of the mean word. These scores were
pooled by consonant phonemes in order to explore their
relative identifiability, the total number of responses
per phoneme being 8 times the entry in Table III. The
results of these tabulations are shown in Table VI,

Table V. Mean percentage correct for complete stimulus vocabulary
at various V/N ratios. Twenty subjects in each group.

Columns: V/N (db) | Group A | Group B

where the rank is a descending order of percentage
correct. The findings are to be interpreted with due
regard to the fact that only one speaker was used.

It will be seen that the over-all variation is three to
one. Comparison of /m/ and /t/, both with substantial
numbers of total responses, shows a difference of two
to one. Each of the 21 words with /m/ was identified by
at least 4 subjects; 15 of the 22 words with /t/ were
identified by fewer than 4 subjects. The power of the
phonemic factor is also shown by the range of variation
within rhyming sets. For instance, in the first set of
Table I, hot, etc., the words ranked 8, 8, 6, 4, 1 in
number of correct identifications, a range of 7 subjects.
The median range for the 50 sets was 5 subjects; in 45
sets the range was 4 subjects or greater.
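
The per-word bookkeeping behind these figures is simple and can be sketched as follows. The identification counts for the first set and the word counts for /m/ and /t/ are those given in the text; the function names are hypothetical.

```python
SUBJECTS_PER_WORD = 8  # each word was heard by 8 subjects at -2 db

# Correct identifications for the five words of the first set, from the text.
first_set_counts = [8, 8, 6, 4, 1]
set_range = max(first_set_counts) - min(first_set_counts)


def total_responses(n_words, subjects=SUBJECTS_PER_WORD):
    """Responses pooled for a phoneme that begins n_words stimulus words."""
    return n_words * subjects


def percent_correct(correct_counts, subjects=SUBJECTS_PER_WORD):
    """Pooled percentage correct over any group of words."""
    return 100 * sum(correct_counts) / (subjects * len(correct_counts))


print(set_range)              # range within the first set, in subjects
print(total_responses(21))    # responses pooled for /m/
print(total_responses(22))    # responses pooled for /t/
```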

Table VI also discloses that the phonemic differences
are lawful as well as powerful. The nasal consonants
are at the top of the order, and 2/3 of the voiceless consonants
are in the bottom 1/2 of the order.⁵ The following
relationships may not be so evident.

/m/ > /n/
/g/ > /b/ > /d/
/k/ > /p/ > /t/
/f/ > /s/
/v/ = /z/

Rows and columns of the above arrangement designate
features; within either, the contrast of any two adjacent
members is minimal. A high degree of orderliness is
evident in both dimensions. Probably the acoustical
characteristics of the consonants per se are primary

Table VI. Mean percentage correct for the various consonants
at V/N = -2 db. Descending order. Eight subjects.

Consonants in descending order of % correct: /m/, /n/, /j/, /g/, /f/, /l/, /b/, /w/, /r/, /k/, /d/, /ʤ/, /s/, /p/, /h/, /t/, /v/, /z/

determinants of the vertical relationships, while the
consonant-vowel transitions figure importantly in the
horizontal progressions. This is an intriguing finding
that should be followed up with more speakers. The
data support the idea that the Rhyme Test intercepts
the speech reception process at a stage in which a substantial
portion of the variance in word identification
is attributable to the distinctive features of phonemes.
The systematic variations also justify the matching
procedures described immediately below.

Matched Forms and Special Forms

The columns of Table I indicate five test forms,
designated as RT-1, etc., that are comparable in their
phonemic distributions. As nearly as the numbers of
total occurrences permit (see Table III), the distributions
are matched. That is, the number of occurrences
of a given consonant in any form does not differ from
that in any other form by more than one. These one-word
differences were offset as much as possible by
“dovetailing” with similar consonants, so that the
distributions of features are also comparable. After the
consonants had been distributed, the CV transitions
were examined. There appeared to be no undue bias.
The 63 unique combinations are divided among the
forms with a range of 11 to 14; the number of different
combinations per form varies from 41 to 45; the maximum
number of occurrences of one combination in any
one form is three. The forms also are satisfactorily
comparable in word frequency. They range from 20 to
27 words among the most common 1000. All 50 words
of RT-3 are among the most common 6000; the form
that differs most at that point has 47 words among the
most common 6000. Table IV shows the distributions
of VU readings for the RT forms, and the similarities
will be noted. The five medians are within a range of
0.2 db by interpolation.
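
The matching criterion for consonants, that no consonant's count differs between any two forms by more than one, can be verified mechanically. The sketch below uses hypothetical miniature forms, not the contents of Table I, and the function names are illustrative.

```python
from collections import Counter


def max_count_difference(forms):
    """For each consonant, the largest difference in its count between forms.

    `forms` is a list of forms, each a list of initial consonants.
    """
    counts = [Counter(form) for form in forms]
    consonants = set().union(*forms)
    return {
        c: max(k[c] for k in counts) - min(k[c] for k in counts)
        for c in consonants
    }


def forms_are_matched(forms):
    """True when no consonant's count differs between forms by more than one."""
    return all(diff <= 1 for diff in max_count_difference(forms).values())


# Hypothetical four-item forms for illustration only.
matched = [["s", "s", "t", "m"], ["s", "t", "t", "m"], ["s", "t", "m", "m"]]
unmatched = [["s", "s", "s"], ["s", "t", "m"]]

print(forms_are_matched(matched), forms_are_matched(unmatched))
```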

Two special forms are also indicated in Table I.
Italics show a form, RT-F, that is loaded with familiar
words and might be suitable for testing a child or
semiliterate adult. All but four of the words are among
the most common 1000. The other words are sale, pink,
and jump in the second, and luck in the third 1000. In
addition to high familiarity, RT-F meets the standards
of phonemic distribution used with the numbered forms
and tests 16 different consonants.

The asterisks in Table I indicate RT-P, a form that
was drawn to approximate the consonant distribution
reported by French, Carter, and Koenig.³ This form includes
all 18 consonants of Table III, and the rank-difference
correlation between their frequency and that
of all consonants in the language is 0.86. Familiarity
was also a condition of selection. All words are in the
first 5000, 48 in the first 3000, 41 in the first 1000. RT-P
and RT-F have 34 words in common.

It will not be overlooked that a variety of other
forms may be drawn which control the distributions
of consonants or consonant-vowel transitions in particular
ways. Study of the sets will show that one form
with 48 voiceless consonants, another with no voiceless
consonants, and another with 43 voiceless fricatives,
for example, are possible. By subscoring, forms with
controlled distributions of features should have considerable
utility in analytic studies of individuals and
systems. One such form might consist of 25 voiced and
25 voiceless consonants, the two halves being matched
with respect to the other features. Among persons with
hearing loss, degree of loss might be correlated with
the voiced subscore and type of loss with the difference
between the voiced and voiceless subscores, etc. Finally,
it should be mentioned that the Rhyme Test stems
may be used with an extended stimulus vocabulary
that can serve various special purposes. Consonant
phonemes that take two-letter spellings, consonant
clusters, rare words, etc., are examples of controls that
may be exerted within the same format.
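
Subscoring by the voicing feature, as suggested above, might be sketched as follows. The responses are hypothetical. The voiceless set lists those of the 18 test consonants that are voiceless in standard phonetic classification; all other consonants are treated as voiced.

```python
# Voiceless members of the 18 test consonants; all others count as voiced.
VOICELESS = {"p", "t", "k", "f", "s", "h"}


def subscores(items):
    """Percentage correct for voiced and voiceless initial consonants.

    `items` is an iterable of (initial_consonant, was_correct) pairs.
    A subscore is None when no item of that class was presented.
    """
    tally = {"voiced": [0, 0], "voiceless": [0, 0]}  # [correct, presented]
    for consonant, correct in items:
        key = "voiceless" if consonant in VOICELESS else "voiced"
        tally[key][0] += int(correct)
        tally[key][1] += 1
    return {
        key: (100 * right / total if total else None)
        for key, (right, total) in tally.items()
    }


# Hypothetical responses for illustration only.
responses = [
    ("m", True), ("b", True), ("d", False), ("g", True),
    ("t", False), ("s", False), ("k", True), ("p", False),
]
result = subscores(responses)
difference = result["voiced"] - result["voiceless"]
print(result, difference)
```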

Acknowledgments

The author is grateful to Anthony Holbrook and
Murray S. Miron for technical assistance. He is also
indebted to the latter for conducting the experimental
trials.

* Reprinted from The Journal of the Acoustical Society of America, Vol. 30, 1958, pp. 596-600.

¹ We have found it convenient to arrange the stems in columns
of 10, designated A, B, C, D, E, with a single numbering from 1
to 10 along the left margin. In the recorded versions mentioned
below, the start of each column is identified by “Column A,” etc.
Individual items are not identified. We consider this preferable
and have found it to be feasible.

² E. L. Thorndike and I. Lorge, The Teacher's Word Book of
30,000 Words (Columbia University Press, New York, 1944).

³ French, Carter, and Koenig, Bell System Tech. J. 9, 290
(1930).


⁴ G. Fairbanks and M. S. Miron, J. Acoust. Soc. Am. 29, 621
(1957).

⁵ Cf. G. A. Miller and P. E. Nicely, J. Acoust. Soc. Am. 27, 338
(1955). “…voicing and nasality are much less affected by a random
masking noise than are the other features.” Based on initial
consonants followed by /ɑ/.
