Speech Segmentation in Japanese and English Bilinguals

Colleges of Computer Sciences and Science (Psychology)
Northeastern University Boston, MA
Topics Concerning Japanese and English Bilinguals

A Literature Review

The purpose of this paper is to provide an overview of the existing literature concerning speech segmentation, categorical perception, and related issues affecting bilinguals of English and Japanese. It touches on some of the differences between the two languages and how they affect learning the L2. The paper starts by providing background information about the two languages and some of the current issues in speech processing. It then reviews the most current research: what the issues are, how the studies were done, and what results they found. It then discusses possible future directions for this research and ends with references.

I. Introduction

People trying to learn a second language have a difficult task ahead of them. Learning a new grammar and lexicon takes both time and practice. Some languages differ enough that, on top of a new lexicon and grammar, there is also an entirely new segmentation system to be learned. Japanese and English differ in the way the speech stream is segmented into parts, with Japanese using the mora as the basic unit of perception (McQueen, Otake & Cutler, 2001) and English using stress (Cutler & Butterfield, 1992). Along with learning a new lexicon, grammar, and segmentation system comes the problem of categorical perception: how can bilinguals learn new sounds that aren’t in their native language? Another important issue is whether, given that there are multiple segmentation cues, there can be universal segmentation cues besides the rhythm-based mora, syllable, and stress processes. This paper will try to address these questions.


When participating in conversation, the listener must perform a myriad of tasks to comprehend what the speaker is saying. On the physical side, the listener must convert sound waves from the air into electrical signals in the brain. From there he or she must turn those signals into words, phrases, sentences, and finally a complete dialog. The process of dividing the raw speech stream into words is called speech segmentation (Cutler, Mehler, Norris & Segui, 1986). Speech segmentation is accomplished in different ways according to the language; the major rhythm-based segmentation types are mora in Japanese, syllable in French (Cutler, Mehler, Norris & Segui, 1986), and stress in English. The mora is the smallest Japanese unit of perception and is subsyllabic. “It can be a vocalic nucleus, a nucleus plus syllable onset, or, as in the second and fourth morae of shinshinto, it can be the postvocalic portion of a syllable, i.e., the coda.” (Otake, Hatano, Cutler & Mehler, 1993). Japanese is a “mora-timed” language, where each mora represents a rhythmic unit, in contrast to English, which is “stress-timed” (Beckman, 1982). There has been much research into speech segmentation, specifically into the types listed above, including stress-based and syllable-based (Cutler, Mehler, Norris & Segui, 1986) as well as mora-based segmentation (Cutler & Otake, 1994). There has also been a fair amount of study on the discrimination of the phonemes /r/ and /l/ by native speakers of Japanese and English. One particular study (Miyawaki, Strange, Verbrugge, Liberman, Jenkins & Fujimura, 1975) used synthesized speech to compare Americans and Japanese at discriminating /ra/ and /la/. A later study reinforced the previous findings that Americans discriminate the phonemes categorically, and that Japanese perform at a near-chance level (Strange & Dittmann, 1984).
This study also showed that after training, the native Japanese speakers’ performance on those phones improved, which suggests that categorical perception can indeed be acquired by non-native speakers.
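The mora description quoted above can be made concrete with a small sketch. The function below is not taken from any of the cited studies; it is a simplified illustration that assumes Hepburn-style romanized input and covers only three cases: a vowel with an optional onset, a moraic nasal in coda position, and the first half of a doubled consonant. The name `split_morae` and the pattern are assumptions made for this example.

```python
import re

# Simplified, hypothetical mora pattern for romanized Japanese.
MORA = re.compile(
    r"""
    (?:ch|sh|ts|[kgsztdnhbpmyrwfj])?y?[aiueo]  # V, CV, or CyV mora
    | n(?![aiueoy])                            # moraic nasal (coda)
    | ([kgsztdhbpmrfjc])(?=\1)                 # first half of a doubled consonant
    """,
    re.VERBOSE,
)

def split_morae(word):
    """Return the list of morae in a romanized Japanese word."""
    morae = [m.group(0) for m in MORA.finditer(word)]
    # Refuse input that the simplified pattern cannot fully cover.
    if "".join(morae) != word:
        raise ValueError(f"cannot segment {word!r} with this simplified pattern")
    return morae

print(split_morae("shinshinto"))  # ['shi', 'n', 'shi', 'n', 'to']
```

Under these assumptions, shinshinto has five morae but only three syllables (shin-shin-to), and the second and fourth morae are the nasal codas, as in the quoted description.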

II. Review of the existing literature

1. Discrimination of /r/ and /l/ by native speakers of Japanese and English

Categorical perception of speech sounds is an important aspect of speech segmentation. Without the ability to properly hear differences in non-native speech sounds, segmentation and comprehension will be impaired. One of the seminal research papers comparing the abilities of English and Japanese natives to discriminate the /r/ and /l/ sounds was by Miyawaki, Strange, Verbrugge, Liberman, Jenkins and Fujimura in 1975. The research confirms earlier findings that Japanese subjects cannot distinguish between /r/ and /l/ (Goto, 1971). This paper investigated the effects of linguistic experience on the ability of English and Japanese natives to discriminate the class of liquid phones. They focused on the phonemes /l/ and /r/ in syllable-initial position. The liquid /l/ and /r/ phones were chosen because “the distinction between these phones is phonemic in English but not in Japanese.” (Miyawaki et al., 1975). The /r/ and /l/ phones don’t constitute a phonemic contrast in Japanese, and therefore provide a good basis for testing differences between native and non-native discrimination. This research paper used a speech synthesizer to create a series of sounds between the two phones in order to see where and how categorical perception would occur. The parallel-resonance synthesizer generated 15 three-formant sounds that would be used in the tests. The third formant (F3) was varied in frequency in steps between the /ra/ and /la/ sounds. With this set of 15 sounds, two types of tests were conducted on each subject group: an identification test and an oddity discrimination test. The subjects of this research consisted of 39 native speakers of American English and 21 native Japanese speakers. The discrimination task was to listen to a series of three sounds and note which of the three was different.
The results showed that the Americans could easily discriminate the target sound, with low scores only when the sounds were not clearly distinguishable as /r/ or /l/: “pairs whose members were labeled as the same phoneme was considerably less accurate, although still above the 33% chance level” (Miyawaki et al., 1975). The Japanese, however, showed a near-chance level of discrimination. An interesting aspect of this study was the finding that when the speech context was removed, the two groups behaved almost identically: “we see very clearly that the Japanese do not differ from the Americans on any of the comparison pairs. The nonspeech discrimination functions are virtually identical for the two groups of subjects.” (Miyawaki et al., 1975). Both groups were able to discriminate isolated F3 patterns quite accurately, which indicates that both groups are able to hear the sound differences at a purely physical level, outside of a speech context.
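The 33% chance level mentioned in the quotation follows from the design of the oddity task: a listener who cannot hear any difference among the three sounds can only guess one of three positions. The short simulation below is an illustrative sketch, not part of the original study; the function name, trial count, and seed are arbitrary assumptions.

```python
import random

def simulate_guessing(n_trials, n_alternatives=3, seed=0):
    """Proportion correct for a listener who guesses the odd item at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    correct = 0
    for _ in range(n_trials):
        odd_position = rng.randrange(n_alternatives)  # where the odd sound is
        guess = rng.randrange(n_alternatives)         # blind guess
        if guess == odd_position:
            correct += 1
    return correct / n_trials

# A pure guesser should land close to 1/3 correct on a three-item oddity task.
print(round(simulate_guessing(30000), 3))
```

Scores near this value, as reported for the Japanese listeners, are therefore what would be expected if the /r/-/l/ difference were not being heard at all.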

A later study on /r/ and /l/ perception, using a synthetic speech generation process similar to the above study, focused on whether linguistic experience had an impact on categorical perception of the /r/ and /l/ phonemes. The authors suggested that “… native Japanese adults learning English as a second language are capable of categorical perception of /r/ and /l/.” (MacKain et al., 1981). This study was similar to the one above, but had a few differences. The first was that they varied the acoustic values both temporally and spectrally “to optimize the Japanese subjects’ opportunity to show perceptual differentiation of the /r/-/l/ contrast.” (MacKain et al., 1981). The second was the inclusion of an AXB oddity discrimination task, which is thought to give subjects a better opportunity to detect auditory differences. The third was an identification task for both the American and Japanese groups. The Americans, as expected, displayed a strong category boundary with strong identification scores. The not-experienced Japanese subjects displayed poor categorization of /r/ and /l/, with near-chance levels for all stimulus types. The results for the not-experienced Japanese group extended the Miyawaki results. The experienced group “had intensive English conversation training with native American-English speakers and as a group spent a larger percentage of the average day conversing in English than did the not-experienced Japanese.” (MacKain et al., 1981). This group displayed results similar to the Americans’ on both the identification tasks and the discrimination tasks. These results suggest “that the occurrence and abruptness of an /r/-/l/ category boundary for the experienced Japanese might be related to their greater conversational English experience…” (MacKain et al., 1981).
This particular study added the AXB oddity task because it is less memory-demanding and “they could use nonphonetic auditory memory to aid performance” (MacKain et al., 1981), in the hope that it would allow the not-experienced Japanese to obtain better results, but the task did not produce the hoped-for improvement. Overall, their research suggested that native Japanese speakers can attain categorical perception of /r/ and /l/ with practice and more experience, which is good news for aspiring bilinguals.

Research into the /r/ and /l/ phonemes is particularly interesting for Japanese because of the lack of contrast between these sounds. A distinctive contrast between particular speech sounds allows speakers of that language to discriminate where speakers of other languages would not. The above studies examined this aspect as well, but a 1984 study by Strange and Dittmann focused on how the ability to discriminate sounds not available in the L1 can change with explicit training: “We were interested in whether we could modify the perception of AE word-initial /r/ and /l/ by adult Japanese learners of English, in the laboratory, using the psychophysical training task successfully employed by Carney et al. (1977).” (Strange & Dittmann, 1984). The design of this study was to test eight female native speakers of Japanese before and again after training, to examine the effects of training on discrimination of the /r/ and /l/ series of synthetic speech sounds. The pre-training tests consisted of a minimal pairs test, an identification test on the rock-lock series, and an oddity discrimination task. The training was done individually over a three-week period that totaled 14 to 18 sessions; it consisted of an AX discrimination task with immediate feedback. At the end of the training the post-training tests were given, which were the same as the pre-training tests. Pre-training results were similar to those found by MacKain et al. (1981), with near-chance levels of accuracy. The training task performance showed “gradual improvement over session with the greatest improvement in the first several sessions” (Strange & Dittmann, 1984). All subjects showed increased performance as the training sessions progressed.
After the training, the post-training tests showed that “pretraining versus posttraining categorical perception tests for each of the eight subjects of the rock-lock series revealed that seven of the eight subjects improved as a function of the training.” (Strange & Dittmann, 1984). Post-training test results also showed improvement in discrimination of cross-category pairs, with over 75% correct (Strange & Dittmann, 1984). Overall, performance was better on the post-training tests than on the pre-training tests (Strange & Dittmann, 1984). “We can thus conclude that training with a fixed-standard AX discrimination task resulted in improved (categorical) perception of the training stimuli, as tested by the (more demanding) identification and oddity discrimination tasks” (Strange & Dittmann, 1984). This study also tested whether training would generalize across acoustic properties, that is, whether training on the rock-lock series would transfer to good performance on a similar rake-lake series. The results showed that this was indeed the case: training improved performance on the second series as well. This also supports the idea that bilinguals can learn to perceive new sounds in a second language.

2. Speech segmentation using the Mora

One of the major differences between languages is how they are timed, that is, the rhythm of the language. Bilinguals not only have to learn a new lexicon and grammar system, but sometimes must also learn a new rhythm to speak their L2. An example of this would be a native English speaker trying to learn Japanese or vice versa. English is a stress-timed language whereas Japanese is mora-timed. Research on the mora can lead to further understanding of how languages with different timing styles can be learned more efficiently by bilinguals, as well as how differences in timing affect different aspects of speech segmentation.

A study by Otake et al. (1993) of how Japanese words are segmented by native and non-native listeners found that Japanese listeners’ responses were consistent with moraic segmentation, while non-native listeners responded differently. The mora is the timing unit of Japanese; it is smaller than a syllable, and is considered the basic unit of perception. This study set up four experiments with native and non-native listeners to examine the effects of native timing type on the segmentation of Japanese words. In the first experiment, 40 native Japanese speakers listened to series of 3 to 6 words. When a word containing the target letters shown on a printed card was heard, the subject was to press a button. Results showed that the native Japanese listeners’ responses supported the mora hypothesis better than the syllable hypothesis, confirming the initial hypothesis that Japanese speakers segment words by mora. “The pattern of results in this experiment thus appears to offer strong support for the mora hypothesis but none to the syllable hypothesis.” (Otake et al., 1993). The second experiment tested English-speaking subjects on Japanese words. The design was similar to that of the previous experiment: a series of Japanese words was played, and the subject was to press a button once the sound on the printed card was heard. The results of this experiment are in stark contrast with those of experiment 1. The findings support that “we may now conclude that English listeners do not exploit mora structure in the same way” (Otake et al., 1993) as native Japanese listeners. One caveat about experiment 1 is that the targets were presented on printed cards in Roman orthography, which can introduce confounds because Japanese speakers naturally represent sounds with kana characters.
Experiment 3 had exactly the same design as experiment 1, with the exception that the target was played through the headphones before the word sequence rather than being printed on a card. Experiment 3 had 40 native Japanese-speaking subjects. The target sound was played first, followed by the sequence of test words. The subject would press a button on hearing the target in the sequence. The results of experiment 3 replicated those of experiment 1. “The replication of experiment 1’s results strongly confirms our conclusions from the preceding experiments: Japanese listeners do not naturally segment speech syllable by syllable; they do naturally segment it mora by mora.” (Otake et al., 1993). These findings also resolved the orthography concern mentioned earlier. Experiment 4 was identical to experiment 3 with the exception that the subjects were 33 native French speakers. The results of experiment 4 were as expected: “the response patterns of French listeners are, as predicted, best accounted for by the syllabic hypothesis…” (Otake et al., 1993). These results also support the findings of Cutler et al. (1986). Finally, these results support the initial prediction that non-native listeners will “not replicate the pattern of results shown with the same materials by native Japanese listeners” (Otake et al., 1993). They suggest that non-native listeners will try to segment the sounds of a non-native language by applying their native speech segmentation system, whether it is mora-based, stress-based, or syllable-based. How could that affect learning a second language in which the segmentation process is different from that of the native language?

Building on some of the findings of the previous study, Cutler and Otake (1994) focused on whether subjects would apply their native segmentation processes to a foreign language. This research could show that inappropriate use of a segmentation process inhibits the processing of a non-native language in bilinguals. “This suggests that segmentation procedures may indeed be highly similar to phonological categorization procedures: they effectively aid processing of the native language, but they may reduce processing efficiency for input in a foreign language.” (Cutler & Otake, 1994). This study also hoped to further support the mora hypothesis put forth by Otake et al. (1993). The first experiment tested, with native Japanese speakers, whether moraic targets “will be easier to detect than nonmoraic” (Cutler & Otake, 1994). This should be the case under the mora hypothesis. The experiment also tested whether phoneme detection “will be differentially difficult for vowels versus consonants” (Cutler & Otake, 1994). Experiment 1 used 40 native Japanese speakers, who listened for a target sound (O or N) in a series of Japanese words and pressed a response key as soon as they detected the sound. The results agreed with those of Otake et al. (1993): “mora structure is crucially involved in the process by which Japanese listeners convert spoken input in lexically accessible representation.” (Cutler & Otake, 1994). They also showed that quick responses to moraic input are not restricted to CV input, as might have been concluded from Otake et al.’s (1993) experiment. It should be noted that there were no significant differences between vowel and consonant response times. Experiment 2 was conducted in the same way as experiment 1, with the exception that the subjects were 24 native English speakers.
As expected, the non-native speakers did not demonstrate the moraic effects shown in experiment 1. Interestingly, “The main effect of vowel versus consonant target was, however, significant; consonants were detected both faster and more accurately than vowels” (Cutler & Otake, 1994). This supports the earlier findings that listeners apply their native segmentation procedures to a non-native language. Experiment 3 was similar to experiment 2 except that English words served as both targets and sequence words. The subjects were 24 native English speakers. Of experiment 3, the authors report that “These findings are in line with previous failures to find significant phoneme detection differences between targets in stressed versus unstressed position with English listeners and laboratory-read speech.” (Cutler & Otake, 1994). These results were similar to those of experiment 2. Experiment 4 used the same English materials as experiment 3, with the same design and the same native Japanese subjects as experiment 1. In this experiment the vowel targets were much more difficult to detect than the consonant targets (Cutler & Otake, 1994). The findings of experiment 4 also support the idea “that mismatch between the input and the native language phonemic repertoire plays a role in phoneme detection in a foreign language.” (Cutler & Otake, 1994). Experiment 5 had 20 native Japanese speakers listen to recorded Japanese words. The procedure was that of experiment 1: the subjects were to press a button when they heard a target phoneme in the word sequence. The results of this experiment demonstrated what the experimenters had hoped for, “a new effect in Japanese phoneme-monitoring: targets in word-initial position are detected faster than targets in word-medial position.” (Cutler & Otake, 1994).
This experiment showed that the mora effect was significant for all four phoneme targets. Experiment 6 repeated experiment 5 with English subjects. It was designed to test for the English speakers’ advantage in detecting consonant over vowel targets, as well as for a difference in response times between A and O due to “phoneme repertoire mismatch” (Cutler & Otake, 1994). This experiment used the same materials as experiment 5, with 23 native English speakers and the procedure of experiment 2. The results showed that “there was again a significant overall advantage for consonant over vowel targets” (Cutler & Otake, 1994). Overall, these experiments showed that “the moraic effect which Japanese listeners show in phoneme detection in their native language appears in word-initial as well as in word-medial position and with a variety of phoneme targets.” (Cutler & Otake, 1994). These experiments also suggest that the consonant and vowel detection patterns of English speakers carry over when they listen to Japanese, due to the advantages the native English segmentation process brings to this particular task.

In terms of bilinguals, the above studies used native Japanese speakers, some of whom had some English experience. Even with experience, they were still applying mora segmentation to English input, just as English speakers applied stress-based segmentation to Japanese input, where it is inappropriate. As the authors put it, “We believe that this finding has potentially important implication for understanding the processes of acquisition of a second language.” (Cutler & Otake, 1994). Although this research seems to paint a dim picture for those trying to learn a second language, all is not lost. A study by Cutler et al. (1992) showed that more than one segmentation process can be available, and that even when one is not, these rhythm-based processes are heuristics for processing a non-native language and are not necessary for comprehension (Cutler et al., 1992).

3. Rhythmic cues and the Lexicon

An important area of study is how rhythmic cues in the speech stream affect lexical decisions in Japanese. This matters to a learner of Japanese because the rhythmic cues of the learner’s native language may differ from those of Japanese, so Japanese rhythmic cues may not be available to the learner as a heuristic. Japanese provides a good test platform for studying rhythmic cues and speech segmentation because Japanese “rhythm is based neither on syllables nor on stress. Instead, it is based on the mora, a subsyllabic unit which can be of five different types” (McQueen et al., 2001).

Cutler and Otake (2002) examined old Japanese wood-block prints containing a word-based joke system called “Goroawase” to determine whether sub-moraic information is processed in the speech stream. This joke system substitutes a single mora in a word with another mora to create a similar-sounding word with a different meaning, thus creating a pun. Their findings suggest that “mora substitution is more often than would be expected by chance in effect phoneme substitution because two words which overlap in all but a single consonant or vowel form a better target-pun pair than two words which overlap in all but a CV mora.” (Cutler & Otake, 2002). Another experiment in the same paper used word reconstruction. Subjects listened to a word with a replaced mora and were told which mora had been changed. They were then to say which word had been intended. This experiment used the mora as the cue to the target word, which would then be accessed via the lexicon. Forty-five native Japanese-speaking subjects took part in this experiment, and the results were as expected: four-mora words were more easily accessed than three-mora words. The results suggested that “word reconstruction was significantly easier when the initial mora had been replaced by another mora sharing with it either C or V.” (Cutler & Otake, 2002). In their second experiment, they distorted the final mora of the word and found that identification was faster and more accurate, which suggests “that this information can be exploited continuously rather than only on a mora-by-mora basis” (Cutler & Otake, 2002). This indicates that lexical access can occur without complete moraic information. In experiments 3 and 4, they tested replacement of the third and fourth mora with a focus on consonant versus vowel replacement, and found results similar to those of experiment 2.
“Although both the V-replacement and C-replacement condition proved easier than M-replacements, there was also a difference between the first two: Replacement of a vowel proved easier than replacement of a consonant.” (Cutler & Otake, 2002). In their fifth experiment, they applied a yes/no response to the design of the previous experiment; the results helped cement their conclusion that speech is processed continuously, even below the level of the mora (Cutler & Otake, 2002). This does not suggest that the mora plays no role, only that sub-moraic information also contributes to the segmentation of the continuous speech stream (Cutler & Otake, 2002). Their results do, however, support the finding of Norris et al. (1997) “that the contribution of rhythmic categories in word recognition is the same for all languages” (Cutler & Otake, 2002). If the role of rhythmic categories is the same across languages (Norris et al., 1997), this provides some relief to learners of Japanese as a second language: the mora is not the be-all and end-all of segmentation, but a heuristic that helps make segmentation more efficient.

If mora and meter are both parts of speech segmentation, how can we combine the two? A study by McQueen, Otake and Cutler (2001) tries to answer that. Their experiments use the Possible Word Constraint (PWC), developed by Norris et al. (1997), to test whether Japanese speakers use the PWC just as English and Dutch speakers do. The PWC is another type of segmentation heuristic, one that is part of the Shortlist model developed by Norris et al. (1997). Their first experiment had 54 native Japanese speakers listen to a native speaker of Japanese pronounce nonwords with Japanese words embedded within them. The subjects were to press a button and pronounce the embedded word once they heard it. The results showed that, indeed, “listeners find it harder to spot words in impossible word contexts than in possible word contexts. Japanese listeners therefore appear to use the PWC when segmenting speech.” (McQueen, Otake & Cutler, 2001). The other tests in this study all supported the PWC and showed results similar to those obtained with English speakers (McQueen, Otake & Cutler, 2001). The PWC suggests that speech segmentation is more universal than previously devised models assumed. In this model, multiple candidate words are activated at once, and no single lexical decision is final until a parse of the whole input has been settled. “The present experiments therefore support the theory of lexical segmentation that the PWC offers. On this view, candidate words are activated by the incoming speech stream and compete with each other until a lexical parse is settled upon.” (McQueen, Otake & Cutler, 2001). This model also allows the rhythmic segmentation process to exist and contribute to the parsing of lexical and segmental structure.
“Furthermore, just as the rhythmic structure of English or Dutch provides English and Dutch listeners with cues to the location of likely word boundaries (Cutler & Norris, 1988, Vroomen et al., 1996), so too does the characteristic rhythm of Japanese provide Japanese listeners with a segmentation cue.” (McQueen, Otake & Cutler, 2001).
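The word-spotting logic behind the PWC can be sketched in a few lines. The sketch below is a toy illustration and not the Shortlist model itself: it assumes the common simplification that a leftover stretch of input is a possible word only if it contains a vowel, it works on spelled strings rather than phonemes, and the function names and example nonwords are hypothetical.

```python
VOWELS = set("aeiou")

def is_possible_word(residue):
    """A residue is viable if it is empty or contains at least one vowel."""
    return residue == "" or any(ch in VOWELS for ch in residue)

def pwc_allows(carrier, word):
    """Check whether spotting `word` in `carrier` leaves only viable residues."""
    start = carrier.find(word)
    if start == -1:
        return False  # the word is not embedded in the carrier at all
    left = carrier[:start]
    right = carrier[start + len(word):]
    return is_possible_word(left) and is_possible_word(right)

print(pwc_allows("vuffapple", "apple"))  # True: "vuff" could be a word
print(pwc_allows("fapple", "apple"))     # False: "f" alone could not
```

In the full model the constraint penalizes candidates during competition rather than ruling them out, and the viability criterion would have to be stated over the units of the language in question, which for Japanese means taking moraic structure into account.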

Rhythmic segmentation provides another type of segmentation heuristic along with mora-based segmentation as seen in the above papers. Although bilinguals have trouble when inappropriately applying their native language segmentation style to a non-native language, rhythmic segmentation is a universal heuristic which exists to compensate when L1 segmentation style fails to properly segment the non-native language.

4. Segmentation in non-native languages

An important area of study concerning bilingual speech segmentation is how non-native speakers segment their second language (L2). If non-native speakers can use any of the rhythmic, prosodic, lexico-syntactic, and syntactic segmentation processes of the non-native language, then they will have a much easier time segmenting the L2. The ability to learn a non-native segmentation process would benefit any bilingual, as it would facilitate comprehension of the L2. Research into brain plasticity in the learning of non-native language processes therefore bears directly on this question.

In a study of brain plasticity and non-native lexical and segmentation processes, Sanders et al. (2002) devised methods for probing whether learning the L2 early or later in life affects the ability to learn non-native lexical and segmentation processes. The study examined various language subsystems, such as lexico-semantic, syntactic, and prosodic information processing. Starting from the earlier proposals that different languages use different methods of speech segmentation (English: stress-based, French: syllable-based, Japanese: mora-based), they tried to determine whether non-native speakers use lexico-syntactic information when segmenting a non-native language (Sanders et al., 2002). Five groups were recruited to test this hypothesis: native English speakers as a control group (E), native Japanese speakers who learned English early (JE), native Spanish speakers who learned English early (SE), native Japanese speakers who learned English late (J), and native Spanish speakers who learned English late (S). “If non-native speakers fail to use rhythmic segmentation cues other than the rhythmic cues relevant to their L1, native speakers of Japanese (mora-timed) and Spanish (syllable-timed) would not be expected to use stress pattern as a segmentation cue when listening to English. Alternatively, native Japanese and native Spanish speakers might differ in their abilities to use stress pattern as a segmentation cue in English.” (Sanders et al., 2002). For this experiment they created 5 groups of 3 sentences each. The 5 groups were “Strong stress, initial position (SI), strong stress, medial position (SM), weak stress, initial position (WI), weak stress, medial position (WM), and target absent (TA).” (Sanders et al., 2002). Each group contained 3 sentences: a semantic, a syntactic, and an acoustic target sentence.
The semantic sentences were normal English sentences, the syntactic sentences replaced all open-class words with non-words, and the acoustic sentences retained only the original prosody. Participants listened to a target and were then asked to press one button if that target was heard at the beginning of the sentence, another button if it was heard in the middle, and a third button if it was not heard. Results indicated that “the fact that both groups of late-learners were able to use the lexical information supports the hypothesis that the lexico-semantic system remains relatively plastic beyond the age of 12.” (Sanders et al., 2002). These results support the idea that late learners can acquire non-native language processes later in life, which is reassuring for late bilinguals; however, “No group of non-native speakers used syntactic information to the same extent as native speakers.” (Sanders et al., 2002). These results, as well as others, indicate that syntactic information processes are not as easily learned later in life (Sanders et al., 2002). An interesting finding was that both early and late Japanese learners of English were able to use some segmentation cues that are seemingly effective in both English and Japanese: “this study could either indicate that both groups were applying a Japanese segmentation cue that happens to co-occur with stress in English or that both groups had enough exposure to English to learn a new segmentation cue.” (Sanders et al., 2002). Overall, the findings of this study show that both lexical and semantic segmentation subsystems retain the “ability to change to a greater degree than do syntactic subsystems” (Sanders et al., 2002).

In terms of bilinguals, some very reassuring findings from the above study are that some segmentation processes are still learnable later in life and that they are interchangeable: one or more can be used when another is unavailable or not applicable. “The findings also indicate that segmentation cues can be used flexibly by both native and non-native speakers, such that cues that are both available in the speech stream and usable by the listener are employed to a greater extent when other segmentation cues are either absent or not accessible to the listener.” (Sanders et al., 2002).

III. Possible directions for future research

Studies such as Otake et al. (1993) support the idea that non-native speakers segment a foreign language using their native segmentation processes. Future research could focus on bilinguals learning an L2 that has an entirely different segmentation process. One example would be how a native Japanese speaker learning English, or an American learning French, begins to acquire a new segmentation style, and whether specific training on segmentation makes learning more efficient. A study design that could be adapted for this purpose is Strange & Dittmann’s (1984) study of discrimination training on the /r/ - /l/ distinction in Japanese adults learning English; the same training approach could be applied to learning a new segmentation style. The time course of acquiring a new non-native speech segmentation process would also be of interest to this field. Also of interest for future research are multiple segmentation cues, such as lexico-semantic information and prosody, and how these can be used interchangeably alongside native segmentation processes such as mora-, syllable-, or stress-based segmentation.


Beckman, M. (1982). Segment Duration and the ‘Mora’ in Japanese. Phonetica, 39, 113-135.

Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31, 218-236.

Cutler, A., Mehler, J., Norris, D., & Segui, J. (1986). The Syllable’s Differing Role in the Segmentation of French and English. Journal of Memory and Language, 25, 385-400.

Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113-121.

Cutler, A., & Otake, T. (1994). Mora or Phoneme? Further Evidence for Language-Specific Listening. Journal of Memory and Language, 33, 824-844.

Cutler, A., & Otake, T. (2002). Rhythmic Categories in Spoken-Word Recognition. Journal of Memory and Language, 46, 296-322.

Goto, H. (1971). Auditory perception by normal Japanese adults of the sounds “L” and “R”. Neuropsychologia, 9, 317-323.

MacKain, K. S., Best, C. T., & Strange, W. (1981). Categorical perception of English /r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.

McQueen, J. M., Otake, T., & Cutler, A. (2001). Rhythmic Cues and Possible-Word Constraints in Japanese Speech Segmentation. Journal of Memory and Language, 45, 103-132.

Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A. M., Jenkins, J. J., & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception & Psychophysics, 18, 331-340.

Norris, D. G., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34, 191-243.

Otake, T., Hatano, G., Cutler, A., & Mehler, J. (1993). Mora or Syllable? Speech Segmentation in Japanese. Journal of Memory and Language, 32, 258-278.

Sanders, L. D., Neville, H. J., & Woldorff, M. G. (2002). Speech Segmentation by Native and Non-Native Speakers: The Use of Lexical, Syntactic, and Stress-Pattern Cues. Journal of Speech, Language, and Hearing Research, 45, 519-530.

Strange, W., & Dittmann, S. (1984). Effects of discrimination training on the perception of /r-l/ by Japanese adults learning English. Perception & Psychophysics, 36(2), 131-145.