Thursday, July 31, 2008

New et al. (2006, PB&R) and Yarkoni et al. (in press, Neuroimage)

A length effect (or lack thereof) has traditionally been used to distinguish between serial and parallel processing. For lexical decision under central presentation, most studies have shown that reaction times are independent of length for words of 4 to 6 letters, leading to the conclusion that letters are processed in parallel. However, examination of a large corpus of data across a wide range of lengths (New et al., 2006) revealed a more interesting, complicated picture. When the effects of frequency and neighborhood size are partialled out, length has a facilitative effect for words of 3 to 5 letters (i.e., shorter RTs for longer words), no effect for words of 5 to 8 letters, and an inhibitory effect for words of 8 to 13 letters. These experimental results are shown in blue in the figure below, regraphed from Fig. 2 of New et al. (2006).

It is straightforward to explain this pattern under the SERIOL model, under the assumption that RT is the sum of two components: (1) the total time that it takes for all of the letters to fire and (2) the time it takes for the lexical network to settle following firing of the last letter. The total firing time is given by Len * firing-time/letter, where the firing-time/letter is assumed to be on the order of 15-20 ms/letter, corresponding to a firing rate of around 60 Hz. It is assumed that the settling function has the shape shown in red above; settling time decreases across increasing word length, and then asymptotes. That is, more letters provide more information, so the lexical network can settle more quickly for longer words. However, there is a limit to how quickly the lexical network can settle, so beyond a certain length the settling function is flat.

The points in green show modeled RT; it is equal to the settling function + length * 20 ms. It is evident that this modeled RT gives an excellent fit to the data. Modeled RT is decreasing across lengths 3 to 5, flat across lengths 5 to 8, and increasing across lengths 8 to 13.
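To make the arithmetic concrete, here is a minimal Python sketch of the two-component account. Only the 20 ms/letter firing cost comes from the text. The settling values (the 400 ms floor and the two slopes) and the function names are hypothetical, chosen only to reproduce the qualitative shape described above; they are not fitted to the New et al. data.

```python
def settling_time(length, floor_ms=400.0):
    """Hypothetical settling function: decreases with length, then asymptotes."""
    if length >= 8:
        # Assumed limit on how quickly the lexical network can settle.
        return floor_ms
    if length >= 5:
        # Assumed slope of -20 ms/letter between 5 and 8 letters.
        return floor_ms + 20.0 * (8 - length)
    # Assumed steeper slope of -40 ms/letter for the shortest words.
    return floor_ms + 60.0 + 40.0 * (5 - length)


def modeled_rt(length, ms_per_letter=20.0):
    """Modeled RT = settling time + total letter firing time."""
    return settling_time(length) + length * ms_per_letter


for n in range(3, 14):
    print(n, modeled_rt(n))
```

With these assumed values, settling falls faster than 20 ms/letter up to length 5, exactly offsets the firing cost from 5 to 8, and is flat afterwards, which gives the decreasing, flat, and increasing segments of the modeled RT.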

But is there any evidence for the above assumptions? I have previously pointed out that ERP studies by Hauk and Pulvermuller are consistent with these claims. They have shown that increased length initially (~100-200 ms post-stimulus) leads to increased amplitudes near occipital cortex. Interestingly, this effect is lateralized to the RH, indicating that it is not merely a result of increased visual angle, as such an effect would be symmetric. Rather it is consistent with a serial encoding driven by a retinotopic activation gradient that is strongest over the initial letters (i.e., in the RH). Later on (300+ ms) increased length leads to decreased amplitude from left posterior cortex. This reduced signal is consistent with the claim of faster lexical settling for longer words. Thus the timing, direction, and location of these effects are consistent with the proposal that longer words cause increased processing time at the letter level, followed by decreased settling time at the lexical level.

A recent fMRI study (Yarkoni et al., in press) has yielded even stronger evidence for the proposed settling function. They had subjects perform text reading under RSVP, and used regression analyses to determine effects of different variables in different brain regions. For the VWFA, they found that the effect of length had a quadratic component. The fitted function decreased across lengths 2 to 7, was fairly flat across lengths 7 to 10, and increased across lengths 10 to 13 (as shown in Fig. 5 of Yarkoni et al.). Thus, the observed effect of length in the VWFA across lengths 2 to 10 is very similar to the proposed settling function above. For very long words (> 10 letters), there is a mismatch between the two functions (i.e., increasing for the quadratic fit vs. flat for the proposed settling function), but the experimental estimate may be unreliable due to the relatively small number of very long words, coupled with the requirement of a quadratic fit.

I'd like to make one other point about the Yarkoni et al. article. In another analysis, they seek to discover whether the encoding in the VWFA is purely orthographic or whether it includes phonological information. They find an effect of phonological neighborhood-size that cannot be reduced to an effect of orthographic neighborhood-size, and conclude that the encoding in the VWFA is partially phonological. However, I would suggest that more precision is required in the statement of the issues and the interpretation of the results.

Due to interactivity between brain regions, an area that initially encodes only orthographic information could later be affected by feedback from a phonological area. Thus the VWFA encoding could well be purely orthographic during initial feedforward processing, and then VWFA activity could be affected later by phonological attributes. Thus the real question is whether the encoding in the VWFA is initially purely orthographic, not whether VWFA activity is ever influenced by phonological variables. The real question cannot be answered by fMRI, due to lack of temporal precision.

Wednesday, July 23, 2008

Recent articles by Wimmer & colleagues, and Pitchford & colleagues

In Hawelka and Wimmer (2008, Vision Research), young adult dyslexics and controls performed letter search on 5-letter strings. The target letter appeared prior to the string, and remained visible when the string appeared. The dependent variable was reaction time for detecting a present target. The authors found that the dyslexics were actually faster than the controls (with the same high accuracy) and concluded that "the slow reading speed of German dyslexic readers cannot be traced to inefficient visual processing of letter strings".

However, I would suggest that this conclusion is unwarranted. The task of detecting a letter within a string differs from the visual processing required for reading, where automatic encoding of all of the letters' positions within the string is necessary. Just because dyslexics are as fast as controls at detecting a letter target does not mean that they encode letter order in a normal manner. In fact, when processing requires fast automatic encoding of letter position across the entire string, dyslexics are notably impaired, as found by Hawelka et al. (2005; 2006, Vision Research), Enns, Bryson & Roes (1995, Can Jour of Exp Psych.) and various studies by Valdois and colleagues.

It is of interest to look at the RT patterns of the dyslexics vs controls in this letter search paradigm. Pitchford, Ledgeway and Masterson (in press, QJEP) did so for adult English dyslexics. They found an LVF advantage for controls, but not dyslexics. A similar pattern is also evident in Hawelka and Wimmer's (2008) data - numerically, controls were faster on position 2 than 4, but dyslexics were not. These patterns are consistent with my idea that normal readers perform rapid serial processing of letters as sub-parts of a single object (the string), whereas dyslexics process letters in parallel as individual objects.

The length of the string in these experiments (5 letters) is near the limit (~4) for the number of visual objects that can be processed in parallel. Thus dyslexics do not show increased RTs in the letter search task because they can process the five letters of the string mostly in parallel, but they do show a different RT pattern due to this parallel processing. For longer strings, the difference between the two styles of processing has stronger implications, because the rapid serial processing (at 10-15 ms/letter) allows ~10 letters to be processed per fixation, whereas parallel processing in highly-compensated adult dyslexics is probably restricted to 5 letters max, due to innate limitations on the visual system's ability to process multiple objects in parallel. This accounts for the slow reading that is characteristic of dyslexic readers in transparent orthographies. (In English, dyslexics would have the same visual limit. Due to the irregularity, they may adopt the approach of processing only the salient letters, and guessing at the word. This yields faster, less accurate reading.)

Bergmann and Wimmer (in press, Cog. Neuropsychology) then examined performance of German dyslexics versus controls on lexical decision (LD) versus pseudohomophone decision (PD). (In PD, the answer is "yes" if the pronunciation of a pseudoword is a word, e.g., yes for "taksi", no for "tazi"). Looking at accuracy, they found that dyslexics were only slightly impaired (with respect to controls) for PD, but were highly impaired on LD. In fact, controls were better at LD than PD, while dyslexics were better at PD than LD. For RT, dyslexics were considerably slower than controls on both tasks.

This provides yet more evidence that, universally, the characteristic pattern of dyslexia is a limitation in the uptake of orthographic information, rather than a phonological deficit. However, predicated on their presumption that there is no deficit in the dyslexics' visual processing of strings, the authors place the dyslexics' deficits in three places: poor representations of orthographic word forms, slow connections between orthographic word forms and phonological word forms and slow connections between graphemes and phonemes.

But the data are explained more compactly via the proposal of abnormal, parallel encoding of letters as individual objects. This limits the number of letters that can be processed within a fixation. Furthermore, parallel processing probably also slows down grapheme-phoneme mapping within a fixation, as such translation likely functions more automatically under seriality. Both of these factors yield increased RTs. The parallel processing also prohibits the encoding of a string as a single object, which precludes normal representation of orthographic word forms, yielding poor LD performance.

Monday, July 21, 2008

Share (2008) in Psychological Bulletin

David Share has written a wonderful article entitled On the Anglocentricities of Current Reading Research and Practice: The Perils of Overreliance on an "Outlier" Orthography. The title says it all.

One issue that Share addresses is how we should conceive of multiple processing routes in reading. The standard division is on the lexical/sub-lexical dimension. He suggests that the important distinction is the learned/novel dimension (fluent/non-fluent). However, I would suggest that there really are two different processing routes (ventral, dorsal) and the proper distinction is the nature of visual/orthographic analysis. The high-level ventral orthographic representation (open-bigrams) is parasitic upon parts-based object recognition, and does not encode phonological information. The high-level dorsal orthographic representation is parasitic upon speechreading and encodes graphosyllabic information (i.e., letters grouped into onsets, vowels and codas). Both routes encode lexical information (i.e., there are connections from open-bigrams to lexical items, and connections from graphosyllables to lexical items), and both routes are activated by all letter strings.

Share stresses the importance of understanding how fluency arises. I would suggest that in order to do this, we must understand the nature of orthographic processing in exquisite detail; if orthographic processing qualitatively differs from normal, I think that fluency is not possible.

Thursday, July 17, 2008

Greek, RT patterns, and the final-letter advantage.

Nikki Pitchford, her student Maria Ktori, Tim Ledgeway and Jackie Masterson have been looking at reaction-time patterns for letter search. In this task, a target letter is presented, and then a random string of 5 letters is displayed, and the subject specifies whether the target letter is in the string. The string remains displayed until the response. Looking at reaction times for positive trials as a function of the position of the target, English readers show initial- and final-letter advantages; that is, a target in the first position is detected more quickly than one in the second position and a target in the fifth position is detected more quickly than one in the fourth position. However Greek readers presented with Greek stimuli show an initial-letter advantage, but not a final-letter advantage. This is true for children and adults.

Now the final-letter advantage is one of my favorite topics, and I've noted that the final-letter advantage is not present in English for exposure durations < 100 ms. My explanation has been that the final letter is not reached for exposures of < 100 ms (due to seriality); at longer exposures, the final letter is activated and can fire for an extended period because it is not inhibited by a subsequent letter, creating a final-letter advantage. So this advantage is taken to occur at the letter level.

However, this explanation is inconsistent with the Greek data. If the final-letter advantage occurs at the letter level, it should be present for Greek, but it is not. This caused me to think that perhaps the initial- and final-letter advantages actually arise at the open-bigram level. Since 2004, the SERIOL model has included edge bigrams, which encode the first and last letters. Recall that open-bigrams are taken to be specific to the ventral/visual route. Greek is a transparent language and there is evidence that transparent languages weight the dorsal/phonological route relatively more heavily than English. Thus ventral-route orthographic representations (i.e., open-bigrams) may play less of a role in Greek string processing than English string processing. If the final-letter advantage actually reflects activation of the final edge-bigram, this would explain why it is absent for Greek.

Note, however, that the original argument on temporal dependency still holds for English. For very brief exposures, the final letter is activated weakly or not at all, and so the final edge-bigram is weakly activated, so there is no final-letter advantage. At longer exposures, the final letter and the final edge-bigram are activated, and so there is a final-letter advantage.

The proposal that these letter effects occur at the bigram level also explains another aspect of their data. For English, they found that positional letter frequency influenced RTs at the first and final positions (i.e., faster RTs for letters more likely to occur at a given position), but there was no effect of positional frequency at the internal positions. Now, under the SERIOL model, the only position-specific representations are edge bigrams: letter units and non-edge open-bigrams are not position-specific. Thus the only positions at which position-specific letter effects could possibly occur are at the edges. If one assumes that frequency affects excitability of bigram units, and bigram excitability affects RTs, this then explains the effect of positional letter frequency at the exterior letters.
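As an illustration of why only the edges can carry position-specific effects, the sketch below enumerates open-bigrams and edge bigrams for a string. The `*` edge marker, the function names, the example words, and the limit of two intervening letters are assumptions made for this example, not a specification of the SERIOL model.

```python
def open_bigrams(word, max_gap=2):
    """Ordered letter pairs separated by at most `max_gap` letters (assumed limit)."""
    pairs = set()
    for i in range(len(word)):
        for j in range(i + 1, min(i + max_gap + 2, len(word))):
            pairs.add(word[i] + word[j])
    return pairs


def edge_bigrams(word, edge="*"):
    """Bigrams anchoring the first and last letters to the string edges."""
    return {edge + word[0], word[-1] + edge}


# The same non-edge unit "ar" is activated whether the pair sits at
# positions 2-3 ("cart") or 3-4 ("smart"): it carries no absolute position.
print("ar" in open_bigrams("cart"), "ar" in open_bigrams("smart"))

# Edge bigrams do depend on position: "t*" is active only when t is final.
print(edge_bigrams("cart"), edge_bigrams("tarc"))
```

In this toy encoding, a positional letter-frequency effect has nowhere to act except on the edge units, which is the point made above.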

SSSR meeting

Recently got back from the Society for the Scientific Study of Reading conference, where I chaired a symposium, with Nikki Pitchford and Daisy Powell, on orthographic learning. We were heartened that there seems to be an increasing openness to the importance of visual/orthographic processing in reading, and felt that the symposium was well received.

Nikki gave a talk on RT patterns for letter search in 5-letter arrays in English, English dyslexic, and Greek readers. Her Greek data have caused me to reconsider my explanation of the final-letter effect somewhat, which I'll address in a subsequent post.

Daisy discussed experiments with poor readers without phonological deficits but with Rapid Automatized Naming deficits. These subjects had poorer orthographic knowledge than controls in general, but actually out-performed controls on orthographic learning in Share's self-teaching task. (In this task, pseudowords are included in passages read by subjects, and the subjects are later tested on the spelling of the pseudowords.) These were four-letter pseudowords. It would be interesting to try the experiment with longer pseudowords, as I think that four letters can be processed in parallel by dyslexics, whereas processing should particularly break down on longer words. So the poor readers may depend on visual information more than the controls, and be capable of remembering this visual information better than controls for strings up to four letters.

Sylviane presented longitudinal data showing that her Visual Attention Span measure (the number of letters that can be reported following brief presentation of five letters) is predictive of reading achievement. Sylviane and I are both interested in gaining a better understanding of whether this task measures a general deficit in the ability to distribute visual attention across multiple objects, or is more specific to learned orthographic processing. As I mentioned in a previous post, it may well measure both, and a deficit may arise at different levels in different subjects.

Piers wowed everyone with MEG data showing early (~100 ms post-target) phonological priming in IFG and precentral gyrus.

I harped on my favorite subject - perceptual patterns for identification of briefly presented strings, and suggested that the trigram identification task could be used to measure whether normal visual/orthographic processing has been learned. In particular, the SERIOL model predicts that, at a given eccentricity, increased letter position within a string should have a much larger detrimental effect on accuracy in the LVF than the RVF for normal readers. Thus they should show a VF asymmetry on the effect of string position. If normal string processing has not been learned, the pattern should be symmetric, with little effect of string position in either VF. Data from 1 seventh-grade dyslexic and 7 age-matched controls, from Dubois et al. (2007), support this proposal, as discussed in this post. Clearly "more research is required".

Wednesday, June 25, 2008

What's wrong with American scientists?

A friend just got back from a computational linguistics conference in the U.S. She commented that she and her colleague had noticed that the best talks came from European labs, and that Europeans seemed more open to exchange of ideas than American researchers. This observation precisely matches my own experience and opinions.

In general, it seems to me that American researchers have little interest in any ideas but their own. All of their energy is spent churning out papers to further their careers. I think that this is due to the extreme competitiveness of the funding situation here. It results in thrashing - scientists spend all of their time maneuvering to get funding, based on safe incremental changes to their previously funded work. Anything not directly related to funding for their own research is of no use; it is as if they have blinders on. As a result, cronyism rules; established researchers will only help other researchers if there's something in it for them. If a new researcher independently generates original ideas, it is impossible to get ahead based solely on the quality of those ideas.

That's why the U.S. is losing its pre-eminence in science and Europe is gaining, as shown by an NSF study of where the leading papers in a variety of fields are being generated.

Work from Eran Zaidel's lab

Zaidel's lab has done a series of interesting experiments on hemifield lexical decision. The following findings seem particularly intriguing. (1) In English, LVF performance is worse than RVF performance for acceptance of words, but performance for rejection of pseudowords is equivalent across VFs. In Hebrew, this interaction is not present. (2) Reading ability (vocabulary and comprehension) is correlated with LVF/RH measures in English, but not Hebrew. To explain these phenomena, the authors suggest that lexical processing is more left-lateralized in Hebrew. I'd like to suggest an alternative account of these findings.

Consistent with imaging evidence for left lateralization of the VWFA, I assume that LVF/RH stimuli are projected to the LH pre-lexically, and that VF differences therefore arise at a pre-lexical, orthographic level. Recall that the SERIOL model posits a monotonically-decreasing activation gradient. In left-to-right languages, formation of the activation gradient requires learned visual processing in the RH in order to invert the acuity gradient into the activation gradient. In right-to-left languages, this learned processing would occur in the LH. The process of inverting the acuity gradient is especially costly in left-to-right languages because it is coupled with callosal transfer to the LH.

For words presented to the LVF/RH in English, incomplete inversion of the acuity gradient would lead to a non-optimal encoding of letter order, especially for longer words. This would tend to make words look unfamiliar, thus there would be an impact for accuracy on words, but not pseudowords, creating the observed interaction. More frequent readers would gain reading expertise in vocabulary, comprehension, and learned string-specific processing in RH visual areas. Thus the correlation between reading ability and LVF measures. In Hebrew, acuity-gradient inversion is not required in the LVF/RH, so LVF measures are not correlated with reading ability. Similarly, the word/pseudoword interaction is not present. (You don't see the opposite VF pattern because acuity-gradient inversion in the LH is more efficient than in the RH because it is not combined with callosal transfer.)

So I suggest that their findings indicate that specialized visual processing of strings is more right-lateralized in English and more left-lateralized in Hebrew. This idea of RH specialization for early visual string processing in English is consistent with the results of an EEG study, which showed that a length effect was right-lateralized initially (at 90 ms) and then became left-lateralized (at 200 ms) (Hauk, Davis, Ford, Pulvermuller & Marslen-Wilson, 2006).


Thursday, June 5, 2008

Dubois et al. (2007) in Cognitive Neuropsychology

The authors look at perceptual patterns for a seventh-grade French dyslexic, MT, who has no phonological deficit. In particular, they look at trigram identification across a range of retinal locations (centered from -7 to 7), for MT versus seven age-matched controls. The authors fit curves to the trigram data, and did not find any difference between MT and the controls.

However, the SERIOL model makes quite specific predictions of how perceptual patterns should differ between dyslexics and controls, which the authors did not evaluate. The model predicts that a letter's position within the string should have a much stronger influence in the LVF than the RVF. This is due to the proposal of learned left-to-right inhibition in the LVF/RH. For younger readers, this effect should be strongest near fixation, where perceptual learning is the strongest. For example, accuracy for a letter at retinal location -2 should be much better when it is the 1st letter in the string than when it is the 3rd letter. In contrast, accuracy for a letter at retinal location 2 should be minimally affected by its position within the string. This asymmetry should be a signature of normal visual/orthographic processing, and it should be absent for dyslexics, under the assumption that they are not performing normal visual processing.


Indeed, inspection of the data in Figure 6, shown above, supports this prediction. In this figure, a filled circle represents the 1st letter in the LVF and the 3rd letter in the RVF. Conversely, an open circle represents the 3rd letter in the LVF and 1st letter in the RVF. For controls for eccentricities of 1 to 3 letter widths, it is evident that string position had a strong effect in the LVF, but not the RVF, while the pattern was symmetric for MT. Examination of the individual data shows that the asymmetric pattern held at the individual level.

Of course, this is a very small sample size. I would suggest that it is important to try this experiment on a large group of school-age controls and dyslexics to see how diagnostic this asymmetric vs. symmetric pattern truly is. If it is highly diagnostic, this would be quite informative as to the nature of core deficits in developmental dyslexia.

Tuesday, June 3, 2008

Overlap model - Gomez, Ratcliff & Perea (in press) in Psychological Review

In this paper, the authors introduce the Overlap Model of letter-position encoding, which is essentially a sloppy slot-based model. For example, an L in the third position would activate an encoding for L in the third position, as well as L in the second and fourth positions, to a lesser degree. The authors present a series of experiments to determine parameter settings for the model, which govern the amount of spread for each letter position. In the experiments, a five-letter string was presented for 60 ms, followed by a mask and 2AFC. The choices were strings, and the way in which the distractor differed from the target was systematically varied. Note also that the choices were presented well below where the string occurred, so they did not act as an additional backward mask.

From my point of view, the most interesting aspect of this paper is the finding that the final letter was the least well recognized and localized. In contrast, at longer exposures (>= 100 ms), the usual final-letter advantage can be observed. This contrast is consistent with serial processing and the resulting account of the final-letter advantage, and is difficult to explain otherwise.

However, as a model of letter-position encoding, the Overlap Model faces some difficulties.
  1. It is not a full model, as it does not explain how the positions are computed. How is the retinotopic representation transformed into a string-centered positional representation?
  2. It cannot explain the finding that, for nine-letter words, the prime 6789 provides facilitation (Grainger et al., 2006 in JEP:HPP). Even with a sloppy position encoding, there would be too much difference between the letters' positions in the prime and target to provide any overlap. Nor could the model be modified to include a position encoding anchored at the final letter; their experiments do not support the existence of such an encoding, as the final letter was the least well anchored/localized (in contrast to the initial letter, which was the best anchored/localized).
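
The sloppy slot coding, and the difficulty raised in point 2, can be illustrated with a minimal sketch. It assumes a Gaussian-shaped spread around each letter's true position, with a single hypothetical spread value (`sd=0.75`). The function name and the parameter value are assumptions made for illustration; the fitted parameters from the paper are not reproduced here.

```python
import math


def slot_activation(letter_pos, slot, sd=0.75):
    """Unnormalized activation that a letter at `letter_pos` sends to `slot`.

    Assumes a Gaussian-shaped spread; `sd` is a hypothetical value.
    """
    return math.exp(-((slot - letter_pos) ** 2) / (2 * sd ** 2))


# A letter in the third position activates slot 3 fully, and the
# neighbouring slots 2 and 4 to a lesser (and equal) degree.
print(slot_activation(3, 3), slot_activation(3, 2), slot_activation(3, 4))

# In the prime 6789, the letter from target position 6 sits in prime
# position 1: five slots away, so its overlap is effectively zero.
print(slot_activation(1, 6))
```

Under any spread narrow enough to keep adjacent slots distinct, a displacement of five positions contributes essentially nothing, which is why the 6789 priming result is problematic for this kind of encoding.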

Martin et al. (2007) in Brain Research

In this study, the subjects performed the Reicher-Wheeler task on five-letter words versus unpronounceable nonwords, for exposure durations of 50 and 66 ms. The stimuli were presented so that the target letter always occurred at fixation. So for example, when the target was the second letter, the second letter appeared at fixation, putting the first letter in the LVF and the third to fifth letters in the RVF. Thus the retinal location and visual acuity of the target letter did not vary with its position within the string. The task was performed by adult unimpaired readers and dyslexics.

This provides an opportunity to look at how accuracy interacts with string position and exposure duration. First we consider unimpaired readers. Under the assumption of serial processing, some letters may be read out before the mask occurs, and others will be read out after the mask occurs. The latter letters should be at a disadvantage. In general, the SERIOL model predicts that an increase in exposure duration should have the strongest effect at string positions in the transition zone (i.e., letters that were formerly read out after the mask occurred, but now are read out before the mask). In this experiment, the change in exposure duration was 16 ms, which is on the time scale proposed for per-letter processing. So at first glance, this suggests that early string positions should not be affected by an increase in exposure (because they are read out before the mask in any case) and later string positions should also not be affected (because they are read out after the mask in any case), while a transitional position should be affected. Here are the results from the experiment:
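
The first-glance intuition can be written down as a small sketch. It assumes a fixed readout slot of 15 ms per letter and a hypothetical 30 ms latency before the first letter is read out. Both timing values and the function names are assumptions for illustration only; only the 50 and 66 ms exposures come from the study.

```python
def read_before_mask(exposure_ms, n_letters=5,
                     first_readout_ms=30.0, slot_ms=15.0):
    """String positions whose (assumed) readout time precedes the mask."""
    return {
        pos
        for pos in range(1, n_letters + 1)
        if first_readout_ms + (pos - 1) * slot_ms <= exposure_ms
    }


def transition_positions(short_ms, long_ms, **timing):
    """Positions read out before the mask only at the longer exposure."""
    return read_before_mask(long_ms, **timing) - read_before_mask(short_ms, **timing)


print(sorted(read_before_mask(50)))          # read out in time at 50 ms
print(sorted(read_before_mask(66)))          # read out in time at 66 ms
print(sorted(transition_positions(50, 66)))  # the transition zone
```

With a rigid one-slot-per-position schedule, a 16 ms increase in exposure can move only about one position across the mask boundary, so this naive version predicts a benefit confined to a single string position.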

The nonword results for control subjects show an asymmetric effect of increased exposure, with the largest improvement for position 1, and no improvement at positions 4 and 5. This pattern is difficult to explain under parallel processing, but does not exactly match the SERIOL intuition that the improvement should be localized at the position that was not read out at 50 ms, but was read out at 66 ms.

However, let's consider the mechanics in more detail. Due to strong bottom-up activation in the LVF/RH, an increase in string position at fixation will not necessarily cause that letter to be read out (at the letter level) a full "time slot" (~15 ms) later, because the additional LVF/RH letters in earlier positions can "fill in" earlier time periods. That is, at the feature level, an initial letter at -1 reaches a higher activation than an initial letter at 0 (fixation). Hence, for a letter at fixation, activations (at the feature level) are similar for position 1 versus position 2. Therefore the timing of activation at the letter level does not vary much either. This is illustrated in the following figure, which shows the proposed time that a letter starts firing at the letter level, based on its retinal location and string position. It shows how each increase in string position from 1 to 3 at fixation could delay firing by ~5 ms, rather than ~15 ms. There is a much larger difference going from position 3 to 4 under the assumption that an initial letter at -3 is too far from fixation to reach maximal activation at the feature level; the reduced activation level then percolates through the string, due to RH-left-to-right and cross-hemispheric lateral inhibition.


Under this account, for the 50 ms exposure, the letter at fixation does not fire before the mask appears. For a 66 ms exposure, the letter at fixation can start to fire before the mask when it is in positions 1, 2, or 3. This explains the observed interaction of increased exposure with string position. (However, this doesn't explain why having the third letter at fixation yields the poorest results overall. This may be due to greater positional uncertainty about the middle letter.)

It is also interesting that there was no or a very weak initial-letter advantage in the nonword data. This is consistent with the idea that the initial-letter advantage is essentially a LVF non-initial-letter disadvantage. That is, when a second letter falls in the LVF, it receives much more additional lateral inhibition (at the feature level) than when it is the first letter in the LVF. In contrast, when a second letter falls at fixation, it only receives slightly more lateral inhibition than when it is the first letter. Thus the advantage for being the first letter is much reduced at fixation, compared to retinal locations in the LVF. In contrast, Tydgat and Grainger (in press, JEP:HPP) claim that the initial-letter advantage is due to reduced receptive-field sizes for letters, such that a letter receives considerably less inhibition with 1 immediate flanker than with 2 flankers. This account incorrectly predicts that an initial-letter advantage should be present at fixation.

Note that the best overall firing patterns are obtained when fixation falls on the second or third letter. That is, these conditions allow the earliest completion of letter readout. This explains the OVP effect observed in the word conditions. Thus the word and nonword conditions yield different patterns, with fixation on the third letter yielding the poorest results for nonwords, but the best results for words. This is because accuracy in the word condition is influenced by the processing of the entire string (to yield lexical activation), which is best at positions 2 and 3, while accuracy in the nonword condition is influenced primarily by the processing and localization of the target letter.

It is also interesting to see that the dyslexics showed a different pattern. First, there was no position X exposure-duration interaction. This is consistent with my proposal that dyslexics process letters in parallel, while unimpaired readers process them serially. Secondly, there was no word-superiority effect, except when fixation fell on the third letter. This may indicate that these dyslexics use a retinotopic method to encode letter position, which is keyed to having two letters in the LVF. When the presentation condition matches this requirement, lexical representations are well activated; otherwise they are not. This is consistent with the case study of a single French dyslexic (Dubois et al., 2007, Cognitive Neuropsychology), which showed that lexical recognition was best when fixation fell on the third letter, independently of string length.

Monday, June 2, 2008

Shalev, Mevorach & Humphreys (in press) in Neuropsychologia

The authors investigate the deficits of two patients with parietal lesions. They find that the patients have a selective deficit in encoding the order of letters in a string, with spared ability to identify letter identities. For example, the patients are much more likely to make false positive responses (in lexical decision) to nonwords formed by transposing letters of words than to nonwords formed by replacing letters. In contrast, a patient with a left occipitotemporal lesion did not show this pattern.

The authors conclude that letter identity and position are encoded separately. Surprisingly, in the ensuing discussion of models of orthographic encoding, they do not reference any of the recent developments in this area; the most recent reference is to a 2001 paper on the dual-route model.

I would explain their results as follows. In my article on alexia, I propose that the serial letter representation is transformed into two different high-level orthographic representations - an open-bigram encoding on the ventral (occipitotemporal) route, and a graphosyllabic encoding on the dorsal (occipito-parieto-frontal) route. The latter would provide a more robust encoding of letter order, as open-bigrams introduce ambiguity. If the dorsal graphosyllabic encoding is abolished, the result should be a less robust encoding of letter order in lexical processing, leading to a decreased sensitivity to transpositions. This is exactly what is observed in these parietal patients.

Burgund & Edwards (2008) in Neuroreport

In this fMRI study of letter priming in the VWFA, the authors compared same-identity and different-identity primes that both had moderate visual similarity to the target. They found no advantage for the same-identity primes, and concluded that the VWFA does not employ abstract letter representations.

However, the task that the subjects performed was based on a visual attribute - whether the letter had an enclosed space. It is perhaps not surprising then that priming was determined by the visual similarity between the prime and the target. It is quite possible that a task based on letter identity would show a different pattern of results.

Also, there is some evidence (e.g., studies by Gauthier and colleagues) that single letters are processed differently than strings. So studies using single-letter tasks may not tap into the abstract letter representations used for string processing.

Work by Sylviane Valdois

In March, I had a marvelous visit to Sylviane's lab in Grenoble. She does very interesting work on string processing in dyslexia, and has shown that dyslexics are able to report fewer letters per fixation than unimpaired readers. We both agree that this is related to a deficit in the ability to allocate visual attention, although we have somewhat different interpretations. She thinks that dyslexics have a general deficit in the ability to allocate attention across multiple objects, whereas I think that dyslexics have a covert-attention deficit that interferes with letter-string processing in particular.

However, like many things in neuropsychology, there may be multiple contributing factors. Sylviane showed me some individual data that convinced me that some dyslexics do indeed have trouble distributing attention over more than one or two objects. I suspect that there are probably other subjects whose deficit is specific to letter strings.

Cohen et al. (2008) in Neuroimage

In this fMRI experiment, the authors presented words that were progressively degraded, under three manipulations:
  • Shifting the word into the LVF
  • Increasing the spacing between letters
  • Rotating the string
Within each manipulation, there were 5 levels of degradation, where level 1 was normal presentation, and level 5 was maximally degraded. For levels 4 and 5, the authors found a behavioral length effect under all three degradations. Parietal activity increased from level 1 to level 5. The authors conclude that words are normally processed in parallel, while degradation causes attention-driven, serial processing.

I wrote a commentary on this article, but Neuroimage would not publish it. Briefly, the commentary points out that there are 3 main problems with their analysis.
  • If there's an abrupt shift in processing mode at the onset of the length effect (between levels 3 and 4), parietal activity should show a large jump between levels 3 and 4. However, within each manipulation, parietal activity was similar for levels 3 and 4.
  • Attention-driven processing cannot explain the time scale of the length effect, which was ~20 ms/letter, as it's been shown that serial covert shifts of attention take at least 300 ms per shift.
  • The authors cannot explain the results of Whitney & Lavidor (2004), who showed that the LVF length effect can be abolished via a contrast manipulation.
Furthermore, it is straightforward to explain their results under the SERIOL model. As I previously proposed in an email to Andy Ellis, degradation would interfere with automatic bottom-up formation of the locational gradient. Therefore, top-down attention is recruited to form the activation gradient. This yields a less finely tuned (steeper) gradient than normal, and a length effect emerges from the usual serial processing. See the commentary for details.

Friday, May 30, 2008

Vinckier et al. (2007) in Neuron

In this fMRI study, the authors varied the word-likeness of string stimuli at six levels - false fonts, rare letters forming rare (contiguous) bigrams, frequent letters forming rare bigrams, frequent letters forming frequent bigrams but rare quadrigrams, frequent bigrams forming frequent quadrigrams, words - and investigated the sensitivity of brain regions around the VWFA to this manipulation. In their Discussion, the authors state:
"Our results demonstrate effects of letter and quadrigram frequency above and beyond those of bigram frequency, suggesting that all of these levels (Dehaene et al., 2005), not just bigrams (Grainger & Whitney, 2004; Whitney, 2001), may be useful subcomponents of visual word recognition."

However, the claim that their data provide evidence for quadrigram detectors is tenuous at best. The claim comes from the comparison of frequent bigrams forming rare quadrigrams vs. frequent bigrams forming frequent quadrigrams. However, this comparison is confounded with the pronounceability of the stimuli. The rare quadrigram stimuli were not pronounceable, whereas the frequent quadrigram stimuli were pronounceable. Thus frequent-quadrigram stimuli were much more likely to yield partial activation of lexical representations, and therefore any difference between the two may reflect different levels of lexical activation, rather than quadrigram activation. This is supported by their finding that only words and frequent quadrigrams yielded significant activation of posterior middle temporal gyrus, a region associated with lexico-semantic processing.

On the other hand, the contrast between frequent letters forming rare bigrams vs frequent letters forming frequent bigrams does not suffer from this confound, as neither type of string was pronounceable. Differences between these types of stimuli were found in middle/anterior left fusiform. Binder et al. (2006, Neuroimage) also found sensitivity to bigram frequency in this area in another fMRI study. These results support the claim of multi-letter units, such as open-bigrams, and are difficult to explain under models that do not include them, such as Davis's SOLAR model and Gomez et al.'s Overlap model.

Tydgat & Grainger (in press) in JEP:HPP

In this paper, the authors look at perceptual patterns for 5-character strings composed of letters, symbols (i.e., non-alphabetic characters such as @), or a combination. The string was presented for 100 ms, and then masked. As has been previously found, there is no external character advantage for symbols; the first and last symbol were perceived no better than the second and fourth symbols. For letters, they found a strong initial-letter advantage, but in experiments 1-4, there was no final-letter advantage. This pattern held for letter targets in strings of mostly symbols, and for symbols in strings of mostly letters. This shows that the patterns are inherent to the characters themselves, and not driven by different patterns of attention to letter vs. symbol strings.


In experiments 5 and 6, the authors obtained a final-letter advantage. Experiments 1-4 used partial report 2AFC, where the two choices appeared directly above and below the target position when the backward mask appeared. Experiment 5 used free report, and experiment 6 used 2AFC, but target position was indicated by surrounding lines and the two choices appeared well below where the stimulus had been. Thus it appears that surrounding letters inhibit the final-letter advantage, but not the initial-letter advantage.

The authors proposed that this asymmetry was due to different receptive-field shapes in the two hemispheres, with RVF/LH fields being circular and LVF/RH fields being elongated along the horizontal axis. As a result, there is more vertical interaction in the RVF/LH and so characters above and below the target have a stronger masking effect, explaining the lack of final-letter advantage when the choices appeared around the target letter.

The authors also noted that experiments 5 and 6 yielded a W-shaped pattern, where accuracy levels were similar for the first, third, and fifth letters, and accuracy levels were lower and equal to each other for the second and fourth letters. The authors claim that this symmetric pattern is inconsistent with the left-to-right serial processing proposed in the SERIOL model.

As for the difference between letters and symbols, the authors suggest that receptive fields are larger for symbols than for letters. Symbol detectors cover the neighboring positions, whereas letter detectors are more narrowly focused. Therefore, a single neighboring symbol has a strong ceiling effect and there is little advantage for having only one neighbor. In contrast, a single neighboring letter has a much weaker effect, and so there is an advantage for having only one neighboring letter, and so the outer letters are more easily perceived.

Before commenting on this article, I would like to say how glad I am to see that Grainger and colleagues are now investigating perceptual patterns. I have been arguing for quite a while that examination of perceptual patterns can tell us a lot about visual specializations for letter string processing, and it's good to see experimentation in this area.

Once again, JEP:HPP did not ask me to review this paper, despite the fact that I am probably the leading expert on perceptual patterns for letter strings. So I make my comments here ...

First, about the final-letter advantage. I would say that the reason that surrounding letters affected the final letter but not the initial letter was because they appeared when the final letter was being read out. Due to seriality, the initial letter had already been fully processed at that point, and so was not affected. Thus I suggest that the effect depends on timing and position within the string, not on retinal location.

Now suppose you had an LVF letter at the same retinal location as the letter in position 1 and an RVF letter in the same location as the letter in position 5, but each of those letters appeared as the first or second letter in a string. Under Tydgat and Grainger's account, vertically-surrounding choices should still yield an RVF disadvantage, because of stronger vertical interactions inherent to the RVF/LH. Under my account, vertically-surrounding choices should not have a stronger effect on the RVF letter, because the letter is now read out early enough that it is processed prior to the appearance of the choices. In a subsequent investigation (Tydgat and Grainger, submitted), the authors performed experiments with just such conditions. There was no disadvantage for the RVF in these experiments, contradicting their account.

In fact, there's evidence that vertically surrounding letters actually have a stronger effect in the LVF/RH than the RVF/LH, the direct opposite of their account. For example, consider identification of vertical trigrams presented to a single VF. It's been established that perception of individual letters is equivalent in the two VFs, so any differences between VFs in trigram identification come from interactions amongst the letters. If there are stronger vertical interactions in the RVF/LH, the middle letter of a vertical consonant trigram should be perceived worse in the RVF than the LVF. However, the opposite is the case; the middle letter is perceived better in the RVF than the LVF (e.g., Luh and Wagner, 1997, Brain and Language).

By the way, I have previously addressed the issue of different perceptual patterns for vertical LVF and RVF trigrams, and suggested that trigrams are first mentally transformed to the horizontal (Whitney, 2001, PB&R). I now think it is more likely that the different patterns come directly from vertical interactions. I've proposed that there's strong left-to-right inhibition in the LVF/RH to invert the acuity gradient. This inhibition occurs, by definition, along the horizontal axis. However, in the brain, angular directions are usually coarsely coded, and so it may be the case that the left-to-right inhibition "bleeds" onto the vertical dimension, giving increased vertical interactions in the LVF/RH for letter strings.

Also in opposition to Tydgat and Grainger's account of the final-letter advantage is its dependency on exposure duration. According to the SERIOL model, the final-letter advantage arises when the final letter can fire for a longer period of time than the previous letter. For very brief presentations, there should be no final-letter advantage, even when there are no surrounding letter choices. This is indeed the case. For example Gomez, Ratcliff & Perea (in press, Psych Rev) presented five-letter strings for 60 ms, followed by a mask and 2AFC of two strings presented well below the stimulus location. There was no final-letter advantage at this exposure duration.

In motivating their proposal of hemisphere-specific receptive field shapes, Tydgat and Grainger suggest that the importance of the initial letter causes elongated receptive fields to develop in the LVF/RH and that these fields are asymmetric in that they encompass the letter to the right, but not the left. Note that this implies that LVF letters are quite sensitive to interference from a letter to the left, something that I've been arguing for years. However, I suggest that this is a result of lateral inhibition, rather than receptive-field shape. Note that the presence of additional letters to the left (two or three) has a stronger inhibitory effect in the LVF than the RVF (e.g., Wolford & Hollingsworth, 1974, Perception & Psychophysics). This is easily explained under lateral inhibition, but difficult to explain via receptive fields, as implausibly large fields would be required.

Now for the issue of the W-shaped accuracy pattern. In explaining hemifield accuracy patterns (e.g., Whitney, in press, LC&P), I have assumed that accuracy is primarily governed by activation at the letter level, which is taken to index the probability that the letter is sufficiently activated to be consciously accessible for report. The probability that the activated letter is actually the correct one is taken to be approximately constant across the string. Recall that the acuity gradient is steep within the fovea, and is shallow in the parafovea. Thus for parafoveal presentation, the difference in acuity across letter positions is relatively small. So it makes sense to assume that the probability that the correct letter is activated is approximately constant, and that activity level is the primary determinant of accuracy. However, in the fovea, acuity differences across string positions are much larger, meaning that the probability that the correct letter is activated varies substantially with string position.

Hence, for accuracy under foveal presentation, availability for report should be weighted by the probability that the correct letter is activated, which would be inversely proportional to feature-level noise. Feature-level noise is determined by acuity, and also by whether the letter receives left-to-right inhibition (under the assumption that such inhibition may be non-uniform and thereby directly introduce noise).

Hence, for five-letter centrally-fixated strings, the probability that the correct letter is activated is proportional to the acuity gradient, with a decrement at the second position, due to LVF/RH left-to-right inhibition. Next we consider the activity pattern at the letter level, which determines probability of accessibility for report. For foveal presentation, the resulting locational gradient would be fairly steep and smooth. As a result, activation of the first four letters would be determined primarily by firing rate. So activation (at the letter level) would decrease across the first four positions, and then rise for the final letter. The figure below illustrates examples of these two probability patterns (correctness and accessibility). At each position, Accuracy (shown in green) is the product of the two probabilities. This yields a symmetric W shape, in line with the experimental data.



Thus a symmetric W-shaped pattern is not inconsistent with serial processing. Serial processing yields the activity pattern in black, which when combined with the effects of feature-level noise (in red), gives the W-shaped pattern. Now if the task instead involved an explicit temporal component, such as reaction time for letter search in a string, we would expect this pattern to be inverted and weighted by the time that it takes to reach the target letter during serial processing. This would yield an asymmetric, upward-sloping M, which is indeed what is observed for the search task (e.g., Pitchford, Ledgeway, & Masterson, 2008, J. of Res. in Reading).
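
The product-of-probabilities argument can be sketched numerically. The values below are made up purely for illustration (they are not read off the figure or fitted to any data); they only respect the qualitative shapes described above: correctness follows a symmetric acuity gradient with a decrement at position 2, and accessibility falls across positions 1 to 4 and rises again at position 5.

```python
# Illustrative sketch only: every number here is a hypothetical value chosen
# to match the qualitative description, NOT data from the post or its figure.

# Symmetric acuity gradient for a centrally fixated five-letter string.
acuity = [0.60, 0.85, 1.00, 0.85, 0.60]

# Assumed decrement at position 2 (LVF/RH left-to-right inhibition adds noise).
inhibition_decrement = [0.00, 0.10, 0.00, 0.00, 0.00]
p_correct = [a - d for a, d in zip(acuity, inhibition_decrement)]

# Letter-level activation: decreases over positions 1-4, rises for the final letter.
p_accessible = [1.00, 0.70, 0.62, 0.58, 1.00]

# Accuracy at each position is the product of the two probabilities.
accuracy = [c * a for c, a in zip(p_correct, p_accessible)]

def is_w_shaped(values):
    """True if values go down, up, down, up across five positions."""
    v1, v2, v3, v4, v5 = values
    return v1 > v2 < v3 > v4 < v5

if __name__ == "__main__":
    for pos, acc in enumerate(accuracy, start=1):
        print(f"position {pos}: accuracy {acc:.3f}")
    print("W-shaped:", is_w_shaped(accuracy))
```

With these particular made-up values, positions 1, 3 and 5 come out close to each other and above positions 2 and 4. Other values with the same qualitative shapes would give a more or less symmetric W.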


As for Tydgat and Grainger's account of letter vs symbol patterns, note that this explanation depends on the assumption of retinotopic symbol detectors. However, it seems unlikely that we have sufficient experience with symbols to develop location-specific symbol detectors. Instead, I suggest that the letter pattern is a signature of serial processing. An initial-letter advantage arises because the initial letter has the highest activation at the feature level, which causes it to have the highest activation at the letter level. A final-letter advantage arises when the final letter can fire for an extended period of time. Therefore, the final-letter advantage is less robust (than the initial-letter advantage); it is abolished by very brief exposures, or under lateral masking when the final letter is being processed. Symbols do not show this pattern because they are not processed serially; their perceptual pattern depends only on acuity.

Tuesday, May 27, 2008

Pure alexia

The SERIOL model offers a novel account of pure alexia. The usual interpretation is that the spared RH cannot process letters in parallel, thus serial letter-by-letter reading is performed instead. I suggest instead that letters normally undergo very rapid serial analysis, and the RH cannot provide this rapid serial processing. Because serial input is required by the spared lexical system, overt serial processing is performed instead. For a more detailed development of this idea, including a novel proposal for the nature of RH-specific visual processing and a more detailed specification of the dorsal route in the SERIOL model, see this article.

Guerrera & Forster (2008) in LC&P

The authors investigated priming of eight-letter targets in lexical decision, where the primes employed "extreme transposition". This article includes match scores from the SERIOL model. Again, these scores are based on the original (position-specific) formula for bigram activation, rather than the current assumption that bigram activations depend only on the separation of the constituent letters. Also, the article incorrectly states that the original specification of the SERIOL model placed no limit on the number of intervening letters allowed for bigram activation, whereas this limit has always been two letters.

So here are the match scores under the current parameters, where bigram activation is 1.0 for contiguous letters, 0.8 for a one-letter separation, and 0.4 for a two-letter separation (and 0.0 otherwise).
Note that these numbers are quite different than the published ones for Experiment 3 (in particular) and provide a better fit to those data.

Because bigram activations are no longer positional, the third and fourth primes above now give equivalent match scores. However, simulations give an advantage for matching initial versus final letters due to seriality, as discussed in Whitney (in press).
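
The separation-based activation rule stated above (1.0, 0.8, 0.4, otherwise 0.0) can be sketched as follows. This is a minimal illustration, not the simulation code: the function name is hypothetical, edge bigrams are left out, and taking the maximum when a letter pair occurs more than once in a string is my simplifying assumption.

```python
from itertools import combinations

# Activation as a function of the number of intervening letters,
# using the parameter values given in the post.
ACTIVATION_BY_GAP = {0: 1.0, 1: 0.8, 2: 0.4}

def open_bigram_activations(string):
    """Return {bigram: activation} for all ordered letter pairs in `string`.

    Pairs separated by more than two letters get activation 0.0 and are omitted.
    Edge bigrams are not modelled here (simplification).
    """
    activations = {}
    for i, j in combinations(range(len(string)), 2):
        gap = j - i - 1  # number of letters between the two constituents
        act = ACTIVATION_BY_GAP.get(gap, 0.0)
        if act > 0.0:
            bigram = string[i] + string[j]
            # Assumption: if a pair occurs more than once, keep the strongest.
            activations[bigram] = max(activations.get(bigram, 0.0), act)
    return activations

if __name__ == "__main__":
    for bigram, act in sorted(open_bigram_activations("BIRD").items()):
        print(bigram, act)
```

Because the activation depends only on the gap, the same bigram gets the same value wherever it occurs in the string, which is the sense in which activations are no longer positional.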

SOLAR Model - Davis & Bowers (2006) in JEP:HPP

For five-letter targets in lexical decision, the authors compare primes of the form 1d345 vs. 13d45. The SOLAR model predicts stronger priming for 1d345. In the original specification of the SERIOL model, open-bigram activation decreased with increasing string position. Under this assumption, the SERIOL model predicts stronger priming for 13d45. Davis and Bowers found stronger priming for 1d345 than 13d45, and concluded that the SERIOL model is inconsistent with their experimental results.

However, the conclusions of Davis and Bowers (2006) are based on an obsolete parameterization of the SERIOL model. In my 2004 dissertation, the assumption that bigram activation depends on string position was dropped; bigram activations now depend only on the separation between the constituent letters. As a result, the SERIOL model, like the SOLAR model, predicts stronger priming for 1d345 than 13d45. Thus the priming results do not actually differentiate between the two models. (For a discussion of why the positional variation was originally included, why it was dropped, and how the model now accounts for the phenomenon originally explained by the positional variation, see Whitney (in press, LC&P).)
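
To make the direction of this prediction concrete, here is a rough sketch using separation-only activations of 1.0, 0.8 and 0.4 for zero, one and two intervening letters. The match score used here (a dot product over shared bigrams, normalised by the target's self-match) is a stand-in I am assuming for illustration; it omits edge bigrams and is not the full simulation. The function names are hypothetical.

```python
from itertools import combinations

# Separation-only activations: 0, 1 or 2 intervening letters.
ACTIVATION_BY_GAP = {0: 1.0, 1: 0.8, 2: 0.4}

def bigram_activations(string):
    """Open-bigram activations that depend only on letter separation."""
    acts = {}
    for i, j in combinations(range(len(string)), 2):
        act = ACTIVATION_BY_GAP.get(j - i - 1, 0.0)
        if act > 0.0:
            key = string[i] + string[j]
            acts[key] = max(acts.get(key, 0.0), act)
    return acts

def match_score(prime, target):
    """Assumed score: prime-target dot product over shared bigrams,
    divided by the target's dot product with itself."""
    p = bigram_activations(prime)
    t = bigram_activations(target)
    shared = p.keys() & t.keys()
    dot = sum(p[b] * t[b] for b in shared)
    norm = sum(v * v for v in t.values())
    return dot / norm

if __name__ == "__main__":
    target = "12345"
    for prime in ("1d345", "13d45"):
        print(prime, round(match_score(prime, target), 3))
```

Both primes share the same five bigrams with the target (13, 14, 34, 35, 45), but in 1d345 those bigrams keep their target separations, whereas in 13d45 three of them do not. Under this simplified score that gives roughly 0.55 for 1d345 versus 0.49 for 13d45, the same ordering that Davis and Bowers observed.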

Despite the fact that a primary focus of Davis and Bowers (2006) was to compare the SERIOL and SOLAR models, JEP:HPP did not invite me to review their article. Hence, their article includes claims that are no longer accurate. (It also includes a claim that is outright incorrect - that even if the positional variation were dropped, the SERIOL model would still be inconsistent with their results.) Nor would JEP:HPP allow me to publish a reply. Fortunately, I found a fairer venue, and my reply to Davis and Bowers (2006) is now in press at Brain and Language.

Briefly, the main points of the response are:
  • The SERIOL model is consistent with the results of Davis & Bowers (2006).
  • SOLAR's lexical activation function is not biologically plausible.
  • A recent re-parameterization of the SOLAR model leads to a contradiction of Davis and Bowers's core claim that contiguity matters.
  • Experimental comparisons between models should test inherent aspects of the models and not depend on particular parameterizations.
  • A critical comparison is presented. That is, I specify a pair of primes for which the two models inherently make different predictions. (Colin Davis agrees with this analysis.)

Friday, May 23, 2008

An anti-rant

In the last post, I ranted about unfair treatment by TICS. I do want to say that, for the most part, I've been treated very fairly by reviewers and editors, with respect to my own publications and being asked to review others' publications. I greatly appreciate this fair treatment. TICS and JEP:HPP (more on JEP in later posts) are the notable exceptions. I guess the highest-status journals only care about status. Makes sense, I suppose.

I also greatly appreciate how I am treated by most European colleagues. For example, at ESCOP I talked with a number of people who were not aware that I do not have a research position. When I told them that I do not, it did not seem to diminish their interest in my work at all.

Goswami & Ziegler (2006) in TICS

In their TICS Letter, Goswami and Ziegler rightly argue that the sub-lexical, phonological route must also be considered in proposals for how letter-order is encoded. They point out that open-bigrams are not suitable for sub-lexical processing, as they are not phonologically relevant units.

However, in the extended SERIOL model (Whitney & Cornelissen, 2005), the serial encoding of individual letters provides input to both the ventral (lexical) and dorsal (sub-lexical) routes. On the ventral route, the letters activate open-bigrams, which activate visual word forms. On the dorsal route, the letter sequence is parsed into a grapho-phonological representation. Thus the serial encoding provides a location-invariant encoding of individual letters, which of course provides suitable input for either route.

Despite the suitability of SERIOL’s representations for the demands of phonological processing, Goswami and Ziegler (2006) argued that “This solution ignores data showing that phonology affects the lexical route, such as body-neighborhood effects in lexical decision (Ziegler & Perry, 1998)”. It is unclear what they could mean by this statement.

  • First, Whitney (2004) specifically discussed the data presented by Ziegler and Perry (1998), showing in detail how the SERIOL model explains their findings.
  • Second, the general issue of interaction between the lexical and sub-lexical routes is orthogonal to the question of how letter order is encoded. Presumably, the lexical and sub-lexical routes converge onto the same lexical representations. Feedback from the lexical representations to letters and phonemes would then cause interaction between the routes. This connectivity pattern is independent of how letter position is encoded.
And now for a rant. Despite the fact that Goswami and Ziegler's Letter and Dehaene's article on the LCD model both focus on the open-bigram representation and I was the first to propose such a representation (Whitney & Berndt, 1998), I was not asked by TICS to review either article. I do know that a co-author was asked to review Goswami and Ziegler. To ask this person and not me is ludicrous, as their Letter focused on my idea. It is just one example of how status trumps accomplishment in science. Apparently, having a research position (as does my co-author) is more important to TICS than ownership of ideas. (Due to geographic constraints and lack of connections in the U.S., I have been unable to obtain a research position; I work independently and collaborate on experiments.)

Thursday, May 22, 2008

LCD model - Dehaene et al. (2005) in TICS

Following Whitney's and Grainger's proposal that the highest pre-lexical orthographic encoding on the lexical (ventral) route is comprised of non-contiguous bigrams (dubbed open-bigrams by Grainger), Dehaene and company got into the act with their Local Combination Detector (LCD) model. While the model is somewhat vague, they do make two specific claims:

  1. Open-bigram representations do not provide sufficient accuracy in encoding letter order. Therefore, the highest level of the LCD model includes quadrigram detectors to provide a more precise encoding of letter order.
  2. Open-bigram-like representations occur as a result of retinotopic bigram detectors operating over noisy retinotopic representations of individual letters.

Claim (1) has some problematic aspects:

  • The model does not include a location-invariant encoding, as the quadrigram detectors are retinotopic.
  • Quadrigrams do not provide a realistic similarity metric. For example, LAME and LIME would activate different quadrigrams, making them completely different from each other at the lexical level.
  • The authors only considered on/off open-bigrams with no encoding of edges. The addition of graded activations and edge bigrams, as in the SERIOL model, allows more accurate encoding of order information.
  • However, there is evidence that letter-order encoding on the ventral route is indeed somewhat imprecise, whereas the encoding is more accurate on the dorsal phonological route. Occipito-parietal lesions lead to a selective deficit in encoding letter order (Friedmann & Gvion, 2001; Shalev, Mevorach & Humphreys, in press). See also Frankish & Turner (2007). So experimental results indicate that open-bigrams do not need to be "fixed" with quadrigrams.
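
The LAME/LIME point in the list above can be checked with a few lines of code. This is only a sketch: quadrigrams are treated here as contiguous four-letter sequences (my assumption about how to operationalise them), open-bigrams are on/off with up to two intervening letters, and retinotopy and graded activations are ignored.

```python
def contiguous_ngrams(word, n):
    """Set of contiguous n-letter sequences in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def open_bigrams(word, max_gap=2):
    """Set of ordered letter pairs with at most `max_gap` intervening letters."""
    pairs = set()
    for i in range(len(word)):
        # j - i - 1 <= max_gap, i.e. j <= i + max_gap + 1
        for j in range(i + 1, min(len(word), i + max_gap + 2)):
            pairs.add(word[i] + word[j])
    return pairs

if __name__ == "__main__":
    a, b = "LAME", "LIME"
    print("shared quadrigrams: ", sorted(contiguous_ngrams(a, 4) & contiguous_ngrams(b, 4)))
    print("shared open-bigrams:", sorted(open_bigrams(a) & open_bigrams(b)))
```

The two words share no quadrigram at all, but they share the open-bigrams LE, LM and ME, so an open-bigram code still registers them as similar.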

In contrast, claim (2) above offers a reasonable account of how open-bigrams could be activated within a parallel framework. Grainger et al. (2006) incorporated this suggestion into their Overlap Open-Bigram (OOB) model. Within the abstract open-bigram layer, the OOB model is quite similar to the SERIOL model, in that it employs open-bigrams with graded activation levels. The only difference is that the OOB model includes activation of transpositions (e.g., BIRD would activate bigram RI to a low level). Of course, the two models radically differ on how the open-bigrams become activated, as the SERIOL model proposes that open-bigrams are activated serially. This serial mechanism explains perceptual patterns for consonant strings, while parallel accounts do not. For a detailed comparison of serial versus parallel activation of open-bigrams, see Whitney & Cornelissen (2008).

Monday, May 19, 2008

Welcome to Orthoblography

This is a scientific blog dealing with orthographic encoding in visual word recognition, written from the point of view of the SERIOL model. Gratifyingly, research into orthographic processing in normal and dyslexic populations is heating up. With the increasing activity in this field, I wanted a forum in which to comment on recent publications (especially those that make incorrect claims about the SERIOL model). I also wanted a place in which to rapidly disseminate new ideas. Hence, Orthoblography.

This blog is intended for researchers already familiar with the model. If you are not familiar with it, but would like to be, you should first read Whitney (2001). Since that publication, modifications have been made to the bigram layer of the model, as described in Whitney (in press; Language and Cognitive Processes).