Friday, May 30, 2008

Vinckier et al. (2007) in Neuron

In this fMRI study, the authors varied the word-likeness of string stimuli at six levels - false fonts, rare letters forming rare (contiguous) bigrams, frequent letters forming rare bigrams, frequent letters forming frequent bigrams but rare quadrigrams, frequent bigrams forming frequent quadrigrams, words - and investigated the sensitivity of brain regions around the VWFA to this manipulation. In their Discussion, the authors state:
"Our results demonstrate effects of letter and quadrigram frequency above and beyond those of bigram frequency, suggesting that all of these levels (Dehaene et al., 2005), not just bigrams (Grainger & Whitney, 2004; Whitney, 2001), may be useful subcomponents of visual word recognition."

However, the claim that their data provide evidence for quadrigram detectors is tenuous at best. The claim comes from the comparison of frequent bigrams forming rare quadrigrams vs. frequent bigrams forming frequent quadrigrams. However, this comparison is confounded with the pronounceability of the stimuli. The rare-quadrigram stimuli were not pronounceable, whereas the frequent-quadrigram stimuli were. Thus frequent-quadrigram stimuli were much more likely to yield partial activation of lexical representations, and therefore any difference between the two may reflect different levels of lexical activation, rather than quadrigram activation. This is supported by their finding that only words and frequent quadrigrams yielded significant activation of posterior middle temporal gyrus, a region associated with lexico-semantic processing.

On the other hand, the contrast between frequent letters forming rare bigrams vs. frequent letters forming frequent bigrams does not suffer from this confound, as neither type of string was pronounceable. Differences between these types of stimuli were found in middle/anterior left fusiform gyrus. Binder et al. (2006, Neuroimage) also found sensitivity to bigram frequency in this area in another fMRI study. These results support the claim of multi-letter units, such as open-bigrams, and are difficult to explain under models that do not include them, such as Davis's SOLAR model and Gomez et al.'s Overlap model.

Tydgat & Grainger (in press) in JEP:HPP

In this paper, the authors look at perceptual patterns for 5-character strings composed of letters, symbols (i.e., non-alphabetic characters such as @), or a combination. The string was presented for 100 ms and then masked. As has been previously found, there is no external-character advantage for symbols; the first and last symbols were perceived no better than the second and fourth symbols. For letters, they found a strong initial-letter advantage, but in experiments 1-4, there was no final-letter advantage. This pattern held for letter targets in strings of mostly symbols, and for symbol targets in strings of mostly letters. This shows that the patterns are inherent to the characters themselves, and not driven by different patterns of attention to letter vs. symbol strings.


In experiments 5 and 6, the authors obtained a final-letter advantage. Experiments 1-4 used partial-report 2AFC, where the two choices appeared directly above and below the target position when the backward mask appeared. Experiment 5 used free report, and experiment 6 used 2AFC, but target position was indicated by surrounding lines and the two choices appeared well below where the stimulus had been. Thus it appears that letters surrounding the target position suppress the final-letter advantage, but not the initial-letter advantage.

The authors proposed that this asymmetry was due to different receptive-field shapes in the two hemispheres, with RVF/LH fields being circular and LVF/RH fields being elongated along the horizontal axis. As a result, there is more vertical interaction in the RVF/LH and so characters above and below the target have a stronger masking effect, explaining the lack of final-letter advantage when the choices appeared around the target letter.

The authors also noted that experiments 5 and 6 yielded a W-shaped pattern, where accuracy levels were similar for the first, third, and fifth letters, and accuracy levels were lower and equal to each other for the second and fourth letters. The authors claim that this symmetric pattern is inconsistent with the left-to-right serial processing proposed in the SERIOL model.

As for the difference between letters and symbols, the authors suggest that receptive fields are larger for symbols than for letters. Symbol detectors cover the neighboring positions, whereas letter detectors are more narrowly focused. Therefore, a single neighboring symbol already produces near-maximal interference (a ceiling effect), so there is little advantage to having only one neighbor. In contrast, a single neighboring letter has a much weaker effect, so having only one neighboring letter is advantageous, and the outer letters are more easily perceived.

Before commenting on this article, I would like to say how glad I am to see that Grainger and colleagues are now investigating perceptual patterns. I have been arguing for quite a while that examination of perceptual patterns can tell us a lot about visual specializations for letter-string processing, and it's good to see experimentation in this area.

Once again, JEP:HPP did not ask me to review this paper, despite the fact that I am probably the leading expert on perceptual patterns for letter strings. So I make my comments here ...

First, about the final-letter advantage. I would say that surrounding letters affected the final letter but not the initial letter because they appeared while the final letter was being read out. Due to seriality, the initial letter had already been fully processed at that point, and so was not affected. Thus I suggest that the effect depends on timing and position within the string, not on retinal location.

Now suppose you had an LVF letter at the same retinal location as the letter in position 1 and an RVF letter at the same location as the letter in position 5, but each of those letters appeared as the first or second letter in a string. Under Tydgat and Grainger's account, vertically-surrounding choices should still yield an RVF disadvantage, because of stronger vertical interactions inherent to the RVF/LH. Under my account, vertically-surrounding choices should not have a stronger effect on the RVF letter, because the letter is now read out early enough that it is processed prior to the appearance of the choices. In a subsequent investigation (Tydgat and Grainger, submitted), the authors performed experiments with just such conditions. There was no disadvantage for the RVF in these experiments, contradicting their account.
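The timing argument can be made concrete with a toy serial timeline. This is a minimal sketch under stated assumptions, not the model's published parameterization: START_MS, SLOT_MS and CHOICE_ONSET_MS are hypothetical values chosen only for illustration, and the final letter is simply assumed to keep firing until something interrupts it. The point is that, in this sketch, whether a letter is still being read out when the response choices appear depends only on its string position, not on its retinal location.

```python
# Toy serial-readout timeline. All timing values are hypothetical.
START_MS = 30          # assumed latency before serial readout begins
SLOT_MS = 15           # assumed readout time per non-final letter
CHOICE_ONSET_MS = 100  # choices appear with the mask after a 100-ms exposure


def readout_window(position, length):
    """Return (start, end) in ms for the letter at a 1-based string position.

    The final letter is assumed to keep firing until interrupted,
    so its window is open-ended.
    """
    start = START_MS + (position - 1) * SLOT_MS
    end = float("inf") if position == length else start + SLOT_MS
    return start, end


def overlaps_choices(position, length, choice_onset=CHOICE_ONSET_MS):
    """True if the letter is still being read out when the choices appear.

    Retinal location is deliberately not an argument: only string
    position and string length matter in this sketch.
    """
    start, end = readout_window(position, length)
    return start <= choice_onset < end


print([overlaps_choices(p, 5) for p in range(1, 6)])
# [False, False, False, False, True]
```

In this sketch, a letter that sits at the retinal location of position 5 but occupies string position 2 (window 45 to 60 ms) has already been read out before the choices appear, which is the prediction contrasted with the receptive-field account above.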

In fact, there is evidence that vertically surrounding letters actually have a stronger effect in the LVF/RH than the RVF/LH, the direct opposite of their account. For example, consider identification of vertical trigrams presented to a single VF. It's been established that perception of individual letters is equivalent in the two VFs, so any differences between VFs in trigram identification come from interactions amongst the letters. If there are stronger vertical interactions in the RVF/LH, the middle letter of a vertical consonant trigram should be perceived worse in the RVF than the LVF. However, the opposite is the case; the middle letter is perceived better in the RVF than the LVF (e.g., Luh and Wagner, 1997, Brain and Language).

By the way, I have previously addressed the issue of different perceptual patterns for vertical LVF and RVF trigrams, and suggested that trigrams are first mentally transformed to the horizontal (Whitney, 2001, PB&R). I now think it is more likely that the different patterns come directly from vertical interactions. I've proposed that there's strong left-to-right inhibition in the LVF/RH to invert the acuity gradient. This inhibition occurs, by definition, along the horizontal axis. However, in the brain, angular directions are usually coarsely coded, and so it may be the case that the left-to-right inhibition "bleeds" onto the vertical dimension, giving increased vertical interactions in the LVF/RH for letter strings.

Also arguing against Tydgat and Grainger's account is the dependence of the final-letter advantage on exposure duration. According to the SERIOL model, the final-letter advantage arises when the final letter can fire for a longer period of time than the previous letter. For very brief presentations, there should be no final-letter advantage, even when there are no surrounding letter choices. This is indeed the case. For example, Gomez, Ratcliff & Perea (in press, Psych Rev) presented five-letter strings for 60 ms, followed by a mask and 2AFC of two strings presented well below the stimulus location. There was no final-letter advantage at this exposure duration.

In motivating their proposal of hemisphere-specific receptive field shapes, Tydgat and Grainger suggest that the importance of the initial letter causes elongated receptive fields to develop in the LVF/RH and that these fields are asymmetric in that they encompass the letter to the right, but not the left. Note that this implies that LVF letters are quite sensitive to interference from a letter to the left, something that I've been arguing for years. However, I suggest that this is a result of lateral inhibition, rather than receptive-field shape. Note that the presence of additional letters to the left (two or three) has a stronger inhibitory effect in the LVF than the RVF (e.g., Wolford & Hollingsworth, 1974, Perception & Psychophysics). This is easily explained under lateral inhibition, but difficult to explain via receptive fields, as implausibly large fields would be required.
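To illustrate why lateral inhibition handles extra flankers gracefully, here is a toy sketch in which each letter to the left of a target contributes inhibition that falls off geometrically with distance. The weights and the decay constant are hypothetical, chosen only to show the qualitative effect; they are not fitted to the Wolford & Hollingsworth data. No single receptive field needs to span the flankers: inhibition simply accumulates.

```python
# Toy left-to-right lateral inhibition. All parameters are hypothetical.
LVF_WEIGHT = 0.3   # assumed strong left-to-right inhibition in the LVF/RH
RVF_WEIGHT = 0.1   # assumed weak left-to-right inhibition in the RVF/LH
DECAY = 0.6        # assumed falloff of inhibition per letter of distance


def target_activation(n_left, weight, base=1.0):
    """Activation of a target letter with n_left letters to its left.

    The nearest flanker contributes `weight`, the next `weight * DECAY`,
    and so on, so more distant flankers keep adding some inhibition.
    """
    inhibition = weight * sum(DECAY ** d for d in range(n_left))
    return max(0.0, base - inhibition)


for n in range(4):
    lvf = target_activation(n, LVF_WEIGHT)
    rvf = target_activation(n, RVF_WEIGHT)
    print(f"{n} flankers: LVF {lvf:.3f}  RVF {rvf:.3f}")
```

With these illustrative values, going from one to three flankers lowers the LVF target by about 0.29 but the RVF target by only about 0.10, the qualitative asymmetry described above.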

Now for the issue of the W-shaped accuracy pattern. In explaining hemifield accuracy patterns (e.g., Whitney, in press, LC&P), I have assumed that accuracy is primarily governed by activation at the letter level, which is taken to index the probability that the letter is sufficiently activated to be consciously accessible for report. The probability that the activated letter is actually the correct one is taken to be approximately constant across the string. Recall that the acuity gradient is steep within the fovea, and is shallow in the parafovea. Thus for parafoveal presentation, the difference in acuity across letter positions is relatively small. So it makes sense to assume that the probability that the correct letter is activated is approximately constant, and that activity level is the primary determinant of accuracy. However, in the fovea, acuity differences across string positions are much larger, meaning that the probability that the correct letter is activated varies substantially with string position.

Hence, for accuracy under foveal presentation, availability for report should be weighted by the probability that the correct letter is activated, which would be inversely proportional to feature-level noise. Feature-level noise is determined by acuity, and also by whether the letter receives left-to-right inhibition (under the assumption that such inhibition may be non-uniform and thereby directly introduce noise).

Hence, for five-letter centrally-fixated strings, the probability that the correct letter is activated is proportional to the acuity gradient, with a decrement at the second position, due to LVF/RH left-to-right inhibition. Next we consider the activity pattern at the letter level, which determines probability of accessibility for report. For foveal presentation, the resulting locational gradient would be fairly steep and smooth. As a result, activation of the first four letters would be determined primarily by firing rate. So activation (at the letter level) would decrease across the first four positions, and then rise for the final letter. The figure below illustrates examples of these two probability patterns (correctness and accessibility). At each position, Accuracy (shown in green) is the product of the two probabilities. This yields a symmetric W shape, in line with the experimental data.
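The product of the two probability patterns can be sketched numerically. The values below are hypothetical, picked only to have the qualitative shapes described in the text (correctness following a centrally peaked acuity gradient with a decrement at position 2; letter-level activation falling over positions 1 to 4 and rising for the final letter). They are not simulation output from the SERIOL model.

```python
# Illustrative values for a centrally fixated five-letter string (hypothetical).
ACUITY = [0.7, 0.8, 1.0, 0.8, 0.7]      # peaks at fixation (position 3)
LVF_INHIBITION_FACTOR = 0.75             # assumed extra feature-level noise at position 2
ACTIVATION = [1.0, 0.8, 0.7, 0.6, 1.0]  # serial firing: falls over 1-4, final letter fires longer


def p_correct(acuity, inhibited_index=1, factor=LVF_INHIBITION_FACTOR):
    """P(correct letter activated): follows acuity, with a decrement at the
    position receiving LVF/RH left-to-right inhibition (index 1 = position 2)."""
    probs = list(acuity)
    probs[inhibited_index] *= factor
    return probs


def accuracy(activation, correctness):
    """Accuracy at each position = P(accessible for report) * P(correct letter)."""
    return [round(a * c, 3) for a, c in zip(activation, correctness)]


def is_w_shaped(acc):
    """Positions 1, 3, 5 are local peaks; positions 2 and 4 are troughs."""
    return acc[0] > acc[1] < acc[2] > acc[3] < acc[4]


acc = accuracy(ACTIVATION, p_correct(ACUITY))
print(acc)               # [0.7, 0.48, 0.7, 0.48, 0.7]
print(is_w_shaped(acc))  # True
```

Neither input is symmetric on its own; the symmetric W emerges only from their product, which is the point of the argument.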



Thus a symmetric W-shaped pattern is not inconsistent with serial processing. Serial processing yields the activity pattern in black, which when combined with the effects of feature-level noise (in red), gives the W-shaped pattern. Now if the task instead involved an explicit temporal component, such as reaction time for letter search in a string, we would expect this pattern to be inverted and weighted by the time that it takes to reach the target letter during serial processing. This would yield an asymmetric, upward-sloping M, which is indeed what is observed for the search task (e.g., Pitchford, Ledgeway, & Masterson, 2008, J. of Res. in Reading).
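The inversion-plus-serial-time idea can be sketched the same way. This toy assumes reaction time is a baseline, plus a cost proportional to (1 - accuracy), plus a fixed time per letter to reach the target serially; BASE_MS, DIFFICULTY_MS and SERIAL_STEP_MS are hypothetical, and the accuracy profile is an illustrative W, not data.

```python
# Toy letter-search reaction times. All parameters are hypothetical.
ACCURACY = [0.7, 0.48, 0.7, 0.48, 0.7]  # illustrative W-shaped accuracy profile
BASE_MS = 400          # assumed baseline response time
DIFFICULTY_MS = 200    # assumed cost per unit of (1 - accuracy)
SERIAL_STEP_MS = 15    # assumed time to advance one letter during serial processing


def search_rt(accuracy):
    """RT = baseline + inverted accuracy + serial time to reach the target."""
    return [
        round(BASE_MS + DIFFICULTY_MS * (1 - acc) + SERIAL_STEP_MS * i, 1)
        for i, acc in enumerate(accuracy)
    ]


def is_upward_m(rt):
    """Up-down-up-down (an M) whose later peaks and troughs are higher."""
    m_shape = rt[0] < rt[1] > rt[2] < rt[3] > rt[4]
    rising = rt[3] > rt[1] and rt[4] > rt[2] > rt[0]
    return m_shape and rising


rt = search_rt(ACCURACY)
print(rt)               # [460.0, 519.0, 490.0, 549.0, 520.0]
print(is_upward_m(rt))  # True
```

Setting SERIAL_STEP_MS to 0 gives a symmetric M (the inverted W); the serial term is what tilts it upward.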


As for Tydgat and Grainger's account of letter vs symbol patterns, note that this explanation depends on the assumption of retinotopic symbol detectors. However, it seems unlikely that we have sufficient experience with symbols to develop location-specific symbol detectors. Instead, I suggest that the letter pattern is a signature of serial processing. An initial-letter advantage arises because the initial letter has the highest activation at the feature level, which causes it to have the highest activation at the letter level. A final-letter advantage arises when the final letter can fire for an extended period of time. Therefore, the final-letter advantage is less robust (than the initial-letter advantage); it is abolished by very brief exposures, or under lateral masking when the final letter is being processed. Symbols do not show this pattern because they are not processed serially; their perceptual pattern depends only on acuity.

Tuesday, May 27, 2008

Pure alexia

The SERIOL model offers a novel account of pure alexia. The usual interpretation is that the spared RH cannot process letters in parallel, thus serial letter-by-letter reading is performed instead. I suggest instead that letters normally undergo very rapid serial analysis, and the RH cannot provide this rapid serial processing. Because serial input is required by the spared lexical system, overt serial processing is performed instead. For a more detailed development of this idea, including a novel proposal for the nature of RH-specific visual processing and a more detailed specification of the dorsal route in the SERIOL model, see this article.

Guerrera & Forster (2008) in LC&P

The authors investigated priming of eight-letter targets in lexical decision, where the primes employed "extreme transposition". This article includes match scores from the SERIOL model. Again, these scores are based on the original (position-specific) formula for bigram activation, rather than the current assumption that bigram activations depend only on the separation of the constituent letters. Also, the article incorrectly states that the original specification of the SERIOL model placed no limit on the number of intervening letters allowed for bigram activation, whereas this limit has always been two letters.

So here are the match scores under the current parameters, where bigram activation is 1.0 for contiguous letters, 0.8 for a one-letter separation, and 0.4 for a two-letter separation (and 0.0 otherwise).
Note that these numbers are quite different from the published ones for Experiment 3 (in particular) and provide a better fit to those data.

Because bigram activations are no longer positional, the third and fourth primes above now give equivalent match scores. However, simulations give an advantage for matching initial versus final letters due to seriality, as discussed in Whitney (in press).
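A minimal sketch of separation-based bigram activation and a match score follows. Two assumptions are worth flagging: the match score is taken to be the dot product of prime and target bigram activations divided by the target's self-match, and edge bigrams and the serial dynamics used in simulations are left out. The eight-letter target and primes are hypothetical placeholders, not Guerrera and Forster's stimuli.

```python
from itertools import combinations

# Bigram activation by number of intervening letters (current parameters).
SEPARATION_WEIGHT = {0: 1.0, 1: 0.8, 2: 0.4}


def open_bigrams(word):
    """Map each ordered letter pair to its activation (max over repeats)."""
    acts = {}
    for i, j in combinations(range(len(word)), 2):
        weight = SEPARATION_WEIGHT.get(j - i - 1, 0.0)
        if weight > 0:
            bigram = word[i] + word[j]
            acts[bigram] = max(acts.get(bigram, 0.0), weight)
    return acts


def match_score(prime, target):
    """Assumed metric: prime-target dot product over the target's self-match."""
    p, t = open_bigrams(prime), open_bigrams(target)
    overlap = sum(p[bg] * t[bg] for bg in p.keys() & t.keys())
    return overlap / sum(a * a for a in t.values())


target = "ABCDEFGH"  # hypothetical eight-letter target
initial_swap = match_score("BACDEFGH", target)
final_swap = match_score("ABCDEFHG", target)
print(round(initial_swap, 3), round(final_swap, 3))  # 0.883 0.883
```

Because activation depends only on separation, mirror-image manipulations at the start and end of the string score identically here; any initial-letter advantage has to come from the serial processing in simulations, not from the bigram activations themselves.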

SOLAR Model - Davis & Bowers (2006) in JEP:HPP

For five-letter targets in lexical decision, the authors compare primes of the form 1d345 vs. 13d45. The SOLAR model predicts stronger priming for 1d345. In the original specification of the SERIOL model, open-bigram activation decreased with increasing string position. Under this assumption, the SERIOL model predicts stronger priming for 13d45. Davis and Bowers found stronger priming for 1d345 than 13d45, and concluded that the SERIOL model is inconsistent with their experimental results.

However, the conclusions of Davis and Bowers (2006) are based on an obsolete parameterization of the SERIOL model. In my 2004 dissertation, the assumption that bigram activation depends on string position was dropped; bigram activations now depend only on the separation between the constituent letters. As a result, the SERIOL model, like the SOLAR model, predicts stronger priming for 1d345 than 13d45. Thus the priming results do not actually differentiate between the two models. (For a discussion of why the positional variation was originally included, why it was dropped, and how the model now accounts for the phenomenon originally explained by the positional variation, see Whitney, in press, LC&P.)

Despite the fact that a primary focus of Davis and Bowers (2006) was to compare the SERIOL and SOLAR models, JEP:HPP did not invite me to review their article. Hence, their article includes claims that are no longer accurate. (It also includes a claim that is outright incorrect - that even if the positional variation were dropped, the SERIOL model would still be inconsistent with their results.) Nor would JEP:HPP allow me to publish a reply. Fortunately, I found a fairer venue, and my reply to Davis and Bowers (2006) is now in press at Brain and Language.

Briefly, the main points of the response are:
  • The SERIOL model is consistent with the results of Davis & Bowers (2006).
  • SOLAR's lexical activation function is not biologically plausible.
  • A recent re-parameterization of the SOLAR model leads to a contradiction of Davis and Bowers's core claim that contiguity matters.
  • Experimental comparisons between models should test inherent aspects of the models and not depend on particular parameterizations.
  • A critical comparison is presented. That is, I specify a pair of primes for which the two models inherently make different predictions. (Colin Davis agrees with this analysis.)

Friday, May 23, 2008

An anti-rant

In the last post, I ranted about unfair treatment by TICS. I do want to say that, for the most part, I've been treated very fairly by reviewers and editors, with respect to my own publications and being asked to review others' publications. I greatly appreciate this fair treatment. TICS and JEP:HPP (more on JEP in later posts) are the notable exceptions. I guess the highest-status journals only care about status. Makes sense, I suppose.

I also greatly appreciate how I am treated by most European colleagues. For example, at ESCOP I talked with a number of people who were not aware that I do not have a research position. When I told them that I do not, it did not seem to diminish their interest in my work at all.

Goswami & Ziegler (2006) in TICS

In their TICS Letter, Goswami and Ziegler rightly argue that the sub-lexical, phonological route must also be considered in proposals for how letter-order is encoded. They point out that open-bigrams are not suitable for sub-lexical processing, as they are not phonologically relevant units.

However, in the extended SERIOL model (Whitney & Cornelissen, 2005), the serial encoding of individual letters provides input to both the ventral (lexical) and dorsal (sub-lexical) routes. On the ventral route, the letters activate open-bigrams, which activate visual word forms. On the dorsal route, the letter sequence is parsed into a grapho-phonological representation. Thus the serial encoding provides a location-invariant encoding of individual letters, which of course provides suitable input for either route.
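A toy sketch of one serial letter stream feeding both routes is shown below. Everything here is a simplification for illustration: the ventral side pairs each incoming letter with up to three preceding letters using the separation weights quoted in an earlier post, and the dorsal side is a greedy parser over a tiny, hypothetical grapheme inventory (GRAPHEMES), not a real grapho-phonological model.

```python
from collections import deque

SEPARATION_WEIGHT = {0: 1.0, 1: 0.8, 2: 0.4}
GRAPHEMES = {"CH", "SH", "TH", "EE", "OO", "CK"}  # hypothetical, tiny inventory


def serial_letters(word):
    """Location-invariant serial encoding: one letter per time step."""
    for letter in word:
        yield letter


def ventral_open_bigrams(letters):
    """Ventral route: each arriving letter activates bigrams with recent letters."""
    recent = deque(maxlen=3)  # allows up to two intervening letters
    acts = {}
    for letter in letters:
        for sep, prev in enumerate(reversed(recent)):
            bigram = prev + letter
            acts[bigram] = max(acts.get(bigram, 0.0), SEPARATION_WEIGHT[sep])
        recent.append(letter)
    return acts


def dorsal_graphemes(letters):
    """Dorsal route: greedily parse the same stream into graphemes,
    holding at most one letter while waiting to see the next one."""
    out, pending = [], None
    for letter in letters:
        if pending is not None and pending + letter in GRAPHEMES:
            out.append(pending + letter)
            pending = None
        else:
            if pending is not None:
                out.append(pending)
            pending = letter
    if pending is not None:
        out.append(pending)
    return out


print(ventral_open_bigrams(serial_letters("SHEEP")))
print(dorsal_graphemes(serial_letters("SHEEP")))  # ['SH', 'EE', 'P']
```

Both consumers read the identical serial stream; only what they build from it differs, which is the sense in which a serial, location-invariant letter encoding suits either route.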

Despite the suitability of SERIOL’s representations for the demands of phonological processing, Goswami and Ziegler (2006) argued that “This solution ignores data showing that phonology affects the lexical route, such as body-neighborhood effects in lexical decision (Ziegler & Perry, 1998)”. It is unclear what they could mean by this statement.

  • First, Whitney (2004) specifically discussed the data presented by Ziegler and Perry (1998), showing in detail how the SERIOL model explains their findings.
  • Second, the general issue of interaction between the lexical and sub-lexical routes is orthogonal to the question of how letter order is encoded. Presumably, the lexical and sub-lexical routes converge onto the same lexical representations. Feedback from the lexical representations to letters and phonemes would then cause interaction between the routes. This connectivity pattern is independent of how letter position is encoded.

And now for a rant. Despite the fact that Goswami and Ziegler's Letter and Dehaene's article on the LCD model both focus on the open-bigram representation and I was the first to propose such a representation (Whitney & Berndt, 1998), I was not asked by TICS to review either article. I do know that a co-author was asked to review Goswami and Ziegler. To ask this person and not me is ludicrous, as their Letter focused on my idea. It is just one example of how status trumps accomplishment in science. Apparently, having a research position (as does my co-author) is more important to TICS than ownership of ideas. (Due to geographic constraints and lack of connections in the U.S., I have been unable to obtain a research position; I work independently and collaborate on experiments.)

Thursday, May 22, 2008

LCD model - Dehaene et al. (2005) in TICS

Following Whitney's and Grainger's proposal that the highest pre-lexical orthographic encoding on the lexical (ventral) route consists of non-contiguous bigrams (dubbed open-bigrams by Grainger), Dehaene and company got into the act with their Local Combination Detector (LCD) model. While the model is somewhat vague, they do make two specific claims:

  1. Open-bigram representations do not provide sufficient accuracy in encoding letter order. Therefore, the highest level of the LCD model includes quadrigram detectors to provide a more precise encoding of letter order.
  2. Open-bigram-like representations occur as a result of retinotopic bigram detectors operating over noisy retinotopic representations of individual letters.

Claim (1) has some problematic aspects:

  • The model does not include a location-invariant encoding, as the quadrigram detectors are retinotopic.
  • Quadrigrams do not provide a realistic similarity metric. For example, LAME and LIME would activate different quadrigrams, making them completely different from each other at the lexical level.
  • The authors only considered on/off open-bigrams with no encoding of edges. The addition of graded activations and edge bigrams, as in the SERIOL model, allows more accurate encoding of order information.
  • However, there is evidence that letter-order encoding on the ventral route is indeed somewhat imprecise, whereas the encoding is more accurate on the dorsal phonological route. Occipito-parietal lesions lead to a selective deficit in encoding letter order (Friedmann & Gvion, 2001; Shalev, Mevorach & Humphreys, in press). See also Frankish & Turner (2007). So experimental results indicate that open-bigrams do not need to be "fixed" with quadrigrams.
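The LAME/LIME point can be checked with a short computation. This sketch assumes contiguous quadrigrams (for a four-letter word, the only quadrigram is the word itself, so the choice does not matter here) and, for the open-bigram side, separation-based activations with a match score equal to the dot product divided by the second word's self-match; edge bigrams are omitted.

```python
from itertools import combinations

SEPARATION_WEIGHT = {0: 1.0, 1: 0.8, 2: 0.4}


def quadrigrams(word):
    """Contiguous four-letter chunks (for a four-letter word, just the word)."""
    return {word[i:i + 4] for i in range(len(word) - 3)}


def quadrigram_similarity(a, b):
    """Proportion of b's quadrigrams that also occur in a."""
    qa, qb = quadrigrams(a), quadrigrams(b)
    return len(qa & qb) / len(qb)


def open_bigrams(word):
    """Map each ordered letter pair to its separation-based activation."""
    acts = {}
    for i, j in combinations(range(len(word)), 2):
        weight = SEPARATION_WEIGHT.get(j - i - 1, 0.0)
        if weight > 0:
            bigram = word[i] + word[j]
            acts[bigram] = max(acts.get(bigram, 0.0), weight)
    return acts


def bigram_similarity(a, b):
    """Assumed metric: dot product of activations over b's self-match."""
    pa, pb = open_bigrams(a), open_bigrams(b)
    overlap = sum(pa[bg] * pb[bg] for bg in pa.keys() & pb.keys())
    return overlap / sum(x * x for x in pb.values())


print(quadrigram_similarity("LAME", "LIME"))        # 0.0
print(round(bigram_similarity("LAME", "LIME"), 3))  # 0.405
```

The quadrigram code treats the two words as unrelated, while the shared bigrams LM, LE and ME give a graded, intermediate similarity.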

In contrast, claim (2) above offers a reasonable account of how open-bigrams could be activated within a parallel framework. Grainger et al. (2006) incorporated this suggestion into their Overlap Open-Bigram (OOB) model. Within the abstract open-bigram layer, the OOB model is quite similar to the SERIOL model, in that it employs open-bigrams with graded activation levels. The only difference is that the OOB model includes activation of transpositions (e.g., BIRD would activate bigram RI to a low level). Of course, the two models radically differ on how the open-bigrams become activated, as the SERIOL model proposes that open-bigrams are activated serially. This serial mechanism explains perceptual patterns for consonant strings, while parallel accounts do not. For a detailed comparison of serial versus parallel activation of open-bigrams, see Whitney & Cornelissen (2008).

Monday, May 19, 2008

Welcome to Orthoblography

This is a scientific blog dealing with orthographic encoding in visual word recognition, written from the point of view of the SERIOL model. Gratifyingly, research into orthographic processing in normal and dyslexic populations is heating up. With the increasing activity in this field, I wanted a forum in which to comment on recent publications (especially those that make incorrect claims about the SERIOL model). I also wanted a place in which to rapidly disseminate new ideas. Hence, Orthoblography.

This blog is intended for researchers already familiar with the model. If you are not familiar with it, but would like to be, you should first read Whitney (2001). Since that publication, modifications have been made to the bigram layer of the model, as described in Whitney (in press; Language and Cognitive Processes).