Language Testing and Second Language Acquisition Research: Worlds Apart

Meredith Cicerchia is Director of E-Learning for Lingua.ly. She holds an M.Sc in Applied Linguistics and Second Language Acquisition from the University of Oxford and a B.A. in French language and literature from Georgetown University. You can find her on Twitter (@MereLanguage) or follow Lingua.ly’s blog of SLA inspired tips for language learners.

Maybe I was just being naive, but as a language learner, teacher, curriculum developer and researcher, I always thought Language Testing (LT) and Second Language Acquisition (SLA) were closely linked fields. I figured the LT people who spent their days constructing various assessment tools were not only aware of the most recent findings in SLA research, but used them to inform test design. Similarly, I assumed the psychometric analyses that went into test design were well understood by SLA researchers seeking out reliable instruments for their studies.

But several months into a major language-testing project, I quickly understood the reality of the situation: the worlds of LT and SLA could not be further apart. First, I discovered I had barely scratched the surface of language testing during an SLA degree. Second, when it came to the reading and listening content we were developing, the inner workings of the test tasks could not have been more foreign. I started asking around and realized I was not the only language-learning advocate surrounded by test developers and assessment gurus who seemed to be “speaking another language.”

So when it comes to LT and SLA, why the rift? To be honest, I still don’t know the answer to this question. The closest I’ve come to understanding it was at the Language Testing Research Colloquium (LTRC) last year in South Korea. A keynote delivered by Barry O’Sullivan, designer of the British Council’s Aptis test, caused quite a stir with the audience, yet I found myself nodding along as O’Sullivan argued that the two fields needed to make more of an effort to work together.

After all, SLA-trained item writers and language teachers often create test content. Test specifications typically reference parameters recognized by both fields. And most importantly, the same individuals who use SLA-based curricula and are taught by SLA-trained teachers sit the exams (and, depending on the stakes, make major life decisions based on the results).

Googling the issue doesn’t really add much to the discussion. According to Dr. Geoff Jordan, “The relationship has certainly been theorized a number of times. But yet there remains very little contact between these two critical branches of applied linguistics research.” Cambridge University Press’s Interfaces between Second Language Acquisition and Language Testing Research says the trend is being reversed thanks to overlaps in research interests and empirical approaches. Yet Shohamy’s 2000 study cites “limited interfaces” and a lack of relevance of language testing to SLA. What are we to think?

Language Tests from an SLA Perspective

I recall a former professor at Oxford saying that exams were a snapshot: if they caught you at the wrong moment or measured only one skill, you could come away believing you were less fluent than you really were. Maybe that’s why I never fully bought into the language-testing world.

At some point in their studies, most Applied Linguists encounter Selinker’s interlanguage continuum, a long line marked by ups and downs. Selinker depicted language learning as a life-long journey characterized by many U-shaped curves. Once we have understood a rule, we begin to apply it across the board, even where it doesn’t belong. Eventually, we become aware of the exceptions to the rule, but our execution remains imperfect and unpredictable for the remainder of the upswing. So while we may not demonstrate our knowledge with 100% accuracy, we have in fact moved forward along our continuum.

But what happens when a language test occurs during an upswing? And if language testers are not using SLA research to inform their choice of constructs and task design, how well do test results correlate with learners’ actual ability and performance? Of course there are many occasions on which we need standardized assessment tools, and tests can come in all shapes and sizes. Yet what if there were a simpler, SLA-based approach for the rest of us?

Vocabulary-Based Learning

These days I work in big-data-fueled digital language learning, on methods that simulate immersion and force a departure from the traditional beginner, intermediate and advanced levels that most of the industry is wedded to. So I got to thinking about how SLA researchers typically measure ability level, and realized the most common tool is some form of productive/receptive vocabulary test.

Considering vocabulary has long been hailed as one of the best predictors of proficiency across reading, listening, writing and speaking, this does make some sense. Consequently, there are a few startups out there reimagining language learning in a whole new, vocabulary-driven light. They get to know an individual’s working vocabulary and let word look-up and frequency data from exposure to authentic content do the rest.

Lingua.ly, for example, achieves this with a robust backend that uses SLA rules governing acquisition from context to turn a measure of working vocabulary into a guideline for sourcing “comprehensible input” from the web. Bliu Bliu does it by asking learners outright what they do and don’t know. Individuals can bootstrap their way from there, with scaffolding provided in the form of dictionaries, flashcard makers and smart review platforms.

New Kinds of Tests

Yet despite new vocabulary-driven approaches to language learning, there hasn’t really been a parallel wave of novel approaches to measurement — at least not to my knowledge. Earlier this year, the University of Ghent developed a research tool that spread like wildfire on social media as a fun, fast vocabulary test that tells you about your language ability in very little time. Its popularity was no surprise: it appeals to a generation of multi-tasking millennials who will do anything to avoid a three-hour exam. But it was more of a game than a real test.

Can we take mobile exams and smartphone testing seriously? This past April, another startup popular with millennials, Duolingo, announced it was throwing its hat into the certified-testing ring with a mobile LT center available to people who use its big-data-fueled language learning platform. Duolingo’s lessons work via crowd-sourced translation, so it will be interesting to learn more as its approach to testing develops.

Conclusion

Circling back, I still strongly believe that the separation between the LT and SLA camps deserves more attention from everyone involved. New digital approaches to learning demand new, dynamic and complementary assessment measures, and more cross-pollination of ideas between the two fields, both in practice and in the research community.

Imagine the benefits if language testers and second language acquisition researchers came together in the digital startup age! We’d have not only enhanced insight into test constructs and new takes on task construction (of particular importance for tests delivered digitally) but also a new generation of learning and testing tools to help life-long language learners meet their goals.

Featured Photo Credit: Mimolalen via Compfight cc. Text added by ELTjam.

26 thoughts on “Language Testing and Second Language Acquisition Research: Worlds Apart”

  1. There has always been a significant divide between SLA and language teaching/testing, and it’s still there despite (or perhaps because of!) the much more substantial body of research now available. I fear we are more or less where we were when Claire Kramsch commented on the domain-specific nature of SLA terminology and the gap it created with teachers at the chalk face. (Apologies that I can’t find the original article right now!)
    Re testing, I feel that apart from the generally accepted lexis-based and grammar-based tests which seem to be so popular, we should be teaching towards and testing a variety of competencies, including strategic competence – that which supports a 2nd language learner when all else fails!
    As to the ‘good day, bad day for a test’ notion, I doubt we’ll ever be able to entirely eliminate that, nor make a test which is 100% reliable and accurate, so I think we have to live with the inadequacies of the system while trying to improve it little by little.

  2. What a wonderful read.

    I learnt my first words from a restaurant menu, and by ordering and pointing to items on the menu. I went to the same restaurant every day, and the waiters became my teachers, gesturing, shaping. I remember they taught me colors too, and well done and medium. I had to get that down, because a well-done steak really annoys me! So, food vocab first and pronunciation second for me, with role playing. Then I remember trying to figure out street signs, and words that popped up regularly in signage in and around the community. Aberto, fechado, sobre, directo, esquina, farmacia etc. But, to communicate effectively I had to pronounce the word just so. It took me a long time to learn the difference between “pão” and “pau” – a very light nasal sound that I was unfamiliar with changed the meaning: one meant a piece of bread, the other a penis. Who knew I was going into the bakery and asking for six penises each morning, girls in a constant giggle behind the counter? Not me.

    As a TOEIC proctor I administered the testing – and I couldn’t believe the cost, to be quite frank. Money being funneled overseas for nibs, because Asian employers liked it. Wow, what a money maker, I thought. So, I set up a testing lab, which turned into a pronunciation classroom and coding center.

    Anyway, I’m not an academic. But I know how to create code. So I’m going to suggest that teachers collaborate and start creating some of these apps using an “open based company” model of complete transparency. If there is a problem with “testing”, let’s fix it, and market a “free” testing app which is 100 percent better than the proprietary systems forced upon educators today. IELTS, TOEIC and TOEFL are all in it for the money, the commission – which is fair enough in a capitalistic world. But education begins with the poor, and has to be free to be accessible. So we need our own app. Then we can customize it for each market, and teachers can contribute data, fork, “bootstrap”, and we can own the data instead of being blackmailed by it. Shared knowledge is more powerful than proprietary knowledge, and hierarchy will fight tooth and nail to guard knowledge, whilst a distributed collective is driven by sharing knowledge. An enormous difference.

  3. What a wonderful post! I am left with nothing but questions. These are:

    1. Do you think it is possible to create a language test in which it is difficult to “artificially” increase one’s scores by studying just for the test and nothing else?

    2. My belief is that students acquire language skills by trading time spent studying for increased levels of skill. Your point was that this increase is not linear and that sometimes (often? always?) one can regress on the path forward. What do people in LT make of this “fact”?

    3. I suspect most language teachers tell students they need to spend time on a variety of skills (vocab, reading, grammar, listening, speaking, writing) to develop in each of these areas. After your time spent in LT/SLA, do you believe there is a shortcut to learning such that time spent learning one or two skills can greatly affect other skill development (or conversely that time spent on one or two skills contributes little to the development of others)?

    • Thanks for your kind words, Mike. To be honest, the only way I can imagine #1 being possible is if you had no consistency in task/item types. But without a blueprint and consistent specs it would be difficult to trial and build a reliable instrument. In response to #2, LT holds its hands up that tests can’t always be accurate; tests have to happen, so it is unfortunate when they are poorly timed (and maybe that is why, in some respects, I am arguing against traditional approaches to testing). #3 is an interesting comment because I remember a few students from my SLA class who were constantly looking for the “shortcuts” you mention. For me, vocabulary is the “long” shortcut, and obviously reading exposes you to the most words. You need basic reading and listening skills though – decoding the alphabet, parsing speech, accurate grapheme-phoneme mapping – so you can extract new vocabulary from comprehensible input. The more words you know, the more input is accessible to you, and it snowballs from there. But it also depends on why you want to learn a language, as you may just need speaking skills for your call-center job, in which case learning the written alphabet is less important than mastering pronunciation and common native-speaker phrases and chunks. I think the answer is: it all depends (frustrating, I know!)

  4. To raise a dissenting voice, I would say that SLA (in its present state) and LT have more in common than you suggest – principally, an unswerving commitment to a monolingual, native-speaker model of competence, a standard which both determines the success or otherwise of the acquisition process and sets the criteria for testing achievement in language performance. Both disciplines are predicated on this superannuated model, despite the fact that learners are, by definition, bilinguals, and that they will rarely, if ever, pass as native speakers, nor even want to – a fact that all of us who are complicit in the teaching/testing/materials-writing process secretly know, although we act as if the NS model were a realizable standard.

    • Scott, thanks for bringing this point up. Who wants to sound like a native speaker anyway? I think half of our SLA class would have raised their hands, and the other half argued on behalf of learners maintaining their bilingual identity in the L2. You are right: both SLA and LT have the same unrealistic expectations. But do you think young startups like Duolingo, which are approaching LT for the first time, will diverge from the NS model based on (or thanks to) the learner-generated big data which fuels their platforms?

      • Good question, Meredith, but I wonder if this ‘learner generated data’ won’t simply perpetuate the problem, even aggravate it exponentially. If all your data is drawn from tests that (a) are not performative, but simply test accumulated, isolated, and decontextualized ‘bits’ of knowledge and (b) are benchmarked to the standards of what idealized native speakers would do under the same conditions, then your test will be forever skewed. It’s like old-fashioned IQ testing: if the test type and success criteria are biased towards a certain social grouping, no amount of data from subsequent testing is going to change that bias. On the contrary, it will simply entrench it.

        • I still don’t understand the relationship between SLA and language testing. Would you please explain how they are related?

  5. Hi Meredith,

    Thank you for this thought-provoking post, though what seems most provocative to me here is that a casual reading of it could leave the impression that both SLA and LT are reasonably unified fields at their core, and so the problem(s), such as they are, lie between these two disciplines rather than within each one.

    Please note that I am aware that that might seem a tad unfair – you are careful, for instance, to add the caveat that assessment tools and tests can come in all shapes and sizes – but even so, you do otherwise seem to refer to SLA as a somewhat monolithic and unproblematic field. So for example when you say:

    I figured the LT people who spent their days constructing various assessment tools were not only aware of the most recent findings in the SLA research, but used them to inform test design.

    It almost gives the impression that ‘findings’ in SLA are far from tendentious and equivalent to proven theory. The same applies when you talk of SLA-trained item writers and language teachers – what exactly is being implied by the phrase SLA trained? Intentional or not, it strongly implies that someone with an understanding and experience of SLA research design and a knowledge of SPSS (or whatever) is better able than non-SLA-trained practitioners to create more effective forms of testing.

    But surely that is impossible? How can it be otherwise when:
    (i) there is still no consensus about whether or not we are talking about language (as a mass noun as in e.g. Saussure) a language (as a unit noun, as in e.g. national languages such as French, Spanish etc. – which itself is a highly controversial term given the status of accents, dialects and so on) or languages (i.e. genres, so that someone could conceivably outline a complex surgical procedure using the L2 yet have absolutely no idea how to explain to a hairdresser what kind of haircut you want);
    (ii) the notion of a second language, the object being learned, is likewise in dispute and
    (iii) it is unclear what acquisition means in terms of a language, and whether it is even sensible to try to classify the difference between a learned, unlearned or partially learned language.

    If each of the three terms that S, L and A stand for is each hotly contested by different schools of thought in their own right, then to me what seems more surprising is that anyone might expect there to be any genuinely applied overlap between SLA and LT at all, not that there is a gap between the two.

    While I’m sure it is true that SLA researchers typically measure ability level using some form of productive/receptive vocabulary test, I think this might well have at least as much to do with the conditions under which SLA researchers typically work. SLA research is, in a significant number of cases, conducted on a fairly small scale by small teams of researchers with very small budgets – under these conditions, it is probably not that surprising that learner vocabulary features prominently in SLA research (i.e. because it is more manageable within the constraints the researchers are working under). This could therefore give a biased view that has less to do with the soundness of SLA theory and more to do with the constraints the discipline has to work within out of necessity.

    By contrast, LT is often conducted on a national or international level, involves large-scale budgets, is under pressure from investors (whether private or government) and, on top of all that, generally carries the responsibility of being able to say that, e.g., ‘If candidate P gets Q grade in R test then this means he/she should be able to perform S task with a moderate to fair amount of competence.’ Failure to show that this is generally the case has numerous real-world repercussions that much SLA research – however inspired, systematic, thoughtful and well designed it is – does not, or certainly not in the same way.

    In answer to your (and in a sense Scott’s) question ‘Who wants to sound like a native speaker anyway?’ I would say: lots of people, actually.

    However, if you were to ask ‘How much effort are you willing and/or able to devote to pursuing such a goal, and to what extent are you willing to sacrifice native-speaker fluency in the L2 for more pragmatic concerns (e.g. communicating effectively for work or study)?’ then I think we are getting nearer the mark.

    In any case, I don’t see any problem with choosing a standardized educated native speaker’s competence as the basis for the foundation of an L2 syllabus.

    Apologies for the long post, it seems to be a habit of mine on ELT Jam comments.

    • Nik, I agree that the terms ‘second’, ‘language’ and ‘acquisition’ are all contentious (David Block does a nice job of problematizing them in his 2003 book The Social Turn in Second Language Acquisition), so I’m surprised that you should – in almost the same breath – suggest that there is nothing problematic in the terms ‘standardized’, ‘educated’ or ‘native speaker’. Or nothing untoward in measuring the effectiveness of non-native speakers solely in terms of how closely they approximate native speakers – which, I repeat, is still the gold standard that both SLA researchers and ‘second’ language testers uncritically share.

      To quote Vivian Cook:

      “People who speak differently from some arbitrary group are not speaking better or worse, just differently. … However, teachers, researchers, and people in general have often taken for granted that L2 learners represent a special case that can be properly judged by the standards of another group. Grammar that differs from native speakers’, pronunciation that betrays where L2 users come from, and vocabulary that differs from native usage are treated as signs of L2 users’ failure to become native speakers, not of their accomplishments in learning to use the L2. … L2 users are not monolingual native speakers and never will be; they are as incapable of changing places as most women and men. L2 users have to be looked at in their own right as genuine L2 users, not as imitation native speakers.”

      Cook, V. 1999. Going beyond the native speaker in language teaching. TESOL Quarterly, 33:2, pp. 194-5.

  6. Scott,

    Without trying to sound too picky, I had not meant to suggest that concepts such as ‘standardized’, ‘educated’ or ‘native speaker’ were ‘unproblematic’, but rather that I had no issue with employing them ‘as the basis for the foundation of an L2 syllabus’. And that is something that I still see as justified and am still happy to stand by however fuzzy the ideal might be.

    In short, I do not agree that using an idealized model of an educated native speaker of a natural language necessarily always results in an unnatural model of an L2 (L3, L4, Ln) speaker. In other words, the fact that there have been (and still are) times when an L2 speaker is judged according to perceived deficiencies doesn’t mean that it has to inevitably be the case as the quote from Cook implies.

    On the other hand, as a very general point, I do agree that studies in SLA seem to have, as a rule, far too narrow a focus on what I suppose you could call ‘optimal’ SLA, i.e. gaining native-like use of a language other than the one(s) known from childhood.

    In one sense, there is nothing mysterious or complex about acquiring language as everybody does it all the time in the form of catch phrases (‘Am I bovvered?’), buzz words (‘mansplaining’), and slang (http://www.timwoods.org/the-london-slang-dictionary-project/) etc. etc. And that works just as well when someone is living and working in a country where a different language is spoken. For instance, I was fascinated to overhear a conversation between two Irish ex-pats in Moscow once because the English they were using was littered with domesticated Russian words e.g. ‘How much dengi (money) did that remont (repairs) you had done set you back?’/ ‘How many peevers (beers) did we have last night, eh?’ and so on.

    SLA also seems to have mostly ignored (to my knowledge at least) studies of polylingual communities of the kind found in many parts of Africa where language use appears to be tied much more visibly to context and function – e.g. where it will be common for someone to use a regional language in the market, a local language in the home, a European language to deal with officialdom, English, French or Arabic for religious purposes and so on. My suspicion is that this kind of fluency in multiple languages operates in quite different ways from the kind of models typically proposed for language learners in most SLA studies.

    However, going back to learning languages for a purpose – work or study, say – then I support using an idealized model of an educated native speaker to inform the syllabus and create the benchmark for learning goals because it is just this kind of model that has the greatest adaptability to the widest range of social, vocational and academic situations. It is precisely that breadth of coverage that makes the model ideal as a starting point for what learners will need to learn (regardless of who gets to choose what gets to be learned).

    The idealized NS model is therefore not a random choice at all but quite a pragmatic one based on a principle of versatility. I think Cook is quite wrong to suggest both that NS models are ‘arbitrary’ and also that one language variety is not ‘better or worse, just differen[t]’.

    I’d even go so far as to say that it was irresponsible to suggest such a thing because it doesn’t take into account the social or political reality that the actual real life users of L2 languages have to deal with. All other things being equal, a prospective employer looking to take on fresh graduate trainees will be much more likely to choose the candidate who speaks English ‘better’ than the others, and at some point ‘better’ will be related to that idealized and educated native speaker. A graduate who has learned English solely by working on building sites in London for instance, may well be fluent but unless they fully appreciate just how context-bound much of that language is likely to be, it won’t do them all that much good when it comes to trying to win a place on a graduate training scheme at DHL or KPMG or whatever.

    • Thanks for your thoughts, Nik. Two things:

      1. You say that “the fact that there have been (and still are) times when an L2 speaker is judged according to perceived deficiencies doesn’t mean that it has to inevitably be the case”: I would say that it is, for all intents and purposes, ONLY the case. For all the lip service that has been paid to the notion of communicative competence ever since Dell Hymes first coined the phrase in the 1960s, the default model for assessing learners, both for the purposes of research and for the purposes of proficiency testing, has been the extent to which they compare (invariably less successfully) to a (monolingual) educated native speaker. Coincidentally, and by way of an example, I read this in the introduction to a research paper only yesterday: ‘Compared to children’s ultimate success in language learning, the achievements of late learners of a second language (L2) seem often very poor. The prototypical late L2 learners’ speech is riddled with grammatical errors and usually identifiable by its foreign-sounding accent…’ Note the negative connotations of ‘foreign-sounding’ in association with ‘poor’, ‘riddled’ and ‘errors’! And this was written by a second language acquisition researcher who is a non-native speaker herself!

      2. Yes, employers and other gate-keepers may well continue to favour bilinguals whose English is native-like over bilinguals who are resourceful translingual communicators, but that is no reason why we – as teachers and materials designers – should accept such a status quo uncritically. Coincidentally (again!) the latest issue of TESOL Quarterly (which plopped through the mailbox yesterday) is devoted to ‘standards-based educational reform’ and includes one paper titled ‘Dynamic bilingualism as the norm: Envisioning a heteroglossic approach to standards-based reform’. The argument that the authors (Flores and Schissel) make should be obvious from the title: among other things they bemoan the negative washback effect of monolingual approaches to assessment, but argue that ‘we do not have to wait until a policy shift occurs […] Working directly with teachers can begin to open spaces of resistance where the dynamic bilingualism of emergent bilinguals is affirmed and built upon [and] may eventually coalesce into a bottom-up movement against monoglossic language ideologies that brings about a national transformation where bilingualism truly does become the norm for all students’ (p. 476).

      • Dear Scott,
        “Ever since Hymes…” I think the lip service is homemade. I’ve noticed that it is the teaching community that often superimposes value judgments on language performance. Maverick language learners, backpackers, immigrants, tourists, etc. who run off the track of formal language instruction appreciate language learning in its pristine communicative nature, and in my own experience as a casual language learner of Spanish, the native community has been less prone to sanction, as in discriminate against, unorthodox use. In fact, I benefit from sounding foreign. It is when I move to English that I have to watch out. English teachers are their own worst superego – homo homini lupus. Come to think of it, some years ago a language trainer confessed to me that he would not interact in Spanish with local authorities (Chile) because he did not want them to think of him as any less intelligent than he really was. Again, I think the education industry often turns ugly Darwinian. Speaking a foreign language with a native-like accent, doing math at advanced levels, knowing how to draw, etc. stands for intelligence, which stands for prestige, and so on. At the same time, if one can offset the temptation of turning learning outcomes into appreciation (and we do like to teach talented students), I find no issue with teaching towards a standard model when reasonably defined. The NS standard is an artificial construct (which native speaker?). In my own efforts to become better (watch the term) at English I try to draw on native-speaker, or native-speaker-type, texts. What else could there be?

        • Yes, Thomas, language teachers are probably the least well-equipped to assess communicative competence. Forgive me for quoting Dick Schmidt again, writing about his subject, the multilingual Wes, “If one views language as a system of elements and rules, with syntax playing a major role, then Wes is clearly a very poor learner. Friends and acquaintances who are not in the language or language teaching business generally evaluate Wes’s English favourably, pointing out, for example, that ‘I understand him a lot better than X, who’s been here over 20 years.’ Several sociolinguists with whom I have discussed this case have given similar evaluations, sometimes proclaiming him a superior language learner who just doesn’t care about grammatical do-dads, most of which are eliminated in normal speech anyway. Grammar teachers, on the other hand, generally consider him a disaster, possibly beyond rescue.”

          In terms of your last question – what other standards are there – you need look no further than the Common European Framework which, for all its flaws, at least describes language use in terms of communicative competencies, and not in terms of ‘grammar do-dads’.

  7. I found this post quite confusing. I’m a big fan of language testing, and so your opening gambit of ‘there’s this huge problem with tests’ got me interested. There’s this huge problem and everyone knows about it – yet I’m struggling to see any concrete examples of what you mean. In fact the first section just talks about this shocking problem. Are you talking about the test constructs? How the constructs relate to theories of SLA (like whether gist is a valid indicator of reading ability)? Or are you talking about the method of delivery? The inauthentic nature of some test stimuli? And which tests specifically have these problems? IELTS? Trinity? GCSE French tests? The JLPT?

    We then move on to what I think is an example: the theory of interlanguage. You suggest that students’ interlanguage goes up and down, and that a test might catch someone on a downward swing and thus not represent their true ability. Actually, unless you’re claiming that a student’s ‘true’ ability is the highest or ‘most correct’ point they ever reach before any downward spiral (and not the mean, or the actual point they are at when the test is being taken), then the test is doing a fabulous job of assessing that student’s real ability.

    So we establish there is a major problem, then we go on to note that vocab-based learning is ‘where it’s at’. SLA researchers, you claim without citation, often use this to measure student ability. Well, who? And are they right to do so? Is it a valid and accurate measure of general student ability? The answers are not given, but it doesn’t matter, because the direction of this article becomes clear when you write ‘lingua.ly for example is able to achieve this…’ Ah-ha!

    So there’s this huge problem and your company has the solution, right?

    Sorry for my cynicism but I’d like to hear more concrete specifics about what exactly the gap between SLA and testing is.

    Cheers

    Russ

    • Hi Russ,

      I think what so disappointed me when I first started working in LT was the inauthentic nature of the stimulus and item material, and the overwhelming impression that test makers spend far more time considering the logic behind a test task than they do the actual language involved. So many tests seem unfair, and the emphasis is always on planting material for these 4 arbitrary constructs which we use to measure reading and listening comprehension vs. taking into consideration how we actually process language and read and listen. What really is power in reading in a second language: is it speed, overall vocabulary, or the ability to distinguish the gist from a cleverly worded distractor?

      So in part, yes, it is the constructs and inauthentic content that I found so detached from SLA, and the heavy logic side to stems and distractors, which are created and edited to almost mathematical standards. They often result in extremely awkward language which no teacher would ever put together and which I feel learners and even native speakers sometimes struggle to understand. (I also write test tasks in addition to advising a startup. Pick up a highly abstract C1 text and look at the jargon that shows up. Sometimes I can’t even answer the items I write if I haven’t yet had my morning cup of coffee.) It becomes a wording game that benefits those who know how to manipulate tasks vs. a measure of a learner’s proficiency.

      I think it is a bit unfair to say I am simply plugging a startup. I was bringing up big data and vocabulary more as my own preference when it comes to being tested. I speak five languages fluently and have passed tests attesting to my proficiency with flying colors, yet I believe I did so because I am a good test-taker, not because I am adept at navigating the language I was tested on. Citations? They seem so prevalent for both L1 and L2 that I didn’t go to the trouble, but Nation and Laufer for starters.

      In any case, I did not mean to offend and apologize for any ruffled feathers my post may have caused. I simply wanted to highlight the fact that not everyone is as content with the current state of LT and its relationship to SLA as you seem to be, and I think more everyday language teachers and students should be aware of it. I’d love to see a new movement in LT, perhaps even startup-driven, that takes a more balanced approach to test task creation.

  8. Scott,

    Thanks for your response, to which I just wanted to make a quick(ish!) reply:

    Point 1:

    While I concede the point that the “(monolingual) educated native speaker” has provided “the default model for assessing learners, both for the purposes of research and for the purposes of proficiency testing”, I still find it hard to accept that a deficit view is inevitable – why, for instance, can the results of such assessments not be seen in terms of the progress made toward something rather than the distance away from it?

    That seems to be a not entirely unreasonable position to take and, I assume, is the one taken by all those ‘can do’ statements that present language proficiency in terms of a repertoire of purposes to which the L2 can be successfully applied and, what’s more, applied without overt reference to L1 speaker competence in that language, at least up until post-intermediate.

    There are certainly things that can only be done in the L2 that require a much greater degree of delicacy and precision of language use – the language skills needed to become part of an academic or professional community, for instance. And it’s with this upper or more advanced repertoire of things to be done in the L2 that I find it hard to imagine being defined without reference at some point to the putative educated native speaker.

    How else can the differences between, say, result, consequence, outcome, repercussion be interpreted and/or exploited for a communicative purpose without reference to the unmarked use of those words in contexts that most (educated) native speakers would recognize as being appropriate?

    And that, I think, is regardless of whether this is an NS-NNS or an NNS-NNS interaction we’re talking about because the appropriate use of language is determined by the purpose and context of use and not what the first language of one or more of the speakers may or may not be (i.e. a Polish C2 level speaker of English may well be able to accommodate her level to an Italian A2 level speaker if all they are talking about is how much they both like coffee and doughnuts but no amount of accommodation will make it feasible for them to meaningfully discourse about e.g. the influence of Durkheim on Bourdieu).

    Following a short lecture by Jennifer Jenkins during which she referred repeatedly to ‘proficient’ ELF speakers, I asked how she had determined that they were proficient speakers if she was rejecting the notion of native speakers as the benchmark. Perhaps she was just annoyed at the question, but her reply at the time was that she knew that these ELF speakers were proficient because they had all passed CPE at a ‘C’ grade or higher. This was some years ago now so maybe she has a better way of defining levels in ELF, but using CPE as a benchmark did seem to rather undermine her main argument (on that day at least).

    Point 2:

    “Yes, employers and other gate-keepers may well continue to favour bilinguals whose English is native-like over bilinguals who are resourceful translingual communicators, but that is no reason why we – as teachers and materials designers – should accept such a status quo uncritically.”

    I’m afraid I have to disagree here:

    [1] How likely is it that “bilinguals whose English is native-like” are less able to draw on their language resources and translate as effectively as (or even more effectively than) “bilinguals who are resourceful translingual communicators”? I can see how someone with relatively modest language skills can be an effective go-between, but am less clear about how someone with superior language skills could be less effective.

    [2] As the use of English as an academic and professional lingua franca seems likely to become more rather than less important for the foreseeable future, doesn’t that mean that the things that the students today are most likely to need to do in the L2 are all those parts of the repertoire that are post-intermediate?

    And again, how is it then possible to define ‘advanced’ levels of competence without reference to delicate and sophisticated manipulation of the language, and how, in turn, can that delicacy and sophistication in language use be defined without reference at some point to uses that NS speakers also consider to be advanced (such as giving a presentation at a board meeting or publishing a research paper and so on)?

  9. Nik writes, ‘And it’s with this upper or more advanced repertoire of things to be done in the L2 that I find it hard to imagine being defined without reference at some point to the putative educated native speaker.’

    Here’s a thought experiment: How would you assess a speaker (or writer) of a language that has no native speakers, putative or otherwise, such as Esperanto? (And, apparently they do exist).

  10. Here’s a thought experiment: How would you assess a speaker (or writer) of a language that has no native speakers, putative or otherwise, such as Esperanto? (And, apparently they do exist).

    It’s an interesting question to pose, although if we are specifically referring to Esperanto, then there is no need for a thought experiment as exams for this language already exist and, according to the website of the British association of Esperanto speakers (which is based in Stoke-on-Trent of all places), the third of the three levels available is already tied to the CEFR (The Advanced exam is described as being C1).

    Each of the three levels of the exam award marks of between 30% and 50% for translation (both Esperanto to English and English to Esperanto) and I was slightly amused to note the following instruction (in English) on the Intermediate level sample paper: “Translate the following passage into good English” (my emphasis). Reading through the online guides for prospective candidates, it is quite clear that this is an assessment that is focused on accuracy. I suppose this is to be expected for a language whose creator(s) can actually be named in person (as opposed to natural languages).

    If we discount Esperanto, there are a number of other examples of languages with no native speakers: Latin, (Ancient) Greek, and various ConLangs, e.g. Klingon, Dothraki, Elvish.

    The Klingon Language Institute offers the Klingon Language Certification Program, for which “[t]he questions will either be translation, fill-in-the-blank, or require answers to questions about grammar.” For the curious, here are some sample questions:

    1a. Translate the following sentence: SuS’a’mo’ pum Sorvetlh

    The answer is apparently “That tree fell because of the powerful wind.” – take that Henry Sweet!

    2b. ya ghaH wo’rIv’e’

    Worf is the __________.

    The missing phrase in the blank (corresponding to ya ghaH) is “tactical officer” – again, nineteenth-century Classics students puzzling over the manner in which philosophers pull the lower jaws of certain types of farmyard animal have nothing on this, clearly!

    More seriously though, what the results of my attempt at this thought experiment appear to suggest is that assessment is heavily invested in a faith in and adherence to grammatical and lexical accuracy, both of which are related to clearly defined standard forms of the language, regardless of the fact that the language no (longer) has any native speakers, or even whether its native speakers are partly or wholly fictitious.

    As far as I am concerned, this should not be surprising. While it can be reasonably argued that this emphasis on accuracy is the result of these exams being based on those made for natural / national languages, I think it also points clearly to the fact that what defines the kind of invented languages referred to above is that they are ultimately created as a leisure-time pursuit (albeit a highly rarefied and sophisticated one). That being the case, the heavy emphasis on translation follows naturally, as the primary use of these languages must surely be translating text to be spoken by actors and eventually subtitled on screen (Dothraki, Elvish), explaining to laypeople what you’ve just said (Esperanto, Volapük) or making the dead live by offering translations of literary, religious or historical documents (Latin, Sanskrit).

    The conclusion is – as it should be – that the form of assessment is matched to the purposes for which the language will eventually be made use of.

    In contrast to these languages, there are alternative lingua francas to English in some parts of the world (e.g. Swahili), and there are languages that have been specifically created for the purposes of diplomacy, trading and negotiation and only used in specific frontier and border spaces (see e.g. the work of Peter Mühlhäusler on contact languages and the ecology of language / language of ecology for more on this). I’m afraid I don’t have the details to hand, but I know that there is a hill region of Papua New Guinea on the frontier between two tribal regions, and in this zone the men (it is only men who can go there) use a specific language that belongs to neither one tribe nor the other.

    Although you can take an exam in Swahili (http://www.cie.org.uk/images/128567-2015-syllabus.pdf), a ‘real’ assessment of efficacy (there’s that word again!) in such languages presumably depends on achieving specific outcomes: conflict is avoided (or successfully provoked, human nature being what it is!), the number of pigs offered as a dowry meets with the satisfaction of all parties involved, you get a good bargain or price for the cassava / vegetable oil / AAA batteries etc. bought or sold in the market and so on.

    To be blunt, though, such a form of assessment does not apply to English – or, for that matter, to a number of natural national languages used in mainly industrialised nations (e.g. Spanish). English, in my opinion, is definitely unique amongst world languages in this regard.

    English is not only a language of contact and trade, but it is also a language of science, technology, academia, diplomacy and international relations. Given that the latter five uses of English in the modern world each require a high degree of delicacy in the use of grammar and lexis to ensure success, it should hardly be surprising that the ideal of an educated native speaker model is essential as a basis for learning and assessment.

    Of course, Jenkins, Prodromou (and others) are quite right to point out that idioms and proverbs such as ‘to gain Brownie points’ or ‘You can lead a horse to water, but you can’t make it drink’ are basically useless (or worse), but please note that such phrases tend (at least on the whole) not to have a significant bearing on the language of science, technology, academia, diplomacy or international relations (I’m aware the odd exception can be found).

    Quite by chance, I noticed that the latest edition of the ELT Journal includes a number of articles on just this topic of assessment, and the kind of language that should form the basis of it. But even here, I find the position put forward by Christopher Hall in his article Moving beyond accuracy: from tests of English to tests of ‘Englishing’ almost immediately undermined.

    On the one hand, Hall declares that teachers and testers should “question the monolithic position” and that “[w]hat is not helpful […] is the presentation of [standardized native speaker-based] norms as the only ones for successful English usage.” (2014:377); on the other hand, he also concedes that “I recognize the need to test conformity with such varieties under many circumstances (for example in some EAP contexts)” (ibid.).

    As far as I’m concerned, as soon as Hall acknowledges that “conformity”, standards and accuracy still apply to assessment for specific purposes (such as EAP, or even “some EAP” contexts), then the argument is effectively over.

    Education in general is there to set people free, to encourage intellectual development and promote social mobility. This equally applies to education in English language, if not more so. To encourage parochialism is surely to defeat the object of having a more or less standardized international form of communication in the first place.

    For many in the world, this means acquiring a good, more or less standardized, form of English for certain purposes (such as those noted above), and so efforts to avoid assessing (and thereby teaching) such types of language may place a restriction on students before they have even begun learning. And it is to this that I am most strongly opposed: as teachers and materials designers, we do absolutely have to accept certain aspects of the status quo – not necessarily uncritically, but nevertheless we do for the most part have to accept it.

    Education tends towards conservatism of attitude (though not as a rule of politics) precisely because, and especially with regard to the young, it is hard to say what they may need to know in the future. No one wants to experiment with the future life chances of someone else’s child, or someone else’s potential.

    For my own part, I would much rather teach with accuracy and a more standard model in mind now because frankly, I don’t know what kind of English my students may want or need to use in future. To be honest, most of them have no idea either and may not come to a decision until long after they have forgotten my name (which admittedly may not take all that long!).

    If I decide to teach to the language of an idealized educated native speaker model now, and the students choose to reject that model and/or not achieve it, then they at least know what it is they are working toward should they at a later date decide (or have forced upon them by circumstance) the need to develop their English for much more delicate and sophisticated uses than hanging out in bars or doing the shopping etc. If, on the other hand, I decide on their behalf that ‘They don’t really need to know all that stuff, they just need to be able to communicate well’, then I potentially make it much more difficult (if not actually impossible) for them to develop beyond whatever fossilized form of the language they have become accustomed to using.

    I’d rather go with the option that has more potential (if not actual) choice. And therefore, whether it is English, Latin, Elvish or Swahili, I would as a rule prefer the version of assessment that promotes accuracy in relation to communicative competence across the repertoire of uses for that language.

  11. Thanks for your fascinating response to my ‘thought experiment’, Nik. However, I suspect you didn’t understand the point of it, and it’s my fault for wording it badly. It’s not relevant that there are, or that there are not, exams of Esperanto (or of Latin or of Klingon, for that matter). The question should be, ‘Given there are no educated native-speakers of these languages, by what standards would such exams (or are such exams) scored?’

    The answer, presumably, is that there is some normative standard based on someone’s idea of what a native speaker would be like, if there were such a thing. But this would seem to be neither a valid nor a fair way of testing, given its arbitrariness. Nor would it say very much about the Esperanto (or Latin or Klingon) test-takers’ communicative effectiveness. It would say more about the tester’s own particular biases and dispositions. The standards, effectively, have been manufactured and are based on an ‘airy nothing’.

    Take a more realistic example. Assume a test of (standard) English includes this item: ‘What did you do ___ the weekend?’. As an educated native-speaker of (New Zealand) English, I would, of course, answer ‘in’. If the examiner was an educated native-speaker of British or American English I would probably be marked down. But who, really, has failed here?

    The example is not trivial: language examinees face this problem on a regular basis. As a learner of Spanish, I’ve found that the goal-posts are constantly shifting. Which, for example, is correct: ‘le hablé’ or ‘lo hablé’? And why should it matter? (See http://en.wikipedia.org/wiki/Loísmo for an explanation, if you’re not a Spanish speaker). Likewise, there is fierce debate among Catalan speakers (all native speakers, presumably) as to which preposition is correct in certain contexts: ‘per’ or ‘per a’.

    Where, in the end, does accuracy reside? What is ‘accurate’ Klingon really like? And what is ‘accurate’ Catalan really like? Even where there are educated native-speakers of a language, such as Catalan or Spanish or English, who is to decide which of these (possibly millions of) speakers rules? Perhaps, as Humpty Dumpty put it, “The question is, which is to be master – that’s all.”

    For the moment it’s the native speaker who still calls the shots. But which native speaker, and with what authority, and for how long?

  12. Scott,

    I think I probably owe an apology because as usual I wrote rather too much, inviting the tl;dr response it probably deserved – my rather weak defence for that is that I tend to get carried away by certain hobby horses and this was one of them.

    All that being said, I’m not sure that I did in fact miss the point of the ‘thought experiment’ – whether the question was “How would you assess a speaker (or writer) of a language that has no native speakers, putative or otherwise, such as Esperanto?” or “‘Given there are no educated native-speakers of these languages, by what standards would such exams (or are such exams) scored?’” my answer is:

    The conclusion is – as it should be – that the form of assessment is matched to the purposes for which the language will eventually be made use of.

    I then tried to hint that the distinctions between the uses of, say, Sanskrit, Klingon, Swahili and English are carried over into the forms of assessment (formal or informal) used for each.

    Languages with no native speakers tend to be oriented toward accuracy because either they are mostly used for translation purposes (e.g. Sanskrit) or there is a prescribed set of rules without which the community of its speakers could not exist at all (e.g. Esperanto).

    I suggested that English was almost unique in this regard, not because English is unique per se, but that the way in which it is used – in terms of sheer coverage, scope, depth – is unique in modern history (while there have been and still are other lingua francas none can ever have been said to be so truly global in the way that English quite clearly is).

    With respect, I was therefore trying to suggest that the ‘thought experiment’ was not truly viable because it didn’t take context or use into account and the experiment disintegrates once those issues are included or only returns a superficial answer if they are excluded.

    What I have been arguing all along is that not only is there no issue with basing a syllabus on the repertoire of an (imaginary?) idealized native speaker, but that it is also inappropriate to describe doing so as an ‘arbitrary’ rather than a principled decision, especially with regard to more sophisticated post-intermediate uses of an L2.

    I agree completely that ‘What did you do ___ the weekend?’ is potentially problematic, however what is important here in terms of assessment is the context and use to which such a sentence is being put. I know four sets of exams well, Cambridge main suite, Cambridge young learners, IELTS and the Pearson PTE-Academic and in each one of these the use of a ‘wrong’ preposition in that slot would be mitigated by other considerations such as context and task appropriacy, overall fluency, consistency of style etc. (in fact, as you probably know, that particular example would be marked as correct with the options at / in / on in each of those exams to the best of my knowledge).

    English, thankfully in my opinion, does not have a single official body to make declarations of what counts as absolutely correct in such cases (as Spanish has in Spain’s Real Academia Española). But even if we were to say that IELTS, as the world’s most popular English language examination (so we are apparently told), stood as a proxy ‘Academy’ of English, there is no point at which its descriptors would lead to a pass or fail over such minutiae of usage.

    To try and avoid another case of tl;dr (if it’s not already well past that point!), I would like to ask more about this:

    “For the moment it’s the native speaker who still calls the shots. But which native speaker, and with what authority, and for how long?”

    I am genuinely fascinated as to why this is a cause of no small concern to some people and arouses such passions, especially when we are talking about English as opposed to other languages (or at least, that’s how it seems to me).

    For instance, I was speaking to a German friend recently, who now teaches EAP in the UK but who has also taught German as a second or foreign language in the past, and she seemed to see no difficulty at all with the idea that a German language exam should be based on the speech of an idealized educated (Hochdeutsch or Standarddeutsch) native speaker – yet German, too, has a wide range of accents, dialects and national models on which that native speaker could potentially be based.

    • Thanks, Nik, for this interesting discussion (or dialogue, as we seem to be on our own!). By way of drawing it to a close, and returning to the topic that triggered it, I’d reiterate that SLA research AND language testing are each predicated on the assumption that the goal of language learning is to achieve native-like competence, and that the benchmark for measuring success (or, more often, failure) in both endeavours (i.e. research and testing) is the educated native speaker. I’d add that not only does this benchmark ignore the inherent multilingualism of the learner, it assumes the existence of what is, for all intents and purposes, a mythical beast or, as Pennycook (2012) puts it, ‘a folk concept, held in place to signal certain ideas about language’. He adds: ‘The idea of native and non-native speakers really does not do any useful work in thinking about real language use, and does a great deal of harm as a categorisation that cannot escape its roots in nationalism, racism and colonialism’.

      • Many thanks indeed, I’ve found it very stimulating and hope it was of some interest to you too. While I hope you appreciate that I have taken your points seriously and have given them much thought (or as much thought as I am capable of giving!), I still find my original position valid – that the deployment of an educated NS model in language syllabuses and assessment is neither detrimental to nor disruptive of a learner’s identity or self-esteem, nor an arbitrary decision but a principled one.

        This must be for another occasion (should one arise), but on a final note I’m afraid I find Pennycook’s ideas concerning (Neo-)Imperialism and the English language absolutely preposterous. And I would like to point out that this is despite the fact that the great majority of my master’s degree was devoted to the study of World Englishes (especially but not exclusively in West Africa), contact languages and postcolonial criticism; while this was only an MA, and so I am in no way claiming expertise in these areas, I do feel at least both generally familiar with and sympathetic to a number of issues that also concern Pennycook – and yet I still find that he is way, way off base in much of what he proposes.

        Anyway, many thanks again and have a good day.
