We believe that artificial intelligence (AI), machine learning and natural language processing are going to have a massive impact on ELT, and probably more rapidly than many might expect. A fascinating example of this is a new product from Cambridge called Write & Improve, which aims to provide automated help with writing. Diane Nicholls is one of the team behind the product, and we asked her to tell us more about it. In this in-depth interview Diane talks about how the system works and, perhaps even more interestingly, how it was developed and what was learned in the process.
We think it encapsulates a lot of where ELT is heading, both in what the product itself is trying to do and in the way the project has brought together the worlds of ELT, academic research and technology in a way we haven’t seen before. This isn’t a review or endorsement of the product (you can try it out for yourself and see what you think), more a case study in how a product like this comes to exist and the thinking that goes into it.
Can you give us a brief overview of what Write & Improve is?
It’s a free online automated English language writing assessment and feedback tool that exploits advances in computational linguistics and machine learning to provide writing practice and feedback for EFL learners in an intuitive, engaging and easily interpretable way. Importantly, it’s a pedagogical tool and a practice environment, not a text editing facility.
Who is it designed for, and how are you hoping it will help them?
It’s designed to be a supportive, encouraging environment for learners of English at all levels to practise their writing in a low-stakes, pressure-free environment, either in the classroom or for self-study, on mobile, tablet or desktop.
It’s based on the simple belief that the very best way for a learner to improve their writing is through practice and feedback.
To provide the practice, it offers a range of essay prompts based on Cambridge English exams from KET to CPE. Learners choose the level they’re comfortable with or aiming for, select a prompt that appeals, and start writing. Alternatively, their teachers create Workbooks and tailor-made tasks for their students and assign them by sharing an invitation code or sending an email. Learners can also use the same Workbook function to set their own tasks, so that they can write about anything they like, or do their homework.
The main thing is to get learners writing. Through this writing practice, they gain confidence and, where relevant, familiarity with what might be expected from them in an exam.
When the learner submits their draft, the system provides targeted feedback of three types – summative, formative and indirect semi-corrective – in about 15 seconds. They can then make edits and resubmit as many times as they like. The aim is to gradually reduce the amount of shading and the number of feedback tips in their essay and, hopefully, raise their estimated score.
Because the system is non-judgmental and anonymous, they learn to ‘have a go’ and, because the feedback is immediate, they’re encouraged to keep trying.
It’s hoped users will learn the benefits and habit of reviewing their own writing, gain an awareness of their common and fossilized errors and the ability to spot and fix them, and become aware of other areas of frequent confusion to study further or take up with a teacher. Ultimately, we hope learners will start to enjoy writing!
There’s a lot of concern at the moment that AI is seeking to replace or sideline the role of the teacher. Is there a risk that Write & Improve could be seen as a threat?
To the teaching of writing by human teachers? Not at all! I doubt I need to list for your readers what a qualified and dedicated teacher brings to the teaching of EFL writing. Write & Improve is already miles better than no teacher. But even with the sort of scaffolding and other automated support we have planned for the near future, it is just a tool. What it can do is train motivated learners to identify and eliminate their common and repeated errors so that the version the teacher sees is free from those and the teacher is freed up to focus on the things only they can help with – discourse organisation, argumentation, nuance, and much more. Despite all the hype about AI, I can’t imagine an intelligent tutoring system ever being able to help with those things, because it can’t understand or infer communicative intent from ill-formed text, as a human can. Being sensitive to and anticipating the needs of a learner and finding just the right way to encourage and motivate them? Connecting with a learner and inspiring them? Being a language-learning role model? These are also not something anyone needs to worry about.
How exactly does it work?
It works by supervised machine learning based on an algorithm which is fed training data from the 30-million-word error-annotated Cambridge Learner Corpus and, from that data, ‘learns’ to spot the same errors and patterns of error in any future L2 data that is fed into it – a continuous improvement process. Basically, it ‘speaks’ learner English, exclusively – in fact, perfect English, purple prose and made-up nonsense confuse it enormously, because they’re so different from the training data. As it receives more data from users and our annotation team annotate that data and feed it back into the pipeline, it learns more and gives more and more accurate feedback. It was out in Beta, collecting data and learning from it, for more than three years before launch in September, so it’s already very accurate. But, like teachers (and us!), it’s always learning.
Because it’s a pedagogical tool and because it can be used with or without a teacher, the algorithm is calibrated to be extremely cautious.
It only flags up possible errors when it’s more than 90% certain it’s right, as research is clear* that when it comes to feedback in pedagogy, an error left unmarked is less damaging than a correct use marked wrong. Precision is key here. This is all the more important in the absence of a teacher.
The system also refrains from making suggestions if it’s unable to be sufficiently certain what was intended. Instead, it highlights the whole sentence as dubious and needing attention. The learner is then, at least, pointed to areas where they could usefully focus their attention.
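To make the idea of a cautious, precision-first threshold concrete, here is a minimal sketch in Python. It is not Write & Improve's actual code: the `Candidate` structure, the `select_feedback` function and the example probabilities are all hypothetical, and only the 90% figure comes from the interview. It assumes a detection model that outputs a probability for each possible error and, where it can, a suggestion.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Taken from the "more than 90% certain" figure quoted above.
FLAG_THRESHOLD = 0.90


@dataclass
class Candidate:
    """One possible error proposed by a (hypothetical) detection model."""
    token: str                 # the word the model is unsure about
    error_prob: float          # model's estimated probability that it is an error
    suggestion: Optional[str]  # None when the intended meaning is unclear


def select_feedback(
    candidates: List[Candidate], threshold: float = FLAG_THRESHOLD
) -> Tuple[List[Tuple[str, str]], bool]:
    """Return (word-level flags, whether to mark the whole sentence as dubious)."""
    word_flags: List[Tuple[str, str]] = []
    sentence_dubious = False
    for c in candidates:
        if c.error_prob <= threshold:
            # Not certain enough: leaving an error unmarked is the safer mistake.
            continue
        if c.suggestion is None:
            # Confident something is wrong, but not what was intended.
            sentence_dubious = True
        else:
            word_flags.append((c.token, c.suggestion))
    return word_flags, sentence_dubious


# Made-up example values, for illustration only.
sentence = [
    Candidate("informations", 0.97, "information"),
    Candidate("on", 0.60, "in"),   # below the threshold, so it stays unmarked
    Candidate("go", 0.95, None),   # probably wrong, but the intent is unclear
]
flags, dubious = select_feedback(sentence)
print(flags, dubious)
```

Raising the threshold trades recall for precision: fewer real errors get flagged, but a flag is less likely to land on a correct use, which is the trade-off described above.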
The algorithm’s also carefully calibrated not to give too much feedback in one go. We all know that too much red pen is ultimately demotivating and counter-productive. Instead, it returns initial feedback on common errors after first submission and then, as the learner edits those errors, surrounding errors become marked on next view, where possible and appropriate. This is designed to keep the user motivated and engaged.
There are three types of feedback:
- Summative, overall assessment of writing competence, giving an approximate CEFR level, allowing learners to benchmark their progress against their peers. This score is based on their language production only.
- Indirect, semi-corrective feedback on word-level issues, including spelling, grammar and vocabulary choice, plus a qualitative assessment of each individual sentence, making the learner aware of areas that might need attention. Crucially, with this feedback there is no “right answer” – don’t do that; do this. We only make suggestions on areas that need attention and, if clear, why. It encourages students to revise their work and assess the feedback to make their own decisions. There’s no spoon-feeding here!
- Overview feedback, served with a heavy dollop of teacherly encouragement. It lets the learner know how they’re doing now – measuring themselves against their last performance, not anyone else’s. In this sense, it is personal, adaptive and formative. There are currently 268 different graded feedback bubbles to suit different achievement scenarios and levels.
We also provide a personal progress graph so learners can visualise their progress on a task or across tasks.
This overview feedback performs three main functions:
- It reassures learners that they’re not practising writing in a vacuum.
- It takes the focus away from levels in cases where the level has not improved.
- It emphasises that progress is incremental and not in six increments from A1 to C2, but little by little, within a level. That every little bit of progress counts!
Are there any other products that do something similar?
No. As far as we know, it’s completely unique. I think some users expect it to work as a spelling and grammar checker like the one Word offers, or a program like Grammarly or StyleWriter, for example. But that’s not what it is at all. It’s designed exclusively for EFL learners, and for teachers to use with their learners. It provides practice materials and a platform for that practice and, in its feedback, rather than correcting writing, it gives suggestions, leaving learners to reflect and make decisions themselves. It’s a pedagogical tool; a permanently free writing gym where learners are encouraged to practise, revise their work, then keep on practising, because, well, that’s how to improve.
What’s the story behind its development?
Like most things, it started with a conversation. With Write & Improve, it was between Ted Briscoe and Michael Milanovic. Ted Briscoe is Professor of Computational Linguistics at the University of Cambridge and co-founder and CEO of iLexIR Ltd, the company that spun out Swiftkey, maker of the world’s most popular predictive keyboard for smartphones. Mike Milanovic was CEO of Cambridge English Language Assessment (now retired), and had been working in English language teaching and assessment since 1977. They discussed the great things that could be achieved if the natural language processing and machine learning expertise at the Cambridge Computer Laboratory could meet the invaluable learner language data represented by the Cambridge Learner Corpus, a corpus of 30 million words of the written production of EFL learners which had been built, annotated and analysed over a period of 23 years by Cambridge English. That data, they agreed, would make excellent training data for pedagogical tools for learners.
As a result, English Language Intelligent Tutoring (ELíT) was created as a technology transfer business to develop the vehicle that would bring the data and the expertise together and drive the results out into the world. Paul Butcher then joined ELíT as third co-founder and CTO. Most recently, Paul was Chief Software Architect of SwiftKey, the #1 best-selling Android application worldwide for 3 years (SwiftKey was recently acquired by Microsoft). His role is to ensure that ELíT’s technology is robust and adaptable, and can scale to accommodate the demands of millions of learners worldwide.
The technology investment from the founders was complemented by funding from our Joint Venture partners at Cambridge University: Cambridge University Press and Cambridge Assessment.
Tim Parish and I had both worked with Ted on a variety of Natural Language Processing (NLProc) projects over the years and Tim, a software developer, joined to take care of the NLProc pipeline: the behind-the-scenes parts of Write & Improve which process learners’ submissions. I took on management of the data annotation project and recruited our team of 5 veteran EFL teachers to do the annotation. I had worked on the Cambridge Learner Corpus annotation for 20 years (part-time!), so I know my way around learner writing and the training data. And as I’ve worked in ELT materials development for just as long, I have a lot of input into the learner-accessibility and pedagogy of the content.
Meanwhile, Paul put together a crack team of developers, 4 of whom came to ELiT as a package deal after the demise of the music-streaming service MixRadio, and they work out of a converted shipping container in Bristol. Henry Garner, our data scientist and a Clojure expert, works from London and is responsible for the data capture and close analysis of learner behaviour.
Finally, Sara Garnham joined us as General Manager in the spring of this year and is coordinating brand and marketing as well as relations and communications between the various stakeholders. She brings a lot of energy and business expertise to the company.
Did the proposition change or evolve during the course of development?
The big picture hasn’t changed, no, but as always, the devil is in the details.
What did you learn through developing the product?
Write & Improve was in Beta for more than 3 years and underwent rigorous testing with schools, and user and usage data were carefully analysed at the computer lab, among ourselves, and at the ALTA Institute (Automated Language Teaching and Assessment), which Ted directs. And so, that was a long and productive learning phase. One important lesson was that we needed to make the sign-up process much slicker and allow users to use the site without having to create a profile, at least initially. The leap from the Beta version to the current live version was enormous in terms of look and feel and technology. Crucially, we went from a traditional website model to a Single Page App (SPA), which gives us much quicker response times. We were getting assessment times with the old Write & Improve of about 40 seconds, sometimes more, but now it’s around 15 seconds.
Since launch, of course, we’ve continued to learn. The product’s still under active development and we’re constantly assessing learner activity on the site to better understand how to make it more effective for them. We achieve this through a mixture of web analytics, data mining, and a variant of A/B testing called bandit testing. This is Henry’s department. A bandit test is similar to an A/B test, but, rather than assign variations equally amongst the users of Write & Improve during the test phase (as we would with an A/B test), a bandit test constantly analyses which variations are leading to positive outcomes and prioritises them in real time.
We’ve used this technique to prove that user-interface changes are actually yielding improvements in learner outcomes. For example, we log whether learners are interacting with the detailed feedback suggestions we provide – a strong indicator that the learner is attempting to engage with Write & Improve’s suggestions. Through bandit testing, we learned we can encourage learners to ‘click and reveal’ detailed feedback if we return the first detailed feedback pre-revealed. As data was collected on learner behaviour with this test in place, the site automatically adjusted itself to always present learners with their feedback this way.
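For readers unfamiliar with bandit testing, the following Python sketch shows one common variant, an epsilon-greedy bandit. The interview does not say which bandit algorithm Write & Improve uses, so the algorithm choice, the class and function names, the variant labels and the engagement rates are all assumptions made purely for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Mostly show the best-performing variant so far; occasionally explore."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon                     # share of traffic used to explore
        self.shown = {v: 0 for v in self.variants}
        self.successes = {v: 0 for v in self.variants}
        self.rng = random.Random(seed)

    def rate(self, variant):
        """Observed success rate for a variant (0.0 before any data)."""
        n = self.shown[variant]
        return self.successes[variant] / n if n else 0.0

    def choose(self):
        """Pick the variant to show to the next user."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)  # explore at random
        return max(self.variants, key=self.rate)   # exploit the current leader

    def record(self, variant, success):
        """Log one impression and whether it led to the desired outcome."""
        self.shown[variant] += 1
        if success:
            self.successes[variant] += 1


def simulate(true_rates, rounds=5000, seed=0):
    """Run the bandit against made-up engagement rates; return impressions."""
    bandit = EpsilonGreedyBandit(true_rates, epsilon=0.1, seed=seed)
    world = random.Random(seed + 1)                # simulated learner behaviour
    for _ in range(rounds):
        variant = bandit.choose()
        bandit.record(variant, world.random() < true_rates[variant])
    return bandit.shown


# Hypothetical rates: learners engage more when the first tip is pre-revealed.
print(simulate({"collapsed": 0.2, "pre_revealed": 0.4}))
```

Unlike an A/B test, which would split traffic evenly for the whole test period, this loop shifts most impressions towards whichever variant is currently performing best, while the small `epsilon` share keeps checking the alternatives.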
Well, it’s also helped us prove that lots of things don’t make that much difference, but I don’t think that’s the most exciting message! For example, our pop-up messages encouraging users to create a profile are all bandit tested, and Henry still hasn’t uncovered any statistically significant difference between our various messages (including the variant which just randomly chooses one). In a similar way, it helped by showing us that, surprisingly, conversion rates didn’t differ according to whether we offered 1, 2 or 3 pre-signup tasks before asking users to sign up.
That last example led to a deeper scrutiny of what was actually going on, and the fact that most users weren’t reaching their pre-sign-up task entitlement. That came from ‘data mining’ – it’s the data mining, rather than the bandit testing, that’s showing that the number of answers attempted by the average user each day is very gradually increasing, for example, as well as how many and what sort of workbooks users are creating.
Luke, one of our Bristol team, managed to lure a number of EFL learners into the shipping container so that we could observe them live, using the tool blind. This gave us a lot of interesting UX insights, many of which we’ve acted on.
So, based on these findings, we’re constantly trying different versions of functionality and optimising them in real time based upon real user behaviour.
What’s the business model?
The vision behind Write & Improve is to contribute to the democratisation of English language learning – easy access to a simple tool that can help anyone (with access to the internet), on any device, anywhere in the world, increase their confidence and improve their English, leading to greater opportunities in their life. To that end, we’re committed to keeping everything that Write & Improve currently offers free for all users. Going forward, we will make greater functionality available for paying customers as part of a freemium model. By the end of 2017 we hope students, teachers and institutions will all be customers of Write & Improve, and, looking ahead, will be able to use ELiT tools to support the other skills, not just writing. We are also working with our joint venture partners, and others, to incorporate this technology into their product ranges.
How would you like to see it developing in the future?
Write & Improve’s been built from the ground up to be a data-driven product. In practice, this means that future development will be guided by a close understanding of which features will improve learners’ experience the most, and help ensure that everyone using Write & Improve is motivated to achieve their goals.
We’re constantly working on Write & Improve, adding new features and adding to those already there, so it’s developing all the time. And we’re learning all the time, too, by studying how our users use the tool and responding to user feedback. This week, we added a History view, which will mean learners will be able to trace back through every iteration of their essays, from first draft to last, to see what they wrote, what feedback they got, what they did in response, how it improved their writing and their score etc. All their work is saved chronologically in a library they can revisit at any time. Of course, that means they can also share previous drafts with their teachers, for example.
Colleagues at the ALTA Institute at Cambridge University have been working on a prompt relevance function that will provide a score alongside the CEFR level for task achievement, gauging how relevant the writing is to the prompt. That’s all ready to go but we found it was slowing down the overall assessment, so it’s undergoing refinement. This feature, we hope, will help learners focus on *answering the question*, which is something we know they’re not generally good at.
We’ll also be starting to roll out our Premium offerings early in the new year, including a full teacher mode, so that teachers can see their students’ work and get diagnostic reports on progress etc.
And a trophy cabinet is coming soon where users will get ‘badges’, not just for achievement but for attendance, perseverance, frequency of sessions etc. Anything that will keep them engaged and keep them writing!
In the long-term, there are exciting plans for a Speak & Improve product, and much more besides …
And personally, I’d like to see Write & Improve becoming a regular habit with students and teachers as a way to practise their writing and learn to be better reviewers of their own work. Since launch in late September, learners in 180 different countries have had free writing practice and feedback with Write & Improve. I get a real kick out of that and I’d love to see that number get as close as politically and geographically possible to the full 195! One learner responding to the ‘tell us what you think’ prompt in Write & Improve said ‘I’m even starting to like writing!’ If we could get other learners to feel the same, that would be amazing, too.
Write & Improve will be involved in the first ever summer school in Machine Learning for Digital English Language Teaching (ELT), to be held 3-7 July 2017, in Chania, Crete, Greece, organised by the Automated Language Teaching and Assessment Institute, University of Cambridge. Expect to leave with a better understanding of various aspects of Machine Learning, Natural Language Processing and Psychometrics and how they can apply to your ELT context. Also expect to leave with a bit of a tan! For more information and tickets, check out the summer school website.
Write & Improve website: writeandimprove.com
Write & Improve promotional/overview video: https://www.youtube.com/watch?v=5EwJnFRfK9I
* www.aclweb.org/anthology/C10-2103 Ryo Nagata and Kazuhide Nakatani. 2010. Evaluating performance of grammatical error detection to maximize learning effect. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, pages 894-900, Stroudsburg, PA, USA. Association for Computational Linguistics.
A lot of the early background research and NLProc is in this paper, if you’re interested in looking into it in more depth:
You can also find Write & Improve on Twitter at @WriteandImprove