This week ELTjam are at the ALTA Machine Learning Summer School in Crete and you can read regular updates of what’s happening here on the blog.
Today, Day 3, Diane Nicholls’s morning workshop gave us an insight into the human element of the Write & Improve (W&I) product, both in terms of the annotation of learners’ texts by human annotators and the insights that teachers can gain into their learners’ progress. (You can read the summaries of the other days here: Day 1, Day 2, Day 4, Day 5.)
For those of you who don’t know, W&I is a free tool that gives learners instant, automated feedback on their writing, along with suggestions for how to improve the text before they resubmit it. There is also a range of teacher tools to support learners.
Seeing W&I in action was a really interesting insight into the inner workings of the product. What follows is a summary of the day and a list of questions it would be great if we could collectively answer! As with the other days, this is just what I understood from the session, and I hope any corrections or clarifications will be made in the comments.
Human Annotation of Errors
Each text submitted to W&I is automatically graded by the system and then passed, behind the scenes, to a human marker (called a humannotator). The corrections the human makes are never seen by the learner, but they feed back into the automated marking system by enabling the creation of rules. I’m not sure, however, exactly how these rules are then used or how they fit into the system as a whole (see question 1 in the list of questions below).
Each script submitted to W&I passes through a number of stages:
- A script is submitted and the learner gets an automated score.
- The same text is then marked by a human annotator in the system. This involves adding text, removing text, or replacing text with something different. The marker then assigns a CEFR score and submits their corrections. The marking interface is clean and clear and works much like the track changes feature in Google Docs or Word.
- These corrections are then checked by Diane Nicholls, who amends any corrections she considers inaccurate (see questions 3 and 4).
- Based on the Part-of-Speech (PoS) tagging and other analysis that the system has carried out on the learner’s text, together with the corrections supplied by the humannotator, the system is then able to classify all the errors (see question 5).
- Rules are then created from the classified errors. For example, if ‘I goed to the beach’ were consistently corrected to ‘I went to the beach’, a rule might be created saying that this change should always be suggested to the learner. To highlight some of the challenges of rule creation, Diane gave us a short quiz: for each learner error and suggested correction, we had to decide first whether the error was always wrong and, if it was, whether it could only be corrected in the way suggested. In very few cases is something a) definitely wrong and b) fixable in only one way (see question 6).
- These rules are then fed into the automated error correction and the suggestions given to learners (again, see questions 1 and 2).
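To make the rule-creation step a little more concrete, here is a minimal, hypothetical sketch in Python. It is not how W&I actually builds its rules (that is exactly what questions 1 and 6 ask); it simply assumes that a rule is derived by counting how often annotators change a given span to the same correction, and that a rule is kept only when that correction is both frequent enough (`min_support`) and consistent enough (`min_precision`). The function name, thresholds and example data are all invented for illustration.

```python
from collections import Counter, defaultdict


def derive_rules(observations, min_support=3, min_precision=0.9):
    """Hypothetical rule derivation from human annotations (not W&I's method).

    observations: list of (original, corrected) pairs. If corrected equals
    original, the annotator left that span unchanged.
    Returns {original: (suggested_correction, precision)}.
    """
    # Count every outcome an annotator chose for each original span.
    outcomes = defaultdict(Counter)
    for original, corrected in observations:
        outcomes[original][corrected] += 1

    rules = {}
    for original, counts in outcomes.items():
        total = sum(counts.values())
        best, best_count = counts.most_common(1)[0]
        if best == original:
            continue  # most often judged acceptable as written: no rule
        # Share of all annotations of this span that chose this correction.
        precision = best_count / total
        if total >= min_support and precision >= min_precision:
            rules[original] = (best, precision)
    return rules


# Invented example annotations.
observations = (
    [("I goed to the beach", "I went to the beach")] * 5
    + [("I have went home", "I have gone home")] * 3
    + [("I have went home", "I went home")] * 2
    + [("I went to the beach", "I went to the beach")] * 4
)
rules = derive_rules(observations)
print(rules)  # {'I goed to the beach': ('I went to the beach', 1.0)}
```

Here ‘I have went home’ is corrected in two different ways, so no rule survives the default threshold, which mirrors the point of Diane’s quiz: very few errors are fixable in only one way. Lowering `min_precision` would create more rules at the cost of more wrong suggestions.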
For teachers, W&I lets you create a class, or ‘workbook’, and invite learners to join it either by email or with a code they enter on the website. This allows teachers to track their learners’ progress in class groups.
Once in, a teacher can see a huge amount of information about their learners’ writing: all the essays they’ve submitted, the number of resubmissions, the differences between attempts, the score for each attempt, and comparative data showing how the learners in the class are progressing. This is a really impressive tool for tracking writing progress.
Some teachers have also expressed an interest in marking and annotating their learners’ work directly, and this feature is on its way. It has a dual benefit for W&I: it provides functionality that teachers want, and it also provides correction and annotation data that W&I can use to improve the quality of the automated correction. The idea is that teachers can annotate the texts and add specific codes for the errors they see, then assign a grade, add global feedback and return the work to their learners.
As this feature is still being conceptualised, Diane asked the group what they would want to see in a feature like this. The key points to come from this were:
- In order for this feature to be effective, it should reflect how teachers generally behave and the way that they currently do their marking for their learners
- It should be easy to use and intuitive
- This is an opportunity to allow more global, text-level feedback, as all the other feedback the learner receives is very grammar- and vocabulary-based. Teachers could comment on task achievement, cohesion and coherence across the whole text (see question 7)
- It’s important to speak to teachers during the development process, to do focus groups and understand better what teachers need. Effort should be made to ensure this is done in a way that the questioning doesn’t influence the answers!
It was really great to see the W&I features in action, but it was still a bit hard to see how everything we had seen over the last few days actually comes together to produce the feedback the learner receives (see question 8).
1. How exactly are the rules that come from the humannotation used in the automated correction?
2. Are we close to, or would it ever be desirable to get to, a point where we no longer use human corrections and annotation, and just rely on error correction methods that use native data?
3. If it’s desirable for error correction techniques that use annotated data to have as many annotations of the same data as possible, why does W&I effectively merge the original marker’s corrections and the reviewer’s into one set rather than treating them as separate annotations?
4. Is there a possibility that learners using W&I are effectively being taught to write in a specific way that the humannotator team and Diane find acceptable? Would that be a problem if so?
5. Is all error classification done automatically? What happens with longer-range tagging, such as idioms? I saw no way for the original markers to add these.
6. If a truly accurate error correction rule is hard to define, what precision and recall thresholds govern how these W&I rules are used? In the Day 2 input we saw that rule-based error correction is one of the least effective approaches and one of the most time-consuming to update. Are the rules created from the human annotation being used in this way, or do they have other benefits or uses?
7. Are there any plans to automate feedback at text or paragraph level rather than just checking grammar and vocabulary? Is it currently the case in W&I that a learner can write any response to any prompt and get the same score and feedback, regardless of whether the response actually addresses the prompt?
8. It seems that W&I uses a range of different NLP and error correction techniques. How is the weight of each technique in the learner’s overall experience decided? How are those algorithms calculated and tweaked? It would be good to understand how this works.
9. We saw presentations about cutting-edge techniques and data in error correction and ML. How closely does the W&I backend follow these developments? Is there a lag?
10. A couple of times, when asked about whether a particular technique or technology was being used in W&I, an expert would say something like “the learners wouldn’t notice a difference from this advancement”. Can this really be true? If it is true, what are the other incentives for the sorts of research being discussed?
After the session on Wednesday, there was a coach trip around the local Chania area. I didn’t attend, but I hear it was good fun. There are some photos here.