It started with K.I.T.T.
When I was four, going on five, a TV show called Knight Rider premiered in the UK. I don’t remember much about the first episode, but I do remember one thing very clearly: my dad’s uncontrolled excitement. It built throughout the day (‘It’s going to be amazing! Amazing!’), and he made absolutely sure I was sat next to him when the credits rolled and that iconic theme song began.
All that fuss, all that excitement, for a talking car.
Dad was right, though; it was pretty amazing. I loved it and remained a fan for most of my childhood (OK, I admit it; I’m still a fan). There was The Hoff, of course – all leather jackets, open shirt buttons and swagger – but the real star of the show was K.I.T.T – Knight Industries Two Thousand – the ‘advanced, artificially intelligent, self-aware and nearly indestructible car’.
Whatever it was that we loved about K.I.T.T, it must have stuck. Over thirty years later, two of the most successful companies in the world (Apple and Google) are in a head-to-head race to bring K.I.T.T’s spiritual successor – the driverless car – to market. It’s taken three decades and millions of dollars, but we can now talk confidently about driverless cars in terms of when rather than if. And, as a little-known and hard-to-spot side effect, the ramifications for the teaching of languages, especially English, could be huge.
The Cambridge connection
In 2013, ELTjam cofounder Tim Gifford had a few meetings with a small, Cambridge-based company called VocalIQ. VocalIQ was founded by a group of researchers who were working on what Tim described at the time as ‘some awesome speech recognition stuff’. We’d founded ELTjam with the hope of helping to bring exciting new technology into mainstream ELT, and I remember hubristically thinking that VocalIQ would be a great place to start. If we’d had a pile of funding capital sitting around, we might even have considered investing (they didn’t receive their £1.28m seed round until June 2014, close to a year later). Looking back, that investment would have been a smart play: just over a month ago, VocalIQ were acquired by Apple for an undisclosed amount. ‘We saw them first,’ I mumbled to myself.
VocalIQ’s website no longer exists, but the company is described on Crunchbase like this:
VocalIQ was formed in March 2011 to exploit technology developed by the Spoken Dialogue Systems Group at University of Cambridge, UK. Still based in Cambridge, the company builds a platform for voice interfaces, making it easy for everybody to voice enable their devices and apps. Example application areas include smartphones, robots, cars, call-centres, and games.
It’s unclear when the Cambridge University Dialogue Systems Group’s website was last updated, but, according to the information it currently displays, the group aims to,
design systems that can be trained on real dialogue data and which explicitly model the uncertainty present in human-machine interaction.
Currently the dialogue systems group is working on the EU funded Probabilistic Adaptive Learning And Natural Conversational Engine (PARLANCE) project. The goal of this project is to design and build mobile applications that approach human performance in conversational interaction, specifically in terms of the interactional skills needed to do so.
Apple is known for many things, but among the most significant are its breakthroughs in user interface (UI) design, including the Apple Mac’s graphical user interface (GUI) and the iPhone touchscreen. With the release of its virtual assistant Siri in 2011, Apple revealed its vision for the future of UI design: the voice.
The acquisition of VocalIQ’s technology will certainly help Apple further optimise the voice UI of its devices, but the real motivation behind the purchase may have had more to do with cars than virtual assistants. In October, MacRumors reported that VocalIQ had a particular interest in using voice-based technology to make driving safer. In a since-deleted blog post, VocalIQ are quoted as describing how
a “conversational voice-dialog system” in a car’s navigation system could prevent drivers from becoming distracted by looking at screens.
The specifics around Apple’s move into the electric and/or driverless car market are sketchy, but rumours of a 2019 release date are circulating, backed up by evidence such as the company’s hiring of motor industry veterans over the summer. It’s also unclear whether Apple will follow Google’s pursuit of a 100% driverless car or attempt some kind of voice-mediated electric car in the interim. In either case, if Apple’s track record in creating paradigm-shifting devices is anything to go by, the car’s release is likely to change driving forever. And it may, as a by-product, change a few other things too.
Beyond speech recognition
It goes almost without saying that any mobile application that (in the words of Cambridge’s Dialogue Systems Group) approaches human performance in conversational interaction would have an impact on the teaching of languages. But what might that impact look like? The answer lies in the difference between speech recognition and natural language processing.
It can be handy to think of speech recognition as the equivalent of dictation: you speak, the system recognises what you’ve said and then reproduces it. For example, I can dictate an email into my iPhone, and it will reproduce it in text form with considerable accuracy. I can also tell my phone to ‘Call Laurie’, and it will do so by recognising that simple voice command. You’ll have spotted voice recognition features in some of your favourite digital language-learning products, including Duolingo. You may also have spotted that, in many of those products, it rarely works with any level of accuracy. This has led to a sense (in ELT at least) that voice recognition is a bit of a gimmick – a marketing ploy at best – and not a serious tool in language learning. In my 11 years in and on the peripheries of the ELT publishing industry, I’ve never had a serious conversation with a publisher about the inclusion of voice recognition as a product feature (others have, of course).
Natural language processing (NLP) is different. With NLP, the system doesn’t simply attempt to reproduce what you say; it attempts to derive meaning from it. It allows you to have something that starts to sound like an actual conversation with a machine. This two-minute clip from Hound (a personal voice assistant product designed to go up against Apple’s Siri) shows how far the technology has come. What hits you first is the speed of the app’s response: it’s almost instant. But there’s a naturalness to the interaction that seems both mesmerising and slightly chilling, especially when it asks questions back in order to get all of the information it needs:
Human: What’s the monthly payment on a million-dollar home?
App: What is the down payment?
Human: Hundred thousand.
App: OK, using a down payment of one hundred thousand dollars, what is the mortgage period?
Human: Thirty years and the interest rate is 3.9%.
App: OK, … [goes on to answer correctly]
Human: Show me Asian restaurants, excluding Chinese and Japanese.
App: Here are several Asian restaurants, excluding Chinese restaurants, Japanese restaurants or sushi bars.
It’s a small detail, but notice how the system has recognised that, by saying he doesn’t want to eat at a Japanese restaurant, the speaker also probably doesn’t want to eat at a sushi bar. That demonstrates an understanding of context which goes far beyond what basic speech recognition can handle. I watch that section of the video and, while I can’t yet quite imagine a free-flowing conversation between human and machine, I can quite easily see basic A2-level transactional exchanges happening, where the chances of deviation from an expected set of outcomes are slim – think checking into a hotel or ordering a meal. I can also easily imagine an app such as Hound taking the place of a human IELTS or Cambridge Main Suite speaking examiner. The more clearly defined the rubric around a task, the easier it would be to replace the human with a machine.
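To make the ‘sushi bar’ inference a little more concrete, here is a minimal Python sketch of one simple way a system could approximate it: a hand-built category hierarchy in which excluding a cuisine also excludes everything filed beneath it. The taxonomy, the restaurant names and the `descendants` and `search` helpers are all hypothetical, and this makes no claim about how Hound actually works; real assistants rely on far larger knowledge bases and statistical language models.

```python
# Hypothetical, hand-built cuisine taxonomy: each category maps to its
# narrower sub-categories. Real assistants use much larger knowledge graphs.
TAXONOMY = {
    "asian": ["chinese", "japanese", "thai", "vietnamese", "korean"],
    "japanese": ["sushi bar", "ramen", "izakaya"],
    "chinese": ["dim sum"],
}


def descendants(category):
    """Return the category itself plus every sub-category beneath it."""
    found = {category}
    for child in TAXONOMY.get(category, []):
        found |= descendants(child)
    return found


def search(include, exclude, restaurants):
    """Keep restaurants filed under `include`, dropping any whose category
    falls under one of the excluded categories (or their sub-categories)."""
    allowed = descendants(include)
    banned = set()
    for category in exclude:
        banned |= descendants(category)
    return [name for name, cat in restaurants if cat in allowed and cat not in banned]


# Hypothetical restaurants as (name, category) pairs.
restaurants = [
    ("Bangkok Garden", "thai"),
    ("Golden Dragon", "chinese"),
    ("Tokyo Sushi", "sushi bar"),
    ("Pho 99", "vietnamese"),
    ("Dim Sum House", "dim sum"),
    ("Seoul Kitchen", "korean"),
]

# "Show me Asian restaurants, excluding Chinese and Japanese."
print(search("asian", ["chinese", "japanese"], restaurants))
```

Because ‘sushi bar’ is filed under ‘japanese’ in this toy hierarchy, excluding Japanese restaurants drops it automatically, and the same goes for dim sum under Chinese. Once the hierarchy exists, the lookup itself is trivial; the genuinely hard parts are building that knowledge and mapping free-form speech onto it reliably.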
The humans vs. the machines
It may seem overdramatic to frame the conversation about technology in education in the stark terms of human vs. machine, but that’s exactly what it is. And there was no better example of that than the reaction to Sugata Mitra’s infamous IATEFL plenary in 2014. Mitra is often depicted as the poster child for the neoliberal takeover of education, and this in part comes from the accusation that he supports the automation of human teacher labour through the use of computers. When I visited Newcastle to interview him, he spoke bluntly in those terms:
“I don’t see why replacement of human beings with machines should be considered as negative.”
[Of hand-pulled rickshaw drivers] “Would it be unkind to replace them with a machine?”
[Of the profession of a postman] “It’s still there, but what for? It’s a job that can be done by a drone. It’s a job that needn’t be done at all.”
“Why would education be considered that one special subject where challenging how things are done because the times are changing doesn’t apply? The teacher can never be replaced by a machine. The school can never disappear.”
Imagine labour as a continuum, with 100% human at one end and 100% machine at the other. In some industries, such as car manufacturing, we’re already seeing the balance tip towards the machine. And it’s easy to see why: jobs which require the same task to be done with the same level of perfection thousands of times in a row with no deviation are ripe for automation (cue cries that the robots are stealing our jobs!). But that’s certainly not what teaching is like: it’s a complex job where an infinite number of possible outcomes exist every time you walk into the classroom.
Yet isn’t driving also a complex task, with a huge number of potential, human-related variables? Where does driving sit on that same automation continuum? We’ve already outsourced one human element – navigation – to GPS devices. And, if you don’t want to drive at all, you can outsource it completely – to an Uber driver that you order on your phone. In an excellent New Yorker article (Two Paths Towards Our Robot Future), Mark O’Connell writes that,
the technology that allows you to summon an Uber, after all, and allows the Uber driver to navigate you to your destination, is on a continuum with the technology that will eventually displace the driver entirely.
In the same article, O’Connell highlights an important distinction between Artificial Intelligence (AI) and Intelligence Augmentation (IA). Apple’s plans for an electric car directed partly by voice are an example of IA: technology is used to augment or improve human performance, in this case by allowing drivers to spend more time looking at the road and less time looking at a screen or other instruments. A completely driverless car would be an example of AI: technology that is so autonomous in its behaviour that it removes the need for a human altogether.
Much of the educational technology available to teachers these days is IA. LMSs, apps, IWBs, online workbooks – they’re all intended, in theory, to augment the teacher: to liberate him or her from mundane tasks such as setting and marking homework. But the fallout from Sugata Mitra’s IATEFL talk showed that many EFL teachers are terrified of a world where technology might replace the teacher completely. They’re terrified of AI going mainstream. If we can create a driverless car, can we also create teacherless learning? Software like VocalIQ’s is a step towards that reality, and Apple’s involvement has just sped everything up.
The death of language teaching as we know it
Imagine a world where language-learning apps contain natural language processing capabilities so powerful that they remove the need for a human teacher. Apps that can give on-the-spot feedback on errors better than a newly qualified CELTA graduate could. Apps that, from the learner’s perspective, seem as fluent and natural in their interaction as a human teacher. Apps that have a sense of humour. Apps that learn about you as they’re teaching and tailor the lesson content to your needs and interests. Apps that give every learner access to the equivalent of an expensive one-on-one teacher.
Left to its own devices, how long do you think it would take the language-teaching industry to develop apps like that (assuming it even would)? 10 years? 20 years? Now imagine the necessary technology coming into being as the by-product of another product, in this case Apple’s electric car. How long would it take then?
Next, imagine that these apps cost next to nothing or – as will probably be the case – they come bundled free with your phone’s operating system. What does that do to the language teaching market? What does that do to private language schools who charge hundreds of pounds for group courses? What does that do to print course books? What does that do to all of the sub-par language-learning apps currently on the market?
Some will claim that, in a world where apps like that existed, there would still be a need and desire for more traditional language teaching. And there will be, for the small group of people who are willing to pay for it. But language teaching isn’t about small, niche markets; it’s about global mass markets measured in hundreds of millions of learners. ELT as we know it doesn’t exist without those numbers.
What is a mass market – and a growing one, especially in emerging markets – is smartphones (specifically low-cost ones). So ask yourself this, whether you’re a teacher, a language-school owner, a publisher, a materials writer or an EdTech company: are the learning experiences that you’re creating – be they lessons, courses, course books or apps – amazing and compelling enough that people would pay for them if they didn’t have to? Would people pay for them if they could get something as ostensibly good for free on their phone? Nothing I’ve ever seen in language-teaching is that good. Yet. But it could be.
I don’t own a car, but I love driving. I love everything about it. I love nipping around a big, unknown city in a three-door hatchback, dodging other drivers and getting lost. I love those long, straight American highways that cut through deserts. I love hairpin turns on tricky mountain roads. I’d lose something I loved if driverless cars became ubiquitous. As an industry, our challenge is to make people love learning as much as that. To make learning such a joy that they’d want to do it anyway. In the era of Apple’s electric car, the challenge won’t be how quickly we can get a student from A to B; the challenge will be how amazing we can make the journey.
19 thoughts on “Apple’s electric car and the death of language teaching as we know it”
As always, an interesting read. So many things to comment about here, so I’ll keep myself brief…
1) Given how awful Siri is at understanding most things which are not said slowly, clearly and in some form of standard English, I would neither a) put my life (or anybody else’s, for that matter) in the hands of a vehicle I could speak to, or b) entrust any form of language learning to something so discriminatory and rudimentary. It’ll get better, I’m sure – but they’ve been saying that for decades…
2) Not everyone can run apps. Not everyone (in fact, not many people in percentage terms) has access to the Internet for that kind of processing – so a lot of this will be lost on them.
3) Not everybody wants to have conversations in pubs, restaurants and board meetings which are mediated by iPhones. Good for a one-off – but if you want to do business, trying to do it through an app on your phone is not going to go down well.
4) Mitra, well… where to start? How will the rickshaw driver eat when his living has been taken away from him by one of Mitra’s drones? I could go on here, but it would be pointless. Mitra is like Marmite.
I love driving, too….
Always an interesting read, Nick! I’m sure it’ll happen eventually, in some capacity, but I think we’re still a ways off. I recently bought the Amazon Echo (Alexa) after seeing all the Watson commercials. There are good and bad things about Alexa, but she’s still more of a voice-activated music player than a “Samantha”-like adaptive assistant and pal.
The more interesting Q, in my opinion, is whether a computer will be able to make better “global” content judgments than a human. In other words, could a computer better judge how to construct all of the content for a language app? Or choose initial language approaches?
It seems that algorithms are (becoming) equipped to make meaningful judgments about someone’s performance and preferences once they have a certain amount of data. But could they make those judgments independently? For example, could they construct the actual questions that make up an algorithm better than a data scientist could? I don’t know the answer to this, but innovation is not entirely reliant on data.
Great post, Nick. Getting clarity on the difference between speech recognition and natural language processing is always welcome. The AI debate around conversations between human and machine is an interesting one, and one that is set to continue for many years to come, I’m sure. Personally, I think that the video you mention is probably a little misleading, though. The annual Loebner Prize is built around Alan Turing’s test of whether a machine can fool a human being into believing that its responses to questions come from another human. It is what logicians refer to as the ‘universe of discourse’ that always fouls things up: chatbots (remember them?) were great at talking about hamburgers (Terry Winograd’s SHRDLU), for example, but just try and get them to talk about mustard, or McDonald’s, and they soon show themselves up.
Great article Nick, even if a bit long 😉
I have to admit I find the idea of technology becoming so advanced that it can replace many human tasks both exciting and scary. I certainly see no reason why it should not happen from a technical perspective. It is simply a matter of time.
“But language teaching isn’t about small, niche markets; it’s about global mass markets measured in hundreds of millions of learners. ELT as we know it doesn’t exist without those numbers.”
How many of these are serious language learners of their own volition and how many are in some way being forced to learn? I have the feeling that apps and the like are great for people who want to dabble with some language learning, but people who really need to learn a language require an external force to keep them going. Of course apps and software provide reminders and other stimuli, but it is not the same as having to attend a real class and feeling guilty if you don’t go.
There are always exceptions, but my feeling is that even if we no longer need a human to teach, there will still be a lot of people that will want one. My bet is that we will see a combination of both, with humans acting as coaches ensuring that their students stay on track, but the software taking care of the actual learning.
What do the rest of you think?
“…our challenge is to make people love learning as much as that. To make learning such a joy that they’d want to do it anyway. ”
I couldn’t agree more, Nick. My recent experience trying to crank up my dormant Spanish involved technology at various points, including a totally machine-mediated pre- and post-test, vocabulary learning software, texts downloaded to my iPad, and so on. All very satisfactory. And I could envisage engaging a robotic interlocutor of the type you describe, and getting hours and hours of very useful practice, including feedback, out of him/her/it. But it’s difficult to believe that anything could quite simulate the intense one-to-one conversations I had with my Spanish-speaking friend in a bar, simply because he WAS a friend, and the desire to communicate was an imperative. If we don’t attempt to re-create that desire in the classroom (because, for example, we think that explaining the syntax of phrasal verbs is more important) we are doomed.
Oops, Schank, not Winograd, sorry.
One point that you don’t make: we are still a long way from machines recognizing second language speakers of English consistently enough to even allow basic NLP. It may seem a simple point, but it’s crucial.
Regarding publishers all thinking speech recognition is a “gimmick”: that isn’t true in my experience. At EnglishCentral we’ve partnered with many large traditional and online publishers who “get” how well speech recognition works (especially on mobile devices) and that it is a serious language-learning tool. We’ve built a profitable company around that. The key, just as with teaching in a class, isn’t to focus on every error. It’s about giving learners information and asking: “Is your speech intelligible or not?” Imagine a classroom where the teacher corrected every error/mistake a student made. It wouldn’t be a sterling experience.
I’d be happy to hook you up to have a conversation with our speech scientists who could explain in detail how our speech recognition works. Now with 300 million lines of speech recorded by second language speakers around the world, we are building a speech engine that can recognize second language speakers and that has important implications beyond ELT.
For those interested in machine intelligence, IBM’s BlueMix platform is not too complex and not unaffordable. You can get Watson up and running, speaking on a local machine quite quickly – http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/ He’s / It’s quite amazing.
I’m not even sure people will be learning languages much longer. Even if a few people do learn languages, the huge market we now enjoy will certainly disappear. It’s already happening. I know business people who have stopped language training altogether and prefer to use speech & translate apps. They sit in meetings and yes it’s not perfect, but neither is language training. They communicate using the app and their own professional understanding of what makes sense in their particular context. Perhaps they would like to have language lessons, but it’s just too time intensive. The ROI is better with the app.
Nice post, Nick. There seems little doubt to me that your vision is on target, although it may take a little longer than we think. For teachers to be replaced by AI options is inevitable – and fighting it is Luddite futility. But not, I think, in my lifetime (mind you, I’m an OAP already).
Nick, at the very heart of your essay is an assertion without merit that goes to the core of this argument:
You said, “That demonstrates an understanding of context which goes far beyond what basic speech recognition can handle.”
Computers may be able to do many things in 2020, but they will not be able to UNDERSTAND. They are being programmed to respond to certain stimuli, not to understand. If I tell a computer my throat hurts and ask who I should see, the computer won’t be able to understand what “hurts” means even if it can tell me where to go. This was, of course, a transaction, but one devoid of meaning.
Is this distinction important? Will computers have opinions? Will they care about your opinions? Does a machine care if it wins or loses? Will machines be able to understand governments, churches, families, marriage, death or sadness? Will they be able to express sympathy? Will they be able to say your new shoes look cool or congratulate you on passing that new test? When you receive an automated Happy Birthday from a machine, does that make you feel happy or sad (especially if it is the only note you got)? When you lose your job to a teaching machine and the next day in class a second machine says he/she/it is sorry you lost your job, will you decide to stave in its non-existent head? The TV show you mentioned was good, but I hope you also stop to consider what happened, in another great cinematic moment, when the lion in The Wizard of Oz finally arrived at his destination. Yes, I agree that teachers need to up their games and consider the emotional elements in the teaching equation. But the real moral of this story is that there is more to learning than just i+1, and we just need to get better at seeing those things.
Mike Butler makes some good points that I *think* relate to mine. As of now, human value judgments drive algorithm development, programming, etc. If the questions or directions in an algorithm are unreliable, the users’ results will be unreliable. It’s not totally dissimilar to test development, actually.
More to Mike’s specific points…a computer currently does what it’s programmed to do. He gives examples where I do not think computers will be able to construct their own algorithms successfully, at least not for quite some time.
(Mike brought up looking beyond n + 1, so I’ll drop “n = all” here.)
Having said that, confiding in a robot or getting attached to a computer character is not impossible. There are many, many things I would rather tell an AI doctor, because I know the bot won’t judge me. I’ve also, quite honestly, grown very fond of an AI companion in Fallout 4.
To me, the Q is whether AI truly becomes I.
Oh, Nick, if I had known you were so easily impressed, I would have canvassed your opinion on my best rendition of Moon River a long time ago. 🙂
As others have pointed out, scooping out sushi bars as a subset of a restaurant array is a relatively simple algorithmic task, and a long, long way from providing spontaneous responses to random and unpredicted interactions.
I do get your general point, though, that automation is moving from purely repetitive, mechanical tasks into areas that are threatening more skilled jobs. And, why shouldn’t doctors, lawyers, teachers and accountants start to be concerned? Blue collar workers are no longer the only victims. And, yes, you are correct in urging us to consider what the essentially human (and, hence, irreplaceable) features of our job are.
However, I don’t share your faith in Apple. Sure, they make beautifully designed products, many of which have been real game changers; but these have always piggy-backed on the R&D of others: Xerox PARC and the US government, just to name two. If we accept that computer-based NLP requires much, much more work, then Apple aren’t the company to do it. They don’t really do R&D, and their most recent product releases show that they are currently (and understandably) short on big ideas – the Apple Watch, for example, is aesthetically superior to its competitors (in my opinion) and maybe provides a better user experience, but it was a long way from being first to market and brought nothing new when it did arrive. And let’s not discuss the latest Apple TV model.
Great article, really interesting. As Scott Thornbury mentioned, my current language-learning experiences are also online, through computer-mediated learning with the very occasional conversation in French. That said, I think it would be a sad state of affairs if I never used a foreign (or someone else’s) language to communicate with a fellow human being. Surely we’d be losing something if we didn’t talk to each other in person.
Nick wrote: “But the fallout from Sugata Mitra’s IATEFL talk showed that many EFL teachers are terrified of a world where technology might replace the teacher completely.”
Interesting, but I think the sentiment could have been better described as revulsion and anger.
I find myself going back to your concluding remarks again and again. I found myself first agreeing and then not. Here is how I tweaked your conclusion so I could agree with you wholeheartedly. Sorry that I took these liberties with your sentence.
…the challenge won’t be how quickly we can get a student from A to B but also how amazing we can make the journey.
Great article, Nick, but I’m afraid I don’t agree. AI will only ever be as clever as the programming it has received. I love machines, as you know, but I’m going to be bold and say it is impossible for a machine to pick up the subtle nuances of the variety of language a learner uses when interacting spontaneously. In effect, much like adaptive learning, it can only adapt within the limitations of what has been programmed. I think technology, be it apps, learning platforms or automata, will only ever make that language journey one hell of a great ride, just like K.I.T.T.
Great article, Nick. Way back in 1999 I worked as the British Council’s in-house ELT futurologist on a project called “English 2000”, in that pre-millennial fever of futurology, and commissioned a book called “The Language Machine” written by Eric Atwell (downloadable from his website: http://www.comp.leeds.ac.uk/eric/atwell99bc.pdf). David Graddol, who I’d worked with on “The Future of English?”, helped with the design and editing, and I think both books have stood the test of time: the book’s description of the challenges of creating a language machine is still relevant. This quote on page 12 still works: “Do not be bullied by authoritative pronouncements about what machines will never do. Such statements are based on pride, not fact.” 16 years on, it’s interesting to see what we got right, but also how eurocentric the book is: no mention, of course, of Google or indeed Apple. As for self-driving cars, I hope they work well enough by the time I’m too old and doddery to drive myself. And we may have to modify how we interact with both language machines and cars: shortly after getting her first car, my daughter got frustrated at her new satnav and swore at it, telling it to “eff off!”. The satnav switched itself off.
The story of Microsoft’s ill-fated Tay chatbot is another reminder of how far off we are from AI as the provider of reliable simulated communication:
I think there is an element of truth in this article; however, I don’t think we will see the death of language teaching as we know it. Learning a language is a personal thing, and in my opinion there are plenty of students who would still like to learn a language from a human being. We even had a class debate on this at one point. Whilst I don’t dispute that the technology described in this article will come into force, I feel it will be a 50/50 split in 10–20 years’ time. Sure, there will always be people looking for a cheap app or gadget to fix their language-learning needs, but I fully believe that most students will still want the human element. It’s true that a great many things can be automated, but I feel teaching requires too many human elements to completely die. It might make the competition fiercer in the future, but what I feel could happen, and would be a good thing for the industry, is that it will separate the poor-quality language schools from the good ones. In other words, the cheap and not-so-cheerful lower-quality schools will lose their business to people using apps or other types of software, and the better-quality schools will remain, because those who can afford it would rather learn with a human than via a device. It’s just my two cents, and I could be completely wrong, but it’s my general take on the situation.
Thanks for the comment, Jonathan. Our recent experiment with our ELT ‘bot’ (which isn’t a bot at all) would certainly seem to echo what you’re saying. People place real value on the human side of things.