The man sits down in front of the computer and says, affably: "Computer!" Nothing happens. In a now-hear-this tone, the man repeats: "Computer?" Still nothing happens. Puzzled, he picks up the mouse and speaks into it: "Hello, computer?" Beside him, the impatient owner says: "Just use the keyboard." The first man replies, "A keyboard?" Then, slightly annoyed: "How quaint." The scene comes from the 1986 film Star Trek IV, where Scotty, the engineer, and the rest of the crew have flown back in time from the 23rd century; Scotty needs to get some work done on the computer, and, of course, in the 23rd century they all work by voice command, unlike those 1980s' throwbacks. Ha, ha. Yet if the crew were to land 35 years later, in the present day, Scotty would still be just as puzzled at the computer's lack of responsiveness, unless, that is, he picked up one of the latest breed of smartphones, where being able to respond to the human voice has become the new frontier in interaction. Since October, people have been buying and using Apple's new iPhone 4S, which comes with a function called Siri, a "voice-driven assistant" which can take dictation, fix or cancel appointments, send e-mails, start phone calls, search the web and generally do all those things for which you might once have employed a secretary. Siri isn't just a "voice recognition" tool, though it can do that (so you speak some words and it turns them into text, and sends them as an e-mail or a text message). You can also ask it questions such as: "How's the weather looking tomorrow in London?", and it will come back with the forecast for London ("England"). It will do currency conversions or give stock prices. Or try asking it, "Why is the sky blue?", and, after a little thinking, the screen will show an explanation: "The sky's blue colour is a result of the effect of Rayleigh scattering." Lots of people think Siri isn't anything new. We have had voice-dialling on our phones for ages (where you say a name to the phone, link that to a phone number, and when you say it again, it dials it). Google already offers a voice-driven web search app, which it backed with a big poster campaign around London earlier this year, with phonetic spellings of searches you might want to do — "pih-ka-di-lee sur-khus" or "tak-see num-buhz". But experts say that Siri and what it represents might be as subtly revolutionary as the iPhone's multitouch screen was when unveiled in January 2007. That is because Siri isn't just "voice dialling" or "voice recognition" (which tries to turn speech into its text equivalent); it is "natural language understanding" — or NLU, in the lingo. I have been testing an iPhone 4S, and have found lots of uses for Siri: for doing currency and other conversions (square feet to square metres, anyone?) or finding companies' stock values and market valuations (useful in stories) when a web search would distract from writing, and in dodgier parts of town to play songs or playlists (instructed via the headset microphone) without taking the phone out of my pocket. Origins in a US defence project Siri grew out of a huge project inside the Pentagon's Defence Advanced Research Projects Agency (Darpa), those people who previously gave you the internet and, more recently, a scheme to encourage people to develop driverless cars. Siri's parent project, called Calo (Cognitive Assistant that Learns and Organises) had $200 million (Dh734 million) of funding and was the United States' largest-ever artificial intelligence project. In 2007 it was spun out into a separate business; Apple quietly acquired it in 2010, and incorporated it into its new phone. When you ask Siri to do something, it first sends a little audio file of what you said over the air to some Apple servers, which use a voice-recognition system from a company called Nuance to turn the speech in a number of languages and dialects into text. A huge set of Siri servers then processes that to try to work out what your words actually mean. That is the crucial NLU part, which nobody else yet does on a phone. Then an instruction goes back to the phone, telling it to play a song, or do a search (using the data search engine Wolfram Alpha, rather than Google), or compose an e-mail, or a text, or set a reminder, or — boring! — call a number. NLU has been one of the big unsolved computing problems (along with image recognition and "intelligent" machines) for years now, but we are reaching a point where machines are powerful enough to understand what we are telling them. The challenge about NLU is that, first, speech-to-text transcription can be tricky (did he just say, "This computer can wreck a nice beach," or "This computer can recognise speech"?); and second, acting on what has been said demands understanding both of the context and the wider meaning. The demonstration that computers have cracked this, just as their relentless improvement cracked draughts and then chess, came in February this year, when IBM's Watson system competed in the American game show Jeopardy!, where the quizmaster provides a sort-of answer and you have to come up with the question. (For example: "Originally called What's the Question, it got its name when a producer at testing said it needed more jeopardies.") Except Watson didn't compete against just anyone. It was ranged against two humans who between them had scores of wins, and won millions of dollars in prize money from the game. They battled it out over "answers" such as "William Wilkinson's An Account of the Principalities of Wallachia and Moldavia inspired this author's most famous novel." Yes, of course Bram Stoker. Watson, and the humans, answered correctly, but Watson had more points, and won. Watson wasn't competing on absolutely level terms; it was fed the questions as text at the same time as they were read to the other two players. Still, its victory led to plenty of tweeting from people welcoming our new robotic overlords. And it wasn't even connected to the internet; it just relied on a huge database of information stored on its system. "While competing at Jeopardy! is not the end goal," its IBM engineers noted drily, "it is a milestone in demonstrating a capability that no computer today exhibits — the ability to interact with humans in human terms over broad domains of knowledge." And, of course, in understanding what people mean when they say something. Having gained that glory, Watson has been shunted off to work on health-care problems with the addition of Nuance's speech-to-text technology. More interaction such as Watson's is likely because of the growing availability of "cloud computing", where huge amounts of processing power is available ad hoc over the internet. Amazon, for instance, now adds almost as much computing power per day to its cloud computing systems as it used to need for all its site in 2000. Other companies are doing the same. Increasing dependence on NLU That means NLU is starting to seep into our daily lives: We have grown used to computers getting better at understanding us in limited contexts, where you phone an automated system and can pay bills just by reading out your payment card number, or use booking systems that don't have humans. Now, it is spreading wider, being used for "semantic analysis" of Facebook and Twitter postings by companies eager to figure out whether people are saying positive or negative things about them online. Feed in the text, and the computer figures out whether people are happy or annoyed with you. It could even work on your car — no more fiddling with the dashboard. "The key thing is NLU understanding what you mean and what you want," says Neil Grant of Nuance. "Historically, cars have voice elements but historically you had to learn a huge long list of commands. As NLU progresses, you can say what you want in a way that's natural to you." Norman Winarsky, who coordinated the funding for the Siri company, says it is "real AI [artificial intelligence] with real market use", adding that "Apple will enable millions of people to interact with machines with natural language. [Siri] will get things done and this is only the tip of the iceberg. We're talking another technology revolution. A new computing paradigm shift." Francisco Jeronimo, smartphones analyst for the research company IDC, is less sure: "We've had voice recognition on phones for a few years, but it never really took off. There are two reasons for that: There was no brand or company that really pushed it the way that Apple is doing. And there's no other company with a brand like Apple. It was only when Apple launched the new iPhone with Siri that I tried voice search on my Android phone — I'd never tried it before." Sounding a little surprised, he adds: "It worked really well." In fact, what is happening now with NLU, which is spreading far beyond just the iPhone, could just be the beginning of a revolution in how we use computers (particularly smartphones) as big as that which came with the iPhone in 2007, when suddenly multitouch screens (where you can do more than just prod the screen) were de rigueur. It has taken almost five years, but now multitouch screens are everywhere: Hundreds of millions of touchscreen phones and tablets have been sold, and even the next version of Microsoft's Windows due in about a year's time will offer swooshy touchscreen operation. Horace Dediu, who runs the consultancy Asymco, having formerly worked for the mobile-phone maker Nokia, thinks the time is ripe for a new way of interacting with our computers. He points to how Apple drove changes in interaction before: the mouse and windows in 1985, the iPod's click wheel in 2001, and multitouch in 2007. And he also points out that Siri has some interesting similarities with what people thought about the original iPhone touchscreen: "It's not good enough; there are many smart people who are disappointed by it; competitors are dismissive; it does not need a traditional, expensive smartphone to run, but it uses a combination of local and cloud computing to solve the user's problem." Certainly there has been no shortage of people who say that Siri (which Apple meticulously points out is a "beta", or unfinished product) "isn't good enough", because it can't yet deal with every accent, or every possible query. They also say that the whole idea of asking a disembodied computer questions is ridiculous: "You won't catch me saying things into a phone" is typical of the sort of reaction. (An online poll at the website Ars Technica found 36 per cent saying "you won't catch me talking at a machine in public".) Which is odd when you think about it, since phones are designed for talking into, and nobody seems to get embarrassed about announcing that they are on the train to a carriageful of people, or reading their credit-card details out loudly and slowly for all to note. Perhaps there is an element of self-consciousness: We are more concerned with our own feelings about asking questions of a machine than anyone else's about hearing them. The thing about NLU is that people have been expecting it to happen for ages. In 1996 I watched Bill Gates announce that by 2011 we would have computers that could recognise their owners' faces and voices. That is this year! And he is right if you count smartphones as computers, which they are, really, about as powerful as the laptop computers of 2001. The newest Android phones can be unlocked by showing them your face. And the voice thing — well, we are working on it. Not that things are perfect now, either: Siri's servers have already had a number of outages. At Nuance, Grant thinks that will be solved: "Time will sort out the connection problems," he says. Even so, it is not clear whether in our offices we will talk to our computers Scotty-style. Apart from anything, the noise would be maddening. Wouldn't it? Then again, that is probably what people thought in the 19th century about introducing telephones to offices. And we already have offices where people spend the whole time on the phone. We know them as call centres. Just hold on for a few years, Scotty. The computer will hear you soon.
GMT 15:03 2017 Tuesday ,24 October
Second Palestinian mobile provider enters GazaGMT 14:34 2017 Sunday ,15 October
US mobile carriers Sprint, T-Mobile to mergeGMT 16:20 2017 Thursday ,21 September
Google likely to buy stake in Taiwan smartphone maker HTCGMT 09:46 2017 Friday ,15 September
Apple's grand plan in Ireland held up by a forestGMT 14:01 2017 Thursday ,14 September
Saudis urged to report on fellow citizens via mobile appGMT 16:15 2017 Sunday ,03 September
China's Huawei unveils mobile AI assistant at Berlin's IFAGMT 14:32 2017 Monday ,26 June
Russian intelligence says Telegram app used in bombingGMT 14:00 2017 Sunday ,25 June
Dutch invent phone app to stop kids texting on bikesMaintained and developed by Arabs Today Group SAL.
All rights reserved to Arab Today Media Group 2021 ©
Maintained and developed by Arabs Today Group SAL.
All rights reserved to Arab Today Media Group 2021 ©
Send your comments
Your comment as a visitor