Artificial Intelligence (AI) systems built by Alibaba and Microsoft beat humans on a Stanford University reading test
AI systems built by Chinese retail giant Alibaba and by Microsoft have outperformed humans on the Stanford Question Answering Dataset (SQuAD) reading-comprehension test, and Alibaba's program was the first to beat the human score. Using natural-language processing, Alibaba's machine-learning model scored 82.44 on the test's exact-match (EM) metric on January 11, narrowly beating the 82.304 scored by human participants. A day later, Microsoft's AI program also beat the human score, with a result of 82.650. "These kinds of tests are certainly useful benchmarks for how far along the AI journey we may be," said Andrew Pickup, a spokesman for Microsoft. "However, the real benefit of AI is when it is used in harmony with humans," he added.
SQuAD is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 107,785 question-answer pairs on 536 articles, SQuAD is significantly larger than previous reading comprehension datasets.
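Because every answer is a span copied from the passage, a system's prediction can be scored by comparing its text directly with the reference answers. SQuAD reports two such scores: exact match (EM), the share of questions where the normalized prediction equals a reference answer, and a softer token-overlap F1. The sketch below illustrates both in Python. It assumes normalization steps modeled on the SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace). The function names and sample answers are hypothetical, and this is not the official scorer.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace.
    (Assumed to mirror the SQuAD v1.1 script's normalization.)"""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: dict[str, str], references: dict[str, list[str]]) -> tuple[float, float]:
    """Average EM and F1 over all questions, scaled to 0-100 like leaderboard scores.
    A question with no prediction is scored as an empty answer."""
    em_total = f1_total = 0.0
    for qid, refs in references.items():
        pred = predictions.get(qid, "")
        em_total += exact_match(pred, refs)
        f1_total += max(token_f1(pred, ref) for ref in refs)
    n = len(references)
    return 100.0 * em_total / n, 100.0 * f1_total / n


if __name__ == "__main__":
    # Hypothetical answers; q2 has two acceptable reference spans.
    refs = {"q1": ["gravity"], "q2": ["the Norman conquest", "Norman conquest of England"]}
    preds = {"q1": "Gravity.", "q2": "the conquest"}
    em, f1 = evaluate(preds, refs)
    print(f"EM={em:.1f} F1={f1:.2f}")  # EM=50.0 F1=83.33
```

Here "Gravity." counts as an exact match once punctuation and case are normalized, while "the conquest" earns no EM credit but partial F1 credit (precision 1.0, recall 0.5 against "the Norman conquest"). That partial credit is why the two metrics can rank systems differently.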
Last week, Alibaba tested its deep neural network model at various levels, asking the AI to provide exact answers to more than 100,000 questions that make up a quiz considered one of the world's most authoritative machine-reading benchmarks. The model, developed by Alibaba's Institute of Data Science and Technologies, outscored human participants 82.44 to 82.304, the company said. "That means objective questions such as 'What causes rain' can now be answered with high accuracy by machines," Luo Si, chief scientist for natural language processing at the Institute of Data Science and Technologies, said in a statement.
“This is the first time that a machine has outperformed humans on such a test,” Alibaba said in a statement Monday. “The technology underneath can be gradually applied to numerous applications such as customer service, museum tutorials and online responses to medical inquiries from patients, decreasing the need for human input in an unprecedented way.”
Pranav Rajpurkar announced the milestone in a tweet, noting that the machines had so far surpassed humans only on SQuAD's exact-match metric:
A strong start to 2018 with the first model (SLQA+) to exceed human-level performance on @stanfordnlp SQuAD's EM metric! Next challenge: the F1 metric, where humans still lead by ~2.5 points! https://t.co/Uq10Dm2Ss5
— Pranav Rajpurkar (@pranavrajpurkar) January 11, 2018