The world's best artificial intelligence (AI) systems can pass tough exams, write convincingly human essays and chat so fluently that many find their output indistinguishable from people's. What can't they do? Solve simple visual logic puzzles.

In a test consisting of a series of brightly coloured blocks arranged on a screen, most people can spot the connecting patterns. But GPT-4, the most advanced version of the AI system behind the chatbot ChatGPT and the search engine Bing, gets barely one-third of the puzzles right in one category of patterns and as little as 3% correct in another, according to a report by researchers this May (ref. 1).

The team behind the logic puzzles aims to provide a better benchmark for testing the capabilities of AI systems - and to help address a conundrum about large language models (LLMs) such as GPT-4. Tested in one way, they breeze through what once were considered landmark feats of machine intelligence. Tested another way, they seem less impressive, exhibiting glaring blind spots and an inability to reason about abstract concepts.

"People in the field of AI are struggling with how to assess these systems," says Melanie Mitchell, a computer scientist at the Santa Fe Institute in New Mexico, whose team created the logic puzzles (see 'An abstract-thinking test that defeats machines').

In the past two to three years, LLMs have blown previous AI systems out of the water in terms of their ability across multiple tasks. They work simply by generating plausible next words when given an input text, based on the statistical correlations between words in the billions of online sentences they are trained on. For chatbots built on LLMs, there is an extra element: human trainers have provided extensive feedback to tune how the bots respond.

What's striking is the breadth of capabilities that emerges from this autocomplete-like algorithm trained on vast stores of human language. Other AI systems might beat the LLMs at any one task, but they have to be trained on data relevant to a specific problem, and cannot generalize from one task to another.

Broadly speaking, two camps of researchers have opposing views about what is going on under the hood of LLMs, says Tomer Ullman, a cognitive scientist at Harvard University in Cambridge, Massachusetts. Some attribute the algorithms' achievements to glimmers of reasoning or understanding, he says. Others (including Ullman himself and researchers such as Mitchell) are much more cautious.

"There's very good smart people on all sides of this debate," says Ullman. The reason for the split, he says, is a lack of conclusive evidence supporting either opinion. "There's no Geiger counter we can point at something and say 'beep beep beep - yes, intelligent'," Ullman adds.

Tests such as the logic puzzles, which reveal differences between the capabilities of people and AI systems, are a step in the right direction, say researchers on both sides of the debate. Such benchmarks could also help to show what is missing in today's machine-learning systems, and to untangle the ingredients of human intelligence, says Brenden Lake, a cognitive computational scientist at New York University.

Research on how best to test LLMs, and what those tests show, also has a practical point. If LLMs are going to be applied in real-world domains - from medicine to law - it's important to understand the limits of their capabilities, Mitchell says. "We have to understand what they can do and where they fail, so that we can know how to use them in a safe manner."

Is the Turing test dead?

The most famous test of machine intelligence has long been the Turing test, proposed by the British mathematician and computing luminary Alan Turing in 1950, when computers were still in their infancy. Turing suggested an assessment that he called the imitation game (ref. 2). This was a scenario in which human judges hold short, text-based conversations with a hidden computer and an unseen person. Could the judge reliably detect which was the computer? That, Turing suggested, was a question equivalent to 'Can machines think?'.