New open-source platform allows users to evaluate performance of AI-powered chatbots

(A) Contrasting typical static evaluation (Top) with interactive evaluation (Bottom), wherein a human iteratively queries a model and rates the quality of responses. (B) Example subset of the chat interface from CheckMate where users interact with an LLM. The participant can type their query (Lower Left), which is compiled in LaTeX (Lower Right). When ready, the participant can press “Interact” and have their query routed to the model. Credit: Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2318124121

A team of computer scientists, engineers, mathematicians and cognitive scientists, led by the University of Cambridge, have developed an open-source evaluation platform called CheckMate, which allows human users to interact with and evaluate the performance of large language models (LLMs).

The researchers tested CheckMate in an experiment where human participants used three LLMs—InstructGPT, ChatGPT and GPT-4—as assistants for solving undergraduate-level mathematics problems.

The team studied how well LLMs can assist participants in solving problems. Despite a generally positive correlation between a chatbot’s correctness and perceived helpfulness, the researchers also found instances where the LLMs were incorrect, but still useful for the participants. However, certain incorrect LLM outputs were thought to be correct by participants. This was most notable in LLMs optimized for chat.

The researchers suggest that models that communicate uncertainty, respond well to user corrections, and provide a concise rationale for their recommendations make better assistants. Given the current shortcomings of LLMs, human users should verify their outputs carefully.

The results, reported in the Proceedings of the National Academy of Sciences, could be useful both in informing AI literacy training and in helping developers improve LLMs for a wider range of uses.

While LLMs are becoming increasingly powerful, they can also make mistakes and provide incorrect information, which could have negative consequences as these systems become more integrated into our everyday lives.

“LLMs have become wildly popular, and evaluating their performance in a quantitative way is important, but we also need to evaluate how well these systems work with and can support people,” said co-first author Albert Jiang, from Cambridge’s Department of Computer Science and Technology. “We don’t yet have comprehensive ways of evaluating an LLM’s performance when interacting with humans.”

The standard way to evaluate LLMs relies on static pairs of inputs and outputs, which disregards the interactive nature of chatbots and how that changes their usefulness in different scenarios. The researchers developed CheckMate, which is designed for but not limited to applications in mathematics, to address this gap.

“When talking to mathematicians about LLMs, many of them fall into one of two main camps: either they think that LLMs can produce complex mathematical proofs on their own, or that LLMs are incapable of simple arithmetic,” said co-first author Katie Collins from the Department of Engineering. “Of course, the truth is probably somewhere in between, but we wanted to find a way of evaluating which tasks LLMs are suitable for and which they aren’t.”

The researchers recruited 25 mathematicians, from undergraduate students to senior professors, to interact with three different LLMs (InstructGPT, ChatGPT, and GPT-4) and evaluate their performance using CheckMate. Participants worked through undergraduate-level mathematical theorems with the assistance of an LLM and were asked to rate each individual LLM response for correctness and helpfulness. Participants did not know which LLM they were interacting with.
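As a rough illustration of the per-response ratings described above, the sketch below stores one record per rated response and measures how closely correctness and helpfulness move together. The `Rating` schema, the numeric scales, the model labels and all of the values are assumptions made for illustration; none of them are taken from CheckMate or from the study's data.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Rating:
    """One participant rating of a single model response (hypothetical schema)."""
    model: str
    correctness: int  # assumed scale: higher means more correct
    helpfulness: int  # assumed scale: higher means more helpful


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up ratings for illustration only; not data from the study.
ratings = [
    Rating("model_a", 6, 5),
    Rating("model_a", 2, 4),  # rated incorrect, yet still fairly helpful
    Rating("model_b", 5, 6),
    Rating("model_b", 1, 1),
    Rating("model_b", 4, 3),
]

r = pearson(
    [x.correctness for x in ratings],
    [x.helpfulness for x in ratings],
)

# Responses judged incorrect but still helpful (thresholds are arbitrary).
incorrect_but_helpful = [
    x for x in ratings if x.correctness <= 2 and x.helpfulness >= 4
]

print(f"correlation: {r:.2f}, incorrect but helpful: {len(incorrect_but_helpful)}")
```

A positive coefficient alongside a non-empty `incorrect_but_helpful` list mirrors the pattern the researchers report: correctness and helpfulness generally rise together, but not in every case.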

The researchers recorded the sorts of questions participants asked, how participants reacted when presented with a fully or partially incorrect answer, whether and how they attempted to correct the LLM, and whether they asked for clarification. Participants had varying levels of experience with writing effective prompts for LLMs, and this often affected the quality of responses that the LLMs provided.

An example of an effective prompt is “what is the definition of X” (X being a concept in the problem), as chatbots can be very good at retrieving concepts they know of and explaining them to the user.

“One of the things we found is the surprising fallibility of these models,” said Collins. “Sometimes, these LLMs will be really good at higher-level mathematics, and then they’ll fail at something far simpler. It shows that it’s vital to think carefully about how to use LLMs effectively and appropriately.”

However, like the LLMs, the human participants also made mistakes. The researchers asked participants to rate how confident they were in their own ability to solve the problem they were using the LLM for. In cases where participants were less confident in their own abilities, they were more likely to rate incorrect generations by the LLM as correct.

“This kind of gets to a big challenge of evaluating LLMs, because they’re getting so good at generating nice, seemingly correct natural language, that it’s easy to be fooled by their responses,” said Jiang. “It also shows that while human evaluation is useful and important, it’s nuanced, and sometimes it’s wrong. Anyone using an LLM, for any application, should always pay attention to the output and verify it themselves.”

Based on the results from CheckMate, the researchers say that newer generations of LLMs are increasingly able to collaborate helpfully and correctly with human users on undergraduate-level math problems, as long as the user can assess the correctness of LLM-generated responses.

Even if the answers may be memorized and can be found somewhere on the internet, LLMs have the advantage over traditional search engines of being flexible in their inputs and outputs (though they should not replace search engines in their current form).

While CheckMate was tested on mathematical problems, the researchers say their platform could be adapted to a wide range of fields. In the future, this type of feedback could be incorporated into the LLMs themselves, although none of the CheckMate feedback from the current study has been fed back into the models.

“These kinds of tools can help the research community to have a better understanding of the strengths and weaknesses of these models,” said Collins. “We wouldn’t use them as tools to solve complex mathematical problems on their own, but they can be useful assistants if the users know how to take advantage of them.”

More information:
Katherine M. Collins et al, Evaluating language models for mathematics through interactions, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2318124121

Citation:
New open-source platform allows users to evaluate performance of AI-powered chatbots (2024, June 4)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-source-platform-users-ai-powered.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.






An eerie ‘digital afterlife’ is no longer science fiction. So how do we navigate the risks?

Credit: Pixabay/CC0 Public Domain

Imagine a future where your phone pings with a message that your dead father’s “digital immortal” bot is ready. This promise of chatting with a virtual version of your loved one—perhaps through a virtual reality (VR) headset—is like stepping into a sci-fi movie, both thrilling and a bit eerie.

As you interact with this digital dad, you find yourself on an emotional rollercoaster. You uncover secrets and stories you never knew, changing how you remember the real person.

This is not a distant, hypothetical scenario. The digital afterlife industry is rapidly evolving. Several companies promise to create virtual reconstructions of deceased individuals based on their digital footprints.

From artificial intelligence (AI) chatbots and virtual avatars to holograms, this technology offers a strange blend of comfort and disruption. It may pull us into deeply personal experiences that blur the lines between past and present, memory and reality.

As the digital afterlife industry grows, it raises significant ethical and emotional challenges. These include concerns about consent, privacy and the psychological impact on the living.

What is the digital afterlife industry?

VR and AI technologies are making virtual reconstructions of our loved ones possible. Companies in this niche industry use data from social media posts, emails, text messages and voice recordings to create digital personas that can interact with the living.

Although still niche, the number of players in the digital afterlife industry is growing.

HereAfter allows users to record stories and messages during their lifetime, which can then be accessed by loved ones posthumously. MyWishes offers the ability to send pre-scheduled messages after death, maintaining a presence in the lives of the living.

Hanson Robotics has created robotic busts that interact with people using the memories and personality traits of the deceased. Project December grants users access to so-called “deep AI” to engage in text-based conversations with those who have passed away.

Generative AI also plays a crucial role in the digital afterlife industry. These technologies enable the creation of highly realistic and interactive digital personas. But the high level of realism may blur the line between reality and simulation. This may enhance the user experience, but may also cause emotional and psychological distress.

A technology ripe for misuse

Digital afterlife technologies may aid the grieving process by offering continuity and connection with the deceased. Hearing a loved one’s voice or seeing their likeness may provide comfort and help process the loss.

For some of us, these digital immortals could be therapeutic tools. They may help us to preserve positive memories and feel close to loved ones even after they have passed away.

But for others, the emotional impact may be profoundly negative, exacerbating grief rather than alleviating it. AI recreations of loved ones have the potential to cause psychological harm if the bereaved ends up having unwanted interactions with them. It’s essentially being subjected to a “digital haunting.”

Other major issues and ethical concerns surrounding this tech include consent, autonomy and privacy.

For example, the deceased may not have agreed to their data being used for a “digital afterlife.”

There’s also the risk of misuse and data manipulation. Companies could exploit digital immortals for commercial gain, using them to advertise products or services. Digital personas could be altered to convey messages or behaviors the deceased would never have endorsed.

We need regulation

To address concerns around this quickly emerging industry, we need to update our legal frameworks. We need to address issues such as digital estate planning, who inherits the digital personas of the deceased, and digital memory ownership.

The European Union’s General Data Protection Regulation (GDPR) recognizes post-mortem privacy rights, but faces challenges in enforcement.

Social media platforms control deceased users’ data access, often against heirs’ wishes, with clauses like “no right of survivorship” complicating matters. Limited platform practices hinder the GDPR’s effectiveness. Comprehensive protection demands reevaluating contractual rules, aligning with human rights.

The digital afterlife industry offers comfort and memory preservation, but raises ethical and emotional concerns. Implementing thoughtful regulations and ethical guidelines can honor both the living and the dead, to ensure digital immortality enhances our humanity.

What can we do?

Researchers have recommended several ethical guidelines and regulations. Some recommendations include:

  • obtaining informed and documented consent from people, before they die, to the creation of their digital personas
  • age restrictions to protect vulnerable groups
  • clear disclaimers to ensure transparency
  • and strong data privacy and security measures.

Drawing from ethical frameworks in archaeology, a 2018 study has suggested treating digital remains as integral to personhood, proposing regulations to ensure dignity, especially in re-creation services.

Dialogue between policymakers, industry and academics is crucial for developing ethical and regulatory solutions. Providers should also offer ways for users to respectfully terminate their interactions with digital personas.

Through careful, responsible development, we can create a future where digital afterlife technologies meaningfully and respectfully honor our loved ones.

As we navigate this brave new world, it is crucial to balance the benefits of staying connected with our loved ones against the potential risks and ethical dilemmas.

By doing so, we can make sure the digital afterlife industry develops in a way that respects the memory of the deceased and supports the emotional well-being of the living.

Provided by
The Conversation


This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation:
An eerie ‘digital afterlife’ is no longer science fiction. So how do we navigate the risks? (2024, June 24)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-eerie-digital-afterlife-longer-science.html







Virtual and mixed realities converge in new driving simulator

Credit: ACM SIGCHI

Portobello, a new driving simulator developed by researchers at Cornell Tech, blends virtual and mixed realities, enabling both drivers and passengers to see virtual objects overlaid in the real world.

This technology opens up new possibilities for researchers to conduct the same user studies both in the lab and on the road—a novel concept the team calls “platform portability.”

The research team, led by Wendy Ju, associate professor at the Jacobs Technion-Cornell Institute at Cornell Tech, presented their paper, “Portobello: Extended Driving Simulation from the Lab to the Road,” at the ACM Conference on Human Factors in Computing Systems (CHI) in May. The paper earned honorable mention at the conference.

Co-authors included doctoral students Fanjun Bu, Stacey Li, David Goedicke, and Mark Colley; and Gyanendra Sharma, an industrial adviser from Woven by Toyota.







Portobello is an on-road driving simulation system that enables both drivers and passengers to use mixed-reality (XR) headsets. The team’s motivation for developing Portobello stemmed from its work on XR-OOM, an XR driving simulator system. The tool could merge aspects of the physical and digital worlds, but it had limitations.

“While we could stage virtual objects in and around the car—such as in-car virtual displays and virtual dashboards—we had problems staging virtual events relative to objects in the real world, such as a virtual pedestrian crossing on a real crosswalk or having a virtual car stop at real stop signs,” Bu said.

This posed a significant obstacle to conducting meaningful studies, particularly for autonomous driving experiments that require precise staging of objects and events in fixed locations within the environment.

Portobello was conceived to overcome these limitations and anchor on-road driving simulations in the physical world. During the design phase, researchers utilize the Portobello system to generate a precise map of the study environment. Within this map, they can strategically position virtual objects based on real-world elements (placing virtual pedestrians near stop signs, for example). The vehicle operates within the same mapped environment, seamlessly blending simulation and reality.
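The idea of positioning virtual objects relative to mapped real-world elements can be sketched in a few lines. This is not Portobello's implementation: the flat 2D map frame, the `Point` type, the function names and the coordinates are all assumptions chosen to show the general technique of anchoring an object to a landmark and then expressing it in the vehicle's frame of reference.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A 2D position in metres (hypothetical flat map frame)."""
    x: float
    y: float


def place_relative(landmark: Point, offset: Point) -> Point:
    """Anchor a virtual object at a fixed offset from a mapped landmark."""
    return Point(landmark.x + offset.x, landmark.y + offset.y)


def map_to_vehicle(obj: Point, vehicle: Point, heading_rad: float) -> Point:
    """Express a map-frame point in the vehicle frame (x forward, y left).

    heading_rad is the vehicle's heading in the map frame, measured
    counter-clockwise from the map's +x axis.
    """
    dx, dy = obj.x - vehicle.x, obj.y - vehicle.y
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    return Point(c * dx + s * dy, -s * dx + c * dy)


# Illustrative values only: a stop sign 12 m up the map's +y axis and a
# virtual pedestrian staged 2 m in front of it.
stop_sign = Point(0.0, 12.0)
pedestrian = place_relative(stop_sign, Point(0.0, -2.0))

# A vehicle at the map origin, facing along +y.
in_car = map_to_vehicle(pedestrian, Point(0.0, 0.0), math.pi / 2)
print(f"pedestrian is {in_car.x:.1f} m ahead, {in_car.y:.1f} m to the left")
```

Because the pedestrian is stored in map coordinates, it stays fixed next to the stop sign as the vehicle moves; only the vehicle-frame view is recomputed from the current pose.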

With the successful integration of Portobello, the team has not only addressed the limitations of XR-OOM but has also introduced platform portability. This innovation enables researchers to conduct identical studies in both controlled laboratory settings and real-world driving scenarios, enhancing the precision and applicability of their findings.

“Participants treat in-lab simulators as visual approximations of real-world scenarios, almost a performative experience,” Bu said. “However, participants treat on-road simulators as functional approximations. [They] felt more stress in on-road simulators and felt their decisions carried more weight.”

Bu said Portobello could facilitate the “twinning of studies”—running the same study across different environments. This, he said, not only makes findings more realistic, but also helps uncover how other factors might affect the results.

Ju said, “We believe that by going beyond running pristine studies and allowing some variability from real-world to bleed through, research results will be more applicable to real-world settings.”

Hiroshi Yasuda, a human-machine interaction researcher at Toyota Research Institute (TRI), also contributed to the research.

Provided by
Cornell University


Citation:
Virtual and mixed realities converge in new driving simulator (2024, June 20)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-virtual-realities-converge-simulator.html







Future-self chatbot gives users a glimpse of the life ahead of them

“Future You” is an interactive chat platform that allows users to chat with a relatable yet virtual version of their future selves in real time via a large language model that has been personalized based on a pre-intervention survey centered on user future goals and personal qualities. To make the conversation realistic, the system generates an individualized synthetic memory for the user’s future self that contains a backstory for the user at age 60. To increase the believability of the future-self character, the system applies age progress to the user’s portrait. Credit: arXiv (2024). DOI: 10.48550/arxiv.2405.12514

A team of AI researchers with members from several institutions in the U.S. and KASIKORN Labs, in Thailand, has built an AI-based chatbot that allows users to chat with a potential version of their future selves.

The group has published a paper on the arXiv preprint server describing the technology and how it has been received by volunteers who interacted with the system.

As chatbots grow more sophisticated, system builders have begun looking for new ways to use them. In this new effort, the team, based at MIT, built a chatbot that gives users a sense of their own fate by allowing them to chat with a potential future version of themselves.

Prior research has shown that when younger people spend time talking with older people, they often come away with a broader outlook on life and how their own future might unfold. Thinking that young people would benefit even more if they could talk to their future, older selves, the researchers set out to build a system that would mimic such an opportunity.

To create the chatbot, the researchers put together several modules, the first of which involved building a regular chatbot that asked users a series of questions about themselves and the people in their lives. It also asked about their background, their hopes and plans for the future and their vision of an idealized life.

The same chatbot also asked users to submit a current picture of themselves. A separate routine aged the photo, allowing the user to see what they might look like in the distant future.

The second module fed the information from the first module to a separate language module that generated “memories” based on experiences of others that were mixed with some of the events and experiences of the original user.

The third module was the future chatbot. It applied the results of the first two modules as it interacted with the same user, giving future, experience-based answers to questions.
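The three-module flow described above can be outlined as a small pipeline. This is a minimal sketch under stated assumptions, not the researchers' code: the function names, the prompt wording and the `echo_model` stand-in are all hypothetical, and a real system would call an actual language model where the stub is used.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that maps a prompt to generated text.
LanguageModel = Callable[[str], str]


def collect_profile(answers: Dict[str, str]) -> Dict[str, str]:
    """Module 1 (sketch): keep the survey answers the user actually filled in."""
    return {key: value.strip() for key, value in answers.items() if value.strip()}


def generate_memory(profile: Dict[str, str], llm: LanguageModel) -> str:
    """Module 2 (sketch): ask a language model for a synthetic backstory."""
    facts = "; ".join(f"{key}: {value}" for key, value in sorted(profile.items()))
    return llm(f"Write a short backstory for this person at an older age. {facts}")


def future_self_reply(
    question: str, profile: Dict[str, str], memory: str, llm: LanguageModel
) -> str:
    """Module 3 (sketch): answer in the voice of the user's future self."""
    prompt = (
        "You are the user's older self.\n"
        f"Backstory: {memory}\n"
        f"Goals: {profile.get('goals', 'unknown')}\n"
        f"Question: {question}"
    )
    return llm(prompt)


def echo_model(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API access."""
    return f"[model output for a {len(prompt)}-character prompt]"


profile = collect_profile({"goals": " become a teacher ", "hobbies": ""})
memory = generate_memory(profile, echo_model)
reply = future_self_reply("Was it worth it?", profile, memory, echo_model)
print(reply)
```

Passing the model in as a plain callable keeps the three stages independent, so each one can be swapped or tested on its own.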

The research team tested the system using themselves as guinea pigs and then asked 344 volunteers to give it a go as well and report how it went.

The research team found mostly positive results—most users reported feeling more optimistic about their future and more connected to their future selves. And one of the researchers, after a session with the new bot, found himself more aware of the limited amount of time he would have with his parents and began to spend more time with them.

More information:
Pat Pataranutaporn et al, Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity, arXiv (2024). DOI: 10.48550/arxiv.2405.12514

Project: www.media.mit.edu/projects/future-you/overview/

Journal information:
arXiv


© 2024 Science X Network

Citation:
Future-self chatbot gives users a glimpse of the life ahead of them (2024, June 5)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-future-chatbot-users-glimpse-life.html







Apple partners with OpenAI as it unveils ‘Apple Intelligence’

Tim Cook, Apple chief executive officer, speaks during Apple’s annual Worldwide Developers Conference in Cupertino, California.

Apple on Monday unveiled “Apple Intelligence,” its suite of new AI features for its coveted devices—and a partnership with OpenAI—as it seeks to catch up to rivals racing ahead on adopting the white-hot technology.

For months, pressure has been on Apple to persuade doubters on its AI strategy, after Microsoft and Google rolled out products in rapid-fire succession.

But this latest move will take the experience of Apple products “to new heights,” chief executive Tim Cook said as he opened an annual Worldwide Developers Conference at the tech giant’s headquarters in the Silicon Valley city of Cupertino, California.

To help towards that end, Apple has partnered with OpenAI, which ushered in a new era for generative artificial intelligence in 2022 with the arrival of ChatGPT.

OpenAI was “very happy to be partnering with Apple to integrate ChatGPT into their devices later this year! Think you will really like it,” posted the company’s chief Sam Altman on social media.

Apple Intelligence will also be added to a new version of the iOS 18 operating system, similarly unveiled Monday at the week-long conference.

Apple executives stressed privacy safeguards have been built into Apple Intelligence to make its Siri digital assistant and other products smarter, without pilfering user data.

The big challenge for Apple has been how to infuse ChatGPT-style AI—which voraciously feeds off data—into its products without weakening its heavily promoted user privacy and security, according to analysts.

The system “puts powerful generative models right at the core of your iPhone, iPad and Mac,” said Apple senior vice president of software engineering Craig Federighi.

“It draws on your personal context to give you intelligence that’s most helpful and relevant for you, and it protects your privacy at every step.”

But Tesla and SpaceX tycoon Elon Musk lashed out at the partnership, saying the threat to data security will make him ban iPhones at his companies.

“Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river,” Musk said in a post on social media.

Musk is building his own rival to OpenAI, xAI, and is suing the company that he helped found in 2015.

Sam Altman, chief executive officer of OpenAI, attends Apple’s annual Worldwide Developers Conference (WWDC) in Cupertino, California.

Apple Intelligence, which runs only on the company’s in-house technology, will enable users to create their own emojis based on a description in everyday language, or to generate brief summaries of e-mails in the mailbox.

Apple said Siri, its voice assistant, will also get an AI-infused upgrade and will now appear as a pulsating light on the edge of the home screen.

Launched over 12 years ago, Siri has long been seen as a dated feature, overtaken by a new generation of assistants such as GPT-4o, OpenAI’s latest offering.

GPT-4o grabbed the headlines last month when actress Scarlett Johansson accused OpenAI of copying her voice to embody the assistant after she turned down an offer to work with the company.

OpenAI has denied this, but suspended the use of the new voice in its products.

ChatGPT on offer

Under Apple’s deal with OpenAI, users can choose to enhance Siri with ChatGPT on certain requests, Federighi said.

“It sounds like it’s Apple—then if it needs ChatGPT, it offers it to you,” Techsponential analyst Avi Greengart said.

“The implementation is what is special here.”

The partnership with OpenAI was not exclusive, unlike Apple’s landmark tie-up with Google for search, which has drawn the scrutiny of antitrust regulators.

Apple said it expected to announce support for other AI models in the future.

The company founded by Steve Jobs had remained very quiet on AI since the start of the ChatGPT-sparked frenzy, with Apple for a while avoiding the term altogether.

But the pressure became too great, with Wall Street propelling Microsoft past Apple as the world’s biggest company when measured by stock price, largely because of the Windows-maker’s unabashed embrace of AI.

Wall Street investors were not overly impressed by the AI announcements, with Apple’s share price down nearly two percent at the close on Monday.

© 2024 AFP

Citation:
Apple partners with OpenAI as it unveils ‘Apple Intelligence’ (2024, June 10)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-apple-partners-openai-unveils-intelligence.html





