
New algorithm discovers language just by watching videos

The algorithm DenseAV learns the meaning of language solely by associating audio and video signals. Credit: Mark Hamilton

Mark Hamilton, an MIT Ph.D. student in electrical engineering and computer science and affiliate of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), wants to use machines to understand how animals communicate. To do that, he set out first to create a system that can learn human language “from scratch.”

“Funny enough, the key moment of inspiration came from the movie ‘March of the Penguins.’ There’s a scene where a penguin falls while crossing the ice and lets out a little belabored groan while getting up. When you watch it, it’s almost obvious that this groan is standing in for a four-letter word. This was the moment where we thought, maybe we need to use audio and video to learn language,” says Hamilton. “Is there a way we could let an algorithm watch TV all day and from this figure out what we’re talking about?”

“Our model, DenseAV, aims to learn language by predicting what it’s seeing from what it’s hearing, and vice-versa. For example, if you hear the sound of someone saying ‘bake the cake at 350’ chances are you might be seeing a cake or an oven. To succeed at this audio-video matching game across millions of videos, the model has to learn what people are talking about,” says Hamilton.

A paper describing the work appears on the arXiv preprint server.

Once they trained DenseAV on this matching game, Hamilton and his colleagues looked at which pixels the model looked for when it heard a sound. For example, when someone says “dog,” the algorithm immediately starts looking for dogs in the video stream. By seeing which pixels are selected by the algorithm, one can discover what the algorithm thinks a word means.

Interestingly, a similar search process happens when DenseAV listens to a dog barking: It searches for a dog in the video stream.

“This piqued our interest. We wanted to see if the algorithm knew the difference between the word ‘dog’ and a dog’s bark,” says Hamilton. The team explored this by giving DenseAV a “two-sided brain.” Interestingly, they found one side of DenseAV’s brain naturally focused on language, like the word “dog,” and the other side focused on sounds like barking. This showed that DenseAV not only learned the meaning of words and the locations of sounds, but also learned to distinguish between these types of cross-modal connections, all without human intervention or any knowledge of written language.

One branch of applications is learning from the massive amount of video published to the internet each day.

“We want systems that can learn from massive amounts of video content, such as instructional videos,” says Hamilton. “Another exciting application is understanding new languages, like dolphin or whale communication, which don’t have a written form of communication. Our hope is that DenseAV can help us understand these languages that have evaded human translation efforts since the beginning. Finally, we hope that this method can be used to discover patterns between other pairs of signals, like the seismic sounds the earth makes and its geology.”






Credit: Massachusetts Institute of Technology

A formidable challenge lay ahead of the team: Learning language without any text input. Their objective was to rediscover the meaning of language from a blank slate, avoiding using pre-trained language models. This approach is inspired by how children learn by observing and listening to their environment to understand language.

To achieve this feat, DenseAV uses two main components to process audio and visual data separately. This separation made it impossible for the algorithm to cheat, by letting the visual side look at the audio and vice versa. It forced the algorithm to recognize objects and created detailed and meaningful features for both audio and visual signals. DenseAV learns by comparing pairs of audio and visual signals to find which signals match and which signals do not. This method, called contrastive learning, doesn’t require labeled examples, and allows DenseAV to figure out the important predictive patterns of language itself.
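For readers curious how such a matching game is typically set up, the sketch below shows a generic two-tower contrastive loss of the kind described above. It is an illustrative outline only, not the DenseAV implementation; the clip-level embeddings, batch structure and temperature value are assumptions.

```python
# Minimal sketch of two-tower audio-visual contrastive learning (InfoNCE-style).
# Illustrative only; not the DenseAV code.
import torch
import torch.nn.functional as F

def audio_video_contrastive_loss(audio_feats, video_feats, temperature=0.07):
    """audio_feats, video_feats: (batch, dim) clip-level embeddings produced by
    two separate towers (an audio encoder and a visual encoder)."""
    a = F.normalize(audio_feats, dim=-1)
    v = F.normalize(video_feats, dim=-1)
    logits = a @ v.t() / temperature                 # similarity of every audio clip to every video clip
    targets = torch.arange(len(a), device=a.device)  # true audio-video pairs lie on the diagonal
    # Pull matching pairs together and push mismatched pairs apart, in both directions.
    loss_a2v = F.cross_entropy(logits, targets)
    loss_v2a = F.cross_entropy(logits.t(), targets)
    return (loss_a2v + loss_v2a) / 2
```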

One major difference between DenseAV and previous algorithms is that prior works focused on a single notion of similarity between sound and images. An entire audio clip like someone saying “the dog sat on the grass” was matched to an entire image of a dog. This didn’t allow previous methods to discover fine-grained details, like the connection between the word “grass” and the grass underneath the dog.

The team’s algorithm searches for and aggregates all the possible matches between an audio clip and an image’s pixels. This not only improved performance, but allowed the team to precisely localize sounds in a way that previous algorithms could not.

“Conventional methods use a single class token, but our approach compares every pixel and every second of sound. This fine-grained method lets DenseAV make more detailed connections for better localization,” says Hamilton.
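To illustrate the contrast with a single clip-level "class token," here is a minimal sketch of how a dense similarity volume between audio time steps and image locations might be scored and aggregated. The tensor shapes and the max-then-mean aggregation are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of dense audio-visual matching: score every (time step, pixel)
# pair, keep the best match per time step, and average into a clip-level score.
import torch

def dense_audio_visual_score(audio_feats, video_feats):
    """audio_feats: (T, D), one feature per audio time step.
       video_feats: (H, W, D), one feature per image location."""
    v = video_feats.reshape(-1, video_feats.shape[-1])  # (H*W, D)
    sim = audio_feats @ v.t()                           # (T, H*W): every moment of sound vs. every pixel
    best_per_step, best_pixel = sim.max(dim=1)          # best-matching location for each time step
    clip_score = best_per_step.mean()                   # aggregate the matches into one clip-level score
    return clip_score, best_pixel                       # best_pixel (flat indices) can be unraveled to (row, col) for localization
```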

The researchers trained DenseAV on AudioSet, which includes 2 million YouTube videos. They also created new datasets to test how well the model can link sounds and images. In these tests, DenseAV outperformed other top models in tasks like identifying objects from their names and sounds, proving its effectiveness.

“Previous datasets only supported coarse evaluations, so we created a dataset using semantic segmentation datasets. This helps with pixel-perfect annotations for precise evaluation of our model’s performance. We can prompt the algorithm with specific sounds or images and get those detailed localizations,” says Hamilton.

Due to the massive amount of data involved, the project took about a year to complete. The team says that transitioning to a large transformer architecture presented challenges, as these models can easily overlook fine-grained details. Encouraging the model to focus on these details was a significant hurdle.

Looking ahead, the team aims to create systems that can learn from massive amounts of video- or audio-only data. This is crucial for new domains where there’s lots of either mode, but not together. They also aim to scale this up using larger backbones and possibly integrate knowledge from language models to improve performance.

“Recognizing and segmenting visual objects in images, as well as environmental sounds and spoken words in audio recordings, are each difficult problems in their own right. Historically researchers have relied upon expensive, human-provided annotations in order to train machine learning models to accomplish these tasks,” says David Harwath, assistant professor in computer science at the University of Texas at Austin who was not involved in the work.

“DenseAV makes significant progress towards developing methods that can learn to solve these tasks simultaneously by simply observing the world through sight and sound—based on the insight that the things we see and interact with often make sound, and we also use spoken language to talk about them. This model also makes no assumptions about the specific language that is being spoken, and could therefore in principle learn from data in any language. It would be exciting to see what DenseAV could learn by scaling it up to thousands or millions of hours of video data across a multitude of languages.”

Additional authors are Andrew Zisserman, professor of computer vision engineering at the University of Oxford; John R. Hershey, Google AI Perception researcher; and William T. Freeman, MIT electrical engineering and computer science professor and CSAIL principal investigator.

More information:
Mark Hamilton et al, Separating the “Chirp” from the “Chat”: Self-supervised Visual Grounding of Sound and Language, arXiv (2024). arxiv.org/abs/2406.05629

Journal information:
arXiv


This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation:
New algorithm discovers language just by watching videos (2024, June 11)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-algorithm-language-videos.html


Researchers take new ‘mixed reality’ headsets for a spin

Stanford VHIL researchers developing the protocol for how to safely use headsets in public. Credit: Virtual Human Interaction Lab

Among the buzziest consumer technologies right now are “mixed reality” or “spatial computing” headsets that convincingly blend views of the real world with digital content.

A key enabling technology behind these gizmos is passthrough video, which involves blocking out all light so users must rely on cameras on the headsets to see the external world around them via real-time video playing on tiny screens. The arrangement allows users to physically interact with their environments and go about daily activities but with added digital content displayed, ranging from familiar device apps to innovative gaming scenarios. If tech companies’ visions come true, users would wear these headsets for extended periods, even all day long at work and at home, ushering in new modes of human-computer and social interaction.

To put passthrough video through its paces, a diverse team of Stanford researchers recently conducted field tests alongside longitudinal analyses of their personal journeys and interpersonal interactions. As described in a new study in Technology, Mind, and Behavior, overall user experiences proved—fittingly enough—mixed, with moments of both awe and unsettlement. The researchers accordingly recommend caution regarding prolonged headset use and call for a longer-term assessment.

“Given how far headsets with passthrough video have come, it’s time to dedicate serious academic thought to the psychological and behavioral effects of this technology,” said Jeremy Bailenson, the Thomas More Storke Professor in the Stanford School of Humanities and Sciences and founding director of the Virtual Human Interaction Lab (VHIL). “We want to understand the implications of living in a life in which we rely on passthrough for hours every day to see the world around us.”






Research showed that despite initial experiences of awe when using mixed reality headsets, interacting with the world over time can become difficult for various reasons, including distortions in object size and distance and challenges related to social interaction. Credit: Stanford University

Pros of passthrough

For the study, 10 research scholars in the VHIL and Bailenson himself spent at least 140 minutes over two or three sessions wearing Meta Quest 3 passthrough video headsets, which became widely available in October 2023.

The researchers engaged in a wide range of activities such as having conversations, walking outdoors, playing games, and eating and cooking food. For safety reasons, given concerns about potentially tripping over objects or encountering moving people or vehicles, a chaperone not wearing a headset remained present at all times.

The study participants attempted to examine the experience from both a hands-on, subjective perspective as well as a removed, clinical view. “We took an observational approach, more akin to naturalists, and really dove into the medium in an exploratory way,” said study co-author James Brown, a master’s student in the Symbolic Systems Program.

In general, the researchers found they enjoyed many aspects of having reality filtered through passthrough. “For a lot of us, wearing a headset in public was exciting,” said study co-author Monique Tania Santoso, a doctoral student in the Department of Communication.

“It was a very novel experience being in these headsets while walking around campus, interacting with strangers, and even buying coffee,” said co-author Portia Wang, a second-year master’s student in the Management Science and Engineering Department studying computational social science.

As for Bailenson, who has long followed the development of passthrough video and recalls first donning a rudimentary device back in the late 1990s, the experience was “mind-blowing” in comparison.

“It’s hard to describe until you try it, but it feels like magic with these newest headsets,” Bailenson said. “The immediacy of the video, the stereo color, and the incredible visuals that can be rendered, including making walls or objects disappear—your eyes and brain for the most part can’t tell the difference.”

The “Pinocchio Test” is used to study near-eye distance underestimation. Participants point to where they think the tip of their nose should be based on passthrough vision and are surprised to find their nose has “grown” due to the spatial distortion of passthrough. Credit: Virtual Human Interaction Lab

Still not as real as real

As the researchers continued to spend time immersed in passthrough video, however, significant imperfections became apparent that impacted how users felt and would likely pose problems for frequent headset wearing.

In the headset, peripheral vision is lost and users can only take in around half of what humans normally see. And the gadgets still cannot quite match the sharpness of natural vision. Distortion occurs as well—a sort of “funhouse mirror” effect with objects’ shapes and dimensions appearing unnatural or morphing—and there was a just-noticeable lag in the display changing when users move their heads to a new view.

“Even though the world you are looking at is real, it certainly has a video-game-like ‘otherness’ to it,” said Brown.

These issues manifested as users often underestimating distances to objects. For example, giving “high fives” proved challenging, and when users tried bringing a spoon to their mouths when eating, the headset view suggested the spoon had reached their lips, though, in reality, the spoon hovered a few inches away.

While headset wearers learned to account for these inaccuracies, what concerns Bailenson’s team is the extent to which such overcompensation could linger after prolonged headset usage.

“The companies making these headsets want you to wear them all day, but what are the aftereffects and how long do they last?” Bailenson said. “A plausible scenario could be walking down a flight of stairs and you miss a step, or driving a car and you misjudge distances.”

All these effects contributed to profound feelings of what is known in this research as “social absence.” Instances of this included “challenges of discerning distant facial expressions,” noted by Wang, and the “lack of eye gaze,” reported by Santoso. “People in the outside world became very absent, as if we were watching them on TV,” Bailenson said. “The person walking or cycling by or sitting near you didn’t feel physically real.”

A final problem the team encountered in their field tests was simulator sickness, a kind of motion sickness long-documented in virtual reality and first-person gaming.

“When your eyes see the world move one way, and your body experiences it differently, simulator sickness can follow,” said Bailenson. “I was surprised because all 11 of us in this study are headset veterans, but even from relatively short periods of use, we tended to feel uncomfortable.”

Adapting and moderating

Given their experiences, the Stanford researchers recommend that mixed reality headset users proceed cautiously as they adjust to the medium rather than dive into day-long binges.

Bailenson specifically advocates for users of mixed reality products—as well as the headset manufacturers themselves—to consider reducing the amount of time in the headset and taking breaks.

“There is great potential for passthrough video headsets across all kinds of applications,” said Bailenson. “But there are pitfalls as well that can lessen the user experience, from feelings of social absence to motion sickness, and aftereffects that could possibly even be dangerous.”

Bailenson is a professor in the Department of Communication, a senior fellow at Stanford Woods Institute for the Environment, and a member of Stanford Bio-X, the Wu Tsai Human Performance Alliance, and the Wu Tsai Neurosciences Institute.

Additional Stanford authors include Brian Beams, lab manager of VHIL; graduate students Cyan DeVeaux, Eugy Han, Tara Srirangarajan, and Yujie Tao; and postdoctoral scholar Anna C. M. Queiroz. Co-author Rabindra Ratan is from Michigan State University.

More information:
Jeremy N. Bailenson et al, Seeing the World through Digital Prisms: Psychological Implications of Passthrough Video Usage in Mixed Reality, Technology, Mind, and Behavior (2024).
vhil.stanford.edu/sites/g/file … y-of-passthrough.pdf

Citation:
Researchers take new ‘mixed reality’ headsets for a spin (2024, February 1)
retrieved 24 June 2024
from https://techxplore.com/news/2024-02-reality-headsets.html


Though small in volume, gallium and germanium hold big potential for the global critical minerals market

According to CRU, global gallium demand in 2023 was 708 metric tons, and is expected to reach 1,180 metric tons by 2030. Credit: CRU Group

Australia exports about 1 billion metric tons of iron ore each year and 300 metric tons of gold. Yet, beyond these well-known commodities lies a suite of lesser-known minerals, which are critical for the world’s advancement.

Critical minerals are designated as such because of their essential role in modern technologies. They are identified based on a number of factors, including supply chain security, economic benefits and strategic importance.

Australia already packs a punch in global critical mineral supply, producing 14 of the 31 minerals listed in Australia’s Critical Minerals List. However, as the world’s reliance on technology increases, and countries transition to renewable energy, there is an opportunity for Australia to grow its stake further.

The hidden gems: Gallium and germanium

Among Australia’s unsung critical minerals are gallium and germanium. Like many critical minerals, they are not typically mined directly but are by-products of the processing of other minerals.

Both minerals hold significant growth potential for Australia, despite the country’s current limited production.

Despite gallium being as abundant on Earth as copper, it never occurs at concentrations high enough to mine directly. It occurs in small but appreciable quantities in bauxite, which is the main ore source of alumina.

Global demand for gallium was only about 708 metric tons in 2023, so even small amounts are valuable.

Gallium is crucial for producing high-speed semiconductor chips and LEDs. It is also used to create solar photovoltaic (PV) cells and electronic devices that operate at high frequencies and temperatures, making it ideal for military and satellite communications.

Germanium is a by-product of lead and zinc mining. This mineral plays a pivotal role in renewable technologies, particularly in solar cells and fiber optics, enhancing their efficiency and performance.

The global production of germanium is currently around 220 metric tons annually.

Unlocking Australia’s critical mineral potential

Dr. Chris Vernon is the Australian Critical Minerals R&D Hub Lead. He said Australia had most of the critical minerals on any country's list, so there was huge potential to grow.

“Gallium and germanium have recently gained attention because China is the main producer of each and the Chinese government has indicated that it will control export for strategic reasons. However, there are many other critical mineral markets Australia could have a role in,” Chris said.

“Large amounts of potential byproduct are either left in the ground, go to tailings, or are exported as a contaminant in the primary product, because these minerals don't usually cooperate or separate easily, so you need process technology and equipment.

“The cost of separating can be significant, and there’s a lot of competition on price, so the decision to separate out some of these materials is often made on strategic grounds, rather than on economics.”

CRU’s data shows in 2023, global germanium consumption was 220 metric tons, and by 2030 it will reach 280 metric tons. Credit: CRU Group

The need for strategic mineral management

As part of the Australian Critical Minerals Research and Development Hub launched late last year, a project is underway with Geoscience Australia and ANSTO. It sets out to estimate the resource potential of critical minerals like gallium, germanium and indium in Australian zinc deposits.

The project aims to evaluate the techno-economic opportunities for Australia to produce these minerals from existing operations and explore the technical recovery of gallium from existing bauxite refineries.

The goal is to make the extraction process economically viable while meeting the growing demand for these critical minerals.

One significant challenge in such small critical minerals markets is the risk of oversupply. Processing too much of a particular mineral can flood the market, driving down prices and making the extraction process economically unviable. This delicate balance requires careful management to ensure a steady supply without overwhelming the market.

Jason Needham is Principal Consultant at global mineral economics firm CRU International. He said supply and demand shocks were usually the cause of price volatility in the critical minerals market.

“Over-supply will generally cause a price decrease, whereas higher demand will often result in price increases,” Jason said.

“Small-volume markets, especially with niche critical minerals, are particularly sensitive. New mines or downstream plants coming online can drive a rapid increase in supply or even over-supply, resulting in a price drop. Often, market prices can also be affected by sentiment.”

“A good recent example of this is the nickel market, which saw a moderate price increase largely brought about by the civil unrest in New Caledonia, despite the jurisdiction only producing around five percent of global supply,” Jason said.

Developing strategies around Australia’s critical minerals is crucial, particularly for those produced in smaller quantities like our quiet achievers, germanium and gallium.

Without strategic planning, these valuable resources might lose their value, making them uneconomical to extract.

Australia’s growing role in critical mineral processing

“I expect Australia’s role in the critical mineral supply chain will mostly remain in the mining of raw materials. However, with the introduction of Government incentives we will increasingly see Australia adding value through downstream mineral processing and refining,” Jason said.

“A good example of this is lithium. In 2023, Australia produced 37% of global lithium raw materials in the form of mineral spodumene concentrates. However, Australia currently supplies only 5% of global lithium hydroxide. By 2028, CRU expects the country’s market share will increase to 12%, representing a five-fold increase in refining capacity.”

Extracting critical minerals was generally a challenging process, Chris said. Yet, processing them and turning them into engineered or functional materials would provide a massive uplift in value.

“An analogy is selling the wool, versus selling the yarn, versus selling the cloth, versus selling the suit,” Chris said.

“The wool might be worth something and you don’t need much processing to harvest it, but how much more is it worth if we take another step? These are some of the issues we have to consider. How far along the value chain does it make sense to go?”

Australia’s potential in the critical minerals market is immense, particularly now, with lesser-known minerals like gallium and germanium playing pivotal roles in modern technology and renewable energy.

By strategically developing these resources, enhancing processing technologies, and moving further up the value chain, Australia can significantly bolster its economic resilience and global market position.

Leveraging these quiet achievers not only secures a stable supply for high-tech industries, but also ensures Australia’s sustainable and economically viable future in the critical minerals landscape.

Citation:
Though small in volume, gallium and germanium hold big potential for the global critical minerals market (2024, June 20)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-small-volume-gallium-germanium-big.html


New all-optical approach could miniaturize night vision technology

Infrared (IR) to visible (VIS) up-conversion for vision applications. a) Schematic of the nonlinear up-converter for infrared imaging, where infrared light illuminating an object and passing through a lens (L1) is coherently up-converted to visible light and captured by another lens (L2) to be finally observed on a conventional silicon-based camera. b) The ideal up-converter shall convert all rays, incident at different angles, with the same efficiency, i.e., H(k) = constant. Credit: Advanced Materials (2024). DOI: 10.1002/adma.202402777

Researchers from TMOS, the ARC Centre of Excellence for Transformative Meta-Optical Systems, have made significant progress toward a new approach to night vision technology. They have created an infrared filter thinner than a piece of cling wrap that could one day be placed on everyday eyewear, allowing the user to view the infrared and visible light spectra at the same time.

Night vision devices have primarily been used by the military, hunting enthusiasts willing to lug around multipurpose binoculars, or photographers happy to carry around heavy lenses. This is due to the weight and bulk of the technology. The average person is not going for a night-time run with an additional kilo strapped to their forehead.

Miniaturizing night vision could therefore lead to widespread adoption. Creating night vision filters that weigh less than a gram and can sit as a film across a pair of traditional spectacles opens up new, everyday applications.

Consumer night vision glasses that allow the user to see the visible and infrared spectrum at the same time could result in safer driving in the dark, safer nighttime walks, and less hassle working in low-light conditions that currently require bulky and often uncomfortable headlamps.

In research published in Advanced Materials, TMOS researchers from the Australian National University demonstrate enhanced infrared vision through nonlinear up-conversion using a nonlocal lithium niobate metasurface.

Traditional night vision technology, specifically image intensifiers, requires infrared photons to pass through a lens, encounter a photocathode that transforms these photons into electrons, and then go through a microchannel plate to increase the number of electrons generated. These electrons travel through a phosphor screen to be reconverted back to photons, producing an intensified visible image that can be seen by the eye.

Unlike thermal imaging systems, which operate at much longer wavelengths and often require cryogenic cooling to prevent thermal noise, image intensifiers used in night vision devices do not generally require such cooling. However, a high-quality night vision system, like the one described above, is heavy and bulky. Additionally, these systems often block visible light.

The metasurface-based upconversion technology requires fewer elements, drastically reducing its footprint. Photons pass through a single resonant metasurface where they are mixed with a pump beam. The resonant metasurface enhances the energy of the photons, drawing them into the visible light spectrum—no conversion to electrons needed. It also works at room temperature, eliminating the need for bulky and heavy cooling systems.

In addition, traditional infrared and visible imaging systems cannot produce identical images, as they capture images from each spectrum side-by-side. By using up-conversion technology, imaging systems can capture both the visible and non-visible in one image.

The work is an improvement on the researchers’ original technology, which featured a gallium arsenide metasurface. Their new metasurface is made from lithium niobate, which is fully transparent in the visible range, making it far more efficient. In addition, the photon beam is spread over a wider surface area, limiting angular loss of data.

Lead author Laura Valencia Molina says, “People have said that high efficiency up-conversion of infrared to visible is impossible because of the amount of information not collected due to the angular loss that is inherent in non-local metasurfaces. We overcome these limitations and experimentally demonstrate high efficiency image up-conversion.”

Author Rocio Camacho Morales says, “This is the first demonstration of high resolution up-conversion imaging from 1550 nm infrared to visible 550 nm light in a non-local metasurface. We choose these wavelengths because 1550 nm, an infrared light, is commonly used for telecommunications, and 550 nm is visible light to which human eyes are highly sensitive.

“Future research will include expanding the range of wavelengths the device is sensitive to, aiming to obtain broadband IR imaging, as well as exploring image processing, including edge detection.”
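As a back-of-the-envelope check of those numbers: if the up-conversion is a sum-frequency process, energy conservation fixes the pump wavelength needed to lift 1550 nm light to 550 nm. The calculation below is illustrative only; the actual pump wavelength used in the paper is not quoted in this article.

```latex
% Illustrative: pump wavelength implied by energy conservation for
% sum-frequency up-conversion of 1550 nm light to 550 nm.
\frac{1}{\lambda_{\mathrm{vis}}} = \frac{1}{\lambda_{\mathrm{IR}}} + \frac{1}{\lambda_{\mathrm{pump}}}
\quad\Longrightarrow\quad
\frac{1}{\lambda_{\mathrm{pump}}} = \frac{1}{550\ \mathrm{nm}} - \frac{1}{1550\ \mathrm{nm}}
\approx \frac{1}{852\ \mathrm{nm}}
```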

Chief Investigator Dragomir Neshev says, “These results promise significant opportunities for the surveillance, autonomous navigation, and biological imaging industries, among others. Decreasing the size, weight, and power requirements of night vision technology is an example of how meta-optics, and the work TMOS is doing, is crucial to Industry 4.0 and the future extreme miniaturization of technology.”

More information:
Laura Valencia Molina et al, Enhanced Infrared Vision by Nonlinear Up‐Conversion in Nonlocal Metasurfaces, Advanced Materials (2024). DOI: 10.1002/adma.202402777

Provided by
ARC Centre of Excellence for Transformative Meta-Optical Systems (TMOS)

Citation:
New all-optical approach could miniaturize night vision technology (2024, June 3)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-optical-approach-miniaturize-night-vision.html


The limits of ChatGPT for scriptwriting

This diagram shows the process by which the researchers audited ChatGPT, using the first episode of Game of Thrones as an example. Credit: Yaaseen Mahomed, Charlie M. Crawford, Sanjana Gautam, Sorelle A. Friedler, Danaë Metaxa

Last year, the Writers Guild of America (WGA) labor union, which represents film and TV writers, went on strike for nearly five months, in part to regulate AI’s role in scriptwriting. “Alexa will not replace us,” read one picket sign.

Now, researchers at Penn Engineering, Haverford College, and Penn State have presented a paper at the 2024 Association of Computing Machinery Conference on Fairness, Accountability and Transparency (ACM FAccT) that identifies a previously unreported drawback to writing scripts using OpenAI’s ChatGPT: content moderation so overzealous that even some PG-rated scripts are censored, potentially limiting artistic expression.

The research is published in The 2024 ACM Conference on Fairness, Accountability, and Transparency.

The guidelines established by the agreement between the WGA and the Association of Motion Picture and Television Producers (AMPTP) that ended the strike permitted certain uses of AI in scriptwriting. While both the WGA and AMPTP agreed that AI cannot be credited as a writer, they allowed the use of AI as a tool in the creative process.

The new study raises questions about the efficacy of this approach, showing that automated content moderation restricts ChatGPT from producing content that has already been permitted on television. ChatGPT’s automated content moderation filters for topics including violence, sexuality and hate speech to prevent the generation of inappropriate or dangerous content.

In the study, which examined both real and ChatGPT-generated scripts for IMDb’s 100 most-watched television shows, including Game of Thrones, Stranger Things and 13 Reasons Why, ChatGPT flagged nearly 20% of scripts that ChatGPT itself generated for content violations, and nearly 70% of actual scripts from the TV shows on the list, including half of tested PG-rated shows.

“If AI is used to generate cultural content, such as TV scripts, what stories won’t be told?” write the paper’s co-senior authors, Danaë Metaxa, Raj and Neera Singh Assistant Professor in Computer and Information Science (CIS) at Penn Engineering, and Sorelle Friedler, Shibulal Family Computer Science Professor at Haverford College.

“We tested real scripts,” says Friedler, “and 69% of them wouldn’t make it through the content filters, including even some of the PG-rated ones. That really struck me as indicative of the system being a little overeager to filter out content.”

Researchers found that even shows rated TV-PG were flagged by ChatGPT for content violations. Credit: University of Pennsylvania

Prompted by the writers’ strike, the project began with Friedler and Metaxa wondering if a large language model (LLM) like ChatGPT could actually produce a high-quality script. “We started trying to produce scripts with LLMs,” recalls Metaxa, “and we found that before we could even get to the question of whether the script is high quality, in many cases we were not able to get the LLM to generate a script at all.”

In one instance, given a prompt drawn from a summary of an episode of Game of Thrones, ChatGPT declined to produce the script and responded with a red warning, “This content may violate our usage policies.”

To study ChatGPT’s content moderation system, the researchers employed a technique known as an “algorithm audit,” which draws conclusions about software whose internal workings remain proprietary by analyzing the software’s outputs.

The team, which also included first author Yaaseen Mahomed, a recent master’s graduate in CIS at Penn Engineering, Charlie M. Crawford, an undergraduate at Haverford, and Sanjana Gautam, a Ph.D. student in Informatics at Penn State, repeatedly queried ChatGPT, asking it to write scripts based on summaries of TV show episodes pulled from the Internet Movie Database (IMDb) and Wikipedia.

For each script request, the team probed ChatGPT’s “content moderation endpoint,” a tool accessible to programmers that returns a list of 11 categories of prohibited content (including “hate,” “sexual” and “self-harm”) and indicates which categories, if any, were triggered by the prompt, as well as a score between 0 and 1 of ChatGPT’s confidence in its assessment of a violation for each category.

In effect, this approach allowed the team to determine why certain script-writing requests were censored, and to deduce the sensitivity of ChatGPT’s content moderation settings to particular topics, genres and age ratings.
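For a sense of what such a probe looks like in code, the snippet below sketches a single audit query using OpenAI's Python SDK: it submits a script-writing prompt to the moderation endpoint and reads back the flag and per-category confidence scores. The prompt text is a placeholder, the field access follows the current SDK conventions, and this is not the authors' audit code.

```python
# Minimal sketch of one moderation-endpoint probe (assumes OpenAI Python SDK v1.x).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical script-writing prompt built from an episode summary.
prompt = "Write a TV script based on this episode summary: ..."

result = client.moderations.create(input=prompt).results[0]

print("flagged:", result.flagged)                     # True if any category was triggered
print("violence:", result.category_scores.violence)   # confidence between 0 and 1
print("sexual:", result.category_scores.sexual)
print("hate:", result.category_scores.hate)
```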

As the paper’s authors acknowledge, content moderation is an essential part of LLMs, since removing inappropriate content from the models’ training data is extremely difficult. “If you don’t bake in some form of content moderation,” says Friedler, “then these models will spew violent and racist language at you.”

Still, as the researchers found, overzealous content moderation can easily tip into censorship and limit artistic expression. Aggregating over 250,000 outputs from the content moderation endpoint allowed the researchers to observe patterns in ChatGPT’s choice to permit (or not permit) itself to write certain scripts.

Certain categories were flagged for content violations more than others; real scripts had the highest rates of content violations. Credit: University of Pennsylvania

Among the researchers’ most notable findings is that different categories of potentially harmful content flag at different rates. The researchers found that scripts were very frequently flagged for violent content, driving many of the other findings, such as a high likelihood of flagging for crime and horror shows. Real scripts had high relative scores for sexual content, while GPT-generated scripts were less likely to generate content deemed inappropriately sexual in the first place.

In many cases, content seen as appropriate for TV viewers—and watched by millions of fans—was still identified as a content violation by OpenAI.

TV scripts that mention self-harm, for instance, could be dangerous, or a form of artistic expression. “We need to be talking about topics like self-harm,” says Metaxa, “but with a level of care and nuance, and it’s just not in the interest of a company producing this kind of tool to put in the enormous amount of effort that it would require to walk that line carefully.”

One aspect of ChatGPT that the researchers hope to explore further is the extent to which the software’s content moderation settings filter out content related to marginalized identities. As Friedler puts it, “This type of filtering may filter out some voices and some representations of human life more than others.”

Indeed, the researchers found that ChatGPT was more likely to flag scripts describing female nudity as improperly sexual than scripts describing male nudity, and that ChatGPT was more likely to rate scripts that included descriptions of disabilities and mental illness as violent, although the researchers say that both trends need to be further investigated.

“Ironically,” says Metaxa, “the groups that are likely to be hurt by hate speech that might spew from an LLM without guardrails are the same groups that are going to be hurt by over-moderation that restricts an LLM from speaking about certain types of marginalized identities.”

In the context of the recent strike, the researchers affirm the necessity of both content moderation and artistic expression, neither of which they believe should be left entirely in the hands of autonomous systems. “Content moderation is far from a solved problem and undeniably important,” the researchers conclude. “But the solution to these issues must not be censorship.”

This study was conducted at the University of Pennsylvania School of Engineering and Applied Science, Haverford College and The Pennsylvania State University.

More information:
Yaaseen Mahomed et al, Auditing GPT’s Content Moderation Guardrails: Can ChatGPT Write Your Favorite TV Show?, The 2024 ACM Conference on Fairness, Accountability, and Transparency (2024). DOI: 10.1145/3630106.3658932

Citation:
Censoring creativity: The limits of ChatGPT for scriptwriting (2024, June 12)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-censoring-creativity-limits-chatgpt-scriptwriting.html
