AI researchers at Google’s DeepMind, working with colleagues at the University of British Columbia, have announced the development of Genie, an AI-backed application capable of turning a single image into a playable 2D virtual world.
The team has posted a paper outlining the work on the arXiv preprint server and has also posted an announcement page on DeepMind’s research site.
Two-dimensional video games, such as Super Mario Brothers, allow players to maneuver a character across the screen as they proceed through a virtual world. In this new effort, the team at DeepMind has automated the creation of such games: Genie accepts a single image, such as a character in front of an imagined background, and generates the rest of the game from it. This was made possible by training the model on thousands of hours of video from hundreds of 2D video games.
To create Genie, the team first built an AI application that was able to tokenize video frames into millions of parameters that it could use to build new frames. They then added what they describe as a “latent action model” to make predictions about what a given next scene might look like based on the current image.
Next, they added a dynamics model that makes guesses about possible next sequences based on what it learned during the training phase. The result is a series of frames linked together to form what looks like a 2D virtual world.
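To make that pipeline concrete, here is a minimal, illustrative sketch of the three-part structure the researchers describe, using toy, untrained stand-ins; the names, array shapes, and codebook sizes are assumptions for illustration rather than DeepMind’s implementation.

```python
# Toy, untrained stand-in for the three-part structure described above:
# a frame tokenizer, a latent action model, and a dynamics model. All names,
# shapes, and codebook sizes are assumptions for illustration only and are
# not DeepMind's implementation.
import numpy as np

rng = np.random.default_rng(0)

# 1) Frame tokenizer: quantize 8x8 image patches against a codebook
#    (random here, standing in for a trained video tokenizer).
CODEBOOK = rng.normal(size=(256, 64))          # 256 discrete tokens, 64-dim patches

def tokenize(frame):
    """Map a 32x32 grayscale frame to 16 discrete token ids (one per patch)."""
    patches = frame.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3).reshape(16, 64)
    dists = ((patches[:, None, :] - CODEBOOK[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# 2) Latent action model: infer which of a small set of discrete latent actions
#    best explains the change between two consecutive frames (toy heuristic).
N_ACTIONS = 8

def infer_latent_action(tokens_t, tokens_next):
    return int(np.abs(tokens_next - tokens_t).sum()) % N_ACTIONS

# 3) Dynamics model: predict the next frame's tokens from the current tokens
#    and a chosen latent action (a random lookup table stands in for the
#    trained autoregressive model).
TRANSITION = rng.integers(0, 256, size=(N_ACTIONS, 256))

def predict_next_tokens(tokens_t, action):
    return TRANSITION[action, tokens_t]

# During training, the latent action model labels the "action" taken between
# consecutive video frames, so no action annotations are needed.
frame0 = rng.normal(size=(32, 32))
frame1 = frame0 + rng.normal(scale=0.1, size=(32, 32))
print("inferred latent action:", infer_latent_action(tokenize(frame0), tokenize(frame1)))

# At play time, a single starting frame plus a stream of player-chosen latent
# actions yields a rollout of new frames.
tokens = tokenize(frame0)
for step in range(4):
    action = step % N_ACTIONS
    tokens = predict_next_tokens(tokens, action)
    print(f"step {step}: first four tokens {tokens[:4]}")
```

In Genie itself, each component is a large model trained on the video corpus; the sketch only shows how one starting frame and a stream of discrete latent actions produce a rollout of new frames.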
The researchers acknowledge that Genie is still very much a work in progress. It has several limitations not easily seen in the examples provided. It takes a very long time to run, for example—it is approximately 20 to 30 times slower than what the average player would consider normal speed. It also makes a lot of mistakes—it can create unrealistic worlds that are not playable, for example. It is also currently limited in scope—it can only run 16 frames at a time.
Still, the team at DeepMind suggests that Genie demonstrates a new step forward in video game development, allowing users to generate their own games based on their own unique preferences.
More information: Jake Bruce et al, Genie: Generative Interactive Environments, arXiv (2024). DOI: 10.48550/arXiv.2402.15391
Citation: DeepMind demonstrates Genie, an AI app that can generate playable 2D worlds from a single image (2024, March 6), retrieved 24 June 2024 from https://techxplore.com/news/2024-03-deepmind-genie-ai-app-generate.html
How can virtual reality (VR) be experienced haptically, i.e., through the sense of touch? This is one of the fundamental questions that modern VR research is investigating.
The award-winning work of computer scientist André Zenner examines how physical props (technical term: “proxies”) can be used to make objects in virtual environments tangible.
“Of course, you can’t have a proxy for every virtual object; the approach wouldn’t be scalable then. In my dissertation, I therefore thought about what devices might look like that could simulate the physical properties of several different virtual objects as effectively as possible,” explains Zenner, who completed his doctorate at the Saarbrücken Graduate School of Computer Science at Saarland University and is now conducting research at Saarland University and the German Research Center for Artificial Intelligence.
This resulted in the prototypes for two special VR controllers, “Shifty” and “Drag:on.” VR controllers are devices that can be held in the user’s hand to control or manipulate objects in virtual reality using tracking technology.
“Shifty” is a tubular controller in which a movable weight is installed. The weight can be moved along the lengthwise axis by a motor, changing the center of gravity and inertia of the rod.
“In combination with corresponding visualizations in virtual reality, Shifty can be used to create the illusion that a virtual object is getting longer or heavier,” explains Zenner. In experiments, he was able to show that objects are perceived as lighter or smaller when the weight is close to the user’s hand and that, coupled with the corresponding visual input, they are perceived as longer and heavier the further the weight in the rod moves away from the user.
“This is mainly due to changes in the inertia of the controller, as the overall weight does not change,” explains Zenner. The research and development department of gaming giant Sony is already experimenting with this concept and cites Zenner’s work in the development of new VR controllers.
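A back-of-the-envelope calculation makes this point concrete: sliding the internal weight along the rod leaves the total mass unchanged but changes the moment of inertia about the grip with the square of the weight’s distance from the hand. The masses and lengths below are assumed for illustration, not taken from the dissertation.

```python
# Back-of-the-envelope sketch: the total mass of the rod stays constant, but the
# moment of inertia about the grip grows with the square of the weight's distance
# from the hand. All masses and lengths are assumptions for illustration, not
# values from the dissertation.
ROD_MASS = 0.20      # kg, mass of the tube itself (assumed)
ROD_LENGTH = 0.50    # m, length of the tube (assumed)
WEIGHT_MASS = 0.15   # kg, the movable internal weight (assumed)

def moment_of_inertia_about_grip(weight_pos):
    """Moment of inertia (kg*m^2) about the hand end of the rod."""
    rod_term = ROD_MASS * ROD_LENGTH ** 2 / 3.0   # uniform rod rotating about one end
    weight_term = WEIGHT_MASS * weight_pos ** 2   # point mass at distance weight_pos
    return rod_term + weight_term

for pos in (0.05, 0.25, 0.45):
    inertia = moment_of_inertia_about_grip(pos)
    print(f"weight at {pos:.2f} m -> inertia {inertia:.4f} kg*m^2 "
          f"(total mass always {ROD_MASS + WEIGHT_MASS:.2f} kg)")
```

With these assumed values, the inertia the hand has to overcome nearly triples as the weight slides from near the grip to the far end, even though the controller never gets heavier.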
The second controller, “Drag:on,” consists of two flamenco fans that can be unfolded using servomotors, thus increasing the air resistance of the controller. This means that the further the fans are unfolded, the more force the user has to exert to move the controller through the air.
“Coupled with the right visual stimuli, Drag:on can be used to create the impression that the user is holding a small shovel or a large paddle, for example, or that they are pushing a heavy trolley or are twisting a knob that is difficult to turn,” explains Zenner.
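The underlying physics can be sketched with the standard drag equation: the force opposing a swing grows linearly with the unfolded fan area and with the square of the hand’s speed. The areas and drag coefficient below are assumptions for illustration, not values from the paper.

```python
# Rough sketch of the physics behind Drag:on: aerodynamic drag grows linearly
# with the unfolded fan area and with the square of hand speed, so opening the
# fans makes the controller feel harder to swing. The areas and drag coefficient
# are assumptions for illustration, not values from the paper.
AIR_DENSITY = 1.2   # kg/m^3, air at room temperature
DRAG_COEFF = 1.2    # dimensionless, typical for a flat plate moved face-on

def drag_force(area_m2, speed_m_s):
    """Aerodynamic drag force in newtons: F = 0.5 * rho * Cd * A * v^2."""
    return 0.5 * AIR_DENSITY * DRAG_COEFF * area_m2 * speed_m_s ** 2

for area in (0.01, 0.05, 0.10):     # folded, half open, fully open (assumed areas)
    print(f"area {area:.2f} m^2 -> {drag_force(area, 2.0):.2f} N at 2 m/s")
```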
Both controllers are basic research and so-called “proofs of concept.” This means that the prototypes can be used to show in user experiments that different controller states can improve the perception of different VR objects. Still, specific products using this technology are not yet available on the market.
With the controllers, the Saarbrücken-based computer scientist first addressed the so-called “similarity problem.” The aim here is to ensure that virtual and real objects feel as similar as possible. In the second part of his work, he dealt with the so-called “colocation problem,” i.e., the question of how the proxy can be spatially located in real life where the user sees it in virtual reality.
This is particularly challenging as the controllers act as proxies for different virtual objects. Consequently, the user must be given the illusion that they are reaching for various objects, although in reality, they will always grasp the same proxy.
To achieve this, the researcher made use of the already established method of “hand redirection.” As the name suggests, this involves redirecting the movement of the hand in virtual reality so that the user thinks they are reaching to the left, for example, even though they are actually stretching their hand forward.
“We conducted experiments to investigate the point at which users realize that their hand has been redirected. Our results showed that this point was reached quickly, so we thought about how we could better conceal the hand redirection,” says Zenner.
The solution: he tricked the brain by only redirecting the hand when the brain was blind to visual changes—namely during blinking. Together with a student under his supervision, he developed the appropriate software and used the eye trackers built into many VR headsets.
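A minimal sketch of this blink-gated idea is shown below: the offset between the real and the virtual hand is only allowed to grow while the eye tracker reports that the eyes are closed. The flag name, step size, and target offset are hypothetical stand-ins; this is an illustration of the general technique, not Zenner’s actual software.

```python
# Minimal sketch of blink-gated hand redirection. The eye_closed flag, step size,
# and target offset are hypothetical stand-ins, not Zenner's actual software:
# the point is that the offset between real and virtual hand only grows while
# the eye tracker reports a blink.
from dataclasses import dataclass

@dataclass
class RedirectionState:
    offset_x: float = 0.0          # lateral offset currently applied to the virtual hand (m)
    target_offset_x: float = 0.3   # total redirection needed to reach the proxy (assumed, m)

def virtual_hand_x(real_hand_x, eye_closed, state, step=0.05):
    """Return the virtual hand position; grow the offset only while the eyes are closed."""
    if eye_closed and state.offset_x < state.target_offset_x:
        # The visual change happens while the user cannot see it, so it goes unnoticed.
        state.offset_x = min(state.offset_x + step, state.target_offset_x)
    return real_hand_x + state.offset_x

# Example: the user blinks on frames 3-5; the virtual hand drifts only then.
state = RedirectionState()
for frame, blink in enumerate([False, False, False, True, True, True, False, False]):
    print(frame, round(virtual_hand_x(0.0, blink, state), 2))
```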
In controlled studies, the team was then able to show that their new controllers, in combination with hand redirection algorithms, led to more convincing VR perceptions than previously possible.
Provided by Universität des Saarlandes
Citation: Tricking the brain: New dimensions of haptics in virtual reality (2024, June 10), retrieved 24 June 2024 from https://techxplore.com/news/2024-06-brain-dimensions-haptics-virtual-reality.html
Imagine driving through a tunnel in an autonomous vehicle, but unbeknownst to you, a crash has stopped traffic up ahead. Normally, you’d need to rely on the car in front of you to know you should start braking. But what if your vehicle could see around the car ahead and apply the brakes even sooner?
Researchers from MIT and Meta have developed a computer vision technique that could someday enable an autonomous vehicle to do just that.
They have introduced a method that creates physically accurate 3D models of an entire scene, including areas blocked from view, using images from a single camera position. Their technique uses shadows to determine what lies in obstructed portions of the scene.
They call their approach PlatoNeRF, based on Plato’s allegory of the cave, a passage from the Greek philosopher’s “Republic” in which prisoners chained in a cave discern the reality of the outside world based on shadows cast on the cave wall.
By combining lidar (light detection and ranging) technology with machine learning, PlatoNeRF can generate more accurate reconstructions of 3D geometry than some existing AI techniques. Additionally, PlatoNeRF is better at smoothly reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.
In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by enabling a user to model the geometry of a room without the need to walk around taking measurements. It could also help warehouse robots find items in cluttered environments faster.
“Our key idea was taking these two things that have been done in different disciplines before and pulling them together—multibounce lidar and machine learning. It turns out that when you bring these two together, that is when you find a lot of new opportunities to explore and get the best of both worlds,” says Tzofi Klinghoffer, an MIT graduate student in media arts and sciences, affiliate of the MIT Media Lab, and lead author of the paper on PlatoNeRF.
Klinghoffer wrote the paper with his advisor, Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; senior author Rakesh Ranjan, a director of AI research at Meta Reality Labs; as well as Siddharth Somasundaram at MIT, and Xiaoyu Xiang, Yuchen Fan, and Christian Richardt at Meta. The research is being presented at the Conference on Computer Vision and Pattern Recognition, held 17–21 June.
Shedding light on the problem
Reconstructing a full 3D scene from one camera viewpoint is a complex problem.
Some machine-learning approaches employ generative AI models that try to guess what lies in the occluded regions, but these models can hallucinate objects that aren’t really there. Other approaches attempt to infer the shapes of hidden objects using shadows in a color image, but these methods can struggle when shadows are hard to see.
For PlatoNeRF, the MIT researchers built off these approaches using a new sensing modality called single-photon lidar. Lidars map a 3D scene by emitting pulses of light and measuring the time it takes that light to bounce back to the sensor. Because single-photon lidars can detect individual photons, they provide higher-resolution data.
The researchers use a single-photon lidar to illuminate a target point in the scene. Some light bounces off that point and returns directly to the sensor. However, most of the light scatters and bounces off other objects before returning to the sensor. PlatoNeRF relies on these second bounces of light.
By calculating how long it takes light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second bounce of light also contains information about shadows.
The system traces the secondary rays of light—those that bounce off the target point to other points in the scene—to determine which points lie in shadow (due to an absence of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects.
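The geometric core of this step can be sketched as a simple visibility test: for each candidate scene point, check whether the segment from the illuminated point to that point is blocked by an object, and mark the blocked points as shadow. The sphere occluder and coordinates below are made up for illustration and are not PlatoNeRF’s scene or algorithm.

```python
# Minimal geometric sketch of the shadow reasoning described above: a single
# illuminated point casts secondary rays across the floor, and floor points whose
# ray is blocked by an (unknown) object lie in shadow. The sphere occluder and
# all coordinates are made-up stand-ins, not PlatoNeRF's scene or algorithm.
import numpy as np

LIT_POINT = np.array([0.0, 0.0, 1.5])     # point illuminated by the lidar (assumed)
OCCLUDER_C = np.array([1.0, 0.0, 0.5])    # hidden sphere center (unknown to the sensor)
OCCLUDER_R = 0.3

def segment_hits_sphere(p0, p1, center, radius):
    """True if the segment p0->p1 passes through the sphere (i.e. the ray is blocked)."""
    d = p1 - p0
    f = p0 - center
    a, b, c = d @ d, 2 * f @ d, f @ f - radius ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return False
    t1, t2 = (-b - disc ** 0.5) / (2 * a), (-b + disc ** 0.5) / (2 * a)
    return (0.0 <= t1 <= 1.0) or (0.0 <= t2 <= 1.0)

# Sample points on the floor (z = 0) and mark which ones the lit point cannot "see".
for x in np.linspace(-1.0, 3.0, 9):
    floor_pt = np.array([x, 0.0, 0.0])
    shadowed = segment_hits_sphere(LIT_POINT, floor_pt, OCCLUDER_C, OCCLUDER_R)
    print(f"floor x={x:+.1f}: {'shadow' if shadowed else 'lit'}")
# The extent of the shadowed interval constrains where the hidden object can be;
# repeating this for many illuminated points carves out its geometry.
```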
The lidar sequentially illuminates 16 points, capturing multiple images that are used to reconstruct the entire 3D scene.
“Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have a lot of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” Klinghoffer says.
A winning combination
Key to PlatoNeRF is the combination of multibounce lidar with a special type of machine-learning model known as a neural radiance field (NeRF). A NeRF encodes the geometry of a scene into the weights of a neural network, which gives the model a strong ability to interpolate, or estimate, novel views of a scene.
This ability to interpolate also leads to highly accurate scene reconstructions when combined with multibounce lidar, Klinghoffer says.
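For intuition, the sketch below shows the volume-rendering recipe a NeRF relies on when queried along a ray: sample a density field at points along the ray and accumulate the standard transmittance-weighted contributions. Here the “field” is a hard-coded sphere rather than a trained network, so this illustrates only the rendering math, not PlatoNeRF itself.

```python
# Minimal sketch of the NeRF idea referenced above: a scene is represented as a
# function mapping a 3D point to a density, and a ray is rendered by accumulating
# densities along it with the standard volume-rendering weights. The "field" here
# is a hard-coded sphere, not a trained network.
import numpy as np

def density(point):
    """Toy density field: solid inside a sphere of radius 0.5 at the origin."""
    return 20.0 if np.linalg.norm(point) < 0.5 else 0.0

def render_depth(origin, direction, n_samples=64):
    """Expected depth along one ray using volume-rendering weights."""
    ts = np.linspace(0.0, 4.0, n_samples)
    delta = ts[1] - ts[0]
    transmittance, depth = 1.0, 0.0
    for t in ts:
        sigma = density(origin + t * direction)
        alpha = 1.0 - np.exp(-sigma * delta)   # probability the ray stops in this segment
        depth += transmittance * alpha * t
        transmittance *= 1.0 - alpha
    return depth

ray_origin = np.array([0.0, 0.0, -2.0])
ray_dir = np.array([0.0, 0.0, 1.0])            # looking straight at the sphere
print(f"estimated depth along the ray: {render_depth(ray_origin, ray_dir):.2f} (surface at ~1.5)")
```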
“The biggest challenge was figuring out how to combine these two things. We really had to think about the physics of how light is transporting with multibounce lidar and how to model that with machine learning,” he says.
They compared PlatoNeRF to two common alternative methods, one that only uses lidar and the other that only uses a NeRF with a color image.
They found that their method was able to outperform both techniques, especially when the lidar sensor had lower resolution. This would make their approach more practical to deploy in the real world, where lower resolution sensors are common in commercial devices.
“About 15 years ago, our group invented the first camera to ‘see’ around corners, which works by exploiting multiple bounces of light, or ‘echoes of light.’ Those techniques used special lasers and sensors, and used three bounces of light. Since then, lidar technology has become more mainstream, which led to our research on cameras that can see through fog,” Raskar says.
“This new work uses only two bounces of light, which means the signal to noise ratio is very high, and 3D reconstruction quality is impressive.”
In the future, the researchers want to try tracking more than two bounces of light to see how that could improve scene reconstructions. In addition, they are interested in applying more deep learning techniques and combining PlatoNeRF with color image measurements to capture texture information.
“While camera images of shadows have long been studied as a means to 3D reconstruction, this work revisits the problem in the context of lidar, demonstrating significant improvements in the accuracy of reconstructed hidden geometry. The work shows how clever algorithms can enable extraordinary capabilities when combined with ordinary sensors—including the lidar systems that many of us now carry in our pocket,” says David Lindell, an assistant professor in the Department of Computer Science at the University of Toronto, who was not involved with this work.
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.
Citation: Researchers leverage shadows to model 3D scenes, including objects blocked from view (2024, June 18), retrieved 24 June 2024 from https://techxplore.com/news/2024-06-leverage-shadows-3d-scenes-blocked.html
Facial recognition startup Clearview AI reached a settlement Friday in an Illinois lawsuit alleging its massive photographic collection of faces violated the subjects’ privacy rights, a deal that attorneys estimate could be worth more than $50 million.
But the unique agreement gives plaintiffs in the federal suit a share of the company’s potential value, rather than a traditional payout. Attorneys’ fees estimated at $20 million also would come out of the settlement amount.
Judge Sharon Johnson Coleman, of the Northern District of Illinois, gave preliminary approval to the agreement Friday.
The case consolidated lawsuits from around the U.S. filed against Clearview, which pulled photos from social media and elsewhere on the internet to create a database it sold to businesses, individuals and government entities.
The company settled a separate case alleging violation of privacy rights in Illinois in 2022, agreeing to stop selling access to its database to private businesses or individuals. That agreement still allowed Clearview to work with federal agencies and local law enforcement outside Illinois, which has a strict digital privacy law.
Clearview does not admit any liability as part of the latest settlement agreement.
“Clearview AI is pleased to have reached an agreement on this class action settlement,” James Thompson, an attorney representing the company in the suit, said in a written statement Friday.
The lead plaintiffs’ attorney Jon Loevy said the agreement was a “creative solution” necessitated by Clearview’s financial status.
“Clearview did not have anywhere near the cash to pay fair compensation to the class, so we needed to find a creative solution,” Loevy said in a statement. “Under the settlement, the victims whose privacy was breached now get to participate in any upside that is ultimately generated, thereby recapturing to the class to some extent the ownership of their biometrics.”
It’s not clear how many people would be eligible to join the settlement. The agreement language is sweeping, including anyone whose images or data are in the company’s database and who lived in the U.S. starting on July 1, 2017.
A national campaign to notify potential plaintiffs is part of the agreement.
The attorneys for Clearview and the plaintiffs worked with Wayne Andersen, a retired federal judge who now mediates legal cases, to develop the settlement. In court filings presenting the agreement, Andersen bluntly writes that the startup could not have paid any legal judgment if the suit went forward.
“Clearview did not have the funds to pay a multi-million-dollar judgment,” he is quoted in the filing. “Indeed, there was great uncertainty as to whether Clearview would even have enough money to make it through to the end of trial, much less fund a judgment.”
But some privacy advocates and people pursuing other legal action called the agreement a disappointment that won’t change the company’s operations.
Sejal Zota is an attorney and legal director for Just Futures Law, an organization representing plaintiffs in a California suit against the company. Zota said the agreement “legitimizes” Clearview.
“It does not address the root of the problem,” Zota said. “Clearview gets to continue its practice of harvesting and selling people’s faces without their consent, and using them to train its AI tech.”
Citation: Facial recognition startup Clearview AI settles privacy suit (2024, June 22), retrieved 24 June 2024 from https://techxplore.com/news/2024-06-facial-recognition-startup-clearview-ai.html
Using NASA’s first two-way, end-to-end laser relay system, pictures and videos of cherished pets flew through space over laser communications links at a rate of 1.2 gigabits per second—faster than most home internet speeds.
NASA astronauts Randy Bresnik, Christina Koch, and Kjell Lindgren, along with other agency employees, submitted photos and videos of their pets to take a trip to and from the International Space Station.
The transmissions allowed NASA’s SCaN (Space Communications and Navigation) program to showcase the power of laser communications while simultaneously testing out a new networking technique.
“The pet imagery campaign has been rewarding on multiple fronts for the ILLUMA-T, LCRD, and HDTN teams,” said Kevin Coggins, deputy associate administrator and SCaN program manager at NASA Headquarters in Washington. “Not only have they demonstrated how these technologies can play an essential role in enabling NASA’s future science and exploration missions, it also provided a fun opportunity for the teams to ‘picture’ their pets assisting with this innovative demonstration.”
This demonstration was inspired by “Taters the Cat”—an orange cat whose video was transmitted 19 million miles over laser links to the DSOC (Deep Space Optical Communications) payload on the Psyche mission. LCRD, DSOC, and ILLUMA-T are three of NASA’s ongoing laser communications demonstrations to prove out the technology’s viability.
The images and videos started on a computer at a mission operations center in Las Cruces, New Mexico. From there, NASA routed the data to optical ground stations in California and Hawaii. Teams modulated the data onto infrared light signals, or lasers, and sent the signals to NASA’s LCRD (Laser Communications Relay Demonstration) located 22,000 miles above Earth in geosynchronous orbit. LCRD then relayed the data to ILLUMA-T (Integrated LCRD Low Earth Orbit User Modem and Amplifier Terminal), a payload currently mounted on the outside of the space station.
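Some quick, approximate arithmetic puts that relay path in perspective: even routed through geosynchronous orbit, the light travel time is a fraction of a second, and at 1.2 gigabits per second a sizeable video clears the link in a few seconds. The file size and the relay-to-station distance below are rough assumptions for illustration.

```python
# Rough arithmetic on the relay path described above: light travel time from a
# ground station up to LCRD in geosynchronous orbit (about 22,000 miles) and then
# across to ILLUMA-T on the space station. Distances are rough approximations.
SPEED_OF_LIGHT_MI_S = 186_282          # miles per second
GROUND_TO_GEO_MI = 22_000              # ground station to LCRD (from the article)
GEO_TO_ISS_MI = 22_000                 # LCRD down to the space station, roughly similar

one_way_s = (GROUND_TO_GEO_MI + GEO_TO_ISS_MI) / SPEED_OF_LIGHT_MI_S
print(f"one-way light time via the relay: ~{one_way_s * 1000:.0f} ms")

# At the demonstrated 1.2 Gbps, a 600 MB pet video (assumed size) takes only seconds.
video_bits = 600 * 8 * 10**6
print(f"transfer time for a 600 MB file: ~{video_bits / 1.2e9:.1f} s")
```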
Since the beginning of space exploration, NASA missions have relied on radio frequency communications to send data to and from space. Laser communications, also known as optical communications, employ infrared light instead of radio waves to send and receive information.
While both infrared and radio travel at the speed of light, infrared light can transfer more data in a single link, making it more efficient for science data transfer. This is due to infrared light’s tighter wavelength, which can pack more information onto a signal than radio waves can.
This demonstration also allowed NASA to test out another networking technique. When data is transmitted across thousands and even millions of miles in space, the delay and potential for disruption or data loss is significant. To overcome this, NASA developed a suite of communications networking protocols called Delay/Disruption Tolerant Networking, or DTN. The “store-and-forward” process used by DTN allows data to be forwarded as it is received or stored for future transmission if signals become disrupted in space.
To enable DTN at higher data rates, a team at NASA’s Glenn Research Center in Cleveland developed an advanced implementation, HDTN (High-Rate Delay Tolerant Networking). This networking technology acts as a high-speed path for moving data between spacecraft and across communication systems, enabling data transfer at speeds up to four times faster than current DTN technology and allowing high-speed laser communication systems to utilize the “store-and-forward” capability of DTN.
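The “store-and-forward” behavior can be illustrated with a generic relay-node sketch: bundles are forwarded immediately while the outgoing link is up and queued for later delivery when it is down. This is a conceptual illustration only, not NASA’s HDTN implementation.

```python
# Generic sketch of the DTN "store-and-forward" idea described above: a relay node
# forwards bundles immediately while its outgoing link is up and queues them for
# later when the link drops. Conceptual illustration only, not NASA's HDTN code.
from collections import deque

class DtnNode:
    def __init__(self):
        self.storage = deque()    # bundles waiting out a link disruption
        self.delivered = []       # bundles successfully forwarded, in order

    def receive(self, bundle, link_up):
        if link_up:
            self.flush()          # drain anything stored first, preserving order
            self.delivered.append(bundle)
        else:
            self.storage.append(bundle)   # store until the disruption ends

    def flush(self):
        while self.storage:
            self.delivered.append(self.storage.popleft())

node = DtnNode()
for i, link_up in enumerate([True, False, False, True]):
    node.receive(f"pet-photo-{i}", link_up)
print(node.delivered)   # ['pet-photo-0', 'pet-photo-1', 'pet-photo-2', 'pet-photo-3']
```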
The HDTN implementation aggregates data from a variety of different sources, such as discoveries from the scientific instrumentation on the space station, and prepares the data for transmission back to Earth. For the pet photo and video experiment, the content was routed using DTN protocols as it traveled from Earth to LCRD and on to ILLUMA-T on the space station. Once it arrived, an onboard HDTN payload demonstrated its ability to receive and reassemble the data into files.
This optimized implementation of DTN technology aims to enable a variety of communications services for NASA, from improving security through encryption and authentication to providing network routing of 4K high-definition multimedia and more. All of these capabilities are being tested on the space station with ILLUMA-T and LCRD.
As NASA’s Artemis campaign prepares to establish a sustainable presence on and around the moon, SCaN will continue to develop ground-breaking communications technology to bring the scalability, reliability, and performance of the Earth-based internet to space.
Citation: NASA’s laser relay system sends pet imagery to and from Space Station (2024, June 11), retrieved 24 June 2024 from https://techxplore.com/news/2024-06-nasa-laser-relay-pet-imagery.html