Innovative bird eye–inspired camera developed for enhanced object detection

Figure 1. Structures and functions of bird’s eye. (a) Bird vision. (b) Deep central fovea and four types of cones. (c) Foveated vision and tetrachromatic vision. Credit: Science Robotics (2024). DOI: 10.1126/scirobotics.adk6903

The eyes of raptors can accurately perceive prey from kilometers away. Is it possible to model camera technology after birds’ eyes? Researchers have developed a new type of camera that is inspired by the structures and functions of birds’ eyes. A research team led by Prof. Kim Dae-Hyeong at the Center for Nanoparticle Research within the Institute for Basic Science (IBS), in collaboration with Prof. Song Young Min at the Gwangju Institute of Science and Technology (GIST), has developed a perovskite-based camera specializing in object detection.

The work is published in the journal Science Robotics.

The eyes of different organisms in the natural world have evolved and been optimized to suit their habitats. Through countless years of adaptation to high-altitude flight, bird eyes have developed unique structures and visual functions.

In the retina of an animal’s eye, there is a small pit called the fovea that refracts the light entering the eye. Unlike the shallow foveae found in human eyes, bird eyes have deep central foveae, which refract the incoming light to a large extent. The region of the highest cone density lies within the foveae (Figure 1b), allowing the birds to clearly perceive distant objects through magnification (Figure 1c). This specialized vision is known as foveated vision.

While human eyes can only see visible light, bird eyes have four cones that respond to ultraviolet (UV) as well as visible (red, green, blue; RGB) light. This tetrachromatic vision enables birds to acquire abundant visual information and effectively detect target objects in a dynamic environment (Figure 1c).

Figure 2. Bird-eye-inspired camera. (a) Schematic view of bird-eye-inspired camera. (b) Artificial fovea. (c) Schematic of a multispectral image sensor. (d) Multispectral image sensor. Credit: Science Robotics (2024). DOI: 10.1126/scirobotics.adk6903

Inspired by these capabilities, the IBS research team designed a new type of camera that specializes in object detection, incorporating an artificial fovea and a multispectral image sensor that responds to both UV and RGB light (Figure 2a).

First, the researchers fabricated the artificial fovea by mimicking the deep central foveae of birds’ eyes (Figure 2b) and optimized the design through optical simulation. This allows the camera to magnify distant target objects without image distortion.

The team then used perovskite, a material known for its excellent electrical and optical properties, to build the multispectral image sensor. Four types of photodetectors were fabricated from different perovskite materials that absorb different wavelengths, and the sensor was completed by vertically stacking the four photodetectors (Figure 2c and 2d).

The first co-author Dr. Park Jinhong states, “We also developed a new transfer process to vertically stack the photodetectors. By using the perovskite patterning method developed in our previous research, we were able to fabricate the multispectral image sensor that can detect UV and RGB without additional color filters.”

Figure 3. Performance of the bird-eye-inspired camera. (a) Setup for measurement. (b) Bird-eye-inspired camera perceives both the distant object (star) through magnification in the foveal region and nearby objects (triangle, square, circle) in the peripheral region. (c, d) The multispectral image sensor can distinguish UV and RGB light without color filters and capture colored images. Credit: Science Robotics (2024). DOI: 10.1126/scirobotics.adk6903

Conventional cameras that use a zoom lens to magnify objects have the disadvantage of focusing only on the target object and not its surroundings. The bird-eye-inspired camera, by contrast, provides both a magnified view of the foveal region and a surrounding view of the peripheral region (Figure 3a and 3b).

By comparing the two fields of vision, the bird-eye-inspired camera can achieve greater motion detection capabilities than the conventional camera (Figure 3c and 3d). In addition, the camera is more cost-effective and lightweight as it can distinguish UV and RGB light without additional color filters.

The research team verified the object recognition and motion detection capabilities of the developed camera through simulations. In terms of object recognition, the new camera demonstrated a confidence score of 0.76, which is about twice as high as the existing camera system’s confidence score of 0.39. The motion detection rate also increased by 3.6 times compared to the existing camera system, indicating significantly enhanced sensitivity to motion.

“Birds’ eyes have evolved to quickly and accurately detect distant objects while in flight. Our camera can be used in areas that need to detect objects clearly, such as robots and autonomous vehicles. In particular, the camera has great potential for application to drones operating in environments similar to those in which birds live,” remarked Prof. Kim.

This innovative camera technology represents a significant advancement in object detection, offering numerous potential applications across various industries.

More information:
Jinhong Park et al, Avian eye–inspired perovskite artificial vision system for foveated and multispectral imaging, Science Robotics (2024). DOI: 10.1126/scirobotics.adk6903

Citation:
Innovative bird eye–inspired camera developed for enhanced object detection (2024, May 30)
retrieved 24 June 2024
from https://techxplore.com/news/2024-05-bird-eyeinspired-camera.html


A virtual reality pegboard test shows performance does not always match user preference

VR pegboard image and study participant. Credit: Laurent Voisard et al

Virtual hand interactions are one of the most common and useful applications that virtual reality (VR) systems offer users. But, as a new Concordia-led study shows, personal preference remains an important factor in how the technology is applied, regardless of the effect on overall performance.

In a paper presented at the IEEE International Symposium on Mixed and Augmented Reality (ISMAR) in October 2023, the researchers shared their findings from experiments involving participants performing repetitive tasks on a VR-based Purdue Pegboard Test (PPT).

One of the many applications of the PPT is as a therapeutic tool for patients who have suffered neurological damage, such as a stroke. It is designed to improve gross and fine motor skills.

The participants were equipped with a VR headset. They were then instructed to pick up a virtual object and place it in a hole as quickly and as accurately as possible. Variations involved using dominant and non-dominant hands, both hands and assembly tasks.

The tasks were repeated across three separate modes. In the first, the user’s virtual hand was opaque, meaning they could not see through it. In the second, the outline of the user’s hand was visible but the hand itself was transparent. And in the third, the hand disappeared once the peg was picked up.

Metrics such as duration, downtime, movement time, path length, linear velocity, angle and angular velocity were recorded.
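
As a rough illustration of how such metrics can be derived from raw tracking data, the Python sketch below computes duration, path length and mean linear and angular velocities from a sequence of timestamped hand poses. The function name and data layout are hypothetical; the study's own analysis code is not published.

```python
import numpy as np

def kinematic_metrics(timestamps, positions, angles_deg):
    """Hypothetical helper: derive pegboard-style kinematic metrics
    from tracked hand poses.

    timestamps -- (N,) sample times in seconds
    positions  -- (N, 3) hand positions in metres
    angles_deg -- (N,) hand rotation about a chosen axis, in degrees
    """
    dt = np.diff(timestamps)                      # time between samples
    steps = np.diff(positions, axis=0)            # per-sample displacement
    step_len = np.linalg.norm(steps, axis=1)

    linear_velocity = step_len / dt               # m/s per sample
    angular_velocity = np.abs(np.diff(angles_deg)) / dt  # deg/s per sample

    return {
        "duration": timestamps[-1] - timestamps[0],
        "path_length": step_len.sum(),
        "mean_linear_velocity": linear_velocity.mean(),
        "mean_angular_velocity": angular_velocity.mean(),
    }
```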

The opaque hand was found to perform noticeably slower: compared with the invisible-hand visualization, users opened their fingers less widely and completed fewer tasks.

“This is what we hypothesized, because the invisible hand visualization does not occlude the object the participant is holding,” says lead author Laurent Voisard. “The invisible hand gives users more control and lets them see where they are placing their peg better. It also increases motor dexterity when performing movements requiring fine hand movements. This case could be used to create more effective and efficient medical applications in VR.

“But the participants did not all necessarily prefer the invisible hand,” he adds. “In fact, 10 participants said they preferred the transparent hand while seven chose the opaque hand. Seven others selected the invisible hand.”

left to right: Transparent hand, opaque hand, invisible hand. Credit: Laurent Voisard et al

Participants who preferred the transparent hand emphasized that they felt the hands and the environment were easier to perceive at the same time. They also said it was easy to interact with the objects.

Participants who preferred the opaque hand said movements were easier to track and control. Conversely, participants who liked the invisible hand said they found it easier and more comfortable to accomplish the task and to understand when it was completed.

Personalizing home rehab

The researchers say they hope the study can serve as a basis for more research. Potential topics include how VR and PPT can be used therapeutically, and how they can be applied in technical fields such as surgery planning.

“Every individual is different, so they will have different preferences. That is why we recommend giving users the choice of how they visualize their VR experience,” says co-author Anil Ufuk Batmaz, an assistant professor in the Department of Computer Science and Software Engineering at the Gina Cody School of Engineering and Computer Science. Batmaz is also the director of the EXIT Lab.

“One visualization may have better results. But if it is not preferred by users, then they may not use the system at all.”

“The PPT is often used as a diagnostic tool by neurologists for people who have suffered brain injuries or strokes. However, it can also be used for rehabilitation,” notes co-author Marta Kersten-Oertel, an associate professor in the same department and the director of the Applied Perception Lab.

“Studies like ours show the best interaction methods for doing this type of rehabilitation at home in a virtual environment.”

Amal Hatira and Mine Sarac at Kadir Has University in Istanbul, Turkey, also contributed to this study.

More information:
Laurent Voisard et al, Effects of Opaque, Transparent and Invisible Hand Visualization Styles on Motor Dexterity in a Virtual Reality Based Purdue Pegboard Test, 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (2023). DOI: 10.1109/ISMAR59233.2023.00087

Citation:
A virtual reality pegboard test shows performance does not always match user preference (2024, January 30)
retrieved 24 June 2024
from https://techxplore.com/news/2024-01-virtual-reality-pegboard-user.html


Study reveals strategies for effective Industry 4.0 implementation


Constructor University researchers, Prof. Dr.-Ing. Hendro Wicaksono, Linda Angreani and Annas Vijaya, published a study on Industry 4.0 technologies in the Journal of Manufacturing Technology Management.

Their research illustrates how companies can navigate the complexities of integrating advanced technologies, such as automation and the Internet of Things (IoT), into their manufacturing processes.

This research is unique as it is the first to explore the alignment between maturity models and reference architecture models, offering valuable insights for companies striving to enhance their Industry 4.0 adoption strategies.

The study does so by introducing a comprehensive maturity model to assess an industry’s readiness to adopt Industry 4.0, aligned with reference architecture models (RAMs) like RAMI4.0, NIST-SME, IMSA, IVRA, and IIRA, enabling better implementation strategies for companies.

“One of the significant findings is the identification of varied interpretations of Industry 4.0 maturity models within organizations. The research highlights the critical challenge of aligning these models with established RAMs, which is essential for a successful Industry 4.0 transformation,” write Angreani and Vijaya, both research associates under Prof. Hendro Wicaksono at Constructor University.

“Additionally, the study reveals that both maturity models and reference architectures often overlook human and cultural aspects, which are vital for effective implementation.”

More information:
Linda Salma Angreani et al, Enhancing strategy for Industry 4.0 implementation through maturity models and standard reference architectures alignment, Journal of Manufacturing Technology Management (2024). DOI: 10.1108/JMTM-07-2022-0269

Citation:
Study reveals strategies for effective Industry 4.0 implementation (2024, June 20)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-reveals-strategies-effective-industry.html


Japan’s high-tech toilets go global

With their warm seats and precision spray technology, bidet toilets are the norm in Japan.

As Japan plays host to a record influx of tourists, one of the country’s more private attractions—the high-tech toilet—is becoming a must-have in luxury bathrooms worldwide.

With their warm seats and precision spray technology, bidet toilets are the norm in Japan, where more than 80 percent of homes have one, according to a government survey.

Now sales are surging abroad and especially in the United States, led by A-list bidet fans such as Drake, the Kardashians and Alexandria Ocasio-Cortez.

Japanese company TOTO, which pioneered the electric bidets it claims have sparked “a global revolution from wiping to washing”, says overseas revenue for toilets has roughly doubled from 100 billion yen ($673 million) in 2012.

The pandemic was a key driver, bringing a home-renovation boom but also germ-conscious consumers desperate for an alternative to toilet paper after shelves were cleared by panic-buyers.

Senior TOTO executive Shinya Tamura, who oversees international business, told AFP the brand’s growth has been a word-of-mouth success.

When people first learn how the toilets’ water jets work, with pressure and temperature controls, “there’s an image that it’s not pleasant”.

But “we can’t explain how good it is with words. You need to experience it”, Tamura said.

“After a while, most users can’t live without it.”

The company’s international net sales for housing equipment are currently less than a third of those in Japan.

It wants to boost sales in the Americas by 19 percent over two years to “establish a solid position” there and offset less urgent demand in China.

But with more people in the market for a squeaky clean bum, US competitors are challenging TOTO and its Japanese rivals such as Panasonic and LIXIL for their throne.

A staff member makes a toilet at a factory of Japanese toilet manufacturer TOTO in the city of Kitakyushu, Fukuoka Prefecture.

‘Smartest toilet’

At a major tech fair in Las Vegas this year, the marketing manager of US brand Kohler called its Numi 2.0—which takes spoken instructions via an in-built Amazon Alexa—”the smartest toilet that exists”.

Just like top-end Japanese models, the Numi 2.0 has an automatic deodoriser and a motion-activated lid that opens when you enter the bathroom and closes when you leave.

Its spray wand has pulsating and oscillating functions, and users can adjust the warm-air dryer in minute detail.

But such pampering comes at a price: around $8,500 to $10,000, compared to around $500 for more basic bidet seats.

Americans who travel to Japan are often inspired to upgrade their toilet, a salesman at Ardy’s Bath Collection in Beverly Hills told AFP.

“They see it in the airport, and they see it in public restrooms, and they use it, and they’re like, ‘wow, this is great,'” he said.

Bidets are “popular everywhere” but it’s still a “private experience” and “weird to talk about” for some customers.

Although fancy Japanese-style toilets are fast becoming a status symbol, TOTO’s executives have long fought prudishness when trying to expand abroad.

After the US launch of its Washlet bidet in 1986, the firm struggled to place advertisements, and its pop-up event was kicked out of a high-end mall because other stores complained.

Japanese company TOTO says overseas revenue for toilets has roughly doubled from 100 billion yen ($673 million) in 2012.

‘Does it hurt?’

How things have changed in the share-all internet era.

“Why am I nervous? Does it hurt? Is it cold?” 21-year-old Canadian Spencer Barbosa, who has 10 million TikTok followers, said in a clip of her trying a Japanese toilet.

Superstar rapper Drake made a grand public gesture of gifting his friend DJ Khaled luxury TOTO loos in 2022.

And US congresswoman Ocasio-Cortez joked in an Instagram video last year that she was shopping for a bidet after going to Japan because “life will never be the same”.

Funnily enough, when TOTO first began selling bidets—to hospitals in Japan—it imported them from the United States, but users complained that the stream was unstable.

The company was founded in 1917 as a father and son from a wealthy business family tried to bring Western-style ceramic toilets to Japan.

With sewer systems still undeveloped and squat-style toilets common, the business struggled, so they relied on tableware sales until habits began to change after the 1970 World Expo in Osaka, said Junichi Koga, head of TOTO’s history museum.

More than 300 employees helped develop and test the Washlet by specifying their preferred location for the water jet.

Now, worldwide, TOTO has sold 60 million Washlets—featured in episodes of “The Kardashians” and “South Park”, which parodied the company as “TOOTTOOT”.

As the bidet craze grows, even the trepidatious might be converted in time, the Ardy’s salesman said.

He recommends customers put in the necessary electrics when they remodel their bathroom, telling them: “You could always buy it down the line”.

© 2024 AFP

Citation:
Feeling flush: Japan’s high-tech toilets go global (2024, June 2)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-flush-japan-high-tech-toilets.html


New algorithm discovers language just by watching videos

The algorithm DenseAV learns the meaning of language solely by associating audio and video signals. Credit: Mark Hamilton

Mark Hamilton, an MIT Ph.D. student in electrical engineering and computer science and affiliate of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), wants to use machines to understand how animals communicate. To do that, he set out first to create a system that can learn human language “from scratch.”

“Funny enough, the key moment of inspiration came from the movie ‘March of the Penguins.’ There’s a scene where a penguin falls while crossing the ice, and lets out a little belabored groan while getting up. When you watch it, it’s almost obvious that this groan is standing in for a four-letter word. This was the moment where we thought, maybe we need to use audio and video to learn language,” says Hamilton. “Is there a way we could let an algorithm watch TV all day and from this figure out what we’re talking about?”

“Our model, DenseAV, aims to learn language by predicting what it’s seeing from what it’s hearing, and vice-versa. For example, if you hear the sound of someone saying ‘bake the cake at 350’ chances are you might be seeing a cake or an oven. To succeed at this audio-video matching game across millions of videos, the model has to learn what people are talking about,” says Hamilton.

A paper describing the work appears on the arXiv preprint server.

Once they trained DenseAV on this matching game, Hamilton and his colleagues looked at which pixels the model looked for when it heard a sound. For example, when someone says “dog,” the algorithm immediately starts looking for dogs in the video stream. By seeing which pixels are selected by the algorithm, one can discover what the algorithm thinks a word means.

Interestingly, a similar search process happens when DenseAV listens to a dog barking: It searches for a dog in the video stream.

“This piqued our interest. We wanted to see if the algorithm knew the difference between the word ‘dog’ and a dog’s bark,” says Hamilton. The team explored this by giving DenseAV a “two-sided brain.” Interestingly, they found one side of DenseAV’s brain naturally focused on language, like the word “dog,” and the other side focused on sounds like barking. This showed that DenseAV not only learned the meaning of words and the locations of sounds, but also learned to distinguish between these types of cross-modal connections, all without human intervention or any knowledge of written language.

One branch of applications is learning from the massive amount of video published to the internet each day.

“We want systems that can learn from massive amounts of video content, such as instructional videos,” says Hamilton. “Another exciting application is understanding new languages, like dolphin or whale communication, which don’t have a written form of communication. Our hope is that DenseAV can help us understand these languages that have evaded human translation efforts since the beginning. Finally, we hope that this method can be used to discover patterns between other pairs of signals, like the seismic sounds the earth makes and its geology.”






Video credit: Massachusetts Institute of Technology

A formidable challenge lay ahead of the team: learning language without any text input. Their objective was to rediscover the meaning of language from a blank slate, avoiding the use of pre-trained language models. This approach is inspired by how children learn language by observing and listening to their environment.

To achieve this feat, DenseAV uses two main components to process audio and visual data separately. This separation made it impossible for the algorithm to cheat by letting the visual side peek at the audio, or vice versa; it forced the algorithm to recognize objects and to build detailed, meaningful features for both audio and visual signals. DenseAV learns by comparing pairs of audio and visual signals to find which signals match and which do not. This method, called contrastive learning, doesn’t require labeled examples and allows DenseAV to figure out the important predictive patterns of language itself.
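
The article does not include code, but the contrastive idea can be sketched in a few lines of PyTorch: pooled embeddings from independent audio and visual encoders are compared across a batch, and matching pairs are pulled together while mismatched pairs are pushed apart. This is a minimal, clip-level approximation of the training signal, not DenseAV's actual loss.

```python
import torch
import torch.nn.functional as F

def audio_visual_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Minimal sketch of audio-visual contrastive learning
    (an illustrative InfoNCE objective, not DenseAV's exact loss).

    audio_emb, video_emb -- (B, D) pooled embeddings for the same
    batch of clips, produced by separate audio and visual encoders.
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)

    # (B, B) similarity matrix; the diagonal holds the true pairs.
    logits = audio_emb @ video_emb.t() / temperature
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)

    # Symmetric objective: audio-to-video and video-to-audio retrieval.
    loss_a2v = F.cross_entropy(logits, targets)
    loss_v2a = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a2v + loss_v2a)
```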

One major difference between DenseAV and previous algorithms is that prior works focused on a single notion of similarity between sound and images. An entire audio clip like someone saying “the dog sat on the grass” was matched to an entire image of a dog. This didn’t allow previous methods to discover fine-grained details, like the connection between the word “grass” and the grass underneath the dog.

The team’s algorithm searches for and aggregates all the possible matches between an audio clip and an image’s pixels. This not only improved performance, but allowed the team to precisely localize sounds in a way that previous algorithms could not.

“Conventional methods use a single class token, but our approach compares every pixel and every second of sound. This fine-grained method lets DenseAV make more detailed connections for better localization,” says Hamilton.
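
One way to picture this fine-grained comparison is as a similarity volume between every audio time step and every image location, aggregated into a clip-level score. The sketch below illustrates that idea under assumed feature shapes; the particular aggregation (a max over pixels, then a mean over time) is a simplification for illustration, not the paper's exact operator.

```python
import torch

def dense_audio_visual_score(audio_feats, visual_feats):
    """Illustrative dense audio-visual matching score
    (a simplification, not DenseAV's exact formulation).

    audio_feats  -- (B, T, D)    one feature per audio time step
    visual_feats -- (B, H, W, D) one feature per image location
    """
    vis = visual_feats.flatten(1, 2)                      # (B, H*W, D)

    # Compare every second of sound with every pixel.
    sim = torch.einsum("btd,bpd->btp", audio_feats, vis)  # (B, T, H*W)

    # For each time step keep its best-matching location,
    # then average over time to get one score per clip.
    return sim.max(dim=-1).values.mean(dim=-1)            # (B,)
```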

The researchers trained DenseAV on AudioSet, which includes 2 million YouTube videos. They also created new datasets to test how well the model can link sounds and images. In these tests, DenseAV outperformed other top models in tasks like identifying objects from their names and sounds, proving its effectiveness.

“Previous datasets only supported coarse evaluations, so we created a dataset using semantic segmentation datasets. This helps with pixel-perfect annotations for precise evaluation of our model’s performance. We can prompt the algorithm with specific sounds or images and get those detailed localizations,” says Hamilton.
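
Read in that light, the evaluation amounts to comparing a sound- or word-prompted activation map against a ground-truth segmentation mask. A simple way to score such a comparison is intersection-over-union, sketched below; the inputs and threshold are assumptions for illustration, not the paper's exact metric.

```python
import numpy as np

def prompted_localization_iou(heatmap, gt_mask, threshold=0.5):
    """Assumed-style check: IoU between a thresholded activation map
    produced for one prompt and the matching segmentation mask.

    heatmap -- (H, W) activations in [0, 1] for the prompted class
    gt_mask -- (H, W) boolean ground-truth mask for that class
    """
    pred = heatmap >= threshold
    intersection = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return float(intersection) / union if union > 0 else 0.0
```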

Due to the massive amount of data involved, the project took about a year to complete. The team says that transitioning to a large transformer architecture presented challenges, as these models can easily overlook fine-grained details. Encouraging the model to focus on these details was a significant hurdle.

Looking ahead, the team aims to create systems that can learn from massive amounts of video-only or audio-only data. This is crucial for new domains where there is plenty of one modality but not both together. They also aim to scale the approach up using larger backbones, and possibly integrate knowledge from language models to improve performance.

“Recognizing and segmenting visual objects in images, as well as environmental sounds and spoken words in audio recordings, are each difficult problems in their own right. Historically researchers have relied upon expensive, human-provided annotations in order to train machine learning models to accomplish these tasks,” says David Harwath, assistant professor in computer science at the University of Texas at Austin who was not involved in the work.

“DenseAV makes significant progress towards developing methods that can learn to solve these tasks simultaneously by simply observing the world through sight and sound—based on the insight that the things we see and interact with often make sound, and we also use spoken language to talk about them. This model also makes no assumptions about the specific language that is being spoken, and could therefore in principle learn from data in any language. It would be exciting to see what DenseAV could learn by scaling it up to thousands or millions of hours of video data across a multitude of languages.”

Additional authors are Andrew Zisserman, professor of computer vision engineering at the University of Oxford; John R. Hershey, Google AI Perception researcher; and William T. Freeman, MIT electrical engineering and computer science professor and CSAIL principal investigator.

More information:
Mark Hamilton et al, Separating the “Chirp” from the “Chat”: Self-supervised Visual Grounding of Sound and Language, arXiv (2024). arxiv.org/abs/2406.05629

Journal information:
arXiv


This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation:
New algorithm discovers language just by watching videos (2024, June 11)
retrieved 24 June 2024
from https://techxplore.com/news/2024-06-algorithm-language-videos.html
