
This Artificial Intelligence (AI) paper from South Korea proposes FFNeRV: a novel frame-wise video representation using frame-wise flow maps and multi-resolution temporal grids

techsm5

Research on neural fields, which represent signals by mapping coordinates to their quantities (e.g., scalars or vectors) with neural networks, has exploded recently. This has sparked growing interest in using the technique to handle a variety of signals, including audio, images, 3D shapes, and video. The universal approximation theorem and coordinate encoding techniques provide the theoretical basis for accurate signal representation with neural fields. Recent studies have shown their versatility in data compression, generative modeling, signal manipulation, and basic signal representation.
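To make the idea of coordinate encoding concrete, here is a minimal sketch in plain Python. It assumes a sine/cosine (Fourier-feature) encoding, which is one common choice and not necessarily the one used in the paper. The names encode_coordinate and tiny_field are hypothetical, and the single linear layer only stands in for a real MLP.

```python
import math

def encode_coordinate(x, num_frequencies=4):
    """Map a scalar coordinate x in [0, 1] to sin/cos features at doubling frequencies."""
    features = []
    for k in range(num_frequencies):
        angle = (2.0 ** k) * math.pi * x
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def tiny_field(x, weights, bias):
    """Hypothetical one-layer 'field': a linear map over the encoded coordinate.

    A real neural field would use a multi-layer network here.
    """
    feats = encode_coordinate(x, num_frequencies=len(weights) // 2)
    return sum(w * f for w, f in zip(weights, feats)) + bias

# Query the field at a continuous coordinate.
value = tiny_field(0.25, weights=[0.1, -0.2, 0.3, 0.4], bias=0.0)
```

The encoding lets a small network represent high-frequency detail that a raw scalar coordinate input tends to smooth over.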

Figure 1 shows the general structure of (a) pixel-wise video representations, (b) frame-wise video representations, and (c) the proposed flow-guided frame-wise representations (FFNeRV)


In frame-wise representations such as NeRV, each time coordinate is mapped to a video frame generated by a stack of MLP and convolutional layers. Compared to the basic neural field design, this approach significantly reduced encoding time and outperformed common video compression techniques. The recently proposed E-NeRV follows this paradigm while further improving video quality. As shown in Figure 1, the authors propose flow-guided frame-wise neural representations for videos (FFNeRV). Drawing inspiration from common video codecs, they embed optical flows into the frame-wise representation to exploit temporal redundancy. By aggregating nearby frames guided by flows, FFNeRV generates a video frame in a way that encourages the reuse of pixels from those frames. Encouraging the network to avoid storing the same pixel values again from frame to frame greatly improves parameter efficiency.
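The flow-guided aggregation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: frames are small 2D lists of grayscale values, each flow map stores a hypothetical (dx, dy) offset per pixel, warping is done by backward bilinear sampling with border clamping, and the blend uses one scalar weight per neighboring frame. The function names are hypothetical.

```python
import math

def bilinear_sample(frame, x, y):
    """Sample a 2D frame (list of rows) at a fractional position, clamping to the border."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
    bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp(frame, flow):
    """Backward warp: each output pixel reads the reference frame at (x + dx, y + dy)."""
    h, w = len(frame), len(frame[0])
    return [[bilinear_sample(frame, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)] for y in range(h)]

def aggregate(frames, flows, weights):
    """Blend warped neighboring frames using normalized scalar weights (an assumption)."""
    total = sum(weights)
    warped = [warp(f, fl) for f, fl in zip(frames, flows)]
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(wt * wf[y][x] for wt, wf in zip(weights, warped)) / total
             for x in range(w)] for y in range(h)]

# Example: shift a 2x3 frame by one pixel using a constant flow of (dx=1, dy=0).
frame = [[0, 1, 2], [3, 4, 5]]
shift_right = [[(1, 0)] * 3 for _ in range(2)]
shifted = warp(frame, shift_right)
```

Because the output pixels are read from already-decoded neighboring frames, the network only has to store the flows and blending weights for content that persists over time, which is where the parameter savings would come from.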


According to experimental results on the UVG dataset, FFNeRV outperforms other frame-wise algorithms in video compression and frame interpolation. To further improve compression performance, the authors propose using multi-resolution temporal grids with a fixed spatial resolution, instead of an MLP, to map continuous temporal coordinates to the corresponding latent features; this is inspired by grid-based neural representations. They also propose a more compact convolutional architecture: the proposed flow-guided frame-wise representation uses group and pointwise convolutions, inspired by generative models that produce high-quality images and by lightweight neural networks. With quantization-aware training and entropy coding, FFNeRV outperforms popular video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms. The code implementation is based on NeRV and is available on GitHub.
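As a rough illustration of how a multi-resolution temporal grid can map a continuous time coordinate to latent features, consider the sketch below. It assumes linear interpolation between adjacent grid entries and simple concatenation across resolutions; each grid entry is flattened to a short feature vector here, whereas the paper describes entries with a fixed spatial resolution. The names sample_grid and temporal_features are hypothetical.

```python
def sample_grid(grid, t):
    """Linearly interpolate a 1D grid of feature vectors at normalized time t in [0, 1]."""
    n = len(grid)
    pos = min(max(t, 0.0), 1.0) * (n - 1)
    i0 = int(pos)
    i1 = min(i0 + 1, n - 1)
    a = pos - i0
    return [(1 - a) * u + a * v for u, v in zip(grid[i0], grid[i1])]

def temporal_features(grids, t):
    """Concatenate the interpolated features from every temporal resolution."""
    features = []
    for grid in grids:
        features.extend(sample_grid(grid, t))
    return features

# Two grids over the same time span: a coarse one (2 entries) and a finer one (3 entries).
coarse = [[0.0], [10.0]]
fine = [[0.0], [1.0], [4.0]]
latent = temporal_features([coarse, fine], 0.5)
```

In a trained model the grid entries would be learned parameters, and the concatenated features would feed the convolutional layers that generate the frame. Looking up and interpolating grid entries replaces the MLP that would otherwise compute these features from the time coordinate.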


Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He enjoys connecting with people and collaborating on interesting projects.

