Now Facebook’s AI model can anticipate your future actions
Anticipating future actions and predicting them precisely is exciting but difficult. For example, it may be easy to guess whether the next ball in a game of cricket will be hit for a six or a four, and a wrong prediction costs little. Now consider an autonomous vehicle stopped at a stop sign: it must predict whether a pedestrian will cross the road. Anticipating future activity is hard for AI because it requires both modeling the sequence of previous actions and predicting the multimodal distribution of future ones.
To meet this challenge, Rohit Girdhar of Facebook AI Research and Kristen Grauman of the University of Texas at Austin have developed the Anticipative Video Transformer (AVT).
The science behind AVT
The researchers built AVT on recent advances in transformer architectures, particularly those from image modeling and natural language processing. The result is an end-to-end, attention-based video modeling architecture that attends to previously observed video in order to anticipate future actions.
Given an input video clip, the model predicts the actions that will follow. To accomplish this, it relies on a two-stage architecture consisting of:
- A backbone network, called AVT-b, that operates on individual frames or short clips. It adopts the recently proposed Vision Transformer (ViT) architecture, which has shown impressive results on static-image classification; followed by
- A head architecture, called AVT-h, that operates on the frame- or clip-level features to predict future features and actions, using a causal transformer decoder to predict the future feature for each input frame.
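The two-stage design above can be sketched roughly as follows. This is a minimal, hypothetical simplification in PyTorch: a tiny MLP stands in for the ViT backbone (AVT-b), and a small causally masked transformer stands in for the decoder head (AVT-h). None of the class names, layer sizes, or other details come from the paper's code.

```python
import torch
import torch.nn as nn

class AVTSketch(nn.Module):
    """Toy two-stage model: per-frame backbone + causal transformer head."""
    def __init__(self, frame_dim=64, embed_dim=32, num_actions=10):
        super().__init__()
        # AVT-b stand-in: the paper uses a Vision Transformer (ViT) here.
        self.backbone = nn.Sequential(nn.Linear(frame_dim, embed_dim), nn.GELU())
        # AVT-h stand-in: causally masked self-attention over frame features.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.head = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(embed_dim, num_actions)

    def forward(self, frames):                # frames: (B, T, frame_dim)
        feats = self.backbone(frames)         # per-frame features
        T = feats.size(1)
        # Causal mask: each frame attends only to itself and earlier frames.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        future = self.head(feats, mask=mask)  # predicted future features
        return self.classifier(future)        # per-frame action logits

model = AVTSketch()
logits = model(torch.randn(2, 8, 64))  # 2 clips of 8 frames each
print(logits.shape)                    # torch.Size([2, 8, 10])
```

Because every frame position produces a prediction, the model can be supervised at each step of the observed video, not just at the end.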
Additionally, AVT uses causal attention modeling – predicting future actions based solely on the frames seen so far – and is trained with objectives inspired by self-supervised learning. The architecture of the AVT model is illustrated in the paper.
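The causal attention just mentioned can be illustrated with a standard upper-triangular mask. This is a common implementation pattern, assumed here for illustration rather than taken from AVT's code: masked positions receive a score of negative infinity, so after the softmax each frame's attention is distributed only over itself and the past.

```python
import torch

T = 4                                   # number of observed frames
scores = torch.randn(T, T)              # raw attention scores (toy values)
# True above the diagonal = "this position lies in the future, mask it out".
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float('-inf'))
weights = torch.softmax(scores, dim=-1)  # rows sum to 1 over past frames only
print(weights[0])                        # frame 0 attends only to itself
```

Masked entries become exactly zero after the softmax, so no information from future frames leaks into a prediction.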
In addition, the researchers train the model to predict future actions and features using three losses:
- First, the model classifies the features of a clip's last frame to predict the labeled future action.
- Second, the model regresses the features of intermediate frames onto the features of the frames that follow them, which trains it to predict what the next step is likely to be.
- Third, they train the model to classify intermediate actions.
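The three losses above can be sketched as follows. This is a hedged illustration with assumed tensor names, shapes, and loss weighting (an unweighted sum), not the paper's implementation:

```python
import torch
import torch.nn.functional as F

B, T, D, num_actions = 2, 8, 32, 10
pred_feats = torch.randn(B, T, D)       # head's predicted per-frame features
true_feats = torch.randn(B, T, D)       # backbone's actual per-frame features
logits = torch.randn(B, T, num_actions) # per-frame action logits
future_action = torch.randint(0, num_actions, (B,))    # labeled next action
frame_actions = torch.randint(0, num_actions, (B, T))  # intermediate labels

# 1) Classify the labeled future action from the last frame's prediction.
loss_next = F.cross_entropy(logits[:, -1], future_action)
# 2) Regress each frame's predicted feature onto the *next* frame's feature.
loss_feat = F.mse_loss(pred_feats[:, :-1], true_feats[:, 1:])
# 3) Classify the intermediate actions at every observed frame.
loss_inter = F.cross_entropy(logits[:, :-1].reshape(-1, num_actions),
                             frame_actions[:, :-1].reshape(-1))
loss = loss_next + loss_feat + loss_inter  # combined training objective
print(loss.item())
```

The feature-regression term is what gives the training its self-supervised flavor: the targets are the model's own features for later frames, with no extra labels required.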
“Through extensive experimentation on four popular benchmarks, we show its applicability to anticipating future actions, achieving state-of-the-art results and demonstrating the importance of its anticipative training objectives,” according to the paper.
As for future applications, the researchers believe AVT could be a strong candidate for tasks beyond anticipation, such as self-supervised learning, general action recognition in tasks that require attentive modeling of temporal order, and even the discovery of action patterns and boundaries.
Recent advances in Facebook’s AI
- In a recent announcement, Facebook AI introduced a new audio-only language model, the Generative Spoken Language Model (GSLM), which it describes as the first high-performance, text-independent NLP model. GSLM operates directly on raw audio signals, without labels or text, going from speech input to speech output and expanding the boundaries of textless NLP across a variety of spoken languages.
- Last month, the Facebook team introduced the Instance-Conditioned GAN (IC-GAN), a new image-generation model. Conditioned on instance images, whether or not they come from the training set, the model produces diverse, high-quality images. Moreover, unlike previous approaches, IC-GAN can produce realistic yet unexpected image combinations.
- Opacus, a free, open-source library for training deep learning models with differential privacy, was also recently released by Facebook. The tool is intended to be simple, flexible and fast, with a user-friendly API that lets ML practitioners make a training pipeline differentially private with just two lines of code.
Facebook’s recent advancements in AI and ML have come a long way. Time and again, the organization’s researchers push the field of artificial intelligence forward with strong, results-oriented work.