Abstract and 1. Introduction
Related Works
MaGGIe
3.1. Efficient Masked Guided Instance Matting
3.2. Feature-Matte Temporal Consistency
Instance Matting Datasets
4.1. Image Instance Matting and 4.2. Video Instance Matting
Experiments
5.1. Pre-training on image data
5.2. Training on video data
Discussion and References
Supplementary Material
Architecture details
Image matting
8.1. Dataset generation and preparation
8.2. Training details
8.3. Quantitative details
8.4. More qualitative results on natural images
Video matting
9.1. Dataset generation
9.2. Training details
9.3. Quantitative details
9.4. More qualitative results
This section elaborates on the video matting aspect of our work, detailing dataset generation and providing additional quantitative and qualitative analyses. For the best viewing experience, we recommend visiting our website, which contains video samples from V-HIM60 and real-video results of our method compared with baseline approaches.
To create our video matting dataset, we utilized the BG20K dataset for backgrounds and incorporated video backgrounds from VM108. We allocated 88 videos for training and 20 for testing, limiting each video to 30 frames. To maintain realism, each instance within a video contributes an equal number of randomly selected frames from its source video, with sizes adjusted to fit within the background height and without excessive overlap between instances.
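Compositing each instance onto the background follows standard alpha blending, applied per frame and per instance. A minimal sketch (the function and variable names are ours, not from any released code):

```python
import numpy as np

def composite_instance(background, foreground, alpha):
    """Alpha-composite one instance onto a background frame.

    background, foreground: float arrays in [0, 1], shape (H, W, 3).
    alpha: float matte in [0, 1], shape (H, W, 1).
    """
    return alpha * foreground + (1.0 - alpha) * background

# During dataset generation this would be repeated for every instance
# in a frame, in back-to-front order.
bg = np.zeros((4, 4, 3))          # black background
fg = np.ones((4, 4, 3))           # white instance
a = np.full((4, 4, 1), 0.5)       # uniformly semi-transparent matte
out = composite_instance(bg, fg, a)
```

The same blend is what makes the ground-truth alpha mattes exact: the composite is constructed directly from the mattes being predicted.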
We categorized the dataset into three difficulty levels based on the extent of instance overlap:

• Easy Level: 2-3 distinct instances per video with no overlap.

• Medium Level: up to 5 instances per video, with per-frame occlusion ranging from 5% to 50%.

• Hard Level: also up to 5 instances, but with a higher occlusion range of 50% to 85%, presenting more complex instance interactions.
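The text does not spell out exactly how per-frame occlusion is measured; one plausible formulation, shown here purely for illustration, is the fraction of the smaller instance's foreground region covered by another instance:

```python
import numpy as np

def occlusion_ratio(mask_a, mask_b):
    """Per-frame occlusion between two instances (an illustrative
    definition, not necessarily the paper's exact metric).

    mask_a, mask_b: boolean arrays of shape (H, W), True where the
    binarized alpha matte of each instance is foreground.
    Returns the overlap area divided by the smaller instance's area.
    """
    inter = np.logical_and(mask_a, mask_b).sum()
    smaller = min(mask_a.sum(), mask_b.sum())
    return inter / smaller if smaller > 0 else 0.0

# Two 8x8 instances overlapping in half of their columns.
a = np.zeros((8, 8), dtype=bool); a[:, :4] = True
b = np.zeros((8, 8), dtype=bool); b[:, 2:6] = True
ratio = occlusion_ratio(a, b)  # 16 overlapping pixels / 32 → 0.5
```

Under this definition, a video would be binned into Medium if such per-frame ratios fall in [0.05, 0.5] and into Hard for [0.5, 0.85].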
During training, we applied dilation and erosion kernels to binarized alpha mattes to generate the input masks. For testing, masks were produced by XMem, propagated from the binarized alpha matte of the first frame.
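The training-time mask perturbation can be sketched as follows; note that the binarization threshold and the iteration ranges are our assumptions for illustration, not the paper's actual hyperparameters:

```python
import numpy as np
from scipy import ndimage

def perturb_mask(alpha, threshold=0.5, max_iter=3, rng=None):
    """Make a coarse input mask from a ground-truth alpha matte.

    Binarizes the matte, then applies a random number of dilation and
    erosion steps so the guidance mask is imperfect, mimicking masks
    from an off-the-shelf segmenter. Threshold/iteration values are
    assumed, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    mask = alpha >= threshold
    n_dilate = int(rng.integers(0, max_iter + 1))
    n_erode = int(rng.integers(0, max_iter + 1))
    if n_dilate:
        mask = ndimage.binary_dilation(mask, iterations=n_dilate)
    if n_erode:
        mask = ndimage.binary_erosion(mask, iterations=n_erode)
    return mask.astype(np.uint8)

# Example: perturb a square soft matte.
alpha = np.zeros((10, 10)); alpha[3:7, 3:7] = 1.0
coarse = perturb_mask(alpha, rng=np.random.default_rng(0))
```

Randomizing both operations prevents the model from assuming the guidance mask is systematically too large or too small.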
We have prepared examples from the testing dataset across all three difficulty levels, which can be viewed on our website for a more immersive experience. The V-HIM2K5 and V-HIM60 datasets will be made publicly available upon acceptance of this work.
\
:::info Authors:
(1) Chuong Huynh, University of Maryland, College Park (chuonghm@cs.umd.edu);
(2) Seoung Wug Oh, Adobe Research (seoh@adobe.com);
(3) Abhinav Shrivastava, University of Maryland, College Park (abhinav@cs.umd.edu);
(4) Joon-Young Lee, Adobe Research (jolee@adobe.com).
:::
:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.
:::