Glass Surface Segmentation

Glass is a transparent solid material with no fixed shape. Glass surfaces are continuous regions of glass, such as glass walls, doors, and windows, that are widely used for both practical and decorative purposes. However, they pose significant challenges for computer vision tasks. For instance, if a robot or drone cannot accurately detect a glass surface, it may collide with it during navigation.

Glass Surface Segmentation is a specialized form of semantic segmentation. It performs a regression prediction for every pixel of the input RGB image and produces an 8-bit mask image. Each pixel value in the output mask ranges from 0 to 255, where 0 indicates that the pixel does not belong to a glass surface and 255 indicates that it does. When the mask is displayed as an 8-bit grayscale image, pixels with a value of 0 appear black and pixels with a value of 255 appear white, as shown in the figure above.
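
As a minimal sketch, assuming the network's final layer yields one glass probability in [0, 1] per pixel (for example after a sigmoid), the following pure-Python code shows how such predictions map onto the 8-bit mask described above. The function names `probabilities_to_mask` and `binarize_mask`, and the threshold of 128, are hypothetical illustrations and are not taken from CGSDNet.

```python
def probabilities_to_mask(prob_map):
    """Scale per-pixel glass probabilities in [0, 1] to 8-bit values in [0, 255].

    0 means "not glass" and 255 means "glass"; values outside [0, 1] are clamped.
    """
    return [[round(min(max(p, 0.0), 1.0) * 255) for p in row] for row in prob_map]


def binarize_mask(mask, threshold=128):
    """Hypothetical post-processing step: snap each 8-bit value to 0 or 255."""
    return [[255 if v >= threshold else 0 for v in row] for row in mask]


if __name__ == "__main__":
    probs = [[0.0, 0.2], [0.8, 1.0]]  # a toy 2x2 prediction
    mask = probabilities_to_mask(probs)
    print(mask)                 # [[0, 51], [204, 255]]
    print(binarize_mask(mask))  # [[0, 0], [255, 255]]
```

Plain nested lists keep the sketch dependency-free; in practice the same arithmetic would be applied to an image array.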

In our research, we propose two glass surface segmentation networks: CGSDNet and CGSDNet V2. Specifically, we introduce the novel Cascade Atrous Pooling (CAP) module and its improved version, CAP V2, to extract rich multi-scale contextual features of glass surfaces from images (e.g., differences between objects inside and outside the glass, and the reflections, texture, and occlusion of the glass). Additionally, we develop a lightweight Holistic Boundary Detection (HBD) module that captures boundary features of glass surfaces from these contextual features. Finally, we propose the Cascade Network Architecture (CNA) and its enhanced version, CNA V2, which fuse multi-level contextual and boundary features into dense, boundary-enhanced contextual features with a large receptive field. Extensive experiments on benchmark datasets demonstrate that our proposed methods outperform state-of-the-art (SOTA) methods from related fields.
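
The CAP modules build on atrous (dilated) convolution, whose benefit is a larger receptive field at no extra parameter cost. The exact layer configuration of CAP and CAP V2 is not described here, so the following is only a minimal 1-D sketch, assuming 3-tap kernels and hypothetical dilation rates, of why cascading atrous layers widens the context each output sees: with stride 1, a layer with kernel size k and rate r adds (k - 1) * r samples to the receptive field. The helpers `atrous_conv1d` and `cascade_receptive_fields` are illustrative names, and "convolution" here follows the deep-learning convention (cross-correlation).

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding; output has the input's length.

    Kernel taps are spaced `rate` samples apart around each position.
    """
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # dilated tap position
            if 0 <= idx < len(signal):     # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out


def cascade_receptive_fields(kernel_size, rates):
    """Receptive field after each layer of a cascade of atrous convolutions (stride 1)."""
    rf = 1
    fields = []
    for r in rates:
        rf += (kernel_size - 1) * r  # each layer widens the field by (k - 1) * r
        fields.append(rf)
    return fields


if __name__ == "__main__":
    # Hypothetical rates; not the published CAP configuration.
    print(cascade_receptive_fields(3, [1, 2, 4, 8]))  # [3, 7, 15, 31]

    # Push a unit impulse through a two-layer cascade and measure how far it spreads.
    x = [0.0] * 11
    x[5] = 1.0
    for r in [1, 2]:
        x = atrous_conv1d(x, [1.0, 1.0, 1.0], r)
    support = [i for i, v in enumerate(x) if v != 0.0]
    print(support[-1] - support[0] + 1)  # 7, matching the receptive field after rates [1, 2]
```

The impulse test confirms the formula: after rates 1 and 2 the response spans 7 samples, the same value `cascade_receptive_fields(3, [1, 2])` predicts.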

Publications

Zeyuan Chen, Masahiko Mikawa and Makoto Fujisawa, “CGSDNet: Cascade Network with ConvNeXt as Backbone for Glass Surface Detection”, IEEE ICAICA, 2023. DOI: 10.1109/ICAICA58456.2023.10405501.
Zeyuan Chen, Qiang Gao, Masahiko Mikawa and Makoto Fujisawa, “GSSDENet: Network for Simultaneous Glass Surface Segmentation and Depth Estimation”, submitted.