Segmentation Masks in 2026: Semantic vs Instance Guide

Learn semantic vs instance segmentation, proper mask formats, clean dataset packaging, and practical QA checks to avoid training failures in vision pipelines.

By Yaniv Noema, 2026-02-16

Summary

A practical explanation of semantic vs instance segmentation, how masks should be encoded, and how to package masks + COCO + images so your training pipeline works without hacks. Includes validation checks and failure patterns.

Introduction

Segmentation datasets fail for boring reasons: mismatched sizes, ambiguous encoding, broken class maps, or masks that do not align with images. This updated 2026 version keeps it practical: what semantic and instance segmentation mean, how to package the dataset, and what to validate before training.


Semantic segmentation

Semantic segmentation assigns a class label to every pixel. All objects of the same class share the same label. Example: every pixel that belongs to any road gets the label "road", and the mask does not record which road it is.

Use semantic segmentation when:

  • you care about regions (road, sky, water)
  • you do not need to separate individual objects
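To make the idea concrete, here is a minimal Python sketch of a semantic mask as a grid of class ids. The class map (0 = background, 1 = road, 2 = sky) and the helper pixels_per_class are hypothetical examples, not part of any library.

```python
# Hypothetical class map for a road scene: pixel value -> class name.
CLASS_NAMES = {0: "background", 1: "road", 2: "sky"}

# A tiny 4x4 semantic mask. Every road pixel carries id 1, no matter
# which stretch of road it belongs to; there is no notion of "which road".
semantic_mask = [
    [2, 2, 2, 2],
    [2, 2, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]


def pixels_per_class(mask, class_names):
    """Count how many pixels carry each class label."""
    counts = {name: 0 for name in class_names.values()}
    for row in mask:
        for value in row:
            counts[class_names[value]] += 1
    return counts


print(pixels_per_class(semantic_mask, CLASS_NAMES))
# {'background': 3, 'road': 7, 'sky': 6}
```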

Instance segmentation

Instance segmentation assigns a unique mask to each object instance. Two cars are two instances, even if both are "car".

Use instance segmentation when:

  • you need object separation (counting, tracking, picking)
  • overlapping objects must be separated
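One common way to hold instance segmentation in a single array is an instance-id map, where 0 is background and each object gets its own id. The sketch below (plain Python; instance_map, instance_classes and split_instances are illustrative names) splits such a map into one 0/255 binary mask per instance, matching the "binary mask per instance" encoding listed under mask encoding options.

```python
# Hypothetical instance-id map: 0 = background, 1..N = object instances.
instance_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 2, 2],
    [0, 0, 0, 2, 2],
]
# Both instances share the class "car"; the class lookup lives separately.
instance_classes = {1: "car", 2: "car"}


def split_instances(inst_map):
    """Return {instance_id: binary mask} with pixel values 0 or 255."""
    ids = sorted({v for row in inst_map for v in row if v != 0})
    return {
        i: [[255 if v == i else 0 for v in row] for row in inst_map]
        for i in ids
    }


masks = split_instances(instance_map)
for inst_id, mask in masks.items():
    area = sum(1 for row in mask for v in row if v == 255)
    print(inst_id, instance_classes[inst_id], area)
# 1 car 4
# 2 car 4
```

Note that a single id map gives each pixel exactly one owner, so where two objects overlap only the visible one keeps those pixels.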

Mask encoding options (do not mix them)

Common encodings:

  1. Binary mask per instance (PNG)
  • Each mask contains only 0 (background) and 255 (object), one file per object instance.
  2. Multi-class semantic mask (PNG)
  • Each pixel value is a class id, one file per image.
  3. COCO polygon or RLE
  • Stored in coco.json, in the segmentation field of each entry under annotations.

Pick one primary encoding and document it in meta.json. Mixing encodings without clear rules creates bugs: a loader that expects class ids will read a 0/255 instance mask as class 255.
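For the COCO RLE option, it helps to see how a binary mask becomes run lengths. This hand-written sketch produces the uncompressed RLE form ({"size": [height, width], "counts": [...]}): runs are counted in column-major order, and the first run always counts background pixels, so it is 0 when the top-left pixel belongs to the object. It is an illustration only; if your pipeline uses pycocotools, its own encoder (which emits the compressed string form) is the one to trust.

```python
def binary_mask_to_coco_rle(mask):
    """Encode a binary mask (0 = background, nonzero = object) as
    COCO-style uncompressed RLE.

    Runs are counted in column-major order (down each column, then the
    next column), and the first run always counts background pixels,
    so it is 0 when the top-left pixel belongs to the object.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    current = 0  # value whose run we are counting; starts with background
    run = 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != current:
                counts.append(run)
                current, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


mask = [
    [0, 255, 255],
    [0, 255, 0],
]
print(binary_mask_to_coco_rle(mask))
# {'size': [2, 3], 'counts': [2, 3, 1]}
```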


Clean dataset ZIP layout (recommended)

  • images/
  • masks/ (png)
  • coco/ (coco.json)
  • yolo/ (optional boxes)
  • index.csv
  • meta.json
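A quick structural check on the unzipped dataset catches missing files before any training code runs. This sketch assumes, as a hypothetical naming rule, that masks/ holds one PNG per image with the same file stem (images/0001.jpg paired with masks/0001.png), which fits the semantic case; instance datasets with one file per object need a different pairing rule. check_layout is an illustrative name.

```python
import tempfile
from pathlib import Path


def _stems(folder, pattern):
    """File stems matching pattern in folder (empty set if folder is absent)."""
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.glob(pattern) if p.is_file()}


def check_layout(root):
    """Report missing top-level entries and unpaired image/mask files.

    Assumes one mask per image with the same stem (hypothetical rule).
    yolo/ is optional, so it is not required here.
    """
    root = Path(root)
    required = ["images", "masks", "coco/coco.json", "index.csv", "meta.json"]
    image_stems = _stems(root / "images", "*")
    mask_stems = _stems(root / "masks", "*.png")
    return {
        "missing_entries": [p for p in required if not (root / p).exists()],
        "images_without_mask": sorted(image_stems - mask_stems),
        "masks_without_image": sorted(mask_stems - image_stems),
    }


# Usage on a throwaway example tree: b.jpg has no mask.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["images/a.jpg", "images/b.jpg", "masks/a.png",
                "coco/coco.json", "index.csv", "meta.json"]:
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")
    report = check_layout(tmp)

print(report)
# {'missing_entries': [], 'images_without_mask': ['b'], 'masks_without_image': []}
```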

meta.json should include:

  • class list (id to name mapping)
  • task type (semantic or instance)
  • mask encoding (binary, multi-class, RLE)
  • image size and any resize rules
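Here is one possible shape for meta.json, plus a small validator. The key names (task, mask_encoding, classes, image_size, resize) and the [width, height] order are assumptions chosen for this example, not a standard; adapt them to whatever your loader reads. Class ids appear as string keys because JSON object keys are always strings.

```python
import json

# Hypothetical meta.json content; key names are illustrative, not a standard.
meta = {
    "task": "semantic",
    "mask_encoding": "multi-class",
    "classes": {"0": "background", "1": "road", "2": "sky"},
    "image_size": [512, 512],
    "resize": "letterbox to 512x512; masks resized with nearest-neighbor",
}

ALLOWED = {
    "task": {"semantic", "instance"},
    "mask_encoding": {"binary", "multi-class", "rle"},
}
REQUIRED = ("task", "mask_encoding", "classes", "image_size")


def validate_meta(meta):
    """Return a list of human-readable problems; empty means it looks sane."""
    problems = [f"missing key: {key}" for key in REQUIRED if key not in meta]
    for key, allowed in ALLOWED.items():
        if key in meta and meta[key] not in allowed:
            problems.append(f"{key}={meta[key]!r} not in {sorted(allowed)}")
    classes = meta.get("classes")
    if classes is not None:
        if not classes:
            problems.append("class list is empty")
        try:
            ids = [int(k) for k in classes]
        except ValueError:
            problems.append("class ids must be integers")
        else:
            if len(ids) != len(set(ids)):
                problems.append("duplicate class ids")
    size = meta.get("image_size")
    if size is not None and not (
        len(size) == 2 and all(isinstance(v, int) and v > 0 for v in size)
    ):
        problems.append("image_size must be [width, height] positive integers")
    return problems


text = json.dumps(meta, indent=2)  # what would be written to meta.json
print(validate_meta(json.loads(text)))
# []
```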

Quality checks before training

Run these checks on a random sample:

  1. Alignment
  • mask width and height match image width and height
  2. Unique values
  • semantic masks: only expected class ids appear
  3. Empty masks
  • empty masks should be rare unless background-only images are expected
  4. Class mapping
  • every pixel value or instance id maps back to a class name
  5. Overlaps
  • instance segmentation: overlapping objects are separate instances, not merged into one mask
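Checks 1 to 4 are easy to automate for semantic masks. The sketch below works on plain Python lists so it has no dependencies; in practice you would read the image size and mask pixels with your imaging library of choice. qa_semantic_pair and sample_ids are hypothetical helpers, and treating id 0 as background is an assumption. Check 5 usually needs visual review of instance overlays, so it is not covered here.

```python
import random


def sample_ids(ids, k, seed=0):
    """Pick a reproducible random sample of item ids to inspect."""
    ids = sorted(ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def qa_semantic_pair(image_size, mask, allowed_ids, background_id=0):
    """Run checks 1-4 on one (image, semantic mask) pair.

    image_size: (width, height) read from the image file.
    mask: 2D list of class ids (rows of pixels).
    allowed_ids: class ids declared in meta.json.
    Returns a list of issues; an empty list means the pair passed.
    """
    issues = []
    width, height = image_size
    mask_h = len(mask)
    mask_w = len(mask[0]) if mask_h else 0

    # 1. Alignment: mask and image must have identical dimensions.
    if (mask_w, mask_h) != (width, height):
        issues.append(f"size mismatch: image {width}x{height}, mask {mask_w}x{mask_h}")

    # 2 + 4. Unique values / class mapping: every id must map to a declared class.
    values = {v for row in mask for v in row}
    unexpected = values - set(allowed_ids)
    if unexpected:
        issues.append(f"unexpected class ids: {sorted(unexpected)}")

    # 3. Empty masks: only background present.
    if values <= {background_id}:
        issues.append("mask is empty (background only)")
    return issues


print(qa_semantic_pair((3, 2), [[0, 1, 1], [0, 2, 1]], {0, 1, 2}))
# []
print(qa_semantic_pair((2, 1), [[0, 7]], {0, 1}))
# ['unexpected class ids: [7]']
```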

Common mistakes and fixes

  1. Masks are shifted. Cause: preprocessing resized the image but not the mask. Fix: apply the same resize and padding to both.

  2. Wrong mask values. Cause: palette conversions change pixel values. Fix: save masks as lossless PNG and check the unique values after every conversion step.

  3. Training works but generalization fails. Cause: domain mismatch or synthetic artifacts. Fix: add real images for calibration and improve the generation prompts.
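Shifted masks and unexpected mask values often come from the mask taking a different resize path than the image. A minimal sketch of the mask side, assuming the image is resized to the same size and padded by the same offsets: nearest-neighbor sampling copies existing labels, while bilinear or bicubic interpolation would blend neighboring ids (averaging 1 and 3 gives 2) and invent classes. resize_nearest and pad are illustrative helpers. Sampling at pixel centers is an assumption; libraries differ on this detail, so match whatever the image side uses.

```python
def resize_nearest(mask, new_w, new_h):
    """Nearest-neighbor resize of a 2D label grid.

    Each output pixel copies one input label, so no new ids appear.
    Coordinates are mapped at pixel centers: src = (dst + 0.5) * scale.
    """
    old_h, old_w = len(mask), len(mask[0])
    out = []
    for y in range(new_h):
        sy = min(int((y + 0.5) * old_h / new_h), old_h - 1)
        row = []
        for x in range(new_w):
            sx = min(int((x + 0.5) * old_w / new_w), old_w - 1)
            row.append(mask[sy][sx])
        out.append(row)
    return out


def pad(mask, top, bottom, left, right, fill):
    """Pad a label grid with the same offsets applied to the image.

    fill should be the background id or your loader's ignore id (assumption).
    """
    width = len(mask[0]) + left + right
    padded = [[fill] * width for _ in range(top)]
    for row in mask:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * width for _ in range(bottom))
    return padded


mask = [[1, 3],
        [1, 3]]
up = resize_nearest(mask, 4, 4)
print(up)
# [[1, 1, 3, 3], [1, 1, 3, 3], [1, 1, 3, 3], [1, 1, 3, 3]]
print(pad(up, 1, 1, 0, 0, fill=0)[0])
# [0, 0, 0, 0]
```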

