Segmentation Masks in 2026: Semantic vs Instance Guide

Introduction

Segmentation datasets fail for boring reasons: mismatched sizes, ambiguous encoding, broken class maps, or masks that do not align with images. This 2026 update stays practical. It covers what semantic and instance segmentation mean, how to package a dataset, and what to validate before training.

Semantic segmentation

Semantic segmentation assigns a class label to every pixel. All objects of the same class share the same label. Example: every "road" pixel is road.

Use semantic segmentation when:

you care about regions (road, sky, water)
you do not need to separate individual objects

Instance segmentation

Instance segmentation assigns a unique mask to each object instance. Two cars are two instances, even if both are "car".

Use instance segmentation when:

you need object separation (counting, tracking, picking)
overlapping objects must be separated
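
The difference between the two tasks can be sketched in a few lines. The example below is only an illustration: it stores masks as plain lists of rows so it runs without third-party packages, and the function name and the class id 1 for "car" are assumptions made for this sketch. It collapses two instance masks into one semantic mask, which shows what information is lost on the way.

```python
def instances_to_semantic(instances, height, width, background_id=0):
    """Collapse per-instance binary masks into one class-id mask.

    instances: list of (class_id, mask) pairs, where mask is a list of
    rows holding 0 or 1. Later instances overwrite earlier ones on overlap.
    """
    semantic = [[background_id] * width for _ in range(height)]
    for class_id, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic


# Two separate "car" objects (hypothetical class id 1) in a 2x4 image.
car_a = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
car_b = [[0, 0, 0, 1],
         [0, 0, 0, 1]]
instances = [(1, car_a), (1, car_b)]

semantic = instances_to_semantic(instances, height=2, width=4)

# The instance view still knows there are two objects.
print(len(instances))
# The semantic view only knows the class of each pixel.
print(semantic)
```

Going from instances to a semantic mask is straightforward. Going the other way is not, because the semantic mask no longer records which "car" pixels belong to which car.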

Mask encoding options (do not mix them)

Common encodings:

Binary mask per instance (PNG)

Each pixel is 0 (background) or 255 (object), with one file per object instance.

Multi-class semantic mask (PNG)

Each pixel value is a class id, one file per image.

COCO polygon or RLE

Stored in coco.json under annotations, in each annotation's segmentation field.

Pick one primary encoding and document it in meta.json. Mixing encodings without clear rules creates bugs.
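
To make the RLE option less abstract, here is a minimal sketch of COCO-style uncompressed RLE: run lengths counted in column-major order, starting with a run of background pixels (which may have length 0). The function names are assumptions for this sketch, and masks are plain lists of rows. Production pipelines typically rely on a dedicated library for the compressed variant.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows of 0/1) as uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    counts = []
    previous = 0  # runs always start with background
    run = 0
    for x in range(width):        # column-major: walk down each column
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Rebuild the binary mask from the run lengths."""
    height, width = rle["size"]
    flat = []
    value = 0
    for count in rle["counts"]:
        flat.extend([value] * count)
        value = 1 - value
    # flat is column-major, so pixel (y, x) sits at index x * height + y
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]


mask = [[0, 1],
        [1, 1]]
encoded = rle_encode(mask)
print(encoded)
print(rle_decode(encoded) == mask)
```

A round trip like the last line is a cheap test to run whenever masks are converted between encodings.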

Clean dataset ZIP layout (recommended)

images/
masks/ (png)
coco/ (coco.json)
yolo/ (optional boxes)
index.csv
meta.json
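
A layout like this is easy to verify automatically. The sketch below checks a ZIP for the entries listed above. It assumes that images/, masks/, index.csv, and meta.json are always required, while coco/ and yolo/ depend on the chosen encoding; that split, and the function names, are assumptions made for illustration.

```python
import io
import zipfile

# Assumed to be mandatory for this sketch; adjust to your own rules.
REQUIRED_FILES = ("index.csv", "meta.json")
REQUIRED_DIRS = ("images/", "masks/")


def missing_entries(names):
    """Return required files or non-empty folders absent from ZIP member names."""
    missing = [f for f in REQUIRED_FILES if f not in names]
    missing += [
        d for d in REQUIRED_DIRS
        if not any(n.startswith(d) and n != d for n in names)
    ]
    return missing


def check_zip(zip_source):
    """zip_source may be a file path or a binary file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        return missing_entries(archive.namelist())


# Demo on an in-memory archive that lacks masks/ and index.csv.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/0001.png", b"")
    archive.writestr("meta.json", "{}")
buffer.seek(0)

result = check_zip(buffer)
print(result)
```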

meta.json should include:

class list (id to name mapping)
task type (semantic or instance)
mask encoding (binary, multi-class, RLE)
image size and any resize rules
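
As an illustration, a meta.json covering these fields might be produced as follows. The key names, class ids, and image size are hypothetical and chosen for this sketch; the point is that every item in the list above has an explicit, machine-readable entry.

```python
import json

# Hypothetical field names and values; pick names that suit your pipeline.
meta = {
    "task_type": "semantic",
    "mask_encoding": "multi-class",
    "classes": {"0": "background", "1": "road", "2": "sky", "3": "water"},
    "image_size": {"width": 1024, "height": 768},
    "resize_rules": "none",
}

REQUIRED_KEYS = {"classes", "task_type", "mask_encoding", "image_size"}


def missing_meta_keys(meta_dict):
    """Return the required keys that a loaded meta.json does not define."""
    return sorted(REQUIRED_KEYS - set(meta_dict))


meta_text = json.dumps(meta, indent=2)
print(meta_text)
print(missing_meta_keys(meta))
```

JSON object keys are always strings, so class ids are written as "0", "1", and so on, and need converting back to integers when the file is loaded.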

Quality checks before training

Run these checks on a random sample:

Alignment

mask width and height equal image width and height

Unique values

semantic masks: only expected class ids appear

Empty masks

empty masks should be rare unless the dataset intentionally includes images with no labeled objects

Class mapping

confirm pixel values or instance ids map back to class names

Overlaps

instance segmentation: overlapping objects have separate instances
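
Several of these checks can be sketched as small functions. The version below works on masks already loaded as lists of rows, so it runs without third-party packages; in practice the masks would usually be read with an image library first. The function names and the assumption that id 0 means background are illustrative only.

```python
def mask_size(mask):
    """Return (width, height) of a mask stored as a list of rows."""
    return (len(mask[0]) if mask else 0, len(mask))


def check_semantic_mask(mask, image_size, class_ids):
    """Return a list of problems found in one semantic mask.

    image_size is (width, height); class_ids are the ids listed in meta.json.
    """
    problems = []
    if mask_size(mask) != tuple(image_size):
        problems.append(
            f"size mismatch: mask {mask_size(mask)} vs image {tuple(image_size)}"
        )
    values = {value for row in mask for value in row}
    unexpected = sorted(values - set(class_ids))
    if unexpected:
        problems.append(f"unexpected pixel values: {unexpected}")
    if values <= {0}:  # assumes id 0 is background
        problems.append("empty mask (background only)")
    return problems


def overlap_count(instance_masks):
    """Count pixels claimed by more than one binary instance mask."""
    if not instance_masks:
        return 0
    height, width = len(instance_masks[0]), len(instance_masks[0][0])
    return sum(
        1
        for y in range(height)
        for x in range(width)
        if sum(1 for m in instance_masks if m[y][x]) > 1
    )


print(check_semantic_mask([[0, 1], [1, 7]], (2, 2), {0, 1, 2}))
print(overlap_count([[[1, 1]], [[0, 1]]]))
```

Whether shared pixels between instances are acceptable depends on the encoding, so the overlap count is best treated as a number to review, not an automatic failure.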

Common mistakes and fixes

Masks are shifted. Cause: preprocessing resized the image but not the mask. Fix: apply the same resize and padding to both.
Wrong mask values. Cause: palette conversions change values. Fix: save masks as lossless PNG and check the unique values after every conversion.
Training works but generalization fails. Cause: domain mismatch or synthetic artifacts. Fix: add real images for calibration and improve prompts.
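
The fix for shifted masks can be sketched as follows: compute the resize and padding parameters once, then apply them to both the image and the mask. This is a minimal pure-Python illustration with hypothetical function names, operating on lists of rows. It uses nearest-neighbor sampling, which is a common choice for masks because it never invents pixel values that are not valid class ids.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Compute one shared set of resize and padding parameters."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left, pad_top = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def resize_nearest(grid, new_w, new_h):
    """Nearest-neighbor resize; output values are always copied from the input."""
    src_h, src_w = len(grid), len(grid[0])
    return [
        [grid[y * src_h // new_h][x * src_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def letterbox(grid, dst_w, dst_h, fill=0):
    """Resize with preserved aspect ratio, then pad to dst_w x dst_h."""
    new_w, new_h, pad_left, pad_top = letterbox_params(
        len(grid[0]), len(grid), dst_w, dst_h
    )
    resized = resize_nearest(grid, new_w, new_h)
    out = [[fill] * dst_w for _ in range(dst_h)]
    for y, row in enumerate(resized):
        out[pad_top + y][pad_left:pad_left + new_w] = row
    return out


# A 4x2 grayscale "image" and its mask go through the same transform.
image = [[10, 20, 30, 40],
         [50, 60, 70, 80]]
mask = [[0, 0, 5, 5],
        [0, 0, 5, 5]]

image_out = letterbox(image, 4, 4)
mask_out = letterbox(mask, 4, 4)
print(mask_out)
```

Images are often resized with smoother interpolation than masks. What must stay identical is the geometry: the same target size, scale, and padding offsets for both.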