ZEDprofiler Walkthrough

ZEDprofiler is a CPU-first toolkit for extracting morphological features from 3D fluorescence microscopy images. It is designed for high-content and high-throughput workflows where images are volumetric z-stacks with multiple spectral channels and segmentation labels.

What this walkthrough covers

By the end of this notebook you will have:

Generated synthetic 3D image data: two channels (a compact nuclear stain and a diffuse marker with per-object colocalization) and a ground-truth label mask
Explored the data interactively: browsed the z-stack volume with a slice-by-slice viewer
Loaded image sets and objects: configured ImageSetLoader, ObjectLoader, and TwoObjectLoader to feed data into the pipeline
Extracted six feature classes: colocalization, granularity, intensity, neighbors, texture, and volume/size/shape
Merged all features into a single profile DataFrame with one row per object, following the ZEDprofiler naming convention
Normalized and feature selected the profile using Pycytominer: z-scoring features and removing low-variance and redundant columns
Visualized pairwise object similarity: computed and plotted a Pearson correlation heatmap across all objects

Each section includes a brief description of what is being computed and what the key parameters control.

Imports

import logging
import warnings

# suppress third-party noise before any imports
logging.basicConfig(level=logging.WARNING)
logging.getLogger("matplotlib").setLevel(logging.WARNING)
warnings.filterwarnings("ignore", category=SyntaxWarning, module="mahotas")

import pathlib

import numpy as np
import pandas as pd
import tifffile
from itables import show

import zedprofiler
from zedprofiler.IO import loading_classes

annotation=NoneType required=False default_factory=list

/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/morph.py:322: SyntaxWarning: invalid escape sequence '\s'
  with ``1``\s, while the ``0``\s line up with ``0``\s (``2``\s correspond to
/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/features/texture.py:111: SyntaxWarning: invalid escape sequence '\|'
  Feature 10 (index 9) has two interpretations, as the variance of \|x-y\|
/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/features/texture.py:223: SyntaxWarning: invalid escape sequence '\|'
  Feature 10 (index 9) has two interpretations, as the variance of \|x-y\|

Define Paths and Parameters

This walkthrough uses synthetically generated 3D arrays so it runs anywhere without real microscopy data. Two random intensity channels and a label mask are generated from spherical objects placed at known positions.

The channel_mapping dictionary maps logical names to substrings found in your image filenames: ZEDprofiler uses these keys to identify which file corresponds to which channel or segmentation label.

Expand the cells below to see how paths are configured and how the synthetic data is generated.

Show code cell source

Hide code cell source

# --- synthetic data parameters ---
VOLUME_SHAPE = (80, 80, 30)  # (x, y, z)
N_OBJECTS = 6
MIN_RADIUS, MAX_RADIUS = 6, 12
rng = np.random.default_rng(seed=42)

# --- build label mask and intensity channels ---
# Channel 1: nuclear stain: compact, bright signal confined to each nucleus
# Channel 2: diffuse stain: partially overlaps each nucleus to a varying degree,
#            simulating a cytoplasmic/organelle marker with per-object colocalization
nuclei_labels = np.zeros(VOLUME_SHAPE, dtype=np.uint16)
channel1_array = np.zeros(VOLUME_SHAPE, dtype=np.float32)
channel2_array = np.zeros(VOLUME_SHAPE, dtype=np.float32)

xs = np.arange(VOLUME_SHAPE[0])
ys = np.arange(VOLUME_SHAPE[1])
zs = np.arange(VOLUME_SHAPE[2])
grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)

centers, radii = [], []
for obj_id in range(1, N_OBJECTS + 1):
    for _ in range(50):
        r = rng.integers(MIN_RADIUS, MAX_RADIUS)
        cx = rng.integers(r + 2, VOLUME_SHAPE[0] - r - 2)
        cy = rng.integers(r + 2, VOLUME_SHAPE[1] - r - 2)
        cz = rng.integers(r + 2, VOLUME_SHAPE[2] - r - 2)
        center = np.array([cx, cy, cz])
        if all(
            np.linalg.norm(center - c) > r + rv + 4 for c, rv in zip(centers, radii)
        ):
            break
    centers.append(center)
    radii.append(r)

    dist = np.linalg.norm(grid - center, axis=-1)
    nucleus_mask = dist <= r
    nuclei_labels[nucleus_mask] = obj_id

    # Channel 1: compact nuclear signal with fine texture
    ch1_base = rng.uniform(0.7, 1.0)
    channel1_array[nucleus_mask] = ch1_base + rng.normal(
        0, 0.06, size=nucleus_mask.sum()
    )

    # Channel 2: diffuse stain with per-object colocalization strength
    # coloc_strength controls how much signal sits inside vs outside the nucleus
    coloc_strength = rng.uniform(0.15, 0.95)
    diffuse_r = r * rng.uniform(1.3, 2.0)
    diffuse_mask = (dist <= diffuse_r) & ~nucleus_mask

    ch2_base = rng.uniform(0.4, 0.9)
    channel2_array[nucleus_mask] += coloc_strength * ch2_base + rng.normal(
        0, 0.08, size=nucleus_mask.sum()
    )
    if diffuse_mask.any():
        channel2_array[diffuse_mask] += (
            1 - coloc_strength
        ) * ch2_base * 0.6 + rng.normal(0, 0.06, size=diffuse_mask.sum())

# clip negatives and scale to uint8
channel1_array = np.clip(channel1_array / channel1_array.max() * 255, 0, 255).astype(
    np.uint8
)
channel2_array = np.clip(channel2_array / channel2_array.max() * 255, 0, 255).astype(
    np.uint8
)

# save to disk
image_set_path.mkdir(parents=True, exist_ok=True)
label_set_path.mkdir(parents=True, exist_ok=True)
tifffile.imwrite(image_set_path / "channel1.tif", channel1_array)
tifffile.imwrite(image_set_path / "channel2.tif", channel2_array)
tifffile.imwrite(label_set_path / "nuclei_labels.tif", nuclei_labels)

print(f"Generated volume: {VOLUME_SHAPE}, objects: {N_OBJECTS}")
print("Channel 1: nuclear stain (compact)")
print("Channel 2: diffuse stain (per-object colocalization strength varies)")

Generated volume: (80, 80, 30), objects: 6
Channel 1: nuclear stain (compact)
Channel 2: diffuse stain (per-object colocalization strength varies)

Preview Synthetic Data

The interactive viewer below shows a z-slice browser for all three arrays. Use the slider to step through the volume and see how the objects and their labels change across z-planes.

Expand the cell below to see how the interactive viewer is built.

Show code cell source

Hide code cell source

import plotly.graph_objects as go
from IPython.display import HTML
from plotly.subplots import make_subplots

n_slices = channel1_array.shape[2]

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=("Channel 1", "Channel 2", "Labels"),
    horizontal_spacing=0.05,
)

frames = []
for z in range(n_slices):
    frames.append(
        go.Frame(
            data=[
                go.Heatmap(
                    z=channel1_array[:, :, z].T,
                    colorscale="gray",
                    zmin=0,
                    zmax=255,
                    showscale=False,
                ),
                go.Heatmap(
                    z=channel2_array[:, :, z].T,
                    colorscale="gray",
                    zmin=0,
                    zmax=255,
                    showscale=False,
                ),
                go.Heatmap(
                    z=nuclei_labels[:, :, z].T,
                    colorscale="turbo",
                    zmin=0,
                    zmax=N_OBJECTS,
                    showscale=False,
                ),
            ],
            traces=[0, 1, 2],
            name=str(z),
        )
    )

mid = n_slices // 2
fig.add_trace(
    go.Heatmap(
        z=channel1_array[:, :, mid].T,
        colorscale="gray",
        zmin=0,
        zmax=255,
        showscale=False,
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Heatmap(
        z=channel2_array[:, :, mid].T,
        colorscale="gray",
        zmin=0,
        zmax=255,
        showscale=False,
    ),
    row=1,
    col=2,
)
fig.add_trace(
    go.Heatmap(
        z=nuclei_labels[:, :, mid].T,
        colorscale="turbo",
        zmin=0,
        zmax=N_OBJECTS,
        showscale=False,
    ),
    row=1,
    col=3,
)

fig.frames = frames

fig.update_layout(
    title=f"Synthetic 3D volume: z-slice browser ({n_slices} planes)",
    height=380,
    sliders=[
        {
            "active": mid,
            "steps": [
                {
                    "args": [[str(z)], {"frame": {"duration": 0}, "mode": "immediate"}],
                    "label": str(z),
                    "method": "animate",
                }
                for z in range(n_slices)
            ],
            "x": 0.1,
            "len": 0.8,
            "y": -0.05,
            "currentvalue": {"prefix": "z = ", "visible": True, "xanchor": "center"},
        }
    ],
    updatemenus=[
        {
            "type": "buttons",
            "showactive": False,
            "y": -0.15,
            "x": 0.5,
            "xanchor": "center",
            "buttons": [
                {
                    "label": "▶ Play",
                    "method": "animate",
                    "args": [None, {"frame": {"duration": 80}, "fromcurrent": True}],
                },
                {
                    "label": "⏸ Pause",
                    "method": "animate",
                    "args": [[None], {"frame": {"duration": 0}, "mode": "immediate"}],
                },
            ],
        }
    ],
    margin={"t": 60, "b": 120},
)
fig.update_yaxes(scaleanchor=None)
HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))

Loading Objects and Image Sets

ZEDprofiler uses three loader classes to organize image data before featurization:

ImageSetConfig: declares which keys are raw intensity channels (raw_image_key_name) and which are segmentation labels (label_key_name). This config is passed to the loader so it knows how to partition the channel mapping.
ImageSetLoader: loads the full image set from disk (or from in-memory arrays) and stores each channel and label as a numpy array. The anisotropy_spacing parameter describes the physical voxel size in [z, y, x] order and is used by downstream features (e.g. neighbors, surface area) that require real-world distances.
ObjectLoader: a lightweight view over one channel + one compartment within a loaded image set. Most featurization functions accept an ObjectLoader.
TwoObjectLoader: a paired view over two channels within one compartment, required by colocalization which compares signal across two channels simultaneously.

isc = loading_classes.ImageSetConfig(
    image_set_name="test_image_set",
    raw_image_key_name=[CHANNEL1, CHANNEL2],
    label_key_name=[COMPARTMENT],
)

image_set_loader = loading_classes.ImageSetLoader(
    anisotropy_spacing=[1, 0.1, 0.1],
    channel_mapping=channel_mapping,
    image_set_path=image_set_path,
    label_set_path=label_set_path,
    config=isc,
    # we can also directly pass arrays.
    # This is what ZEDprofiler expects when the images are multichannel.
    # Load the image arrays in and pass them in directly instead
    # of loading from path.
    image_set_array=None,
    label_set_array=None,
)

ol = loading_classes.ObjectLoader(
    image_set_loader=image_set_loader,
    channel_name=CHANNEL1,
    compartment_name=COMPARTMENT,
)

coloc_loader = loading_classes.TwoObjectLoader(
    image_set_loader=image_set_loader,
    compartment=COMPARTMENT,
    channel1=CHANNEL1,
    channel2=CHANNEL2,
)

Colocalization

Colocalization

Colocalization measures the degree to which two fluorescent channels occupy the same spatial locations within a segmented compartment. This is useful for quantifying co-expression or co-localization of proteins in 3D cell images.

ZEDprofiler computes several colocalization coefficients per object pair:

Pearson correlation: linear correlation of pixel intensities between the two channels
Manders coefficients (M1, M2): fraction of channel 1 (or 2) signal that overlaps with channel 2 (or 1)
Costes thresholded Manders: Manders coefficients computed above a statistically determined threshold
Overlap coefficient: combined spatial and intensity overlap
Rank-weighted colocalization (RWC): robust to intensity differences, weights overlap by rank

Key parameters:

thr: minimum pixel intensity threshold below which pixels are excluded
fast_costes: "Accurate" uses the full bisection algorithm for the Costes threshold; "Fast" uses a linear approximation

For full feature definitions, see the Colocalization reference page.

colocalization_features = zedprofiler.colocalization.compute_colocalization(
    two_object_loader=coloc_loader,
    thr=15,
    fast_costes="Accurate",
    channel1=CHANNEL1,
    channel2=CHANNEL2,
)

Granularity

Granularity

Granularity characterizes the texture of an object at multiple spatial scales by repeatedly applying morphological erosions and measuring how much signal is removed at each scale. The result is a spectrum of values: one per scale: that captures whether an object’s intensity is concentrated in fine, coarse, or mixed-scale structures. This is analogous to CellProfiler’s granularity module and is particularly useful for distinguishing organelle morphology.

Key parameters:

granular_spectrum_length: number of erosion scales to compute (default 16); higher values capture coarser structure
radius: radius of the structuring element used for background subtraction before the spectrum is computed
subsample_size: fraction of the image used during spectrum computation (speeds up processing on large volumes)
image_sample_size: fraction used for background estimation
mask_threshold: threshold applied after interpolation to define the object mask

For full feature definitions, see the Granularity reference page.

granularity_features = zedprofiler.granularity.compute_granularity(
    object_loader=ol,
    radius=10,  # radius of the structuring element for background
    # removal (CellProfiler default)
    granular_spectrum_length=16,  # range of the granular spectrum
    subsample_size=0.25,  # subsample the image for faster processing
    image_sample_size=0.25,  # further subsample for background removal
    mask_threshold=0.9,  # threshold for determining mask after interpolation
    verbose=False,  # print out intermediate steps and values for debugging
)

Intensity

Intensity

Intensity features summarize the distribution of pixel intensities within each segmented object. They provide a compact description of how bright an object is, how variable its signal is, and how that signal is spatially distributed relative to the object’s edges.

Features computed per object include:

Mean, median, standard deviation, max, min of voxel intensities
Integrated intensity: sum of all voxel intensities (correlated with object volume)
Mass displacement: distance between the object’s geometric centroid and its intensity-weighted centroid, indicating asymmetric signal distribution
Lower/upper quartile intensities: robust measures of the intensity distribution

For full feature definitions, see the Intensity reference page.

intensity_features = zedprofiler.intensity.compute_intensity(
    object_loader=ol,
)

Neighbors

Neighbors

Neighbor features describe the spatial relationships between objects within a compartment. For each object, ZEDprofiler identifies all other objects whose centroids fall within a given distance and computes summary statistics over that neighborhood.

Features include counts of neighbors, mean and standard deviation of distances to neighbors, and the angle between each object and its closest neighbor.

Key parameters:

distance_threshold: maximum centroid-to-centroid distance (in voxels) for two objects to be considered neighbors
anisotropy_factor: the ratio of z-spacing to xy-spacing (anisotropy_spacing[0] / anisotropy_spacing[1]); used to rescale z-distances to physical units before computing Euclidean distances, ensuring accurate neighbor relationships in anisotropic volumes

For full feature definitions, see the Neighbors reference page.

neighbors_features = zedprofiler.neighbors.compute_neighbors(
    object_loader=ol,
    distance_threshold=10,
    anisotropy_factor=image_set_loader.anisotropy_spacing[0]
    / image_set_loader.anisotropy_spacing[
        1
    ],  # z to xy spacing ratio for distance calculation
)

Texture

Texture

Texture features capture the spatial arrangement of intensities within an object using the gray-level co-occurrence matrix (GLCM). The GLCM records how often pairs of pixel intensities occur at a given spatial offset, and summary statistics computed from it describe properties like smoothness, contrast, and periodicity.

ZEDprofiler computes GLCM-based features (e.g. angular second moment, contrast, correlation, entropy, homogeneity) at multiple offsets and directions in 3D, then aggregates across directions.

Key parameters:

distance: the pixel offset at which co-occurrence is measured; larger values capture coarser texture patterns
grayscale: the number of intensity bins used to quantize the image before computing the GLCM; fewer bins are faster but lose intensity resolution

For full feature definitions, see the Texture reference page.

texture_features = zedprofiler.texture.compute_texture(
    object_loader=ol,
    distance=3,
    grayscale=256,
)

Volume, Size, and Shape

Volume, Size, and Shape

Volume, size, and shape features characterize the 3D geometry of each segmented object. They are computed using skimage.measure.regionprops on the label image, with surface area calculated via marching cubes.

Features per object include:

Volume: total number of voxels in the object
Bounding box: min/max extents in x, y, z
BboxVolume: volume of the axis-aligned bounding box
Centroid: geometric center in x, y, z
Extent: ratio of object volume to bounding box volume (compactness)
Euler number: topological measure of holes/tunnels in the object
Equivalent diameter: diameter of a sphere with the same volume
Surface area: estimated via marching cubes on the object boundary

For full feature definitions, see the Volume, Size, and Shape reference page.

volume_size_shape_features = zedprofiler.volumesizeshape.compute_volume_size_shape(
    image_set_loader=image_set_loader,
    object_loader=ol,
)

Merging All Features into a Single Profile

Each featurization function returns a dictionary that can be converted to a DataFrame. All DataFrames share two index columns: Metadata_Experiment_ImageSet and Metadata_Object_ObjectID: which are used as merge keys.

The final merged DataFrame contains one row per object and one column per feature, following the ZEDprofiler naming convention: {Compartment}_{Channel}_{FeatureType}_{Measurement}

See the Feature Schema for the full specification of how columns are named and structured.

The cell below also estimates the total expected feature count when scaling to a full experiment with multiple compartments and channels, giving a sense of the profile dimensionality at scale.

coloc_df = pd.DataFrame(colocalization_features)
granularity_df = pd.DataFrame(granularity_features)
intensity_df = pd.DataFrame(intensity_features)
neighbors_df = pd.DataFrame(neighbors_features)
texture_df = pd.DataFrame(texture_features)
volume_size_shape_df = pd.DataFrame(volume_size_shape_features)
# subtract 2 from each df shape to account for image_set and object_id
# columns that will be used for merging
size_of_all_dfs = [
    df.shape[1] - 2
    for df in [
        coloc_df,
        granularity_df,
        intensity_df,
        neighbors_df,
        texture_df,
        volume_size_shape_df,
    ]
]
# add the 2 back to account for the image_set and object_id columns
# that will be retained in the final merged DataFrame
expected_num_columns = sum(size_of_all_dfs) + 2

columns_to_merge = ["Metadata_Experiment_ImageSet", "Metadata_Object_ObjectID"]

for df in [
    coloc_df,
    granularity_df,
    intensity_df,
    neighbors_df,
    texture_df,
    volume_size_shape_df,
]:
    if not all(col in df.columns for col in columns_to_merge):
        raise ValueError(
            f"DataFrame is missing required columns for merging: {columns_to_merge}"
        )
# merge all features into a single DataFrame
all_features_df = pd.merge(
    coloc_df,
    pd.merge(
        granularity_df,
        pd.merge(
            intensity_df,
            pd.merge(
                neighbors_df,
                pd.merge(
                    texture_df, volume_size_shape_df, on=columns_to_merge, how="inner"
                ),
                on=columns_to_merge,
                how="inner",
            ),
            on=columns_to_merge,
            how="inner",
        ),
        on=columns_to_merge,
        how="inner",
    ),
    on=columns_to_merge,
    how="inner",
)
if all_features_df.shape[1] != expected_num_columns:
    raise ValueError(
        f"""
        Merged DataFrame has {all_features_df.shape[1]} columns
        but expected {expected_num_columns} based on individual DataFrames"""
    )
print(
    f"Shape: {all_features_df.shape[0]} objects x {all_features_df.shape[1]} features"
)
show(all_features_df)

Shape: 6 objects x 233 features

Loading ITables v2.8.0 from the internet... (need help?)

expected_num_compartments = 4
expected_num_of_channels = 4
expected_num_of_total_features = (
    expected_num_columns - 2
)  # subtract 2 for image_set and object_id columns
expected_num_of_total_features = (
    expected_num_of_total_features
    * expected_num_compartments
    * expected_num_of_channels
    + 2
)
expected_num_of_total_features

if all_features_df.shape[1] != expected_num_columns:
    raise ValueError(
        f"Expected {expected_num_columns} columns but got {all_features_df.shape[1]}"
    )

Normalize and Feature Select

With all features merged into a single profile DataFrame, we use Pycytominer to:

Normalize: z-score each feature across objects so all features are on the same scale
Feature select: remove low-variance and highly correlated features that carry redundant information

Pycytominer infers which columns are features (prefixed Nuclei_) and which are metadata (prefixed Metadata_) automatically.

from pycytominer import feature_select, normalize

# z-score normalize across all objects
normalized_df = normalize(
    profiles=all_features_df,
    method="standardize",
    output_file=None,
)

# remove low-variance and highly correlated features
selected_df = feature_select(
    profiles=normalized_df,
    operation=["variance_threshold", "correlation_threshold"],
    output_file=None,
)

meta_cols = [c for c in selected_df.columns if c.startswith("Metadata_")]
feature_cols = [c for c in selected_df.columns if c not in meta_cols]
print(
    f"Features after selection: {len(feature_cols)} (from {all_features_df.shape[1] - len(meta_cols)} original)"
)

Features after selection: 25 (from 231 original)

Pairwise Object Correlation

The heatmap below shows the Pearson correlation between every pair of objects across all selected features. Objects with similar morphological profiles appear as warm (high correlation) clusters; dissimilar objects appear cool.

Each row and column is labelled by object ID.