ZEDprofiler Walkthrough

ZEDprofiler is a CPU-first toolkit for extracting morphological features from 3D fluorescence microscopy images. It is designed for high-content and high-throughput workflows where images are volumetric z-stacks with multiple spectral channels and segmentation labels.

What this walkthrough covers

By the end of this notebook you will have:

  1. Generated synthetic 3D image data: two channels (a compact nuclear stain and a diffuse marker with per-object colocalization) and a ground-truth label mask

  2. Explored the data interactively: browsed the z-stack volume with a slice-by-slice viewer

  3. Loaded image sets and objects: configured ImageSetLoader, ObjectLoader, and TwoObjectLoader to feed data into the pipeline

  4. Extracted six feature classes: colocalization, granularity, intensity, neighbors, texture, and volume/size/shape

  5. Merged all features into a single profile DataFrame with one row per object, following the ZEDprofiler naming convention

  6. Normalized and feature selected the profile using Pycytominer: z-scoring features and removing low-variance and redundant columns

  7. Visualized pairwise object similarity: computed and plotted a Pearson correlation heatmap across all objects

Each section includes a brief description of what is being computed and what the key parameters control.

Imports

import warnings
import logging
import pathlib

import numpy as np
import pandas as pd
import tifffile
from itables import show

import zedprofiler
from zedprofiler.IO import loading_classes

logging.basicConfig(level=logging.WARNING)
logging.getLogger("matplotlib").setLevel(logging.WARNING)
warnings.filterwarnings("ignore", category=SyntaxWarning, module="mahotas")
annotation=NoneType required=False default_factory=list
INFO:matplotlib.font_manager:Failed to extract font properties from /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: Can not load face (unknown file format; error code 0x2)
INFO:matplotlib.font_manager:generated new fontManager
/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/morph.py:322: SyntaxWarning: invalid escape sequence '\s'
  with ``1``\s, while the ``0``\s line up with ``0``\s (``2``\s correspond to
/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/features/texture.py:111: SyntaxWarning: invalid escape sequence '\|'
  Feature 10 (index 9) has two interpretations, as the variance of \|x-y\|
/home/docs/checkouts/readthedocs.org/user_builds/zedprofiler/envs/latest/lib/python3.13/site-packages/mahotas/features/texture.py:223: SyntaxWarning: invalid escape sequence '\|'
  Feature 10 (index 9) has two interpretations, as the variance of \|x-y\|

Define Paths and Parameters

This walkthrough uses synthetically generated 3D arrays so it runs anywhere without real microscopy data. Two random intensity channels and a label mask are generated from spherical objects placed at known positions.

The channel_mapping dictionary maps logical names to substrings found in your image filenames: ZEDprofiler uses these keys to identify which file corresponds to which channel or segmentation label.

Expand the cells below to see how paths are configured and how the synthetic data is generated.

Hide code cell source

notebook_root = pathlib.Path.cwd().resolve()
image_set_path = (notebook_root / "test_data" / "dummy_image_set").resolve()
label_set_path = (notebook_root / "test_data" / "dummy_label_set").resolve()

channel_mapping = {
    "Channel1": "channel1",
    "Channel2": "channel2",
    "Nuclei": "nuclei_labels",
}

CHANNEL1 = "Channel1"
CHANNEL2 = "Channel2"
COMPARTMENT = "Nuclei"

Hide code cell source

# --- synthetic data parameters ---
VOLUME_SHAPE = (80, 80, 30)  # (x, y, z)
N_OBJECTS = 6
MIN_RADIUS, MAX_RADIUS = 6, 12
rng = np.random.default_rng(seed=42)

# --- build label mask and intensity channels ---
# Channel 1: nuclear stain: compact, bright signal confined to each nucleus
# Channel 2: diffuse stain: partially overlaps each nucleus to a varying degree,
#            simulating a cytoplasmic/organelle marker with per-object colocalization
nuclei_labels = np.zeros(VOLUME_SHAPE, dtype=np.uint16)
channel1_array = np.zeros(VOLUME_SHAPE, dtype=np.float32)
channel2_array = np.zeros(VOLUME_SHAPE, dtype=np.float32)

xs = np.arange(VOLUME_SHAPE[0])
ys = np.arange(VOLUME_SHAPE[1])
zs = np.arange(VOLUME_SHAPE[2])
grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)

centers, radii = [], []
for obj_id in range(1, N_OBJECTS + 1):
    for _ in range(50):
        r = rng.integers(MIN_RADIUS, MAX_RADIUS)
        cx = rng.integers(r + 2, VOLUME_SHAPE[0] - r - 2)
        cy = rng.integers(r + 2, VOLUME_SHAPE[1] - r - 2)
        cz = rng.integers(r + 2, VOLUME_SHAPE[2] - r - 2)
        center = np.array([cx, cy, cz])
        if all(
            np.linalg.norm(center - c) > r + rv + 4 for c, rv in zip(centers, radii)
        ):
            break
    centers.append(center)
    radii.append(r)

    dist = np.linalg.norm(grid - center, axis=-1)
    nucleus_mask = dist <= r
    nuclei_labels[nucleus_mask] = obj_id

    # Channel 1: compact nuclear signal with fine texture
    ch1_base = rng.uniform(0.7, 1.0)
    channel1_array[nucleus_mask] = ch1_base + rng.normal(
        0, 0.06, size=nucleus_mask.sum()
    )

    # Channel 2: diffuse stain with per-object colocalization strength
    # coloc_strength controls how much signal sits inside vs outside the nucleus
    coloc_strength = rng.uniform(0.15, 0.95)
    diffuse_r = r * rng.uniform(1.3, 2.0)
    diffuse_mask = (dist <= diffuse_r) & ~nucleus_mask

    ch2_base = rng.uniform(0.4, 0.9)
    channel2_array[nucleus_mask] += coloc_strength * ch2_base + rng.normal(
        0, 0.08, size=nucleus_mask.sum()
    )
    if diffuse_mask.any():
        channel2_array[diffuse_mask] += (
            1 - coloc_strength
        ) * ch2_base * 0.6 + rng.normal(0, 0.06, size=diffuse_mask.sum())

# clip negatives and scale to uint8
channel1_array = np.clip(channel1_array / channel1_array.max() * 255, 0, 255).astype(
    np.uint8
)
channel2_array = np.clip(channel2_array / channel2_array.max() * 255, 0, 255).astype(
    np.uint8
)

# save to disk
image_set_path.mkdir(parents=True, exist_ok=True)
label_set_path.mkdir(parents=True, exist_ok=True)
tifffile.imwrite(image_set_path / "channel1.tif", channel1_array)
tifffile.imwrite(image_set_path / "channel2.tif", channel2_array)
tifffile.imwrite(label_set_path / "nuclei_labels.tif", nuclei_labels)

print(f"Generated volume: {VOLUME_SHAPE}, objects: {N_OBJECTS}")
print("Channel 1: nuclear stain (compact)")
print("Channel 2: diffuse stain (per-object colocalization strength varies)")
Generated volume: (80, 80, 30), objects: 6
Channel 1: nuclear stain (compact)
Channel 2: diffuse stain (per-object colocalization strength varies)

Preview Synthetic Data

The interactive viewer below shows a z-slice browser for all three arrays. Use the slider to step through the volume and see how the objects and their labels change across z-planes.

Expand the cell below to see how the interactive viewer is built.

Hide code cell source

import plotly.graph_objects as go
from IPython.display import HTML
from plotly.subplots import make_subplots

n_slices = channel1_array.shape[2]

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=("Channel 1", "Channel 2", "Labels"),
    horizontal_spacing=0.05,
)

frames = []
for z in range(n_slices):
    frames.append(
        go.Frame(
            data=[
                go.Heatmap(
                    z=channel1_array[:, :, z].T,
                    colorscale="gray",
                    zmin=0,
                    zmax=255,
                    showscale=False,
                ),
                go.Heatmap(
                    z=channel2_array[:, :, z].T,
                    colorscale="gray",
                    zmin=0,
                    zmax=255,
                    showscale=False,
                ),
                go.Heatmap(
                    z=nuclei_labels[:, :, z].T,
                    colorscale="turbo",
                    zmin=0,
                    zmax=N_OBJECTS,
                    showscale=False,
                ),
            ],
            traces=[0, 1, 2],
            name=str(z),
        )
    )

mid = n_slices // 2
fig.add_trace(
    go.Heatmap(
        z=channel1_array[:, :, mid].T,
        colorscale="gray",
        zmin=0,
        zmax=255,
        showscale=False,
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Heatmap(
        z=channel2_array[:, :, mid].T,
        colorscale="gray",
        zmin=0,
        zmax=255,
        showscale=False,
    ),
    row=1,
    col=2,
)
fig.add_trace(
    go.Heatmap(
        z=nuclei_labels[:, :, mid].T,
        colorscale="turbo",
        zmin=0,
        zmax=N_OBJECTS,
        showscale=False,
    ),
    row=1,
    col=3,
)

fig.frames = frames

fig.update_layout(
    title=f"Synthetic 3D volume: z-slice browser ({n_slices} planes)",
    height=380,
    sliders=[
        {
            "active": mid,
            "steps": [
                {
                    "args": [[str(z)], {"frame": {"duration": 0}, "mode": "immediate"}],
                    "label": str(z),
                    "method": "animate",
                }
                for z in range(n_slices)
            ],
            "x": 0.1,
            "len": 0.8,
            "y": -0.05,
            "currentvalue": {"prefix": "z = ", "visible": True, "xanchor": "center"},
        }
    ],
    updatemenus=[
        {
            "type": "buttons",
            "showactive": False,
            "y": -0.15,
            "x": 0.5,
            "xanchor": "center",
            "buttons": [
                {
                    "label": "▶ Play",
                    "method": "animate",
                    "args": [None, {"frame": {"duration": 80}, "fromcurrent": True}],
                },
                {
                    "label": "⏸ Pause",
                    "method": "animate",
                    "args": [[None], {"frame": {"duration": 0}, "mode": "immediate"}],
                },
            ],
        }
    ],
    margin={"t": 60, "b": 120},
)
fig.update_yaxes(scaleanchor=None)
HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))

Loading Objects and Image Sets

ZEDprofiler uses three loader classes to organize image data before featurization:

  • ImageSetConfig: declares which keys are raw intensity channels (raw_image_key_name) and which are segmentation labels (label_key_name). This config is passed to the loader so it knows how to partition the channel mapping.

  • ImageSetLoader: loads the full image set from disk (or from in-memory arrays) and stores each channel and label as a numpy array. The anisotropy_spacing parameter describes the physical voxel size in [z, y, x] order and is used by downstream features (e.g. neighbors, surface area) that require real-world distances.

  • ObjectLoader: a lightweight view over one channel + one compartment within a loaded image set. Most featurization functions accept an ObjectLoader.

  • TwoObjectLoader: a paired view over two channels within one compartment, required by colocalization which compares signal across two channels simultaneously.

isc = loading_classes.ImageSetConfig(
    image_set_name="test_image_set",
    raw_image_key_name=[CHANNEL1, CHANNEL2],
    label_key_name=[COMPARTMENT],
)

image_set_loader = loading_classes.ImageSetLoader(
    anisotropy_spacing=[1, 0.1, 0.1],
    channel_mapping=channel_mapping,
    image_set_path=image_set_path,
    label_set_path=label_set_path,
    config=isc,
    # we can also directly pass arrays.
    # This is what ZEDprofiler expects when the images are multichannel.
    # Load the image arrays in and pass them in directly instead
    # of loading from path.
    image_set_array=None,
    label_set_array=None,
)
ol = loading_classes.ObjectLoader(
    image_set_loader=image_set_loader,
    channel_name=CHANNEL1,
    compartment_name=COMPARTMENT,
)
coloc_loader = loading_classes.TwoObjectLoader(
    image_set_loader=image_set_loader,
    compartment=COMPARTMENT,
    channel1=CHANNEL1,
    channel2=CHANNEL2,
)

Colocalization

Colocalization

Colocalization measures the degree to which two fluorescent channels occupy the same spatial locations within a segmented compartment. This is useful for quantifying co-expression or co-localization of proteins in 3D cell images.

ZEDprofiler computes several colocalization coefficients per object pair:

  • Pearson correlation: linear correlation of pixel intensities between the two channels

  • Manders coefficients (M1, M2): fraction of channel 1 (or 2) signal that overlaps with channel 2 (or 1)

  • Costes thresholded Manders: Manders coefficients computed above a statistically determined threshold

  • Overlap coefficient: combined spatial and intensity overlap

  • Rank-weighted colocalization (RWC): robust to intensity differences, weights overlap by rank

Key parameters:

  • thr: minimum pixel intensity threshold below which pixels are excluded

  • fast_costes: "Accurate" uses the full bisection algorithm for the Costes threshold; "Fast" uses a linear approximation

colocalization_features = zedprofiler.colocalization.compute_colocalization(
    two_object_loader=coloc_loader,
    thr=15,
    fast_costes="Accurate",
    channel1=CHANNEL1,
    channel2=CHANNEL2,
)

Granularity

Granularity

Granularity characterizes the texture of an object at multiple spatial scales by repeatedly applying morphological erosions and measuring how much signal is removed at each scale. The result is a spectrum of values: one per scale: that captures whether an object’s intensity is concentrated in fine, coarse, or mixed-scale structures. This is analogous to CellProfiler’s granularity module and is particularly useful for distinguishing organelle morphology.

Key parameters:

  • granular_spectrum_length: number of erosion scales to compute (default 16); higher values capture coarser structure

  • radius: radius of the structuring element used for background subtraction before the spectrum is computed

  • subsample_size: fraction of the image used during spectrum computation (speeds up processing on large volumes)

  • image_sample_size: fraction used for background estimation

  • mask_threshold: threshold applied after interpolation to define the object mask

granularity_features = zedprofiler.granularity.compute_granularity(
    object_loader=ol,
    radius=10,  # radius of the structuring element for background
    # removal (CellProfiler default)
    granular_spectrum_length=16,  # range of the granular spectrum
    subsample_size=0.25,  # subsample the image for faster processing
    image_sample_size=0.25,  # further subsample for background removal
    mask_threshold=0.9,  # threshold for determining mask after interpolation
    verbose=False,  # print out intermediate steps and values for debugging
)

Intensity

Intensity

Intensity features summarize the distribution of pixel intensities within each segmented object. They provide a compact description of how bright an object is, how variable its signal is, and how that signal is spatially distributed relative to the object’s edges.

Features computed per object include:

  • Mean, median, standard deviation, max, min of voxel intensities

  • Integrated intensity: sum of all voxel intensities (correlated with object volume)

  • Mass displacement: distance between the object’s geometric centroid and its intensity-weighted centroid, indicating asymmetric signal distribution

  • Lower/upper quartile intensities: robust measures of the intensity distribution

intensity_features = zedprofiler.intensity.compute_intensity(
    object_loader=ol,
)

Neighbors

Neighbors

Neighbor features describe the spatial relationships between objects within a compartment. For each object, ZEDprofiler identifies all other objects whose centroids fall within a given distance and computes summary statistics over that neighborhood.

Features include counts of neighbors, mean and standard deviation of distances to neighbors, and the angle between each object and its closest neighbor.

Key parameters:

  • distance_threshold: maximum centroid-to-centroid distance (in voxels) for two objects to be considered neighbors

  • anisotropy_factor: the ratio of z-spacing to xy-spacing (anisotropy_spacing[0] / anisotropy_spacing[1]); used to rescale z-distances to physical units before computing Euclidean distances, ensuring accurate neighbor relationships in anisotropic volumes

neighbors_features = zedprofiler.neighbors.compute_neighbors(
    object_loader=ol,
    distance_threshold=10,
    anisotropy_factor=image_set_loader.anisotropy_spacing[0]
    / image_set_loader.anisotropy_spacing[
        1
    ],  # z to xy spacing ratio for distance calculation
)

Texture

Texture

Texture features capture the spatial arrangement of intensities within an object using the gray-level co-occurrence matrix (GLCM). The GLCM records how often pairs of pixel intensities occur at a given spatial offset, and summary statistics computed from it describe properties like smoothness, contrast, and periodicity.

ZEDprofiler computes GLCM-based features (e.g. angular second moment, contrast, correlation, entropy, homogeneity) at multiple offsets and directions in 3D, then aggregates across directions.

Key parameters:

  • distance: the pixel offset at which co-occurrence is measured; larger values capture coarser texture patterns

  • grayscale: the number of intensity bins used to quantize the image before computing the GLCM; fewer bins are faster but lose intensity resolution

texture_features = zedprofiler.texture.compute_texture(
    object_loader=ol,
    distance=3,
    grayscale=256,
)

Volume, Size, and Shape

Volume, Size, and Shape

Volume, size, and shape features characterize the 3D geometry of each segmented object. They are computed using skimage.measure.regionprops on the label image, with surface area calculated via marching cubes.

Features per object include:

  • Volume: total number of voxels in the object

  • Bounding box: min/max extents in x, y, z

  • BboxVolume: volume of the axis-aligned bounding box

  • Centroid: geometric center in x, y, z

  • Extent: ratio of object volume to bounding box volume (compactness)

  • Euler number: topological measure of holes/tunnels in the object

  • Equivalent diameter: diameter of a sphere with the same volume

  • Surface area: estimated via marching cubes on the object boundary

volume_size_shape_features = zedprofiler.volumesizeshape.compute_volume_size_shape(
    image_set_loader=image_set_loader,
    object_loader=ol,
)

Merging All Features into a Single Profile

Each featurization function returns a dictionary that can be converted to a DataFrame. All DataFrames share two index columns: Metadata_Experiment_ImageSet and Metadata_Object_ObjectID: which are used as merge keys.

The final merged DataFrame contains one row per object and one column per feature, following the ZEDprofiler naming convention: {Compartment}_{Channel}_{FeatureType}_{Measurement}

See the Feature Schema for the full specification of how columns are named and structured.

The cell below also estimates the total expected feature count when scaling to a full experiment with multiple compartments and channels, giving a sense of the profile dimensionality at scale.

coloc_df = pd.DataFrame(colocalization_features)
granularity_df = pd.DataFrame(granularity_features)
intensity_df = pd.DataFrame(intensity_features)
neighbors_df = pd.DataFrame(neighbors_features)
texture_df = pd.DataFrame(texture_features)
volume_size_shape_df = pd.DataFrame(volume_size_shape_features)
# subtract 2 from each df shape to account for image_set and object_id
# columns that will be used for merging
size_of_all_dfs = [
    df.shape[1] - 2
    for df in [
        coloc_df,
        granularity_df,
        intensity_df,
        neighbors_df,
        texture_df,
        volume_size_shape_df,
    ]
]
# add the 2 back to account for the image_set and object_id columns
# that will be retained in the final merged DataFrame
expected_num_columns = sum(size_of_all_dfs) + 2
columns_to_merge = ["Metadata_Experiment_ImageSet", "Metadata_Object_ObjectID"]

for df in [
    coloc_df,
    granularity_df,
    intensity_df,
    neighbors_df,
    texture_df,
    volume_size_shape_df,
]:
    if not all(col in df.columns for col in columns_to_merge):
        raise ValueError(
            f"DataFrame is missing required columns for merging: {columns_to_merge}"
        )
# merge all features into a single DataFrame
all_features_df = pd.merge(
    coloc_df,
    pd.merge(
        granularity_df,
        pd.merge(
            intensity_df,
            pd.merge(
                neighbors_df,
                pd.merge(
                    texture_df, volume_size_shape_df, on=columns_to_merge, how="inner"
                ),
                on=columns_to_merge,
                how="inner",
            ),
            on=columns_to_merge,
            how="inner",
        ),
        on=columns_to_merge,
        how="inner",
    ),
    on=columns_to_merge,
    how="inner",
)
if all_features_df.shape[1] != expected_num_columns:
    raise ValueError(
        f"""
        Merged DataFrame has {all_features_df.shape[1]} columns
        but expected {expected_num_columns} based on individual DataFrames"""
    )
print(
    f"Shape: {all_features_df.shape[0]} objects x {all_features_df.shape[1]} features"
)
show(all_features_df)
Shape: 6 objects x 233 features
Loading ITables v2.8.0 from the internet... (need help?)
expected_num_compartments = 4
expected_num_of_channels = 4
expected_num_of_total_features = (
    expected_num_columns - 2
)  # subtract 2 for image_set and object_id columns
expected_num_of_total_features = (
    expected_num_of_total_features
    * expected_num_compartments
    * expected_num_of_channels
    + 2
)
expected_num_of_total_features
3698
if all_features_df.shape[1] != expected_num_columns:
    raise ValueError(
        f"Expected {expected_num_columns} columns but got {all_features_df.shape[1]}"
    )

Normalize and Feature Select

With all features merged into a single profile DataFrame, we use Pycytominer to:

  1. Normalize: z-score each feature across objects so all features are on the same scale

  2. Feature select: remove low-variance and highly correlated features that carry redundant information

Pycytominer infers which columns are features (prefixed Nuclei_) and which are metadata (prefixed Metadata_) automatically.

from pycytominer import feature_select, normalize

# z-score normalize across all objects
normalized_df = normalize(
    profiles=all_features_df,
    method="standardize",
    output_file=None,
)

# remove low-variance and highly correlated features
selected_df = feature_select(
    profiles=normalized_df,
    operation=["variance_threshold", "correlation_threshold"],
    output_file=None,
)

meta_cols = [c for c in selected_df.columns if c.startswith("Metadata_")]
feature_cols = [c for c in selected_df.columns if c not in meta_cols]
print(
    f"Features after selection: {len(feature_cols)} (from {all_features_df.shape[1] - len(meta_cols)} original)"
)
Features after selection: 27 (from 231 original)

Pairwise Object Correlation

The heatmap below shows the Pearson correlation between every pair of objects across all selected features. Objects with similar morphological profiles appear as warm (high correlation) clusters; dissimilar objects appear cool.

Each row and column is labelled by object ID.

Hide code cell source

import plotly.graph_objects as go
from IPython.display import HTML

# compute pairwise correlation between objects (rows = objects)
feature_matrix = selected_df[feature_cols].astype(float)
corr_matrix = feature_matrix.T.corr()

object_ids = selected_df["Metadata_Object_ObjectID"].astype(str).tolist()
labels = [f"Object {oid}" for oid in object_ids]

fig = go.Figure(
    go.Heatmap(
        z=corr_matrix.values,
        x=labels,
        y=labels,
        colorscale="RdBu",
        zmid=0,
        zmin=-1,
        zmax=1,
        colorbar={"title": "Pearson r"},
        text=corr_matrix.round(2).values,
        texttemplate="%{text}",
    )
)

fig.update_layout(
    title="Pairwise object correlation (normalized, selected features)",
    height=500,
    xaxis={"tickangle": 45},
    margin={"t": 60, "b": 120, "l": 120},
)

HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))