Feature schema
Status: experimental | Version: 0.0.1 | Date: April 9th, 2026 | Authors: ZEDprofiler development team
Important
Every feature column produced by ZEDprofiler follows this naming pattern:
{Compartment}_{Channel}_{FeatureType}_{Measurement}
Examples: Nuclei_DNA_Intensity_MeanIntensity, Cell_DNA-Mito_Colocalization_Correlation
Metadata columns use a separate prefix:
Metadata_{Category}_{Name}
Examples: Metadata_Experiment_ImageSet, Metadata_Object_ObjectID
Abstract
This document specifies the naming specification and schema for morphological features extracted using ZEDprofiler. The specification defines requirements for feature identifiers, data structures, and formatting rules to ensure consistency, interoperability, and maintainability across pipelines.
1. Introduction
1.1 Purpose
This specification establishes a standardized feature naming specification and data schema. Standardization enables:
Consistent feature identification across analysis stages
Automated feature parsing and metadata extraction
Integration with downstream analysis tools
Reproducible research outputs
1.2 Scope
This specification applies to all feature extraction modules within the pipeline, including but not limited to:
Area, size, and shape measurements
Colocalization analysis
Granularity features
Intensity measurements
Neighbor relationships
Deep learning features (sammed3d)
Texture features
1.3 Key words
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
2A. feature name format specification
2.1 General structure
Feature names MUST conform to the following structure:
<Compartment>_<channel>_<featuretype>_<measurement>
Where each component is separated by a single underscore character (_).
2.2 Component definitions
2.2.1 Compartment component
The <compartment> component:
MUST identify the cellular or spatial compartment from which the feature is extracted
MUST be one of the following enumerated values:
Nuclei- nuclear compartmentCell- whole cell compartmentCytoplasm- cytoplasmic compartment (cell excluding nucleus)Organoid- organoid-level compartment
MUST NOT contain whitespace or special characters
MUST use pascalcase capitalization
Example: Nuclei, Cytoplasm, Organoid
2.2.2 Channel component
The <channel> component:
MUST identify the imaging channel or fluorophore used for the measurement
MUST be one of the following values:
DNA- DAPI/hoechst nuclear stain (405nm excitation)AGP- AGP marker (488nm excitation)ER- endoplasmic reticulum marker (555nm excitation)Mito- mitochondrial marker (640nm excitation)BF- brightfield/transmitted light
MUST NOT contain whitespace
MAY use pascalcase capitalization
MAY use hyphen-separated channel combinations for colocalization features (e.g.,
DNA-mito)MUST list channels in alphabetical order when combined (e.g.,
DNA-mitonotmito-DNA)MUST be set to
NoChannelfor channel-independent features (e.g., volumesizeshape)
Example: DNA, Mito, DNA-Mito
2.2.3 Featuretype component
The <featuretype> component:
MUST identify the category or method of feature extraction
MUST be one of the following enumerated values:
volumesizeshape- morphological measurements (area, volume, shape descriptors)Colocalization- channel colocalization metricsGranularity- granular spectrum and texture-at-scale featuresIntensity- pixel intensity statisticsNeighbors- spatial relationship and neighbor countingSAMMed3D- deep learning features from SAM-med3d modelTexture- haralick texture features
MUST NOT contain whitespace
MUST use pascalcase capitalization
MUST NOT include version numbers or implementation details
Example: Intensity, Texture, Colocalization
2.2.4 Measurement component
The <measurement> component:
MUST identify the specific measurement or metric
MUST NOT contain underscores, periods, spaces, or forward slashes
MUST replace prohibited characters with hyphens (
-)SHOULD use pascalcase for measurement names to maintain consistency
MAY include parameter values appended with hyphens (e.g.,
entropy-256-3)MUST be descriptive and unambiguous
Character replacement rules:
Underscore (
_) → hyphen (-)Period (
.) → hyphen (-)Space (
) → hyphen (-)Forward slash (
/) → hyphen (-)
Example: meanintensity, entropy-256-3, angularsecondmoment
2.3 Complete feature name examples
Valid feature names conforming to this specification:
Nuclei_DNA_Intensity_MeanIntensity
Cytoplasm_Mito_Texture_Entropy-256-3
Cell_DNA-Mito_Colocalization_Correlation
Organoid_NoChannel_volumesizeshape_Volume
Nuclei_NoChannel_Neighbors_AdjacentCount
Cell_Mito_Granularity_Spectrum-10
Nuclei_DNA_SAMMed3D_CLSFeature-512
2B. metadata naming specification
2.1 General structure
Metadata are non morphology feature values that provide contextual information about the sample, experiment, or imaging conditions, or objects in the dataset. Metadata are used to capture information that may be relevant for analysis, interpretation, or downstream processing but do not represent morphological measurements of the objects themselves.
Metadata names MUST conform to the following structure:
Metadata_<featurecategory>_<featurename>
Where each component is separated by a single underscore character (_).
The Metadata_ prefix is used to clearly distinguish metadata features from morphological features in the dataset.
Each category metadata name MUST be in pascalcase and MUST NOT contain whitespace or special characters. the <featurename> component MUST be descriptive and unambiguous, following the same character restrictions as morphological feature names.
2.2 Metadata category definitions
The <featurecategory> component MUST identify the type (category) of metadata and MUST be one of the following enumerated values:
Storage- metadata related to data storage and file management.Biology- metadata related to biological characteristics of the sample.Experiment- metadata related to experimental conditions and treatments.Imaging- metadata related to imaging parameters and conditions.Microscopy- metadata related to microscopy settings and configurations.Object- metadata related to specific objects (e.g., nuclei) or regions of interest in the dataset.Neighbors- metadata related to spatial relationships and neighbor counts of objects in the dataset.Location- metadata related to spatial information and coordinates.Other- this is a place holder for any metadata that might be used in the future that does not fit into the above categories. new categories can be added as needed, but theothercategory provides a catch-all for any metadata that does not fit into the predefined categories.
2.3 Complete metadata name examples
Valid metadata names conforming to this specification:
Metadata_Storage_FilePath
Metadata_Biology_PatientID
Metadata_Experiment_Treatment
Metadata_Imaging_ExposureTime
Metadata_Microscopy_Magnification
Metadata_Object_ObjectID
Metadata_Neighbors_AdjacentCount
Metadata_Location_Cell_CentroidX
3. References
3.1 Normative references
RFC 2119: key words for use in rfcs to indicate requirement levels Https://www.ietf.org/rfc/rfc2119.txt
Apache parquet format specification Https://parquet.apache.org/docs/file-format/
3.2 Informative references
Cellprofiler feature naming specification Influenced naming structure for biological image analysis
OME data model Open microscopy environment standards for microscopy data