# Feature schema **Status:** experimental | **Version:** 0.0.1 | **Date:** April 9th, 2026 | **Authors:** ZEDprofiler development team ```{important} Every feature column produced by ZEDprofiler follows this naming pattern: **`{Compartment}_{Channel}_{FeatureType}_{Measurement}`** Examples: `Nuclei_DNA_Intensity_MeanIntensity`, `Cell_DNA-Mito_Colocalization_Correlation` Metadata columns use a separate prefix: **`Metadata_{Category}_{Name}`** Examples: `Metadata_Experiment_ImageSet`, `Metadata_Object_ObjectID` ``` ______________________________________________________________________ ## Abstract This document specifies the naming specification and schema for morphological features extracted using ZEDprofiler. The specification defines requirements for feature identifiers, data structures, and formatting rules to ensure consistency, interoperability, and maintainability across pipelines. ______________________________________________________________________ ## 1. Introduction ### 1.1 Purpose This specification establishes a standardized feature naming specification and data schema. Standardization enables: - Consistent feature identification across analysis stages - Automated feature parsing and metadata extraction - Integration with downstream analysis tools - Reproducible research outputs ### 1.2 Scope This specification applies to all feature extraction modules within the pipeline, including but not limited to: - Area, size, and shape measurements - Colocalization analysis - Granularity features - Intensity measurements - Neighbor relationships - Deep learning features (sammed3d) - Texture features ### 1.3 Key words The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). ______________________________________________________________________ ## 2A. feature name format specification ### 2.1 General structure Feature names MUST conform to the following structure: ``` ___ ``` Where each component is separated by a single underscore character (`_`). ### 2.2 Component definitions #### 2.2.1 Compartment component The `` component: - MUST identify the cellular or spatial compartment from which the feature is extracted - MUST be one of the following enumerated values: - `Nuclei` - nuclear compartment - `Cell` - whole cell compartment - `Cytoplasm` - cytoplasmic compartment (cell excluding nucleus) - `Organoid` - organoid-level compartment - MUST NOT contain whitespace or special characters - MUST use pascalcase capitalization **Example:** `Nuclei`, `Cytoplasm`, `Organoid` #### 2.2.2 Channel component The `` component: - MUST identify the imaging channel or fluorophore used for the measurement - MUST be one of the following values: - `DNA` - DAPI/hoechst nuclear stain (405nm excitation) - `AGP` - AGP marker (488nm excitation) - `ER` - endoplasmic reticulum marker (555nm excitation) - `Mito` - mitochondrial marker (640nm excitation) - `BF` - brightfield/transmitted light - MUST NOT contain whitespace - MAY use pascalcase capitalization - MAY use hyphen-separated channel combinations for colocalization features (e.g., `DNA-mito`) - MUST list channels in alphabetical order when combined (e.g., `DNA-mito` not `mito-DNA`) - MUST be set to `NoChannel` for channel-independent features (e.g., volumesizeshape) **Example:** `DNA`, `Mito`, `DNA-Mito` #### 2.2.3 Featuretype component The `` component: - MUST identify the category or method of feature extraction - MUST be one of the following enumerated values: - `volumesizeshape` - morphological measurements (area, volume, shape descriptors) - `Colocalization` - channel colocalization metrics - `Granularity` - granular spectrum and texture-at-scale features - `Intensity` - pixel intensity statistics - `Neighbors` - spatial relationship and neighbor counting - `SAMMed3D` - deep learning features from SAM-med3d model - `Texture` - haralick texture features - MUST NOT contain whitespace - MUST use pascalcase capitalization - MUST NOT include version numbers or implementation details **Example:** `Intensity`, `Texture`, `Colocalization` #### 2.2.4 Measurement component The `` component: - MUST identify the specific measurement or metric - MUST NOT contain underscores, periods, spaces, or forward slashes - MUST replace prohibited characters with hyphens (`-`) - SHOULD use pascalcase for measurement names to maintain consistency - MAY include parameter values appended with hyphens (e.g., `entropy-256-3`) - MUST be descriptive and unambiguous **Character replacement rules:** - Underscore (`_`) → hyphen (`-`) - Period (`.`) → hyphen (`-`) - Space (` `) → hyphen (`-`) - Forward slash (`/`) → hyphen (`-`) **Example:** `meanintensity`, `entropy-256-3`, `angularsecondmoment` ### 2.3 Complete feature name examples Valid feature names conforming to this specification: ``` Nuclei_DNA_Intensity_MeanIntensity Cytoplasm_Mito_Texture_Entropy-256-3 Cell_DNA-Mito_Colocalization_Correlation Organoid_NoChannel_volumesizeshape_Volume Nuclei_NoChannel_Neighbors_AdjacentCount Cell_Mito_Granularity_Spectrum-10 Nuclei_DNA_SAMMed3D_CLSFeature-512 ``` ______________________________________________________________________ ## 2B. metadata naming specification ### 2.1 General structure Metadata are non morphology feature values that provide contextual information about the sample, experiment, or imaging conditions, or objects in the dataset. Metadata are used to capture information that may be relevant for analysis, interpretation, or downstream processing but do not represent morphological measurements of the objects themselves. Metadata names MUST conform to the following structure: ``` Metadata__ ``` Where each component is separated by a single underscore character (`_`). The `Metadata_` prefix is used to clearly distinguish metadata features from morphological features in the dataset. Each category metadata name MUST be in pascalcase and MUST NOT contain whitespace or special characters. the `` component MUST be descriptive and unambiguous, following the same character restrictions as morphological feature names. ### 2.2 Metadata category definitions The `` component MUST identify the type (category) of metadata and MUST be one of the following enumerated values: - `Storage` - metadata related to data storage and file management. - `Biology` - metadata related to biological characteristics of the sample. - `Experiment` - metadata related to experimental conditions and treatments. - `Imaging` - metadata related to imaging parameters and conditions. - `Microscopy` - metadata related to microscopy settings and configurations. - `Object` - metadata related to specific objects (e.g., nuclei) or regions of interest in the dataset. - `Neighbors` - metadata related to spatial relationships and neighbor counts of objects in the dataset. - `Location` - metadata related to spatial information and coordinates. - `Other` - this is a place holder for any metadata that might be used in the future that does not fit into the above categories. new categories can be added as needed, but the `other` category provides a catch-all for any metadata that does not fit into the predefined categories. ### 2.3 Complete metadata name examples Valid metadata names conforming to this specification: ``` Metadata_Storage_FilePath Metadata_Biology_PatientID Metadata_Experiment_Treatment Metadata_Imaging_ExposureTime Metadata_Microscopy_Magnification Metadata_Object_ObjectID Metadata_Neighbors_AdjacentCount Metadata_Location_Cell_CentroidX ``` ## 3. References ### 3.1 Normative references - **RFC 2119**: key words for use in rfcs to indicate requirement levels Https://www.ietf.org/rfc/rfc2119.txt - **Apache parquet format specification** Https://parquet.apache.org/docs/file-format/ ### 3.2 Informative references - **Cellprofiler feature naming specification** Influenced naming structure for biological image analysis - **OME data model** Open microscopy environment standards for microscopy data