The goal of the phenopacket-schema is to define a machine-readable phenotypic description of a patient or sample in the context of rare disease, common/complex disease, or cancer. It aims to make that information sufficient and shareable outside of the EHR (Electronic Health Record), so that a clinician or clinical geneticist can capture enough structured data at the point of care to share with other labs or to support computational analysis in clinical or research environments.
The phenopacket schema defines a common, limited set of data types which may be composed into more specialised types for data sharing between resources using an agreed-upon schema.
This common schema has been used to define the ‘Phenopacket’, which is a catch-all collection of data types specifically focused on representing disease data in both initial data capture and analysis. The phenopacket schema is designed to be both human- and machine-readable, and to interoperate with standards being developed by organizations such as the ISO TC215 committee and with the HL7 Fast Healthcare Interoperability Resources specification (also known as FHIR®).
The diagram below shows an overview of the schema elements.
Version 2.0 includes significant changes and additions to the model to enable better representation of cancer and common disease, while still catering for the original rare-disease use case.
The following elements and their sub-elements were added to the 2.0 schema. Other additional fields have been added throughout the schema.
Added a new Measurement message for capturing quantitative, ordinal (e.g., absent/present), or categorical measurements. This element is available as a repeated field in the Phenopacket and Biosample top-level elements.
The TimeElement was added to unify the various ways of expressing time or age throughout the schema. In general, where there was an onset or start time, a corresponding resolution or end TimeElement has been added.
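As a rough illustration of how a Measurement carrying a TimeElement might look once serialised, the sketch below builds one as a plain Python dictionary and prints it as JSON. The field names (`assay`, `value`, `quantity`, `timeObserved`, `iso8601duration`, `measurements`) are assumed from the JSON form of the v2 schema and should be checked against the .proto definitions; the identifiers and ontology terms are illustrative only.

```python
import json

# Illustrative only: the keys below are assumptions about the v2 JSON form,
# not a validated phenopacket.
measurement = {
    # What was measured, expressed as an ontology term
    "assay": {"id": "LOINC:26515-7", "label": "Platelets [#/volume] in Blood"},
    # A quantitative value with its unit
    "value": {
        "quantity": {
            "unit": {"id": "UCUM:10*3/uL", "label": "Thousands per microliter"},
            "value": 24.0,
        }
    },
    # A TimeElement holding an age as an ISO 8601 duration
    "timeObserved": {"age": {"iso8601duration": "P10Y2M"}},
}

# Measurement is a repeated field, so it is held in a list on the phenopacket
phenopacket = {
    "id": "example-phenopacket",
    "subject": {"id": "patient-1"},
    "measurements": [measurement],
}

print(json.dumps(phenopacket, indent=2))
```

A plain dictionary is used here to avoid assuming anything about a particular language binding; generated protobuf classes would normally be used to build and validate real messages.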
The .proto files in the schema have been re-organised into more self-contained logical groups extracted from the base.proto file. These files are all organised into a v2 package which lives alongside the v1 package. With some language bindings, import paths in code written against the previous version may need to be updated before it compiles against the latest release; otherwise, code using v1.0 of the schema should work identically.
Time in Individual, Biosample, Disease, Phenotypic Feature
The TimeElement replaces the onset oneof in PhenotypicFeature and Disease, and the time_of_collection field in Biosample. The Individual age field has been replaced with a time_at_encounter TimeElement, and Biosample individual_age_at_collection has been replaced with a time_of_collection TimeElement. The PhenotypicFeature ‘negated’ field was renamed to ‘excluded’, in line with Disease, for indicating an absent phenotype.
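The renames described above could be applied to existing JSON data along the following lines. This is a minimal sketch, not an official migration tool: `upgrade_phenotypic_feature` is a hypothetical helper, and the v1 keys (`negated`, `ageOfOnset`, `classOfOnset`) and v2 keys (`excluded`, `onset`, `iso8601duration`, `ontologyClass`) are assumptions about the JSON form of each schema version.

```python
from copy import deepcopy


def upgrade_phenotypic_feature(v1_feature: dict) -> dict:
    """Hypothetical helper: convert a v1-style PhenotypicFeature dict to a v2-style one.

    Assumes v1 stores the onset under 'ageOfOnset' or 'classOfOnset' and
    v2 wraps it in a single 'onset' TimeElement.
    """
    v2 = deepcopy(v1_feature)  # leave the caller's data untouched

    # 'negated' was renamed to 'excluded'
    if "negated" in v2:
        v2["excluded"] = v2.pop("negated")

    # The onset alternatives become one TimeElement
    if "ageOfOnset" in v2:
        age = v2.pop("ageOfOnset")
        v2["onset"] = {"age": {"iso8601duration": age["age"]}}
    elif "classOfOnset" in v2:
        v2["onset"] = {"ontologyClass": v2.pop("classOfOnset")}

    return v2


v1_feature = {
    "type": {"id": "HP:0001250", "label": "Seizure"},
    "negated": True,
    "ageOfOnset": {"age": "P3Y"},
}
v2_feature = upgrade_phenotypic_feature(v1_feature)
print(v2_feature)
```

The helper copies its input rather than editing it in place, so the v1 and v2 forms can be compared side by side while checking a migration.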
Gene and Variant contexts
The v2.0 Interpretation is now a sub-element of a phenopacket, rather than an enclosing element. The change allows for better semantics on the Gene (now replaced by GeneDescriptor) and Variant (now replaced by VariationDescriptor) types and their relationship to an Individual or Biosample in the context of a Diagnosis based on a GenomicInterpretation.
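The nesting described above can be pictured with a small sketch in which an Interpretation sits inside the phenopacket and each GenomicInterpretation ties a gene to an individual or biosample. The keys used (`interpretations`, `diagnosis`, `genomicInterpretations`, `subjectOrBiosampleId`, `gene`, `valueId`, `symbol`) are assumed from the JSON form of the v2 schema, `genes_by_subject` is a hypothetical helper, and the example values are illustrative only.

```python
# Illustrative structure only; the keys are assumptions about the v2 JSON form.
phenopacket = {
    "id": "example-phenopacket",
    "subject": {"id": "patient-1"},
    "interpretations": [
        {
            "id": "interpretation-1",
            "diagnosis": {
                "disease": {"id": "OMIM:154700", "label": "Marfan syndrome"},
                "genomicInterpretations": [
                    {
                        # Links the finding to an Individual or Biosample
                        "subjectOrBiosampleId": "patient-1",
                        # A GeneDescriptor in place of the v1 Gene
                        "gene": {"valueId": "HGNC:3603", "symbol": "FBN1"},
                    }
                ],
            },
        }
    ],
}


def genes_by_subject(packet: dict) -> dict:
    """Hypothetical helper: map each subject or biosample id to its gene symbols."""
    result = {}
    for interpretation in packet.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for gi in diagnosis.get("genomicInterpretations", []):
            if "gene" in gi:
                result.setdefault(gi["subjectOrBiosampleId"], []).append(
                    gi["gene"]["symbol"]
                )
    return result


print(genes_by_subject(phenopacket))
```

Because the interpretation lives inside the phenopacket, a consumer can reach the gene, the diagnosis, and the subject from a single top-level object.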