The phenopacket schema can be used with any ontologies. The phenopacket can be compared to a hierarchical structure with “slots” for ontology terms and other data. Different use cases may require different ontology terms to cover the subject matter or to fulfil requirements of a specific research project. The spectrum of requirements is so broad that we do not think it is appropriate to require a specific set of ontologies for use with phenopackets. Nonetheless, the value of phenopacket-encoded data will be greatly increased if the community of users converges towards a common set of ontologies (to the extent possible). Here, we provide general recommendations for ontologies that we have found to be useful. This list is incomplete and we would welcome feedback from the community about ontologies that should be added to this page.
We anticipate that individual research consortia or other groups will agree on a set of allowed ontologies for specific projects.
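A project-level agreement of this kind can be checked mechanically. The following is a minimal sketch, not part of the phenopacket schema or any official tooling: it assumes that each term is identified by a CURIE of the form `PREFIX:local_id`, and the names `OntologyTerm`, `ALLOWED_PREFIXES`, and `disallowed_terms` are hypothetical, as is the particular set of allowed prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    """A minimal stand-in for an ontology-term slot: a CURIE plus a label."""
    id: str
    label: str

    @property
    def prefix(self) -> str:
        # The part of the CURIE before the first colon, e.g. "HP" in "HP:0001643".
        prefix, sep, local_id = self.id.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"not a CURIE: {self.id!r}")
        return prefix


# Hypothetical project policy: the prefixes one consortium has agreed to allow.
ALLOWED_PREFIXES = frozenset({"HP", "MONDO", "UO", "LOINC"})


def disallowed_terms(terms, allowed=ALLOWED_PREFIXES):
    """Return the terms whose CURIE prefix is not in the allowed set."""
    return [t for t in terms if t.prefix not in allowed]


terms = [
    OntologyTerm("HP:0001643", "Patent ductus arteriosus"),
    OntologyTerm("MONDO:0010542", "dilated cardiomyopathy 3B"),
    OntologyTerm("EFO:0002902", "milligram per kilogram"),
]
print([t.id for t in disallowed_terms(terms)])  # ['EFO:0002902']
```

Checking only the prefix keeps the policy simple; it does not verify that the identifier actually exists in the named ontology, which would require a lookup against the ontology itself.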
The Mondo Disease Ontology provides a comprehensive, logically structured ontology of diseases that integrates multiple other disease ontologies.
|dilated cardiomyopathy 3B||MONDO:0010542|
Other disease ontologies of note include the National Cancer Institute’s Thesaurus (NCIT), the Orphanet Rare Disease Ontology (ORDO), the Disease Ontology (DO), and Online Mendelian Inheritance in Man (OMIM).
The Human Phenotype Ontology (HPO) provides a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease.
|Patent ductus arteriosus||HP:0001643|
UBERON is an integrated cross-species ontology with classes representing a variety of anatomical entities.
The HUGO Gene Nomenclature Committee (HGNC) provides standard names, symbols, and IDs for human genes.
Units of Measurement
The Units of Measurement Ontology (UO) provides terms for units commonly encountered in medical data. The following table shows a typical example.
|millimetres of mercury||UO:0000272|
GENO is an ontology of genotypes, their more fundamental sequence components, and links to related biological and experimental entities. We use GENO terms to denote genotypes.
Logical Observation Identifiers Names and Codes (LOINC) is a database and universal standard for identifying medical laboratory observations. It can be used to denote clinical assays in the Measurement element.
|Platelets [#/volume] in Blood||LOINC:26515-7|
|Calcium [Mass/volume] in Serum or Plasma||LOINC:17861-6|
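As an illustration of how an assay term and a unit term fit together, the sketch below builds a platelet-count measurement as a plain Python dictionary and prints it as JSON. It pairs the LOINC assay term above with the NCIT unit term "Cells per Milliliter" listed on this page. The field names (`assay`, `value`, `quantity`, `unit`) reflect our reading of the Measurement element and should be checked against the schema; the helper functions are hypothetical and the numeric value is made up for illustration.

```python
import json


def ontology_class(curie, label):
    """Build the id/label pair used wherever an ontology term is required."""
    return {"id": curie, "label": label}


def quantity_measurement(assay, unit, value):
    """Assemble a measurement with a numeric quantity (field names assumed)."""
    return {
        "assay": assay,
        "value": {"quantity": {"unit": unit, "value": float(value)}},
    }


platelets = quantity_measurement(
    assay=ontology_class("LOINC:26515-7", "Platelets [#/volume] in Blood"),
    unit=ontology_class("NCIT:C74919", "Cells per Milliliter"),
    value=250_000_000,  # hypothetical value, for illustration only
)
print(json.dumps(platelets, indent=2))
```

Keeping the assay and the unit as separate ontology terms means that either can be swapped for a term from another ontology (for example a UO unit) without changing the structure of the measurement.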
DrugCentral integrates a broad spectrum of drug resources related to chemical structures, biological activities, regulatory data, pharmacology, and drug formulations.
Other ontologies with coverage of drugs include ChEBI, RxNorm, and DrugBank.
The National Cancer Institute’s Thesaurus
The National Cancer Institute’s Thesaurus (NCIT) provides a wide range of terms that can be useful for phenopackets. In addition to providing an ontology of cancers, NCIT provides terms for procedures, findings, units of measurement, scheduling, etc. The following table shows examples from the subhierarchies for Unit of Measure (NCIT:C25709) and for Schedule Frequency (NCIT:C64493).
|Milligram per Kilogram per Dose||NCIT:C124458|
|Cells per Milliliter||NCIT:C74919|
Experimental Factor Ontology
The Experimental Factor Ontology (EFO) is an ontology of experimental variables, particularly those used in molecular biology. EFO imports terms from many source ontologies and adds further terms needed to describe systematically the experimental variables available in EBI databases.
|milligram per kilogram||EFO:0002902|