Gene

This element represents an identifier for a gene. It can be used to indicate that a gene is thought to play a causative role in the disease phenotypes being described when the exact variant cannot be transmitted, either for privacy reasons or because it is unknown.

Data model

Definition of the Gene element

Field          Type             Status    Description
id             string           required  Official identifier of the gene
alternate_ids  repeated string  optional  Alternative identifier(s) of the gene
symbol         string           required  Official gene symbol

Example

{
  "id": "HGNC:347",
  "symbol": "ETF1"
}

Optionally, with alternative identifiers:

{
  "id": "HGNC:347",
  "alternate_ids": ["ensembl:ENSG00000120705", "ncbigene:2107"],
  "symbol": "ETF1"
}
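
As a minimal sketch of how the fields in the table above might be held in code, the following Python dataclass mirrors the three fields and serializes to the JSON shape used in the examples. The class name Gene, the to_dict helper, and the choice to omit an empty alternate_ids list are illustrative assumptions, not part of any official library.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gene:
    """Illustrative container mirroring the Gene fields (not an official API)."""

    id: str                                                  # required
    symbol: str                                              # required
    alternate_ids: List[str] = field(default_factory=list)   # optional

    def __post_init__(self) -> None:
        # Both required fields must be non-empty.
        if not self.id or not self.symbol:
            raise ValueError("Gene requires both 'id' and 'symbol'")

    def to_dict(self) -> dict:
        # Assumption: leave out alternate_ids when it is empty, so the
        # minimal form matches the first example above.
        data = {"id": self.id}
        if self.alternate_ids:
            data["alternate_ids"] = list(self.alternate_ids)
        data["symbol"] = self.symbol
        return data


gene = Gene(id="HGNC:347", symbol="ETF1")
print(json.dumps(gene.to_dict(), indent=2))
```

Constructing the object with an empty id or symbol raises ValueError, which reflects the "required" status of those two fields.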

id

The id represents the accession number or comparable identifier for the gene.

It SHOULD be a CURIE identifier with a prefix that is used by the official organism gene nomenclature committee. In the case of humans, this is the HGNC, e.g. HGNC:347.
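
The CURIE recommendation can be checked mechanically. The sketch below splits an identifier into prefix and local part and compares the prefix against the one expected for an organism. The function names and the EXPECTED_PREFIX table are hypothetical; the table lists only the two committees mentioned in this document and would need to be extended for other organisms.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'PREFIX:LOCAL_ID' into (prefix, local_id); raise ValueError otherwise."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id or " " in curie:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


# Hypothetical lookup of the prefix expected for each organism.
EXPECTED_PREFIX = {"Homo sapiens": "HGNC", "Mus musculus": "MGI"}


def uses_expected_prefix(gene_id: str, organism: str) -> bool:
    """Return True when gene_id carries the prefix listed for the organism."""
    prefix, _ = split_curie(gene_id)
    return EXPECTED_PREFIX.get(organism) == prefix


print(split_curie("HGNC:347"))
print(uses_expected_prefix("HGNC:347", "Homo sapiens"))
```

Because the specification says SHOULD rather than MUST, a failed check is better treated as a warning than as a hard error.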

alternate_ids

This field can be used to provide identifiers from alternative resources where this gene is used or catalogued. For example, NCBI and Ensembl both provide alternative identifiers for genes where they catalogue the transcripts for a gene, e.g. ncbigene:2107, ensembl:ENSG00000120705. These identifiers SHOULD be represented in CURIE form with a corresponding Resource.
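
One way to check the "corresponding Resource" recommendation is to compare the prefixes used in alternate_ids against the prefixes of the declared Resources. The sketch below assumes those declared prefixes are already available as a set of strings; how they are collected is outside its scope, and the function name is hypothetical.

```python
from typing import Iterable, List, Set


def missing_resource_prefixes(
    alternate_ids: Iterable[str], resource_prefixes: Set[str]
) -> List[str]:
    """Return the CURIE prefixes in alternate_ids that have no declared Resource."""
    missing: List[str] = []
    for curie in alternate_ids:
        prefix, sep, local_id = curie.partition(":")
        if not (sep and prefix and local_id):
            raise ValueError(f"not a CURIE: {curie!r}")
        # Report each undeclared prefix once, in order of first appearance.
        if prefix not in resource_prefixes and prefix not in missing:
            missing.append(prefix)
    return missing


ids = ["ncbigene:2107", "ensembl:ENSG00000120705"]
print(missing_resource_prefixes(ids, {"ncbigene"}))  # prints ['ensembl']
```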

symbol

This SHOULD use the official gene symbol as designated by the organism gene nomenclature committee. In the case of humans, this is the HUGO Gene Nomenclature Committee, e.g. ETF1.

Model Organisms

Genes from model organisms represented by the Alliance of Genome Resources should use the primary identifier and symbol provided by that resource. For example, for the Mus musculus gene eukaryotic translation termination factor 1:

{
  "id": "MGI:2385071",
  "alternate_ids": ["ensembl:ENSMUSG00000024360", "ncbigene:225363"],
  "symbol": "Etf1"
}