Individual

The subject of the Phenopacket is represented by an Individual element. This element is intended to represent an individual human or other organism. In this documentation, we explain the element using the example of a human proband in a clinical investigation.

Data model

Field           Type              Status       Description
id              string            required     An arbitrary identifier
alternate_ids   a list of CURIE   optional     A list of alternative identifiers for the individual
date_of_birth   timestamp         optional     A timestamp either exact or imprecise
age             Age or AgeRange   recommended  The age or age range of the individual
sex             Sex               recommended  Observed apparent sex of the individual
karyotypic_sex  KaryotypicSex     optional     The karyotypic sex of the individual
taxonomy        OntologyClass     optional     An OntologyClass representing the species (e.g., NCBITaxon:9615)

Example

The following example is typical but does not make use of all of the optional fields of this element.

{
    "id": "patient:0",
    "dateOfBirth": "1998-01-01T00:00:00Z",
    "sex": "MALE"
}

id

This element is the primary identifier for the individual and SHOULD be used in other parts of a message when referring to this individual - for example in a Pedigree or Biosample. The contents of the element are context dependent, and will be determined by the application. For instance, if the Phenopacket is being used to represent a case study about an individual with some genetic disease, the individual may be referred to in that study by their position in the pedigree, e.g., III:2 for the second person in the third generation. In this case, id would be set to III:2.

If a Pedigree element is used, it is essential that the individual_id of the Pedigree element matches the id field here.

If a Biosample element is used, it is essential that the individual_id of the Biosample element matches the id field here.

All identifiers within a phenopacket pertaining to an individual SHOULD use this identifier. It is the responsibility of the sender to provide the recipient with an internally consistent message. This is possible because all messages can be created dynamically by the sender using identifiers appropriate for the receiving system.

For example, a hospital may want to send a Family to an external lab for analysis. Here the hospital provides an obfuscated identifier which is used to identify the individual in the Phenopacket, in the Pedigree, and in the mappings to the sample id in the HtsFile.

In this case the Pedigree is created by the sending system from whatever source it uses, and the identifiers should be mapped to the Individual.id values contained in the Family.proband and Family.relatives phenopackets.

In the case of the VCF file, the sending system likely has no control over, or ability to change, the identifiers used for the sample id, and the file is likely to use different identifiers. For this reason the HtsFile has a local mapping field, HtsFile.individual_to_sample_identifiers, where the Individual.id can be mapped to the sample id in that file.

Example

In this example we show individual blocks which would be used as part of a singleton ‘family’ to illustrate the use of the internally consistent Individual.id. As noted above, the data may have been constructed by the sender from different sources but given they know these relationships, they should provide the receiver with a consistent view of the data both for ease of use and to limit incorrect mapping.

"individual": {
  "id": "patient23456",
  "dateOfBirth": "1998-01-01T00:00:00Z",
  "sex": "MALE"
}

"htsFile": {
    "uri": "file://data/genomes/germline_wgs.vcf.gz",
    "description": "Germline sample",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh38",
    "individualToSampleIdentifiers": {
      "patient23456": "NA12345"
    }
}

"pedigree": {
    "persons": [
        {
            "familyId": "family 1",
            "individualId": "patient23456",
            "sex": "MALE",
            "affectedStatus": "AFFECTED"
        }
    ]
}
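
The consistency requirement illustrated above can be checked mechanically before a message is sent. The following is a minimal sketch, assuming the message parts are available as plain Python dictionaries using the JSON field names shown in the example. The function name find_id_mismatches is hypothetical and is not part of any Phenopackets library.

```python
def find_id_mismatches(individual, hts_files, pedigree):
    """Return a list of problems; an empty list means the ids are consistent."""
    problems = []
    primary_id = individual["id"]

    # The Pedigree must contain a person whose individualId is Individual.id.
    pedigree_ids = {person["individualId"] for person in pedigree.get("persons", [])}
    if primary_id not in pedigree_ids:
        problems.append(f"pedigree has no person with individualId {primary_id!r}")

    # Each HtsFile should map Individual.id to the sample id used in that file.
    for hts_file in hts_files:
        mapping = hts_file.get("individualToSampleIdentifiers", {})
        if primary_id not in mapping:
            problems.append(f"{hts_file.get('uri')}: no sample id mapped for {primary_id!r}")
    return problems


# Data copied from the example blocks above.
individual = {"id": "patient23456", "dateOfBirth": "1998-01-01T00:00:00Z", "sex": "MALE"}
hts_file = {
    "uri": "file://data/genomes/germline_wgs.vcf.gz",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh38",
    "individualToSampleIdentifiers": {"patient23456": "NA12345"},
}
pedigree = {
    "persons": [
        {
            "familyId": "family 1",
            "individualId": "patient23456",
            "sex": "MALE",
            "affectedStatus": "AFFECTED",
        }
    ]
}

print(find_id_mismatches(individual, [hts_file], pedigree))
```

A sender could run a check like this as the last step of message construction and refuse to send a message for which the returned list is not empty.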

alternate_ids

An optional list of alternative identifiers for this individual. These should be in the form of CURIEs and hence have a corresponding Resource listed in the MetaData. These should not be used elsewhere in the phenopacket as this will break the assumptions required for using the id field as the primary identifier. This field is provided for the convenience of users who may have multiple mappings to an individual which they need to track.
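
As an illustration of the CURIE requirement, the sketch below reports alternate identifiers whose prefix has no matching Resource. It assumes each Resource is a dictionary carrying its prefix under the key namespacePrefix. The function name unresolved_alternate_ids and the prefixes HOSPITAL_A and LAB_B are hypothetical.

```python
def unresolved_alternate_ids(alternate_ids, resources):
    """Return the alternate ids that are not CURIEs with a known prefix."""
    # Assumption: each Resource exposes its CURIE prefix as "namespacePrefix".
    known_prefixes = {resource["namespacePrefix"] for resource in resources}

    unresolved = []
    for curie in alternate_ids:
        prefix, separator, local_id = curie.partition(":")
        # A CURIE needs a prefix, a colon, and a non-empty local part.
        if not separator or not local_id or prefix not in known_prefixes:
            unresolved.append(curie)
    return unresolved


# Hypothetical data for illustration only.
resources = [{"id": "hospital_a", "namespacePrefix": "HOSPITAL_A"}]
alternate_ids = ["HOSPITAL_A:000123", "LAB_B:77", "no-prefix"]

print(unresolved_alternate_ids(alternate_ids, resources))
```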

date_of_birth

This element represents the date of birth of the individual as an ISO8601 UTC timestamp that is rounded down to the closest known year/month/day/hour/minute. For example:

  • “2018-03-01T00:00:00Z” for someone born on an unknown day in March 2018
  • “2018-01-01T00:00:00Z” for someone born on an unknown day in 2018
  • empty if unknown or not stated.

See here for more information about timestamps.
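
The rounding-down convention described above can be sketched as follows. This is only an illustration using the Python standard library. The helper name date_of_birth_timestamp is hypothetical, and unknown components are simply replaced by the first month or day.

```python
from datetime import datetime, timezone


def date_of_birth_timestamp(year, month=None, day=None):
    """Format a possibly imprecise date of birth as an ISO 8601 UTC timestamp."""
    if month is None and day is not None:
        raise ValueError("a day cannot be given without a month")
    # Unknown month or day is rounded down to 1; the time is always midnight UTC.
    rounded = datetime(year, month or 1, day or 1, tzinfo=timezone.utc)
    return rounded.strftime("%Y-%m-%dT%H:%M:%SZ")


print(date_of_birth_timestamp(2018, 3))  # unknown day in March 2018
print(date_of_birth_timestamp(2018))     # unknown day in 2018
```

Note that the resulting string cannot be distinguished from an exact date of birth on the first of the month, so a recipient cannot tell from the timestamp alone how precise it is.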

The element is provided for use cases within protected networks, but in many situations the element should not be used in order to protect the privacy of the individual. Instead, the Age element should be preferred.

age

An age object describing the age of the individual at the time of collection of biospecimens or phenotypic observations reported in the current Phenopacket. It is specified using either an Age element, which can represent an age in several different ways, or an AgeRange element, which can represent a range of ages such as 10-14 years (age can be represented in this way to protect the privacy of study participants).
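
To illustrate how an exact age might be coarsened into a range such as 10-14 years, here is a minimal sketch. It assumes that ages are written as ISO 8601 duration strings and that an AgeRange is a pair of start and end ages; the dictionary layout, the function name age_range_for, and the default bin width of five years are assumptions made for this example, so the Age and AgeRange documentation should be consulted for the actual fields.

```python
def age_range_for(age_years, bin_width=5):
    """Coarsen an age in whole years into a fixed-width range of years."""
    if age_years < 0 or bin_width < 1:
        raise ValueError("age_years must be >= 0 and bin_width must be >= 1")
    start = (age_years // bin_width) * bin_width
    end = start + bin_width - 1
    # Assumed layout: start and end ages as ISO 8601 durations, e.g. "P10Y".
    return {"start": {"age": f"P{start}Y"}, "end": {"age": f"P{end}Y"}}


print(age_range_for(12))
```

With the default bin width, any age from 10 to 14 years maps to the same range, so the exact age is not disclosed.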

sex

Phenopackets make use of an enumeration to denote the phenotypic sex of an individual. See Sex.

karyotypic_sex

Phenopackets make use of an enumeration to denote the chromosomal sex of an individual. See KaryotypicSex.

taxonomy

For resources where more than one organism may be under study, it is advisable to indicate the taxonomic identifier of the organism at its most specific level. We advise using the codes from the NCBI Taxonomy resource. For instance, NCBITaxon:9606 is human (Homo sapiens) and NCBITaxon:9615 is dog.
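
As a small illustration, the sketch below attaches a taxonomy to an individual and prints the result as JSON. It assumes an OntologyClass is serialized as an object with id and label fields. The lookup table and the function name with_taxonomy are hypothetical, and the table covers only the two codes mentioned above.

```python
import json

# Only the two NCBI Taxonomy codes mentioned in the text.
TAXA = {
    "human": {"id": "NCBITaxon:9606", "label": "Homo sapiens"},
    "dog": {"id": "NCBITaxon:9615", "label": "Canis lupus familiaris"},
}


def with_taxonomy(individual, species):
    """Return a copy of the individual with its taxonomy field set."""
    updated = dict(individual)
    updated["taxonomy"] = dict(TAXA[species])
    return updated


patient = with_taxonomy({"id": "patient:0", "sex": "MALE"}, "human")
print(json.dumps(patient, indent=4))
```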