Individual¶
The subject of the Phenopacket is represented by an Individual element. This element is intended to represent an individual human or other organism. In this documentation, we explain the element using the example of a human proband in a clinical investigation.
Data model
Field            Type               Status       Description
id               string             required     An arbitrary identifier
alternate_ids    a list of CURIE    optional     A list of alternative identifiers for the individual
date_of_birth    timestamp          optional     A timestamp either exact or imprecise
age              Age or AgeRange    recommended  The age or age range of the individual
sex              Sex                recommended  Observed apparent sex of the individual
karyotypic_sex   KaryotypicSex      optional     The karyotypic sex of the individual
taxonomy         OntologyClass      optional     An OntologyClass representing the species (e.g., NCBITaxon:9615)
Example
The following example is typical but does not make use of all of the optional fields of this element.
{
  "id": "patient:0",
  "dateOfBirth": "1998-01-01T00:00:00Z",
  "sex": "MALE"
}
id¶
This element is the primary identifier for the individual and SHOULD be used in other parts of a message when referring to this individual, for example in a Pedigree or Biosample. The contents of the element are context dependent and will be determined by the application. For instance, if the Phenopacket is being used to represent a case study about an individual with some genetic disease, the individual may be referred to in that study by their position in the pedigree, e.g., III:2 for the second person in the third generation. In this case, id would be set to III:2.
If a Pedigree element is used, it is essential that the individual_id of the Pedigree element matches the id field here.
If a Biosample element is used, it is essential that the individual_id of the Biosample element matches the id field here.
All identifiers within a phenopacket pertaining to an individual SHOULD use this identifier. It is the responsibility of the sender to provide the recipient with an internally consistent message. This is possible because all messages can be created dynamically by the sender using identifiers appropriate for the receiving system.
For example, a hospital may want to send a Family to an external lab for analysis. Here the hospital provides an obfuscated identifier, which is used to identify the individual in the Phenopacket, in the Pedigree, and in the mappings to the sample id in the HtsFile.
In this case the Pedigree is created by the sending system from whatever source it uses, and the identifiers should be mapped to the Individual.id values contained in the Family.proband and Family.relatives phenopackets.
In the case of the VCF file, the sending system likely has no control over, or ability to change, the identifiers used for the sample id, and the file likely uses different identifiers. For this reason the HtsFile has a local mapping field, HtsFile.individual_to_sample_identifiers, where the Individual.id can be mapped to the sample id in that file.
Example
In this example we show individual blocks which would be used as part of a singleton ‘family’ to illustrate the use of the internally consistent Individual.id. As noted above, the data may have been constructed by the sender from different sources but given they know these relationships, they should provide the receiver with a consistent view of the data both for ease of use and to limit incorrect mapping.
{
  "individual": {
    "id": "patient23456",
    "dateOfBirth": "1998-01-01T00:00:00Z",
    "sex": "MALE"
  },
  "htsFile": {
    "uri": "file://data/genomes/germline_wgs.vcf.gz",
    "description": "Germline sample",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh38",
    "individualToSampleIdentifiers": {
      "patient23456": "NA12345"
    }
  },
  "pedigree": {
    "persons": [
      {
        "familyId": "family 1",
        "individualId": "patient23456",
        "sex": "MALE",
        "affectedStatus": "AFFECTED"
      }
    ]
  }
}
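The consistency requirement described above can be checked mechanically before a message is sent. The following is a minimal Python sketch, not part of the schema or of any Phenopackets library: the function name check_identifier_consistency is hypothetical, and the blocks are assumed to have been parsed into plain dictionaries that use the JSON field names shown in the example.

```python
def check_identifier_consistency(individual, hts_file, pedigree):
    """Return a list of problems; an empty list means the identifiers agree.

    Hypothetical helper for the singleton case: one individual, one
    HtsFile and one pedigree, all given as plain dictionaries.
    """
    problems = []
    subject_id = individual["id"]

    # Every key of the HtsFile mapping should be the known Individual.id.
    for mapped_id in hts_file.get("individualToSampleIdentifiers", {}):
        if mapped_id != subject_id:
            problems.append(f"htsFile maps unknown individual {mapped_id!r}")

    # The individual must appear in the pedigree under the same identifier.
    pedigree_ids = {p["individualId"] for p in pedigree.get("persons", [])}
    if subject_id not in pedigree_ids:
        problems.append(f"pedigree has no person with individualId {subject_id!r}")

    return problems


individual = {"id": "patient23456", "sex": "MALE"}
hts_file = {"individualToSampleIdentifiers": {"patient23456": "NA12345"}}
pedigree = {"persons": [{"familyId": "family 1", "individualId": "patient23456"}]}

problems = check_identifier_consistency(individual, hts_file, pedigree)

# The receiver resolves the VCF sample id through the same Individual.id.
sample_id = hts_file["individualToSampleIdentifiers"][individual["id"]]
```

With the example data the list of problems is empty and the individual resolves to the sample id NA12345 in the VCF file.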
alternate_ids¶
An optional list of alternative identifiers for this individual. These should be in the form of CURIEs and hence have a corresponding Resource listed in the MetaData. These should not be used elsewhere in the phenopacket, as this will break the assumptions required for using the id field as the primary identifier. This field is provided for the convenience of users who may have multiple mappings to an individual which they need to track.
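As an illustration of the CURIE requirement, the sketch below lists alternate identifiers whose prefix has no matching Resource. It is a hedged example only: both function names are hypothetical, the prefixes HOSP and REG are invented for the demonstration, and the Resource entries are assumed to be dictionaries carrying their prefix under the JSON key namespacePrefix.

```python
def curie_prefix(curie):
    """Return the prefix of a CURIE such as 'HOSP:12345', or None if malformed."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        return None
    return prefix


def unresolved_alternate_ids(alternate_ids, resources):
    """List alternate ids that are malformed or lack a matching Resource.

    Assumes each resource is a dict with a 'namespacePrefix' key.
    """
    known = {r["namespacePrefix"] for r in resources}
    return [c for c in alternate_ids if curie_prefix(c) not in known]


# Invented identifiers and prefixes, for illustration only.
alternate_ids = ["HOSP:12345", "REG:998", "not-a-curie"]
resources = [{"id": "hosp", "namespacePrefix": "HOSP"}]

unresolved = unresolved_alternate_ids(alternate_ids, resources)
```

Here REG:998 is reported because no Resource declares the REG prefix, and not-a-curie is reported because it has no prefix at all.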
date_of_birth¶
This element represents the date of birth of the individual as an ISO 8601 UTC timestamp that is rounded down to the closest known year/month/day/hour/minute. For example:
- “2018-03-01T00:00:00Z” for someone born on an unknown day in March 2018
- “2018-01-01T00:00:00Z” for someone born on an unknown day in 2018
- empty if unknown or not stated.
See here for more information about timestamps.
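The rounding-down rule can be sketched in Python with the standard library alone. The helper name date_of_birth_timestamp is hypothetical, and the sketch covers only the year, month and day precision shown in the examples above.

```python
from datetime import datetime, timezone


def date_of_birth_timestamp(year, month=None, day=None):
    """Build a UTC timestamp, rounding unknown parts down to their earliest value."""
    if month is None and day is not None:
        raise ValueError("a day cannot be given without a month")
    # Unknown month or day falls back to 1; the time of day is midnight UTC.
    dt = datetime(year, month or 1, day or 1, tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


born_march_2018 = date_of_birth_timestamp(2018, 3)  # unknown day in March 2018
born_in_2018 = date_of_birth_timestamp(2018)        # unknown day in 2018
```

The two calls reproduce the timestamps listed above, 2018-03-01T00:00:00Z and 2018-01-01T00:00:00Z. An entirely unknown date of birth is represented by leaving the field empty rather than by calling the helper.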
The element is provided for use cases within protected networks, but in many situations the element should not be used in order to protect the privacy of the individual. Instead, the Age element should be preferred.
age¶
An age object describing the age of the individual at the time of collection of biospecimens or phenotypic observations reported in the current Phenopacket. It is specified using either an Age element, which can represent an age in several different ways, or an AgeRange element, which can represent a range of ages such as 10-14 years (age can be represented in this way to protect the privacy of study participants).
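To illustrate how an exact age might be coarsened into a range such as 10-14 years, here is a minimal Python sketch. It rests on assumptions not stated on this page: that an AgeRange is serialized with start and end members, and that each is an Age holding an ISO 8601 duration string under the key age. The function name age_range and the five-year bin width are hypothetical choices.

```python
def age_range(age_in_years, bin_width=5):
    """Return an AgeRange-like dict covering the bin that contains the age.

    Assumed layout: {"start": {"age": "P10Y"}, "end": {"age": "P14Y"}}.
    """
    if age_in_years < 0 or bin_width < 1:
        raise ValueError("age must be non-negative and bin_width positive")
    start = (age_in_years // bin_width) * bin_width
    end = start + bin_width - 1
    # Ages are written as ISO 8601 durations in whole years.
    return {"start": {"age": f"P{start}Y"}, "end": {"age": f"P{end}Y"}}


binned = age_range(12)
```

An individual aged 12 is reported only as falling between 10 and 14 years, which hides the exact age from the recipient.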
sex¶
Phenopackets make use of an enumeration to denote the phenotypic sex of an individual. See Sex.
karyotypic_sex¶
Phenopackets make use of an enumeration to denote the chromosomal sex of an individual. See KaryotypicSex.
taxonomy¶
For resources where more than one organism may be studied, it is advisable to indicate the taxonomic identifier of each organism at its most specific level. We advise using the codes from the NCBI Taxonomy resource. For instance, NCBITaxon:9606 is human (Homo sapiens) and NCBITaxon:9615 is dog.
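In a mixed-species resource, the taxonomy can be attached when each Individual is built. The sketch below is illustrative only: the lookup table and the function individual_with_taxonomy are hypothetical, and the OntologyClass is assumed to be serialized as a dictionary with id and label members.

```python
# Common name -> (NCBI Taxonomy CURIE, label). Only the two species
# mentioned in the text are included.
SPECIES = {
    "human": ("NCBITaxon:9606", "Homo sapiens"),
    "dog": ("NCBITaxon:9615", "Canis lupus familiaris"),
}


def individual_with_taxonomy(individual_id, common_name):
    """Return an Individual-like dict carrying an OntologyClass for its species."""
    taxon_id, label = SPECIES[common_name]
    return {
        "id": individual_id,
        "taxonomy": {"id": taxon_id, "label": label},
    }


dog = individual_with_taxonomy("animal:1", "dog")
```

The identifier animal:1 is an invented example; an unknown common name raises a KeyError rather than producing an individual without a taxonomy.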