Biosample

A Biosample refers to a unit of biological material from which the substrate molecules (e.g. genomic DNA, RNA, proteins) for molecular analyses (e.g. sequencing, array hybridisation, mass spectrometry) are extracted. Examples would be a tissue biopsy, a single cell from a culture for single-cell genome sequencing, or a protein fraction from a gradient centrifugation. Several instances (e.g. technical replicates) or types of experiments (e.g. genomic array as well as RNA-seq experiments) may refer to the same Biosample.

Data model

| Field | Type | Status | Description |
| --- | --- | --- | --- |
| id | string | required | arbitrary identifier |
| individual_id | string | recommended | arbitrary identifier |
| description | string | optional | arbitrary text |
| sampled_tissue | OntologyClass | required | Tissue from which the sample was taken |
| phenotypic_features | PhenotypicFeature | recommended | List of phenotypic abnormalities of the sample |
| taxonomy | OntologyClass | optional | Species of the sampled individual |
| individual_age_at_collection | Age OR AgeRange | recommended | Age of the proband at the time the sample was taken |
| histological_diagnosis | OntologyClass | recommended | Disease diagnosis that was inferred from the histological examination |
| tumor_progression | OntologyClass | recommended | Indicates primary, metastatic, recurrent |
| tumor_grade | OntologyClass | recommended | List of terms representing the tumor grade |
| diagnostic_markers | OntologyClass | recommended | Clinically relevant biomarkers |
| procedure | Procedure | required | The procedure used to extract the biosample |
| hts_files | HtsFile | optional | List of high-throughput sequencing files derived from the biosample |
| variants | Variant | optional | List of variants determined to be present in the biosample |
| is_control_sample | boolean | optional (default: false) | Whether the sample is being used as a normal control |

Example

{
  "id": "sample1",
  "individualId": "patient1",
  "description": "",
  "sampledTissue": {
    "id": "UBERON:0001256",
    "label": "wall of urinary bladder"
  },
  "ageOfIndividualAtCollection": {
    "age": "P52Y2M"
  },
  "histologicalDiagnosis": {
    "id": "NCIT:C39853",
    "label": "Infiltrating Urothelial Carcinoma"
  },
  "tumorProgression": {
    "id": "NCIT:C84509",
    "label": "Primary Malignant Neoplasm"
  },
  "procedure": {
    "code": {
      "id": "NCIT:C5189",
      "label": "Radical Cystoprostatectomy"
    }
  },
  "htsFiles": [{
    "uri": "file://data/genomes/urothelial_ca_wgs.vcf.gz",
    "description": "Urothelial carcinoma sample",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh38",
    "individualToSampleIdentifiers": {
      "patient1": "NA12345"
    }
  }],
  "variants": [],
  "isControlSample": false
}
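
As a minimal sketch of how a consumer might check a message like the one above, the following Python snippet reports which required fields are missing. The set of required keys is taken from the data model table and spelled in the camelCase form used in the example; the helper name `missing_required` is hypothetical and not part of any official library.

```python
import json

# Assumption: required fields follow the data model table, written in the
# camelCase spelling used by the JSON example.
REQUIRED_KEYS = ("id", "sampledTissue", "procedure")


def missing_required(biosample: dict) -> list:
    """Return the required keys that are absent or empty in a Biosample dict."""
    return [key for key in REQUIRED_KEYS if not biosample.get(key)]


# A deliberately incomplete Biosample: the procedure is left out.
doc = json.loads("""
{
  "id": "sample1",
  "individualId": "patient1",
  "sampledTissue": {"id": "UBERON:0001256", "label": "wall of urinary bladder"},
  "isControlSample": false
}
""")

print(missing_required(doc))  # ['procedure']
```

A check like this only covers presence; it does not validate the nested OntologyClass or Procedure elements.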

id

The Biosample id. This is unique in the context of the server instance.

individual_id

The id of the Individual this biosample was derived from. Providing this information here is recommended, but not required, if the Biosample is being transmitted as part of a Phenopacket.
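
One possible use of this field is to collect all biosamples that belong to the same individual. The sketch below assumes biosamples are available as plain dictionaries with the camelCase key `individualId`, as in the JSON example; `group_by_individual` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_individual(biosamples: list) -> dict:
    """Map each individualId to the ids of its biosamples.

    Biosamples without an individualId are collected under None.
    """
    groups = defaultdict(list)
    for sample in biosamples:
        groups[sample.get("individualId")].append(sample["id"])
    return dict(groups)


samples = [
    {"id": "sample1", "individualId": "patient1"},
    {"id": "sample2", "individualId": "patient1"},
    {"id": "sample3"},  # individualId omitted, e.g. inside a Phenopacket
]
print(group_by_individual(samples))
```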

description

The biosample’s description. This attribute contains human-readable text and should not contain any structured data.

sampled_tissue

An OntologyClass describing the tissue from which the specimen was collected. We recommend the use of UBERON. The PDX MI mapping is Specimen tumor tissue.

phenotypic_features

The phenotypic characteristics of the Biosample, for example histological findings of a biopsy. See PhenotypicFeature for further information.

taxonomy

For resources where more than one organism may be studied, it is advisable to indicate the taxonomic identifier of the organism at its most specific level. We advise using the codes from the NCBI Taxonomy resource. For instance, NCBITaxon:9606 is human (Homo sapiens) and NCBITaxon:9615 is dog.
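
As an illustration, a resource could check that a taxonomy element uses an NCBI Taxonomy code before accepting it. This is a sketch under the assumption that the identifier is written as `NCBITaxon:<digits>`, as in the examples above; the function name `is_ncbi_taxon` and the labels are illustrative only.

```python
def is_ncbi_taxon(ontology_class: dict) -> bool:
    """Check that an OntologyClass id looks like 'NCBITaxon:<digits>'."""
    prefix, sep, local_id = ontology_class.get("id", "").partition(":")
    return prefix == "NCBITaxon" and sep == ":" and local_id.isdigit()


human = {"id": "NCBITaxon:9606", "label": "human"}
dog = {"id": "NCBITaxon:9615", "label": "dog"}
tissue = {"id": "UBERON:0001256", "label": "wall of urinary bladder"}

print(is_ncbi_taxon(human), is_ncbi_taxon(dog), is_ncbi_taxon(tissue))
```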

individual_age_at_collection

An Age or AgeRange describing the age or age range of the individual this biosample was derived from at the time of collection. See Age for further information.
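
The example above gives the age as "P52Y2M". Assuming this string is an ISO 8601 duration restricted to years, months and days, a consumer could split it as sketched below. The helper `parse_age` is hypothetical and does not handle week or time components.

```python
import re

# Matches the year/month/day part of an ISO 8601 duration such as "P52Y2M".
_AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_age(iso_duration: str) -> tuple:
    """Split an ISO 8601 age string into (years, months, days)."""
    match = _AGE_PATTERN.match(iso_duration)
    # A bare "P" matches the pattern but carries no value, so reject it too.
    if match is None or iso_duration == "P":
        raise ValueError(f"not a supported ISO 8601 age: {iso_duration!r}")
    years, months, days = (int(part) if part else 0 for part in match.groups())
    return years, months, days


print(parse_age("P52Y2M"))  # (52, 2, 0)
```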

histological_diagnosis

This is the pathologist’s diagnosis and may often represent a refinement of the clinical diagnosis (which could be reported in the Phenopacket that contains this Biosample). Normal samples would be tagged with the term NCIT:C38757 (Negative Finding). See OntologyClass for further information.

tumor_progression

This field can be used to indicate whether a specimen is from the primary tumor, a metastasis or a recurrence. There are multiple ways of representing this using ontology terms, and the terms chosen should have a well-defined meaning within the application that uses them.

For example, a term from the NCIT Neoplasm by Special Category hierarchy can be chosen.

tumor_grade

This should be a child term of NCIT:C28076 (Disease Grade Qualifier) or equivalent. See the tumor grade fact sheet.

diagnostic_markers

Clinically relevant biomarkers. Most of the assays, such as immunohistochemistry (IHC), are covered by the NCIT under the sub-hierarchy NCIT:C25294 (Laboratory Procedure), e.g. NCIT:C68748 (HER2/Neu Positive), NCIT:C131711 (Human Papillomavirus-18 Positive).

procedure

The clinical procedure performed on the subject in order to extract the biosample. See Procedure for further information.

hts_files

This element contains a list of pointers to the relevant HTS file(s) for the biosample. Each element describes what type of file is meant (e.g., BAM file), which genome assembly was used for mapping, as well as a map of samples and individuals represented in that file. It also contains a URI element which refers to a file on a given file system or a resource on the web.

See HtsFile for further information.
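
To illustrate how the elements described above fit together, the sketch below reads an HtsFile entry shaped like the one in the example, inspects the URI scheme with the standard library, and looks up the sample identifier used for a given individual. The key names follow the JSON example; `sample_id_for` is a hypothetical helper.

```python
from urllib.parse import urlparse


def sample_id_for(hts_file: dict, individual_id: str):
    """Return the sample identifier used in the file for an individual, or None."""
    return hts_file.get("individualToSampleIdentifiers", {}).get(individual_id)


hts_file = {
    "uri": "file://data/genomes/urothelial_ca_wgs.vcf.gz",
    "description": "Urothelial carcinoma sample",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh38",
    "individualToSampleIdentifiers": {"patient1": "NA12345"},
}

# The scheme tells a client whether the file is local or a web resource.
scheme = urlparse(hts_file["uri"]).scheme
print(scheme, sample_id_for(hts_file, "patient1"))
```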

variants

This is a field for genetic variants and can be used for listing either candidate variants or diagnosed causative variants. If this biosample represents a cancer specimen, the variants might refer to somatic variants identified in the biosample. The resources using these fields should define what this represents in their context. See Variant for further information.

is_control_sample

A boolean (true/false) value. If true, this sample is being used as a normal control, often in combination with another sample that is thought to contain a pathological finding. The default value is false.
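
Because the default is false, a consumer working with plain JSON may need to treat a missing key as false. The following is a minimal sketch of that behaviour, assuming dictionaries with the camelCase key `isControlSample` as in the example; `is_control` is a hypothetical helper.

```python
def is_control(biosample: dict) -> bool:
    """Treat a missing isControlSample key as the default value, false."""
    return bool(biosample.get("isControlSample", False))


samples = [
    {"id": "tumor1", "isControlSample": False},
    {"id": "normal1", "isControlSample": True},
    {"id": "tumor2"},  # key omitted, so the default applies
]

controls = [s["id"] for s in samples if is_control(s)]
cases = [s["id"] for s in samples if not is_control(s)]
print(controls, cases)
```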