Working with Phenopackets in Python

Similarly to Java, the Phenopacket Schema can be considered the source of truth for the specification, and JSON produced by any conforming implementation can be used to interoperate with other services. Nevertheless, we strongly suggest using the phenopackets library available from the Python Package Index (PyPI), or the Python bindings generated by the Protobuf compiler (protoc) from the schema's .proto files.

Here we provide a brief overview of the phenopackets library.

Install phenopackets into your Python environment

The phenopackets package can be installed from PyPI by running:

python3 -m pip install phenopackets

pip installs phenopackets together with its required dependencies, including the Protobuf runtime.

Create building blocks programmatically

Let’s start by importing all building blocks of the Phenopacket Schema v2:

>>> import phenopackets.schema.v2 as pps2

Now we can access all building blocks of the v2 Phenopacket Schema via the pps2 alias.

For instance, we can create an OntologyClass that corresponds to the Human Phenotype Ontology term Spherocytosis (HP:0004444):

>>> spherocytosis = pps2.OntologyClass(id='HP:0004444', label='Spherocytosis')
>>> spherocytosis
id: "HP:0004444"
label: "Spherocytosis"

All schema building blocks, including OntologyClass, are available under the pps2 alias and can be created with constructors that accept keyword arguments. The constructors reject fields that are not part of the schema:

>>> pps2.OntologyClass(foo='bar')
Traceback (most recent call last):
  ...
ValueError: Protocol message OntologyClass has no "foo" field.

We do not have to provide all fields at creation time. Instead, we can set them one by one using attribute assignment and achieve the same outcome:

>>> spherocytosis2 = pps2.OntologyClass()
>>> spherocytosis2.id = 'HP:0004444'
>>> spherocytosis2.label = 'Spherocytosis'
>>> spherocytosis == spherocytosis2
True

However, attribute assignment only works for scalar (non-message) fields, such as bool, int, float, or str. It does NOT work for message fields:

>>> pf = pps2.PhenotypicFeature()
>>> pf.type = spherocytosis 
Traceback (most recent call last):
  ...
AttributeError: Assignment not allowed to composite field "type" in protocol message object.

To set a message field, we can use the CopyFrom method, which copies the content of another message into the field:

>>> pf.type.CopyFrom(spherocytosis)
>>> pf
type {
  id: "HP:0004444"
  label: "Spherocytosis"
}

Finally, repeated fields support list-like operations, such as extend:

>>> modifiers = (
...   pps2.OntologyClass(id='HP:0003623', label='Neonatal onset'),
...   pps2.OntologyClass(id='HP:0011010', label='Chronic'),
... )
>>> pf.modifiers.extend(modifiers)
>>> pf
type {
  id: "HP:0004444"
  label: "Spherocytosis"
}
modifiers {
  id: "HP:0003623"
  label: "Neonatal onset"
}
modifiers {
  id: "HP:0011010"
  label: "Chronic"
}

See the Protobuf Python documentation for more information.

Building blocks I/O

Once an instance holds data, we can serialize its content into Protobuf’s binary wire format:

>>> binary_str = pf.SerializeToString()
>>> binary_str
b'\x12\x1b\n\nHP:0004444\x12\rSpherocytosis*\x1c\n\nHP:0003623\x12\x0eNeonatal onset*\x15\n\nHP:0011010\x12\x07Chronic'

and parse the bytes back into a new instance. ParseFromString returns the number of bytes read, which we discard here:

>>> pf2 = pps2.PhenotypicFeature()
>>> _ = pf2.ParseFromString(binary_str)
>>> pf == pf2
True

We can also dump the content of a building block to a JSON string with MessageToJson, or to a dict of Python objects with MessageToDict. Both functions come from the google.protobuf.json_format module:

>>> from google.protobuf.json_format import MessageToDict
>>> json_dict = MessageToDict(pf)
>>> json_dict
{'type': {'id': 'HP:0004444', 'label': 'Spherocytosis'}, 'modifiers': [{'id': 'HP:0003623', 'label': 'Neonatal onset'}, {'id': 'HP:0011010', 'label': 'Chronic'}]}

We complete the JSON round-trip with Parse (for a JSON string) or ParseDict (for a dict):

>>> from google.protobuf.json_format import ParseDict
>>> pf2 = ParseDict(json_dict, pps2.PhenotypicFeature())
>>> pf == pf2
True