.. _rstpython:

###################################
Working with Phenopackets in Python
###################################

As with :ref:`Java `, the :ref:`Phenopacket Schema ` can be considered the source of truth for the specification,
and the JSON produced by an arbitrary implementation can be used to interoperate with other services.
Nevertheless, we **strongly** recommend using the `phenopackets` library available from the Python Package Index (PyPI),
or the Python bindings generated by the Protobuf compiler from the Protobuf files.

Here we provide a brief overview of the `phenopackets` library.


Install `phenopackets` into your Python environment
***************************************************

The `phenopackets` package can be installed from PyPI by running:

.. code-block:: shell

  python3 -m pip install phenopackets

This uses `pip` to install `phenopackets` along with its required dependencies.


Create building blocks programmatically
***************************************

Let's start by importing all building blocks of Phenopacket Schema v2:

>>> import phenopackets.schema.v2 as pps2

Now we can access all building blocks of the v2 Phenopacket Schema via the `pps2` alias.

For instance, we can create an :ref:`Ontology class ` that corresponds to
the Human Phenotype Ontology term for *Spherocytosis* (`HP:0004444`):

>>> spherocytosis = pps2.OntologyClass(id='HP:0004444', label='Spherocytosis')
>>> spherocytosis  # doctest: +NORMALIZE_WHITESPACE
id: "HP:0004444"
label: "Spherocytosis"

All schema building blocks, including `OntologyClass`, are available under the `pps2` alias
and can be created with constructors that accept keyword arguments.
The constructors reject attributes that are not defined in the schema:

>>> pps2.OntologyClass(foo='bar')
Traceback (most recent call last):
  ...
ValueError: Protocol message OntologyClass has no "foo" field.


We do not have to provide all attributes at creation time. We can also set the fields one at a time
using Python property syntax to achieve the same outcome:

>>> spherocytosis2 = pps2.OntologyClass()
>>> spherocytosis2.id = 'HP:0004444'
>>> spherocytosis2.label = 'Spherocytosis'
>>> spherocytosis == spherocytosis2
True

However, setting field values with property syntax only works for `singular `_ (non-message) fields,
such as `bool`, `int`, `str`, or `float`. The assignment will *NOT* work for message fields:

>>> pf = pps2.PhenotypicFeature()
>>> pf.type = spherocytosis  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
AttributeError: Assignment not allowed to composite field "type" in protocol message object.

To set a message field on an existing instance, we can use the `CopyFrom` method:

>>> pf.type.CopyFrom(spherocytosis)
>>> pf  # doctest: +NORMALIZE_WHITESPACE
type {
  id: "HP:0004444"
  label: "Spherocytosis"
}

Last, a repeated field can be set using list-like semantics:

>>> modifiers = (
...   pps2.OntologyClass(id='HP:0003623', label='Neonatal onset'),
...   pps2.OntologyClass(id='HP:0011010', label='Chronic'),
... )
>>> pf.modifiers.extend(modifiers)
>>> pf  # doctest: +NORMALIZE_WHITESPACE
type {
  id: "HP:0004444"
  label: "Spherocytosis"
}
modifiers {
  id: "HP:0003623"
  label: "Neonatal onset"
}
modifiers {
  id: "HP:0011010"
  label: "Chronic"
}

See `Protobuf documentation `_ for more info.


Building blocks I/O
*******************

Having an instance with data, we can write the content into Protobuf's wire format:

>>> binary_str = pf.SerializeToString()
>>> binary_str
b'\x12\x1b\n\nHP:0004444\x12\rSpherocytosis*\x1c\n\nHP:0003623\x12\x0eNeonatal onset*\x15\n\nHP:0011010\x12\x07Chronic'

and get the same content back:

>>> pf2 = pps2.PhenotypicFeature()
>>> _ = pf2.ParseFromString(binary_str)
>>> pf == pf2
True

We can also dump the content of the building block to a *JSON* string or to a `dict` with Python objects
using the `MessageToJson `_ or `MessageToDict `_ functions, respectively:

>>> from google.protobuf.json_format import MessageToDict
>>> json_dict = MessageToDict(pf)
>>> json_dict
{'type': {'id': 'HP:0004444', 'label': 'Spherocytosis'}, 'modifiers': [{'id': 'HP:0003623', 'label': 'Neonatal onset'}, {'id': 'HP:0011010', 'label': 'Chronic'}]}

We complete the JSON round-trip using the `Parse `_ or `ParseDict `_ functions:

>>> from google.protobuf.json_format import ParseDict
>>> pf2 = ParseDict(json_dict, pps2.PhenotypicFeature())
>>> pf == pf2
True