API reference¶

Auto-generated from the source docstrings via mkdocstrings. Private members (leading _) are omitted.

sdata.metadata¶

metadata ¶

Attribute ¶

Bases: object

Attribute class

guess_dtype `staticmethod` ¶

guess_dtype(value)

returns dtype class

Parameters:

Name	Type	Description	Default
`value`			required

Returns:

Type	Description
	class

to_dict ¶

to_dict()

Return a dict of the attribute's items (name/value/unit/dtype/...).

to_csv ¶

to_csv(prefix='', sep=',', quote=None)

export Attribute to csv

Parameters:

Name	Type	Description	Default
`prefix`			`''`
`sep`			`','`
`quote`			`None`

Returns:

Type	Description

Metadata ¶

Bases: object

Metadata container class

Each Metadata entry has a:

* name (256)
* value
* unit
* description
* type (int, str, float, bool, timestamp)

df `property` ¶

df

create dataframe

udf `property` ¶

udf

create dataframe for user attributes

dft `property` ¶

dft

create transposed dataframe for sdata attributes

sdf `property` ¶

sdf

create dataframe for sdata attributes

sdft `property` ¶

sdft

create transposed dataframe for sdata attributes

a `property` ¶

Attribute autocomplete accessor for interactive use (m.a.force_x).

size `property` ¶

size

return number of Attributes

sha3_256 `property` ¶

sha3_256

Return the SHA3-256 digest of the metadata (32 bytes, i.e. 64 hex characters) as a hex string.

Returns:

Type	Description
	hashlib.sha3_256.hexdigest()

add_attribute ¶

add_attribute(attr, **kwargs)

set Attribute

set_attr ¶

set_attr(name='N.N.', value=None, **kwargs)

set Attribute

get_attr ¶

get_attr(name)

get Attribute by name

to_dict ¶

to_dict()

serialize attributes to dict

guess_dtype_from_value `staticmethod` ¶

guess_dtype_from_value(value)

guess dtype from value, e.g. '1.23' -> 'float', 'otto1.23' -> 'str', 1 -> 'int', False -> 'bool'

Parameters:

Name	Type	Description	Default
`value`			required

Returns:

Type	Description
	dtype(value), dtype ['int', 'float', 'bool', 'str', 'list']
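The documented examples can be reproduced with a small stand-alone function. This is a hypothetical sketch (`guess_dtype_sketch` is not part of sdata). It assumes that strings are tried as int, then float, then the literals 'true'/'false', and otherwise fall back to 'str'; sdata's actual precedence rules may differ.

```python
from typing import Any, Tuple


def guess_dtype_sketch(value: Any) -> Tuple[Any, str]:
    """Hypothetical stand-in for Metadata.guess_dtype_from_value (not the sdata code)."""
    # bool is checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return value, "bool"
    if isinstance(value, int):
        return value, "int"
    if isinstance(value, float):
        return value, "float"
    if isinstance(value, (list, tuple)):
        return list(value), "list"
    text = str(value).strip()
    # strings: try the narrowest numeric type first
    for cast, name in ((int, "int"), (float, "float")):
        try:
            return cast(text), name
        except ValueError:
            pass
    # assumption: the literals true/false (any case) are read as bool
    if text.lower() in ("true", "false"):
        return text.lower() == "true", "bool"
    return str(value), "str"
```

With this ordering, `guess_dtype_sketch("1.23")` gives `(1.23, 'float')` and `guess_dtype_sketch("otto1.23")` gives `('otto1.23', 'str')`.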

update_from_dict ¶

update_from_dict(d, guess_dtype=True)

set attributes from dict

Parameters:

Name	Type	Description	Default
`d`		dict	required

Returns:

Type	Description

from_dict `classmethod` ¶

from_dict(d)

setup metadata from dict

get_sdict ¶

get_sdict()

get sdata attribute as dict

get_udict ¶

get_udict()

get user attribute as dict

get_dict ¶

get_dict()

get user attribute as dict

to_dataframe ¶

to_dataframe()

create dataframe

get_dft ¶

get_dft(index_name=None)

create transposed dataframe for sdata attributes

from_dataframe `classmethod` ¶

from_dataframe(df)

create metadata from dataframe

update_from_usermetadata ¶

update_from_usermetadata(metadata)

update user metadata from metadata

to_csv ¶

to_csv(filepath=None, sep=',', header=False)

serialize to csv

to_csv_header ¶

to_csv_header(prefix='#', sep=',', filepath=None)

serialize to csv

from_csv `classmethod` ¶

from_csv(filepath)

create metadata from a CSV file

to_json ¶

to_json(filepath=None)

create a json

Parameters:

Name	Type	Description	Default
`filepath`		default None	`None`

Returns:

Type	Description
	json str

from_json `classmethod` ¶

from_json(jsonstr=None, filepath=None)

create metadata from a JSON string or a JSON file

Parameters:

Name	Type	Description	Default
`jsonstr`		json str	`None`
`filepath`		filepath to json file	`None`

Returns:

Type	Description
	Metadata

to_list ¶

to_list()

create a nested list of Attribute values

Returns:

Type	Description
	list

from_list `classmethod` ¶

from_list(mlist)

create metadata from a list of Attribute values

[['force_x', 1.2, 'kN', 'float', 'force in x-direction'], ['force_y', 3.1, 'N', 'float', 'force in y-direction', 'label', True]]

to_jsonld ¶

to_jsonld(context_mode='inline')

Serialize as a self-describing JSON-LD document (dict).

from_jsonld `classmethod` ¶

from_jsonld(doc)

Reconstruct Metadata from a JSON-LD document (dict or str).

get_prefixed ¶

get_prefixed(prefix)

Return a new Metadata containing only the attributes whose key starts with prefix.

validate ¶

validate(schema)

Validate this metadata against a :class:sdata.schema.MetadataSchema.

apply_schema ¶

apply_schema(schema)

Complete this metadata in place according to a MetadataSchema.

to_rdf ¶

to_rdf(fmt='turtle')

Serialize as RDF (via rdflib if available, otherwise JSON-LD).

to_verifiable_credential ¶

to_verifiable_credential(issuer_did, issuer_priv_jwk, kid=None, extra_claims=None)

Sign the metadata as a W3C Verifiable Credential (compact JWS).

from_verifiable_credential `classmethod` ¶

from_verifiable_credential(vc_jws, pub_jwk)

Verify a VC and reconstruct the Metadata from its credentialSubject.

to_turtle ¶

to_turtle()

Convenience: RDF in Turtle format.

write_sidecar ¶

write_sidecar(path=None, indent=2)

Write <sname>.meta.jsonld next to a data blob; returns the path.

read_sidecar `classmethod` ¶

read_sidecar(path)

Load a .meta.jsonld sidecar file into Metadata.

add ¶

add(name, value=None, **kwargs)

add Attribute

Parameters:

Name	Type	Description	Default
`name`			required
`value`			`None`
`kwargs`			`{}`

Returns:

Type	Description

relabel ¶

relabel(name, newname)

relabel Attribute

Parameters:

Name	Type	Description	Default
`name`		old attribute name	required
`newname`		new attribute name	required

Returns:

Type	Description
	None

keys ¶

keys()

Returns:

Type	Description
	list of Attribute names

values ¶

values()

Returns:

Type	Description
	list of Attribute values

items ¶

items()

Returns:

Type	Description
	list of Attribute items (keys, values)

copy ¶

copy()

returns a deep copy

update_hash ¶

update_hash(hashobject)

Update hashobject with the metadata, e.g. to calculate a checksum over it.

.. code-block:: python

import hashlib
from sdata.metadata import Metadata

hashobject = hashlib.sha3_256()
metadata = Metadata()
metadata.update_hash(hashobject)
hashobject.hexdigest()

Parameters:

Name	Type	Description	Default
`hashobject`		hash object	required

Returns:

Type	Description
	hash_function().hexdigest()
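A stable checksum needs the attributes to be fed into the hash object in a reproducible order. The hypothetical stand-in below (`MetadataHashSketch`, not the sdata class) assumes one encoded line per attribute, sorted by name; how sdata actually serializes attributes into the hash is not described here.

```python
import hashlib
from typing import Any, Dict, Tuple


class MetadataHashSketch:
    """Hypothetical stand-in illustrating update_hash/sha3_256; not sdata.metadata.Metadata."""

    def __init__(self) -> None:
        self._attrs: Dict[str, Tuple[Any, str]] = {}

    def add(self, name: str, value: Any, unit: str = "-") -> None:
        self._attrs[name] = (value, unit)

    def update_hash(self, hashobject: Any) -> None:
        # feed attributes sorted by name, so insertion order does not change the digest
        for name in sorted(self._attrs):
            value, unit = self._attrs[name]
            hashobject.update(f"{name}|{value!r}|{unit}\n".encode("utf-8"))

    @property
    def sha3_256(self) -> str:
        hashobject = hashlib.sha3_256()
        self.update_hash(hashobject)
        return hashobject.hexdigest()  # 32-byte digest -> 64 hex characters
```

Because the caller owns the hash object, the same `update_hash` call can also feed a shared hash that covers several objects.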

set_unit_from_name ¶

set_unit_from_name(add_description=True, fix_name=True)

try to extract unit from attribute name

Returns:

Type	Description

guess_value_dtype ¶

guess_value_dtype()

try to cast the Attribute values, e.g. str -> float

Returns:

Type	Description

is_complete ¶

is_complete()

check all required attributes

extract_name_unit ¶

extract_name_unit(value)

extract name and unit from a combined string

.. code-block:: python

value: 'Target Strain Rate (1/s) '
name : 'Target Strain Rate'
unit : '1/s'

value: 'Gauge Length [mm] monkey '
name : 'Gauge Length'
unit : 'mm'

value: 'Gauge Length <mm> whatever '
name : 'Gauge Length'
unit : 'mm'

Parameters:

Name	Type	Description	Default
`value`		string, e.g. 'Length whatever'	required

Returns:

Type	Description
	name, unit
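The three documented patterns can be captured with one regular expression: the name is everything before the first unit enclosed in `()`, `[]` or `<>`, and any trailing text is ignored. `extract_name_unit_sketch` is hypothetical; returning `None` as the unit when no bracket is present is an assumption, not documented sdata behaviour.

```python
import re
from typing import Optional, Tuple

# name, then the first unit enclosed in (), [] or <>; anything after it is ignored
_NAME_UNIT = re.compile(
    r"^\s*(?P<name>.*?)\s*(?:\((?P<paren>[^)]*)\)|\[(?P<square>[^\]]*)\]|<(?P<angle>[^>]*)>)"
)


def extract_name_unit_sketch(value: str) -> Tuple[str, Optional[str]]:
    match = _NAME_UNIT.match(value)
    if match is None:
        # assumption: no bracketed unit -> the whole string is the name, unit unknown
        return value.strip(), None
    # exactly one of the three alternatives matched
    unit = next(g for g in (match.group("paren"), match.group("square"), match.group("angle")) if g is not None)
    return match.group("name"), unit.strip()
```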

sdata.base¶

base ¶

Base ¶

Base class for sdata objects with metadata management. Provides core functionality for handling metadata, unique identifiers, serialization, and hierarchical relationships.

md `property` ¶

md

Shortcut to access the metadata object.

mdf `property` ¶

mdf

Shortcut to access the metadata as a pandas DataFrame.

osname `property` ¶

osname

Returns an OS-compatible ASCII name.

class_name `property` ¶

class_name

Returns the class name of the object.

uuid `property` ¶

uuid

Returns the UUID component of the SUUID.

huuid `property` ¶

huuid

Returns the hex representation of the UUID.

suuid `property` ¶

suuid

Returns the SUUID object.

suuid_bytes `property` ¶

suuid_bytes

Returns the SUUID string encoded as UTF-8 bytes.

did `property` ¶

did

Returns the DID (decentralized identifier) of the object.

parent `property` ¶

parent

Returns the parent as a SUUID object.

project `property` ¶

project

Returns the project as a SUUID object.

udf `property` ¶

udf

Returns the user-defined metadata as a DataFrame.

sdf `property` ¶

sdf

Returns the system-defined metadata as a DataFrame.

get_sdata_spec `classmethod` ¶

get_sdata_spec()

Canonical, importable string for a class: module:Class.

validate ¶

validate()

Validate the metadata against the declared SDATA_SCHEMA.

Without a schema, the result is trivially valid.

get_sdata_did_method ¶

get_sdata_did_method()

Returns the sdata class id.

to_jsonld ¶

to_jsonld(context_mode='inline')

Serialize the metadata as JSON-LD (see :mod:sdata.semantic).

to_turtle ¶

to_turtle()

Serialize the metadata as RDF/Turtle (see :mod:sdata.semantic).

write_sidecar ¶

write_sidecar(path=None, indent=2)

Write <sname>.meta.jsonld next to a data blob; returns the path.

read_sidecar `classmethod` ¶

read_sidecar(path)

Load a .meta.jsonld sidecar file into a Metadata object.

update_data ¶

update_data(data_dict)

Update the data dictionary with recursive merge for nested dicts.

Parameters:

Name	Type	Description	Default
`data_dict`	`Dict[str, Any]`	Dictionary to merge into self._data.	required

Raises:

Type	Description
`ValueError`	If data_dict is not a dict.
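A recursive merge of this kind can be sketched as follows. `deep_merge_sketch` is a hypothetical helper, not sdata's implementation; it assumes nested dicts are merged key by key and that any other value simply replaces the existing one.

```python
from typing import Any, Dict


def deep_merge_sketch(target: Dict[str, Any], data_dict: Dict[str, Any]) -> Dict[str, Any]:
    if not isinstance(data_dict, dict):
        raise ValueError("data_dict must be a dict")
    for key, value in data_dict.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # both sides are dicts: merge recursively instead of overwriting
            deep_merge_sketch(target[key], value)
        else:
            # otherwise the new value replaces (or adds) the entry
            target[key] = value
    return target
```

Keys that exist only in the target survive the merge, which is the practical difference from a plain `dict.update`.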

get_parent ¶

get_parent()

Returns the parent SUUID.

get_project ¶

get_project()

Returns the project SUUID.

set_default_attributes ¶

set_default_attributes()

Set default attributes from self.default_attributes list.

to_dict ¶

to_dict()

Convert the object to a dictionary representation.

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with metadata, data, and description.

from_dict `classmethod` ¶

from_dict(d)

Create an instance from a dictionary.

Parameters:

Name	Type	Description	Default
`d`	`Dict[str, Any]`	Dictionary with metadata, data, and description.	required

Returns:

Type	Description
`Base`	Instance of Base or subclass.

to_json ¶

to_json(filepath=None, sidecar=False)

Export the object to JSON format, either as a string or to a file.

Parameters:

Name	Type	Description	Default
`filepath`	`Optional[str]`	Optional file path to write JSON (default: None).	`None`
`sidecar`	`bool`	if True, additionally write `<dir>/<sname>.meta.jsonld` (only effective together with `filepath`; the default False keeps the previous behavior).	`False`

Returns:

Type	Description
`Optional[str]`	JSON string if filepath is None, else None.

from_json `classmethod` ¶

from_json(s)

Create an instance from a JSON string or file path.

Parameters:

Name	Type	Description	Default
`s`	`str`	JSON string or path to JSON file.	required

Returns:

Type	Description
`Base`	Instance of Base or subclass.

Raises:

Type	Description
`json.JSONDecodeError`	If invalid JSON.
`FileNotFoundError`	If file not found.

to_zip ¶

to_zip(filepath=None, *, compresslevel=6, deterministic=True)

Serialize the object via to_json() and pack it as data.json into a ZIP.

- If 'filepath' is set, the ZIP file is written.
- The return value is always a BytesIO positioned at 0.

from_zip `classmethod` ¶

from_zip(src, *, member=None, encoding='utf-8')

Create an instance from a ZIP that contains a JSON file. 'src' can be a file path, bytes, a BytesIO or any file-like stream.

- 'member': name of the JSON entry in the ZIP. If None:
  * take the only .json file,
  * or, if the ZIP has only one entry, take that one,
  * otherwise raise an error (ambiguity).
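The ZIP round-trip described for to_zip/from_zip can be sketched with the standard library alone. The helper names are hypothetical, plain `json.dumps` stands in for to_json(), and the fixed 1980-01-01 timestamp is one assumed way to let `deterministic=True` produce byte-identical archives.

```python
import io
import json
import time
import zipfile
from typing import Any, BinaryIO, Optional, Union


def to_zip_sketch(obj: Any, *, compresslevel: int = 6, deterministic: bool = True) -> io.BytesIO:
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    # a fixed timestamp keeps repeated runs byte-identical
    date_time = (1980, 1, 1, 0, 0, 0) if deterministic else time.localtime()[:6]
    info = zipfile.ZipInfo("data.json", date_time=date_time)
    info.compress_type = zipfile.ZIP_DEFLATED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, payload, compresslevel=compresslevel)
    buf.seek(0)  # always hand back a stream positioned at 0
    return buf


def from_zip_sketch(src: Union[str, bytes, BinaryIO], *, member: Optional[str] = None,
                    encoding: str = "utf-8") -> Any:
    if isinstance(src, (bytes, bytearray)):
        src = io.BytesIO(src)
    with zipfile.ZipFile(src) as zf:
        names = zf.namelist()
        if member is None:
            json_names = [n for n in names if n.lower().endswith(".json")]
            if len(json_names) == 1:
                member = json_names[0]  # the only .json entry
            elif len(names) == 1:
                member = names[0]  # the only entry of any kind
            else:
                raise ValueError(f"ambiguous ZIP content, pass member=: {names}")
        return json.loads(zf.read(member).decode(encoding))
```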

cls_from_spec ¶

cls_from_spec(sdata_spec='sdata.base:Base', on_error='strict', sdata_attrs=None, **kwargs)

Factory function to create a dynamically generated subclass.

Parameters:

Name	Type	Description	Default
`sdata_spec`	`Optional[str]`	The `module:class` spec to inherit from (default: `sdata.base:Base`).	`'sdata.base:Base'`
`sdata_attrs`	`Optional[Dict[str, Any]]`	Optional dict of custom attributes/methods to add to the class.	`None`
`kwargs`	`Any`	Keyword arguments to pass to the instance initialization.	`{}`

Returns:

Type	Description
`Any`	generated class.

sclass_factory ¶

sclass_factory(sdata_spec='sdata.base:Base', on_error='strict', sdata_attrs=None, **kwargs)

Factory function to create an instance of a dynamically generated subclass.

Parameters:

Name	Type	Description	Default
`sdata_spec`	`Optional[str]`	The `module:class` spec to inherit from (default: `sdata.base:Base`).	`'sdata.base:Base'`
`sdata_attrs`	`Optional[Dict[str, Any]]`	Optional dict of custom attributes/methods to add to the class.	`None`
`kwargs`	`Any`	Keyword arguments to pass to the instance initialization.	`{}`

Returns:

Type	Description
`Any`	An instance of the generated class.

sdata_factory ¶

sdata_factory(class_name, sdata_class=Base, sdata_attrs=None, **kwargs)

Factory function to create an instance of a dynamically generated subclass.

Parameters:

Name	Type	Description	Default
`class_name`	`str`	The name of the class to generate (e.g., "Material").	required
`sdata_class`	`Type`	The base class to inherit from (default: Base).	`Base`
`sdata_attrs`	`Optional[Dict[str, Any]]`	Optional dict of custom attributes/methods to add to the class.	`None`
`kwargs`	`Any`	Keyword arguments to pass to the instance initialization.	`{}`

Returns:

Type	Description
`Any`	An instance of the generated class.

sdata.sclass.dataframe¶

dataframe ¶

DataFrame ¶

Bases: ContentIntegrityMixin, Base

content_bytes `property` ¶

content_bytes

Binary serialization of the data (plain Parquet of the df, without the embedded sdata metadata).

This is the hook the inherited :class:~sdata.sclass.content.ContentIntegrityMixin hashes over: it enables sha256/md5/sha1, size and verify()/update_checksum() directly on a :class:DataFrame. Hashing only the data keeps the checksum stable when the metadata changes (otherwise, storing the checksum in the metadata would alter the hash).

column_metadata `property` ¶

column_metadata

Retrieve the per-column metadata.

Returns:

Type	Description
`Metadata`	a :class:`~sdata.metadata.Metadata` holding one attribute per column (e.g. `label`/`unit` annotations).

col `property` ¶

col

Column-annotation accessor: df.col.weight / df.col['weight'].

Returns the column :class:Attribute; mutate its fields in place (df.col.weight.unit = 'kg') or use :meth:set_column.

column_units `property` ¶

column_units

Mapping {column: unit} (in df-column order) from column_metadata.

shape `property` ¶

shape

(nrows, ncols) of the underlying df.

columns `property` ¶

columns

Column index of the underlying df.

dtypes `property` ¶

dtypes

Per-column dtypes of the underlying df.

get_column ¶

get_column(name)

Return the column :class:~sdata.metadata.Attribute for name.

Parameters:

Name	Type	Description	Default
`name`		column name.	required

Returns:

Type	Description
`Optional[Attribute]`	the column's :class:`Attribute`, or `None` if unknown.

set_column ¶

set_column(name, *, unit=None, label=None, description=None, ontology=None, required=None, dtype=None)

Annotate a column; writes through to :attr:column_metadata.

Only the provided fields are changed; existing annotations are preserved (delegates to :meth:~sdata.metadata.Metadata.set_attr). A warning is logged if name is not a column of the current df.

Parameters:

Name	Description	Default
`name`	column name.	required
`unit`	physical unit (e.g. `"kg"`).	`None`
`label`	human-readable label.	`None`
`description`	free-text description.	`None`
`ontology`	CURIE/IRI of the column's class.	`None`
`required`	whether the column is required.	`None`
`dtype`	declared dtype string.	`None`

Returns:

Type	Description
`Attribute`	the (created or updated) column :class:`Attribute`.
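The "only the provided fields are changed" rule amounts to skipping every argument that is `None`. A hypothetical sketch over plain dicts (`set_column_sketch` is illustrative and covers only three of the fields; sdata delegates to Metadata.set_attr instead):

```python
from typing import Any, Dict, Optional


def set_column_sketch(columns: Dict[str, Dict[str, Any]], name: str, *,
                      unit: Optional[str] = None, label: Optional[str] = None,
                      description: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for DataFrame.set_column over plain dicts."""
    annotation = columns.setdefault(name, {})
    # only arguments that were actually passed (not None) overwrite stored fields
    for field, value in (("unit", unit), ("label", label), ("description", description)):
        if value is not None:
            annotation[field] = value
    return annotation
```

A consequence of the None-means-unchanged convention is that a field cannot be cleared by passing `None`.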

validate_table ¶

validate_table(schema=None)

Validate the df/column_metadata against a :class:~sdata.schema.TableSchema.

Parameters:

Name	Type	Description	Default
`schema`		a `TableSchema`; defaults to the class-level :attr:`TABLE_SCHEMA`. Without any schema the result is trivially valid.	`None`

Returns:

Type	Description
	a :class:`~sdata.schema.ValidationReport` (truthy if valid).

head ¶

head(n=5)

First n rows of the underlying df (delegates to pandas.DataFrame.head).

describe ¶

describe(*args, **kwargs)

Descriptive statistics of the df (delegates to pandas.DataFrame.describe).

to_dict ¶

to_dict(engine='pyarrow')

Serialize to a dict, embedding the df as base64 Parquet plus column_metadata.

Parameters:

Name	Type	Description	Default
`engine`	`str`	Parquet engine for pandas (default `"pyarrow"`).	`'pyarrow'`

Returns:

Type	Description
`Dict[str, Any]`	dict with the :class:`~sdata.base.Base` payload plus `data['parquet_bytes']` and `data['column_metadata']`.

to_jsonld ¶

to_jsonld(context_mode='inline')

JSON-LD of the metadata, including column metadata (csvw:column).

to_turtle ¶

to_turtle()

RDF/Turtle, including column metadata.

write_sidecar ¶

write_sidecar(path=None, indent=2)

Sidecar <sname>.meta.jsonld including column metadata; returns the path.

from_dict `classmethod` ¶

from_dict(d, engine='pyarrow')

Reconstruct a DataFrame from a dict produced by :meth:to_dict.

Restores metadata and column_metadata and decodes the df from the embedded base64 Parquet payload.

Parameters:

Name	Type	Description	Default
`d`	`Dict[str, Any]`	dict with `metadata` and `data.{parquet_bytes,column_metadata}`.	required
`engine`	`str`	Parquet engine for pandas (default `"pyarrow"`).	`'pyarrow'`

Returns:

Type	Description
`DataFrame`	a :class:`DataFrame` instance.

to_dataframe ¶

to_dataframe()

Return a copy of the pandas DataFrame with sdata metadata in df.attrs.

Metadata, column_metadata and description are embedded under the "_sdata" key (same layout as :meth:to_parquet), so that a round-trip through pandas keeps the annotations discoverable.

Returns:

Type	Description
`pandas.DataFrame`	a copy of the DataFrame with `attrs['_sdata']` set.

to_parquet ¶

to_parquet(path=None, filename=None, **kwargs)

Serialize this sdata.DataFrame to Parquet format, embedding metadata.

This method copies the internal pandas DataFrame, attaches the sdata metadata (dataset-level metadata, per-column metadata and description) to df.attrs, and then writes the result as a Parquet file. If no output path is given, it returns the Parquet bytes instead.

Parameters:

Name	Type	Description	Default
`path`	`str`	directory where the Parquet file is saved. If given (and `filename` is None), a file named `<sname>.spq` is created in this directory.	`None`
`filename`	`str`	exact filename (without the full path) of the output Parquet file. Defaults to `<sname>.spq`.	`None`
`kwargs`		additional keyword arguments passed to `pandas.DataFrame.to_parquet`, e.g. `engine` (default `"pyarrow"`) or `compression` (default `"zstd"`).	`{}`

Returns:

Type	Description
`str` or `bytes`	the full file path if `path` or `filename` is given; otherwise the in-memory Parquet representation as bytes.

Example:

# Save to disk as /data/output/<sname>.spq (default naming):
out_fp = sdf.to_parquet(path="/data/output")
print("Saved to:", out_fp)

# Get in-memory Parquet bytes (no file on disk):
parquet_bytes = sdf.to_parquet()

from_parquet_bytes `classmethod` ¶

from_parquet_bytes(parquet_bytes, engine='pyarrow')

Load a DataFrame from in-memory Parquet bytes.

Parameters:

Name	Type	Description	Default
`parquet_bytes`		Parquet file content as bytes.	required
`engine`	`str`	Parquet engine for pandas (default `"pyarrow"`).	`'pyarrow'`

Returns:

Type	Description
	a :class:`DataFrame` instance.

from_parquet `classmethod` ¶

from_parquet(filepath, engine='pyarrow')

Load a DataFrame from a Parquet file on disk.

Parameters:

Name	Type	Description	Default
`filepath`		path to the `.spq`/Parquet file.	required
`engine`	`str`	Parquet engine for pandas (default `"pyarrow"`).	`'pyarrow'`

Returns:

Type	Description
	a :class:`DataFrame` instance.

Raises:

Type	Description
`FileNotFoundError`	if `filepath` does not exist.

to_csv ¶

to_csv(path=None, filename=None, sidecar=False, **kwargs)

Serialize the df to CSV (pure pandas, no extra dependency).

CSV carries data only; the qualifying metadata travels in the optional <sname>.meta.jsonld sidecar. The index is dropped by default (override via index=True).

Parameters:

Name	Description	Default
`path`	directory to write `<sname>.csv` into (if given).	`None`
`filename`	exact output filename (defaults to `<sname>.csv`).	`None`
`sidecar`	also write a JSON-LD metadata sidecar next to the file.	`False`
`kwargs`	forwarded to :meth:`pandas.DataFrame.to_csv`.	`{}`

Returns:

Type	Description
	the file path (if written to disk) or the CSV string.

from_csv `classmethod` ¶

from_csv(filepath, **kwargs)

Load a DataFrame from a CSV file (pure pandas).

Parameters:

Name	Type	Description	Default
`filepath`		path to the CSV file.	required
`kwargs`		forwarded to :func:`pandas.read_csv`.	`{}`

Returns:

Type	Description
	a :class:`DataFrame` instance (data only; use a sidecar for metadata).

Raises:

Type	Description
`FileNotFoundError`	if `filepath` does not exist.

to_arrow ¶

to_arrow()

Return a :class:pyarrow.Table with sdata metadata in the schema.

The dataset metadata, column_metadata and description are embedded as JSON under the b"_sdata" schema-metadata key. In addition, each column's unit/label/description/ontology are attached natively to that column's Arrow field metadata, so Arrow-aware tools (DuckDB, Polars, pyarrow) can read the per-column annotations without sdata.

Returns:

Type	Description
	a `pyarrow.Table`.

Raises:

Type	Description
`ImportError`	if pyarrow is not installed (`pip install sdata[parquet]`).

from_arrow `classmethod` ¶

from_arrow(table)

Build a DataFrame from a :class:pyarrow.Table written by :meth:to_arrow.

The b"_sdata" schema blob is restored if present; per-column Arrow field metadata (unit/label/...) is also merged into column_metadata, so tables produced by other Arrow-native tools keep their column annotations.

Parameters:

Name	Type	Description	Default
`table`		a `pyarrow.Table` (sdata metadata restored if present).	required

Returns:

Type	Description
	a :class:`DataFrame` instance.

Raises:

Type	Description
`ImportError`	if pyarrow is not installed.

to_feather ¶

to_feather(path=None, filename=None, sidecar=False, **kwargs)

Serialize to the Feather (Arrow IPC) format, embedding sdata metadata.

Parameters:

Name	Description	Default
`path`	directory to write `<sname>.feather` into (if given).	`None`
`filename`	exact output filename (defaults to `<sname>.feather`).	`None`
`sidecar`	also write a JSON-LD metadata sidecar next to the file.	`False`
`kwargs`	forwarded to :func:`pyarrow.feather.write_feather`.	`{}`

Returns:

Type	Description
	the file path (if written to disk) or the Feather bytes.

Raises:

Type	Description
`ImportError`	if pyarrow is not installed.

from_feather `classmethod` ¶

from_feather(filepath)

Load a DataFrame from a Feather file written by :meth:to_feather.

Parameters:

Name	Type	Description	Default
`filepath`		path to the `.feather` file.	required

Returns:

Type	Description
	a :class:`DataFrame` instance.

Raises:

Type	Description
`FileNotFoundError`	if `filepath` does not exist.
`ImportError`	if pyarrow is not installed.

as_blob ¶

as_blob(fmt='parquet', **kwargs)

Render the table as a standalone :class:~sdata.sclass.blob.Blob.

The df is serialized once (in the chosen fmt) into a fixed bytes-content Blob — a binary snapshot that can be hashed, signed, stored or transferred like any other asset. This is composition, not inheritance: a living, multi-format table is rendered into one chosen format on demand (RFC 0004, Option C). The Blob's checksum is filled (update_checksum), so blob.verify() works out of the box.

Parameters:

Name	Type	Description	Default
`fmt`	`str`	serialization format — `"parquet"` (default), `"csv"`, `"arrow"` or `"feather"`.	`'parquet'`
`kwargs`		forwarded to :class:`~sdata.sclass.blob.Blob` (`name` and `description` default to this DataFrame's).	`{}`

Returns:

Type	Description
	a :class:`~sdata.sclass.blob.Blob` with the serialized bytes.

Raises:

Type	Description
`ValueError`	if `fmt` is not a supported format.
`ImportError`	if the format needs pyarrow and it is not installed.

to_datapackage ¶

to_datapackage(path=None, filename=None, fmt='csv', sidecar=True)

Write a Frictionless Data Package (.zip) — a portable bundle.

The zip holds a standard datapackage.json descriptor (so generic Frictionless tooling can read it), the data as CSV or Parquet, and — for a lossless sdata round-trip — the full sdata metadata under the descriptor's "sdata" key. Optionally the <sname>.meta.jsonld JSON-LD sidecar.

Parameters:

Name	Description	Default
`path`	directory to write `<sname>.zip` into (if given).	`None`
`filename`	exact output filename (defaults to `<sname>.zip`).	`None`
`fmt`	data format inside the package, `"csv"` (default) or `"parquet"`.	`'csv'`
`sidecar`	also embed the JSON-LD sidecar in the zip (default `True`).	`True`

Returns:

Type	Description
	the file path (if written to disk) or the zip bytes.

Raises:

Type	Description
`ValueError`	if `fmt` is not `"csv"`/`"parquet"`.

from_datapackage `classmethod` ¶

from_datapackage(filepath)

Load a DataFrame from a Data Package .zip written by :meth:to_datapackage.

Restores the data and — losslessly — metadata/column_metadata/description from the descriptor's "sdata" block.

Parameters:

Name	Type	Description	Default
`filepath`		path to the `.zip` data package.	required

Returns:

Type	Description
	a :class:`DataFrame` instance.

Raises:

Type	Description
`FileNotFoundError`	if `filepath` does not exist.

to_hdf ¶

to_hdf(path=None, filename=None, key=None, sidecar=False, **kwargs)

Serialize the df to HDF5 (PyTables), embedding sdata metadata as a node attr.

HDF5 has no in-memory bytes form, so a path/filename is required. The sdata metadata (metadata/column_metadata/description) is stored as the node's _sdata attribute; several DataFrames can share one file via distinct key.

Parameters:

Name	Description	Default
`path`	directory to write `<sname>.h5` into.	`None`
`filename`	exact output filename (defaults to `<sname>.h5`).	`None`
`key`	HDF5 node/key (default: `self.sname`).	`None`
`sidecar`	also write a JSON-LD metadata sidecar next to the file.	`False`
`kwargs`	forwarded to `pandas.HDFStore.put` (e.g. `format`, `complevel`, `complib`).	`{}`

Returns:

Type	Description
	the file path.

Raises:

Type	Description
`ImportError`	if PyTables is not installed (`pip install sdata[hdf]`).
`ValueError`	if neither `path` nor `filename` is given.

from_hdf `classmethod` ¶

from_hdf(filepath, key=None)

Load a DataFrame from an HDF5 file written by :meth:to_hdf.

Parameters:

Name	Type	Description	Default
`filepath`		path to the `.h5` file.	required
`key`		HDF5 node/key to read (default: the first key in the file).	`None`

Returns:

Type	Description
	a :class:`DataFrame` instance.

Raises:

Type	Description
`FileNotFoundError`	if `filepath` does not exist.
`ImportError`	if PyTables is not installed.

sdata.sclass.blob¶

blob ¶

Blob ¶

Bases: ContentIntegrityMixin, Base

A subclass of Base that represents a generic binary large object (Blob). The content is stored in self.data['content'] as a dictionary with:

- 'type': 'bytes' for in-memory bytes (base64-encoded for serialization) or 'uri' for a filesystem URI (local path, S3 object, Zip path, etc., handled via fsspec).
- 'value': the base64-encoded bytes string (for 'bytes') or the URI string (for 'uri').
- 'filetype': the file type (e.g. 'pdf', 'png', 'jpg', 'txt', or any custom type). This is always stored and serialized.

Additionally, the hash calculations (SHA1 and MD5) of the inherited ContentIntegrityMixin are available for integrity checks. The actual bytes are loaded lazily when accessed via the .content_bytes property, so large content is not loaded unless explicitly requested. For serialization in to_dict(), bytes are base64-encoded if type is 'bytes'; URIs are kept as-is. PDFs, images (png, jpg) and arbitrary file types are supported. fsspec handles the various URI schemes:

- Local file: 'file:///path/to/file.pdf' or simply '/path/to/file.pdf'
- S3: 's3://bucket/key.pdf'
- Zip: 'zip://innerfile.txt::/path/to/outer.zip'
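The content dict and the lazy, cached loading can be illustrated with a small stand-in. `BlobContentSketch` is hypothetical: it uses plain `open()` for local paths where sdata uses fsspec, and it is not the Blob implementation.

```python
import base64
from typing import Any, Dict, Optional


class BlobContentSketch:
    """Hypothetical stand-in for Blob's content dict; plain open() replaces fsspec."""

    def __init__(self, content_type: str, value: Any, filetype: Optional[str] = None) -> None:
        if content_type == "bytes":
            value = base64.b64encode(value).decode("ascii")  # kept base64-encoded
        elif content_type != "uri":
            raise ValueError(f"unknown content type: {content_type!r}")
        self.content: Dict[str, Any] = {"type": content_type, "value": value, "filetype": filetype}
        self._cache: Optional[bytes] = None

    @property
    def content_bytes(self) -> bytes:
        # lazy: nothing is decoded or read before the first access, then cached
        if self._cache is None:
            if self.content["type"] == "bytes":
                self._cache = base64.b64decode(self.content["value"])
            else:
                with open(self.content["value"], "rb") as fh:  # local paths only in this sketch
                    self._cache = fh.read()
        return self._cache

    def to_dict(self) -> Dict[str, Any]:
        # serializes the content dict as-is and never touches the bytes
        return {"content": dict(self.content)}
```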

content_bytes `property` ¶

content_bytes

Lazily load and retrieve the content as bytes (only when this property is accessed). If type is 'uri', use fsspec to open and read; if 'bytes', decode from base64. Caches the result for subsequent accesses.

Returns:

Type	Description
`bytes`	The content as bytes.

Raises:

Type	Description
`ValueError`	If loading fails or no value set.
`Exception`	If fsspec encounters an error (e.g., invalid URI, missing dependencies like s3fs for S3).

filetype `property` ¶

filetype

Retrieve the filetype (no content loading required).

Returns:

Type	Description
`str`	The filetype string (e.g., 'pdf', 'png', 'jpg').

set_content ¶

set_content(content_type, value, filetype=None)

Update the content type, value, and optionally filetype. Clears any cached content to maintain lazy loading.

Parameters:

Name	Type	Description	Default
`content_type`	`ContentType`	New content_type ('bytes' or 'uri').	required
`value`	`Any`	New value (bytes or URI str).	required
`filetype`	`Optional[str]`	Optional new filetype (e.g., 'pdf', 'png', 'jpg').	`None`

exists ¶

exists()

Test whether the blob content exists. For 'uri', checks if the path/URI is accessible via fsspec; for 'bytes', always True if value set.

write ¶

write(uri, **kwargs)

Write the content to a destination uri via fsspec (local/S3/zip/…).

Parameters:

Name	Type	Description	Default
`uri`	`str`	destination URI/path (e.g. `/out/file.pdf`, `s3://bucket/key`).	required
`kwargs`	`Any`	forwarded to `fsspec.open`.	`{}`

Returns:

Type	Description
`str`	the destination `uri`.

Raises:

Type	Description
`ImportError`	if fsspec is not installed (`pip install sdata[blob]`).

open ¶

open(mode='rb')

Return a file-like handle to the content; use as a context manager.

For uri content a streaming fsspec handle is returned (no full in-memory load); for bytes content an :class:io.BytesIO over the decoded bytes.

Parameters:

Name	Type	Description	Default
`mode`	`str`	open mode (default `"rb"`).	`'rb'`

Raises:

Type	Description
`ValueError`	if no content/value is set or the content_type is unknown.
`ImportError`	if fsspec is required (uri) but not installed.

to_dict ¶

to_dict()

Extend Base.to_dict to include the content dict as-is (with base64 for bytes and filetype). Does not include or load the actual content bytes.

from_dict `classmethod` ¶

from_dict(d)

Create a Blob instance from a dictionary. Restores content dict including filetype; no content loading occurs here (lazy via content_bytes).

sdata.imagemeta¶

imagemeta ¶

Native, cross-format embedding of sdata metadata in image bytes.

Pure Python code (no Pillow dependency for the metadata layer): a text payload (usually the sdata metadata JSON) is written losslessly into, or read from, the respective image container through a uniform API.

Supported containers and their native carrier:

- PNG: iTXt chunk with keyword sdata (UTF-8)
- JPEG: APP1 segment with identifier sdata\0 (UTF-8)
- JP2: uuid box (JPEG 2000, ISO BMFF) with a fixed sdata UUID
- GIF: Comment Extension with prefix sdata\0
- WebP: custom RIFF chunk sdAT (ignored by decoders as unknown)
- TIFF: private IFD tag (65000); the original bytes remain unchanged

The format is detected from its magic bytes (:func:detect_format); :func:embed and :func:extract select the matching handler. Writing uses replace semantics: an existing sdata payload is replaced, not duplicated. Pillow (optional) is only needed to transcode the pixels, not for the metadata, so reading works entirely without Pillow.

:Example:

>>> from sdata import imagemeta
>>> png_with_meta = imagemeta.embed(png_bytes, '{"name": "probe"}')  # doctest: +SKIP
>>> imagemeta.extract(png_with_meta)  # doctest: +SKIP
'{"name": "probe"}'
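For PNG, the carrier is a standard iTXt chunk (length, type, data, CRC-32). The following hypothetical sketch (not the sdata.imagemeta code) shows the replace semantics: any existing iTXt chunk with keyword sdata is dropped and a fresh one is inserted before IEND. Writing uncompressed text with an empty language tag and translated keyword is an assumption about the payload layout.

```python
import struct
import zlib
from typing import Iterator, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"sdata"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    # length (big-endian) + type + data + CRC-32 computed over type and data
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes, bytes]]:
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length], png[pos:end]
        pos = end


def is_sdata_itxt(ctype: bytes, data: bytes) -> bool:
    return ctype == b"iTXt" and data.split(b"\x00", 1)[0] == KEYWORD


def embed_png_sketch(png: bytes, payload: str) -> bytes:
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    # keyword, NUL, compression flag 0, method 0, empty language tag + NUL,
    # empty translated keyword + NUL, then the UTF-8 text
    itxt = make_chunk(b"iTXt", KEYWORD + b"\x00" * 5 + payload.encode("utf-8"))
    out = [PNG_SIGNATURE]
    for ctype, data, raw in iter_chunks(png):
        if is_sdata_itxt(ctype, data):
            continue  # replace semantics: drop an existing sdata payload
        if ctype == b"IEND":
            out.append(itxt)  # new payload goes right before IEND
        out.append(raw)
    return b"".join(out)


def extract_png_sketch(png: bytes) -> Optional[str]:
    if not png.startswith(PNG_SIGNATURE):
        return None  # lenient: not a PNG
    for ctype, data, _raw in iter_chunks(png):
        if is_sdata_itxt(ctype, data):
            # skip keyword, NUL, flag and method (uncompressed text assumed)
            _lang, _tkey, text = data[len(KEYWORD) + 3:].split(b"\x00", 2)
            return text.decode("utf-8")
    return None
```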

ImageMetadataError ¶

Bases: Exception

Base error of the image metadata layer.

UnsupportedImageFormatError ¶

Bases: ImageMetadataError

The image format is not supported (for writing).

PayloadTooLargeError ¶

Bases: ImageMetadataError

The payload does not fit into a single format segment (e.g. JPEG APP1).
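The JPEG limit follows from the segment layout: the 2-byte length field of an APP1 segment counts itself, so one segment carries at most 65533 data bytes, and the `sdata\0` identifier takes six of them. A hypothetical sketch of that check (the names are illustrative, not the sdata API):

```python
import struct

APP1_MARKER = b"\xff\xe1"
IDENTIFIER = b"sdata\x00"
# the 2-byte big-endian length counts itself, so at most 0xFFFF - 2 data bytes fit
MAX_SEGMENT_DATA = 0xFFFF - 2


class PayloadTooLargeSketchError(ValueError):
    """Hypothetical stand-in for sdata's PayloadTooLargeError."""


def build_app1_sketch(payload: str) -> bytes:
    data = IDENTIFIER + payload.encode("utf-8")
    if len(data) > MAX_SEGMENT_DATA:
        raise PayloadTooLargeSketchError(
            f"{len(data)} bytes exceed one APP1 segment ({MAX_SEGMENT_DATA})"
        )
    return APP1_MARKER + struct.pack(">H", len(data) + 2) + data
```

The limit applies to UTF-8 bytes, not characters, so non-ASCII metadata reaches it sooner.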

detect_format ¶

detect_format(data)

Detect the image format from its magic bytes.

Parameters:

Name	Type	Description	Default
`data`	`bytes`	the image bytes.	required

Returns:

Type	Description
`Optional[str]`	`"png"`/`"jpeg"`/`"jp2"`/`"gif"`/`"webp"` or `None`.
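Magic-byte detection compares the first bytes of the data with each container's fixed signature. Below is a minimal sketch using the standard signatures. The function name sniff_format, the lookup order, and the "tiff" key are assumptions. In particular, the documented return values above do not list TIFF.

```python
from typing import Optional

# Standard container signatures; the lookup order and the "tiff" key are
# assumptions (the documented return values list png/jpeg/jp2/gif/webp).
_SIGNATURES = (
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("jpeg", b"\xff\xd8\xff"),
    ("jp2", b"\x00\x00\x00\x0cjP  \r\n\x87\n"),  # JPEG 2000 signature box
    ("gif", b"GIF87a"),
    ("gif", b"GIF89a"),
    ("tiff", b"II*\x00"),  # little-endian TIFF
    ("tiff", b"MM\x00*"),  # big-endian TIFF
)


def sniff_format(data: bytes) -> Optional[str]:
    for name, magic in _SIGNATURES:
        if data.startswith(magic):
            return name
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None  # unknown: the caller decides whether that is an error
```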

supported_formats ¶

supported_formats()

The supported format keys (in registry order).

embed ¶

embed(data, payload, fmt=None)

Embed payload (text) natively into the image bytes (replace semantics).

Parameters:

Name	Type	Description	Default
`data`	`bytes`	the original image bytes.	required
`payload`	`str`	the text to embed (usually the sdata metadata JSON).	required
`fmt`	`Optional[str]`	format key; `None` → automatic detection.	`None`

Returns:

Type	Description
`bytes`	new image bytes with the embedded sdata payload.

Raises:

Type	Description
`UnsupportedImageFormatError`	if the format is unknown or not supported.
`PayloadTooLargeError`	if the payload does not fit into a single segment (JPEG).

extract ¶

extract(data, fmt=None)

Read an embedded sdata payload from the image bytes (Pillow-free).

Reading is lenient: unknown or unsupported formats return None instead of raising, and so do images without an embedded sdata payload.

Parameters:

Name	Type	Description	Default
`data`	`bytes`	the image bytes.	required
`fmt`	`Optional[str]`	format key; `None` → automatic detection.	`None`

Returns:

Type	Description
`Optional[str]`	the embedded payload (text) or `None`.

sdata.sclass.image¶

image ¶

Image: a Blob (sdata.sclass.blob.Blob) holding image content.

The image content is stored as Blob content (uri for files, bytes for in-memory data). sdata metadata is embedded natively into the image file across formats (PNG/JPEG/JP2/GIF/WebP/TIFF) via sdata.imagemeta, which does not need Pillow. Formats without a native metadata container (e.g. BMP) get a lossless <filepath>.meta.json sidecar; the save/from_file API is identical for all formats. Pillow is used only lazily, to decode or transcode pixels (Image.pil, Image.to_numpy, and Image.save when the format changes), and is optional (pip install pillow).

Image ¶

Bases: Blob

Image object based on sdata.sclass.blob.Blob.

pil `property` ¶

pil

The image decoded lazily with Pillow (PIL.Image.Image).

basename `property` ¶

basename

The image file base name (== name).

from_file `classmethod` ¶

from_file(filepath, project=None, ns_name=None, **kwargs)

Create an Image referencing an image file (kept as uri content).

Any sdata metadata is read back and merged: natively embedded (PNG/JPEG/JP2/GIF/WebP/TIFF, Pillow-free) and/or from an adjacent <filepath>.meta.json sidecar (for formats without a native container).

Parameters:

Name	Description	Default
`filepath`	path to the image file.	required
`project`	namespace for the deterministic SUUID (alias of `ns_name`).	`None`
`ns_name`	namespace for the deterministic SUUID.	`None`
`kwargs`	forwarded to `Blob`/`sdata.base.Base`.	`{}`

Returns:

Type	Description
	an `Image` instance.

from_bytes `classmethod` ¶

from_bytes(name, image_data, project=None, **kwargs)

Create an Image from in-memory image bytes.

Any embedded sdata metadata is read back and merged (Pillow-free).

Parameters:

Name	Description	Default
`name`	a name for the image (its suffix sets the filetype).	required
`image_data`	the raw image bytes.	required
`project`	namespace for the deterministic SUUID.	`None`

Returns:

Type	Description
	an `Image` instance.

to_numpy ¶

to_numpy()

The image as a NumPy array (colour channels).

embedded_metadata ¶

embedded_metadata()

Return the sdata metadata embedded in the image bytes, or None.

Reads the native sdata payload (PNG iTXt / JPEG APP1 / JP2 uuid box / GIF comment / WebP sdAT chunk) without Pillow.

Returns:

Type	Description
	an `sdata.metadata.Metadata`, or `None` if absent/invalid.

sidecar_path `staticmethod` ¶

sidecar_path(filepath)

Path of the metadata sidecar for filepath (<filepath>.meta.json).

write_sidecar ¶

write_sidecar(filepath)

Write the sdata metadata next to filepath as a lossless JSON sidecar.

The sidecar carries the same payload as the embedded form (metadata.to_json()), so a round-trip is lossless regardless of whether the format has a native metadata container.

Parameters:

Name	Type	Description	Default
`filepath`		the image path the sidecar belongs to.	required

Returns:

Type	Description
`str`	the sidecar path (`<filepath>.meta.json`).
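The sidecar convention is simple enough to sketch with the standard library: the suffix .meta.json is appended to the full image path, so scan.bmp gets scan.bmp.meta.json. The helper names below are hypothetical. The metadata is a plain dict, not an sdata Metadata object.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical helpers mirroring the documented naming rule "<filepath>.meta.json".


def sidecar_path_for(filepath: str) -> str:
    # appended to the full name: "scan.bmp" -> "scan.bmp.meta.json"
    return str(filepath) + ".meta.json"


def write_json_sidecar(filepath: str, metadata: dict) -> str:
    path = sidecar_path_for(filepath)
    text = json.dumps(metadata, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
    return path


def read_json_sidecar(filepath: str) -> Optional[dict]:
    path = Path(sidecar_path_for(filepath))
    if not path.exists():
        return None  # no sidecar next to this image
    return json.loads(path.read_text(encoding="utf-8"))
```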

save ¶

save(filepath, sidecar=None, **kwargs)

Save the image to filepath with sdata metadata — one API for all formats.

The container is chosen from the file suffix. For a format with a native metadata container (PNG/JPEG/JP2/GIF/WebP/TIFF) the metadata is embedded: without re-encoding if the stored bytes already use that container (lossless, Pillow-free), otherwise Pillow transcodes first. For any other format (e.g. BMP) the image is written via Pillow and the metadata travels in a lossless <filepath>.meta.json sidecar — so metadata is never lost.

Parameters:

Name	Description	Default
`filepath`	destination path (its suffix selects the format).	required
`sidecar`	sidecar policy — `None` (default) writes a sidecar only when the format has no native container; `True` always writes one (in addition to embedding); `False` never writes one.	`None`
`kwargs`	forwarded to `PIL.Image.save` when transcoding.	`{}`

Returns:

Type	Description
	the destination `filepath`.

Raises:

Type	Description
`ImportError`	if Pillow is required (transcode / non-native format) but not installed.
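The `sidecar` policy reduces to a small decision table over the file suffix. The following sketch encodes it. The helper plan_save is hypothetical, and the exact suffix spellings per format are an assumption.

```python
import os

# Suffixes treated as having a native metadata container; the format list is
# from the text above, the exact suffix spellings are an assumption.
NATIVE_SUFFIXES = {".png", ".jpg", ".jpeg", ".jp2", ".gif", ".webp", ".tif", ".tiff"}


def plan_save(filepath: str, sidecar=None) -> dict:
    """Hypothetical helper: decide where the metadata goes for a given path."""
    suffix = os.path.splitext(filepath)[1].lower()
    embed = suffix in NATIVE_SUFFIXES
    if sidecar is None:
        write_sidecar = not embed  # default: sidecar only if nothing can be embedded
    else:
        write_sidecar = bool(sidecar)  # True: always, False: never
    # note: sidecar=False with a non-native format stores no metadata at all
    return {"embed": embed, "sidecar": write_sidecar}
```

With the default `None`, every format ends up with exactly one metadata carrier, which is what keeps the metadata from being lost.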

sdata.schema¶

schema ¶

Metadata schemas (templates) for fully qualifying metadata.

A MetadataSchema declares the expected attributes of a data class (name, dtype, unit, required, ontology, …). It can validate metadata against these declarations and complete the metadata by filling in defaults, units, and ontology.

Pure-Python validation is always available. With the optional [schema] extra (jsonschema), metadata can also be checked against a generated JSON Schema.

AttrSpec `dataclass` ¶

Specification of an expected attribute.

ValidationReport `dataclass` ¶

Result of a schema validation (truthy if ok).

MetadataSchema ¶

Collection of AttrSpec objects for a data class.

validate ¶

validate(metadata)

Check metadata against the schema (never raises; returns a ValidationReport).

apply ¶

apply(metadata)

Complete metadata in place: create missing attributes from their defaults, and add unit/ontology/description to existing ones. Returns metadata (for a non-destructive call, use schema.apply(md.copy())).
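The validate/apply split can be sketched with plain dataclasses. Validation only collects problems and never raises. The report is truthy when nothing was found. apply mutates and returns the same object. All names and fields below are hypothetical simplifications: metadata is a dict of name → {"value", "unit"}, and the real AttrSpec and ValidationReport fields may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical, simplified stand-ins; metadata is a plain dict
# name -> {"value": ..., "unit": ...}.


@dataclass
class AttrSpecSketch:
    name: str
    dtype: type = str
    unit: str = ""
    required: bool = False
    default: Any = None


@dataclass
class ReportSketch:
    missing: List[str] = field(default_factory=list)
    wrong_type: List[str] = field(default_factory=list)

    def __bool__(self) -> bool:  # truthy when ok, like ValidationReport
        return not (self.missing or self.wrong_type)


def validate(specs: List[AttrSpecSketch], md: Dict[str, dict]) -> ReportSketch:
    report = ReportSketch()  # never raises; problems are collected
    for s in specs:
        if s.name not in md:
            if s.required:
                report.missing.append(s.name)
        elif not isinstance(md[s.name]["value"], s.dtype):
            report.wrong_type.append(s.name)
    return report


def apply(specs: List[AttrSpecSketch], md: Dict[str, dict]) -> Dict[str, dict]:
    for s in specs:  # in place: add defaults, fill empty units
        if s.name not in md and s.default is not None:
            md[s.name] = {"value": s.default, "unit": s.unit}
        elif s.name in md and not md[s.name].get("unit"):
            md[s.name]["unit"] = s.unit
    return md
```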

to_json_schema ¶

to_json_schema()

Generate a JSON Schema (Draft 2020-12) for metadata.get_udict().

validate_jsonschema ¶

validate_jsonschema(metadata)

Validate via jsonschema if it is installed, otherwise with the native validate.

TableSchema ¶

Column schema for an sdata.sclass.dataframe.DataFrame.

Declares the expected name/dtype/unit/required for each column (reusing AttrSpec). It can validate a DataFrame against these declarations and complete the DataFrame's column_metadata from the schema.

The dtype is checked against the actual df.dtypes, and the unit against the column's column_metadata annotation.

validate ¶

validate(dataframe)

Check a DataFrame against the column schema.

Never raises. Returns a ValidationReport listing missing columns, dtype mismatches (against df.dtypes), unit mismatches (against column_metadata), and extra (unspecified) columns.

apply ¶

apply(dataframe)

Complete column_metadata in place from the schema.

For each schema column present in the df, create the annotation if it is missing, or fill in empty unit/ontology/description fields. Returns dataframe.

sdata.dtypes¶

dtypes ¶

Pure-stdlib dtype registry for sdata.metadata.Attribute.

Single source of truth for value coercion, JSON (de)serialisation, the Python class associated with each dtype, and the XSD type mapping (consumed by the JSON-LD layer).

Design goals:

Backwards compatible: in the lenient default mode (strict=False), coercion behaves byte-for-byte like the former Attribute._set_value. Empty or falsy values become np.nan / None / "" / False depending on the dtype. Values that cannot be cast raise DtypeError; the setter logs the error and leaves the value unchanged.
Strict opt-in: strict=True raises DtypeError instead of silently degrading.
Extensible: besides the 6 legacy dtypes (str/int/float/bool/timestamp/list), the registry also provides bytes (base64), json (dict/list), uri, date, time, duration (ISO 8601 / timedelta), decimal (exact), complex, floatlist (typed list of floats), and langstring (language-tagged, rdf:langString). complex and floatlist have no standard XSD type and use custom datatype CURIEs (sdata:complex / sdata:floatlist). langstring is expressed in JSON-LD via @language.
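The lenient/strict distinction can be illustrated for four of the legacy dtypes. In lenient mode an empty value degrades to the dtype's empty value, while strict mode raises instead. Uncastable values raise in both modes. This is a hypothetical mini-version (coerce_sketch). Its assignment of empty values to dtypes (float → nan, int → None, str → "", bool → False) and its bool string parsing are assumptions, and it uses math.nan instead of np.nan to stay stdlib-only.

```python
import math
from typing import Any


class DtypeError(ValueError):
    """Local stand-in for sdata.dtypes.DtypeError."""


# assumed empty value per dtype (the text lists np.nan / None / "" / False)
_EMPTY = {"float": math.nan, "int": None, "str": "", "bool": False}
_CAST = {"float": float, "int": int, "str": str}


def coerce_sketch(value: Any, dtype: str, strict: bool = False) -> Any:
    if value is None or value == "":
        if strict:
            raise DtypeError(f"empty value for dtype {dtype!r}")
        return _EMPTY[dtype]  # lenient: degrade silently
    try:
        if dtype == "bool":
            if isinstance(value, str):
                return value.strip().lower() in ("true", "1", "yes")
            return bool(value)
        return _CAST[dtype](value)
    except (TypeError, ValueError) as exc:
        # uncastable values fail in both modes
        raise DtypeError(f"cannot cast {value!r} to {dtype!r}") from exc
```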

DtypeError ¶

Bases: ValueError

The value cannot be converted to the target dtype (mainly in strict mode).

LangString ¶

A language-tagged string (rdf:langString): text plus a BCP 47 lang tag.

Represented in JSON-LD as {"@value": text, "@language": lang} (not via @type). The compact text form is "text@lang" (e.g. "Hallo@de").
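The compact form can be parsed by splitting at the last "@", so that any "@" inside the text survives. Below is a minimal sketch with a hypothetical LangStr class. It models only the two documented representations and does not validate the BCP 47 tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangStr:
    """Hypothetical stand-in for LangString (text + language tag)."""

    text: str
    lang: str

    @classmethod
    def parse(cls, compact: str) -> "LangStr":
        # split at the LAST "@" so text such as "a@b@en" keeps its inner "@"
        text, sep, lang = compact.rpartition("@")
        if not sep or not lang:
            raise ValueError(f"no language tag in {compact!r}")
        return cls(text, lang)

    def to_jsonld(self) -> dict:
        return {"@value": self.text, "@language": self.lang}

    def __str__(self) -> str:
        return f"{self.text}@{self.lang}"
```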

DtypeSpec ¶

Describes a dtype: coercion, JSON representation, class, and XSD type.

get ¶

get(name)

Return the DtypeSpec for the canonical name, or None.

names ¶

names()

List of all canonical dtype names.

xsd_map ¶

xsd_map()

Copy of the {dtype_name: xsd_iri} table (for the JSON-LD layer).

resolve ¶

resolve(dtype)

Normalise a dtype input (string OR class) to a registry key.

Mirrors the former _set_dtype rules: classes are resolved via DTYPES_INV, 'float64'/'int32' become 'float'/'int', and anything unknown becomes 'str'. None stays None (no dtype change).
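The rules above can be written as a short function. In this sketch, CLASS_TO_KEY is an assumed class → key table standing in for DTYPES_INV, and only the six legacy keys count as known. Both choices are illustrative assumptions, not the registry's actual contents.

```python
from datetime import datetime
from typing import Optional, Union

# Assumed stand-in for DTYPES_INV; the real mapping may differ.
CLASS_TO_KEY = {str: "str", int: "int", float: "float", bool: "bool",
                list: "list", datetime: "timestamp"}
KNOWN = {"str", "int", "float", "bool", "timestamp", "list"}


def resolve_sketch(dtype: Union[str, type, None]) -> Optional[str]:
    if dtype is None:
        return None  # None means "keep the current dtype"
    if isinstance(dtype, type):
        return CLASS_TO_KEY.get(dtype, "str")
    key = str(dtype).strip().lower()
    for base in ("float", "int"):  # 'float64' -> 'float', 'int32' -> 'int'
        if key.startswith(base) and key[len(base):].isdigit():
            return base
    return key if key in KNOWN else "str"  # anything unknown -> 'str'
```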

coerce ¶

coerce(value, dtype, strict=False)

Convert value to dtype (given as a string or a class).

json_default ¶

json_default(obj)

A default= hook for json.dumps: serialises the non-native dtype values (TimeStamp/bytes/Decimal/timedelta/date/time) in a JSON-safe way.
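json.dumps calls the default= hook only for objects it cannot serialise itself, and expects a TypeError for anything the hook also rejects. Below is a minimal sketch of such a hook. The chosen encodings (base64 for bytes, str for Decimal to keep it exact, isoformat for date/time, an ISO 8601 seconds duration for timedelta) follow the dtype list above but are assumptions about the actual output. TimeStamp is not modelled, and negative durations are not handled.

```python
import base64
import json
from datetime import date, datetime, time, timedelta
from decimal import Decimal


def json_default_sketch(obj):
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    if isinstance(obj, Decimal):
        return str(obj)  # keep it exact instead of converting to float
    if isinstance(obj, (datetime, date, time)):
        return obj.isoformat()
    if isinstance(obj, timedelta):
        # ISO 8601 duration in seconds, e.g. timedelta(minutes=1.5) -> "PT90.0S"
        return f"PT{obj.total_seconds()}S"
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")
```

Usage: `json.dumps(values, default=json_default_sketch)`.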

sdata.semantic¶

semantic ¶

JSON-LD / Linked Data serialisation for sdata metadata, plus sidecar files.

Converts an sdata.metadata.Metadata into a self-describing JSON-LD document (and back), and writes/reads optional sidecar files <sname>.meta.jsonld next to a data blob.

Modelling (see project decisions):

@id = the object's DID, did:suuid:<sname>:sdata.
@type = [sdata:<Class>, <BFO IRI of the topology class>].
Reserved _sdata_* fields -> schema.org/DCAT/PROV terms (name, identifier, generatedAtTime, wasDerivedFrom, isPartOf, …).
User attributes are modelled in a hybrid way: with a real unit -> qudt:Quantity node; without a unit but with an ontology -> typed node; otherwise a plain typed literal. ontology is always an @type/class of the value; the predicate is sdata:<name>.

Pure Python; no mandatory dependency.
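The hybrid rule for user attributes is a three-way choice. The sketch below applies it to a single value. The property names qudt:numericValue, qudt:unit, and rdf:value, the XSD mapping, and treating "-" as "no unit" are assumptions. Only the choice itself and the sdata:<name> predicate come from the text. The function name user_attribute is hypothetical.

```python
from typing import Any, Optional

# Assumed xsd mapping and property names; only the three-way choice
# and the sdata:<name> predicate come from the modelling notes.
XSD = {bool: "xsd:boolean", int: "xsd:integer", float: "xsd:double", str: "xsd:string"}


def user_attribute(name: str, value: Any, unit: Optional[str] = None,
                   ontology: Optional[str] = None) -> dict:
    has_unit = bool(unit) and unit.strip() not in ("", "-")  # "-" = no real unit
    if has_unit:
        # quantity node; a real serializer would map the symbol to a QUDT IRI
        node = {"@type": "qudt:Quantity", "qudt:numericValue": value,
                "qudt:unit": unit.strip()}
        if ontology:
            node["@type"] = ["qudt:Quantity", ontology]  # ontology is a class
    elif ontology:
        node = {"@type": ontology, "rdf:value": value}  # typed node
    else:
        node = {"@value": value, "@type": XSD.get(type(value), "xsd:string")}
    return {f"sdata:{name}": node}  # the predicate is always sdata:<name>
```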

column_node ¶

column_node(attr)

JSON-LD node (CSVW) for a table column, built from column_metadata.

attr.name = column name, attr.value = pandas dtype name (e.g. float64); unit/label/description are optional.

to_jsonld ¶

to_jsonld(metadata, context_mode='inline', columns=None)

Serialise metadata as a JSON-LD document (dict).

Parameters:

Name	Type	Description	Default
`columns`		optional ordered iterable of column `Attribute`s (for a DataFrame, in df column order); appended as a `csvw:column` list.	`None`

from_jsonld ¶

from_jsonld(doc)

Reconstruct a Metadata from a JSON-LD document (dict or str).

to_rdf ¶

to_rdf(metadata, fmt='turtle')

Serialise the metadata as RDF.

With rdflib installed ([rdf] extra), the JSON-LD is serialised to fmt (turtle/nt/xml/…). Without rdflib, the JSON-LD itself is returned, since application/ld+json is already valid RDF.

rdf_from_doc ¶

rdf_from_doc(doc, fmt='turtle')

Serialise an already built JSON-LD document as RDF (see to_rdf).

to_turtle ¶

to_turtle(metadata)

Convenience wrapper: RDF in Turtle format (see to_rdf).

to_verifiable_credential ¶

to_verifiable_credential(metadata, issuer_did, issuer_priv_jwk, kid=None, extra_claims=None)

Sign the metadata as a W3C Verifiable Credential (compact JWS, EdDSA).

Wraps the output of to_jsonld as the credentialSubject and signs it with the pure-Python EdDSA stack (sdata.did.jose), without any external crypto library.

Returns:

Type	Description
	Compact JWS string (`header.payload.signature`)

verify_credential ¶

verify_credential(vc_jws, pub_jwk)

Verify a VC (compact JWS) and return its credentialSubject (JSON-LD).

Raises:

Type	Description
`sdata.did.errors.VerificationError`	if the signature is invalid.

write_sidecar ¶

write_sidecar(metadata, path=None, indent=2)

Write <sname>.meta.jsonld (JSON-LD) next to a blob and return the path.

write_sidecar_doc ¶

write_sidecar_doc(doc, path, sname, indent=2)

Write an already built JSON-LD document as <sname>.meta.jsonld.

read_sidecar ¶

read_sidecar(path)

Load a .meta.jsonld sidecar file and reconstruct the Metadata.

sdata.units¶

units ¶

Unit vocabulary: curated mapping from unit to QUDT IRI / UCUM code.

Pure-Python table with no dependencies. The optional [units] extra (pint) extends validation to any parseable unit. Without pint, the curated table is used.

has_pint ¶

has_pint()

True if the optional pint backend is available.

normalize_symbol ¶

normalize_symbol(symbol)

Trim and normalise a unit symbol to a canonical key.

qudt_iri ¶

qudt_iri(symbol)

QUDT unit IRI (as a CURIE) for a symbol, or None if the unit is unknown or dimensionless.

ucum_code ¶

ucum_code(symbol)

UCUM code for a symbol; falls back to the (trimmed) symbol itself.

unit_node ¶

unit_node(symbol)

JSON-LD fragment for a unit: {"unitRef": <iri>, "symbol": <sym>}.

unitRef is omitted if no QUDT IRI is known (e.g. for "-" or an unknown unit); the raw symbol is always kept.
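The rule "unitRef only if an IRI is known, symbol always" is shown in this small sketch. The table entries use real QUDT unit names (unit:M, unit:MilliM, unit:KiloGM, unit:SEC, unit:N, unit:DEG_C). Which symbols sdata's curated table actually covers is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical curated table; the CURIEs are QUDT unit names, but the
# symbol coverage is an assumption.
_QUDT = {"m": "unit:M", "mm": "unit:MilliM", "kg": "unit:KiloGM",
         "s": "unit:SEC", "N": "unit:N", "degC": "unit:DEG_C"}


def lookup_qudt(symbol: Optional[str]) -> Optional[str]:
    return _QUDT.get((symbol or "").strip())  # lookup uses the trimmed symbol


def make_unit_node(symbol: str) -> Dict[str, str]:
    node = {"symbol": symbol}  # the raw symbol is always kept
    iri = lookup_qudt(symbol)
    if iri is not None:  # unitRef only when the IRI is known
        node["unitRef"] = iri
    return node
```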

validate_unit ¶

validate_unit(symbol)

True if the unit is known (curated table) or, with pint installed, parseable.

sdata.vocab¶

vocab ¶

Vocabulary and JSON-LD @context for the semantic metadata layer.

Pure-Python tables with no dependencies: namespace prefixes, an @context builder, the XSD type mapping (from sdata.dtypes), and the resolution of BFO topology classes and of attribute ontology annotations.

Design decisions (see project plan):

Identity: the @id of an object is its DID did:suuid:<sname>:sdata.
ontology is always the @type/class of the value (e.g. bfo:Quality). The subject→value predicate does NOT come from ontology; it is sdata:<name> (or a mapped schema.org/qudt term).

build_context ¶

build_context(mode='inline', extra=None)

Return the JSON-LD @context (mode="inline": the full term map) or, with mode="url", the hosted context URL.

Parameters:

Name	Type	Description	Default
`mode`		`"inline"` (default) or `"url"`	`'inline'`
`extra`		optional additional term definitions	`None`

xsd_for_dtype ¶

xsd_for_dtype(dtype)

XSD type CURIE for an sdata dtype (string or class).

expand_curie ¶

expand_curie(curie)

Expand a CURIE (e.g. schema:name) to the full IRI; IRIs and unknown input are returned unchanged.
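CURIE expansion is a prefix lookup plus concatenation. The sketch below leaves input unchanged when it has no colon, is already an absolute IRI (the part after the colon starts with "//"), or uses an unknown prefix. The namespaces are the standard ones for these vocabularies. Which prefixes sdata.vocab registers, and whether it uses http or https for schema.org, are assumptions.

```python
# Hypothetical prefix table with standard namespace IRIs.
PREFIXES = {
    "schema": "https://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "csvw": "http://www.w3.org/ns/csvw#",
    "qudt": "http://qudt.org/schema/qudt/",
    "unit": "http://qudt.org/vocab/unit/",
}


def expand_curie_sketch(curie: str) -> str:
    prefix, sep, local = curie.partition(":")
    if not sep or local.startswith("//"):
        return curie  # no colon, or already an IRI such as https://...
    base = PREFIXES.get(prefix)
    return base + local if base else curie  # unknown prefix: unchanged
```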

bfo_iri ¶

bfo_iri(topology_class)

Map a topology class string (e.g. "sdata.sclass:IndependentContinuant") to its BFO CURIE, or None if it is unknown or empty.

safe_term ¶

safe_term(name)

Normalise an attribute name into a CURIE-safe local part.

predicate_for ¶

predicate_for(attr_name)

Subject→value predicate for a user attribute; defaults to sdata:<safe_name>.

type_iri ¶

type_iri(ontology)

The @type/class of a value, taken from the ontology field (CURIE/IRI), or None. (By project decision, ontology is always a class.)

sdata.interactive¶

interactive ¶

Interactive ergonomics for metadata: Jupyter _repr_html_, attribute autocomplete (m.a.force_x), and prefix filtering.

Standard library only; the renderers build their HTML by hand (no dependency on the pandas Styler or tabulate).

AttrAccessor ¶

Attribute autocomplete helper: m.a.force_x returns or sets an attribute.

It deliberately lives on a separate object (not on Metadata), so that tab completion shows only attribute names and no attribute shadows a method.
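The design reason for the separate object becomes clear in a sketch. Completers usually build their candidate list from dir(). A dedicated accessor can therefore return only attribute names from __dir__ and resolve them in __getattr__, while methods such as to_dict stay on Metadata. AttrAccessorSketch is a hypothetical stand-in over a plain dict, not the library's class.

```python
from typing import Any, Dict, List


class AttrAccessorSketch:
    """Hypothetical stand-in: exposes a dict's keys as attributes."""

    def __init__(self, attributes: Dict[str, Any]) -> None:
        # bypass our own __setattr__ for the one internal field
        object.__setattr__(self, "_attributes", attributes)

    def __dir__(self) -> List[str]:
        # completers usually consult dir(); list only attribute names here
        return list(self._attributes)

    def __getattr__(self, name: str) -> Any:
        # called only when normal lookup fails, i.e. for attribute names
        if name.startswith("_"):
            raise AttributeError(name)  # avoids recursion during copy/pickle
        try:
            return self._attributes[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self._attributes[name] = value
```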

ColumnAccessor ¶

Column annotation autocomplete: df.col.weight / df.col['weight'].

Returns the column Attribute from column_metadata. Its fields can be mutated directly (df.col.weight.unit = 'kg'). To create a column annotation or set whole fields, use sdata.sclass.dataframe.DataFrame.set_column.

Like AttrAccessor, it deliberately lives on a separate object, so that tab completion shows only column names and no column shadows a method.

attribute_html ¶

attribute_html(attr)

Single-row HTML table for an Attribute.

metadata_html ¶

metadata_html(metadata)

HTML rendering of a Metadata (header plus a table of user attributes).

sdata.timestamp¶

timestamp ¶

ISO 8601 date time string parsing

Basic usage:

#>>> parse_date("2007-01-25T12:00:00Z")

datetime.datetime(2007, 1, 25, 12, 0, tzinfo=)

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

ParseError ¶

Bases: Exception

Raised when there is a problem parsing a date string

Utc ¶

Bases: tzinfo

UTC Timezone

FixedOffset ¶

Bases: tzinfo

Fixed offset in hours and minutes from UTC

TimeStamp ¶

Bases: object

2017-04-26T09:04:00.660000+00:00

utc `property` ¶

utc

Return the timestamp as an ISO 8601 (isoformat) string in UTC.

Returns:

Type	Description
	str

local `property` ¶

local

Return the timestamp as an ISO 8601 (isoformat) string in the local time zone.

Returns:

Type	Description
	str

to_int ¶

to_int(d, key, default_to_zero=False, default=None, required=True)

Pull a value from the dict and convert to int

Parameters:

Name	Type	Description	Default
`default_to_zero`		If the value is None or empty, treat it as zero	`False`
`default`		If the value is missing in the dict use this default	`None`

parse_timezone ¶

parse_timezone(matches, default_timezone=UTC)

Parses ISO 8601 time zone specs into tzinfo offsets

parse_date ¶

parse_date(datestring, default_timezone=UTC)

Parses ISO 8601 dates into datetime objects

The timezone is parsed from the date string. However it is quite common to have dates without a timezone (not strictly correct). In this case the default timezone specified in default_timezone is used. This is UTC by default.

Parameters:

Name	Type	Description	Default
`datestring`		The date to parse as a string	required
`default_timezone`		A datetime tzinfo instance to use when no timezone is specified in the datestring. If this is set to None then a naive datetime object is returned.	`UTC`

Returns:

Type	Description
	A datetime.datetime instance

Raises:

Type	Description
`ParseError`	when there is a problem parsing the date or constructing the datetime instance.
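The default_timezone rule can be shown with the standard library alone. If the parsed string carries no offset, the default zone is attached. If the default is None, the result stays naive. This sketch uses datetime.fromisoformat instead of the module's own ISO 8601 parser, so it accepts a different (narrower) set of inputs. The function name is hypothetical, and ParseError is a local stand-in.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


class ParseError(Exception):
    """Local stand-in for sdata.timestamp.ParseError."""


def parse_iso_date(datestring: str,
                   default_timezone: Optional[tzinfo] = timezone.utc) -> datetime:
    text = datestring.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"  # fromisoformat accepts "Z" only from Python 3.11
    try:
        dt = datetime.fromisoformat(text)
    except ValueError as exc:
        raise ParseError(f"unable to parse {datestring!r}") from exc
    if dt.tzinfo is None and default_timezone is not None:
        dt = dt.replace(tzinfo=default_timezone)  # no offset in the string
    return dt
```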

local_tzname ¶

local_tzname()

Determine the local time zone as an 'Etc/GMT%+d' name, e.g. 'Etc/GMT-2'.
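The sign in IANA "Etc/GMT" names is inverted relative to the UTC offset: Etc/GMT-2 denotes UTC+2. These zones exist only for whole-hour offsets. The sketch below derives such a name from an offset. The helper names are hypothetical, and this is not necessarily how local_tzname computes it.

```python
from datetime import datetime, timedelta, timezone


def etc_gmt_name(offset: timedelta) -> str:
    hours, remainder = divmod(int(offset.total_seconds()), 3600)
    if remainder:
        raise ValueError("Etc/GMT zones exist only for whole-hour offsets")
    return "Etc/GMT%+d" % (-hours)  # IANA inverts the sign


def local_etc_gmt_name() -> str:
    # offset of the local zone right now (depends on DST at call time)
    return etc_gmt_name(datetime.now(timezone.utc).astimezone().utcoffset())
```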

get_utc_timestamp ¶

get_utc_timestamp(dt)

Convert a datetime to an ISO 8601 string in UTC, e.g. 2017-04-26T09:04:00.660000+00:00.

get_local_timestamp ¶

get_local_timestamp(dt)

Convert a datetime to an ISO 8601 string in the local time zone, e.g. 2017-04-26T09:04:00.660000+02:00.

today_str ¶

today_str()

Create a timestamp string for today (UTC).

Returns:

Type	Description
	'2020-12-11T00:00:00+00:00'

now_local_str ¶

now_local_str()

Create a timestamp string for the current time in the local time zone.

Returns:

Type	Description
	'2020-12-11T00:00:00+00:00'

now_utc_str ¶

now_utc_str()

Create a timestamp string for the current time in UTC.

Returns:

Type	Description
	'2020-12-11T00:00:00+00:00'
