API reference¶
Auto-generated from the source docstrings via
mkdocstrings. Private members (leading _) are
omitted.
sdata.metadata¶
metadata ¶
Attribute ¶
Metadata ¶
Bases: object
Metadata container class.
Each Metadata entry has a
- name (256)
- value
- unit
- description
- type (int, str, float, bool, timestamp)
sha3_256 property ¶
Return the hex digest of a SHA3-256 hash (digest length of 32 bytes).
Returns:
| Type | Description |
|---|---|
| | hashlib.sha3_256.hexdigest() |
guess_dtype_from_value staticmethod ¶
Guess the dtype from a value, e.g.
- '1.23' -> 'float'
- 'otto1.23' -> 'str'
- 1 -> 'int'
- False -> 'bool'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | | | required |
Returns:
| Type | Description |
|---|---|
| | dtype(value), one of ['int', 'float', 'bool', 'str', 'list'] |
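The mapping above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's implementation: the function name `guess_dtype` and the rule that an integer-looking string such as '1' maps to 'int' are assumptions.

```python
def guess_dtype(value):
    """Guess a dtype name for value (illustrative sketch, not sdata's code)."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        return "list"
    if isinstance(value, str):
        # numeric-looking strings are classified by the first cast that succeeds
        for caster, name in ((int, "int"), (float, "float")):
            try:
                caster(value)
                return name
            except ValueError:
                continue
    # everything else falls back to 'str'
    return "str"
```

Testing `bool` before `int` matters here, because `isinstance(False, int)` is true in Python.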
update_from_dict ¶
Set attributes from a dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | | dict | required |
Returns:
| Type | Description |
|---|---|
| | |
to_json ¶
Create a JSON string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | default None | None |
Returns:
| Type | Description |
|---|---|
| | json str |
from_json classmethod ¶
Create Metadata from a JSON string or a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| jsonstr | | json str | None |
| filepath | | filepath to json file | None |
Returns:
| Type | Description |
|---|---|
| | Metadata |
from_list classmethod ¶
Create metadata from a list of Attribute values, e.g.
[['force_x', 1.2, 'kN', 'float', 'force in x-direction'], ['force_y', 3.1, 'N', 'float', 'force in y-direction', 'label', True]]
to_jsonld ¶
Serialize as a self-describing JSON-LD document (dict).
from_jsonld classmethod ¶
Reconstruct Metadata from a JSON-LD document (dict or str).
get_prefixed ¶
Return a new Metadata containing only the attributes whose key starts with prefix.
apply_schema ¶
Complete this metadata in place according to a MetadataSchema.
to_verifiable_credential ¶
Sign the metadata as a W3C Verifiable Credential (compact JWS).
from_verifiable_credential classmethod ¶
Verify a VC and reconstruct the Metadata from its credentialSubject.
write_sidecar ¶
Write <sname>.meta.jsonld next to a data blob; returns the path.
add ¶
Add an Attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | | | required |
| value | | | None |
| kwargs | | | {} |
Returns:
| Type | Description |
|---|---|
| | |
relabel ¶
Relabel (rename) an Attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | | old attribute name | required |
| newname | | new attribute name | required |
Returns:
| Type | Description |
|---|---|
| | None |
update_hash ¶
Update hashobject with the metadata. A hash object is the object used to calculate a checksum of a string of information.
.. code-block:: python

    import hashlib
    from sdata.metadata import Metadata

    hashobject = hashlib.sha3_256()
    metadata = Metadata()
    metadata.update_hash(hashobject)
    hashobject.hexdigest()

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hashobject | | hash object | required |
Returns:
| Type | Description |
|---|---|
| | hash_function().hexdigest() |
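The pattern of feeding attributes into a caller-supplied hash object can be sketched with the standard library alone. How sdata actually serializes each attribute before hashing is not stated here, so the plain dict, the sorted `name=value;` encoding and the helper name `update_hash_from_mapping` are assumptions made for illustration.

```python
import hashlib


def update_hash_from_mapping(hashobject, attributes):
    """Feed name/value pairs into hashobject in a stable order (illustrative)."""
    # sorting the names makes the digest independent of insertion order
    for name in sorted(attributes):
        hashobject.update(f"{name}={attributes[name]!r};".encode("utf-8"))
    return hashobject.hexdigest()


attrs = {"force_x": 1.2, "force_y": 3.1}
digest = update_hash_from_mapping(hashlib.sha3_256(), attrs)
```

Passing the hash object in, rather than creating it inside the function, lets the caller pick the algorithm or chain several objects into one checksum.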
set_unit_from_name ¶
Try to extract the unit from the attribute name.
Returns:
| Type | Description |
|---|---|
| | |
guess_value_dtype ¶
Try to cast the Attribute values, e.g. str -> float.
Returns:
| Type | Description |
|---|---|
| | |
extract_name_unit ¶
Extract name and unit from a combined string.
.. code-block:: python

    value: 'Target Strain Rate (1/s) '
    name : 'Target Strain Rate'
    unit : '1/s'

    value: 'Gauge Length [mm] monkey '
    name : 'Gauge Length'
    unit : 'mm'

    value: 'Gauge Length <mm> whatever '
    name : 'Gauge Length'
    unit : 'mm'

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | | string, e.g. 'Length | required |
Returns:
| Type | Description |
|---|---|
| | name, unit |
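The three examples above follow one rule: the name is everything before the first bracket, and the unit is the content of that bracket pair. A regular-expression sketch of this rule follows. It is not the library's implementation, and returning an empty unit when no bracket is found is an assumption.

```python
import re

# name, then a unit enclosed in (), [] or <>; text after the unit is ignored
_NAME_UNIT = re.compile(
    r"^\s*(?P<name>[^(\[<]*?)\s*[(\[<]\s*(?P<unit>[^)\]>]*?)\s*[)\]>]"
)


def split_name_unit(value):
    """Return (name, unit) for strings such as 'Gauge Length [mm]'."""
    match = _NAME_UNIT.match(value)
    if match is None:
        # assumption: without a bracketed unit the whole string is the name
        return value.strip(), ""
    return match.group("name"), match.group("unit")
```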
sdata.base¶
base ¶
Base ¶
Base class for sdata objects with metadata management. Provides core functionality for handling metadata, unique identifiers, serialization, and hierarchical relationships.
get_sdata_spec classmethod ¶
Canonical, importable string for a class: module:Class.
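A `module:Class` string can be produced and resolved again with `importlib`. The helper names `spec_of` and `class_from_spec` below are hypothetical and only illustrate the round trip; they are not part of sdata.

```python
import importlib
from fractions import Fraction


def spec_of(cls):
    """Build the 'module:Class' string for a class."""
    return f"{cls.__module__}:{cls.__qualname__}"


def class_from_spec(spec):
    """Import the module part and look up the class part."""
    module_name, _, class_name = spec.partition(":")
    obj = importlib.import_module(module_name)
    # qualified names of nested classes contain dots, so walk them one by one
    for part in class_name.split("."):
        obj = getattr(obj, part)
    return obj


spec = spec_of(Fraction)
```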
validate ¶
Validate the metadata against the declared SDATA_SCHEMA.
Without a schema the result is trivially valid.
to_jsonld ¶
Serialize the metadata as JSON-LD (see :mod:sdata.semantic).
write_sidecar ¶
Write <sname>.meta.jsonld next to a data blob; returns the path.
read_sidecar classmethod ¶
Load a .meta.jsonld sidecar file into a Metadata.
update_data ¶
Update the data dictionary with recursive merge for nested dicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dict | Dict[str, Any] | Dictionary to merge into self._data. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If data_dict is not a dict. |
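A recursive merge of nested dicts, including the ValueError for non-dict input, might look like the sketch below. The function name `merge_nested` is hypothetical, and it works on plain dicts rather than on `self._data`.

```python
def merge_nested(target, updates):
    """Merge updates into target in place; nested dicts are merged recursively."""
    if not isinstance(updates, dict):
        raise ValueError("updates must be a dict")
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # both sides are dicts: descend instead of overwriting
            merge_nested(target[key], value)
        else:
            # scalars, lists and type mismatches simply replace the old value
            target[key] = value
    return target


data = {"geometry": {"length": 10.0}, "status": "raw"}
merge_nested(data, {"geometry": {"width": 2.5}, "status": "checked"})
```

After the call, `data["geometry"]` holds both `length` and `width`, whereas a plain `dict.update` would have dropped `length`.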
set_default_attributes ¶
Set default attributes from self.default_attributes list.
to_dict ¶
Convert the object to a dictionary representation.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary with metadata, data, and description. |
from_dict classmethod ¶
Create an instance from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Dict[str, Any] | Dictionary with metadata, data, and description. | required |
Returns:
| Type | Description |
|---|---|
| Base | Instance of Base or subclass. |
to_json ¶
Export the object to JSON format, either as a string or to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Optional[str] | Optional file path to write JSON (default: None). | None |
| sidecar | bool | if True, additionally | False |
Returns:
| Type | Description |
|---|---|
| Optional[str] | JSON string if filepath is None, else None. |
from_json classmethod ¶
Create an instance from a JSON string or file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| s | str | JSON string or path to JSON file. | required |
Returns:
| Type | Description |
|---|---|
| Base | Instance of Base or subclass. |
Raises:
| Type | Description |
|---|---|
| json.JSONDecodeError | If invalid JSON. |
| FileNotFoundError | If file not found. |
to_zip ¶
Serializes the object via to_json() and packs it as data.json into a ZIP archive.
- If 'filepath' is set, the ZIP file is written to disk.
- The return value is always a BytesIO positioned at 0.
from_zip classmethod ¶
Creates an instance from a ZIP archive that contains a JSON file. 'src' can be a file path, bytes, a BytesIO, or any file-like stream.
- 'member': name of the JSON entry in the ZIP. If None:
  - take the only .json file,
  - or, if there is only one entry, take that one,
  - otherwise raise an error (ambiguity).
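The ZIP round trip and the member-selection rule can be sketched with `zipfile` and `io.BytesIO`. The function names below are hypothetical, the payload is a plain dict rather than an sdata object, and raising ValueError on ambiguity is an assumption, since the exception type is not specified above.

```python
import io
import json
import zipfile


def to_zip_buffer(payload, member="data.json"):
    """Pack payload as JSON into an in-memory ZIP; the buffer starts at 0."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, json.dumps(payload))
    buffer.seek(0)
    return buffer


def pick_member(names, member=None):
    """Resolve which archive entry holds the JSON document."""
    if member is not None:
        return member
    json_names = [name for name in names if name.lower().endswith(".json")]
    if len(json_names) == 1:
        return json_names[0]
    if len(names) == 1:
        return names[0]
    raise ValueError(f"ambiguous archive content: {names}")


def from_zip_source(src, member=None):
    """Read the JSON entry from a path, bytes, or file-like ZIP source."""
    if isinstance(src, (bytes, bytearray)):
        src = io.BytesIO(src)
    with zipfile.ZipFile(src) as archive:
        name = pick_member(archive.namelist(), member)
        return json.loads(archive.read(name).decode("utf-8"))
```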
cls_from_spec ¶
Factory function to create an instance of a dynamically generated subclass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sdata_spec | Optional[str] | The | 'sdata.base:Base' |
| sdata_attrs | Optional[Dict[str, Any]] | Optional dict of custom attributes/methods to add to the class. | None |
| kwargs | Any | Keyword arguments to pass to the instance initialization. | {} |
Returns:
| Type | Description |
|---|---|
| Any | generated class. |
sclass_factory ¶
Factory function to create an instance of a dynamically generated subclass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sdata_spec | Optional[str] | The | 'sdata.base:Base' |
| sdata_attrs | Optional[Dict[str, Any]] | Optional dict of custom attributes/methods to add to the class. | None |
| kwargs | Any | Keyword arguments to pass to the instance initialization. | {} |
Returns:
| Type | Description |
|---|---|
| Any | An instance of the generated class. |
sdata_factory ¶
Factory function to create an instance of a dynamically generated subclass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| class_name | str | The name of the class to generate (e.g., "Material"). | required |
| sdata_class | Type | The base class to inherit from (default: Base). | Base |
| sdata_attrs | Optional[Dict[str, Any]] | Optional dict of custom attributes/methods to add to the class. | None |
| kwargs | Any | Keyword arguments to pass to the instance initialization. | {} |
Returns:
| Type | Description |
|---|---|
| Any | An instance of the generated class. |
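Dynamically generating a subclass and instantiating it is usually done with the three-argument form of `type()`. The sketch below shows that mechanism with a stand-in `Base` class; `make_instance` and the stand-in are hypothetical and do not reproduce sdata's own `Base` or factory internals.

```python
class Base:
    """Stand-in base class for the sketch (not sdata.base.Base)."""

    def __init__(self, **kwargs):
        self.name = kwargs.get("name", "")


def make_instance(class_name, base=Base, attrs=None, **kwargs):
    """Create a subclass of base named class_name and return an instance."""
    # type(name, bases, namespace) builds the class object at runtime
    cls = type(class_name, (base,), dict(attrs or {}))
    return cls(**kwargs)


material = make_instance(
    "Material", attrs={"density_unit": "kg/m^3"}, name="steel"
)
```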
sdata.sclass.dataframe¶
dataframe ¶
DataFrame ¶
Bases: ContentIntegrityMixin, Base
content_bytes property ¶
Binary serialization of the data (plain Parquet of the df, without the embedded sdata metadata).
The hook the inherited :class:~sdata.sclass.content.ContentIntegrityMixin
hashes over — enables sha256/md5/sha1, size and
verify()/update_checksum() directly on a :class:DataFrame. Hashing
the data only keeps the checksum stable when metadata changes (otherwise
storing the checksum in the metadata would alter the hash).
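Why the checksum covers the data only can be seen in a small stand-alone sketch. The record layout and the helper name `data_checksum` are assumptions for illustration and not the mixin's actual code.

```python
import hashlib


def data_checksum(content_bytes):
    """SHA-256 over the data bytes only; metadata is deliberately excluded."""
    return hashlib.sha256(content_bytes).hexdigest()


# Stand-in object: raw data plus a metadata dict (hypothetical layout).
record = {"content": b"x,y\n1,2\n", "metadata": {"owner": "lab"}}

before = data_checksum(record["content"])
record["metadata"]["sha256"] = before       # storing the checksum edits the metadata
after = data_checksum(record["content"])    # the hashed bytes are unchanged
```

If the metadata were part of the hashed bytes, writing `sha256` into it would immediately invalidate the value just stored.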
column_metadata property ¶
Retrieve the per-column metadata.
Returns:
| Type | Description |
|---|---|
| Metadata | a :class: |
col property ¶
Column-annotation accessor: df.col.weight / df.col['weight'].
Returns the column :class:Attribute; mutate its fields in place
(df.col.weight.unit = 'kg') or use :meth:set_column.
column_units property ¶
Mapping {column: unit} (in df-column order) from column_metadata.
get_column ¶
Return the column :class:~sdata.metadata.Attribute for name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | | column name. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Attribute] | the column's :class: |
set_column ¶
set_column(name, *, unit=None, label=None, description=None, ontology=None, required=None, dtype=None)
Annotate a column; writes through to :attr:column_metadata.
Only the provided fields are changed; existing annotations are preserved
(delegates to :meth:~sdata.metadata.Metadata.set_attr). A warning is
logged if name is not a column of the current df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | | column name. | required |
| unit | | physical unit (e.g. | None |
| label | | human-readable label. | None |
| description | | free-text description. | None |
| ontology | | CURIE/IRI of the column's class. | None |
| required | | whether the column is required. | None |
| dtype | | declared dtype string. | None |
Returns:
| Type | Description |
|---|---|
| Attribute | the (created or updated) column :class: |
validate_table ¶
Validate the df/column_metadata against a :class:~sdata.schema.TableSchema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | | a | None |
Returns:
| Type | Description |
|---|---|
| | a :class: |
describe ¶
Descriptive statistics of the df (delegates to pandas.DataFrame.describe).
to_dict ¶
Serialize to a dict, embedding the df as base64 Parquet plus column_metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| engine | str | Parquet engine for pandas (default | 'pyarrow' |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | dict with the :class: |
to_jsonld ¶
JSON-LD of the metadata including column metadata (csvw:column).
write_sidecar ¶
Write the <sname>.meta.jsonld sidecar including column metadata; returns the path.
from_dict classmethod ¶
Reconstruct a DataFrame from a dict produced by :meth:to_dict.
Restores metadata and column_metadata and decodes the df from the embedded base64 Parquet payload.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Dict[str, Any] | dict with | required |
| engine | str | Parquet engine for pandas (default | 'pyarrow' |
Returns:
| Type | Description |
|---|---|
| DataFrame | a :class: |
to_dataframe ¶
Return a copy of the pandas DataFrame with sdata metadata in df.attrs.
Metadata, column_metadata and description are embedded under the
"_sdata" key (same layout as :meth:to_parquet), so that a round-trip
through pandas keeps the annotations discoverable.
Returns:
pandas.DataFrame: A copy of the DataFrame with attrs['_sdata'] set.
to_parquet ¶
Serialize this sdata.DataFrame to Parquet format, embedding metadata.
This method will copy the internal pandas DataFrame, attach SData metadata
(dataset‐level metadata, per‐column metadata, and description) to df.attrs,
and then write the result as a Parquet file. If no output path is given,
it will return the Parquet bytes buffer.
Args:
path (str, optional): Directory where the Parquet file will be saved.
If provided (and filename is None), a file named
<sname>.spq is created under this directory.
filename (str, optional): Exact filename (without full path)
for the output Parquet file. Defaults to <sname>.spq.
**kwargs: Additional keyword arguments passed to pandas.DataFrame.to_parquet,
e.g.:
- engine (str): Parquet engine, defaults to "pyarrow".
- compression (str): Compression codec, defaults to "zstd".
Returns:
str or bytes:
- If path (or filename) is provided, returns the full file path (str)
where the Parquet file was written.
- Otherwise, returns the in‐memory Parquet representation as bytes.
Example:
# Save to disk under /data/output/
# Get in‐memory Parquet bytes (no file on disk):
parquet_bytes = sdf.to_parquet()
from_parquet_bytes classmethod ¶
Load a DataFrame from in-memory Parquet bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parquet_bytes | | Parquet file content as bytes. | required |
| engine | str | Parquet engine for pandas (default | 'pyarrow' |
Returns:
| Type | Description |
|---|---|
| | a :class: |
from_parquet classmethod ¶
Load a DataFrame from a Parquet file on disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the | required |
| engine | str | Parquet engine for pandas (default | 'pyarrow' |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | if |
to_csv ¶
Serialize the df to CSV (pure pandas, no extra dependency).
CSV carries data only; the qualifying metadata travels in the optional
<sname>.meta.jsonld sidecar. The index is dropped by default
(override via index=True).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | | directory to write | None |
| filename | | exact output filename (defaults to | None |
| sidecar | | also write a JSON-LD metadata sidecar next to the file. | False |
| kwargs | | forwarded to :meth: | {} |
Returns:
| Type | Description |
|---|---|
| | the file path (if written to disk) or the CSV string. |
from_csv classmethod ¶
Load a DataFrame from a CSV file (pure pandas).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the CSV file. | required |
| kwargs | | forwarded to :func: | {} |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | if |
to_arrow ¶
Return a :class:pyarrow.Table with sdata metadata in the schema.
The dataset metadata, column_metadata and description are embedded as JSON
under the b"_sdata" schema-metadata key. In addition, each column's
unit/label/description/ontology are attached natively to
that column's Arrow field metadata, so Arrow-aware tools (DuckDB, Polars,
pyarrow) can read the per-column annotations without sdata.
Returns:
| Type | Description |
|---|---|
| | a |
Raises:
| Type | Description |
|---|---|
| ImportError | if pyarrow is not installed ( |
from_arrow classmethod ¶
Build a DataFrame from a :class:pyarrow.Table written by :meth:to_arrow.
The b"_sdata" schema blob is restored if present; per-column Arrow field
metadata (unit/label/...) is also merged into column_metadata, so
tables produced by other Arrow-native tools keep their column annotations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | | a | required |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| ImportError | if pyarrow is not installed. |
to_feather ¶
Serialize to the Feather (Arrow IPC) format, embedding sdata metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | | directory to write | None |
| filename | | exact output filename (defaults to | None |
| sidecar | | also write a JSON-LD metadata sidecar next to the file. | False |
| kwargs | | forwarded to :func: | {} |
Returns:
| Type | Description |
|---|---|
| | the file path (if written to disk) or the Feather bytes. |
Raises:
| Type | Description |
|---|---|
| ImportError | if pyarrow is not installed. |
from_feather classmethod ¶
Load a DataFrame from a Feather file written by :meth:to_feather.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the | required |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | if |
| ImportError | if pyarrow is not installed. |
as_blob ¶
Render the table as a standalone :class:~sdata.sclass.blob.Blob.
The df is serialized once (in the chosen fmt) into a fixed
bytes-content Blob — a binary snapshot that can be hashed, signed,
stored or transferred like any other asset. This is composition, not
inheritance: a living, multi-format table is rendered into one chosen
format on demand (RFC 0004, Option C). The Blob's checksum is filled
(update_checksum), so blob.verify() works out of the box.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fmt | str | serialization format: | 'parquet' |
| kwargs | | forwarded to :class: | {} |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
| ImportError | if the format needs pyarrow and it is not installed. |
to_datapackage ¶
Write a Frictionless Data Package (.zip) — a portable bundle.
The zip holds a standard datapackage.json descriptor (so generic
Frictionless tooling can read it), the data as CSV or Parquet, and — for a
lossless sdata round-trip — the full sdata metadata under the descriptor's
"sdata" key. Optionally the <sname>.meta.jsonld JSON-LD sidecar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | | directory to write | None |
| filename | | exact output filename (defaults to | None |
| fmt | | data format inside the package, | 'csv' |
| sidecar | | also embed the JSON-LD sidecar in the zip (default | True |
Returns:
| Type | Description |
|---|---|
| | the file path (if written to disk) or the zip bytes. |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
from_datapackage classmethod ¶
Load a DataFrame from a Data Package .zip written by :meth:to_datapackage.
Restores the data and — losslessly — metadata/column_metadata/description
from the descriptor's "sdata" block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the | required |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | if |
to_hdf ¶
Serialize the df to HDF5 (PyTables), embedding sdata metadata as a node attr.
HDF5 has no in-memory bytes form, so a path/filename is required. The
sdata metadata (metadata/column_metadata/description) is stored as the node's
_sdata attribute; several DataFrames can share one file via distinct key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | | directory to write | None |
| filename | | exact output filename (defaults to | None |
| key | | HDF5 node/key (default: | None |
| sidecar | | also write a JSON-LD metadata sidecar next to the file. | False |
| kwargs | | forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| | the file path. |
Raises:
| Type | Description |
|---|---|
| ImportError | if PyTables is not installed ( |
| ValueError | if neither |
from_hdf classmethod ¶
Load a DataFrame from an HDF5 file written by :meth:to_hdf.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the | required |
| key | | HDF5 node/key to read (default: the first key in the file). | None |
Returns:
| Type | Description |
|---|---|
| | a :class: |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | if |
| ImportError | if PyTables is not installed. |
sdata.sclass.blob¶
blob ¶
Blob ¶
Bases: ContentIntegrityMixin, Base
A class derived from Base that represents a generic binary large object (Blob). Stores the content in self.data['content'] as a dictionary with:
- 'type': 'bytes' for in-memory bytes (base64-encoded for serialization) or 'uri' for a filesystem URI (local path, S3 object, Zip path, etc., handled via fsspec).
- 'value': The base64-encoded bytes string (for 'bytes') or the URI string (for 'uri').
- 'filetype': The file type (e.g., 'pdf', 'png', 'jpg', 'txt', or any custom type). This is always stored and serialized.
Additionally, integrates hash calculations (SHA1 and MD5) from the provided class for integrity checks. The actual bytes are loaded lazily when accessed via the .content_bytes property, ensuring large content is not loaded unless explicitly requested. For serialization in to_dict(), bytes are base64-encoded if type is 'bytes'; URIs are kept as-is. Supports PDFs, images (png, jpg), or any arbitrary file types. Uses fsspec to handle various URI schemes:
- Local file: 'file:///path/to/file.pdf' or simply '/path/to/file.pdf'
- S3: 's3://bucket/key.pdf'
- Zip: 'zip://innerfile.txt::/path/to/outer.zip'
content_bytes property ¶
Lazily load and retrieve the content as bytes (only when this property is accessed). If type is 'uri', use fsspec to open and read; if 'bytes', decode from base64. Caches the result for subsequent accesses.
Returns:
| Type | Description |
|---|---|
| bytes | The content as bytes. |
Raises:
| Type | Description |
|---|---|
| ValueError | If loading fails or no value set. |
| Exception | If fsspec encounters an error (e.g., invalid URI, missing dependencies like s3fs for S3). |
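The lazy-load-and-cache behaviour can be sketched as follows. The class `LazyContent` is hypothetical, and the 'uri' branch uses the built-in `open()` on a local path as a stand-in for fsspec so that the sketch needs no extra dependency.

```python
import base64


class LazyContent:
    """Holds a content dict and decodes/reads the bytes only on first access."""

    def __init__(self, content_type, value):
        self.content = {"type": content_type, "value": value}
        self._cache = None

    @property
    def content_bytes(self):
        if self._cache is None:
            kind, value = self.content["type"], self.content["value"]
            if not value:
                raise ValueError("no value set")
            if kind == "bytes":
                self._cache = base64.b64decode(value)
            elif kind == "uri":
                # stand-in for an fsspec open call; local paths only
                with open(value, "rb") as handle:
                    self._cache = handle.read()
            else:
                raise ValueError(f"unknown content type: {kind!r}")
        return self._cache


blob = LazyContent("bytes", base64.b64encode(b"hello").decode("ascii"))
```

Nothing is decoded when `blob` is created; the first read of `blob.content_bytes` does the work and later reads return the cached object.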
filetype property ¶
Retrieve the filetype (no content loading required).
Returns:
| Type | Description |
|---|---|
| str | The filetype string (e.g., 'pdf', 'png', 'jpg'). |
set_content ¶
Update the content type, value, and optionally filetype. Clears any cached content to maintain lazy loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| content_type | ContentType | New content_type ('bytes' or 'uri'). | required |
| value | Any | New value (bytes or URI str). | required |
| filetype | Optional[str] | Optional new filetype (e.g., 'pdf', 'png', 'jpg'). | None |
exists ¶
Test whether the blob content exists. For 'uri', checks if the path/URI is accessible via fsspec; for 'bytes', always True if value set.
write ¶
Write the content to a destination uri via fsspec (local/S3/zip/…).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | destination URI/path (e.g. | required |
| kwargs | Any | forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| str | the destination |
Raises:
| Type | Description |
|---|---|
| ImportError | if fsspec is not installed ( |
open ¶
Return a file-like handle to the content; use as a context manager.
For uri content a streaming fsspec handle is returned (no full
in-memory load); for bytes content an :class:io.BytesIO over the
decoded bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mode | str | open mode (default | 'rb' |
Raises:
| Type | Description |
|---|---|
| ValueError | if no content/value is set or the content_type is unknown. |
| ImportError | if fsspec is required (uri) but not installed. |
to_dict ¶
Extend Base.to_dict to include the content dict as-is (with base64 for bytes and filetype). Does not include or load the actual content bytes.
from_dict classmethod ¶
Create a Blob instance from a dictionary. Restores content dict including filetype; no content loading occurs here (lazy via content_bytes).
sdata.imagemeta¶
imagemeta ¶
Native, cross-format embedding of sdata metadata in image bytes.
Pure Python code (no Pillow dependency for the metadata layer): a text payload (usually the sdata metadata JSON) is written losslessly into the respective image container, or read from it, through one uniform API.
Supported containers and their native carrier:
- PNG: iTXt chunk with keyword sdata (UTF-8)
- JPEG: APP1 segment with identifier sdata\0 (UTF-8)
- JP2: uuid box (JPEG 2000, ISO BMFF) with a fixed sdata UUID
- GIF: Comment Extension with prefix sdata\0
- WebP: custom RIFF chunk sdAT (ignored by decoders as unknown)
- TIFF: private IFD tag (65000); the original bytes remain unchanged
The format is detected from the magic bytes (:func:detect_format); :func:embed
and :func:extract select the matching handler. The write semantics are
replace (an existing sdata payload is replaced, not duplicated). Pillow
(optional) is only needed to transcode the pixels, not for the
metadata, so reading works entirely without Pillow.
:Example:
from sdata import imagemeta
png_with_meta = imagemeta.embed(png_bytes, '{"name": "probe"}')  # doctest: +SKIP
imagemeta.extract(png_with_meta)  # doctest: +SKIP
'{"name": "probe"}'
ImageMetadataError ¶
Bases: Exception
Base error of the image metadata layer.
UnsupportedImageFormatError ¶
Bases: ImageMetadataError
The image format is not supported (for writing).
PayloadTooLargeError ¶
Bases: ImageMetadataError
The payload does not fit into a single format segment (e.g. JPEG APP1).
detect_format ¶
Detect the image format from the magic bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | bytes | the image bytes. | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | |
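Magic-byte detection for the six containers listed for this module can be sketched as below. The signatures are the standard file signatures of each format; the returned keys ('png', 'jpeg', ...) and the function name `sniff_format` are assumptions, since the actual registry keys are not shown here.

```python
def sniff_format(data):
    """Return a format key for known image signatures, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):
        return "jp2"  # JPEG 2000 signature box
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"  # RIFF container with WEBP form type
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"  # little-endian or big-endian TIFF header
    return None
```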
supported_formats ¶
The supported format keys (in registry order).
embed ¶
Embed payload (text) natively into the image bytes (replace semantics).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | bytes | the original image bytes. | required |
| payload | str | the text to embed (usually the sdata metadata JSON). | required |
| fmt | Optional[str] | format key; | None |
Returns:
| Type | Description |
|---|---|
| bytes | new image bytes with the sdata payload embedded. |
Raises:
| Type | Description |
|---|---|
| UnsupportedImageFormatError | if the format is unknown or not supported. |
| PayloadTooLargeError | if the payload does not fit into one segment (JPEG). |
extract ¶
Read an embedded sdata payload from the image bytes (without Pillow).
Lenient on read: unknown or unsupported formats return None
(no error), as do images without an embedded sdata payload.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | bytes | the image bytes. | required |
| fmt | Optional[str] | format key; | None |
Returns:
| Type | Description |
|---|---|
| Optional[str] | the embedded payload (text) or |
sdata.sclass.image¶
image ¶
Image: a :class:~sdata.sclass.blob.Blob over image content.
The image content is stored as Blob content (uri for files, bytes for
in-memory data). sdata metadata is embedded natively and across formats in the
image file (PNG/JPEG/JP2/GIF/WebP/TIFF) via :mod:sdata.imagemeta,
which works without Pillow. Formats without a native metadata carrier (e.g. BMP)
get a lossless <filepath>.meta.json sidecar; the save/
from_file API is identical for all formats. Pillow is only used lazily to
decode/transcode the pixels (:attr:Image.pil/
:meth:Image.to_numpy/:meth:Image.save when the format changes) and is optional
(pip install pillow).
Image ¶
Bases: Blob
Image object based on :class:~sdata.sclass.blob.Blob.
from_file classmethod ¶
Create an Image referencing an image file (kept as uri content).
Any sdata metadata is read back and merged: natively embedded
(PNG/JPEG/JP2/GIF/WebP/TIFF, Pillow-free) and/or from an adjacent
<filepath>.meta.json sidecar (for formats without a native container).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the image file. | required |
| project | | namespace for the deterministic SUUID (alias of | None |
| ns_name | | namespace for the deterministic SUUID. | None |
| kwargs | | forwarded to :class: | {} |
Returns:
| Type | Description |
|---|---|
| | an :class: |
from_bytes classmethod ¶
Create an Image from in-memory image bytes.
Any embedded sdata metadata is read back and merged (Pillow-free).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | | a name for the image (its suffix sets the filetype). | required |
| image_data | | the raw image bytes. | required |
| project | | namespace for the deterministic SUUID. | None |
Returns:
| Type | Description |
|---|---|
| | an :class: |
embedded_metadata ¶
Return the sdata metadata embedded in the image bytes, or None.
Reads the native sdata payload (PNG iTXt / JPEG APP1 / JP2 uuid
box / GIF comment / WebP sdAT chunk) without Pillow.
Returns:
| Type | Description |
|---|---|
| | a :class: |
sidecar_path staticmethod ¶
Path of the metadata sidecar for filepath (<filepath>.meta.json).
write_sidecar ¶
Write the sdata metadata next to filepath as a lossless JSON sidecar.
The sidecar carries the same payload as the embedded form
(metadata.to_json()), so a round-trip is lossless regardless of whether
the format has a native metadata container.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | the image path the sidecar belongs to. | required |
Returns:
| Type | Description |
|---|---|
| str | the sidecar path ( |
save ¶
Save the image to filepath with sdata metadata — one API for all formats.
The container is chosen from the file suffix. For a format with a native
metadata container (PNG/JPEG/JP2/GIF/WebP/TIFF) the metadata is embedded:
without re-encoding if the stored bytes already use that container (lossless,
Pillow-free), otherwise Pillow transcodes first. For any other format
(e.g. BMP) the image is written via Pillow and the metadata travels in a
lossless <filepath>.meta.json sidecar — so metadata is never lost.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | destination path (its suffix selects the format). | required |
| sidecar | | sidecar policy: | None |
| kwargs | | forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| | the destination |
Raises:
| Type | Description |
|---|---|
| ImportError | if Pillow is required (transcode / non-native format) but not installed. |
sdata.schema¶
schema ¶
Metadata schemas / templates for fully qualifying metadata.
A :class:MetadataSchema declares the expected attributes of a data class
(name, dtype, unit, required, ontology, ...) and can validate metadata against
them as well as complete it (fill in defaults/units/ontology).
Pure-Python validation is always available; with the optional extra
[schema] (jsonschema), metadata can additionally be checked against a generated
JSON Schema.
AttrSpec dataclass ¶
Specification of an expected attribute.
ValidationReport dataclass ¶
Result of a schema validation (truthy if ok).
MetadataSchema ¶
Collection of :class:AttrSpec for a data class.
validate ¶
Check metadata against the schema (never raises; returns a ValidationReport).
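A validator that never raises and returns a truthy-when-ok report can be sketched with a dataclass and `__bool__`. The names `Report` and `check`, the report fields and the spec layout (name mapped to a Python type and a required flag) are assumptions for illustration; the fields of the real AttrSpec and ValidationReport are not listed here.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Collects findings; evaluates to True only when nothing was found."""

    missing: list = field(default_factory=list)
    wrong_dtype: list = field(default_factory=list)

    def __bool__(self):
        return not (self.missing or self.wrong_dtype)


def check(values, spec):
    """Compare a name->value mapping with spec: name -> (type, required)."""
    report = Report()
    for name, (expected, required) in spec.items():
        if name not in values:
            if required:
                report.missing.append(name)
        elif not isinstance(values[name], expected):
            report.wrong_dtype.append(name)
    return report


spec = {"force_x": (float, True), "label": (str, False)}
ok = check({"force_x": 1.2}, spec)
bad = check({"label": 3}, spec)
```

Returning a report instead of raising lets callers write `if not schema_result:` and still inspect every finding afterwards.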
apply ¶
Vervollständige metadata in-place: fehlende Attribute aus Defaults
anlegen, vorhandene um Einheit/Ontologie/Beschreibung ergänzen. Gibt
metadata zurück (für nicht-destruktiv: schema.apply(md.copy())).
to_json_schema ¶
Generiere ein JSON Schema (Draft 2020-12) für metadata.get_udict().
validate_jsonschema ¶
Validiere via jsonschema (falls installiert), sonst native validate.
TableSchema ¶
Spalten-Schema für eine :class:~sdata.sclass.dataframe.DataFrame.
Deklariert je Spalte den erwarteten name/dtype/unit/required
(wiederverwendet :class:AttrSpec) und kann ein DataFrame dagegen validieren
sowie dessen column_metadata aus dem Schema vervollständigen.
Der dtype wird gegen die tatsächlichen df.dtypes geprüft, die Einheit gegen
die column_metadata-Annotation der Spalte.
validate ¶
Prüfe ein :class:DataFrame gegen das Spalten-Schema.
Wirft nie; liefert einen :class:ValidationReport mit fehlenden Spalten,
dtype-Abweichungen (gegen df.dtypes), Einheiten-Abweichungen (gegen
column_metadata) und zusätzlichen (nicht spezifizierten) Spalten.
apply ¶
Complete column_metadata in place from the schema.
For each schema column present in the df: create a missing annotation or
fill in an empty unit/ontology/description. Returns dataframe.
sdata.dtypes¶
dtypes ¶
Pure-stdlib dtype registry for :class:sdata.metadata.Attribute.
Single source of truth for value coercion, JSON (de)serialization, the Python class belonging to a dtype, and the XSD type mapping (consumed by the JSON-LD layer).
Design goals:
- Backward compatible: in the lenient default mode (strict=False), coercion behaves byte-for-byte like the previous Attribute._set_value: empty/falsy values become np.nan/None/""/False depending on the dtype; non-castable values raise DtypeError (logged by the setter, value remains unchanged).
- Strict is opt-in: strict=True raises DtypeError instead of silently degrading.
- Extensible: in addition to the 6 legacy dtypes (str/int/float/bool/timestamp/list), the registry adds bytes (base64), json (dict/list), uri, date, time, duration (ISO 8601 / timedelta), decimal (exact), complex, floatlist (typed float list), and langstring (language-tagged, rdf:langString). complex/floatlist have no standard XSD type and use custom datatype CURIEs (sdata:complex/sdata:floatlist); langstring is expressed in JSON-LD via @language.
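The difference between the lenient and the strict mode can be sketched as follows. This is an illustration under assumptions, not the registry's code: `coerce` and `CoercionError` are hypothetical names, `math.nan` stands in for `np.nan` to stay within the standard library, and which dtype maps to which placeholder is assumed.

```python
import math


class CoercionError(ValueError):
    """Hypothetical stand-in for the registry's DtypeError."""


# Assumed per-dtype placeholders for empty input in lenient mode.
EMPTY = {"float": math.nan, "int": None, "str": ""}
CASTS = {"float": float, "int": int, "str": str}


def coerce(value, dtype, strict=False):
    """Cast value to dtype; lenient mode maps empty input to a placeholder."""
    if value is None or value == "":
        if strict:
            raise CoercionError(f"empty value for dtype {dtype!r}")
        return EMPTY[dtype]
    try:
        return CASTS[dtype](value)
    except (TypeError, ValueError) as exc:
        # Non-castable values raise in both modes; a setter may catch and log this.
        raise CoercionError(f"cannot cast {value!r} to {dtype}") from exc


placeholder = coerce("", "float")    # NaN placeholder in lenient mode
number = coerce("1.23", "float")     # regular cast
```

Calling `coerce("", "float", strict=True)` raises `CoercionError` instead of returning the placeholder, which is the opt-in behaviour the design goals describe.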
DtypeError ¶
Bases: ValueError
A value cannot be converted to the target dtype (mainly in strict mode).
LangString ¶
A language-tagged string (rdf:langString): text + BCP 47 lang.
Represented in JSON-LD as {"@value": text, "@language": lang} (not via
@type). The compact text form is "text@lang" (e.g. "Hallo@de").
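The two representations named above, the compact `"text@lang"` form and the JSON-LD value object, can be sketched like this. `TaggedText` is a hypothetical stand-in, and splitting on the last `@` is an assumption about how text containing `@` would be handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical stand-in for a language-tagged string."""
    text: str
    lang: str

    @classmethod
    def parse(cls, compact):
        if "@" not in compact:
            raise ValueError(f"no language tag in {compact!r}")
        # Split on the last "@" so that text containing "@" stays intact.
        text, _, lang = compact.rpartition("@")
        return cls(text, lang)

    def to_jsonld(self):
        # The language travels in "@language", not in "@type".
        return {"@value": self.text, "@language": self.lang}

    def __str__(self):
        return f"{self.text}@{self.lang}"


greeting = TaggedText.parse("Hallo@de")
node = greeting.to_jsonld()
```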
DtypeSpec ¶
Describes a dtype: coercion, JSON representation, class, XSD type.
resolve ¶
Normalize a dtype input (string OR class) to a registry key.
Mirrors the previous _set_dtype rules: classes via DTYPES_INV,
'float64'/'int32' -> 'float'/'int', unknown -> 'str'.
None -> None (no dtype change).
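A minimal sketch of these normalization rules, assuming a small set of known keys. `resolve_name`, `KNOWN` and `BY_CLASS` are hypothetical names (the real class lookup is `DTYPES_INV`), and stripping a trailing bit width with a regular expression is only one way to map `'float64'` to `'float'`.

```python
import re

KNOWN = {"str", "int", "float", "bool", "timestamp", "list"}
BY_CLASS = {str: "str", int: "int", float: "float", bool: "bool", list: "list"}


def resolve_name(dtype):
    """Map a dtype given as string or class to a registry key."""
    if dtype is None:
        return None  # no dtype change requested
    if isinstance(dtype, type):
        return BY_CLASS.get(dtype, "str")
    # Strip a trailing bit width such as the "64" in "float64".
    name = re.sub(r"\d+$", "", str(dtype).strip().lower())
    return name if name in KNOWN else "str"
```

With these rules, `resolve_name("float64")` gives `"float"`, `resolve_name(bool)` gives `"bool"`, and an unrecognised name falls back to `"str"`.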
json_default ¶
default= for json.dumps: serializes the non-native dtype values
(TimeStamp/bytes/Decimal/timedelta/date/time) in a JSON-safe way.
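How a `default=` hook plugs into `json.dumps` can be sketched with the standard library alone. `to_json_safe` is a hypothetical function, not the module's `json_default`; the chosen encodings (base64 for bytes, string for `Decimal`, seconds for `timedelta`) are assumptions.

```python
import base64
import datetime as dt
import json
from decimal import Decimal


def to_json_safe(obj):
    """Fallback for json.dumps; called only for objects json cannot encode."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    if isinstance(obj, Decimal):
        return str(obj)  # keep exact digits instead of going through float
    if isinstance(obj, dt.timedelta):
        return obj.total_seconds()  # assumption: a real codec may emit ISO 8601
    if isinstance(obj, (dt.datetime, dt.date, dt.time)):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


payload = {"raw": b"\x00\x01", "price": Decimal("1.10"), "day": dt.date(2020, 12, 11)}
encoded = json.dumps(payload, default=to_json_safe, sort_keys=True)
```

Raising `TypeError` for unknown objects matters: it keeps the standard `json.dumps` error behaviour for everything the hook does not handle.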
sdata.semantic¶
semantic ¶
JSON-LD / Linked Data serialization for sdata metadata, plus sidecar files.
Converts a :class:sdata.metadata.Metadata into a self-describing
JSON-LD document (and back) and writes/reads optional sidecar files
<sname>.meta.jsonld next to a data blob.
Modelling (see project decisions):
- @id = DID of the object, did:suuid:<sname>:sdata.
- @type = [sdata:<Class>, <BFO IRI of the topology class>].
- Reserved _sdata_* fields -> schema.org/DCAT/PROV terms (name, identifier, generatedAtTime, wasDerivedFrom, isPartOf, …).
- Hybrid for user attributes: with a real unit -> qudt:Quantity node; without a unit but with ontology -> typed node; otherwise a plain typed literal. ontology is always an @type/class of the value; the predicate is sdata:<name>.
Pure Python; no mandatory dependency.
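The hybrid rule for user attributes can be sketched as a three-way branch. This is a rough illustration only: `attribute_node` and `build_doc` are hypothetical helpers, the property names inside each node (`qudt:value`, `sdata:unitSymbol`, `rdf:value`) are assumptions, and treating `"-"` as "no real unit" is likewise assumed.

```python
XSD = {"float": "xsd:double", "int": "xsd:integer", "str": "xsd:string", "bool": "xsd:boolean"}


def attribute_node(value, dtype="str", unit="", ontology=""):
    """Pick the node shape for one user attribute."""
    if unit and unit != "-":
        # Real unit: model the value as a quantity node.
        return {"@type": "qudt:Quantity", "qudt:value": value, "sdata:unitSymbol": unit}
    if ontology:
        # No unit, but a class annotation: typed node.
        return {"@type": ontology, "rdf:value": value}
    # Otherwise: plain typed literal.
    return {"@value": value, "@type": XSD.get(dtype, "xsd:string")}


def build_doc(sname, class_name, attrs):
    """Assemble a document whose predicates are sdata:<name>."""
    doc = {"@id": f"did:suuid:{sname}:sdata", "@type": [f"sdata:{class_name}"]}
    for name, spec in attrs.items():
        doc[f"sdata:{name}"] = attribute_node(**spec)
    return doc


doc = build_doc(
    "example-sname",  # hypothetical sname
    "Data",
    {
        "force_x": {"value": 1.2, "dtype": "float", "unit": "kN"},
        "shape": {"value": "round", "ontology": "bfo:Quality"},
        "operator": {"value": "otto"},
    },
)
```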
column_node ¶
JSON-LD node (CSVW) for a table column from column_metadata.
attr.name = column name, attr.value = pandas dtype name (e.g.
float64); optional unit/label/description.
to_jsonld ¶
Serialize metadata as a JSON-LD document (dict).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
columns
|
optional, ordered iterable of column- |
None
|
from_jsonld ¶
Reconstruct a :class:Metadata from a JSON-LD document (dict or str).
to_rdf ¶
Serialize the metadata as RDF.
With rdflib installed ([rdf]), the JSON-LD is serialized to fmt
(turtle/nt/xml/…). Without rdflib, the JSON-LD itself is
returned, since application/ld+json is already valid RDF.
rdf_from_doc ¶
Serialize an already built JSON-LD document as RDF (see :func:to_rdf).
to_verifiable_credential ¶
Sign the metadata as a W3C Verifiable Credential (compact JWS, EdDSA).
Wraps :func:to_jsonld as the credentialSubject and signs via the
pure-Python EdDSA stack (:mod:sdata.did.jose); no external crypto library is needed.
Returns:
| Type | Description |
|---|---|
|
Compact JWS string ( |
verify_credential ¶
Verify a VC (compact JWS) and return the credentialSubject (JSON-LD).
Raises:
| Type | Description |
|---|---|
sdata.did.errors.VerificationError
|
if the signature is invalid. |
write_sidecar ¶
Write <sname>.meta.jsonld (JSON-LD) next to a blob; returns the path.
write_sidecar_doc ¶
Write an already built JSON-LD document as <sname>.meta.jsonld.
read_sidecar ¶
Load a .meta.jsonld sidecar file and reconstruct the Metadata.
sdata.units¶
units ¶
Unit vocabulary: curated mapping unit → QUDT IRI / UCUM code.
Pure-Python table (no dependency). The optional extra [units] = pint
extends validation to any parseable unit; without pint, the
curated table applies.
normalize_symbol ¶
Trim/normalize a unit symbol to a canonical key.
qudt_iri ¶
QUDT unit IRI (as a CURIE) for a symbol, or None (unknown/dimensionless).
unit_node ¶
JSON-LD fragment for a unit: {"unitRef": <iri>, "symbol": <sym>}.
unitRef is omitted when no QUDT IRI is known (e.g. "-" or an
unknown unit); the raw symbol is always preserved.
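The "omit `unitRef`, keep `symbol`" rule can be sketched with a small lookup table. `UNIT_TABLE` and `unit_fragment` are hypothetical; the four entries are an illustrative excerpt of QUDT unit CURIEs, not the module's curated table.

```python
# Illustrative excerpt only; the curated table in the module may differ.
UNIT_TABLE = {
    "N": "unit:N",
    "kN": "unit:KiloN",
    "kg": "unit:KiloGM",
    "mm": "unit:MilliM",
}


def unit_fragment(symbol):
    """Build the fragment; unitRef appears only for known symbols."""
    fragment = {}
    iri = UNIT_TABLE.get(symbol.strip())
    if iri is not None:
        fragment["unitRef"] = iri
    fragment["symbol"] = symbol  # the raw symbol is always kept
    return fragment


known = unit_fragment("kN")
unknown = unit_fragment("furlong")
dimensionless = unit_fragment("-")
```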
validate_unit ¶
True if the unit is known (curated table) or, with pint, parseable.
sdata.vocab¶
vocab ¶
Vocabulary & JSON-LD @context for the semantic metadata layer.
Pure-Python tables (no dependency): namespace prefixes, an
@context builder, the XSD type mapping (from :mod:sdata.dtypes), and the
resolution of BFO topology classes as well as attribute ontology annotations.
Design decisions (see project plan):
- Identity: the @id of an object is its DID did:suuid:<sname>:sdata.
- ontology = always the @type/class of the value (e.g. bfo:Quality); the subject→value predicate does NOT come from ontology but is sdata:<name> (or a mapped schema.org/qudt term).
build_context ¶
Return the JSON-LD @context (inline = full term map) or,
with mode="url", the hosted context URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mode
|
|
'inline'
|
|
extra
|
optional additional term definitions |
None
|
expand_curie ¶
Expand a CURIE (schema:name) to the full IRI; IRIs and unknown values
are returned unchanged.
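A minimal sketch of CURIE expansion under these rules. `PREFIXES` and `expand` are hypothetical, and the namespace IRIs in the table are assumptions about what the module's prefix map contains.

```python
# Assumed prefix map; the module's own table may use other IRIs.
PREFIXES = {
    "schema": "https://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "qudt": "http://qudt.org/schema/qudt/",
}


def expand(term):
    """Expand prefix:local for known prefixes; pass everything else through."""
    prefix, sep, local = term.partition(":")
    # "//" after the colon marks a full IRI such as https://..., not a CURIE.
    if sep and not local.startswith("//") and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term
```

Under this sketch, `expand("schema:name")` yields `"https://schema.org/name"`, while `expand("https://example.org/x")` and `expand("foo:bar")` come back unchanged.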
bfo_iri ¶
Map a topology class string ("sdata.sclass:IndependentContinuant")
to the BFO CURIE, or None if unknown/empty.
safe_term ¶
Normalize an attribute name to a CURIE-compatible local part.
predicate_for ¶
Subject→value predicate for a user attribute: default sdata:<safe_name>.
type_iri ¶
@type/class of a value from the ontology field (CURIE/IRI), or
None. (By project decision, ontology is always a class.)
sdata.interactive¶
interactive ¶
Interactive ergonomics for metadata: Jupyter _repr_html_, attribute
autocomplete (m.a.force_x), and prefix filtering.
Pure standard library; the renderers build HTML manually (no pandas Styler / tabulate dependency).
AttrAccessor ¶
Attribute autocomplete helper: m.a.force_x gets/sets an attribute.
Deliberately lives on its own object (not on :class:Metadata) so that
tab completion shows only attribute names and does not shadow any methods.
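The reason for a separate accessor object can be shown with a small sketch. `NameAccessor` and `Container` are hypothetical classes, not the sdata ones, and here the accessor stores plain values rather than Attribute objects.

```python
class NameAccessor:
    """Hypothetical accessor exposing dictionary keys as attributes."""

    def __init__(self, store):
        # Bypass our own __setattr__ while storing the backing dict.
        object.__setattr__(self, "_store", store)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stored names.
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._store[name] = value

    def __dir__(self):
        # Tab completion offers only the stored names.
        return sorted(self._store)


class Container:
    """Hypothetical metadata container with methods of its own."""

    def __init__(self):
        self._attributes = {}
        self.a = NameAccessor(self._attributes)

    def keys(self):
        return list(self._attributes)


m = Container()
m.a.force_x = 1.2
m.a.keys = "user attribute named keys"  # does not touch Container.keys()
```

Because user-defined names live on `m.a`, an attribute called `keys` cannot shadow the container's `keys()` method, and `dir(m.a)` lists only attribute names.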
ColumnAccessor ¶
Column annotation autocomplete: df.col.weight / df.col['weight'].
Returns the column :class:Attribute from column_metadata; its fields
can be mutated directly (df.col.weight.unit = 'kg'). To create or set
whole fields, use :meth:~sdata.sclass.dataframe.DataFrame.set_column.
Like :class:AttrAccessor, it deliberately lives on its own object so that
tab completion shows only column names and does not shadow any methods.
metadata_html ¶
HTML representation of a :class:Metadata (header + user attribute table).
sdata.timestamp¶
timestamp ¶
ISO 8601 date time string parsing
Basic usage:
#>>> parse_date("2007-01-25T12:00:00Z")
datetime.datetime(2007, 1, 25, 12, 0, tzinfo=
MIT License
Copyright (c) 2007 - 2015 Michael Twomey
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
ParseError ¶
Bases: Exception
Raised when there is a problem parsing a date string
Utc ¶
Bases: tzinfo
UTC Timezone
FixedOffset ¶
Bases: tzinfo
Fixed offset in hours and minutes from UTC
TimeStamp ¶
to_int ¶
Pull a value from the dict and convert to int
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
default_to_zero
|
If the value is None or empty, treat it as zero |
False
|
|
default
|
If the value is missing in the dict use this default |
None
|
parse_timezone ¶
Parses ISO 8601 time zone specs into tzinfo offsets
parse_date ¶
Parses ISO 8601 dates into datetime objects
The timezone is parsed from the date string. However it is quite common to have dates without a timezone (not strictly correct). In this case the default timezone specified in default_timezone is used. This is UTC by default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
datestring
|
The date to parse as a string |
required | |
default_timezone
|
A datetime tzinfo instance to use when no timezone is specified in the datestring. If this is set to None then a naive datetime object is returned. |
UTC
|
Returns:
| Type | Description |
|---|---|
|
A datetime.datetime instance |
Raises:
| Type | Description |
|---|---|
ParseError
|
when there is a problem parsing the date or constructing the datetime instance. |
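The default-timezone behaviour described for parse_date can be approximated with the standard library. This is a stand-in sketch, not the module's parser: `parse_iso` is a hypothetical name, it relies on `datetime.fromisoformat` (so it accepts fewer ISO 8601 variants), and it raises `ValueError` rather than `ParseError` on bad input.

```python
from datetime import datetime, timezone


def parse_iso(datestring, default_timezone=timezone.utc):
    """Parse an ISO 8601 string; attach default_timezone if none is given."""
    text = datestring.strip()
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None and default_timezone is not None:
        parsed = parsed.replace(tzinfo=default_timezone)
    return parsed


with_zone = parse_iso("2007-01-25T12:00:00Z")
defaulted = parse_iso("2007-01-25T12:00:00")
naive = parse_iso("2007-01-25T12:00:00", default_timezone=None)
```

Passing `default_timezone=None` yields a naive datetime, matching the parameter description in the table above.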
today_str ¶
Create a timestamp for today (UTC).
Returns:
| Type | Description |
|---|---|
|
'2020-12-11T00:00:00+00:00' |
now_local_str ¶
Create a timestamp for the current time in the local timezone.
Returns:
| Type | Description |
|---|---|
|
'2020-12-11T00:00:00+00:00' |
now_utc_str ¶
Create a timestamp for the current time in UTC.
Returns:
| Type | Description |
|---|---|
|
'2020-12-11T00:00:00+00:00' |
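A string in the format shown in the Returns tables can be produced with the standard library as follows. `utc_stamp` is a hypothetical helper; dropping microseconds is an assumption based on the example value, which shows no fractional seconds.

```python
from datetime import datetime, timezone


def utc_stamp(moment=None):
    """Return an ISO 8601 string with an explicit +00:00 offset."""
    moment = moment or datetime.now(timezone.utc)
    # Convert to UTC and drop microseconds to match the documented format.
    return moment.astimezone(timezone.utc).replace(microsecond=0).isoformat()


fixed = utc_stamp(datetime(2020, 12, 11, tzinfo=timezone.utc))
current = utc_stamp()
```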