Changelog
All notable changes to sdata are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
1.3.0 - 2026-06-29
A large, strictly additive increment: a content/integrity foundation under all
data containers (Blob), a much broader DataFrame serialization portfolio, and
native, format-agnostic metadata embedding for images. Core dependencies remain
numpy, pandas, suuid; every new backend stays optional with a pure-Python path.
Added
- New attribute dtypes `date`, `time`, `duration`, `decimal`, `complex`, `floatlist` and `langstring` (pure stdlib): `date`/`time` (`xsd:date`/`xsd:time`), `duration` as a `datetime.timedelta` parsed from ISO 8601 (`xsd:duration`), `decimal` as `decimal.Decimal` for exact numerics (`xsd:decimal`), `complex` numbers and `floatlist` (typed `list[float]`, also from numpy arrays); the latter two use the custom datatype CURIEs `sdata:complex`/`sdata:floatlist`. `langstring` (`rdf:langString`, `"Hallo@de"`) renders via JSON-LD `@language`. All of these round-trip losslessly. Lenient/`strict=` coercion works as for the existing dtypes.
- Native image metadata (RFC 0005). New pure-Python, Pillow-free module `sdata.imagemeta` embeds/reads sdata metadata natively into six containers with one API (`detect_format`/`embed`/`extract`/`supported_formats`): PNG (`iTXt`), JPEG (APP1), JPEG 2000 (`uuid` box), GIF (comment extension), WebP (`sdAT` chunk) and TIFF (private IFD tag, original bytes untouched). `Image` gains a uniform `save`/`from_file` flow, `embedded_metadata()`, and a lossless `<file>.meta.json` sidecar fallback for formats without a native carrier (e.g. BMP), controllable via `save(sidecar=True|False|None)`.
- `Blob` as the content/integrity/provenance foundation (RFC 0003). Hardened `Blob` with `sha256`/`sha1`/`md5`, `size`, `verify()`/`update_checksum()`, a lazy `content_bytes` cache, `exists()`, `write(uri)`/`open()` (fsspec), standard-vocabulary provenance metadata (`dcat:mediaType`, `dcterms:*`, `schema:sha256`) and mime/creation-date autofill. `FileReference` and `Image` now build on `Blob`.
- Shared integrity mixin (RFC 0004, Option B). `sdata.sclass.content.ContentIntegrityMixin` provides the hash/`verify`/`size` layer to both `Blob` and `DataFrame` via a `content_bytes` hook (no inheritance between them).
- `DataFrame.as_blob(fmt)` (RFC 0004, Option C). Render a table as a standalone `Blob` in a chosen format (parquet/csv/arrow/feather). This is composition: it grants hash/`verify`/`size`/`write`/`open` without changing the base class.
- `DataFrame` serialization portfolio. Native per-column field metadata in Arrow/Feather (`to_arrow`/`from_arrow`/`to_feather`/`from_feather`), a Frictionless Data Package bundle (`to_datapackage`/`from_datapackage`, `.zip`), and HDF5 I/O (`to_hdf`/`from_hdf`, optional `sdata[hdf]`, RFC 0002).
- RFCs. 0002 (HDF5), 0003 (Blob foundation), 0004 (DataFrame vs. Blob), 0005 (native image metadata); MkDocs nav, API reference and usage guides extended.
Changed
- `DataFrame.content_bytes` hashes the data only (plain Parquet), so storing the checksum in the metadata does not change the hash (no self-reference).
- Documentation reorganized around the full DataFrame serialization portfolio and the image-metadata workflow (new `usage/image-metadata.md`).
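
The "no self-reference" property can be sketched with a small mixin driven by a `content_bytes` hook. This is an illustration under stated assumptions, not the sdata code: the class names `IntegrityMixin`, `RawBlob` and `Table` are hypothetical, and the table serializes its rows as CSV text as a stand-in for the plain Parquet bytes named above, so that the example runs on the standard library alone.

```python
import hashlib


class IntegrityMixin:
    """Hash/size/verify layer that relies only on a content_bytes() hook."""

    def content_bytes(self) -> bytes:
        # Hook: each container decides which bytes represent its content.
        raise NotImplementedError

    @property
    def size(self) -> int:
        return len(self.content_bytes())

    def sha256(self) -> str:
        return hashlib.sha256(self.content_bytes()).hexdigest()

    def update_checksum(self) -> str:
        # Store the digest in the metadata, next to the data it describes.
        self.metadata["sha256"] = self.sha256()
        return self.metadata["sha256"]

    def verify(self) -> bool:
        return self.metadata.get("sha256") == self.sha256()


class RawBlob(IntegrityMixin):
    """Hypothetical byte container; does not inherit from Table."""

    def __init__(self, data: bytes):
        self.data = data
        self.metadata = {}

    def content_bytes(self) -> bytes:
        return self.data


class Table(IntegrityMixin):
    """Hypothetical table; does not inherit from RawBlob."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.metadata = {}

    def content_bytes(self) -> bytes:
        # Only the rows are serialized (CSV stand-in for Parquet);
        # metadata is excluded, so writing the checksum cannot alter the hash.
        lines = (",".join(map(str, row)) for row in self.rows)
        return "\n".join(lines).encode("utf-8")


table = Table([(1, 2), (3, 4)])
before = table.sha256()
table.update_checksum()
print(table.sha256() == before, table.verify())
```

If the metadata were part of the hashed bytes, calling `update_checksum()` would invalidate the value it had just stored; keeping the hook data-only avoids that loop.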
Notes
- 100 % line coverage maintained; `mkdocs build --strict` green. `sdata.imagemeta` is measured (100 %) via synthetic, Pillow-free tests, so coverage holds even without Pillow installed.
1.2.0 - 2026-06-26
- Machine-readable metadata backbone. Typed dtype registry, a registered JSON-LD `@context` (vocab/units/BFO), `to_jsonld`/`to_rdf`/`to_turtle` with `<sname>.meta.jsonld` sidecars, declarative `MetadataSchema`/`TableSchema` validation, an interactive Jupyter layer (`_repr_html_`, attribute autocomplete) and signed metadata as W3C Verifiable Credentials over the pure-Python EdDSA stack (`sdata.did`).
- Self-describing `DataFrame` container with per-column metadata and Parquet/CSV/dict/JSON-LD serialization, superseding the deprecated `Dataclass`.
- Docs & packaging. MkDocs Material + mkdocstrings documentation site; core dependencies reduced to `numpy`/`pandas`/`suuid` (stdlib `zoneinfo`); warning-free test suite.