Images: embedding sdata metadata

sdata.sclass.image.Image is a Blob over image content. sdata can write its metadata natively into the image file — and read it back — across six containers with one API: PNG, JPEG, JPEG 2000 (jp2), GIF, WebP and TIFF.

The embedding layer sdata.imagemeta is pure Python (standard library only): it needs no third-party tool (no exiftool) and — crucially — no Pillow to read or write the metadata. Pillow is only used to decode pixels (img.pil / img.to_numpy) or to transcode between formats on save.

Any other Pillow-writable format without a native metadata container (e.g. BMP) is handled through the same API: save writes a lossless <filepath>.meta.json sidecar and from_file reads it back — so metadata is never lost regardless of the container.

Format	Native carrier of the sdata payload	Marker
PNG	`iTXt` chunk before `IEND`	keyword `sdata`
JPEG	`APP1` segment right after SOI	`sdata\0` prefix
JP2	`uuid` box (ISO BMFF) before `jp2c`	fixed sdata UUID
GIF	comment extension after the header	`sdata\0` prefix
WebP	dedicated RIFF chunk `sdAT`	FourCC `sdAT`
TIFF	private IFD tag (original bytes untouched)	tag `65000`
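
To make the PNG row of the table concrete, here is a minimal standard-library sketch of writing and reading an uncompressed `iTXt` chunk with the keyword `sdata`, placed directly before `IEND`. It is an illustration under stated assumptions, not the code of `sdata.imagemeta`: the helper names `embed_png` and `extract_png` are hypothetical, and the real module may differ in chunk details and error handling.

```python
import struct
import zlib
from typing import Iterator, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"sdata"  # marker keyword from the table above


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, data) for every chunk after the 8-byte signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def is_payload(ctype: bytes, data: bytes) -> bool:
    return ctype == b"iTXt" and data.startswith(KEYWORD + b"\x00")


def embed_png(png: bytes, text: str) -> bytes:
    """Hypothetical helper: put `text` into an iTXt chunk right before IEND."""
    # iTXt layout: keyword NUL, compression flag, compression method,
    # language tag NUL, translated keyword NUL, UTF-8 text.
    body = KEYWORD + b"\x00" + b"\x00\x00" + b"\x00" + b"\x00" + text.encode("utf-8")
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png):
        if is_payload(ctype, data):
            continue  # replace semantics: drop any earlier payload
        if ctype == b"IEND":
            out.append(make_chunk(b"iTXt", body))
        out.append(make_chunk(ctype, data))
    return b"".join(out)


def extract_png(png: bytes) -> Optional[str]:
    """Hypothetical helper: return the embedded text, or None if absent."""
    for ctype, data in iter_chunks(png):
        if is_payload(ctype, data):
            rest = data[len(KEYWORD) + 3:]  # skip keyword, NUL, two flag bytes
            _language, _, rest = rest.partition(b"\x00")
            _translated, _, rest = rest.partition(b"\x00")
            return rest.decode("utf-8")
    return None


# A 1x1 grayscale PNG built by hand, so the example needs no Pillow.
tiny_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)
tagged_png = embed_png(tiny_png, '{"operator": "ada"}')
print(extract_png(tagged_png))
```

The pixel-bearing `IDAT` chunk is copied through byte for byte, which is the sense in which this kind of embedding does not re-encode the image.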

pip install pillow      # optional: only needed to decode/transcode pixels

Round-trip through `Image`

The same three calls work for every supported format — the container is chosen from the file suffix on save:

import io
import PIL.Image
from sdata.sclass.image import Image

# some image bytes (here a freshly encoded JPEG)
buf = io.BytesIO()
PIL.Image.new("RGB", (640, 480), (30, 60, 90)).save(buf, "JPEG")

img = Image.from_bytes("specimen.jpg", buf.getvalue())
img.metadata.add("operator", "ada", description="who acquired the image")
img.metadata.add("exposure", 1.5, unit="s", dtype="float")

img.save("specimen.jpg")                 # sdata metadata embedded in the APP1 segment

reloaded = Image.from_file("specimen.jpg")
reloaded.metadata.get("operator").value  # 'ada'
reloaded.metadata.get("exposure").value  # 1.5

save is lossless when the stored bytes already use the target container: the metadata is embedded without re-encoding the pixels (and without Pillow). Only a format change (e.g. a PNG saved as .webp) transcodes via Pillow:

# png_bytes: raw bytes of an existing PNG, e.g. pathlib.Path("a.png").read_bytes()
png = Image.from_bytes("a.png", png_bytes)
png.metadata.add("note", "converted")
png.save("a.webp")                       # transcodes to WebP, then embeds the metadata

Reading the embedded metadata never needs Pillow:

md = Image.from_file("specimen.jpg").embedded_metadata()  # a Metadata, or None
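
Why reading works without Pillow can be illustrated with a standard-library sketch for the JPEG case: the header is a sequence of length-prefixed marker segments, so finding an `APP1` segment with the `sdata\0` prefix is plain byte scanning. The names `embed_jpeg` and `extract_jpeg` are hypothetical, `ValueError` stands in for whatever exception the library raises, and the scanner assumes a well-formed header of length-prefixed segments up to the start of scan.

```python
import struct
from typing import Iterator, Optional, Tuple

SOI = b"\xff\xd8"
APP1 = 0xE1
SOS = 0xDA
PREFIX = b"sdata\x00"
# The 16-bit segment length counts its own two bytes, so a single segment
# carries at most 0xFFFF - 2 bytes, of which the prefix uses six.
MAX_TEXT_BYTES = 0xFFFF - 2 - len(PREFIX)


def iter_segments(jpeg: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (marker, start, end, payload) for header segments before SOS."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == SOS:
            return  # entropy-coded data follows; no metadata beyond here
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        yield marker, pos, end, jpeg[pos + 4:end]
        pos = end


def extract_jpeg(jpeg: bytes) -> Optional[str]:
    """Hypothetical helper: return the embedded text, or None if absent."""
    for marker, _start, _end, payload in iter_segments(jpeg):
        if marker == APP1 and payload.startswith(PREFIX):
            return payload[len(PREFIX):].decode("utf-8")
    return None


def embed_jpeg(jpeg: bytes, text: str) -> bytes:
    """Hypothetical helper: place the payload in an APP1 segment after SOI."""
    raw = text.encode("utf-8")
    if len(raw) > MAX_TEXT_BYTES:
        raise ValueError("payload does not fit into a single APP1 segment")
    kept = bytearray()
    pos = 2
    for marker, start, end, payload in iter_segments(jpeg):
        if marker == APP1 and payload.startswith(PREFIX):
            kept += jpeg[pos:start]  # keep everything before the old payload
            pos = end                # and skip the old payload itself
    kept += jpeg[pos:]
    length = struct.pack(">H", 2 + len(PREFIX) + len(raw))
    return SOI + b"\xff\xe1" + length + PREFIX + raw + bytes(kept)


# A header-only stub (SOI, JFIF APP0, SOS, EOI); enough for the scanner,
# not a decodable picture.
stub_jpeg = (
    SOI
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xda" + struct.pack(">H", 8) + bytes(6)
    + b"\x12\x34" + b"\xff\xd9"
)
tagged_jpeg = embed_jpeg(stub_jpeg, '{"exposure": 1.5}')
print(extract_jpeg(tagged_jpeg))
```

The constant `MAX_TEXT_BYTES` also shows where a limit of roughly 64 KiB for a single-segment JPEG payload comes from.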

Inherited `Blob` capabilities

Because Image is a Blob, every image is also a content-addressable asset (see RFC 0003):

img.size           # content size in bytes
img.sha256         # SHA-256 of the content
img.update_checksum()   # store the checksum in metadata
img.verify()       # check the content against the stored checksum
img.write("s3://bucket/specimen.jpg")    # fsspec target (needs sdata[blob])

Checksum vs. embedded metadata

Embedding metadata changes the file bytes (and therefore its hash). If you need a stable content hash, compute it before embedding, or hash the decoded pixels — analogous to the data-vs-metadata hash split for DataFrame (RFC 0004).
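
The effect can be reproduced with the standard library alone. The sketch below hashes a hand-built PNG before and after a text chunk is added, and contrasts the whole-file hash with a hash restricted to the critical chunks (those whose type starts with an uppercase letter, such as `IHDR` and `IDAT`). Hashing critical chunks is only a stand-in for hashing decoded pixels and is an assumption of this example, not an sdata feature: two files with identical pixels but different compression would still hash differently. The function name `critical_sha256` is hypothetical.

```python
import hashlib
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, data) for every chunk after the 8-byte signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length


def critical_sha256(png: bytes) -> str:
    """Hash only critical chunks, ignoring ancillary ones such as iTXt."""
    digest = hashlib.sha256()
    for ctype, data in iter_chunks(png):
        if not ctype[0] & 0x20:  # bit 5 clear: uppercase first letter, critical
            digest.update(ctype)
            digest.update(data)
    return digest.hexdigest()


# 1x1 grayscale PNG, then the same file with a text chunk before IEND.
plain = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)
text_chunk = make_chunk(b"iTXt", b"sdata\x00\x00\x00\x00\x00" + b'{"note": "x"}')
tagged = plain[:-12] + text_chunk + plain[-12:]  # the empty IEND chunk is 12 bytes

file_hash_changed = hashlib.sha256(plain).digest() != hashlib.sha256(tagged).digest()
content_hash_stable = critical_sha256(plain) == critical_sha256(tagged)
print(file_hash_changed, content_hash_stable)
```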

Low-level API (`sdata.imagemeta`)

To embed an arbitrary text payload directly into image bytes — independent of Image and of Pillow — use the façade:

from sdata import imagemeta

# data: raw image bytes, e.g. pathlib.Path("specimen.jpg").read_bytes()
imagemeta.detect_format(data)        # 'png' | 'jpeg' | 'jp2' | 'gif' | 'webp' | 'tiff' | None
out = imagemeta.embed(data, '{"k": 1}')   # format auto-detected; replace semantics
imagemeta.extract(out)               # '{"k": 1}'  (None if absent/unknown format)
imagemeta.supported_formats()        # ('png', 'jpeg', 'jp2', 'gif', 'webp', 'tiff')

Replace semantics: embedding again replaces the previous sdata payload rather than appending a second one (idempotent).
Lenient reads: extract returns None for an unknown format or an image without an sdata payload; embed raises UnsupportedImageFormatError for an unsupported format and PayloadTooLargeError when a JPEG payload exceeds the single-APP1 limit (~64 KiB).
Extensible registry: further containers (e.g. BMP, BigTIFF) plug in as two small functions plus one registry entry.
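
The three properties above can be sketched together in a small stand-alone model: detection by magic bytes, a registry that maps a format name to an embed/extract pair, and a façade that is strict on write and lenient on read. Everything here is hypothetical (`Codec`, `REGISTRY`, `UnsupportedFormat`, the `gif_*` helpers); it is not the layout of `sdata.imagemeta`. To stay short, only a GIF codec is registered, and it inspects only the slot directly after the header and global color table.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple

PREFIX = b"sdata\x00"


class UnsupportedFormat(ValueError):
    """Stand-in for the library's unsupported-format error."""


class Codec(NamedTuple):
    embed: Callable[[bytes, str], bytes]
    extract: Callable[[bytes], Optional[str]]


def detect_format(data: bytes) -> Optional[str]:
    """Identify the container from its leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):
        return "jp2"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return None


def gif_header_end(data: bytes) -> int:
    """Offset past header, logical screen descriptor and global color table."""
    packed = data[10]
    table = 3 * 2 ** ((packed & 0x07) + 1) if packed & 0x80 else 0
    return 13 + table


def gif_find(data: bytes) -> Optional[Tuple[bytes, int, int]]:
    """Return (text bytes, start, end) of an sdata comment at the header slot."""
    start = gif_header_end(data)
    if data[start:start + 2] != b"\x21\xfe":  # comment extension introducer
        return None
    pos, parts = start + 2, []
    while data[pos] != 0:  # sub-blocks: one size byte, then that many bytes
        parts.append(data[pos + 1:pos + 1 + data[pos]])
        pos += 1 + data[pos]
    body = b"".join(parts)
    return (body[len(PREFIX):], start, pos + 1) if body.startswith(PREFIX) else None


def gif_embed(data: bytes, text: str) -> bytes:
    found = gif_find(data)
    if found:  # replace semantics: cut out the previous payload first
        data = data[:found[1]] + data[found[2]:]
    body = PREFIX + text.encode("utf-8")
    pieces = [body[i:i + 255] for i in range(0, len(body), 255)]
    blocks = b"".join(bytes([len(p)]) + p for p in pieces) + b"\x00"
    cut = gif_header_end(data)
    return data[:cut] + b"\x21\xfe" + blocks + data[cut:]


def gif_extract(data: bytes) -> Optional[str]:
    found = gif_find(data)
    return found[0].decode("utf-8") if found else None


# One registry entry plus two small functions per container.
REGISTRY: Dict[str, Codec] = {"gif": Codec(gif_embed, gif_extract)}


def embed(data: bytes, text: str) -> bytes:
    codec = REGISTRY.get(detect_format(data) or "")
    if codec is None:
        raise UnsupportedFormat("no codec registered for this container")
    return codec.embed(data, text)


def extract(data: bytes) -> Optional[str]:
    codec = REGISTRY.get(detect_format(data) or "")
    return codec.extract(data) if codec else None


# Smallest structurally complete GIF: header, 1x1 screen descriptor, trailer.
tiny_gif = b"GIF89a\x01\x00\x01\x00\x00\x00\x00;"
print(extract(embed(tiny_gif, '{"k": 1}')))
```

Adding another container in this model means writing one embed function, one extract function, and one more `REGISTRY` entry; the façade itself does not change.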

Sidecars

For a container without a native metadata slot, save automatically writes a lossless <filepath>.meta.json sidecar (same payload as the embedded form), and from_file merges it back — the API is identical to the embedded case:

# bmp_bytes: raw bytes of an existing BMP, e.g. pathlib.Path("scan.bmp").read_bytes()
img = Image.from_bytes("scan.bmp", bmp_bytes)
img.metadata.add("station", "lab-3")
img.save("scan.bmp")                       # writes scan.bmp + scan.bmp.meta.json
Image.from_file("scan.bmp").metadata.get("station").value   # 'lab-3'

The sidecar policy is controllable: save(..., sidecar=True) always writes one (in addition to embedding, e.g. for tooling that only reads sidecars), sidecar=False never does, and the default (None) writes one only when the format has no native container.
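
The tri-state policy reads naturally as a small decision function. The sketch below models it with the standard library only; `wants_sidecar`, `sidecar_path`, `write_sidecar` and the suffix set are hypothetical names and assumptions made for illustration, and the plain dictionary stands in for the real sdata payload.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Union

# Assumption: suffixes of the containers that have a native metadata slot.
NATIVE_SUFFIXES = {".png", ".jpg", ".jpeg", ".jp2", ".gif", ".webp", ".tif", ".tiff"}


def wants_sidecar(filepath: Union[str, Path], sidecar: Optional[bool] = None) -> bool:
    """True forces a sidecar, False forbids it, None decides by container."""
    if sidecar is not None:
        return sidecar
    return Path(filepath).suffix.lower() not in NATIVE_SUFFIXES


def sidecar_path(filepath: Union[str, Path]) -> Path:
    """Append `.meta.json` to the full file name, keeping the image suffix."""
    return Path(str(filepath) + ".meta.json")


def write_sidecar(filepath: Union[str, Path], metadata: dict) -> Path:
    """Write the metadata as UTF-8 JSON next to the image and return the path."""
    target = sidecar_path(filepath)
    target.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "scan.bmp"
    if wants_sidecar(image_path):  # BMP has no native slot, so this is True
        written = write_sidecar(image_path, {"station": "lab-3"})
        restored = json.loads(written.read_text(encoding="utf-8"))
        print(written.name, restored)
```

Keeping the image suffix inside the sidecar name (`scan.bmp.meta.json` rather than `scan.meta.json`) avoids collisions when the same stem exists in several formats.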

When metadata must stay external (read-only originals) or machine-readable as Linked Data, the JSON-LD sidecar remains the complement — see Machine-readable metadata. Embedding and sidecars share the same metadata model and are not mutually exclusive.

The design and the per-format details are specified in RFC 0005 — Native image metadata.
