tomwaits/dictshield

 
 

DictShield

DictShield is a database-agnostic modeling system. It provides a way to model, validate, and reshape data — without requiring any particular database.

A blog model might look like this:

from dictshield.document import Document
from dictshield.fields import StringField

class BlogPost(Document):
    title = StringField(max_length=40)
    body = StringField(max_length=4096)

DictShield documents serialize to JSON via to_json() and to plain Python dicts via to_python(). Store them in Memcached, MongoDB, Riak, or whatever else you need.

Say we have some data coming in from a client:

import json

data = json.loads(request.post["data"])
BlogPost(**data).validate()

Requirements

Python 3.7 or newer. The optional bson package (from pymongo) is used by ObjectIdField when available; without it, the field falls back to validating values as 24-character hex strings.
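The fallback described above can be pictured with a small sketch. This is not DictShield's ObjectIdField code: looks_like_object_id is a hypothetical helper name, and the only external call assumed is ObjectId.is_valid from pymongo's bson package.

```python
import re

try:
    # Provided by pymongo's bson package when it is installed.
    from bson import ObjectId
except ImportError:
    ObjectId = None

# Fallback rule from the text: exactly 24 hexadecimal characters.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Hypothetical helper: prefer bson, else check for a 24-character hex string."""
    if ObjectId is not None:
        return ObjectId.is_valid(value)
    return isinstance(value, str) and _HEX24.fullmatch(value) is not None
```

Either branch accepts a string such as "4f2d8c3a9b1e5d0c7a6b4e21" and rejects one such as "not-an-object-id".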

Installing

pip install dictshield

The Design

DictShield provides helpers for a few common modeling needs:

  1. Creating flexible documents
  2. Easy use with databases or caches
  3. A type system
  4. Validation of types
  5. Input / output shaping

Object hierarchies can also be mapped into dictionaries. This is useful if you want class instances to represent your data, rather than only filtering dictionaries through the class's static methods.

Example Uses

There are a few ways to use DictShield. A simple case is to create a class with typed fields. DictShield ships many field types in fields.py, including EmailField, URLField, DecimalField, and MD5Field.

Creating Flexible Documents

Below is a Media class with a single title field:

from dictshield.document import Document
from dictshield.fields import StringField

class Media(Document):
    """Simple document with one StringField member."""
    title = StringField(max_length=40)

You instantiate the class just like any Python class and can see its dictionary representation via to_python():

m = Media(title="Misc Media")
print(m.to_python())

Output:

{
    "_types": ["Media"],
    "_cls": "Media",
    "title": "Misc Media",
}

_types and _cls are metadata added by the metaclass so DictShield can round-trip subclass information.

More On Object Modeling

_types stores the hierarchy of Document classes used to create the document; _cls stores the concrete class. This becomes clearer when we subclass Media to create Movie:

import datetime
from dictshield.fields import IntField, StringField

class Movie(Media):
    """Subclass of Media with public-field whitelist."""
    _public_fields = ["title", "year"]
    year = IntField(min_value=1950, max_value=datetime.datetime.now().year)
    personal_thoughts = StringField(max_length=255)

An instance:

mv = Movie(
    title="Total Recall",
    year=1990,
    personal_thoughts="I wish I had three hands...",
)

Its Python-dict representation:

{
    "personal_thoughts": "I wish I had three hands...",
    "_types": ["Media", "Media.Movie"],
    "title": "Total Recall",
    "_cls": "Media.Movie",
    "year": 1990,
}

Notice that _types tracks the Media -> Movie relationship.
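One way to picture how values like these could be derived is to walk a class's method resolution order. The sketch below is an illustration only, not DictShield's metaclass: the Document base here is a bare stand-in, and class_metadata is a hypothetical helper.

```python
class Document:
    """Bare stand-in base class; DictShield's real Document does much more."""


def class_metadata(cls):
    """Return (_types, _cls) built from the Document subclasses in cls's MRO."""
    # Walk from the most generic class to the most specific one,
    # skipping object and the Document base itself.
    lineage = [
        klass.__name__
        for klass in reversed(cls.__mro__)
        if issubclass(klass, Document) and klass is not Document
    ]
    # Each entry is the dotted path from the root class down to that level.
    types = [".".join(lineage[: i + 1]) for i in range(len(lineage))]
    return types, types[-1]


class Media(Document):
    pass


class Movie(Media):
    pass


print(class_metadata(Movie))  # (['Media', 'Media.Movie'], 'Media.Movie')
```

The dotted names record the full path to the concrete class, so a stored dict carries enough information to tell a Movie from a plain Media.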

Three serialization variants

  • to_python() — native Python dict; embedded documents remain as document instances.
  • to_python_dict() — like to_python(), but embedded documents are recursively expanded into plain dicts.
  • to_json() — returns a JSON-encoded string.
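The difference between the three variants can be sketched without DictShield. SketchDoc below is a hypothetical stand-in class that borrows the method names from the list above; its behavior follows the descriptions there, not the library's source.

```python
import json


class SketchDoc:
    """Hypothetical stand-in that stores its fields in a plain dict."""

    def __init__(self, **fields):
        self.fields = fields

    def to_python(self):
        # Shallow copy: nested SketchDoc objects stay as objects.
        return dict(self.fields)

    def to_python_dict(self):
        # Deep expansion: nested SketchDoc objects become plain dicts.
        return {
            key: value.to_python_dict() if isinstance(value, SketchDoc) else value
            for key, value in self.fields.items()
        }

    def to_json(self):
        # JSON needs plain data, so serialize the fully expanded form.
        return json.dumps(self.to_python_dict())


person = SketchDoc(name="Alice", address=SketchDoc(city="Paris"))
shallow = person.to_python()       # "address" is still a SketchDoc
deep = person.to_python_dict()     # "address" is now {"city": "Paris"}
```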

Upgrading Documents

Upgrading a document's schema is straightforward: add new optional fields or remove old ones. Unknown keys passed to a document's constructor are silently discarded, so stored data that still contains fields you have removed won't break instantiation.
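As a rough illustration of why this keeps old records loadable, here is a minimal sketch of a constructor that filters its keyword arguments. UpgradedMedia and its _fields tuple are hypothetical; DictShield derives its field list from the class definition rather than from a tuple like this.

```python
class UpgradedMedia:
    """Hypothetical stand-in for a schema that dropped "rating" and added "year"."""

    _fields = ("title", "year")

    def __init__(self, **values):
        # Keep only declared field names; anything else is ignored.
        self._data = {k: v for k, v in values.items() if k in self._fields}


# A record written under the old schema: it has "rating" and lacks "year".
old_record = {"title": "Total Recall", "rating": 5}
doc = UpgradedMedia(**old_record)
```

The removed rating key is dropped, and the new optional year field is simply absent until something sets it.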

Easy To Use With Databases Or Caches

Pass the Python dict directly to MongoDB:

db.test_collection.insert_one(m.to_python())

Or Riak:

media = bucket.new("test_key", data=m.to_python())
media.store()

Or cache the JSON string in Memcached:

mc["test_key"] = m.to_json()

A Type System

Field validators raise DictPunch when a value is invalid. Here is a slimmed-down version of MD5Field (the real implementation lives in _HexHashField, a base class shared with SHA1Field):

class MD5Field(_HexHashField):
    hash_length = 32
    hash_name = "MD5"

_HexHashField.validate() raises DictPunch if the value is the wrong length or isn't valid hex. The exception exposes field_name and field_value, and its str() representation is:

MD5 value is wrong length - secret(whatevz)

If the overhead of validation isn't needed in a hot path, skip it by simply not calling validate().
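The length and hex checks can be sketched as a standalone function. SketchPunch is a hypothetical stand-in for DictPunch, and the "is not hex" wording is an assumption; only the wrong-length message format comes from the example above.

```python
import string


class SketchPunch(Exception):
    """Hypothetical stand-in for DictPunch."""

    def __init__(self, reason, field_name, field_value):
        super().__init__(reason)
        self.reason = reason
        self.field_name = field_name
        self.field_value = field_value

    def __str__(self):
        return "%s - %s(%s)" % (self.reason, self.field_name, self.field_value)


def check_hex_hash(value, field_name, hash_name="MD5", hash_length=32):
    """Raise SketchPunch unless value is hash_length hexadecimal characters."""
    if len(value) != hash_length:
        raise SketchPunch("%s value is wrong length" % hash_name, field_name, value)
    if any(ch not in string.hexdigits for ch in value):
        # Message wording assumed for illustration.
        raise SketchPunch("%s value is not hex" % hash_name, field_name, value)
    return value


try:
    check_hex_hash("whatevz", "secret")
except SketchPunch as punch:
    message = str(punch)  # "MD5 value is wrong length - secret(whatevz)"
```

A SHA-1 variant would only need different hash_name and hash_length arguments, which is presumably why the two fields share a base class.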

Validation Of Types

A Document instance validates via validate():

from dictshield.document import Document
from dictshield.fields import MD5Field, StringField, URLField

class User(Document):
    _public_fields = ["name"]
    secret = MD5Field()
    name = StringField(required=True, max_length=50)
    bio = StringField(max_length=100)
    url = URLField()

Seed the instance and validate:

from dictshield.document import DictPunch

user = User(secret="whatevz", name="test hash")
try:
    user.validate()
except DictPunch as dp:
    print("DictPunch caught: %s" % dp)

validate() iterates the document's fields and calls each field's validate().

Validating User Input

Given this JSON:

{"bio": "Python, Erlang and guitars!", "secret": "e8b5d682452313a6142c10b045a9a135", "name": "J2D2"}

With that JSON held in json_string, construct and validate:

import json

user_input = json.loads(json_string)
User(**user_input).validate()

Unknown keys in user_input are dropped at construction time. If validation fails, DictPunch is raised with the offending field name and value.

Input / Output Shaping

Input comes from the outside world, so its shape is uncertain. Output recipients vary too — typically the data owner versus the general public.

Removing Unknown Fields

Unknown keys are dropped automatically:

total_input = {
    "rogue_field": "MWAHAHA",
    "bio": "Python, Erlang and guitars!",
    "secret": "e8b5d682452313a6142c10b045a9a135",
    "name": "J2D2",
}

user_doc = User(**total_input).to_python()

user_doc now looks like:

{
    "_types": ["User"],
    "bio": "Python, Erlang and guitars!",
    "secret": "e8b5d682452313a6142c10b045a9a135",
    "name": "J2D2",
    "_cls": "User",
}

JSON for the Owner of the Document

Document.make_json_ownersafe(instance) returns JSON with the document's internal fields (_cls, _types, _id) removed. To blacklist additional fields, define a class-level list named _private_fields. Applied to the Movie instance from earlier:

Movie.make_json_ownersafe(mv)

Result:

{"personal_thoughts": "I wish I had three hands...", "title": "Total Recall", "year": 1990}

JSON for Public View

Document.make_json_publicsafe(instance) returns JSON containing only the fields listed in _public_fields:

Movie.make_json_publicsafe(mv)

Result:

{"title": "Total Recall", "year": 1990}
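The two shaping modes are a blacklist and a whitelist over the same dictionary. The sketch below reproduces both results with plain functions; owner_view and public_view are hypothetical names, not DictShield methods.

```python
import json

# Internal keys named in the text above.
INTERNAL_FIELDS = {"_cls", "_types", "_id"}


def owner_view(doc, private_fields=()):
    """Blacklist: drop internal keys plus any extra private ones."""
    blocked = INTERNAL_FIELDS | set(private_fields)
    return {key: value for key, value in doc.items() if key not in blocked}


def public_view(doc, public_fields):
    """Whitelist: keep only explicitly listed keys."""
    return {key: value for key, value in doc.items() if key in public_fields}


movie = {
    "personal_thoughts": "I wish I had three hands...",
    "_types": ["Media", "Media.Movie"],
    "title": "Total Recall",
    "_cls": "Media.Movie",
    "year": 1990,
}

owner_json = json.dumps(owner_view(movie), sort_keys=True)
public_json = json.dumps(public_view(movie, ["title", "year"]), sort_keys=True)
print(public_json)  # {"title": "Total Recall", "year": 1990}
```

A whitelist is the safer default for public output: a field added later stays hidden until someone lists it explicitly.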

Working Without Instances

For partial updates, validate individual fields without instantiating a whole document.

Class-Level Validation

validate_class_fields checks that the input dictionary matches the document's shape, including required fields:

user_input = {"url": "http://j2labs.tumblr.com"}

try:
    User.validate_class_fields(user_input)
except DictPunch as dp:
    print("Validation failure: %s" % dp)

This raises DictPunch because name is required but absent from user_input.

validate_class_partial validates only the fields present in the input — useful when updating one or two fields at a time:

User.validate_class_partial(user_input)
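The difference between the two modes comes down to how a missing required field is treated. Here is a minimal sketch under that assumption; SCHEMA, max_len, and validate_fields are hypothetical, and ValueError stands in for DictPunch.

```python
def max_len(limit):
    """Build a validator that rejects strings longer than limit."""
    def check(value):
        if len(value) > limit:
            raise ValueError("longer than %d characters" % limit)
    return check


# Hypothetical schema: field name -> (required?, validator).
SCHEMA = {
    "name": (True, max_len(50)),
    "bio": (False, max_len(100)),
}


def validate_fields(data, partial=False):
    """Validate supplied fields; enforce required ones unless partial is set."""
    for name, (required, check) in SCHEMA.items():
        if name not in data:
            if required and not partial:
                raise ValueError("required field missing: %s" % name)
            continue
        check(data[name])


update = {"bio": "Python, Erlang and guitars!"}
validate_fields(update, partial=True)  # passes: only "bio" is checked
```

Calling validate_fields(update) without partial=True raises, because name is required and missing. Fields that are present are checked in both modes.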

Aggregating Errors

Pass validate_all=True to collect every exception instead of raising on the first failure. The return value is a list; an empty list means all fields validated:

exceptions = User.validate_class_fields(total_input, validate_all=True)
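The collect-instead-of-raise pattern is easy to sketch in isolation. The validators and the CHECKS mapping below are hypothetical, and ValueError again stands in for DictPunch; only the validate_all flag and the list return value mirror the description above.

```python
def require_text(value):
    if not isinstance(value, str) or not value:
        raise ValueError("expected a non-empty string")


def require_year(value):
    if not isinstance(value, int) or not 1950 <= value <= 2100:
        raise ValueError("expected a year between 1950 and 2100")


# Hypothetical mapping of field name -> validator.
CHECKS = {"title": require_text, "year": require_year}


def validate(data, validate_all=False):
    """Raise on the first failure, or collect every failure when validate_all is set."""
    errors = []
    for name, check in CHECKS.items():
        try:
            check(data.get(name))
        except ValueError as exc:
            if not validate_all:
                raise
            errors.append(exc)
    return errors


problems = validate({"title": "", "year": "soon"}, validate_all=True)
for problem in problems:
    print(problem)
```

Collecting errors suits form handling, where reporting every problem in one response is friendlier than one round trip per mistake.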

Embedded Documents

For nested data, use EmbeddedDocument and EmbeddedDocumentField:

from dictshield.document import Document, EmbeddedDocument
from dictshield.fields import EmbeddedDocumentField, StringField

class Address(EmbeddedDocument):
    city = StringField()
    zip_code = StringField()

class Person(Document):
    name = StringField(required=True)
    address = EmbeddedDocumentField(Address)

Embedded fields accept either an EmbeddedDocument instance or a plain dict; dicts are coerced to instances on access and validation:

p = Person(name="Alice", address={"city": "Paris", "zip_code": "75000"})
p.validate()
p.to_json()
# '{"name": "Alice", "address": {"city": "Paris", "zip_code": "75000", ...}, ...}'
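The dict-to-instance coercion can be pictured with a small helper. This is a sketch of the idea only: coerce_embedded is a hypothetical function, and the Address class here is a plain stand-in rather than an EmbeddedDocument.

```python
class Address:
    """Plain stand-in for an embedded document with two optional fields."""

    def __init__(self, city=None, zip_code=None):
        self.city = city
        self.zip_code = zip_code


def coerce_embedded(value, document_cls):
    """Return value as a document_cls instance, building one from a dict if needed."""
    if value is None or isinstance(value, document_cls):
        return value
    if isinstance(value, dict):
        return document_cls(**value)
    raise TypeError(
        "expected %s or dict, got %s" % (document_cls.__name__, type(value).__name__)
    )


home = coerce_embedded({"city": "Paris", "zip_code": "75000"}, Address)
```

Existing instances pass through unchanged, so the same code path can handle data built in Python and data parsed from JSON.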

Running the Tests

The test suite uses the standard library unittest:

python -m unittest discover -s tests

Contributors

James Dennis, Andrew Gwozdziewycz, Dion Paragas.

License

BSD.

About

A typed dictionary for Python... sorta.
