DictShield is a database-agnostic modeling system. It provides a way to model, validate, and reshape data — without requiring any particular database.
A blog model might look like this:
    from dictshield.document import Document
    from dictshield.fields import StringField

    class BlogPost(Document):
        title = StringField(max_length=40)
        body = StringField(max_length=4096)

DictShield documents serialize to JSON via to_json() and to plain Python
dicts via to_python(). Store them in Memcached, MongoDB, Riak, or whatever
else you need.
Say we have some data coming in from a client:
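    import json

    # request and Model are placeholders for your framework and Document subclass.
    data = json.loads(request.post["data"])
    Model(**data).validate()

Requires Python 3.7 or newer. The optional bson package (from pymongo) is used by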
ObjectIdField when available; without it, the field falls back to validating
values as 24-character hex strings.
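To make the fallback concrete, here is a minimal sketch of that kind of check. It is not DictShield's implementation: the helper name looks_like_object_id is hypothetical, and relying on ObjectId.is_valid() when bson is importable is an assumption about how such a check could be written.

```python
import re

try:
    # Optional dependency: pymongo ships the bson package.
    from bson.objectid import ObjectId
except ImportError:
    ObjectId = None

# Fallback pattern: exactly 24 hexadecimal characters.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Hypothetical helper: True if value could be an ObjectId."""
    if ObjectId is not None:
        # Assumption: defer to bson's own validity check when available.
        return ObjectId.is_valid(value)
    # Without bson, accept only 24-character hex strings.
    return isinstance(value, str) and _HEX24.fullmatch(value) is not None
```

Either branch accepts a well-formed 24-character hex string and rejects arbitrary text, so callers do not need to know whether bson is installed.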
    pip install dictshield
DictShield provides helpers for a few common modeling needs:
- Creating flexible documents
- Easy use with databases or caches
- A type system
- Validation of types
- Input / output shaping
Object hierarchies can also be mapped into dictionaries — useful for those who want class instances representing their data instead of just filtering dictionaries through the class's static methods.
There are a few ways to use DictShield. A simple case is to create a class
with typed fields. DictShield ships many field types in fields.py,
including EmailField, URLField, DecimalField, and MD5Field.
Below is a Media class with a single title field:
    from dictshield.document import Document
    from dictshield.fields import StringField

    class Media(Document):
        """Simple document with one StringField member."""
        title = StringField(max_length=40)

You instantiate the class just like any Python class and can see its
dictionary representation via to_python():
    m = Media(title="Misc Media")
    print(m.to_python())

Output:

    {
        "_types": ["Media"],
        "_cls": "Media",
        "title": "Misc Media",
    }

_types and _cls are metadata added by the metaclass so DictShield can
round-trip subclass information.
_types stores the hierarchy of Document classes used to create the
document; _cls stores the concrete class. This becomes clearer when we
subclass Media to create Movie:
    import datetime
    from dictshield.fields import IntField

    class Movie(Media):
        """Subclass of Media with public-field whitelist."""
        _public_fields = ["title", "year"]
        year = IntField(min_value=1950, max_value=datetime.datetime.now().year)
        personal_thoughts = StringField(max_length=255)

An instance:
    mv = Movie(
        title="Total Recall",
        year=1990,
        personal_thoughts="I wish I had three hands...",
    )

Its Python-dict representation:
    {
        "personal_thoughts": "I wish I had three hands...",
        "_types": ["Media", "Media.Movie"],
        "title": "Total Recall",
        "_cls": "Media.Movie",
        "year": 1990,
    }

Notice that _types tracks the Media → Movie relationship.
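How a metaclass might record this hierarchy can be sketched in plain Python. The names SketchMeta, SketchDocument, _cls_name, and _type_chain are hypothetical, and this is not DictShield's metaclass; it only reproduces the _cls and _types values shown above.

```python
class SketchMeta(type):
    """Hypothetical metaclass that records a dotted class path."""

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        parents = [base for base in bases if isinstance(base, SketchMeta)]
        if not parents:
            # The root base class itself carries no metadata.
            cls._cls_name = None
            cls._type_chain = []
            return cls
        parent = parents[0]
        # Extend the parent's dotted path with this class's name.
        cls._cls_name = f"{parent._cls_name}.{name}" if parent._cls_name else name
        cls._type_chain = parent._type_chain + [cls._cls_name]
        return cls


class SketchDocument(metaclass=SketchMeta):
    """Root class; stands in for Document in this sketch."""


class Media(SketchDocument):
    pass


class Movie(Media):
    pass


media_meta = {"_cls": Media._cls_name, "_types": Media._type_chain}
movie_meta = {"_cls": Movie._cls_name, "_types": Movie._type_chain}
```

Because each subclass extends its parent's chain, a stored _types list is enough to tell which ancestor classes a document also qualifies as.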
- to_python(): native Python dict; embedded documents remain as document instances.
- to_python_dict(): like to_python(), but embedded documents are recursively expanded into plain dicts.
- to_json(): returns a JSON-encoded string.
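The difference between the shallow and recursive forms is easiest to see on a nested document. The sketch below uses a hypothetical SketchDoc class in place of DictShield's documents, and it assumes to_json() builds on the recursive form.

```python
import json


class SketchDoc:
    """Hypothetical stand-in for a document; not DictShield's class."""

    def __init__(self, **values):
        self._data = dict(values)

    def to_python(self):
        # Shallow copy: embedded documents stay as instances.
        return dict(self._data)

    def to_python_dict(self):
        # Recursive: embedded documents are expanded into plain dicts.
        return {
            key: value.to_python_dict() if isinstance(value, SketchDoc) else value
            for key, value in self._data.items()
        }

    def to_json(self):
        # Assumption: JSON output is derived from the recursive form.
        return json.dumps(self.to_python_dict(), sort_keys=True)


address = SketchDoc(city="Paris")
person = SketchDoc(name="Alice", address=address)

shallow = person.to_python()    # "address" is still a SketchDoc
deep = person.to_python_dict()  # "address" is a plain dict
```

The shallow form suits code that keeps working with document objects; the recursive form is the one to hand to a serializer or a database driver.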
Schema changes are easy: you can add optional fields or remove existing ones. Unknown keys passed to a document's constructor are silently discarded, so incoming data with fields you no longer care about won't break instantiation.
Pass the Python dict directly to MongoDB:
    # Collection.save() was removed in PyMongo 4; use insert_one() there.
    db.test_collection.save(m.to_python())

Or Riak:

    media = bucket.new("test_key", data=m.to_python())
    media.store()

Or cache the JSON string in Memcached:

    mc["test_key"] = m.to_json()

Field validators raise DictPunch when a value is invalid. Here is a
slimmed-down version of MD5Field (the real implementation lives in
_HexHashField, a base class shared with SHA1Field):
    class MD5Field(_HexHashField):
        hash_length = 32
        hash_name = "MD5"

_HexHashField.validate() raises DictPunch if the value is the wrong
length or isn't valid hex. The exception exposes field_name and
field_value, and its str() representation is:
    MD5 value is wrong length - secret(whatevz)
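A rough, standalone picture of that behavior follows. SketchPunch and validate_hex_hash are hypothetical names, and the "is not hex" message is an assumption; only the "wrong length" message format comes from the output above.

```python
import string


class SketchPunch(Exception):
    """Hypothetical stand-in for DictPunch."""

    def __init__(self, reason, field_name, field_value):
        super().__init__(reason)
        self.reason = reason
        self.field_name = field_name
        self.field_value = field_value

    def __str__(self):
        # Format: "<reason> - <field_name>(<field_value>)"
        return f"{self.reason} - {self.field_name}({self.field_value})"


def validate_hex_hash(value, field_name, hash_name="MD5", hash_length=32):
    """Check the length first, then that every character is a hex digit."""
    if len(value) != hash_length:
        raise SketchPunch(f"{hash_name} value is wrong length", field_name, value)
    if any(ch not in string.hexdigits for ch in value):
        # Assumed wording; the real message may differ.
        raise SketchPunch(f"{hash_name} value is not hex", field_name, value)
    return value


try:
    validate_hex_hash("whatevz", field_name="secret")
except SketchPunch as punch:
    message = str(punch)
```

Keeping the field name and value on the exception lets callers build their own error responses instead of parsing the message.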
If the overhead of validation isn't needed in a hot path, skip it by simply
not calling validate().
A Document instance validates via validate():
    from dictshield.document import Document
    from dictshield.fields import MD5Field, StringField, URLField

    class User(Document):
        _public_fields = ["name"]
        secret = MD5Field()
        name = StringField(required=True, max_length=50)
        bio = StringField(max_length=100)
        url = URLField()

Seed the instance and validate:
    from dictshield.document import DictPunch

    user = User(secret="whatevz", name="test hash")
    try:
        user.validate()
    except DictPunch as dp:
        print("DictPunch caught: %s" % dp)

validate() iterates the document's fields and calls each field's
validate().
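The delegation from document to fields can be sketched as a simple loop. All names here (SketchPunch, SketchStringField, SketchUser, _fields) and the error messages are hypothetical; DictShield's internals may be organized differently.

```python
class SketchPunch(Exception):
    """Hypothetical stand-in for DictPunch."""


class SketchStringField:
    """Hypothetical field supporting required and max_length checks."""

    def __init__(self, max_length=None, required=False):
        self.max_length = max_length
        self.required = required

    def validate(self, name, value):
        if value is None:
            if self.required:
                raise SketchPunch(f"Required field missing - {name}")
            return
        if self.max_length is not None and len(value) > self.max_length:
            raise SketchPunch(f"String value is too long - {name}({value})")


class SketchUser:
    # Field declarations, keyed by field name.
    _fields = {
        "name": SketchStringField(required=True, max_length=50),
        "bio": SketchStringField(max_length=100),
    }

    def __init__(self, **values):
        self._data = {name: values.get(name) for name in self._fields}

    def validate(self):
        # Walk every declared field and delegate to its validate().
        for name, field in self._fields.items():
            field.validate(name, self._data[name])


# A complete document validates without raising.
SketchUser(name="J2D2", bio="Python, Erlang and guitars!").validate()

try:
    SketchUser(bio="no name given").validate()
except SketchPunch as punch:
    failure = str(punch)
```

In this sketch the first failing field stops validation, which matches the raise-on-first-error behavior of the try/except example above.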
Given this JSON:
    {"bio": "Python, Erlang and guitars!", "secret": "e8b5d682452313a6142c10b045a9a135", "name": "J2D2"}

Construct and validate:

    import json

    user_input = json.loads(json_string)
    User(**user_input).validate()

Unknown keys in user_input are dropped at construction time. If validation
fails, DictPunch is raised with the offending field name and value.
Input comes from the outside world, so its shape is uncertain. Output recipients vary too — typically the data owner versus the general public.
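The three shaping steps this section covers (dropping unknown input keys, producing owner-facing output, producing public output) can be pictured with plain dictionaries. The function names and the INTERNAL_FIELDS constant are hypothetical; this is a sketch of the idea, not DictShield's code.

```python
# Metadata keys assumed to be hidden from every audience.
INTERNAL_FIELDS = ("_cls", "_types", "_id")


def drop_unknown(known_fields, raw):
    """Input shaping: keep only keys the document declares."""
    return {key: value for key, value in raw.items() if key in known_fields}


def owner_safe(doc, private_fields=()):
    """Blacklist: remove internal fields plus any extra private ones."""
    hidden = set(INTERNAL_FIELDS) | set(private_fields)
    return {key: value for key, value in doc.items() if key not in hidden}


def public_safe(doc, public_fields):
    """Whitelist: keep only the fields declared public."""
    return {key: value for key, value in doc.items() if key in public_fields}


movie = {
    "_types": ["Media", "Media.Movie"],
    "_cls": "Media.Movie",
    "title": "Total Recall",
    "year": 1990,
    "personal_thoughts": "I wish I had three hands...",
}

for_owner = owner_safe(movie)
for_public = public_safe(movie, ["title", "year"])
clean_input = drop_unknown({"title", "year"}, {"title": "Alien", "rogue_field": "MWAHAHA"})
```

A blacklist suits the owner because new fields show up by default, while a whitelist suits the public because new fields stay hidden until they are explicitly exposed.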
Unknown keys are dropped automatically:
    total_input = {
        "rogue_field": "MWAHAHA",
        "bio": "Python, Erlang and guitars!",
        "secret": "e8b5d682452313a6142c10b045a9a135",
        "name": "J2D2",
    }
    user_doc = User(**total_input).to_python()

user_doc now looks like:
    {
        "_types": ["User"],
        "bio": "Python, Erlang and guitars!",
        "secret": "e8b5d682452313a6142c10b045a9a135",
        "name": "J2D2",
        "_cls": "User",
    }

Document.make_json_ownersafe(instance) returns JSON with the document's
internal fields (_cls, _types, _id) removed. You can extend the
blacklist by defining a class-level list named _private_fields. For the
Movie instance mv created earlier:

    Movie.make_json_ownersafe(mv)

Result:

    {"personal_thoughts": "I wish I had three hands...", "title": "Total Recall", "year": 1990}

Document.make_json_publicsafe(instance) returns JSON containing only the
fields listed in _public_fields:
    Movie.make_json_publicsafe(mv)

Result:

    {"title": "Total Recall", "year": 1990}

For partial updates, validate individual fields without instantiating a whole document.
validate_class_fields checks that the input dictionary matches the
document's shape, including required fields:
    user_input = {"url": "http://j2labs.tumblr.com"}
    try:
        User.validate_class_fields(user_input)
    except DictPunch as dp:
        print("Validation failure: %s" % dp)

This raises because name is required but absent.
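The difference between checking a whole dictionary and checking only the keys it contains can be sketched without DictShield. Everything here is hypothetical: check_fields, its partial and validate_all flags (the latter modeled on the validate_all option described just below), the error messages, and ValueError standing in for DictPunch.

```python
def check_fields(schema, required, data, partial=False, validate_all=False):
    """Hypothetical helper; schema maps field names to checker callables.

    Each checker returns an error message, or None when the value is fine.
    """
    errors = []
    for name, checker in schema.items():
        if name not in data:
            # Partial mode ignores absent fields, even required ones.
            missing = name in required and not partial
            problem = "Required field missing" if missing else None
        else:
            problem = checker(data[name])
        if problem is None:
            continue
        error = ValueError(f"{problem} - {name}({data.get(name)})")
        if not validate_all:
            raise error  # stop at the first failure
        errors.append(error)  # otherwise collect and keep going
    return errors


schema = {
    "name": lambda v: None if len(v) <= 50 else "String value is too long",
    "url": lambda v: None if v.startswith("http") else "Not a URL",
}
user_input = {"url": "http://j2labs.tumblr.com"}

# Partial check: only "url" is examined, so nothing fails.
partial_errors = check_fields(schema, {"name"}, user_input, partial=True)

# Collecting mode: both the missing name and the bad URL are reported.
all_errors = check_fields(schema, {"name"}, {"url": "nope"}, validate_all=True)
```

Calling check_fields(schema, {"name"}, user_input) without partial=True raises, mirroring the required-field failure shown above.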
validate_class_partial validates only the fields present in the input —
useful when updating one or two fields at a time:
    User.validate_class_partial(user_input)

Pass validate_all=True to collect every exception instead of raising on
the first failure. The return value is a list; an empty list means all
fields validated:
    exceptions = User.validate_class_fields(total_input, validate_all=True)

For nested data, use EmbeddedDocument and EmbeddedDocumentField:
    from dictshield.document import Document, EmbeddedDocument
    from dictshield.fields import EmbeddedDocumentField, StringField

    class Address(EmbeddedDocument):
        city = StringField()
        zip_code = StringField()

    class Person(Document):
        name = StringField(required=True)
        address = EmbeddedDocumentField(Address)

Embedded fields accept either an EmbeddedDocument instance or a plain
dict; dicts are coerced to instances on access and validation:
    p = Person(name="Alice", address={"city": "Paris", "zip_code": "75000"})
    p.validate()
    p.to_json()
    # '{"name": "Alice", "address": {"city": "Paris", "zip_code": "75000", ...}, ...}'

The test suite uses the standard library unittest:
    python -m unittest discover -s tests
James Dennis, Andrew Gwozdziewycz, Dion Paragas.
BSD.