30.1. 01:00
Amazon's recent
release of
DynamoDB, a database whose name is inspired by Dynamo, the key-value
database the
distributed datastore they've been running in production for a good five to six
years now. I think it's great they've finally done it, though from my
obverservations, there's little resemblance of what the original Dynamo paper
describes, but I'm getting ahead of myself. Traditionally Amazon hasn't been
very open about how they implement their services, so some of what I'm stating
here may be nothing more than an educated guess. Either way, the result is
pretty neat.
Time to take a good look at what it has to offer, how that works out in code,
and to make some wild guesses as to what's happening under the covers. I'm using
fog's master to talk to DynamoDB in the examples
below, but the official Ruby SDK also
works. Fog is closer to the bare API metal though, which works out well for our
purposes.
My goal is not to outline the entire API and its full set of options, but to dig
into the bits most interesting to me and to show some examples. A lot of the
focus in other posts is on performance, scalability and operational ease. I
think that's a great feature of DynamoDB, but it's pretty much the same with all
of their web services. So instead I'm focusing on the effects DynamoDB has on
you, the user. We'll look at API, general usage, data model and what DynamoDB's
feature generally entails.
The Basics
DynamoDB is a distributed database in the cloud, no surprises, it's not the
first such thing Amazon has in its portfolio. S3,
SimpleDB, RDS
and DynamoDB all provide redudant ways to
store different types of data in Amazon's datacenters. Currently, DynamoDB
only supports the us-east region.
An important difference is that data stored in DynamoDB is officially stored on
SSDs, which has (or at least should have) the benefit of offering predictable
performance and greatly reduced latency across the board. The question that
remains is, of course: when can I hook up SSDs to my EC2 instances?
The other big deal is that the read and write capacity available to you is
configurable. You can tell Amazon how much capacity, in read and write units per
second, you expect for your application, and they make sure that capacity is
available to you as long as you need it (and pay for it).
Data Model
Data in DynamoDB
is stored in tables, a logical separation of data if you will. Table names are
only unique on a per-user basis, not globally like S3 buckets.
Tables store items, think rows of data or documents. Items have attributes and
values, similar to SimpleDB. Think of it as a JSON document where you can read
and write attributes independent of the entire document. Every row can have
different attributes and values, but every item needs to have a uniquely
identifying key whose name you decide on upfront, when you create the table.
Let's create a table first, but make sure you wear protection, as this is yet
another of Amazon's gross-looking APIs. The relevant API action is
CreateTable.
dynamo = Fog::AWS::DynamoDB.new(aws_access_key_id: "YOUR KEY",
aws_secret_access_key: "YOUR SECRET")
dynamo.create_table("people",
{HashKeyElement: {AttributeName: "username", AttributeType: "S"}},
{ReadCapacityUnits: 5, WriteCapacityUnits: 5})
This creates a table called "people" with an attribute "username" as the
indexing key. This is the key you're using to reference a row, sorry, an item,
in a table. The key is of type string, hence the S, you can alternatively
specify an N for numeric values. This pattern will come up with every attribute
and value pair you store, you always need to tell DynamoDB if it's a string or a
number. You also specify initial values for expected read and write capacity,
where 5 is the minimum.
You have to provide a proper value for a key, DynamoDB doesn't offer any
automatic generation of keys for you, so choose them wisely to ensure a good
distribution. A key, that has a lot of data behind it, and that's accessed much
more frequently than others will not benefit you. Keep your data simple and
store it with multiple keys if you can.
Anyhoo, to insert an item, you can use the
PutItem
API action, to which you issue a HTTP POST (go figure). The DynamoDB API already
gives me headache, but more on that later. Thankfully, a good client library
hides the terrible interface from us, only leaving us with the hash structures
sent to DynamoDB.
dynamo.put_item("people", {username: {"S" => "roidrage"}})
Specify multiple attributes as separate hash entries, each pointing to a
separate hash specifying data type and value. I'll leave it to you to think
about how backwards this is, though it must be said that JavaScript is also to
blame here, not handling 64 bit integers very well.
You can store lists (or rather sets of data) too, using SS as the datatype.
dynamo.put_item("people", {username: {S: "roidrage"},
tags: {SS: ["nosql", "cloud"]}})
Note that you can use PutItem on existing items, but you'll always replace all
existing attributes and values.
Conflicts and Conditional Writes
All writes can include optional conditions to check before updating an item.
That way you can ensure the item you're writing is still in the same state as
your local copy of the data. Think: read, update some attribute, then write,
expecting and ensuring to not have any conflicting updates from other clients in
between.
This is a pretty neat feature, you can base updates on one attributes based on
whether another attribute exists or has a certain value.
dynamo.put_item("people",
{username: {S: "roidrage"}, fullname: {S: "Mathias Meyer"}},
{Expected: {fullname: {Exists: false}}})
This operation is only executed if the item as currently stored in DynamoDB
doesn't have the attribute the attribute fullname yet. On a subsequent
PutItem call, you could also check if the item's fullname attribute still
has the same value, e.g. when updating the full name of user "roidrage".
dynamo.put_item("people",
{username: {S: "roidrage"}, fullname: {S: "Mathias Meyer"}},
{Expected: {fullname: {Value: {S: "Mathias Meyer"}}}})
It's not the greatest syntax for sure, but it does the job. Which brings me to
some musings about consistency. If the condition fails, Amazon returns a 400
response, just like with every other possible error you could cause.
To make proper use of conditional updates in your application and to actually
prevent conflicting writes and lost updates, you should use the UpdateItem
action instead and only update single attributes, as PutItem always replaces
the entire item in the database. But even then, make sure to always reference
the right attributes in an update. You could even implement some sort of
versioning scheme on top of this, for instance to emulate multi-version
concurrency control.
Updating single or multiple attributes is done with the
UpdateItem action.
You can update attributes without affecting others and add new attributes as you
see fit. To add a new attribute street, here's the slightly more verbose code.
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{street: {Value: {S: "Gabriel-Max-Str. 3"}}})
There are more options involved than with PutItem, and there are several more
available for UpdateItem. But they'll have to wait for just a couple of
sentences.
Consistency on Writes
Conditional updates are the first hint that DynamoDB is not like Dynamo at all,
as they assume some sort of semi-transactional semantics. Wherein a set of nodes
agree on a state (the conditional expression) and then all apply the update. The
same is true for atomic counters, which we'll look at in just a minute.
From the documentation it's not fully clear how writes without a condition or
without an atomic counter increment are handled, or what happens when two
clients update the same attribute at the same time, and who wins based on which
condition. Facing the user, there is no mechanism to detect conflicting writes.
So I can only assume DynamoDB either lets the last write win or has a scheme
similar to BigTable, using timestamps for each attribute.
Writes don't allow you to specify something like a quorum, telling DynamoDB how
consistent you'd like the write to be, it seems to be up to the system to decide
when and how quickly replication to other datacenters is done. Alex Popescu's
summary
on DynamoDB and Werner Vogels' introduction
suggest that writes are replicated across data centers synchronously, but
doesn't say to how many. On a wild guess, two data centers would make the write
durable enough, leaving the others to asynchronous replication.
Consistency on Reads
For reads on the other hand, you can tell DynamoDB if stale data is acceptable
to you, resulting in an eventually consistent read. If you prefer strongly
consistent reads, you can specify that on a per-operation basis. What works out
well for Amazon is the fact that strongly consistent reads cost twice as much as
eventually consistent reads, as more coordination and more I/O are involved.
From a strictly calculating view, strongly consistent write take up twice as
much capacity as eventually consistent writes.
From this I can only assume that writes, unless conditional will usually be at
least partially eventually consistent or at least not immediately spread out to
all replicas. On reads on the other hand, you can tell DynamoDB to involve more
than just one or two nodes and reconcile outdated replicas before returning it
to you.
Reading Data
There's not much to reading single items. The action
GetItem allows you to
read entire items or select single attributes to read.
dynamo.get_item("people", {HashKeyElement: {S: "roidrage"}})
Optionally, add the consistency option to get strong consistency, with eventual
consistency being the default.
dynamo.get_item("people", {HashKeyElement: {S: "roidrage"}},
{ConsistentRead: true})
A read without any more options always returns all data, but you can select
specific attributes.
dynamo.get_item("people", {HashKeyElement: {S: "roidrage"}},
{AttributesToGet: ["street"]})
Atomic Counters
Atomic counters are a pretty big deal. This is what users want out of pretty
much every distributed database. Some say they're using it wrong, but I think
they're right to ask for it. Distributed counters are hard, but not impossible.
Counters need to be numerical fields, and you can increment and decrement them
using the UpdateItem action. To increment a field by a value, specify a numerical
attribute, the value to be incremented by, and use the action ADD. Note that
the increment value needs to be a string.
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{logins: {Value: {N: "1"}, Action: "ADD"}})
When the counter attribute doesn't exist yet, it's created for you and set to 1.
You can also decrement values by specifying a negative value.
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{logins: {Value: {N: "-1"}, Action: "ADD"}})
Can't tell you how cool I think this feature is. Even though we keep telling
people that atomic counters in distributed system are hard, as they involve
coordination and increase vulnerability to failure of a node involved in an
operation, systems like Cassandra and HBase show that it's possible to do.
Storing Data in Sets
Other than numerical (which need to be specified as strings nonetheless) and
string data types, you can store sets of both too. One member can only exist
once in an attribute set. The neat part is that you can atomically add new
members to sets using the UpdateItem action. In the JSON format sent to the
server, you always reference even just single members to add as a list of items.
Here's an example:
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{tags: {Value: {SS: ["nosql"]}}})
That always replaces the existing attribute though. To add a member you need to
specify an action. If the member already exists, nothing happens, but the
request still succeeds.
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{tags: {Value: {SS: ["cloud"]}, Action: "ADD"}})
You can delete elements in a similar fashion, by using the action DELETE.
dynamo.update_item("people", {HashKeyElement: {S: "roidrage"}},
{tags: {Value: {SS: ["cloud"]}, Action: "DELETE"}})
The default action is PUT, which replaces the listed attributes with the
values specified. If you run PUT or ADD on a key that doesn't exist yet, the
item will be automatically created. This little feature and handling of sets and
counters in general sounds a lot like things that made MongoDB so popular, as
you can atomically push items onto lists and increment counters.
This is another feature that's got me thinking about whether DynamoDB even
includes the tiniest bit of the original Dynamo. You could model counters and
sets based on something like Dynamo for sure, based on the ideas behind
Commutative Replicated Data
Types. But I do wonder
if Amazon actually did go through all the trouble building that system on top of
the traditional Dynamo, or if they implemented something entirely new for this
purpose. There is no doubt that operations like these is what a lot of
users want even from distributed databases, so either way, they've clearly hit a
nerve.
Column Store in Disguise?
The fact that you can fetch and update single attributes with consistency
guarantees makes me think that DynamoDB is actually more like a wide column
store like Cassandra, HBase or, gasp, Google's BigTable. There doesn't seem to
be anything left from the original, content-agnostic idea of the Dynamo data
store, whose name DynamoDB so nicely piggybacks on.
The bottom line is there's always a schema of attributes and values for a
particular item you store. What you store in an attribute is up to you, but
there's a limit of 64KB per item.
DynamoDB assumes there's always some structure in the data you're storing. If
you need something content-agnostic to store data but with similar benefits
(replication, redundancy, fault-tolerance), use S3 and CloudFront. Nothing's
keeping you from using several services obviously. If I used DynamoDB for
something it'd probably not be my main datastore, but an add-on for data I need
to have highly available and stored in a fault-tolerant fashion, but that's a
matter of taste.
A Word on Throughput Capacity
Whereas you had to dynamically add capacity in self-hosting database systems,
always keeping an eye on current capacity limits, you can add more capacity to
handle more DynamoDB request per second simply by issuing an API call.
Higher capacity means paying more. To save money, You could even adjust the
capacity for a specific time of day basis, growing up and down with your
traffic. If you go beyond your configured throughput capacity, operations may be
throttled.
Throughput capacity is based on size of items you read and the number of reads
and writes per second. Reading one item with a size of 1 KB corresponds to one
read capacity unit. Add more required capacity units with increased size and
number of operations per second.
This is the metric you always need to keep an eye on and constantly measure from
your app, not just to validate invoice Amazon sends you at the end of the month,
but also to track your capacity needs at all times. Luckily you can track this
using CloudWatch and trigger alerts. Amazon can trigger predefined alerts when
capacity crosses a certain threshold. Plus, every response to a read includes
the consumed read capacity, and the same is true for writes.
Throughput capacity pricing is pretty ingenious on Amazon's end. You pay for
what you reserve, not for what you actually use. As you always have to have more
capacity available as you currently need, you always need to reserve more in
DynamoDB. But if you think about it, this is exactly how you'd work with your
own hosted, distributed database. You'll never want to work very close to
capacity, unless you're some crazy person.
Of course you can only scale down throughput capacity once per day, but scale up
as much as you like, and increases need to be done at least 10%. I applaud you
for exploiting every possible opportunity to make money, Amazon!
Data Partitioning
Amazon's
documentation
suggests that Amazon doesn't use random partitioning to spread data across
partitions, partitioning is instead done per table. Partitions are created as
you add data and as you increase capacity, suggesting either some sort of
composite key scheme or a tablet like partitioning scheme, again, similar to
what HBase or BigTable do. This is more fiction than fact, it's an assumption on
my part. A tablet-like approach certainly would make distributing and splitting
up partitions easier than having a fixed partitioning scheme upfront like in
Dynamo.
The odd part is that Amazon actually makes you worry about partitions but
doesn't seem to offer any way of telling you about them or how your data is
partitioned. Amazon seems to handle partitioning mostly automatically and
increases capacity by spreading out your data across more partitions as you
scale capacity demand up.
Range Keys
Keys can be single values or based on a key-attribute combination. This is a
pretty big deal as it effectively gives you composite keys in a distributed
database, think Cassandra's data model. This effectively gives you a time series
database in the cloud, allowing you to store sorted data.
You can specify a secondary key on which you can query by a range, and which
Amazon automatically indexes for you. This is yet another feature that makes
DynamoDB closer to a wide column store than the traditional Dynamo data store.
The value of the range key could be an increasing counter (though you'd have to
take care of this yourself), a timestamp, or a time based UUID. Of course it
could be anything else and unique entirely, but time series data is just a nice
example for range keys. The neat part is that this way you can extract data for
specific time ranges, e.g. for logging or activity feeds.
We already looked at how you define a normal hash key, let's look at an example
with a more complex key combining a hash key and a range key, using a numerical
type to denote a timestamp.
dynamo.create_table("activities", {
HashKeyElement: {AttributeName: "username", AttributeType: "S"},
RangeKeyElement: {AttributeName: "created_at", AttributeType: "N"}},
{ReadCapacityUnits: 5, WriteCapacityUnits: 5})
Now you can insert new items based on both the hash key and a range key. Just
specify at least both attributes in a request.
dynamo.put_item("activities", {
username: {S: "roidrage"},
created_at: {N: Time.now.tv_sec.to_s},
activity: {S: "Logged in"}})
The idea is simple, it's a timestamp-based activity feed per user, indexed by
the time of the activity. Using the
Query action, we can fetch a range of
activities, e.g. for the last 24 hours. Just using get_item, you always have
to specify a specific combination of hash and range key.
To fetch a range, I'll have to resort to using Amazon's Ruby SDK, as fog hasn't
implemented the Query action yet. That way you won't see the dirty API stuff
for now, but maybe that's a good thing.
dynamo = AWS::DynamoDB.new(access_key_id: "YOUR KEY",
secret_access_key: "YOUR SECRET")
activites = dynamo["activities"]
items = activities.items.query(
hash_key: "roidrage",
range_greater_than: (Time.now - 85600).tv_sec)
This fetches all activity items for the last 24 hours. You can also fetch more
historic items by specifying ranges. This example fetches all items for the last
seven days.
items = activities.items.query(
hash_key: "roidrage",
range_greater_than: (Time.now - 7.days.ago).tv_sec,
range_less_than: Time.now.tv_sec)
Note that the Query action is only available for tables with composite keys. If
you don't specify a range key, DynamoDB will return all items matching the hash
key.
Queries using Filters
The only things that's missing now is a way to do richer queries, which DynamoDB
offers by way of the
Scan
action. fog doesn't have an implementation for this yet, so we once again turn
to the AWS Ruby SDK.
Scanning allows you to specify a filter, which can be an arbitrary number of
attribute and value matches. Once again the Ruby SDK abstracts the ugliness of
the API underneath into something more readable.
activities.items.where(:activity).begins_with("Logged").each do |item|
p item.attributes.to_h
end
You can include more than one attribute and build arbitrarily complex queries,
in this example to fetch only items related to the user "roidrage" and him
logging in.
activities.items.where(:activity).equals("Logged in").
and(:username).equals("roidrage").each do |item|
p item.attributes.to_h
end
You can query for ranges as well, combining the above with getting only items
for the last seven days.
activities.items.where(:activity).equals("Logged in").
and(:username).equals("mathias").
and(:created_at).between(7.days.ago.tv_sec, Time.now.tv_sec).each {|item|
p item.attributes.to_h
end
Filters fetch a maximum of 1 MB of data. As soon as that's accumulated, the scan
returns, but also includes the last item evaluated, so you can continue where
the query left off. Interestingly, you also get the number of items remaining.
Something like paginating results dynamically comes to mind. Running a filter is
very similar to running a full table scan, though it's rather efficient thanks
to SSDs. But don't expect single digit millisecond responses here. As scans
heavily affect your capacity throughput, you're better off resorting to their
use only for ad-hoc queries, not as a general part of your application's
workflow.
Unlike Query and GetItem, the Scan action doesn't guarantee strong
consistency, and there's no flag to explicitly request it.
The API
DynamoDB's HTTP API has got to be the worst ever built at Amazon, you could even
think it's the first API every designed at Amazon. You do POST requests even to
GET data. The request and response can include conditions and validations and
their respective errors, the proper Java class name of an error and other crap.
Not to mention that every error caused by any kind of input on your end always
causes a 400.
Here's an example of a simplified request header:
POST / HTTP/1.1
Content-Type: application/x-amz-json-1.0
x-amz-target: DynamoDB_20111205.GetItem
And here's a response body from an erroneous request:
{"__type":"com.amazon.coral.validate#ValidationException",
"message":"One or more parameter values were invalid:
The provided key size does not match with that of the schema"}
Lovely! At least there's a proper error message, though it's not always telling
you what the real error is. Given that Amazon's documentation is still filled with
syntactical errors, this is a bit inconvenient.
The API is some bastard child of SOAP, but with JSON instead of XML. Even using
a pretty low level library like fog doesn't hide all the crap from you. Which
worked out well in this case, as you see enough of the API to get an idea about
its basic workings.
The code examples above don't read very Ruby like as I'm sure you'll agree.
Though I gotta say, the Ruby SDK provided by AWS feels a lot more like Ruby in
its
interface.
I don't have very high hopes to see improvements on Amazon's end, but who knows.
S3, for example, got a pretty decent REST API eventually.
Pricing
Pricing is done (aside from throughput capacity) per GB stored. If you store
millions of items, and they all exceed size of just a few KB, expect to pay
Amazon a ton of money, storage pricing trumps throughput pricing by a lot. Keep
data stored in an item small. If you only store a few large items, it works too,
but you may end up being better off choosing one of Amazon's other storage
options. You do the math. Pricing for storage and the maximum size for a single
item always includes attribute names, just like with SimpleDB.
To give you an idea how pricing works out, here's a simple calculation. 100 GB
of data, 1000 reads per second, 200 writes per second, item size is 4 KB on
average. That's $1253.15 every month, not including traffic. Add 10 GB of data the
second month and you're at $1263.15. You get the idea. Pricing is much more
affected by read and write operations vs. item size. Make your items 6 KB in
size, and you're already at $1827.68.
Bottom Line
Though Amazon is doing a pretty good job at squeezing the last drop of money out
of their customers using DynamoDB, think about what you're getting in return. No
operations, no hosting facilities, let alone in three different datacenters,
conditional writes and atomic counters, and a database that (I assume) has years
of experience in production forged into it.
As usually the case with Amazon's Web Services, using something like DynamoDB is
a financial tradeoff. You get a data store that scales up and down with your
demand without you having to worry about the details of hardware, operations,
replication and data performance. In turn you pay for every aspect of the
system. The price for storage is likely to go down eventually, making it a more
attractive alternative to hosting an open source NoSQL database system yourself.
Whether this is an option for your specific use case, only you're able to make
that decision.
If you store terrabytes of data, and that data is worth tens of thousands of
dollars per month in return for not having to care about hosting, by all means,
go for DynamoDB. But at that size, just one or two months of hosting on Amazon
pays off buying servers and SSDs for several data centers. That obviously
doesn't cover operational costs and personell, but it's just something to think
about.
Closing Thoughts
Sorted range keys, conditional updates, atomic counters, structured data and
multi-valued data types, fetching and updating single attributes, strong
consistency, and no explicit way to handle and resolve conflicts other than
conditions. A lot of features DynamoDB has to offer remind me of everything
that's great about wide column stores like Cassandra, but even more so of HBase.
This is great in my opinion, as Dynamo would probably not be well-suited for a
customer-facing system. And indeed, Werner Vogel's post on DynamoDB seems to suggest
DynamoDB is a bastard child of Dynamo and SimpleDB, though with lots of sugar
sprinkled on top.
Note that it's certainly possible and may actually be the case that Amazon has
built all of the above on top of the basic Dynamo ingredients, Cassandra living
proof that it's possible. But if Amazon did reuse a lot of the existing Dynamo
code base, they hid it really well. All the evidence points to at least heavy
usage of a sorted storage system under the covers, which works very well with
SSDs, as they make sequential writes and reads nice and fast.
No matter what it is, Amazon has done something pretty great here. They hide
most of the complexity of a distributed system from the user. The only option
you as a user worry about is whether or not you prefer strong consistency.
No quorum, no thinking about just how consistent you want a specific write
or read operation to be.
I'm looking forward to seeing how DynamoDB evolves. Only time will tell how big
of an impact Amazon's entering the NoSQL market is going to have. Give it a
whirl to find out more about it.
Want to know how the original Dynamo system works? Have a look at the Riak
Handbook, a comprehensive guide to Riak a
distributed database that implements the ideas of Dynamo and adds lots of sugar
on top.
I'm happy to be proven wrong or told otherwise about some of my assumptions
here, so feel free to get in touch!
Resources
Be sure to read Werner Vogels' announcement of DynamoDB, and
Adrian Cockcroft's comments
have some good insights on the evolution of data storage at Netflix and how
Cassandra, SimpleDB and DynamoDB compare.
