The ACID properties and the CAP theorem are two important concepts in data management and distributed systems. It’s unfortunate that in both acronyms the “C” stands for “Consistency” but actually means completely different things. What follows is a primer on the two concepts and an explanation of the differences between the two “C”s.

The idea of transactions, their semantics and guarantees, evolved with data management itself. As computers became more powerful, they were tasked with managing more data. Eventually, multiple users shared data on a machine. This led to problems where data could be changed or overwritten while other users were in the middle of a calculation. Something needed to be done, so the academics were called in.

The “ACID” rules were originally defined by Jim Gray in the 1970s; the acronym was popularized in the 1980s. ACID transactions solve many problems when implemented to the letter, but they have been engaged in a push-pull with performance trade-offs ever since. Still, simply understanding these rules can educate those who seek to bend them about the dragons they may encounter.

A transaction is a bundling of one or more operations on database state into a single sequence. Databases that offer transactional semantics provide a clear way to start, stop, and cancel (or roll back) a set of operations (reads and writes) as a single logical meta-operation. But transactional semantics alone do not make a “transaction.” A true transaction must adhere to the ACID properties.

The value of ACID transactions is argued in the seminal Google F1 paper: ACID transactions offer guarantees that absolve the end user of much of the headache of concurrent access to mutable database state. The system must provide ACID transactions, and must always present applications with consistent and correct data. Designing applications to cope with concurrency anomalies in their data is very error-prone, time-consuming, and ultimately not worth the performance gains.

Atomic: All components of a transaction are treated as a single action. All are completed or none are; if one part of a transaction fails, the database’s state is unchanged.

Consistent: Transactions must follow the defined rules and restrictions of the database, e.g., constraints, cascades, and triggers. Thus, any data written to the database must be valid, and any transaction that completes will change the state of the database. No transaction can create an invalid data state. Note that this is different from “consistency” as it’s defined in the CAP theorem.

Isolated: Fundamental to achieving concurrency control, isolation ensures that the concurrent execution of transactions results in a system state that would be obtained if the transactions were executed serially, i.e., one after the other. With isolation, an incomplete transaction cannot affect another incomplete transaction.

Durable: Once a transaction is committed, it will persist and will not be undone to accommodate conflicts with other operations. Many argue that this implies the transaction is on disk as well; most formal definitions aren’t specific.

Many also argue the acronym itself is a bit forced. Specifically, the “C” is not like the others. These critics are right, but it seems hard to deny that the quality of the acronym has, in part, driven the success of ACID as a concept.

What is CAP?

CAP is a tool to explain trade-offs in distributed systems. It was presented as a conjecture by Eric Brewer at the 2000 Symposium on Principles of Distributed Computing, and formalized and proven by Gilbert and Lynch in 2002. The typical CAP definition says: you can’t have all three.

Consistent: All replicas of the same data will have the same value across a distributed system.

Available: All live nodes in a distributed system can process operations and respond to queries.

Partition Tolerant: The system is designed to operate in the face of unplanned network connectivity loss between replicas.

Here’s a more practical way to think about CAP: in the face of network partitions, you can’t always have both perfect consistency and 100% availability. To be clear, CAP isn’t about what is possible, but rather, what isn’t possible.
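The atomicity and consistency properties of ACID transactions can be illustrated with a minimal sketch (not tied to any particular system discussed here) using Python’s built-in sqlite3 module, where a CHECK constraint plays the role of a database rule that no transaction may violate:

```python
import sqlite3

# In-memory database; the CHECK constraint is a "rule" that the
# Consistent property must preserve across every transaction.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts "
    "(name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# Atomic: both updates succeed, or neither does. Transferring 150
# would drive alice's balance negative, violating the constraint,
# so the whole transaction is rejected and rolled back.
try:
    with conn:  # transaction scope: commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'"
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'"
        )
except sqlite3.IntegrityError:
    pass  # the transfer was rejected; the database state is unchanged

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# → {'alice': 100, 'bob': 0}
```

Note that the failed transfer leaves both balances untouched: the reader never observes a state where money left one account without arriving in the other.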
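The CAP trade-off under a partition can be made concrete with a toy model (all names here are hypothetical, not any real system’s API): a replica cut off from its primary must either refuse reads it cannot verify (choosing consistency) or serve possibly stale data (choosing availability).

```python
class Replica:
    """Toy replica that picks a side of the CAP trade-off when partitioned."""

    def __init__(self, mode):
        self.mode = mode          # "CP" (favor consistency) or "AP" (favor availability)
        self.value = None
        self.partitioned = False

    def write(self, value):
        self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistent choice: refuse to answer rather than risk a stale read.
            raise RuntimeError("unavailable: cannot confirm latest value")
        return self.value         # AP choice: always answers, possibly stale

primary, cp, ap = Replica("CP"), Replica("CP"), Replica("AP")
for r in (primary, cp, ap):
    r.write("v2")                 # replication succeeds while connected

cp.partitioned = ap.partitioned = True
primary.write("v3")               # this update cannot reach the cut-off replicas

print(ap.read())                  # → v2  (available, but stale: not consistent)
try:
    cp.read()
except RuntimeError as err:
    print(err)                    # refuses to answer: consistent, not available
```

Neither replica gets to be both consistent and available while the partition lasts, which is exactly the impossibility the theorem describes.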