Friday, March 26, 2010

Transaction Basics

The core idea behind a transaction is that it's a complex operation completed as an all-or-nothing set of simpler operations. If one of the simpler operations fails, then none of the operations should be applied.
The classic example is making a purchase, and indeed this is probably where the name "transaction" comes from. There are several steps:
  1. Pull funds from buyer's account
  2. Add funds to seller's account
  3. Decrease product inventory
A failure could happen at any of those steps. For example, at step 1, if the buyer doesn't have enough money, then the transaction should fail. (We wouldn't add funds to the seller's account and we wouldn't decrease product inventory.) At step 2, if the seller's account has been frozen due to suspected involvement in illegal activities, then the transaction should fail. At step 3, if the product is out of stock, then again the transaction should fail, which means the funds moved in steps 1 and 2 have to be put back.
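
To make the all-or-nothing idea concrete, here is a minimal in-memory sketch of the three purchase steps. It is only an illustration: the PurchaseSketch class, its fields and the snapshot-and-restore approach are assumptions made for this example, and a real database would provide rollback for you rather than relying on copied maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory "shop"; every name here is illustrative only.
final class PurchaseSketch {
    final Map<String, Long> balances = new HashMap<>();      // account -> funds in cents
    final Set<String> frozenAccounts = new HashSet<>();      // accounts that cannot receive funds
    final Map<String, Integer> inventory = new HashMap<>();  // product -> units in stock

    // Returns true if all three steps were applied, false if none were.
    boolean purchase(String buyer, String seller, String product, long price) {
        // Snapshot the state so that a late failure can undo earlier steps.
        Map<String, Long> balancesBefore = new HashMap<>(balances);
        Map<String, Integer> inventoryBefore = new HashMap<>(inventory);
        try {
            // Step 1: pull funds from the buyer's account.
            long buyerFunds = balances.getOrDefault(buyer, 0L);
            if (buyerFunds < price) {
                throw new IllegalStateException("insufficient funds");
            }
            balances.put(buyer, buyerFunds - price);

            // Step 2: add funds to the seller's account.
            if (frozenAccounts.contains(seller)) {
                throw new IllegalStateException("seller account frozen");
            }
            balances.merge(seller, price, Long::sum);

            // Step 3: decrease product inventory.
            int unitsInStock = inventory.getOrDefault(product, 0);
            if (unitsInStock < 1) {
                throw new IllegalStateException("out of stock");
            }
            inventory.put(product, unitsInStock - 1);
            return true;
        } catch (IllegalStateException e) {
            // Roll back: restore the snapshots so no partial change survives.
            balances.clear();
            balances.putAll(balancesBefore);
            inventory.clear();
            inventory.putAll(inventoryBefore);
            return false;
        }
    }
}
```

Note that a failure at step 3 happens after steps 1 and 2 have already changed the balances, which is exactly why the catch block restores both maps instead of just returning.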

ACID Properties

ACID is an acronym that stands for atomic, consistent, isolated and durable. An operation is a transaction if and only if it has these four characteristics:
  • Atomic: This is the aforementioned all-or-nothing property. Even though a transaction comprises multiple operations, it is completed as a single logical unit of work. Either the whole thing succeeds or none of it does.
  • Consistent: We want our "transactional resource" (which in most cases means our database) to be consistent with reality. So for example if I have 4,132 widgets in stock after I sell you five of them, then I want the database to say that I have 4,132 widgets in stock after I run a sell(you, 5, widget) transaction. The consistency property means that after a transaction completes, the database (or whatever transactional resource we're using) matches up with the real world.
  • Isolated: In general we want to be able to run lots of transactions against the database at the same time, and we don't want to make a big mess out of things in the process. The isolation property means that when you run a transaction, it achieves its results without interference from other transactions. That doesn't mean that the transactions can't run concurrently; it just means that the transactions shouldn't corrupt each other in the process. In practice it's nontrivial to strike the right balance between concurrency and isolation. Every transaction, however, exhibits some level of isolation, and well-designed transactions exhibit the "right" amount of isolation.
  • Durable: This just means that after a transaction has ended, its results are made persistent. Probably they would have called this property "persistent" instead of "durable", but "ACIP" is not as cool as "ACID".

Defining Individual Transactions

Now that we know what transactions are, we need to understand how individual transactions are actually specified. Defining a transaction involves picking a chunk of code (maybe a method, maybe a set of methods, maybe a block of code inside a method) and setting values for the following five parameters:
  • Propagation behavior: When defining a transaction we must specify its "boundaries": where does the transaction start and where does it end? In Spring (and also in EJB), the boundaries of a transaction may or may not coincide with the boundaries of a Java method. In one respect this is not unlike using the synchronized keyword: if one synchronized method calls another on the same object, then the protected scope includes both methods, not just one, and so the boundaries do not coincide with a single method. The difference is that transactions give you a lot more flexibility in defining the boundaries. Certainly it is possible to enter a transaction in one method and then include additional method calls within the scope of that initial transaction. But you can do other things too, such as declare that calls to other methods open up new transactions, or that a given method must always run within the scope of an existing transaction. I won't go over all the options here, but there are seven and you can see them here: transaction propagation options. PROPAGATION_REQUIRED is probably the most typical.
  • Isolation level: I mentioned above that in practice concurrency trades against isolation. (Once again there's an analogy with threads, incidentally.) You can specify the level of isolation you need for a given transaction; the general rule of thumb is to assign the weakest isolation level you can safely get away with so as to maximize concurrency. Again I won't go into the details but you can find them here: transaction isolation levels. The most typical cases are ISOLATION_READ_COMMITTED and ISOLATION_REPEATABLE_READ. ISOLATION_READ_UNCOMMITTED is so weak as to be almost useless in practice, and ISOLATION_SERIALIZABLE is so strong that you don't want to use it unless you really, really need the safety it provides: it's a concurrency-killer.
  • Timeout: Transaction timeouts allow the calling application to give up if the database hasn't completed the transaction within a specifiable interval of time.
  • Read-only: This flag indicates whether the transaction will be, well, read-only. If so, you can flag it as such and the underlying database or other resource can apply optimizations based on that fact.
  • Rollback behavior: By default, both Spring and EJB roll back a transaction (i.e., cancel it without applying any of the changes) when a runtime exception occurs, but not when a checked exception occurs. You can modify that behavior.
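
To give a feel for what some of these parameters mean underneath, here is a rough sketch using plain JDBC rather than Spring's own API. The JdbcTxSketch class, its Work interface and the inTransaction helper are hypothetical names invented for this example; only the java.sql.Connection calls are real. It covers isolation level, the read-only flag and the default rollback rule described above (roll back on runtime exceptions, commit on checked ones). Propagation and timeout are left out because a bare Connection has no direct equivalent for them.

```java
import java.sql.Connection;

// Hypothetical helper; not Spring's implementation, just the same idea by hand.
final class JdbcTxSketch {

    // A unit of work that may throw checked or unchecked exceptions.
    interface Work {
        void run(Connection connection) throws Exception;
    }

    static void inTransaction(Connection connection, int isolation,
                              boolean readOnly, Work work) throws Exception {
        // Configure the connection before the transaction begins.
        connection.setTransactionIsolation(isolation);
        connection.setReadOnly(readOnly);
        // Turning off auto-commit groups the statements that follow into
        // one transaction that ends at commit() or rollback().
        connection.setAutoCommit(false);
        try {
            work.run(connection);
        } catch (RuntimeException | Error e) {
            // Default rule: unchecked failures cancel the whole transaction.
            connection.rollback();
            throw e;
        } catch (Exception e) {
            // Default rule: checked exceptions do not trigger a rollback.
            connection.commit();
            throw e;
        }
        connection.commit();
    }
}
```

A caller would pass something like Connection.TRANSACTION_READ_COMMITTED for the isolation argument. In a Spring application you would normally declare these settings on the transaction instead of writing this plumbing yourself; the sketch is only meant to show what the settings boil down to.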
