Removing Performance Bottlenecks in a Distributed System

One of our clients is a global leader in investment banking. One of their Order Management Systems had performance problems. When the salespeople began experiencing delays while entering orders, fixing these problems became a critical priority for the development team, so that order entry would be fast again.

The production support team had noticed performance issues in the RDBMS. Because the system depended heavily on the RDBMS, solving the RDBMS-related issues was central to application performance.

The first step was to identify long-running queries on the database and tune them. Tools like AppDynamics helped in identifying them. However, improving query performance turned out to be non-trivial: the query plans showed none of the usual warning signs, such as table scans. The real problem lay elsewhere.

The database logs and AppDynamics revealed heavy contention in the database. Queries were spending more time waiting for locks than actually executing. The most contended object was a table used to generate sequential IDs for all entities. This table held the last ID used in the sequence for each entity. The IDs were generated by a stored procedure running in a transaction.
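To make the setup concrete, here is a minimal sketch of such an ID table. It is an illustration only: SQLite stands in for the client's RDBMS, a Python function stands in for the stored procedure, and the names id_sequence, entity, last_id and next_id are hypothetical.

```python
import sqlite3


def create_sequence_table(conn):
    # One row per entity type, holding the last ID handed out (hypothetical schema).
    conn.execute(
        "CREATE TABLE id_sequence ("
        "entity TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )


def next_id(conn, entity):
    """Increment and return the last ID for one entity.

    This runs inside whatever transaction the caller has open, so the
    updated row stays locked until that transaction commits or rolls back.
    """
    # Create the row on first use (INSERT OR IGNORE is SQLite-specific).
    conn.execute(
        "INSERT OR IGNORE INTO id_sequence (entity, last_id) VALUES (?, 0)",
        (entity,),
    )
    conn.execute(
        "UPDATE id_sequence SET last_id = last_id + 1 WHERE entity = ?",
        (entity,),
    )
    row = conn.execute(
        "SELECT last_id FROM id_sequence WHERE entity = ?", (entity,)
    ).fetchone()
    return row[0]
```

Every caller that needs an ID for the same entity has to update the same row, which is what makes this table a natural hot spot.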

The issue was that ID generation ran inside the same transaction that processed a request, so the lock it took on the ID table was held until the whole request transaction finished. The DB schema is normalized, as expected for an OLTP system, so processing any request involves a transaction across multiple tables. Most request processing also involved generating new IDs. Thus ID generation became a point of contention across the system.
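A back-of-the-envelope sketch shows why the length of time the lock is held matters. It assumes that requests needing the same row of the ID table run one at a time while the lock is held. The timings are made-up illustrative numbers, not measurements from the client's system.

```python
def max_requests_per_second(lock_hold_ms):
    """Rough throughput ceiling when every request must hold the same
    lock for lock_hold_ms milliseconds, one request at a time."""
    return 1000.0 / lock_hold_ms


# Hypothetical timings, chosen only for illustration.
ID_UPDATE_MS = 1       # time spent updating the ID row itself
REST_OF_TXN_MS = 49    # remaining work in the request transaction

# ID generated inside the request transaction: lock held until commit.
inside = max_requests_per_second(ID_UPDATE_MS + REST_OF_TXN_MS)

# ID generated in its own short transaction: lock held only briefly.
outside = max_requests_per_second(ID_UPDATE_MS)

print(inside, outside)
```

Under these assumed numbers the ceiling is 20 requests per second in the first case and 1000 in the second, even though the ID update itself costs the same in both.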

The solution was to take ID generation out of the main request-processing transaction. The application first determined all the entity IDs that would be needed to process a request. Those IDs were then generated in a single “outside” transaction and used in the request processing.
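The approach described above can be sketched as follows. This is a minimal illustration, not the client's implementation: SQLite stands in for the RDBMS, and the tables (id_sequence, orders, order_lines) and functions (create_schema, reserve_ids, process_order) are hypothetical names.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one ID table plus two normalized entity tables.
    conn.executescript(
        """
        CREATE TABLE id_sequence (entity TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, symbol TEXT);
        """
    )


def reserve_ids(conn, needs):
    """Reserve IDs for several entities in one short transaction.

    needs maps entity name -> number of IDs required.
    Returns entity name -> list of reserved IDs.
    """
    reserved = {}
    with conn:  # commits on success, rolls back on error
        for entity, count in needs.items():
            conn.execute(
                "INSERT OR IGNORE INTO id_sequence (entity, last_id) VALUES (?, 0)",
                (entity,),
            )
            conn.execute(
                "UPDATE id_sequence SET last_id = last_id + ? WHERE entity = ?",
                (count, entity),
            )
            (last_id,) = conn.execute(
                "SELECT last_id FROM id_sequence WHERE entity = ?", (entity,)
            ).fetchone()
            reserved[entity] = list(range(last_id - count + 1, last_id + 1))
    return reserved


def process_order(conn, symbols):
    # Step 1: work out every ID the request needs, and reserve them
    # all in a single "outside" transaction that commits immediately.
    ids = reserve_ids(conn, {"order": 1, "order_line": len(symbols)})
    order_id = ids["order"][0]

    # Step 2: the main request transaction only uses the reserved IDs
    # and never touches the id_sequence table.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        for line_id, symbol in zip(ids["order_line"], symbols):
            conn.execute(
                "INSERT INTO order_lines (id, order_id, symbol) VALUES (?, ?, ?)",
                (line_id, order_id, symbol),
            )
    return order_id
```

One trade-off of this design is that if the main transaction rolls back, the reserved IDs are already committed and are not reused, so the sequence can have gaps.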

In summary, RDBMS performance was hampered by lock contention rather than by bad query plans. It was fixed by moving the contentious queries into a separate transaction.