Recently I was looking at the performance of a cluster and discovered the development team were performing a lot of count(*) queries on a specific table. This was easy to spot — it was a DataStax cluster and the queries were showing up in the node_slow_table.
The table collected daily orders at a location, something like this:
CREATE TABLE location_orders (
    location_id int,
    calculation_date date,
    order_id int,
    PRIMARY KEY ((location_id, calculation_date), order_id)
) WITH CLUSTERING ORDER BY (order_id ASC);
With queries like:
SELECT count(*) FROM location_orders
WHERE location_id = 1 AND calculation_date = '2019-03-29';
Even though this is “only” counting within one partition, there is no quick way for the node to produce the count: every clustered row in the partition has to be read into memory and counted. With up to 40,000 rows per partition, and the query repeated across locations, this caused significant heap pressure and excessive time spent in garbage collection.
The team did not want to use Cassandra counters due to edge cases and accuracy concerns. So a new strategy was devised.
The Solution: timeuuid as a Clustering Key
First, create a new table with a timeuuid column added as a clustering key:
CREATE TABLE location_orders2 (
    location_id int,
    calculation_date date,
    timestamp timeuuid,
    order_id int,
    PRIMARY KEY ((location_id, calculation_date), timestamp, order_id)
) WITH CLUSTERING ORDER BY (timestamp ASC);
Because now() generates a unique timeuuid on every call, the added timestamp column keeps rows within a partition in strict insertion-time order, and two orders inserted at the same instant still get distinct clustering values:
INSERT INTO location_orders2 (location_id, calculation_date, timestamp, order_id)
VALUES (1, '2019-03-29', now(), 1);
INSERT INTO location_orders2 (location_id, calculation_date, timestamp, order_id)
VALUES (1, '2019-03-29', now(), 2);
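The ordering property can be sketched client-side with Python's `uuid.uuid1()`, which produces a Version 1, time-based UUID with the same layout as Cassandra's timeuuid. The helper `make_ordered_ids` is illustrative only, not part of any driver:

```python
import uuid

def make_ordered_ids(n):
    """Generate n time-based UUIDs, analogous to calling now() per INSERT.

    uuid1() embeds a 60-bit timestamp (100 ns ticks), and CPython bumps the
    timestamp if the clock hasn't advanced, so successive calls in one
    process yield strictly increasing .time values.
    """
    return [uuid.uuid1() for _ in range(n)]

ids = make_ordered_ids(5)
times = [u.time for u in ids]
# Sorting rows by their timeuuid reproduces insertion order.
assert times == sorted(times)
```

This is only an analogue of server-side now(); in Cassandra the coordinator generates the timeuuid, so ordering across coordinators depends on their clocks.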
Efficient Counting with Pagination
Now we can count more efficiently by only loading each record once, and paginating using the timestamp.
First, get the count and the latest timestamp:
SELECT count(*), max(timestamp) FROM location_orders2
WHERE location_id = 1 AND calculation_date = '2019-03-29';
count | system.max(timestamp)
-------+--------------------------------------
2 | 5f7bbc40-5240-11e9-96f8-6f39bc97f361
Then, for subsequent counts, only read rows inserted after that timestamp:
SELECT count(*), max(timestamp) FROM location_orders2
WHERE location_id = 1 AND calculation_date = '2019-03-29'
AND timestamp > 5f7bbc40-5240-11e9-96f8-6f39bc97f361;
This means each record is only loaded into memory once per count cycle, rather than on every count.
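The once-per-record property can be modeled in plain Python, no cluster required. The `count_since` function and the tuple layout below are assumptions for illustration, with timestamps modeled as plain ints:

```python
# Each row is (timestamp, order_id); timestamps modeled as ints.
def count_since(rows, last_ts):
    """Count only rows newer than last_ts, mimicking the
    `WHERE timestamp > ?` range scan; returns (count, new_max_ts)."""
    newer = [(ts, oid) for ts, oid in rows if last_ts is None or ts > last_ts]
    if not newer:
        return 0, last_ts
    return len(newer), max(ts for ts, _ in newer)

rows = [(1, 101), (2, 102), (3, 103)]
n, last = count_since(rows, None)      # first pass reads all 3 rows
rows += [(4, 104), (5, 105)]
delta, last = count_since(rows, last)  # second pass reads only the 2 new rows
total = n + delta                      # running total: 5
```

Each row contributes to exactly one `count_since` call, which is what keeps repeated counts cheap as the partition grows.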
Persisting the Count with LWT
For a single process, the count and last timestamp could be stored in memory. But typically the count is required by multiple processes. In this case we store the running total back in Cassandra:
CREATE TABLE location_order_count (
    location_id int,
    calculation_date date,
    last_timestamp timeuuid,
    count int,
    PRIMARY KEY ((location_id, calculation_date))
);
This table has no clustering column, so each partition holds a single row and every write simply overwrites the previous values. To prevent two processes from overwriting each other’s work, use a Lightweight Transaction (LWT):
UPDATE location_order_count
SET last_timestamp = e3c60511-5244-11e9-96f8-6f39bc97f361, count = 4
WHERE location_id = 1 AND calculation_date = '2019-03-29'
IF last_timestamp = 8e7e1ea1-5242-11e9-96f8-6f39bc97f361;
[applied]
-----------
True
If another process had already updated the count, you’d get False returned along with the newer timestamp — at which point you can safely discard your update, since there’s already more recent data in the count table.
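The IF condition is effectively a compare-and-set. A minimal in-memory analogue of that behavior, where the dict-based `counts` store and `lwt_update` are hypothetical stand-ins rather than driver API:

```python
# One entry per (location_id, calculation_date) partition:
# key -> {"last_timestamp": ts, "count": n}
counts = {}

def lwt_update(key, expected_ts, new_ts, new_count):
    """Apply the update only if the stored last_timestamp still equals
    expected_ts, mirroring `UPDATE ... IF last_timestamp = ?`.
    Returns (applied, current_row), like Cassandra's [applied] result."""
    current = counts.get(key)
    current_ts = current["last_timestamp"] if current else None
    if current_ts != expected_ts:
        return False, current  # another process won the race; discard our update
    counts[key] = {"last_timestamp": new_ts, "count": new_count}
    return True, counts[key]

key = (1, "2019-03-29")
ok, row = lwt_update(key, None, 10, 2)     # first write: applied
stale, row = lwt_update(key, None, 11, 3)  # stale expected_ts: rejected
```

On a rejected update the returned row carries the newer timestamp and count, so the losing process can simply drop its result, exactly as described above.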
The complete workflow is:
- Read the current count and timestamp from location_order_count
- Count the number of records in location_orders2 since that timestamp
- Store the new count and timestamp only if the old timestamp is still current (using LWT)
This structure efficiently performs a count by reading each record only once, reducing heap pressure, GC time, and CPU usage significantly.
In a future post I will look at deletion of stale data from these tables.