How I Learned to Find the Most Recent Policy Version

Elisha Antunes
Jun 5
3 min read

Updated: Jun 22

One of the most valuable lessons I’ve learned in insurance data work is how to consistently identify the most recent version of a policy. It sounds simple, but getting it wrong can lead to outdated reports, misleading metrics, and failed downstream processes.

If you’re working with transactional policy data — where each update (ADD, CHG, CNL) is stored as a separate row — this problem will feel familiar. In our case, every policy event created a new record, and we needed to report only the latest version of each policy.

In this post, I’ll walk you through the exact problem I faced, how I first attempted to solve it, and how the ROW_NUMBER() window function ended up being the most elegant and reliable solution.

The Problem

Let’s say you’re working with the following tables:

policies: each row represents a transaction (ADD, CHG, CNL)
policy_versions: each row links to a policy_id and possibly a quote_id

Here’s a simplified query I tried:

SELECT * FROM policies WHERE creation_date = ( SELECT MAX(creation_date) FROM policies p2 WHERE p2.policy_number = policies.policy_number );

This worked... until i realized it didn’t scale well when I started joining it to the policy_versions or quotes tables. But the deeper issue came to light when I started grouping and aggregating policies with other transactional tables.

In one real-world example, I needed to:

Join policies to quote versions, venues, and coverages
Aggregate values like maximum daily attendance and coverage limits
Calculate adjusted premiums across versions

That’s when I found myself writing a complex report that needed to return only the latest version of a policy while computing premiums and merging event information — and the MAX() function simply couldn't handle that workload.

The issue was subtle: creation dates weren’t always a clear or unique indicator of the “most recent version.” I needed a way to find the latest transaction per policy, without breaking joins or losing access to the full row.

Enter ROW_NUMBER()

I’d used basic window functions before (like RANK() and COUNT()), but hadn’t really appreciated their full power until this moment.

The beauty of ROW_NUMBER() is that it assigns a unique sequential number to each row within a group — based on your sort criteria. So, instead of trying to wrangle MAX() in a subquery, I could do this:

SELECT policy_id, policy_number, transaction_type, creation_date, ROW_NUMBER() OVER ( PARTITION BY policy_number ORDER BY creation_date DESC ) AS row_num FROM policies;

policy_id	policy_number	transaction_type	creation_date	row_num
72183a12-6758	ABC100019	CHG	2024-02-22	1
72183a12-6758	ABC100019	CHG	2024-01-23	2
72183a12-6758	ABC100019	ADD	2023-12-24	3

This ranks every transaction for a policy_number. The most recent one (by creation_date) gets a row_num of 1.

To filter only the most recent:

WITH ranked_policies AS ( SELECT , ROW_NUMBER() OVER ( PARTITION BY policy_number ORDER BY creation_date DESC ) AS row_num FROM policies ) SELECT  FROM ranked_policies WHERE row_num = 1;

This query worked not only because it identified the latest transaction, but because it preserved the full detail of the row in a way MAX() never could.

Why ROW_NUMBER() is superior in Grouped Queries

In a more advanced report, I had to:

Join policy transactions with quote details, venues, and coverages
Aggregate values like attendance limits and premiums
Compare each transaction to its previous version

With ROW_NUMBER(), I could do this cleanly:

ROW_NUMBER() OVER (
  PARTITION BY PO.ID
  ORDER BY V.DATE_BOUND DESC
) AS transaction_version

Then compare changes with a self-join:

LEFT JOIN filtered_policy_version P1
  ON P2.POLICY_ID = P1.POLICY_ID
  AND P1.DATE_BOUND < P2.DATE_BOUND
  AND P1.transaction_version = P2.transaction_version + 1

This enabled version-to-version comparisons — something MAX() just can't handle.

Joining to policy_versions

Once I had the latest policies, I could join safely:

WITH ranked_policies AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY policy_number
    ORDER BY creation_date DESC
  ) AS row_num
  FROM policies
)
SELECT rp.*, pv.quote_number, pv.version_status
FROM ranked_policies rp
LEFT JOIN policy_versions pv
  ON rp.policy_id = pv.policy_id
WHERE rp.row_num = 1;

This gave me one clean, current row per policy, with all related metadata.

Practical Use Cases

Generating reports of all active policies
Matching policies to their latest quotes, coverages, and events
Identifying orphaned policies (e.g., those without a corresponding PAY transaction)
Calculating premium adjustments across transactions
Cleaning up duplicate entries in staging tables

Takeaways

Policies evolve through multiple transactions (ADD, CHG, CNL), making it hard to know which version is the most current.
Creation_date alone isn’t always reliable, especially in large datasets or when clients retroactively submit changes.
Use ROW_NUMBER() with PARTITION BY policy_number ORDER BY creation_date DESC to rank transactions.
Filter where row_num = 1 to get the most recent policy version.
Join to policy_versions and other tables after filtering to reduce joins and improve performance.
Use row rankings to support comparison logic, calculate premium differences, and maintain clarity across complex joins.