top of page

How I Learned to Find the Most Recent Policy Version

  • Writer: Elisha Antunes
    Elisha Antunes
  • Jun 5
  • 3 min read

Updated: Jun 22

One of the most valuable lessons I’ve learned in insurance data work is how to consistently identify the most recent version of a policy. It sounds simple, but getting it wrong can lead to outdated reports, misleading metrics, and failed downstream processes.


If you’re working with transactional policy data — where each update (ADD, CHG, CNL) is stored as a separate row — this problem will feel familiar. In our case, every policy event created a new record, and we needed to report only the latest version of each policy.


In this post, I’ll walk you through the exact problem I faced, how I first attempted to solve it, and how the ROW_NUMBER() window function ended up being the most elegant and reliable solution.


The Problem


Let’s say you’re working with the following tables:


  • policies: each row represents a transaction (ADD, CHG, CNL)

  • policy_versions: each row links to a policy_id and possibly a quote_id


Here’s a simplified query I tried:

SELECT * FROM policies WHERE creation_date = ( SELECT MAX(creation_date) FROM policies p2 WHERE p2.policy_number = policies.policy_number );

This worked... until i realized it didn’t scale well when I started joining it to the policy_versions or quotes tables. But the deeper issue came to light when I started grouping and aggregating policies with other transactional tables.


In one real-world example, I needed to:


  • Join policies to quote versions, venues, and coverages

  • Aggregate values like maximum daily attendance and coverage limits

  • Calculate adjusted premiums across versions


That’s when I found myself writing a complex report that needed to return only the latest version of a policy while computing premiums and merging event information — and the MAX() function simply couldn't handle that workload.


The issue was subtle: creation dates weren’t always a clear or unique indicator of the “most recent version.” I needed a way to find the latest transaction per policy, without breaking joins or losing access to the full row.


Enter ROW_NUMBER()


I’d used basic window functions before (like RANK() and COUNT()), but hadn’t really appreciated their full power until this moment.


The beauty of ROW_NUMBER() is that it assigns a unique sequential number to each row within a group — based on your sort criteria. So, instead of trying to wrangle MAX() in a subquery, I could do this:


SELECT policy_id, policy_number, transaction_type, creation_date, ROW_NUMBER() OVER ( PARTITION BY policy_number ORDER BY creation_date DESC ) AS row_num FROM policies;

policy_id

policy_number

transaction_type

creation_date

row_num

72183a12-6758

ABC100019

CHG

2024-02-22

1

72183a12-6758

ABC100019

CHG

2024-01-23

2

72183a12-6758

ABC100019

ADD

2023-12-24

3

This ranks every transaction for a policy_number. The most recent one (by creation_date) gets a row_num of 1.


To filter only the most recent:


WITH ranked_policies AS ( SELECT , ROW_NUMBER() OVER ( PARTITION BY policy_number ORDER BY creation_date DESC ) AS row_num FROM policies ) SELECT  FROM ranked_policies WHERE row_num = 1;

This query worked not only because it identified the latest transaction, but because it preserved the full detail of the row in a way MAX() never could.


Why ROW_NUMBER() is superior in Grouped Queries


In a more advanced report, I had to:

  • Join policy transactions with quote details, venues, and coverages

  • Aggregate values like attendance limits and premiums

  • Compare each transaction to its previous version


With ROW_NUMBER(), I could do this cleanly:

ROW_NUMBER() OVER (
  PARTITION BY PO.ID
  ORDER BY V.DATE_BOUND DESC
) AS transaction_version

Then compare changes with a self-join:

LEFT JOIN filtered_policy_version P1
  ON P2.POLICY_ID = P1.POLICY_ID
  AND P1.DATE_BOUND < P2.DATE_BOUND
  AND P1.transaction_version = P2.transaction_version + 1

This enabled version-to-version comparisons — something MAX() just can't handle.


Joining to policy_versions

Once I had the latest policies, I could join safely:

WITH ranked_policies AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY policy_number
    ORDER BY creation_date DESC
  ) AS row_num
  FROM policies
)
SELECT rp.*, pv.quote_number, pv.version_status
FROM ranked_policies rp
LEFT JOIN policy_versions pv
  ON rp.policy_id = pv.policy_id
WHERE rp.row_num = 1;

This gave me one clean, current row per policy, with all related metadata.


Practical Use Cases


  • Generating reports of all active policies

  • Matching policies to their latest quotes, coverages, and events

  • Identifying orphaned policies (e.g., those without a corresponding PAY transaction)

  • Calculating premium adjustments across transactions

  • Cleaning up duplicate entries in staging tables


Takeaways


  • Policies evolve through multiple transactions (ADD, CHG, CNL), making it hard to know which version is the most current.

  • Creation_date alone isn’t always reliable, especially in large datasets or when clients retroactively submit changes.

  • Use ROW_NUMBER() with PARTITION BY policy_number ORDER BY creation_date DESC to rank transactions.

  • Filter where row_num = 1 to get the most recent policy version.

  • Join to policy_versions and other tables after filtering to reduce joins and improve performance.

  • Use row rankings to support comparison logic, calculate premium differences, and maintain clarity across complex joins.

Recent Posts

See All
Join Tables Without Inflating Data

One of the earliest SQL lessons that stuck with me — and one that continues to surface in many data quality investigations — is the...

 
 
Deduplicating Data with SQL

One of the earliest challenges I encountered while working with policy data was understanding why seemingly duplicate records were...

 
 
Tracking Premium Changes Over Time

Premiums don’t always stay fixed after a policy is bound — especially in specialty insurance. As endorsements (change transactions) are...

 
 
bottom of page