top of page

Join Tables Without Inflating Data

  • Writer: Elisha Antunes
    Elisha Antunes
  • Jun 7
  • 3 min read

Updated: Jun 22

One of the earliest SQL lessons that stuck with me — and one that continues to surface in many data quality investigations — is the importance of managing joins carefully. Especially in insurance data, where you’re often working with one-to-many relationships, it's easy to inflate totals or duplicate rows without realizing it.


In this post, I’ll walk through a real-world scenario I encountered while reporting on policy premiums. Everything looked fine until I joined the policies table to quote_events and quote_event_venues, and suddenly my total premiums were triple what they should have been.



The Problem: Multiplying Rows With Naive Joins

Let’s say you’re working with these tables:

  • policies: each row represents a bound transaction (ADD, CHG, CNL)

  • quote_events: one policy may have multiple events

  • quote_event_venues: one event may have multiple venues


At first glance, a simple join across these tables seems fine:

SELECT p.policy_number, p.premium, v.venue_name
FROM policies p
JOIN quote_events e ON p.policy_id = e.policy_id
JOIN quote_event_venues v ON e.event_id = v.event_id;

This query works — but it multiplies the number of rows. If a policy has two events and each event has three venues, you get six rows for one policy. That’s fine if you’re looking at venue-level data, but if you're aggregating premium, you've got a serious problem.


The Symptom: Inflated Premium Totals

Let’s say you try to calculate total premium per policy:

SELECT p.policy_number, SUM(p.premium)
FROM policies p
JOIN quote_events e ON p.policy_id = e.policy_id
JOIN quote_event_venues v ON e.event_id = v.event_id
GROUP BY p.policy_number;

Suddenly, your numbers are way higher than expected. You start second-guessing the data itself. But the truth is — your join is the problem.


Another related problem is the assumption that every policy’s premium is correctly represented by a single row. But in systems where ADD and CHG transactions both contribute to the policy history, a simple SUM(premium) can mislead.


If a policy had an initial ADD transaction for $500 and a subsequent CHG that raised it to $800, and both rows are included in the aggregation, the result could show $1,300 instead of the final premium of $800.


The Fix: Aggregate Before You Join


The solution is often to reduce the granularity before performing joins — especially if you don’t actually need row-level detail from the child tables.


Here’s a safer approach using subqueries or CTEs:

WITH venue_summary AS (
  SELECT e.policy_id,
         COUNT(DISTINCT v.venue_name) AS venue_count
  FROM quote_events e
  JOIN quote_event_venues v ON e.event_id = v.event_id
  GROUP BY e.policy_id
),
latest_premiums AS (
  SELECT policy_id, policy_number, premium
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY policy_id ORDER BY creation_date DESC) AS rn
    FROM policies
  ) t
  WHERE rn = 1
)
SELECT lp.policy_number, lp.premium, vs.venue_count
FROM latest_premiums lp
LEFT JOIN venue_summary vs ON lp.policy_id = vs.policy_id;

This structure ensures that we’re only pulling the most recent version of each policy’s premium, avoiding overcounting due to multiple transactions.


Here is an example of what this would look like:

policy_number

premium

venue_count

ABC100001

101

2

ABC100002

424

1

ABC100003

182

3

ABC100004

568

0

ABC100005

530

1

Practical Use Cases

  • Generate accurate reports on earned premiums

  • Reconcile submission volumes against actual bound policies

  • Pinpoint which CHG transactions affected policy financials

  • Prevent inflated row counts in exports or downstream reports


Key Takeaways

This is a common trap — not because the SQL is wrong, but because the data grain shifts quietly. One-to-many joins can multiply rows without warning, especially in complex policy data.

  • Be mindful of row multiplication when joining to detail tables

  • Control the data grain before aggregating

  • Use CTEs or subqueries to summarize before joining

  • Validate the joins by testing against a known single record


Recent Posts

See All
Deduplicating Data with SQL

One of the earliest challenges I encountered while working with policy data was understanding why seemingly duplicate records were...

 
 
Tracking Premium Changes Over Time

Premiums don’t always stay fixed after a policy is bound — especially in specialty insurance. As endorsements (change transactions) are...

 
 
bottom of page