Where Real Estate Data Actually Comes From (And How to Use It to Find Sellers Before They List)

Most agents pull a list and start calling without understanding where the data comes from, how fresh it is, or why it matters. This guide maps the entire real estate data supply chain — from county assessor offices to enterprise service bureaus to the agent-facing tools on your laptop — and shows you how to use that knowledge to find sellers before they list.

TL;DR

Every property “lead list” you’ve ever pulled traces back to the same origin: 3,143 county recorder and assessor offices across the United States. Enterprise service bureaus — CoreLogic (now Cotality), ATTOM, and Black Knight (now ICE Mortgage Technology) — aggregate, clean, and standardize that raw data into 150–160 million property records. Agent-facing platforms like PropStream, BatchLeads, and DealMachine license data from those bureaus and layer on filters, skip tracing, and outreach tools. Understanding this chain matters because data freshness, accuracy, and signal depth vary at every layer — and most agents are making decisions based on data they’ve never questioned.

This post explains every layer, names the players, and shows you exactly how to use this knowledge to stack seller-intent signals and reach homeowners 30–90 days before they list.

There’s a question on r/WholesaleRealestate that gets posted about once a month: “PropStream vs. DealMachine vs. BatchLeads — which one has the best data?”

Every response compares features, pricing, and UI. Nobody answers the actual question, and the honest answer is that they all get their data from roughly the same upstream sources. The difference isn’t which platform has “better data.” The difference is what each platform does with the data after licensing it, how frequently they refresh it, and which filters and enrichment layers they build on top.

If you don’t understand where property data comes from, you can’t evaluate whether the list you pulled last Tuesday is actually useful. You can’t tell the difference between a stale skip-trace result and a fresh one. You can’t explain why one platform shows a homeowner with 62% equity and another shows 48% for the same property. And you definitely can’t stack seller-intent signals with confidence, because you don’t know which signals are real-time and which are 6 months old.

This post fixes that. We’re going to walk through the entire data supply chain, from the raw source all the way to the list on your screen, and then show you how to use that understanding to find sellers before any other agent in your market.

Layer 1: The Raw Source — County Government Offices

Every piece of property data in America originates in one of 3,143 county recorder and assessor offices (Levelset, 2026). These offices maintain the legal record of who owns what, what it’s worth for tax purposes, and what’s been filed against it.

The county recorder (sometimes called the register of deeds) documents ownership changes, mortgages, liens, easements, and other legal filings. When you close on a house, the deed gets recorded here. When a lender files a mortgage, it gets recorded here. When the IRS files a tax lien, it gets recorded here. This is the authoritative chain of title.

The county assessor maintains the property characteristics (square footage, lot size, year built, number of units), the assessed value used to calculate property taxes, and the tax payment history. ATTOM’s assessor data alone covers more than 160 million properties across more than 3,000 counties (ATTOM, 2026).

Other county-level offices contribute additional data that matters for signal stacking. The building department maintains permit records — when a homeowner pulls a permit for a roof replacement, HVAC, or renovation, it’s filed here. The county court system maintains probate filings, divorce filings, and foreclosure proceedings. And in many jurisdictions, the zoning and planning department maintains land-use classifications and development activity.

Here’s the critical thing to understand: these 3,143 offices do not operate on a common system. Some have fully digital, API-accessible databases that update weekly. Others still keep partially paper-based records that require in-person visits or state public-records requests. Update frequencies range from real-time to annual. Data formats range from structured SQL databases to scanned PDFs. As BatchData put it: “This fragmented setup across thousands of counties makes collecting and consolidating data a daunting task” (BatchData, 2026).

No individual agent can aggregate data across 3,143 counties on their own. That’s where Layer 2 enters.

Layer 2: The Service Bureaus — CoreLogic, ATTOM, Black Knight

Enterprise service bureaus are the companies that solved the county fragmentation problem. They built the infrastructure to collect, standardize, clean, and deliver property data at national scale. There are three that dominate the U.S. market, and virtually every agent-facing tool you’ve ever used licenses data from one or more of them.

CoreLogic (now Cotality). Founded in 1968, CoreLogic rebranded to Cotality in March 2025 (Cotality, 2025). It maintains the largest U.S. property data repository, covering all 50 states with ownership, tax, mortgage, lien, hazard risk, and geospatial data. Cotality’s primary customers are mortgage lenders, title companies, and insurance carriers — industries where data accuracy is legally required. Their “360 Property Data” product combines structure details, sales history, assessment data, and climate risk overlays into a single profile. If you’ve ever pulled a title report, the data likely ran through CoreLogic at some point in the chain.

ATTOM Data Solutions. ATTOM warehouses data on more than 158 million U.S. properties with over 9,000 unique data attributes and 70+ billion rows of data (ATTOM, 2026). Every record runs through a 20-step standardization and quality control process. ATTOM assigns a persistent “ATTOM ID” to every property — a unique identifier that follows the parcel across datasets, making it possible to link ownership data to permit data to mortgage data to foreclosure data on the same property without manual matching. ATTOM is particularly popular with PropTech companies, fintech platforms, and government agencies that need bulk data at scale. Their specialty datasets include environmental risk, climate risk, neighborhood demographics, and school data.

Black Knight (now ICE Mortgage Technology). Black Knight was acquired by Intercontinental Exchange (ICE) in 2023 and operates as part of ICE Mortgage Technology. Their property database covers 99.9% of the U.S. population. Black Knight leans heavily toward mortgage servicing and origination analytics — delinquency data, prepayment models, collateral risk scores, and automated valuation models (AVMs). While less agent-facing than ATTOM, Black Knight’s data underpins many of the mortgage and foreclosure indicators that flow into the tools agents use.

What these three companies have in common: they all collect raw data from the same 3,143 county offices. The difference is in their coverage depth, update frequency, specialty datasets, enrichment layers, and delivery infrastructure. Cotality is strongest for lending and title workflows. ATTOM is strongest for breadth of attributes and PropTech integrations. ICE/Black Knight is strongest for mortgage servicing intelligence.

None of these companies sell directly to individual real estate agents. Their customers are enterprises, platforms, and government agencies. So how does their data reach you?

Layer 3: The MLS — A Parallel Data System

Before we get to the agent-facing tools, there’s a parallel data system that most agents interact with daily but rarely think about as “data infrastructure”: the Multiple Listing Service.

There are just over 500 MLS systems in the United States, each independently governed by the real estate professionals in its market (Constellation Data Labs, 2026). MLS data is agent-generated: when you take a listing, you enter the property details, photos, price, and showing instructions into your local MLS. As the listing moves through its lifecycle — active, under contract, closed, expired, withdrawn — those status changes are recorded in near real-time.

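MLS data is the best source available for current market activity: what’s for sale right now, what just sold, at what price, and how long it sat. Zillow’s Zestimate, which Zillow reports has a national median error rate of approximately 2.4% for homes currently on the market (and a noticeably higher error rate for off-market homes), is trained on MLS comparable sales data combined with public records (Constellation Data Labs / Built In, 2026). That on-market/off-market gap matters for prospecting, because the homes you’re targeting are, by definition, off-market.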

But MLS data has a structural blind spot that matters enormously for finding sellers: it only covers properties that went through the formal listing process. It does not cover for-sale-by-owner transactions, off-market sales, most new construction presales, distressed sales handled outside traditional brokerage, or — most importantly — properties where the owner is thinking about selling but hasn’t listed yet.

For prospecting, MLS data tells you where sellers have been. Public records data tells you where sellers are going to be. That distinction is the entire foundation of upstream prospecting.

Layer 4: Agent-Facing Platforms — PropStream, BatchLeads, DealMachine

This is the layer most agents interact with. Platforms like PropStream, BatchLeads, and DealMachine license data from the Layer 2 service bureaus, combine it with MLS feeds and their own proprietary enrichment, and package it into a user interface with filters, skip tracing, and outreach tools.

Think of it this way: CoreLogic, ATTOM, and Black Knight are the wholesalers. PropStream, BatchLeads, and DealMachine are the retailers. You’re the end consumer. The data on your screen passed through at least three layers before you saw it.

Here’s what each platform adds on top of the raw data:

PropStream covers 150+ million property records with 165+ filters. Its strength is deep filtering and comp analysis. PropStream also includes MLS data, which most competing platforms don’t have natively. In July 2025, PropStream acquired BatchLeads and BatchDialer, consolidating the property-data space, though the two products still operate on separate subscriptions as of mid-2026 (Jamil Academy, 2026). PropStream’s Lead Automator refreshes saved lists daily so filters stay current. Skip tracing is not included on the base plan — it’s roughly $0.12 per record.

BatchLeads covers 155+ million properties with 140+ filters and differentiates with BatchRank AI — a 5-level distress score that prioritizes homeowners most likely to sell. Skip tracing (phone + email) is included in every plan, not an add-on. BatchLeads is the strongest option for agents who want AI doing the prioritization work so they can go straight to outreach without building complex filter combinations manually.

DealMachine covers 150+ million properties but differentiates as a mobile-first, field-based tool. The original “driving for dollars” app, DealMachine GPS-tracks your route, lets you tap any property to add it as a lead, instantly pulls owner contact info, and can launch a personalized postcard (with the actual property photo) without leaving your car. DealMachine reports 96.5% owner-data accuracy. Skip-trace credits are capped per plan (500 on Starter, 1,000 on Professional).

Feature	PropStream	BatchLeads	DealMachine
Property records	150M+	155M+	150M+
Filters	165+	140+	70+ (700+ data points)
AI distress ranking	Limited	BatchRank (5 levels)	AI Deal Finder
Skip tracing included	No (Pro tier yes)	Yes (all plans)	Capped per plan
MLS comp data	Strongest	Solid	Solid
Native dialer	No (BatchDialer add-on)	Add-on $89/mo	AI dialer included
In-app direct mail	Yes ($0.48–$1.50)	Yes ($0.55–$1.50)	Yes ($0.60–$1.50, with photos)
Realistic monthly cost	$180–$250	$120–$200	$100–$200
Best workflow fit	Desktop research	AI-prioritized outreach	Field / mobile-first

Pricing as of April 2026 per Jamil Academy. Verify current rates on each platform’s website.

The point of this comparison isn’t to pick a winner. It’s to show that all three platforms are drawing from the same upstream data ocean. The “lead list” on your PropStream screen and the “lead list” on your DealMachine screen originate in the same county recorder offices. What differs is the enrichment, the filtering interface, and the workflow built around the data. When an agent says “DealMachine’s data is better than PropStream’s,” what they usually mean is “DealMachine’s workflow fits how I actually work.”

Why Understanding the Supply Chain Makes You a Better Prospector

When you understand where data comes from, you stop making the mistakes that cost most agents months of wasted effort. Here are the five most common data mistakes agents make — and how supply-chain knowledge prevents each one.

Mistake 1: Trusting equity estimates at face value. Your platform shows a homeowner with 62% equity. But where did that number come from? It’s calculated by subtracting the recorded mortgage balance from an estimated current market value. The mortgage balance comes from county recorder data — which reflects the original loan amount and any recorded refinancing, but not monthly principal payments. The market value is an AVM estimate based on comps that may be 3–6 months old. The “62% equity” number is a useful approximation, not a fact. It’s good enough to filter a list. It’s not good enough to quote in a conversation with a homeowner. Always say “based on public records, your equity position appears strong” — not “you have $320K in equity.”
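To make the arithmetic concrete, here is a minimal Python sketch of how an equity percentage like “62%” is typically derived, and how much it moves once you account for principal paydown or AVM error. All inputs are hypothetical, the 4% rate and 30-year term are assumptions (recorder data usually won’t tell you the rate), and the function names are illustrative, not any platform’s API.

```python
# Hypothetical inputs for illustration; none of these numbers come from a real record.


def estimated_equity_pct(avm_value: float, mortgage_balance: float) -> float:
    """Equity as a share of estimated market value: (value - balance) / value."""
    if avm_value <= 0:
        raise ValueError("avm_value must be positive")
    return (avm_value - mortgage_balance) / avm_value


def amortized_balance(principal: float, annual_rate: float, years: int, months_paid: int) -> float:
    """Remaining balance on a fixed-rate, fully amortizing loan after `months_paid` payments.

    Assumes on-time payments, no extra principal, and no unrecorded refinance.
    """
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal * (1 - months_paid / n)
    payment = principal * r / (1 - (1 + r) ** -n)  # standard level monthly payment
    growth = (1 + r) ** months_paid
    return principal * growth - payment * (growth - 1) / r


def equity_band(avm_value: float, mortgage_balance: float, avm_error: float) -> tuple[float, float]:
    """Low/high equity share if the true value sits avm_error (0.05 = 5%) below/above the AVM."""
    low = estimated_equity_pct(avm_value * (1 - avm_error), mortgage_balance)
    high = estimated_equity_pct(avm_value * (1 + avm_error), mortgage_balance)
    return low, high


if __name__ == "__main__":
    avm, recorded_loan = 500_000, 300_000
    naive = estimated_equity_pct(avm, recorded_loan)          # uses the original recorded loan amount
    balance = amortized_balance(recorded_loan, 0.04, 30, 96)  # 8 years of payments at an assumed 4%
    adjusted = estimated_equity_pct(avm, balance)
    low, high = equity_band(avm, recorded_loan, 0.05)
    print(f"naive {naive:.0%}, amortized {adjusted:.0%}, naive band {low:.0%}-{high:.0%}")
```

With these made-up inputs, the recorder-based figure shows 40% equity, eight years of assumed amortization puts it closer to 50%, and a 5% AVM miss alone swings the naive figure by a few points in either direction. That spread is why the number is fine for filtering and not for quoting.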

Mistake 2: Assuming ownership duration is current. Ownership data comes from recorded deeds. In most counties, there’s a lag of days to weeks between a closing and the deed being recorded. In some counties, it’s months. If your platform shows someone has owned for 14 years but the property actually sold 3 weeks ago, you’re mailing the wrong person. Cross-reference against recent MLS sold data before outreach.
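A cross-reference like this is easy to script once you have both exports. The sketch below assumes your lead list and your MLS sold export both include the assessor parcel number under a key called `apn`; that field name and the 120-day window are assumptions, so adjust them to what your exports actually contain and how slowly your county records deeds.

```python
from datetime import date, timedelta


def drop_recently_sold(leads, mls_sold, as_of: date, window_days: int = 120):
    """Remove leads whose parcel closed on the MLS inside the window.

    The recorded deed (and so the owner name on your list) can lag the closing,
    so a recent MLS sale means you may be looking at the previous owner.
    Assumes both datasets carry the same parcel number under the key "apn".
    """
    cutoff = as_of - timedelta(days=window_days)
    recently_sold = {s["apn"] for s in mls_sold if s["close_date"] >= cutoff}
    return [lead for lead in leads if lead["apn"] not in recently_sold]


if __name__ == "__main__":
    leads = [
        {"apn": "101-22-003", "owner": "Owner A", "years_owned": 14},
        {"apn": "101-22-004", "owner": "Owner B", "years_owned": 16},
    ]
    sold = [{"apn": "101-22-004", "close_date": date(2026, 3, 20)}]
    for lead in drop_recently_sold(leads, sold, as_of=date(2026, 4, 10)):
        print(lead["apn"], lead["owner"])
```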

Mistake 3: Treating skip-trace results as a phone book. Skip tracing pulls phone numbers and emails from third-party consumer databases. These databases aggregate data from credit headers, utility records, voter registrations, and marketing databases. They’re good — DealMachine reports 96.5% owner-data accuracy — but “owner data accuracy” means they correctly identified the owner. It doesn’t guarantee the phone number on file is current, working, or the owner’s preferred contact method. Expect 15–25% of skip-traced phone numbers to be disconnected or wrong. That’s not a platform failure. It’s the nature of the data source.
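The 15–25% range translates directly into cost. This small sketch assumes one phone number per traced record (real skip traces often return several) and uses the roughly $0.12-per-record PropStream rate quoted earlier; the function names are hypothetical.

```python
def cost_per_working_number(price_per_record: float, bad_rate: float) -> float:
    """Effective skip-trace cost per usable number.

    Assumes one phone number per traced record; real results often return several.
    """
    if not 0 <= bad_rate < 1:
        raise ValueError("bad_rate must be in [0, 1)")
    return price_per_record / (1 - bad_rate)


def expected_working_numbers(records: int, bad_rate: float) -> int:
    """How many of `records` traced numbers you should expect to actually connect."""
    return round(records * (1 - bad_rate))


if __name__ == "__main__":
    for bad in (0.15, 0.25):
        print(
            f"bad rate {bad:.0%}: "
            f"{expected_working_numbers(500, bad)} working of 500, "
            f"${cost_per_working_number(0.12, bad):.3f} per working number"
        )
```

Under those assumptions, $0.12 per record works out to roughly $0.14–$0.16 per number that actually connects, and 500 traced records yield something like 375–425 working numbers.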

Mistake 4: Ignoring permit data because it’s “hard to find.” Permit data is one of the strongest seller-intent signals available — a homeowner who just pulled a permit for a major renovation is either preparing to sell or investing long-term. But permit data is also one of the most inconsistent data points in the supply chain. It originates in county building departments, which vary wildly in digitization and update frequency. Some counties publish permits online in real-time. Others don’t digitize them at all. The agent-facing platforms that include permit data only have it for the counties that report it digitally. If permit data doesn’t show up in your market on PropStream or BatchLeads, it may still be available directly from your county’s building department website.
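If your building department offers a downloadable permit export, a short script can turn it into a signal. This sketch assumes a CSV with `parcel_id`, `issue_date` (ISO format), and `permit_type` columns and a hand-picked keyword list; every county names these fields differently, so treat all of them as placeholders.

```python
import csv
import io
from datetime import date, datetime, timedelta

# Hypothetical keyword list; match it to the permit categories your county actually uses.
INTENT_KEYWORDS = ("roof", "hvac", "remodel", "renovation")


def recent_intent_permits(rows, as_of: date, days: int = 90, date_format: str = "%Y-%m-%d") -> set[str]:
    """Return parcel IDs with a seller-relevant permit issued in the last `days` days.

    `rows` is any iterable of dicts, e.g. csv.DictReader over a county export.
    Column names "parcel_id", "issue_date", and "permit_type" are assumptions.
    """
    cutoff = as_of - timedelta(days=days)
    hits = set()
    for row in rows:
        issued = datetime.strptime(row["issue_date"], date_format).date()
        permit_type = row["permit_type"].lower()
        if issued >= cutoff and any(k in permit_type for k in INTENT_KEYWORDS):
            hits.add(row["parcel_id"])
    return hits


if __name__ == "__main__":
    sample = io.StringIO(
        "parcel_id,issue_date,permit_type\n"
        "101-22-003,2026-03-02,Roof Replacement\n"
        "101-22-009,2026-03-15,Fence\n"
        "101-23-011,2025-06-01,HVAC Replacement\n"
    )
    print(recent_intent_permits(csv.DictReader(sample), as_of=date(2026, 4, 10)))
```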

Mistake 5: Pulling a list once and working it for months. Property data is a snapshot, not a live feed. Ownership changes, equity positions shift, permits get filed, life events happen. A list you pulled 90 days ago can be materially different from the same list pulled today. The agents getting the best results from signal stacking pull fresh lists weekly or use automated daily refreshes (like PropStream’s Lead Automator) so their outreach always targets the most current set of high-probability sellers.
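One way to see how much a saved list drifts is to diff two pulls of the same filter by parcel number. This is a minimal sketch; the `apn` key and the record shape are assumptions, not any platform’s export format.

```python
def diff_pulls(old_pull, new_pull, key: str = "apn"):
    """Compare two pulls of the same saved filter, keyed by parcel number.

    Returns (added, dropped, changed): parcels that newly match, parcels that no
    longer match, and parcels that still match but whose fields changed.
    """
    old = {row[key]: row for row in old_pull}
    new = {row[key]: row for row in new_pull}
    added = sorted(new.keys() - old.keys())
    dropped = sorted(old.keys() - new.keys())
    changed = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, dropped, changed


if __name__ == "__main__":
    january = [{"apn": "1", "owner": "A"}, {"apn": "2", "owner": "B"}]
    april = [{"apn": "2", "owner": "C"}, {"apn": "3", "owner": "D"}]
    added, dropped, changed = diff_pulls(january, april)
    print("new matches:", added, "no longer match:", dropped, "changed:", changed)
```

Parcel 2 is the case that matters most: it still matches the filter, but the owner changed, so last quarter’s contact info is now wrong.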

How to Signal-Stack When You Understand the Data

Signal stacking — the process of identifying homeowners showing 3–5 seller-intent signals simultaneously — is the core of what’s replacing traditional prospecting. Research across both B2B and real estate applications shows that stacked signals convert at 5–10x the rate of cold outreach (Landbase, 2026).

But not all signals are equal when it comes to data freshness and reliability. Now that you understand the supply chain, here’s how to weight each signal:

Signal	Data source	Typical freshness	Reliability
Ownership > 12 years	County recorder → service bureau → platform	Days to weeks lag after recording	High (deeds are legal records)
Equity > 55%	Assessor (value) + recorder (mortgage) + AVM	Assessor data: annual. AVM: varies by provider	Medium (estimate, not exact balance)
Recent permit activity	County building department	Varies wildly (real-time to never digitized)	High where available; spotty coverage
Absentee owner	Assessor (mailing address ≠ property address)	Updated with assessment cycle (annual)	High (address mismatch is binary)
Life events (probate, divorce, pre-foreclosure)	County court system	Days to weeks after filing	High (court filings are legal records)
Tax delinquency	County treasurer / tax collector	Quarterly to annual	High (nonpayment is a fact)
MLS expired / withdrawn	Local MLS	Same day	Very high (agent-entered, near real-time)

The optimal signal stack combines at least one high-reliability / high-freshness signal (like ownership duration or a court filing) with at least one high-intent signal (like a recent permit or tax delinquency). When three or more of these converge on a single property, you have a high-probability seller. We walk through the exact filter settings and messaging scripts in our data-backed guide to getting listings without cold calling.
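The stacking rule in the previous paragraph can be written as a simple filter. The grouping of signals into “reliable” and “intent” buckets below is an assumption based on the freshness table, the signal names are hypothetical labels, and the three-signal threshold comes from the text; this is a sketch of the rule, not the scoring used in the Seller Signal Method or by any platform.

```python
# Grouping is an assumption drawn from the freshness/reliability table;
# adjust which signals count as "reliable" vs "intent" for your market.
RELIABLE_SIGNALS = {"long_ownership", "court_filing", "mls_expired"}
INTENT_SIGNALS = {"recent_permit", "tax_delinquent"}
# Other signals (e.g. "high_equity", "absentee") still count toward the total.


def is_high_probability(signals: set[str], min_signals: int = 3) -> bool:
    """At least `min_signals` signals, including one reliable and one intent signal."""
    return (
        len(signals) >= min_signals
        and bool(signals & RELIABLE_SIGNALS)
        and bool(signals & INTENT_SIGNALS)
    )


def shortlist(properties: dict[str, set[str]]) -> list[str]:
    """Parcels that pass the rule, most signals first (ties broken by parcel number)."""
    passing = [(apn, sigs) for apn, sigs in properties.items() if is_high_probability(sigs)]
    passing.sort(key=lambda item: (-len(item[1]), item[0]))
    return [apn for apn, _ in passing]


if __name__ == "__main__":
    properties = {
        "101-22-003": {"long_ownership", "high_equity", "recent_permit"},
        "101-22-004": {"long_ownership", "high_equity", "absentee"},  # no intent signal
        "101-22-005": {"court_filing", "tax_delinquent", "high_equity", "absentee"},
        "101-22-006": {"long_ownership", "tax_delinquent"},  # only two signals
    }
    print(shortlist(properties))
```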

Build Your Data Stack: A Practical Guide

Here’s the data stack that produces the best results for listing-focused agents running signal-stacked outreach.

Primary platform (pick one): PropStream, BatchLeads, or DealMachine. This is your daily workhorse for pulling filtered lists, skip tracing, and launching outreach. The decision should be based on your workflow preference: desktop research (PropStream), AI-prioritized lists (BatchLeads), or field-first mobile prospecting (DealMachine). Don’t subscribe to all three. Pick one, master it for 90 days, and evaluate. The agents who run multiple platforms simultaneously tend to master none of them.

Free supplementary sources: Regardless of which platform you choose, supplement it with data you can access directly. Your county assessor’s website gives you ownership data, and your county treasurer or tax collector’s site gives you tax payment status; both are often more current than what the service bureaus have ingested. Your county building department website (if digitized) gives you permit filings, sometimes in real time. Your county court system gives you probate, divorce, and foreclosure filings. USPS change-of-address data is the exception: it isn’t free to query directly, but licensed NCOALink service providers (often your direct-mail vendor) can run your mailing list against it. These sources fill gaps that even the best platform can’t cover in every market.

MLS for expired/withdrawn/days-on-market: Your MLS is the fastest, most reliable source for listing-lifecycle data. Expired listings, withdrawn listings, and properties sitting 60+ days are all visible the moment the status changes. Cross-referencing your signal-stacked list against MLS expired data lets you reach sellers who are both showing intent signals and have already experienced a failed listing attempt — which is the highest-conversion overlap in prospecting. We covered this in depth in our analysis of what none of the 50 agents calling that expired listing actually did.
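The overlap itself is just an intersection between two lists. This sketch assumes your MLS export includes the parcel number (`apn`), a status field using the labels “Expired”, “Withdrawn”, and “Active”, and a days-on-market field (`dom`); real MLS field names and status vocabularies vary, so treat these as placeholders.

```python
def failed_listing_overlap(stacked_apns, mls_records, min_dom: int = 60) -> dict[str, str]:
    """Parcels on your signal-stacked list that also show a failed or stalled MLS listing.

    Status labels ("Expired", "Withdrawn", "Active") and the "apn"/"status"/"dom"
    keys are assumptions; every MLS names these fields differently.
    """
    flagged = {}
    for rec in mls_records:
        if rec["status"] in ("Expired", "Withdrawn"):
            flagged[rec["apn"]] = rec["status"]
        elif rec["status"] == "Active" and rec.get("dom", 0) >= min_dom:
            flagged[rec["apn"]] = f"Active, {rec['dom']} days on market"
    # Keep only parcels that are on the signal-stacked list, in list order.
    return {apn: flagged[apn] for apn in stacked_apns if apn in flagged}


if __name__ == "__main__":
    stacked = ["101-22-003", "101-22-005", "101-22-007"]
    mls = [
        {"apn": "101-22-005", "status": "Expired"},
        {"apn": "101-22-007", "status": "Active", "dom": 75},
        {"apn": "101-22-009", "status": "Withdrawn"},  # not on the stacked list
    ]
    print(failed_listing_overlap(stacked, mls))
```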

Monthly cost for a complete data stack: One agent-facing platform ($100–$250/month, per the realistic costs in the comparison table) + free county sources ($0) + your existing MLS membership ($0 incremental). Total: roughly $100–$250/month. That’s about the cost of one Zillow Premier Agent zip code in most markets, but instead of buying shared leads that every agent in town is also buying, you’re building a proprietary pipeline of high-probability sellers that nobody else is reaching. For the full CPA comparison across every major lead source, see our ranking of the 10 best lead generation companies by cost per closed deal.

Where the Data Is Going: AI, AVMs, and Predictive Scoring

The data supply chain described above has been relatively stable for two decades. What’s changing rapidly is what gets built on top of it.

Automated Valuation Models (AVMs) are becoming standard in lending. In 2024, lenders used AVMs or Property Condition Reports on 35% of home equity loans, a 20-percentage-point year-over-year increase (Constellation Data Labs / CSS, 2026). In June 2024, six federal agencies issued a final rule implementing quality-control standards for AVMs used in mortgage lending, effective October 2025. The implication for agents: as lender-grade AVMs come under stricter quality control, AVM-based equity estimates should become more accurate over time, which makes equity-based signal stacking more reliable.

BatchLeads’ BatchRank AI is an early mover in AI-powered predictive seller scoring. Instead of requiring agents to set filter criteria manually, predictive models analyze hundreds of data points simultaneously to generate a likelihood-to-sell score. Nearly 60% of the predictive power in modern real estate models now comes from nontraditional (alternative or enriched) data sources rather than raw public records (BatchData, 2026). This means the platforms that invest most in enrichment, not just raw data, are likely to increasingly outperform those that simply deliver county records behind a filter interface.

The agents who understand the data supply chain will be able to evaluate these evolving tools critically instead of accepting marketing claims at face value. When a new platform promises “AI-powered seller prediction with 90% accuracy,” you’ll know to ask: accuracy measured how? Against what ground truth? Using what data freshness? Trained on what signal set? That kind of informed evaluation is the difference between an agent who makes data-driven decisions and one who just buys the shiniest tool.
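The “accuracy measured how?” question matters because listings are rare events. This sketch, using a hypothetical farm of 1,000 homes where 50 actually list, shows that a model which predicts nobody will ever sell still scores 95% accuracy. Precision (how many flagged homes actually listed) and recall (how many actual sellers were flagged) are the numbers worth asking a vendor for; the function and field names here are illustrative.

```python
def evaluate(predicted: list[bool], listed: list[bool]) -> dict[str, float]:
    """Score a likely-to-sell prediction against what actually happened.

    `listed` is the ground truth: did the home actually list within the
    prediction window (for example, 12 months)?
    """
    if len(predicted) != len(listed) or not predicted:
        raise ValueError("need two equal-length, non-empty lists")
    tp = sum(p and a for p, a in zip(predicted, listed))      # flagged and listed
    fp = sum(p and not a for p, a in zip(predicted, listed))  # flagged, didn't list
    fn = sum(a and not p for p, a in zip(predicted, listed))  # listed, never flagged
    tn = len(predicted) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(predicted),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "base_rate": (tp + fn) / len(predicted),
    }


if __name__ == "__main__":
    # Hypothetical farm: 1,000 homes, 50 of which actually listed within the window.
    listed = [True] * 50 + [False] * 950
    nobody_sells = [False] * 1000  # a "model" that never predicts a sale
    print(evaluate(nobody_sells, listed))  # 95% accurate, finds zero sellers
```

A headline accuracy number is only meaningful next to the base rate; a model that flags 100 homes and catches 40 of the 50 sellers has lower accuracy (93%) but is far more useful than one that flags nobody.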

Turn Data Into Listing Appointments

The Seller Signal Method gives you the exact filters, messaging scripts, and reply-to-appointment playbook to turn raw property data into a pipeline of high-probability sellers — without cold calling, without shared leads, and without guessing. It’s $27 with a “book 3+ listing appointments in 30 days or your money back” guarantee.

Get the Seller Signal Method →

Frequently Asked Questions

Where do PropStream, BatchLeads, and DealMachine get their property data?

All three platforms license property data from enterprise service bureaus — primarily CoreLogic (now Cotality), ATTOM Data Solutions, and ICE Mortgage Technology (formerly Black Knight). These service bureaus aggregate data from 3,143 county recorder and assessor offices across the United States, standardize it, and deliver it via APIs and bulk feeds. The platforms then add their own layers: filters, skip tracing, AI scoring, and outreach tools. The raw underlying data is similar across platforms. The difference is in enrichment, filtering depth, update frequency, and workflow design.

How fresh is the property data in agent-facing platforms?

It depends on the data type and the county. Deed recordings typically appear in platforms within days to weeks of the actual transfer. Tax assessment data updates annually in most counties. Permit data freshness varies from real-time (in digitized counties) to unavailable (in counties that don’t publish digitally). MLS data (where included) updates in near real-time. Skip-trace phone numbers come from third-party consumer databases that update on varying schedules. As a general rule, assume a 1–4 week lag on most property data, with equity estimates carrying the widest margin of uncertainty because they rely on AVMs.

What is CoreLogic / Cotality and why does it matter?

CoreLogic, which rebranded to Cotality in March 2025, is the largest U.S. property data service bureau. Founded in 1968, it maintains comprehensive property records covering all 50 states, including ownership, tax, mortgage, lien, and hazard data. Cotality’s primary customers are mortgage lenders, title companies, and insurance carriers. Most agent-facing real estate platforms license at least some of their data from Cotality. Understanding Cotality helps you understand why the data on your screen looks the way it does and what its limitations are.

Do I need to pay for a data platform to signal-stack?

You can signal-stack manually using free county sources: assessor and treasurer websites for ownership and tax data, building department sites for permits, and court systems for probate and foreclosure filings, plus USPS change-of-address data run through a licensed NCOALink provider. The limitation is time. Manually cross-referencing five data sources across even 200 homes takes hours. A $100–$250/month platform automates the cross-referencing and lets you pull a filtered list in minutes. For most producing agents, the time savings justify the subscription many times over.

Which platform should I choose: PropStream, BatchLeads, or DealMachine?

Match the platform to your actual workflow, not to a feature checklist. If you spend Sunday mornings building research-heavy target lists at your desk, PropStream’s 165+ filters and MLS comp data will serve you best. If you want to skip the research phase and start outreaching the highest-intent leads immediately, BatchLeads’ AI distress ranking and included skip tracing is the most efficient path. If you prospect by physically driving neighborhoods and want to launch personalized mail from your phone, DealMachine is purpose-built for that. Do not subscribe to all three. Pick one, run it for 90 days, and evaluate.

What’s the difference between MLS data and public records data?

MLS data is agent-generated and covers properties currently or recently listed for sale. It’s the best source for current market pricing, comparable sales, and listing lifecycle data (active, pending, sold, expired). Public records data is government-maintained and covers every property in the country regardless of listing status. It’s the authoritative source for ownership, mortgage debt, tax history, liens, and court filings. For finding sellers before they list, public records are the primary source because they reveal intent signals on properties that aren’t yet in the MLS. The strongest prospecting approach combines both: public records for identifying high-probability sellers, and MLS data for timing your outreach around listing lifecycles.