Schema design

If indexing is the GPS for your data, then schema design is the city layout. You can have the best GPS in the world, but if the streets don’t make sense, you’re still going to get stuck in traffic. Good MongoDB schema design starts with one simple question: how does your app actually use this data? Unlike SQL databases, where you normalize everything into tidy tables and join them back together, MongoDB wants you to model data around your application’s access patterns. Data that’s read together should live together. It’s a fundamentally different mindset — and once it clicks, it’s incredibly powerful.

Start with workload, not tables

Before you design a single collection, sit down and list the operations your app performs most often:

What screens or API endpoints hit the database hardest?
Which fields are filtered, sorted, or grouped?
Which data is updated together?
Which relationships are one-to-one, one-to-many, or many-to-many?
Which queries absolutely must be fast at scale?

This is your workload map, and it should drive every schema decision you make. Start here, then map relationships, then apply design patterns and indexes. In that order.

The core principle: store together what you read together

This is the single most important rule in MongoDB schema design. If your app always shows an order with its line items, store the line items inside the order document. If your dashboard always shows a user with their recent activity, embed the activity. Fewer round trips, simpler queries, faster reads.

The big decision: embed vs. reference

Every relationship in your schema comes down to this choice: do you embed the related data inside the document, or do you store it in a separate collection with a reference (like a foreign key)?

Embed when…

Embedding is usually the winner when:

Data is read together — The child data is always fetched alongside the parent.
Ownership is clear — The child belongs to one parent, not many.
The array is bounded — You know it won’t grow to thousands of entries.
You want speed — One read instead of two. No joins needed.

{
  _id: ObjectId("..."),
  name: "Acme Corp",
  billing: { plan: "pro", renewalDate: ISODate("2026-10-01") },
  contacts: [
    { name: "Jane", email: "jane@acme.com", role: "admin" },
    { name: "Tom", email: "tom@acme.com", role: "finance" }
  ]
}

Reference when…

References are the better choice when:

Data is large or unbounded — You can’t predict how big it’ll get.
Children are queried independently — You need to search them on their own.
Many-to-many relationships — The same child belongs to multiple parents.
Updates are frequent — The child data changes often and independently.

// accounts
{
  _id: ObjectId("..."),
  name: "Acme Corp",
  primaryContactId: ObjectId("...")
}

// contacts
{
  _id: ObjectId("..."),
  accountId: ObjectId("..."),
  name: "Jane",
  email: "jane@acme.com"
}
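
The cost of referencing is the second read. Here’s a minimal sketch of that application-side join, using plain in-memory arrays as stand-ins for the two collections; the helper name attachContacts and the string ids are assumptions for illustration only.

```javascript
// Stand-ins for the accounts and contacts collections (string ids instead of ObjectIds).
const accounts = [{ _id: "a1", name: "Acme Corp", primaryContactId: "c1" }];
const contacts = [
  { _id: "c1", accountId: "a1", name: "Jane", email: "jane@acme.com" },
  { _id: "c2", accountId: "a1", name: "Tom", email: "tom@acme.com" },
  { _id: "c3", accountId: "a2", name: "Ann", email: "ann@example.com" }
];

// Hypothetical helper: resolve the reference the way a second query would.
function attachContacts(account, allContacts) {
  const own = allContacts.filter((c) => c.accountId === account._id);
  return { ...account, contacts: own };
}

const view = attachContacts(accounts[0], contacts);
console.log(view.contacts.map((c) => c.name)); // [ 'Jane', 'Tom' ]
```

Against a real database, the same step would be a second find on contacts filtered by accountId, or a $lookup stage in an aggregation.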

Design around your hottest queries

Your schema should make your most important queries cheap and simple. If your app mainly does this:

db.orders.find(
  { tenantId: "t1", status: "open" }
).sort({ createdAt: -1 })

then your schema and indexes should be optimized for exactly that. If instead your app mostly aggregates monthly revenue by customer, the schema might need to look very different. Schema design isn’t about theoretical purity — it’s about making your real workloads fast.
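
For the query above, “optimized” mostly means a compound index whose keys follow the query shape. This is a rough sketch, not a definitive recipe: the helper esrIndexKeys is hypothetical and simply orders keys as equality fields first, then sort fields, then range fields.

```javascript
// Hypothetical helper: order index keys as Equality, then Sort, then Range.
function esrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) keys[field] = 1;
  return keys;
}

const orderIndex = esrIndexKeys({
  equality: ["tenantId", "status"],
  sort: { createdAt: -1 }
});
console.log(JSON.stringify(orderIndex));
// {"tenantId":1,"status":1,"createdAt":-1}

// In mongosh, assuming the orders collection from the query above:
// db.orders.createIndex(orderIndex)
```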

Keep documents practical, not just flexible

MongoDB’s flexible schema is a superpower, but “flexible” doesn’t mean “shapeless.” An unstructured schema is like a closet where you just throw everything in — technically it works, but good luck finding anything. Rules to live by:

Keep field names and types consistent. Don’t store age as a string in one document and a number in another.
Avoid multiple representations of the same concept.
Don’t store numbers as strings (you’d be surprised how often this happens).
Only use polymorphic documents when the use case genuinely requires it.

If a field is sometimes a string, sometimes an array, and sometimes missing entirely, everything from filtering to indexing to analytics becomes a nightmare.
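
One way to catch this early is to audit a sample of documents for fields that show up with more than one type. The sketch below does that in plain JavaScript on an in-memory sample; findMixedTypes is a hypothetical helper, and it only looks at top-level fields.

```javascript
// Hypothetical helper: report every top-level field that appears with more than one type.
function findMixedTypes(docs) {
  const seen = new Map(); // field name -> Set of type labels
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      const type = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
      if (!seen.has(field)) seen.set(field, new Set());
      seen.get(field).add(type);
    }
  }
  const mixed = {};
  for (const [field, types] of seen) {
    if (types.size > 1) mixed[field] = [...types].sort();
  }
  return mixed;
}

// Made-up sample documents for illustration.
const sample = [
  { name: "Ann", age: 34, tags: ["a"] },
  { name: "Bob", age: "41", tags: "b" },
  { name: "Cy", age: 29 }
];
console.log(findMixedTypes(sample));
// { age: [ 'number', 'string' ], tags: [ 'array', 'string' ] }
```

Inside the database, grouping on the $type aggregation operator gives a similar report without pulling documents into the application.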

Schema validation: guardrails for your data

MongoDB lets you set validation rules on collections to enforce field types, required fields, and allowed values. Think of it as setting up guardrails on a highway — the data can still flow freely, but it can’t fly off the road.

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "status"],
      properties: {
        email: { bsonType: "string" },
        status: { enum: ["active", "invited", "disabled"] },
        createdAt: { bsonType: "date" }
      }
    }
  }
})

This is especially valuable when multiple services or developers write to the same database. Without validation, schema drift is a matter of when, not if.
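
Validation can also be attached to a collection that already exists, using the collMod command. The sketch below only builds the command document; buildValidatorCommand is a hypothetical helper, and starting in "warn" mode is an assumption about how you might roll this out, not a requirement.

```javascript
// Same rules as the createCollection example above.
const userSchema = {
  bsonType: "object",
  required: ["email", "status"],
  properties: {
    email: { bsonType: "string" },
    status: { enum: ["active", "invited", "disabled"] },
    createdAt: { bsonType: "date" }
  }
};

// Hypothetical helper: build a collMod command that attaches the validator.
// "warn" logs violations instead of rejecting writes; "error" rejects them.
function buildValidatorCommand(collection, schema, action = "warn") {
  return {
    collMod: collection,
    validator: { $jsonSchema: schema },
    validationLevel: "moderate",
    validationAction: action
  };
}

const cmd = buildValidatorCommand("users", userSchema);
console.log(cmd.validationAction); // warn

// In mongosh: db.runCommand(cmd)
```

Once the warnings stop showing up in the logs, the same helper with "error" as the third argument produces the strict version.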

Denormalization: it’s a feature, not a flaw

In the relational world, duplicating data is a cardinal sin. In MongoDB, it’s often the right tradeoff. If duplicating a customer name inside every order document means your order list page loads in one read instead of two, that’s a win. Common examples of smart denormalization:

customerName inside orders
productSummary inside line items
latestStatus on a parent document

The real question isn’t “is this duplicated?” It’s “does this duplication make the important reads faster, and can I keep it consistent enough for my use case?”
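
“Consistent enough” usually means a fan-out update when the source value changes. Here’s a minimal sketch, assuming orders embed a customer summary with customerId and name fields; both the field layout and the helper buildNameFanout are assumptions for illustration.

```javascript
// Hypothetical helper: when a customer is renamed, build the update that
// refreshes the denormalized copy inside that customer's orders.
function buildNameFanout(customerId, newName) {
  return {
    filter: { "customer.customerId": customerId },
    update: { $set: { "customer.name": newName } }
  };
}

const fanout = buildNameFanout("cust-42", "Jane Smith-Lee");
console.log(JSON.stringify(fanout.filter)); // {"customer.customerId":"cust-42"}

// In mongosh, assuming an orders collection with that embedded summary:
// db.orders.updateMany(fanout.filter, fanout.update)
```

If stale names on old orders are acceptable (or even desirable, as a record of what the customer was called at the time), you can skip the fan-out entirely.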

Design patterns: the cheat codes

MongoDB has official schema design patterns for problems that come up again and again. These are worth knowing because they solve real-world pain more cleanly than ad hoc approaches.

The attribute pattern

Perfect for documents with lots of semi-dynamic key-value attributes (think product specs, feature flags, or metadata). Instead of this:

{ name: "T-shirt", color: "blue", size: "L", material: "cotton" }

use this:

{
  name: "T-shirt",
  specs: [
    { k: "color", v: "blue" },
    { k: "size", v: "L" },
    { k: "material", v: "cotton" }
  ]
}

Now a single compound index on specs.k and specs.v covers every attribute, and one query shape (an $elemMatch on k and v) works for any of them.
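
As a rough sketch of the conversion and the matching query filter: both helpers below (toAttributePattern and specFilter) are hypothetical names, and the products collection in the comments is an assumption.

```javascript
// Hypothetical helper: move the listed fields into a specs array of { k, v } pairs.
function toAttributePattern(doc, attributeFields) {
  const rest = {};
  const specs = [];
  for (const [field, value] of Object.entries(doc)) {
    if (attributeFields.includes(field)) specs.push({ k: field, v: value });
    else rest[field] = value;
  }
  return { ...rest, specs };
}

// Hypothetical helper: $elemMatch keeps k and v matched on the same array element.
function specFilter(k, v) {
  return { specs: { $elemMatch: { k, v } } };
}

const shirt = toAttributePattern(
  { name: "T-shirt", color: "blue", size: "L", material: "cotton" },
  ["color", "size", "material"]
);
console.log(shirt.specs.length); // 3
console.log(JSON.stringify(specFilter("color", "blue")));
// {"specs":{"$elemMatch":{"k":"color","v":"blue"}}}

// In mongosh, assuming a products collection:
// db.products.createIndex({ "specs.k": 1, "specs.v": 1 })
// db.products.find(specFilter("color", "blue"))
```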

The bucket pattern

For high-volume time-series data (logs, telemetry, metrics), storing one document per event creates millions of tiny documents. Instead, group them into time-based “buckets”:

{
  sensorId: "A1",
  bucketStart: ISODate("2026-03-01T10:00:00Z"),
  readings: [
    { ts: ISODate("2026-03-01T10:00:01Z"), temp: 21.1 },
    { ts: ISODate("2026-03-01T10:00:02Z"), temp: 21.2 }
  ]
}

Fewer documents mean smaller indexes, and a time-range query reads a handful of buckets instead of thousands of individual events.
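
Writing into buckets is typically a single upsert per reading. The sketch below assumes hourly buckets and adds a count field that isn’t in the document above; the helper names and the readings collection are also assumptions.

```javascript
// Hypothetical helper: truncate a timestamp to the start of its UTC hour.
function bucketStartFor(ts) {
  const start = new Date(ts.getTime());
  start.setUTCMinutes(0, 0, 0);
  return start;
}

// Hypothetical helper: build an upsert that appends a reading to its hourly bucket.
// The count field is an assumption; it makes bucket sizes cheap to inspect.
function buildBucketUpsert(sensorId, reading) {
  return {
    filter: { sensorId, bucketStart: bucketStartFor(reading.ts) },
    update: { $push: { readings: reading }, $inc: { count: 1 } },
    options: { upsert: true }
  };
}

const op = buildBucketUpsert("A1", { ts: new Date("2026-03-01T10:00:01Z"), temp: 21.1 });
console.log(op.filter.bucketStart.toISOString()); // 2026-03-01T10:00:00.000Z

// In mongosh, assuming a readings collection:
// db.readings.updateOne(op.filter, op.update, op.options)
```

The first reading of each hour creates the bucket, and later readings in the same hour land in the same document.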

The outlier pattern

When 99% of your documents are modest in size but 1% are enormous (users with millions of activity records, products with 50,000 reviews), isolate the excess into a separate collection. The common case stays fast and compact.
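
A minimal sketch of that split, assuming a positive per-document limit and a hasOverflow flag on the main document so readers know there’s more to fetch. The helper splitOutlier, the flag, and the parentId field are all hypothetical names.

```javascript
// Hypothetical helper: keep at most `limit` entries embedded and move the rest
// into overflow documents destined for a separate collection.
function splitOutlier(doc, arrayField, limit) {
  const items = doc[arrayField] || [];
  if (items.length <= limit) return { main: doc, overflow: [] };
  const main = { ...doc, [arrayField]: items.slice(0, limit), hasOverflow: true };
  const overflow = [];
  for (let i = limit; i < items.length; i += limit) {
    overflow.push({ parentId: doc._id, [arrayField]: items.slice(i, i + limit) });
  }
  return { main, overflow };
}

// Made-up product with five reviews and a limit of two per document.
const product = { _id: "p1", name: "Mug", reviews: ["r1", "r2", "r3", "r4", "r5"] };
const split = splitOutlier(product, "reviews", 2);
console.log(split.main.reviews.length, split.main.hasOverflow, split.overflow.length); // 2 true 2
```

Typical documents never set the flag, so the common read path stays a single fetch.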

Versioning and history patterns

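Need to preserve historical states or audit older versions? Versioning patterns typically keep the current revision in the main collection and move older revisions into a separate history collection, so you can track changes over time without bloating your active documents. Useful for compliance, auditing, and “undo” features.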
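
One possible shape for this, sketched in plain JavaScript: each change produces the next current document plus a record of the old state for a history collection. The helper reviseDocument and the version, docId, and archivedAt fields are assumptions, not a prescribed layout.

```javascript
// Hypothetical helper: produce the next current document plus a history record of the old state.
function reviseDocument(current, changes, now = new Date()) {
  const archived = { ...current, docId: current._id, archivedAt: now };
  delete archived._id; // the history collection would assign its own _id
  const next = { ...current, ...changes, version: (current.version || 1) + 1 };
  return { next, archived };
}

// Made-up policy document for illustration.
const policy = { _id: "pol-1", version: 1, holder: "Jane", limit: 1000 };
const revision = reviseDocument(policy, { limit: 2500 });
console.log(revision.next.version, revision.next.limit); // 2 2500
console.log(revision.archived.version, revision.archived.limit); // 1 1000
```

In a real system the two results would be written to two collections, ideally in a way that can’t leave one without the other.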

The unbounded array trap

This is one of the most common MongoDB mistakes: embedding an array that grows without limit. It starts small and innocent, then one day you’ve got documents with 50,000 entries that are slow to read, expensive to update, hard to index well, and creeping toward MongoDB’s 16 MB document size limit. Good candidates for extracting or bucketing instead of endlessly embedding:

Audit trails
Event histories
Comments
Large membership lists
Telemetry points

Bounded arrays like topTags, recentLogins, or primaryContacts (where you cap the size) are usually safe. Unbounded ones are ticking time bombs.
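
Capping is something you can do in the update itself, using $push with the $each and $slice modifiers. A minimal sketch follows; buildCappedPush, the login fields, and the users collection are assumptions for illustration.

```javascript
// Hypothetical helper: build an update that appends one entry and keeps only the newest `max`.
function buildCappedPush(field, value, max) {
  // A negative $slice keeps the last `max` elements of the array.
  return { $push: { [field]: { $each: [value], $slice: -max } } };
}

const capUpdate = buildCappedPush(
  "recentLogins",
  { at: new Date("2026-03-14T09:30:00Z"), ip: "203.0.113.7" },
  10
);
console.log(capUpdate.$push.recentLogins.$slice); // -10

// In mongosh, assuming a users collection and a known userId:
// db.users.updateOne({ _id: userId }, capUpdate)
```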

Indexes should influence your schema

Schema design and index design are tightly linked — they’re two sides of the same coin. A schema that’s easy to index well has:

Stable, consistently typed filter fields: Not sometimes a string, sometimes a number.
Predictable nested structures: So compound indexes on nested fields actually work.
Common queries that map naturally to compound indexes: The ESR (Equality, Sort, Range) rule applies here too.

Pro-tip: Don’t index entire embedded documents. Index specific nested fields instead:

// Good — targets the fields you actually filter on
db.users.createIndex({ "profile.country": 1, "profile.city": 1 })

Anti-patterns: the “don’t do this” list

Designing like a relational schema by default: If every relationship becomes a separate collection plus $lookup, you’re giving up MongoDB’s biggest advantage, which is storing together what you read together.
Unbounded arrays: They grow, they slow down, and they make everything harder. Use the bucket or outlier pattern when growth isn’t naturally bounded.
Inconsistent field types: If status is a string in half your documents and a number in the other half, equality filters silently miss documents and sorts order the two types separately.
Overusing wildcard indexes: Wildcard indexes are handy for genuinely unpredictable schemas, but for known query patterns a targeted index is usually more efficient.
Not planning for data lifecycle: If data becomes cold, archival, or disposable over time, bake that into your schema early. Don’t wait until you have 500 million documents to think about it.
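
For the data lifecycle point, one option for disposable data is a TTL index, which removes documents some time after a date field. The sketch below only builds the index definition; ttlIndex is a hypothetical helper, and the sessions collection with a createdAt field is an assumption.

```javascript
// Hypothetical helper: describe a TTL index that expires documents `days` after the date in `field`.
function ttlIndex(field, days) {
  return {
    keys: { [field]: 1 },
    options: { expireAfterSeconds: days * 24 * 60 * 60 }
  };
}

const sessionTtl = ttlIndex("createdAt", 30);
console.log(sessionTtl.options.expireAfterSeconds); // 2592000

// In mongosh, assuming a sessions collection with a createdAt date field:
// db.sessions.createIndex(sessionTtl.keys, sessionTtl.options)
```

This only works if the schema has a consistently typed date field to expire on, which is why it pays to plan for it early.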

A practical schema design workflow

List the top queries, writes, and reports.
Identify which entities are read together most often.
Decide where embedding makes reads simpler.
Split out data that’s large, shared, or unbounded.
Add validation rules for important collections.
Create indexes for the real query shapes.
Review whether any official design pattern fits better.
Test against realistic data volumes, not just a handful of sample documents.

Example: a well-designed order schema

Here’s what a practical order document might look like:

{
  _id: ObjectId("..."),
  tenantId: "t1",
  orderNumber: "ORD-10042",
  customer: {
    customerId: ObjectId("..."),
    name: "Jane Smith",
    email: "jane@example.com"
  },
  status: "open",
  createdAt: ISODate("2026-03-14T09:30:00Z"),
  items: [
    {
      productId: ObjectId("..."),
      sku: "TSHIRT-BL-L",
      name: "T-Shirt Blue Large",
      qty: 2,
      unitPrice: 39.00
    }
  ],
  totals: {
    subtotal: 78.00,
    tax: 7.80,
    grandTotal: 85.80
  }
}

Why this works:

The order and its line items are read together → embedded.
Customer summary is denormalized for fast display → no $lookup needed for the order list.
Totals are precomputed → easy sorting and reporting.
The schema supports the hottest queries (by tenant, status, and date) naturally.

When to change the schema

You should revisit your schema design when:

Your app increasingly relies on $lookup just to render normal screens.
Common queries scan far more data than they return.
Arrays are growing without bound.
Fields have become inconsistent across documents.
Your indexes feel awkward because the document shape is fighting the workload.
New features keep requiring painful workarounds.

MongoDB’s flexibility means you can evolve iteratively, but changing a production schema at scale is still hard. Revisit the design before the pain compounds.
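
To put a number on “scans far more data than they return”, you can compare two fields from an explain("executionStats") result. A minimal sketch follows, with made-up numbers; scanRatio is a hypothetical helper, and what counts as “too high” depends on your workload.

```javascript
// Hypothetical helper: how many documents were examined per document returned.
function scanRatio(executionStats) {
  const { totalDocsExamined, nReturned } = executionStats;
  if (nReturned === 0) return totalDocsExamined === 0 ? 0 : Infinity;
  return totalDocsExamined / nReturned;
}

// Sample numbers for illustration only, not real measurements.
console.log(scanRatio({ totalDocsExamined: 50000, nReturned: 25 })); // 2000
console.log(scanRatio({ totalDocsExamined: 25, nReturned: 25 })); // 1

// In mongosh, assuming an orders collection:
// const stats = db.orders.find({ tenantId: "t1", status: "open" })
//   .explain("executionStats").executionStats
// scanRatio(stats)
```

A ratio close to 1 means the index and document shape fit the query; a ratio in the thousands is a sign the schema or index is fighting the workload.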

Summary

The best MongoDB schema isn’t the most normalized one or the most flexible one — it’s the one that makes your important reads, writes, and maintenance tasks simple and efficient. Start with your workload, embed what’s read together, reference what’s independent, and use proven patterns for the tricky stuff. With tools like Spanna Pro to help you inspect document shapes, analyze field consistency, and identify schema issues, you’ll build schemas that scale smoothly from prototype to production.
