Database Design Fundamentals for Web Developers

Database design is often treated as an afterthought in web development. We create tables, write queries, and move on. But I’ve seen applications collapse under load because of poor database design. After fixing data models that served millions of users, I can tell you: the time invested in proper design pays dividends.

My Experience

We inherited a database that was a nightmare. Tables with no foreign keys. The same data stored in multiple places. A user table with 47 columns. Queries that took 30 seconds.

The worst part: it was impossible to fix without a complete rewrite. The application code had adapted to every anti-pattern.

We spent four months redesigning the database, migrating data, and updating the application. The result: queries that took seconds now took milliseconds. But the lesson was clear: design matters from day one.

Here’s what you need to know.

Normalization: The Foundation

First Normal Form (1NF)

Each column contains atomic values—no lists or arrays:

-- Bad: Repeating groups
CREATE TABLE orders_bad (
  id INT PRIMARY KEY,
  customer_name VARCHAR(100),
  items VARCHAR(500)  -- 'item1,item2,item3'
);

-- Good: Atomic values
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_name VARCHAR(100)
);

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  item_name VARCHAR(100),
  quantity INT
);

Second Normal Form (2NF)

No partial dependencies—non-key columns depend on the entire primary key:

-- Bad: Partial dependency
CREATE TABLE order_items_bad (
  order_id INT,
  product_name VARCHAR(100),  -- Depends only on product_id
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

-- Good: Separate entity
CREATE TABLE products (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)

No transitive dependencies—non-key columns depend only on the primary key:

-- Bad: Transitive dependency
CREATE TABLE orders_bad (
  id INT PRIMARY KEY,
  customer_id INT,
  customer_name VARCHAR(100),  -- Depends on customer_id, not order id
  customer_email VARCHAR(100)
);

-- Good: Separate table
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100)
);

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT REFERENCES customers(id)
);

Indexing Strategies

Indexes are essential for performance—but they come with trade-offs:

When to Index

Scenario	Index Type	Example
Primary key lookup	Primary	`id`
Search by name	B-Tree	`CREATE INDEX idx_name ON users(name)`
Range queries	B-Tree	`WHERE price > 100`
Exact matches	Hash	`WHERE status = 'active'`
Full-text search	Full-text	`MATCH(content) AGAINST('query')`
Geographic data	Spatial	ST_Distance_Sphere

Composite Indexes

-- Index for queries like: WHERE status = 'active' AND created_at > '2025-01-01'
CREATE INDEX idx_status_date ON orders(status, created_at);

-- In this order: equality first, then range
-- WHERE status = 'X' AND date > 'Y'  -- Good
-- WHERE date > 'Y' AND status = 'X'  -- Ignores index

When NOT to Index

Columns with low cardinality (few unique values)
Tables with frequent writes and rare reads
Small tables (full scan is faster)

-- Bad: Low cardinality index
CREATE INDEX idx_gender ON users(gender);  -- Only male/female

-- Good: Filtered index
CREATE INDEX idx_active_users ON users(id) WHERE status = 'active';

Query Optimization

Write Efficient Queries

-- Bad: SELECT * loads unnecessary data
SELECT * FROM orders WHERE status = 'shipped';

-- Good: Select only needed columns
SELECT id, total, shipped_at
FROM orders
WHERE status = 'shipped';

-- Bad: Implicit joins (old syntax)
SELECT orders.id, customers.name
FROM orders, customers
WHERE orders.customer_id = customers.id;

-- Good: Explicit joins
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

Use EXISTS Instead of IN

-- Slower: IN with subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- Faster: EXISTS
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 1000
);

Avoid Functions on Indexed Columns

-- Bad: Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'test@example.com';

-- Good: Index-friendly
CREATE INDEX idx_email_lower ON users((LOWER(email)));

SELECT * FROM users
WHERE LOWER(email) = LOWER('test@example.com');

-- Alternative: Store normalized value
CREATE TABLE users (
  id INT PRIMARY KEY,
  email VARCHAR(100),
  email_normalized VARCHAR(100) GENERATED ALWAYS AS (LOWER(email)) STORED
);

CREATE INDEX idx_email_normalized ON users(email_normalized);

Scaling Patterns

Read Replicas

-- Primary handles writes
-- Replicas handle reads

-- Application uses replica for read-heavy queries
SELECT * FROM orders WHERE user_id = 123;  -- Route to replica
INSERT INTO orders (...) VALUES (...);     -- Route to primary

Partitioning

-- Partition by date for time-series data
CREATE TABLE orders (
  id INT,
  created_at DATE,
  total DECIMAL(10, 2),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at) (
  PARTITION p2025_q1 VALUES LESS THAN ('2025-04-01'),
  PARTITION p2025_q2 VALUES LESS THAN ('2025-07-01'),
  PARTITION p2025_q3 VALUES LESS THAN ('2025-10-01'),
  PARTITION p2025_q4 VALUES LESS THAN ('2026-01-01')
);

Sharding

-- Shard by user_id
-- Shard 0: user_id % 4 = 0
-- Shard 1: user_id % 4 = 1
-- Shard 2: user_id % 4 = 2
-- Shard 3: user_id % 4 = 3

Common Anti-Patterns to Avoid

Anti-Pattern	Problem	Solution
No primary keys	Can’t identify rows	Always add primary key
Missing foreign keys	Data integrity issues	Add constraints
Storing JSON in columns	Hard to query, no indexing	Normalize to tables
No timestamps	Can’t track changes	Add created_at, updated_at
Using VARCHAR for everything	Wrong types	Use proper types
No migration strategy	Schema changes are risky	Use migration tools

Self-Assessment Checklist

Before launching your database:

All tables have primary keys
Foreign key constraints are in place
Indexes exist for frequently queried columns
Common queries don’t require full table scans
Normalization is appropriate for your use case
Backup and recovery procedures exist
Connection pooling is configured
Sensitive data is encrypted

What Students Should Do Next

Design a database for a project you know well — Apply normalization to a real scenario
Analyze slow queries with EXPLAIN — Understanding the execution plan is crucial
Practice indexing on sample data — Measure the difference between indexed and non-indexed queries

Database design is one of those skills that separates good developers from great ones. Invest in it early.