Backend Database SQL Design Tutorial

Database Design Fundamentals for Web Developers

Ravali Ravali March 15, 2025 8 min read
Database Design Fundamentals for Web Developers

Database design is often treated as an afterthought in web development. We create tables, write queries, and move on. But I’ve seen applications collapse under load because of poor database design. After fixing data models that served millions of users, I can tell you: the time invested in proper design pays dividends.

My Experience

We inherited a database that was a nightmare. Tables with no foreign keys. The same data stored in multiple places. A user table with 47 columns. Queries that took 30 seconds.

The worst part: it was impossible to fix without a complete rewrite. The application code had adapted to every anti-pattern.

We spent four months redesigning the database, migrating data, and updating the application. The result: queries that took seconds now took milliseconds. But the lesson was clear: design matters from day one.

Here’s what you need to know.

Normalization: The Foundation

First Normal Form (1NF)

Each column contains atomic values—no lists or arrays:

-- Bad: Repeating groups
CREATE TABLE orders_bad (
  id INT PRIMARY KEY,
  customer_name VARCHAR(100),
  items VARCHAR(500)  -- 'item1,item2,item3'
);

-- Good: Atomic values
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_name VARCHAR(100)
);

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  item_name VARCHAR(100),
  quantity INT
);

Second Normal Form (2NF)

No partial dependencies—non-key columns depend on the entire primary key:

-- Bad: Partial dependency
CREATE TABLE order_items_bad (
  order_id INT,
  product_name VARCHAR(100),  -- Depends only on product_id
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

-- Good: Separate entity
CREATE TABLE products (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)

No transitive dependencies—non-key columns depend only on the primary key:

-- Bad: Transitive dependency
CREATE TABLE orders_bad (
  id INT PRIMARY KEY,
  customer_id INT,
  customer_name VARCHAR(100),  -- Depends on customer_id, not order id
  customer_email VARCHAR(100)
);

-- Good: Separate table
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100)
);

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT REFERENCES customers(id)
);

Indexing Strategies

Indexes are essential for performance—but they come with trade-offs:

When to Index

ScenarioIndex TypeExample
Primary key lookupPrimaryid
Search by nameB-TreeCREATE INDEX idx_name ON users(name)
Range queriesB-TreeWHERE price > 100
Exact matchesHashWHERE status = 'active'
Full-text searchFull-textMATCH(content) AGAINST('query')
Geographic dataSpatialST_Distance_Sphere

Composite Indexes

-- Index for queries like: WHERE status = 'active' AND created_at > '2025-01-01'
CREATE INDEX idx_status_date ON orders(status, created_at);

-- In this order: equality first, then range
-- WHERE status = 'X' AND date > 'Y'  -- Good
-- WHERE date > 'Y' AND status = 'X'  -- Ignores index

When NOT to Index

  • Columns with low cardinality (few unique values)
  • Tables with frequent writes and rare reads
  • Small tables (full scan is faster)
-- Bad: Low cardinality index
CREATE INDEX idx_gender ON users(gender);  -- Only male/female

-- Good: Filtered index
CREATE INDEX idx_active_users ON users(id) WHERE status = 'active';

Query Optimization

Write Efficient Queries

-- Bad: SELECT * loads unnecessary data
SELECT * FROM orders WHERE status = 'shipped';

-- Good: Select only needed columns
SELECT id, total, shipped_at
FROM orders
WHERE status = 'shipped';

-- Bad: Implicit joins (old syntax)
SELECT orders.id, customers.name
FROM orders, customers
WHERE orders.customer_id = customers.id;

-- Good: Explicit joins
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

Use EXISTS Instead of IN

-- Slower: IN with subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- Faster: EXISTS
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 1000
);

Avoid Functions on Indexed Columns

-- Bad: Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'test@example.com';

-- Good: Index-friendly
CREATE INDEX idx_email_lower ON users((LOWER(email)));

SELECT * FROM users
WHERE LOWER(email) = LOWER('test@example.com');

-- Alternative: Store normalized value
CREATE TABLE users (
  id INT PRIMARY KEY,
  email VARCHAR(100),
  email_normalized VARCHAR(100) GENERATED ALWAYS AS (LOWER(email)) STORED
);

CREATE INDEX idx_email_normalized ON users(email_normalized);

Scaling Patterns

Read Replicas

-- Primary handles writes
-- Replicas handle reads

-- Application uses replica for read-heavy queries
SELECT * FROM orders WHERE user_id = 123;  -- Route to replica
INSERT INTO orders (...) VALUES (...);     -- Route to primary

Partitioning

-- Partition by date for time-series data
CREATE TABLE orders (
  id INT,
  created_at DATE,
  total DECIMAL(10, 2),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at) (
  PARTITION p2025_q1 VALUES LESS THAN ('2025-04-01'),
  PARTITION p2025_q2 VALUES LESS THAN ('2025-07-01'),
  PARTITION p2025_q3 VALUES LESS THAN ('2025-10-01'),
  PARTITION p2025_q4 VALUES LESS THAN ('2026-01-01')
);

Sharding

-- Shard by user_id
-- Shard 0: user_id % 4 = 0
-- Shard 1: user_id % 4 = 1
-- Shard 2: user_id % 4 = 2
-- Shard 3: user_id % 4 = 3

Common Anti-Patterns to Avoid

Anti-PatternProblemSolution
No primary keysCan’t identify rowsAlways add primary key
Missing foreign keysData integrity issuesAdd constraints
Storing JSON in columnsHard to query, no indexingNormalize to tables
No timestampsCan’t track changesAdd created_at, updated_at
Using VARCHAR for everythingWrong typesUse proper types
No migration strategySchema changes are riskyUse migration tools

Self-Assessment Checklist

Before launching your database:

  • All tables have primary keys
  • Foreign key constraints are in place
  • Indexes exist for frequently queried columns
  • Common queries don’t require full table scans
  • Normalization is appropriate for your use case
  • Backup and recovery procedures exist
  • Connection pooling is configured
  • Sensitive data is encrypted

What Students Should Do Next

  1. Design a database for a project you know well — Apply normalization to a real scenario
  2. Analyze slow queries with EXPLAIN — Understanding the execution plan is crucial
  3. Practice indexing on sample data — Measure the difference between indexed and non-indexed queries

Database design is one of those skills that separates good developers from great ones. Invest in it early.

Share this article

Ravali

Ravali

Software Engineer & Content Creator

Ravali writes practical engineering guides for students and developers, combining hands-on project stories, career lessons, and trend-focused technical research.