Post

The Database Designer's Mind: From Vague Requirement to Robust Schema

The Database Designer’s Mind: From Vague Requirement to Robust Schema

Meta Description: Go beyond just knowing the rules of database design. This article walks you through the step-by-step thought process of tackling a real-world problem, weighing pragmatism, purity, and modern tools to build a schema that lasts.

Tags: Database Design, SQL, PostgreSQL, System Architecture, Data Modeling, Best Practices, Tech Article

A new feature request arrives in your inbox. It seems simple enough: “When a student fails a course, we need to record the reason why.”

For many, the instinct is to jump straight to code: ALTER TABLE... ADD COLUMN.... Done. But this is where the craft of a software architect or a thoughtful engineer truly begins. A database schema is the foundation of your application; a crack in that foundation, however small, can cause problems for years.

Let’s use this seemingly simple request to walk through a deliberate, step-by-step design process. This isn’t about finding the one “right” answer, but about learning how to ask the right questions.

Step 1: Deconstruct the Requirement (The “What If?” Phase)

Before we write a single line of SQL, we must become the most curious person in the room. The initial requirement is just the tip of the iceberg. Our job is to discover what lies beneath.

Let’s interrogate our “fail reason” feature:

  • What is the shape of this data? Is it just a simple text string ("Did not submit final project"), or could it be more structured? What if it’s a failure_type from a dropdown, a final percentage score, and an optional text note?
  • Who provides this data? Is it an instructor, an automated script, or an administrator? This might imply needing to store who recorded the reason and when.
  • How will we use this data? Will we only ever look it up for a single student? Or will a future request be: “Show me a report of failure types across the entire university for the last semester.”?
  • How common is this event? The stakeholders mention it’s rare—maybe only 1% of course enrollments end in failure. This detail is crucial.

The Takeaway: Your database design is only as good as your understanding of the problem. A few minutes of questioning can save you weeks of refactoring later. For our scenario, let’s assume we discovered that a failure reason might become more structured in the future, and reporting is a possibility.

Step 2: The First Impulse (The “Just Add a Column” Solution)

With our questions in mind, we can evaluate the most obvious solution. It’s straightforward, fast, and gets the immediate job done.

The Schema:

1
2
ALTER TABLE user_course
ADD COLUMN fail_reason text NULL;

The Analysis: This is the path of least resistance. It’s incredibly simple to implement. The query to retrieve the reason is trivial. For the 1% of students who fail, it works perfectly.

But the questions from Step 1 should make us pause. This design immediately paints us into a corner.

  • The 99% Problem: For the vast majority of your rows, this column will be NULL. While modern databases like PostgreSQL are very efficient at storing nulls, it feels… untidy. It’s a field that simply doesn’t apply to most of the entities in the table.
  • The Scalability Dead-End: What happens when the request comes to add failure_type and recorded_by_instructor_id? Do we add more nullable columns? The user_course table gets wider and wider, cluttered with data that only applies to a fraction of its records.
  • The Reporting Nightmare: Try writing an efficient query to count failure types when the type is just an unstructured text string. It’s messy and unreliable.

The Takeaway: The simplest solution is often a short-term fix that accumulates long-term “design debt.” It’s a valuable starting point for discussion, but we can do better.

Step 3: The Purist’s Rebuttal (The “Normalized” Solution)

The weaknesses of our first approach lead us directly to the classic solution: normalization. If a “failure event” is its own distinct thing with its own unique attributes, then let’s give it its own table.

The Schema:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Main table stays clean
CREATE TABLE user_course (
    user_course_id serial PRIMARY KEY,
    user_id int,
    course_id int,
    status text -- 'enrolled', 'completed', 'failed'
);

-- New, dedicated table for the 1%
CREATE TABLE user_course_fail_details (
    user_course_id int PRIMARY KEY REFERENCES user_course(user_course_id),
    reason_text text,
    failure_type text, -- Can be an ENUM or foreign key
    recorded_by_id int REFERENCES instructors(instructor_id),
    timestamp timestamptz DEFAULT now()
);

The Analysis: Look at how clean this is. Every piece of data is exactly where it belongs.

  • It’s Scalable: We can add as many details about the failure event as we want to the new table without ever touching the massive user_course table.
  • It’s Efficient: There are no wasted NULLs. Data is only stored when a failure actually occurs.
  • It’s Robust: We can use foreign keys and ENUM types to ensure data integrity. Reporting on failure_type becomes trivial and fast.

The downside? To get a complete picture of an enrollment, you now need to perform a LEFT JOIN. Some might see this as added complexity, but it’s the small price you pay for a design that is clean, scalable, and correct.

The Takeaway: Normalization is the professional’s tool for building systems that last. It forces you to think clearly about the entities in your system and how they relate to one another.

Step 4: The Modern Compromise (The “Flexible JSON” Solution)

There is a third path, one that blends the single-table simplicity of our first impulse with the “no-wasted-space” benefit of the normalized approach. We can use PostgreSQL’s powerful jsonb type.

The Schema:

1
2
3
4
5
6
7
CREATE TABLE user_course (
    user_course_id serial PRIMARY KEY,
    ...,
    status text,
    -- NULL for passes, a JSON object for other states
    status_details jsonb NULL
);

For a failure, the status_details column might contain: {"reason": "Did not submit project", "failure_type": "non-submission", "final_score": 45}

For a withdrawal, it could be: {"reason": "Medical leave", "withdrew_on_date": "2025-05-10"}

The Analysis: This approach is powerful. It gives you immense flexibility to store different data shapes for different terminal states without touching your schema. It’s a great fit if you have many such states (failed, withdrew, incomplete), each with its own unique descriptive data.

However, this flexibility comes at the cost of database-level integrity. You can no longer have a foreign key to the instructors table for the recorded_by_id because it’s just a key in a JSON blob. The responsibility for data validation shifts almost entirely to your application.

The Takeaway: Modern tools provide powerful compromises. jsonb offers a compelling path when your primary need is flexibility for semi-structured data, but it requires discipline at the application layer.

Step 5: Making the Final Decision

So, which path do we choose? We go back to the questions from Step 1 and our analysis.

  • If the business swears the fail reason will always be a single text field and nothing more, the “Just Add a Column” approach (Step 2) is a pragmatic, if not pure, choice.
  • If you need to handle many different states with unpredictable and varied data, the “Flexible JSON” approach (Step 4) is a strong contender.
  • If a failure is a critical event with structured data that needs to be reported on and maintained with high integrity, the “Normalized” approach (Step 3) is the most professional and robust solution.

For a system as important as student records, where scalability and data integrity are paramount, my choice would be the normalized, separate table. It’s the design that is most likely to stand the test of time.

Conclusion: Think, Then Design

The journey from a one-line request to a final schema is where the real work of software engineering happens. It’s a process of questioning, exploring trade-offs, and anticipating the future. By resisting the urge to implement the first solution that comes to mind, you build a foundation that is not only functional for today but resilient for tomorrow.

The next time a “simple” feature request comes your way, don’t just see a task. See an opportunity to think.

This post is licensed under CC BY 4.0 by the author.