Writing Clean and Modular Code

(a.k.a. How to Build Systems So Clean, They Could Survive a Merger)

Writing clean code is great — but at some point, your project stops being a cute 200-line script and becomes a full-on ecosystem of models, APIs, dashboards, and mystery bugs.

At that point, the question isn’t:

“Does it work?”

It’s: “Will it still work when the intern refactors it next year?”

That’s where program design principles come in — the art of building code that’s modular, scalable, testable, and doesn’t collapse when your company triples its data volume overnight. 🚀


🧱 1. Modular Architecture: Divide and Conquer (Without the Empire Falling Apart)

In small scripts, everything can live in one file. In serious systems — you need modules.

Each module should be a specialist — like a team member in a startup:

  • data_loader.py — The reliable intern. Brings the data. Never on time.

  • model_trainer.py — The overachiever. Talks about accuracy a lot.

  • business_rules.py — The MBA consultant. Always asks, “But what’s the ROI?”

  • api_server.py — The PR person. Talks to the outside world.

A modular system isn’t just about separating files — it’s about separating responsibilities. When something breaks (and it will), you should know which part is guilty.


🧠 2. The Layer Cake Design (aka “Software Lasagna”) 🍰

Think of your ML/business app as a layered cake: Each layer has its own purpose, and you don’t want frosting mixed with the flour.

Example Architecture:

ml_business_app/
│
├── data_layer/
│   ├── data_loader.py
│   ├── db_connector.py
│   └── preprocessors/
│
├── business_logic/
│   ├── forecasting.py
│   ├── pricing_rules.py
│   └── optimization.py
│
├── ml_models/
│   ├── regression_model.py
│   ├── clustering_model.py
│   └── model_utils.py
│
├── api_layer/
│   ├── routes.py
│   └── serializers.py
│
├── config/
│   ├── settings.py
│   └── logging.yaml
│
└── app.py

🧩 Each layer’s role:

  • Data Layer: Talks to databases, CSVs, or APIs.

  • Business Logic Layer: Turns raw data into decisions.

  • ML Layer: Crunches numbers, trains models, and gets all the glory.

  • API Layer: Exposes results to users (and occasionally hackers).

  • Config Layer: Keeps secrets safe (hopefully).

Why this matters: When your pricing model changes, you shouldn’t have to touch your database logic. That’s like fixing the roof by changing the oven settings. 🔥
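To make the layer separation concrete, here is a minimal sketch with one function per layer. The names (`load_sales`, `forecast_demand`, `price_for`), the toy data, and the 20% premium rule are all invented for illustration; a real app would spread these across the folders shown above.

```python
# Hypothetical names and numbers throughout; nothing here is a real package.

# Data layer: only knows how to fetch rows.
def load_sales():
    """Stand-in for a database or CSV read; returns a list of dicts."""
    return [
        {"product": "widget", "units": 120},
        {"product": "gadget", "units": 40},
    ]

# ML layer: only knows how to turn history into a forecast.
def forecast_demand(rows, growth=1.1):
    """Toy 'model': scale last period's units by a fixed growth factor."""
    return {row["product"]: row["units"] * growth for row in rows}

# Business logic layer: only knows the pricing rule.
def price_for(demand, base_price=10.0, threshold=100):
    """Add a 20% premium when forecast demand exceeds the threshold."""
    return {
        product: round(base_price * (1.2 if units > threshold else 1.0), 2)
        for product, units in demand.items()
    }

prices = price_for(forecast_demand(load_sales()))
print(prices)
```

If the pricing rule changes, only `price_for` changes. The loader and the forecast never hear about it.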


🧬 3. Loose Coupling, Tight Cohesion

(aka “Friends With Boundaries”)

  • Loose coupling → Modules don’t know too much about each other. Example: your ML model shouldn’t care how data is loaded — just that it arrives.

  • Tight cohesion → Each module has one clear job. Example: forecasting.py should not suddenly start sending Slack alerts.

Imagine your modules as coworkers in a healthy relationship: They collaborate — but don’t read each other’s emails. 💌

Use interfaces and abstraction to keep them independent:

class DataSource:
    def get_data(self):
        raise NotImplementedError

class CSVSource(DataSource):
    def get_data(self):
        print("Loading data from CSV")

class DatabaseSource(DataSource):
    def get_data(self):
        print("Loading data from DB")

Now your model doesn’t care where the data comes from — it just works.
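A sketch of what "doesn’t care" looks like from the consumer’s side. It repeats the `DataSource` base class so the block runs on its own. `CSVTextSource`, `InMemorySource`, and `average_units` are hypothetical names, and the sources return real values instead of printing so the consumer has something to work with.

```python
import csv
import io

class DataSource:
    def get_data(self):
        raise NotImplementedError

class CSVTextSource(DataSource):
    """Hypothetical source: parses CSV text with a 'units' column."""
    def __init__(self, text):
        self.text = text

    def get_data(self):
        reader = csv.DictReader(io.StringIO(self.text))
        return [float(row["units"]) for row in reader]

class InMemorySource(DataSource):
    """Hypothetical source: hands back values it was given (handy in tests)."""
    def __init__(self, values):
        self.values = list(values)

    def get_data(self):
        return self.values

def average_units(source):
    """Consumer code: depends only on get_data(), not on the concrete class."""
    data = source.get_data()
    return sum(data) / len(data)

csv_source = CSVTextSource("units\n10\n20\n30\n")
memory_source = InMemorySource([10, 20, 30])
print(average_units(csv_source), average_units(memory_source))
```

Both calls go through the same `average_units` function. Swapping the source is a one-line change at the call site.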


⚙️ 4. Dependency Injection (aka “Stop Hardcoding Everything!”)

If your code is full of hardcoded paths and secret tokens, congratulations — you’ve written software that only runs on your laptop. 😅

Use configuration files, environment variables, or dependency injection frameworks to keep code flexible:

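# config.yaml
database_url: "postgresql://prod-db"
model_path: "models/latest.pkl"

# Python: load the config once at startup.
# DatabaseConnector and load_model stand in for your own code.
import yaml

# The with-block closes the file handle once the config is parsed.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

db = DatabaseConnector(config["database_url"])
model = load_model(config["model_path"])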

Now you can switch between dev, test, and production like a magician. 🎩✨
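The environment-variable route can be sketched with the standard library alone. The variable names `DATABASE_URL` and `MODEL_PATH` and the SQLite dev default are assumptions made for this example, not values the article prescribes.

```python
import os

# Assumed development defaults, used when no environment variable is set.
DEFAULTS = {
    "database_url": "sqlite:///dev.db",
    "model_path": "models/latest.pkl",
}

def load_settings(env=None):
    """Build settings from a mapping (os.environ by default).

    Accepting the mapping as an argument lets tests pass a plain dict
    instead of modifying the real environment.
    """
    env = os.environ if env is None else env
    return {
        "database_url": env.get("DATABASE_URL", DEFAULTS["database_url"]),
        "model_path": env.get("MODEL_PATH", DEFAULTS["model_path"]),
    }

# Simulate a production environment without touching os.environ.
prod_settings = load_settings({"DATABASE_URL": "postgresql://prod-db"})
print(prod_settings)
```

Passing `env` in, rather than reading `os.environ` deep inside the function, is itself a small act of dependency injection.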


🧰 5. The Plug-and-Play Principle

You want to be able to swap out components — like replacing your regression model with an XGBoost one without rewriting everything.

Use common interfaces for that:

class ForecastModel:
    def train(self, data):
        raise NotImplementedError

class LinearRegressionModel(ForecastModel):
    def train(self, data):
        print("Training Linear Regression")

class XGBoostModel(ForecastModel):
    def train(self, data):
        print("Training XGBoost")

Now your app can switch models faster than a business can pivot strategies. 🏃‍♂️💨
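One common way to do the actual swapping is a small registry that maps a name to a class. The sketch below uses two toy models, `MeanModel` and `LastValueModel`, as stand-ins for real regression or XGBoost wrappers. Those names, the `predict` method, and `build_model` are all hypothetical.

```python
class ForecastModel:
    def train(self, data):
        raise NotImplementedError

class MeanModel(ForecastModel):
    """Toy stand-in: 'training' memorises the mean of the data."""
    def train(self, data):
        self.value = sum(data) / len(data)
        return self

    def predict(self):
        return self.value

class LastValueModel(ForecastModel):
    """Toy stand-in: 'training' memorises the most recent value."""
    def train(self, data):
        self.value = data[-1]
        return self

    def predict(self):
        return self.value

# The only place that knows which concrete classes exist.
MODEL_REGISTRY = {"mean": MeanModel, "last": LastValueModel}

def build_model(name):
    """Look up a model class by name and return a fresh instance."""
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown model type: {name!r}") from None

history = [1, 2, 3]
for name in ("mean", "last"):
    print(name, build_model(name).train(history).predict())
```

Adding a new model then means writing one class and adding one registry entry. The calling code stays as it is.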


🧱 6. The “12-Factor App” for Data People

If your system will ever hit production, follow these modern commandments (a selection from the full twelve):

| Principle | Meaning | Analogy |
|---|---|---|
| Codebase | One repo per app | No secret copies named “new_final_v2” |
| Dependencies | Declare them explicitly | requirements.txt = grocery list |
| Config | Store in environment vars | Don’t hardcode your secrets, please |
| Backing Services | Treat DBs as attached resources | Swap DBs like LEGO blocks |
| Build, Release, Run | Keep them separate | Cooking ≠ plating ≠ eating |
| Logs | Stream them | Debugging shouldn’t require archaeology |
| Processes | Stateless and disposable | Like snacks, easy to replace |
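As one example of the "Logs: stream them" row, here is a minimal sketch that sends log records to standard output instead of a file the app manages itself. The logger name `ml_business_app` and the helper `configure_logging` are assumptions for illustration.

```python
import logging
import sys

def configure_logging(level=logging.INFO, stream=None):
    """Attach a single stream handler to the app logger and return it.

    Defaults to stdout, so whatever runs the process (a container
    runtime, a scheduler) can collect and route the log stream.
    """
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("ml_business_app")
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep records from being emitted twice via root
    return logger

logger = configure_logging()
logger.info("pipeline started")
```

The app never opens, rotates, or deletes a log file. It writes events to a stream and leaves storage to the environment.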

🧠 7. Example: Modular Business ML Pipeline

# app.py
# The imported modules are illustrative; each one stands for a layer.
from data_layer.loader import get_data
from ml_models.forecaster import Forecaster
from business_logic.pricing import adjust_prices

def main():
    data = get_data("sales_2024.csv")          # data layer
    model = Forecaster(model_type="xgboost")   # ML layer
    predictions = model.predict(data)
    new_prices = adjust_prices(predictions)    # business logic layer
    print(f"Updated business decisions deployed: {new_prices}")

if __name__ == "__main__":
    main()

You’ve just built a pipeline that:

  • Loads data

  • Runs a model

  • Applies business rules

  • Deploys decisions

All without spaghetti code. 🍝✨
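A side benefit of this shape is testability. If the pipeline receives its three steps as arguments, each one can be replaced by a stub in a test. The sketch below is one possible arrangement, not the article’s own implementation. `run_pipeline` and the `fake_*` functions are hypothetical.

```python
def run_pipeline(load, predict, adjust):
    """Wire the three steps together without knowing their internals."""
    data = load()
    predictions = predict(data)
    return adjust(predictions)

# Stubs standing in for the data, ML, and business-logic layers.
def fake_load():
    return [100, 200]

def fake_predict(data):
    # Pretend the model forecasts 50% growth.
    return [x * 1.5 for x in data]

def fake_adjust(predictions):
    # Pretend the pricing rule is 10% of forecast demand.
    return [round(p * 0.1, 2) for p in predictions]

result = run_pipeline(fake_load, fake_predict, fake_adjust)
print(result)
```

In production, the same `run_pipeline` would receive the real loader, model, and pricing function. No file, database, or trained model is needed to check that the wiring is right.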


💬 Final Thoughts

Clean code is like personal hygiene — essential. Modular design, though, is like a gym routine — it keeps your system strong under stress.

When your business app grows, you’ll thank yourself for:

  • Splitting modules

  • Keeping responsibilities separate

  • Writing code that can evolve

“Good design is when you can replace half your system without a nervous breakdown.”


Exercises

Exercise 1

Refactor filter_and_square(nums) to be readable and handle non-integers: implement the function and return results.


Exercise 2

Write shorten_name(name, max_len) that truncates a string to max_len and appends ‘...’ when truncated.


Exercise 3