
“Because no great data scientist ever said, ‘I loved configuring environments manually.’”


💡 Why This Section Exists

Let’s face it — half of data science isn’t about models or math… it’s about getting your Python environment to stop yelling at you. This section is your sanity-saving toolkit: the stuff that makes your experiments reproducible, your teammates slightly less angry, and your laptop slightly less hot.


🐍 Setting Up Your Python Environment

| Tool | What It Does | Why You Need It |
| --- | --- | --- |
| Anaconda / Miniconda | Environment + dependency manager | Because pip install roulette is no fun. |
| Poetry | Modern Python packaging | Keeps dependencies tidy like Marie Kondo for code. |
| VS Code | IDE | It’s like Notepad, if Notepad went to grad school. |
| JupyterLab | Interactive notebooks | Perfect for running “just one more cell” until your GPU cries. |
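
Whichever tool you pick, the reproducibility payoff comes from being able to say exactly what was installed when a result was produced. As a rough sketch (standard library only, not a replacement for a conda or Poetry lock file), the hypothetical helper `snapshot_environment` below records the interpreter and installed package versions so you can save them next to your results:

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect interpreter details and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON; redirect it to a file to keep it with a run.
    print(json.dumps(snapshot_environment(), indent=2))
```

Committing that JSON alongside a notebook gives future-you something concrete to diff when "it worked last month" stops being true.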

🧠 MLOps & Experiment Tracking

If your model results live only in screenshots, we have a problem.

| Tool | Purpose | Tagline |
| --- | --- | --- |
| MLflow | Track experiments, parameters, and metrics | Because “final_model_v27_FINAL_really_final.pkl” is not version control. |
| Weights & Biases (W&B) | Logging, visualization, model registry | Turns chaos into pretty charts (and dopamine). |
| DVC | Data version control | Git for data, but with fewer tears. |
| Docker | Packaging & deployment | “Works on my machine” → “Works everywhere” magic. |
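
To make the idea behind these trackers concrete, here is a minimal sketch of experiment logging using only the standard library. It is not the MLflow, W&B, or DVC API; the function names (`log_run`, `load_runs`, `best_run`) and the JSON Lines file layout are assumptions made for illustration. Each run is stored with its parameters, its metrics, and a SHA-256 fingerprint of the data file it used:

```python
import hashlib
import json
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_run(log_path, params, metrics, data_path=None):
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,
        "metrics": metrics,
        # Fingerprint the dataset so a changed file is visible in the log.
        "data_sha256": file_sha256(data_path) if data_path else None,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path):
    """Read every logged run back as a list of dicts."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def best_run(runs, metric, higher_is_better=True):
    """Pick the run with the best value for the given metric."""
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])
```

A dedicated tracker adds a UI, artifact storage, and team sharing on top, but the core habit is the same: every run leaves a record that a screenshot cannot.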

🛠️ Deployment & Serving

| Tool | Description | Humor Level |
| --- | --- | --- |
| FastAPI | Build REST APIs for ML models | So fast it should have a cape. |
| Streamlit | Quick web apps for ML demos | For when your PM says, “Can I see it?” |
| Ray Serve | Scalable model serving | Turns your laptop into a distributed cluster… in spirit, at least. |
| Airflow | Workflow orchestration | The Excel macros of the data engineering world, but cooler. |
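
"Workflow orchestration" boils down to running tasks in dependency order. The sketch below shows that one idea with the standard-library `graphlib` module (Python 3.9+). It is not Airflow's API; `run_pipeline` and the task names are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top:

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of the tasks it depends on.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    """
    # Include tasks with no declared dependencies so they still run.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    results = {}
    # static_order() yields upstream tasks first and raises CycleError on loops.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name]()
    return results


if __name__ == "__main__":
    # Hypothetical three-step pipeline: extract -> train -> evaluate.
    pipeline_tasks = {
        "evaluate": lambda: print("evaluating model"),
        "train": lambda: print("training model"),
        "extract": lambda: print("extracting data"),
    }
    pipeline_deps = {"train": {"extract"}, "evaluate": {"train"}}
    run_pipeline(pipeline_tasks, pipeline_deps)
```

The order in which tasks are declared does not matter; only the declared dependencies decide what runs first.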

📊 Visualization Tools

| Tool | Use Case | Comment |
| --- | --- | --- |
| Matplotlib / Seaborn | Classic plots | Still the go-to for serious work (and serious frustration). |
| Plotly / Dash | Interactive dashboards | Fancy plots that make your boss say “Ooooh”. |
| Tableau / Power BI | Business dashboards | Where data meets PowerPoint. |

🧩 LLM & Agent Toolkits

| Tool | Why It Matters |
| --- | --- |
| LangChain | The duct tape of LLM apps: connects everything to everything. |
| LlamaIndex | Retrieval-Augmented Generation (RAG) made sane. |
| Hugging Face Transformers | For when you want to use a billion-parameter model like it’s NBD. |
| OpenAI API | The Swiss army knife for modern AI projects. |
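
If "RAG" still sounds like magic, the retrieval half is just "find the most relevant text, then paste it into the prompt". Here is a toy sketch of that step using word-count cosine similarity from the standard library. Real toolkits such as LlamaIndex use learned embeddings and vector stores instead; the function names and the prompt wording below are assumptions for illustration, and no LLM is actually called:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[token] * b[token] for token in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share words with the query."""
    query_counts = Counter(tokenize(query))
    scored = [(cosine(query_counts, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=2):
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what you would hand to whichever model API you use; swapping the word counts for embeddings changes the scoring, not the overall shape.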

🤖 GPU & Cloud Setup

If your laptop sounds like a jet engine, it’s time for cloud compute.

| Platform | Use | Notes |
| --- | --- | --- |
| Google Colab | Free GPUs (with occasional drama) | Great until it disconnects mid-training. |
| AWS SageMaker | Scalable ML in the cloud | Expensive, but powerful. |
| Paperspace / RunPod | Pay-as-you-go GPU instances | Perfect for poor PhDs and startup devs. |
| Azure ML / Vertex AI | Enterprise-grade pipelines | When your manager wants governance and you want GPU time. |

🚀 TL;DR

  • Automate everything that can be automated.

  • Log every experiment — you’ll thank yourself later.

  • Containers are your friends.

  • If it takes more than 5 minutes to set up, script it.

  • And remember: real data scientists use version control… for literally everything.

# Your code here