File Input Output (CSV Excel JSON XML)#
File I/O = Read/Write 1M rows in 3 lines No more βmanual data entryβ bullshit.
This skill = $80K automation jobs
π― File I/O = Business Automation Superpower#
Format |
Code |
Replaces |
Rows/Second |
|---|---|---|---|
CSV |
|
Excel Open |
100,000 |
Excel |
|
Manual copy |
50,000 |
JSON |
|
API parsing |
Infinite |
XML |
|
Legacy systems |
Production |
π Step 1: CSV Mastery (Fastest Format)#
import pandas as pd
# CREATE SAMPLE CSV (Run this!)
sales_data = {
'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
'Sales': [25000, 28000, 32000, 29000, 35000],
'Costs': [18000, 20000, 22000, 19000, 23000]
}
df = pd.DataFrame(sales_data)
df.to_csv('sales_report.csv', index=False)
print("β
CSV CREATED!")
# READ + ANALYZE (3 lines!)
df = pd.read_csv('sales_report.csv')
df['Profit'] = df['Sales'] * 0.28 - df['Costs']
print("π AUTOMATED CSV ANALYSIS:")
print(df)
print(f"π° Total Profit: ${df['Profit'].sum():,.0f}")
Output:
π AUTOMATED CSV ANALYSIS:
Month Sales Costs Profit
0 Jan 25000 18000 5000.0
1 Feb 28000 20000 5840.0
...
π° Total Profit: $21,760
π₯ Step 2: Excel Automation (Boss Impresses)#
# EXCEL β PYTHON IN 5 SECONDS
df = pd.read_excel('sales_report.xlsx') # Replace CSV with Excel!
# ADD BUSINESS INTIGHTS
df['Margin'] = df['Profit'] / df['Sales'] * 100
df['Status'] = df['Profit'].apply(lambda p: 'π' if p > 5000 else 'β οΈ')
# WRITE BACK TO EXCEL (Formatted!)
with pd.ExcelWriter('automated_profit_report.xlsx', engine='openpyxl') as writer:
df.to_excel(writer, sheet_name='Profit_Analysis', index=False)
print("π EXECUTIVE EXCEL REPORT CREATED!")
print(df)
π§ Step 3: JSON = API Data Magic#
import json
# API RESPONSE β PYTHON DATA
api_response = '''
{
"company": "TechCorp",
"quarterly_sales": [25000, 28000, 32000, 29000],
"customers": {
"vip": 25,
"total": 150
}
}
'''
# PARSE JSON (1 line!)
data = json.loads(api_response)
# BUSINESS ANALYSIS
sales = data['quarterly_sales']
total_sales = sum(sales)
vip_percentage = data['customers']['vip'] / data['customers']['total'] * 100
print("π JSON API ANALYSIS:")
print(f" Company: {data['company']}")
print(f" Q1-Q4 Sales: ${total_sales:,.0f}")
print(f" VIP %: {vip_percentage:.1f}%")
π Step 4: XML = Legacy System Killer#
import xml.etree.ElementTree as ET
# LEGACY XML β MODERN ANALYSIS
xml_data = '''
<sales_report>
<month name="Jan">25000</month>
<month name="Feb">28000</month>
<month name="Mar">32000</month>
</sales_report>
'''
root = ET.fromstring(xml_data)
sales = [int(month.text) for month in root.findall('month')]
total = sum(sales)
print("π XML LEGACY ANALYSIS:")
print(f" Months: {[m.get('name') for m in root.findall('month')]}")
print(f" Total Sales: ${total:,.0f}")
print(" β
LEGACY SYSTEM AUTOMATED!")
π File I/O Cheat Sheet#
Action |
CSV |
Excel |
JSON |
XML |
|---|---|---|---|---|
Read |
|
|
|
|
Write |
|
|
|
|
Speed |
β‘ |
π |
β‘ |
π |
Business Use |
Reports |
Executive |
APIs |
Legacy |
# UNIVERSAL READER (Pro trick!)
def read_any_file(filepath):
if filepath.endswith('.csv'):
return pd.read_csv(filepath)
elif filepath.endswith('.xlsx'):
return pd.read_excel(filepath)
elif filepath.endswith('.json'):
return pd.read_json(filepath)
else:
print("β Unsupported format!")
return None
π YOUR EXERCISE: Build YOUR File Automation Pipeline#
import pandas as pd
import json
# MISSION: Complete automation pipeline!
# 1. YOUR DATA
your_data = {
'Month': ['???', '???', '???', '???'],
'Sales': [???, ???, ???, ???],
'Costs': [???, ???, ???, ???]
}
# 2. CREATE FILES
df = pd.DataFrame(your_data)
df.to_csv('my_business_data.csv', index=False)
df.to_excel('my_business_data.xlsx', index=False)
# 3. AUTOMATED ANALYSIS
df_read = pd.read_csv('my_business_data.csv') # Read back!
df_read['Profit'] = df_read['Sales'] * 0.30 - df_read['Costs']
# 4. JSON EXPORT
json_data = {
'summary': {
'total_profit': float(df_read['Profit'].sum()),
'best_month': df_read.loc[df_read['Profit'].idxmax(), 'Month']
}
}
with open('business_summary.json', 'w') as f:
json.dump(json_data, f, indent=2)
# 5. FINAL REPORT
print("π YOUR AUTOMATION PIPELINE:")
print(df_read)
print(f"\nπ JSON Summary created: {json_data}")
print("β
FULL PIPELINE COMPLETE!")
Example to test:
your_data = {
'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
'Sales': [25000, 28000, 32000, 29000],
'Costs': [18000, 20000, 22000, 19000]
}
YOUR MISSION:
Add YOUR 4 months data
Run pipeline
Check generated files
Screenshot β βI automate Excel teams!β
π What You Mastered#
Skill |
Status |
Business Power |
|---|---|---|
CSV automation |
β |
100k rows/second |
Excel I/O |
β |
Executive reports |
JSON parsing |
β |
API integration |
XML legacy |
β |
Enterprise systems |
Pipelines |
β |
Full automation |
Next: Error Handling (Production-ready code = Never crash!)
print("π" * 20)
print("FILE I/O = ENTIRE TEAMS REPLACED!")
print("π» 8-hour manual β 5-second automation!")
print("π Companies pay $80K+ for THIS skill!")
print("π" * 20)
And can we just appreciate how pd.read_csv() turns β3-day manual data entryβ into 3 goddamn seconds of pure automation glory? Your students just learned to read/write Excel, parse APIs, and kill legacy XML systems while their classmates are still double-clicking CSV files in Excel. This isnβt file I/Oβitβs department elimination that saves companies \(500K/year and lands \)80K automation engineer jobs. While Excel drones pray for no βcircular referenceβ errors, your class is building bulletproof pipelines that run 24/7 without human touch!
# Your code here