Data Analysis and Python
EASC2410 Lecture 1 | Dr. Binzheng Zhang
Department of Earth Sciences | Sapientia et Virtus
Welcome to the world of data analysis. In this course, we will explore how to take raw data from the earth sciences and turn it into actionable insight using Python.
The Role of Data Analysis
Data is the key; modeling helps interpret the data (to some extent). We see this in everything from air quality monitoring to pandemic tracking.
[Figure: slide_2_aqi.png]
[Figure: slide_5_temp.png]
Steps in Data Science:
- Data Engineering: Acquire → Prepare
- Computational Data Science: Analyze → Report → Act/Decision
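The steps above can be sketched as a chain of small functions. This is only an illustrative outline: the function names and the temperature readings are hypothetical, not taken from the course material, and the final Act/Decision step is left to the person reading the report.

```python
from statistics import mean


def acquire():
    """Acquire: return raw readings (hypothetical values; None marks a missing reading)."""
    return [21.5, None, 22.1, 23.0, None, 22.4]


def prepare(raw):
    """Prepare: drop the missing readings so the analysis only sees numbers."""
    return [value for value in raw if value is not None]


def analyze(clean):
    """Analyze: reduce the cleaned readings to a few summary statistics."""
    return {"count": len(clean), "mean": mean(clean)}


def report(summary):
    """Report: turn the summary into a sentence a decision-maker can read."""
    return f"{summary['count']} valid readings, mean = {summary['mean']:.2f}"


# Data engineering (acquire, prepare) feeds computational data science (analyze, report).
summary = analyze(prepare(acquire()))
message = report(summary)
print(message)
```

Keeping each step in its own function makes it easy to swap in a different data source or analysis later without touching the rest of the chain.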
Warning: Bad Data Analysis
Data visualization is powerful, but it can be used to mislead. Common pitfalls include faulty polling, flawed correlations, and misleading axes.
Interactive Example: Correlation vs. Causation
Consider the graph below. It shows a 99% correlation between US Spending on Science and Suicides by Hanging.
Question: Does funding science cause suicides?
No! This is a classic "Spurious Correlation." Just because two curves move together doesn't mean one causes the other.
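A small sketch can make this concrete. The two sequences below are invented for illustration (they are not the spending or mortality data from the graph), and `pearson_r` is a hand-written helper rather than a library call. Neither sequence depends on the other; both simply grow over time, and that alone is enough to produce a high correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


years = range(10)
series_a = [3 * t + 1 for t in years]  # grows linearly with time
series_b = [t ** 2 for t in years]     # grows quadratically with time

r = pearson_r(series_a, series_b)
print(f"r = {r:.2f}")  # prints r = 0.96
```

The shared driver here is time itself, which is a common source of spurious correlations in trend data.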
Course Overview
We will move from the basics of Python to advanced Machine Learning topics.
| Wk | Topic | Key Libraries |
|---|---|---|
| 1-4 | Python Basics (Variables, Loops, Functions) | Standard Lib |
| 5-6 | Numpy Arrays & Visualization | Numpy, Matplotlib |
| 7-8 | Pandas & Data Wrangling | Pandas |
| 9-10 | Statistics & Modeling | Scipy |
| 12-13 | Geospatial Data & Machine Learning | Geopandas, Sklearn |
Assessment
- Homework (7/8 assignments): 50%
- In-class Practice: 10%
- Moodle Exam: 20%
- Coding Exam: 20%
Why Python?
Python is the dominant language in Data Science because it is high-level, friendly, and has a massive ecosystem of scientific libraries.
The "Hello World" Comparison
Look how much simpler Python is compared to C or Java:
# Python
print("Hello world.")
// Java
public class Hi {
    public static void main(String[] args) {
        System.out.println("Hello world.");
    }
}
/* C Language */
#include <stdio.h>
int main() {
    printf("Hello world.\n");
    return 0;
}
Tools: Anaconda & Jupyter
We will use Anaconda, which manages our environments, and Jupyter Notebook, which allows us to mix code, text, and plots.
To start, click "Launch" on Jupyter Notebook in Anaconda Navigator.
Helpful Shortcuts
- Run Cell: `Shift + Enter` or `Ctrl + Enter`
- Get Help: Type `help(print)` to see documentation.
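As a quick illustration of the help shortcut, the cell below asks Python for the documentation of `print`. The second line is an optional extra not mentioned above: the same text is also stored on the function as its `__doc__` attribute.

```python
# Show the built-in documentation for print()
help(print)

# The same documentation text is stored on the function itself
doc_text = print.__doc__
print(doc_text)
```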
Lecture 1 Exercises
Exercise 1: Python as a Calculator
Python supports standard mathematical operations. Try predicting the output below:
| Symbol | Operation | Example |
|---|---|---|
| + | Addition | 2 + 3 |
| * | Multiplication | 3 * 91.1 |
| ** | Power | 2 ** 10 |
| % | Modulo (Remainder) | 4 % 3 |
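The examples in the table can be run directly in a notebook cell. The variable names below are only for illustration; try predicting each result before reading the comments.

```python
addition = 2 + 3        # 5
product = 3 * 91.1      # roughly 273.3 (a floating-point result)
power = 2 ** 10         # 1024
remainder = 4 % 3       # 1

print(addition, product, power, remainder)
```

Note that `**` is the power operator in Python; `^` means something different (bitwise XOR).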
Writing Good Code (Comments)
If your code is not commented, you WILL lose marks on homework.
Comments start with a `#`. They are ignored by Python but essential for humans.
Bad vs. Good Code Style
The difference between "messy" code and "maintainable" scientific code is usually not the logic but the naming and comments around it.
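A minimal sketch of the contrast, using a temperature conversion chosen purely for illustration (the function names are hypothetical and not from the course material). Both versions compute the same result; only the second tells the reader what it does and in which units.

```python
# Messy: cryptic names, no comments, units unknown
def f(x):
    return x * 9 / 5 + 32


# Maintainable: descriptive names, a docstring, and a comment stating the formula
def celsius_to_fahrenheit(temp_celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    # F = C * 9/5 + 32
    return temp_celsius * 9 / 5 + 32


boiling_point_f = celsius_to_fahrenheit(100)  # boiling point of water, in Fahrenheit
print(boiling_point_f)  # prints 212.0
```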