Data Analysis and Python

EASC2410 Lecture 1 | Dr. Binzheng Zhang

Department of Earth Sciences | Sapientia et Virtus


Welcome to the world of data analysis. In this course, we will explore how to take raw data from the earth sciences and turn it into actionable insight using Python.

The Role of Data Analysis

Data is the key, and modeling helps us interpret that data (to some extent). We see this in everything from air quality monitoring to pandemic tracking.

[Insert Slide 2 Figure: Air Quality Index NY vs NH]
Visualizing PM2.5 and PM10 AQI values allows us to instantly spot pollution trends.
[Insert Slide 5 Figure: Global Temperature Change]

Steps in Data Science:

  1. Data Engineering: Acquire → Prepare
  2. Computational Data Science: Analyze → Report → Act/Decision
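The sketch below maps these steps onto plain Python functions. Everything in it is hypothetical: the function names, the hourly PM2.5 readings (stored as text with gaps, as raw files often are), and the 35.0 alert threshold are illustrative assumptions, not course material.

```python
import statistics

# Illustrative pipeline: function names and readings are hypothetical.

def acquire():
    # Acquire: stand-in for downloading or reading a data file.
    # Hypothetical hourly PM2.5 readings (ug/m3), stored as text, with gaps.
    return ["12.1", "15.4", "", "40.2", "38.7", "n/a", "22.0"]

def prepare(raw):
    # Prepare: keep only entries that convert cleanly to numbers.
    clean = []
    for value in raw:
        try:
            clean.append(float(value))
        except ValueError:
            pass  # skip blanks and placeholders such as "n/a"
    return clean

def analyze(values):
    # Analyze: basic summary statistics.
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}

def report(summary, threshold=35.0):
    # Report: print the summary, then return a yes/no decision (Act/Decision).
    # The 35.0 threshold is an assumed alert level for illustration only.
    alert = summary["max"] > threshold
    print(f"{summary['n']} valid readings, mean {summary['mean']:.2f}, max {summary['max']:.1f}")
    print("Action: issue air-quality alert" if alert else "Action: none")
    return alert

summary = analyze(prepare(acquire()))
decision = report(summary)
```

Keeping each step in its own function means you can swap the data source or the cleaning rules without touching the analysis.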

Warning: Bad Data Analysis

Data visualization is powerful, but it can be used to mislead. Common pitfalls include faulty polling, flawed correlations, and misleading axes.

Example: Correlation vs. Causation

Consider the graph below. It shows a correlation of about 99% between US spending on science and suicides by hanging.

[Insert Slide 9 Figure: Spurious Correlation Graph]

Question: Does funding science cause suicides?


No! This is a classic "Spurious Correlation." Just because two curves move together doesn't mean one causes the other.
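A small NumPy sketch with made-up data shows how easily this happens: any two series that both trend upward over time will correlate strongly, even when neither influences the other. The series, random seed, and trend sizes are arbitrary assumptions. Correlating the year-to-year changes instead of the raw values is one simple check that removes the shared trend; it is not a full test of causation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
years = np.arange(2000, 2020)  # 20 hypothetical years
t = years - years[0]

# Two made-up series that share only an upward trend over time.
# Their random noise is generated independently, so neither depends on the other.
series_a = 2.0 * t + rng.normal(0.0, 0.5, size=t.size)
series_b = 0.5 * t + 10.0 + rng.normal(0.0, 0.2, size=t.size)

# Pearson correlation of the raw series: high, purely because of the shared trend
r = np.corrcoef(series_a, series_b)[0, 1]
print(f"Pearson r of raw series = {r:.3f}")

# Differencing removes the linear trend; what remains is independent noise
r_diff = np.corrcoef(np.diff(series_a), np.diff(series_b))[0, 1]
print(f"Pearson r of year-to-year changes = {r_diff:.3f}")
```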

Key Lesson: Always check your Y-axis! Some graphs truncate the Y-axis (starting at 34% instead of 0%) to make small changes look massive.
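The size of this effect is simple arithmetic: on a bar chart, each bar is drawn from the bottom of the axis, so its visible height is its value minus the axis starting point. The sketch below assumes two hypothetical values, 35% and 39%, and compares an axis that starts at 0% with one that starts at 34%.

```python
def apparent_ratio(small, large, axis_start):
    # Visible bar height = value - axis_start, so the drawn ratio
    # depends on where the y-axis begins, not just on the data.
    return (large - axis_start) / (small - axis_start)

# Hypothetical survey results: 35% vs 39%
honest = apparent_ratio(35.0, 39.0, axis_start=0.0)
truncated = apparent_ratio(35.0, 39.0, axis_start=34.0)
print(f"Axis from 0%:  second bar looks {honest:.2f}x taller")
print(f"Axis from 34%: second bar looks {truncated:.2f}x taller")
```

A difference of about 11% in the data (39 / 35) is drawn as a fivefold difference once the axis starts at 34%.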

Course Overview

We will move from the basics of Python to advanced Machine Learning topics.

Week    Topic                                          Key Libraries
1-4     Python Basics (Variables, Loops, Functions)    Standard Library
5-6     NumPy Arrays & Visualization                   NumPy, Matplotlib
7-8     Pandas & Data Wrangling                        pandas
9-10    Statistics & Modeling                          SciPy
12-13   Geospatial Data & Machine Learning             GeoPandas, scikit-learn

Assessment

Why Python?

Python is the dominant language in Data Science because it is high-level, friendly, and has a massive ecosystem of scientific libraries.

The "Hello World" Comparison

Compare how much simpler the Python version is than its Java and C equivalents:

# Python
print("Hello world.")
// Java
public class Hi {
    public static void main (String[] args) {
        System.out.println("Hello world.");
    }
}
/* C Language */
#include <stdio.h>
int main() {
    printf("Hello world.\n");
    return 0;
}

Tools: Anaconda & Jupyter

We will use Anaconda, which manages our environments, and Jupyter Notebook, which allows us to mix code, text, and plots.

[Insert Slide 22: Anaconda Navigator Interface]
To start Jupyter, click "Launch" on the Jupyter Notebook tile in Anaconda Navigator.

Helpful Shortcuts

Writing Good Code (Comments)

If your code is not commented, you WILL lose marks on homework.

Comments start with a #. They are ignored by Python but essential for humans.
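Here is a short example of both comment styles: a full-line comment above a function, and inline comments after code on the same line. The temperature conversion is a hypothetical example chosen only for illustration. The last line shows that a # inside quotes is part of a string, not a comment.

```python
# Convert a temperature from degrees Celsius to degrees Fahrenheit.
def celsius_to_fahrenheit(temp_c):
    return temp_c * 9.0 / 5.0 + 32.0  # F = C x 9/5 + 32

boiling_f = celsius_to_fahrenheit(100.0)  # water boils at 100 C at sea level
print(boiling_f)  # prints 212.0
print("# inside quotes is text, not a comment")
```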

Bad vs. Good Code Style

Compare "messy" code with "maintainable" scientific code:

# BAD CODE
# No comments, confusing variable names
def A_dl(s,tilt,phi,r0,r1):
    ct = cos(tilt)
    st = sin(tilt)
    dr = r1-r0
    dx = dr[0]
    # What is happening here? 
    # It is impossible to debug.
    return 0.1e1/r3 * sin(tilt)
# GOOD CODE
# Solar wind calculation
# Author: Dr. Kareem Sorathia, Johns Hopkins University

import numpy as np

pW = (1.0e-2)/50.0  # Default THERMAL pressure, nPa
nW = 0.1            # Default density, #/cc
VxW = 300.0         # Default wind, km/s

# Calculate time bounds
tMin = 0.0          # Start time
tMax = 200.0        # Stop time
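Continuing in the same style, a hypothetical helper might use the default density and speed above to estimate the solar wind dynamic pressure. This is our illustration, not part of the original code: the function name is made up, and it assumes a proton-only wind, so that pressure = proton mass x density x speed squared. Note how the docstring and comments record the units of every input and every conversion.

```python
PROTON_MASS_KG = 1.67262192e-27  # proton mass, kg

def dynamic_pressure(n_cc, v_kms):
    """Solar wind dynamic pressure in nPa, assuming a proton-only wind.

    n_cc  : number density, particles per cubic centimetre (#/cc)
    v_kms : flow speed, km/s
    """
    n_m3 = n_cc * 1.0e6                      # #/cc -> #/m^3
    v_ms = v_kms * 1.0e3                     # km/s -> m/s
    p_pa = PROTON_MASS_KG * n_m3 * v_ms**2   # pressure in Pa
    return p_pa * 1.0e9                      # Pa -> nPa

nW = 0.1     # Default density, #/cc (same default as above)
VxW = 300.0  # Default wind, km/s
print(f"{dynamic_pressure(nW, VxW):.4f} nPa")
```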

Lecture 1 Exercises

Exercise 1: Python as a Calculator

Python supports standard mathematical operations. Try predicting the output below:

Symbol   Operation             Example
+        Addition              2 + 3
*        Multiplication        3 * 91.1
**       Power                 2 ** 10
%        Modulo (remainder)    4 % 3
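Once you have made your predictions, you can check them by running each example from the table. The dictionary layout below is just one convenient way to print expressions next to their results.

```python
examples = {
    "2 + 3": 2 + 3,        # addition
    "3 * 91.1": 3 * 91.1,  # multiplication involving a float
    "2 ** 10": 2 ** 10,    # power: 2 raised to the 10th
    "4 % 3": 4 % 3,        # modulo: remainder after dividing 4 by 3
}
for expression, result in examples.items():
    print(f"{expression:>9} -> {result}")
```

The results are 5, 273.29999999999995, 1024, and 1. The second one is not exactly 273.3 because 91.1 cannot be stored exactly in binary floating point, so a tiny rounding error shows up in the product.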
