GG413: Introduction to Statistics and Data Analysis

Instructor: Garrett Apuzen-Ito

Classes: POST 708, Mon & Wed 8:30-9:45

Prerequisites: Math242 (2nd semester calculus), GG250 (scientific programming using Matlab), or instructor consent

Textbook: Introduction to Statistics and Data Analysis, by Paul Wessel. Recommended (optional) text: John C. Davis, Statistics and Data Analysis in Geology, 3rd Edition.



Quantitative analysis of data and modelling have become the norm in earth, planetary, and environmental sciences. Knowledge and skills in quantitative analysis enable one to objectively define the extent and limits of one's interpretations and open the door to diverse ways of using data. This course provides a foundational understanding of the basic theory behind probability, statistics, and quantitative data analysis, as well as practice in analyzing real data sets with computer software (Matlab, Octave, or FreeMat). The course emphasizes solving problems, interactive class discussions, and independent inquiry so that students:

·Learn how to explore and characterize their data, including defining the mean, median, uncertainties, and factors that contribute to variance

·Understand how to propagate errors in calculations of derived quantities

·Learn and gain practice in using principles in probability theory and statistics

·Perform formal hypothesis testing in interpreting data

·Use basic concepts of linear algebra and least squares formalism for curve fitting and regression

·Explore various ways to examine sequential or time-series data, including using spectral analysis

·Analyze directional data

The applications will be on geoscience data sets but the course is relevant to all fields of science.



This course emphasizes three student learning objectives for undergraduate and graduate students:

·Students can apply technical knowledge of computer applications, mathematics, and physics to solve real-world problems in geology and geophysics

·Students use the scientific method to define, critically analyze, and solve a problem in earth science

·Students can communicate scientific knowledge in both oral presentations and in writing


Format and workload

Lectures are to be viewed outside of class on YouTube (links provided below).  Class time is an interactive learning environment and largely dedicated to working problem sets.  Problem sets will be assigned approximately weekly and will involve using computer software to apply and practice using the techniques covered.  There will be a mid-term and a final exam. 



Data analysis is a very hands-on activity, and the weekly problem sets require a mix of mathematical and computational manipulations. Homework must be handed in at the beginning of class on WEDNESDAY, unless you have made prior arrangements with me. Unexcused late homework will receive 10% less credit for each day it is late. If you anticipate a conflict with an exam, you must reschedule the exam prior to the scheduled date. The final grade will be a weighted average of grades for homework (70%), the midterm (15%), and the final exam (15%).



Working Course Syllabus


Chapters 1 & 2: Exploring Data & Error Analysis

Week 1: 8/22 & 8/24, Swan and Sandilands Handout and Wessel Ch 1 and 2

1.1 Classification of data (see video #1 on Data Types and Precision vs. accuracy)

1.2 Exploratory data analysis (see EDA_Lecture files)

2  Error Analysis

Reporting uncertainties, significant figures, & errors of sums & general functions

Uncertainties of products, quotients, and example cases

Homework #1 and required datasets; and >>>SOLUTIONS<<<
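
As an illustration of the Chapter 2 topics listed above (errors of sums, products, and quotients), here is a minimal sketch of the standard propagation rules. It assumes the measurement errors are independent and small relative to the values. It is written in Python rather than the Matlab/Octave used in the course, and the function names and the mass and volume numbers are hypothetical, chosen only for the example.

```python
import math

def sum_uncertainty(sx, sy):
    """Uncertainty of x + y (or x - y): absolute errors add in quadrature."""
    return math.hypot(sx, sy)

def product_with_uncertainty(x, sx, y, sy):
    """Value and uncertainty of x * y: relative errors add in quadrature."""
    p = x * y
    return p, abs(p) * math.hypot(sx / x, sy / y)

def quotient_with_uncertainty(x, sx, y, sy):
    """Value and uncertainty of x / y: relative errors add in quadrature."""
    q = x / y
    return q, abs(q) * math.hypot(sx / x, sy / y)

if __name__ == "__main__":
    # Hypothetical measurements: mass 10.0 +/- 0.1 g, volume 2.0 +/- 0.02 cm^3
    density, s_density = quotient_with_uncertainty(10.0, 0.1, 2.0, 0.02)
    print(f"density = {density:.2f} +/- {s_density:.2f} g/cm^3")
```

With these made-up numbers each input has a 1% relative error, so the density carries a relative error of about 1.4%.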


Chapter 3: Basic Concepts in Statistics


Week 2: 8/29 & 8/31 (HW #1 due Wed 8/31)

3.1 Probability Basics

Lecture Videos

#1: Permutations

#2: Combinations

#3: The Binomial probability distribution (Davis Ch 2)

#4: The Hypergeometric distribution, 3.1.3 Probability, 3.1.4 Some Rules of Probability

#5: 3.1.6 Additional rules, 3.1.7 Conditional Probability

#6: 3.1.8 Conditional Probability and Bayes' Theorem

Examples:  Binomial & Hypergeometric PDs (& Matlab scripts for examples 1 & 2), and Conditional Probability

Homework #2:  Probability; and >>>SOLUTIONS<<<
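
For readers who want a feel for the binomial and hypergeometric distributions listed above, the following is a minimal sketch of both probability functions built from combinations. It is not the course's Matlab example script; it is an independent Python illustration, and the function names and the numbers in the example are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (sampling with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """Probability of exactly k marked items in a sample of n drawn
    without replacement from N items, K of which are marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

if __name__ == "__main__":
    # Exactly 2 heads in 4 fair coin tosses
    print(binomial_pmf(2, 4, 0.5))
    # Exactly 1 marked item when drawing 2 from 10 items, 3 of them marked
    print(hypergeometric_pmf(1, 10, 3, 2))
```

The two functions differ only in whether sampling is with replacement (binomial) or without replacement (hypergeometric).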


Week 3: 9/7 (watch #1-#3 for Wed, HW2 due Wed)

3.2 The M&M’s of Statistics (Davis pages on Central Limit Theorem)

Lecture Videos: 

#1: 3.2.1 Population and Samples, 3.2.2 Measure of central location (mean, median, mode)

#2: 3.2.3 Measure of variation

#2.5:  3.2.6 Covariance and Correlation

#3: 3.2.4 Robust Estimation (MAD)

HW3: Statistics and Probability Distributions (and data for problems 2 & 3)




Week 4: 9/12 & 9/14, (HW3 Due Wed)

Watch #4-#7 for Mon

#4: 3.2.5 Inference about the mean and Central Limit Theorem

#5: 3.3.1-3.3.3 Probability Distributions, Binomial and Normal Distributions

#6: 3.3.3 The Normal (Gaussian) Probability Density Function

#7: 3.3.3-3.3.4 Applications of the Normal Distribution & the Poisson Distribution

See example script for plotting the binomial and normal distributions. 


Wed 9/14: Study videos #1-#5 below

3.4. Inferences about means of populations, Videos #1, #2, #3


Chapter 4: Hypothesis Testing

4.1 Null Hypothesis, Video #4

4.2. Parametric Tests (Student's t, Chi-squared, F tests)

#5: One- and two-sample tests of means

Tables:  normal distribution, t-distribution, chi-squared, F-distribution

Hw4: Hypothesis Testing with Parametric Statistics; see SOLUTIONS


Week 5:  9/19 & 9/21 (HW #4 due Wed)

Watch #7-#9 for Mon 9/19

#7:  4.2.3 estimating the variance of a population

#8:  4.2.4 one-sample, chi-square test of variance

#9: 4.2.5 two-sample F-test of variance


Watch #1-#4 for Wed 9/21

4.2 Parametric Tests, videos...

#1:  general aspects of Chi-squared

#2: 4.2.6 Chi-squared test of a pdf

#3: 4.2.6 Chi-squared test of a pdf, example

#4: 4.2.7 test of linear correlation

Hw5: Hypothesis Testing II; datasets: “quakedays.txt” and “rho.txt”



Week 6: 9/26 & 9/28 (HW #5 due Wed)

Mon 9/26, work on HW5.  The videos are #1-#4 above


For Wed 9/28: study the 4 videos below (annotations will be added by Friday night)

4.3 Non-Parametric Tests, see video

4.3 Parametric vs. Non-Parametric tests

4.3.1: Sign test of central value

4.3.2 videos #1 and #2: Mann-Whitney 2-sample U test of median

Tables:  Mann-Whitney, K-S (1-sample), K-S (2-sample)

Hw6:  Hypothesis Testing III, see Matlab script kolsmir.m



Week 7:  10/3 & 10/5 (HW #6 due Wed)

For Mon 10/3: study the two videos below

4.3 Non-Parametric Tests

4.3.3: Kolmogorov-Smirnov goodness of fit test (1 or 2 sample) to a pdf

4.3.4: Spearman’s Non-Parametric test for correlation


For Wed 10/5 study videos #1-#4 below. Also come to class with questions about HW 1-6.
Wed is our review before the exam.


Chapter 5: Linear (Matrix) Algebra and Least Squares Inversion

5.1-5.2 #1 Matrices:  General concepts and definitions

5.3-5.4 #2 Matrix Addition, Dot Product, and Matrix Multiplication

5.5 #3 Determinant of a Matrix

5.7 #4 Matrix Division:  the Inverse Matrix


Week 8: 10/10 & 10/12

>>>MIDTERM Monday 10/10 (covering material through HW #6)<<<


For Wed 10/12 study videos #5-#8 below

5.9.1 #5 Simple Regression and #6 RMS Misfit

5.9.2-5.9.3 General Least Squares Regression:  #7 Part I and #8 Part II

Hw7:  Least Squares Regression:  see datasets Lanai_elev_faa_GG413.txt  and  hf.txt
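
To make the simple regression and RMS misfit topics above concrete, here is a minimal sketch of an ordinary least-squares straight-line fit. It assumes all of the error is in y and that every point has equal weight. It is written in Python rather than the course's Matlab/Octave, uses only the standard library, and the function names and data values are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b

def rms_misfit(x, y, a, b):
    """Root-mean-square of the residuals y - (a + b*x)."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

if __name__ == "__main__":
    # Made-up data scattered about the line y = 1 + 2x
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.1, 2.9, 5.2, 6.8]
    a, b = fit_line(x, y)
    print(f"intercept = {a:.2f}, slope = {b:.2f}, RMS = {rms_misfit(x, y, a, b):.3f}")
```

The general matrix formulation covered in 5.9.2-5.9.3 reduces to these two expressions when the model is a straight line.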



Week 9: 10/17 & 10/19 (Hw #7 due)

For Wed 10/19: study the first two videos below

5.9.4 Video #1: Weighted Least Squares


Chapter 6: Regression

6.1 #2:  Line Fitting Revisited:  Confidence Intervals on True Slope, Intercept, and Regression Line

Hw8: Least Squares Regression II: see hawaii.txt, faultstep.txt, and heaviside.m



***Thu 10/20 is the day of the Great Shake Out (click link to find out what and how).

Click to find out what you need to know about earthquakes in Hawaii.


Week 10: 10/24 & 10/26 (HW #8 due)

For Mon 10/24: study the following video

#3:  Derivation of Variances of True Slope, Intercept, and Regression Line


Analysis of Variance (ANOVA)

For Wed 10/26: study this first video

Video #1:  Analysis of Variance (ANOVA) of Linear Regression

See also Draper & Smith excerpt

Hw9:  ANOVA, see Hw9_hf.txt, Hw9_Prob2_Chromium.txt, and Hw9_StudentPorosityMeasurements.txt



Week 11: 10/31 & 11/2 (HW #9 due)

For Mon 10/31 study the next two videos

4.2.8 #2 One-way ANOVA

4.2.8 #3 Two-way ANOVA


Chapter 7: Sequences and Time Series Analysis

For Wed 11/2 study video #1 below

7.1 Markov Chains:  videos #1 and #2

See detailed explanation of Example 5-1

Hw10:  Markov Chains and SOLUTIONS


Week 12:  11/7 & 11/9 (HW #10 due)

For Mon 11/7 study video #2 on Markov Chains


For Wed 11/9 study the following video

7.5 Autocorrelation, Video #1

Matlab script shown in videos, with data for auto- and cross-correlation

HW11: Autocorrelation and Cross-Correlation, data files:  TEMPER.TXT, Chesapeake_salinity.txt



Week 13: 11/14

7.6 Cross-correlation, Video #2


Chapter 8:  Spectral Analysis

For Wed 11/16 study videos #1-#2. (HW 11 due)

8.1 Spectral Analysis: Basic Terminology

Video #1: Introduction to spectral analysis

Video #2: Orthogonality of periodic functions

Hw 12:  Spectral Analysis.  See data file honolulu_resampled.txt



Week 14:  For Mon 11/21 study videos #3 & #4 below

8.2 Spectral Analysis: Fitting the Fourier Series Video #3


Wed 11/23 Continue working on HW12

8.3 The Periodogram or Discrete Power Spectrum, Video #4

Happy Thanksgiving!


Chapter 9:  Analysis of Directional Data

Week 15: Mon 11/28 (HW 12 due), study videos #1 & #2

Video #1:  Polar histogram, computing means and variance

Video #2:  Confidence intervals, One-sample tests of means

Read Davis handout

Hw13:  Analysis of Directional Data

See data files Iceland_West.txt and Iceland_East.txt, as well as Matlab script polarhist.m



For Wed 11/30, study video #3

Video #3: Two-sample F test of means
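
The mean and variance of directional data (Video #1 above) cannot be computed by averaging the angles directly, because 350° and 10° should average to north, not 180°. The sketch below shows the usual vector-sum approach: sum the unit vectors, take the direction of the resultant, and use its normalized length as a measure of concentration. It is a Python illustration rather than the course's Matlab code, and the function name is hypothetical.

```python
import math

def directional_stats(angles_deg):
    """Return (mean direction in degrees, mean resultant length R,
    circular variance 1 - R) for a list of angles in degrees."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    r_bar = math.hypot(c, s) / n   # 1 = perfectly aligned, 0 = no preferred direction
    return mean_dir, r_bar, 1.0 - r_bar

if __name__ == "__main__":
    # Two made-up azimuths straddling north
    mean_dir, r_bar, circ_var = directional_stats([350.0, 10.0])
    print(mean_dir, r_bar, circ_var)
```

For the two azimuths in the example, the mean direction comes out at north (numerically very close to 0° or 360°) with a mean resultant length close to 1, since the directions are tightly clustered.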


Week 16: 12/5 & 12/7 (HW #13 due WED)

Review for Final EXAM


FINAL EXAM: Tuesday 12/13 8:10 to 11:30 10:30 a.m.