GG413: Introduction to Statistics and Data Analysis

Instructor: Garrett Apuzen-Ito

Classes: POST 708, Mon & Wed 8:30-9:45

Prerequisites: Math242 (2nd semester calculus), GG250 (scientific programming using Matlab), or instructor consent

Textbook: Introduction to Statistics and Data Analysis, by Paul Wessel. Recommended (optional) text:  John C. Davis, Statistics and Data Analysis in Geology, 3rd Edition

Overview:

Quantitative analysis of data and modelling have become the norm in earth, planetary, and environmental sciences. Knowledge and skills in quantitative analysis enable one to objectively define the extent and limits of one's interpretations and open the door to diverse ways of using data. This course provides a foundational understanding of the basic theory behind probability, statistics, and quantitative data analysis, as well as practice in analyzing real data sets with computer software (Matlab, Octave, or FreeMat). The course emphasizes solving problems, interactive class discussions, and independent inquiry so that students:

·Learn how to explore and characterize their data, including defining the mean, median, uncertainties, and factors that contribute to variance.

·Understand how to propagate errors in calculations of derived quantities

·Learn and gain practice in using principles in probability theory and statistics

·Perform formal hypothesis testing in interpreting data

·Use basic concepts of linear algebra and least squares formalism for curve fitting and regression

·Explore various ways to examine sequential or time-series data, including using spectral analysis

·Analyze directional data

The applications will be on geoscience data sets but the course is relevant to all fields of science.

STUDENT LEARNING OBJECTIVES

This course emphasizes three student learning objectives for undergraduate and graduate students:

·Students can apply technical knowledge of computer applications, mathematics, and physics to solving real-world problems in geology and geophysics

·Students use the scientific method to define, critically analyze, and solve a problem in earth science

·Students can communicate scientific knowledge in both oral presentations and in writing

Lectures are to be viewed outside of class on YouTube (links provided below).  Class time is an interactive learning environment and largely dedicated to working problem sets.  Problem sets will be assigned approximately weekly and will involve using computer software to apply and practice using the techniques covered.  There will be a mid-term and a final exam.

Data analysis is a very hands-on activity, and the weekly problem sets require a mix of mathematical and computational manipulations. Homework must be handed in at the beginning of class on WEDNESDAY unless you have made prior arrangements with me. Unexcused late homework will receive 10% less credit for each day it is late. If you anticipate a conflict with an exam, you must reschedule the exam prior to the scheduled date. The final grade will be a weighted average of the grades for homework (70%), the mid-term (15%), and the final exam (15%).
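As an illustration of the grading arithmetic above, here is a minimal Python sketch (the course itself uses Matlab, Octave, or FreeMat). The function names and sample scores are hypothetical. It assumes that all homework sets count equally, that scores are on a 0-100 scale, and that each late day removes 10% of the earned homework score, which is only one possible reading of the late policy.

```python
# Hypothetical illustration of the grading policy; names and scores are made up.

HOMEWORK_WEIGHT = 0.70
MIDTERM_WEIGHT = 0.15
FINAL_WEIGHT = 0.15


def late_homework_score(score, days_late):
    """Return a homework score (0-100) after the late penalty.

    Assumption: each late day removes 10% of the earned score,
    and the result never drops below zero.
    """
    penalty_fraction = min(1.0, 0.10 * days_late)
    return score * (1.0 - penalty_fraction)


def final_grade(homework_scores, midterm, final_exam):
    """Weighted average: homework 70%, mid-term 15%, final exam 15%.

    Assumption: every homework set carries equal weight.
    """
    homework_mean = sum(homework_scores) / len(homework_scores)
    return (HOMEWORK_WEIGHT * homework_mean
            + MIDTERM_WEIGHT * midterm
            + FINAL_WEIGHT * final_exam)


if __name__ == "__main__":
    # Two homework sets, the second handed in two days late.
    homework = [90.0, late_homework_score(80.0, days_late=2)]
    print(final_grade(homework, midterm=70.0, final_exam=60.0))
```

Because homework carries 70% of the grade, a late penalty on a single problem set moves the final grade far more than a comparable change on either exam.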

Working Course Syllabus

Chapters 1 & 2: Exploring Data & Error Analysis

Week 1: 8/22 & 8/24, Swan and Sandilands Handout and Wessel Ch 1 and 2

1.1 Classification of data (see video #1 on Data Types and Precision vs. accuracy)

1.2 Exploratory data analysis (see EDA_Lecture files)

2  Error Analysis

Homework #1 and required datasets; and >>>SOLUTIONS<<<

Chapter 3: Basic Concepts in Statistics

Week 2: 8/29 & 8/31 (HW #1 due Wed 8/31)

3.1 Probability Basics

Lecture Videos

#1: Permutations

#2: Combinations

#3: The Binomial probability distribution (Davis Ch 2)

#4: The Hypergeometric distribution, 3.1.3 Probability, 3.1.4 Some Rules of Probability

#5: 3.1.6 Additional rules, 3.1.7 Conditional Probability

#6: 3.1.8 Conditional Probability and Bayes' Theorem

Examples:  Binomial & Hypergeometric PDs (& Matlab scripts for examples 1 & 2), and Conditional Probability
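For readers without access to the Matlab example scripts, the two distributions named above can be sketched in a few lines of Python. This is not the course's own script. The function names, argument order, and numbers are illustrative assumptions, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def hypergeometric_pmf(k, N, K, n):
    """P(exactly k successes in n draws WITHOUT replacement
    from N items, of which K are successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    # Probability of exactly 2 heads in 5 flips of a fair coin.
    print(binomial_pmf(2, 5, 0.5))
    # Probability of exactly 1 marked item in 4 draws from 10 items, 3 of them marked.
    print(hypergeometric_pmf(1, 10, 3, 4))
```

The only difference between the two cases is replacement: the binomial assumes the success probability stays fixed from trial to trial, while the hypergeometric accounts for the pool shrinking with each draw.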

Homework #2:  Probability; and >>>SOLUTIONS<<<

Week 3: 9/7 (watch #1-#3 for Wed, HW2 due Wed)

3.2 The M&M’s of Statistics (Davis pages on Central Limit Theorem)

Lecture Videos:

#1: 3.2.1 Population and Samples, 3.2.2 Measure of central location (mean, median, mode)

#2: 3.2.3 Measure of variation

#2.5:  3.2.6 Covariance and Correlation

see SOLUTIONS

Week 4: 9/12 & 9/14, (HW3 Due Wed)

Watch #4-#7 for Mon

#4: 3.2.5 Inference about the mean and Central Limit Theorem

#5: 3.3.1-3.3.3 Probability Distributions, Binomial and Normal Distributions

#6: 3.3.3 The Normal (Gaussian) Probability Density Function

#7: 3.3.3-3.3.4 Applications of the Normal Distribution & the Poisson Distribution

See example script for plotting the binomial and normal distributions.

Wed 9/14: Study videos #1-#5 below

3.4. Inferences about means of populations, Videos #1, #2, #3

Chapter 4: Hypothesis Testing

4.1 Null Hypothesis, Video #4

4.2. Parametric Tests (Student's t, Chi-squared, F tests)

#5:  One and two sample test of means

Week 5:  9/19 & 9/21 (HW #4 due Wed)

Watch #7-#9 for Mon 9/19

#7:  4.2.3 estimating the variance of a population

#8:  4.2.4 one-sample, chi-square test of variance

#9: 4.2.5 two-sample F-test of variance

Watch #1-#4 for Wed 9/21

4.2 Parametric Tests, videos...

#1:  general aspects of Chi-squared

Hw5: Hypothesis Testing II; datasets: “quakedays.txt” and “rho.txt”

Week 6: 9/26 & 9/28 (HW #5 due Wed)

Mon 9/26, work on HW5.  The videos are #1-#4 above

For Wed 9/28: study the 4 videos below (annotations will be added by Friday night)

4.3 Non-Parametric Tests, see video

4.3.2 videos #1 and #2: Mann-Whitney 2-sample U test of median

Hw6:  Hypothesis Testing III, see Matlab script kolsmir.m

Week 7:  10/3 & 10/5 (HW #6 due Wed)

For Mon 10/3: study the two videos below

4.3 Non-Parametric Tests

4.3.3: Kolmogorov-Smirnov goodness-of-fit test (1 or 2 sample) to a pdf

For Wed 10/5 study videos #1-#4 below. Also come to class with questions about HW 1-6.
Wed is our review before the exam.

Chapter 5: Linear (Matrix) Algebra and Least Squares Inversion

Week 8: 10/10-10/12

>>> MIDTERM Monday 10/10 (covering material through HW #6) <<<

For Wed 10/12 study videos #5-#8 below

5.9.2-5.9.3 General Least Squares Regression:  #7 Part I and #8 Part II

Week 9: 10/17 & 10/19 (Hw #7 due)

For Wed 10/19: study the first two videos below

Chapter 6: Regression

***Thu 10/20 is the day of the Great Shake Out (click link to find out what and how).

Week 10: 10/24 & 10/26 (HW #8 due)

For Mon 10/24: study the following video

Analysis of Variance (ANOVA)

For Wed 10/26: study this first video

Week 11: 10/31 & 11/2 (HW #9 due)

For Mon 10/31 study the next two videos

4.2.8 #2 One-way ANOVA

4.2.8 #3 Two-way ANOVA

Chapter 7: Sequences and Time Series Analysis

For Wed 11/2 study video #1 below

7.1 Markov Chains:  videos #1 and #2

Week 12:  11/7 & 11/9 (HW #10 due)

For Mon 11/7 study video #2 on Markov Chains

For Wed 11/9 study the following video

7.5 Autocorrelation, Video #1

Matlab script shown in videos, with data for auto- and cross-correlation

Week 13: 11/14

7.6 Cross-correlation, Video #2

Chapter 8:  Spectral Analysis

For Wed 11/16 study videos #1-#2. (HW 11 due)

8.1 Spectral Analysis: Basic Terminology

Video #1: Introduction to spectral analysis

Video #2: Orthogonality of periodic functions

See data file honolulu_resampled.txt

Week 14:  For Mon 11/21 study videos #3 & #4 below

8.2 Spectral Analysis: Fitting the Fourier Series Video #3

Wed 11/23 Continue working on HW12

8.3 The Periodogram or Discrete Power Spectrum, Video #4

Happy Thanksgiving!

Chapter 9:  Analysis of Directional Data

Week 15: Mon 11/28 (HW 12 due), study videos #1 & #2

Video #1:  Polar histogram, computing means and variance

Video #2:  Confidence intervals, One-sample tests of means

Hw13:  Analysis of Directional Data

See data files Iceland_West.txt and Iceland_East.txt, as well as Matlab script polarhist.m

For Wed 11/30, study video #3

Video #3: Two-sample F test of means

Week 16: 12/5 & 12/7 (HW #13 due WED)

Review for Final EXAM

FINAL EXAM: Tuesday 12/13 8:10 to 11:30 10:30 a.m.