29 Exercises — Across the Statistics Track

A collection of end-of-chapter exercises plus cross-chapter synthesis problems. All of them use the NSFG (National Survey of Family Growth) dataset.

29.1 Chapter 1 — EDA

Load the NSFG data. How many total pregnancies? How many live births?
What is the mean age of mothers at the end of their first recorded pregnancy?
What fraction of live births are first babies?
Plot a histogram of agepreg for first-time mothers only. What shape do you see?

29.2 Chapter 2 — Distributions

Compute Cohen’s d for birth weight (first babies vs others). Is the effect larger or smaller than the one for pregnancy length?
What fraction of pregnancies last less than 37 weeks (premature)?
Plot normalised histograms of birth weight for all three outcomes: live birth, stillbirth, induced abortion. What differences do you see?
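For the Cohen’s d exercise, a minimal sketch of one common definition follows: the difference in group means divided by a pooled standard deviation. The pooling formula (population variances weighted by group size) and the synthetic `firsts` and `others` samples are assumptions made for illustration; substitute the real NSFG birth weights when working the exercise.

```python
import math
import random
import statistics


def cohens_d(group1, group2):
    """Difference in means divided by a pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.pvariance(group1)
    var2 = statistics.pvariance(group2)
    # Assumption: pool the population variances, weighted by group size.
    pooled_var = (n1 * var1 + n2 * var2) / (n1 + n2)
    diff = statistics.fmean(group1) - statistics.fmean(group2)
    return diff / math.sqrt(pooled_var)


# Synthetic stand-ins for birth weights in pounds (NOT NSFG data).
rng = random.Random(0)
firsts = [rng.gauss(7.2, 1.4) for _ in range(2000)]
others = [rng.gauss(7.3, 1.4) for _ in range(2000)]

d = cohens_d(firsts, others)
print(f"Cohen's d on the synthetic samples: {d:.3f}")
```

Because d is expressed in units of standard deviations, the value for birth weight can be compared directly with the value for pregnancy length.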

29.3 Chapter 3 — PMF

Build a PMF of birth order. What is the most common birth order?
Implement the size-biased distribution for birth order. How does the mean change?
At which pregnancy length do first babies and other babies differ most in probability?
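The size-biased exercise can be approached as in the sketch below, which assumes the bias is proportional to the value itself (each probability is multiplied by its value and the result is renormalised). The helper names and the tiny `birth_orders` list are hypothetical and made up for illustration.

```python
from collections import Counter


def make_pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {x: c / total for x, c in sorted(counts.items())}


def bias_pmf(pmf):
    """Weight each probability by its value, then renormalise."""
    weighted = {x: p * x for x, p in pmf.items()}
    total = sum(weighted.values())
    return {x: w / total for x, w in weighted.items()}


def pmf_mean(pmf):
    return sum(x * p for x, p in pmf.items())


# Made-up birth orders (NOT NSFG data): five 1s, three 2s, two 3s.
birth_orders = [1] * 5 + [2] * 3 + [3] * 2

pmf = make_pmf(birth_orders)
biased = bias_pmf(pmf)

print(f"mean of original PMF:    {pmf_mean(pmf):.3f}")
print(f"mean of size-biased PMF: {pmf_mean(biased):.3f}")
```

Size-biasing shifts probability toward larger values, so the biased mean is never smaller than the original mean.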

29.4 Chapter 4 — CDF

What is the IQR of birth weight for live births?
A baby weighs 5.0 lbs. What percentile is this? (Use the CDF.)
Generate 1 000 synthetic birth weights using the inverse CDF. Do they match the original distribution?
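The percentile and inverse-CDF exercises both rest on the empirical CDF. The sketch below is one way to set it up, assuming a sorted list of values is an adequate CDF representation; `percentile_rank`, `inverse_cdf`, and the synthetic `weights` are illustrative names and data, not part of any NSFG tooling.

```python
import bisect
import math
import random
import statistics


def percentile_rank(sorted_values, x):
    """Percentage of values less than or equal to x."""
    return 100.0 * bisect.bisect_right(sorted_values, x) / len(sorted_values)


def inverse_cdf(sorted_values, p):
    """Smallest value whose empirical CDF is at least p."""
    index = max(0, math.ceil(p * len(sorted_values)) - 1)
    return sorted_values[index]


# Synthetic stand-in for live-birth weights in pounds (NOT NSFG data).
rng = random.Random(42)
weights = sorted(rng.gauss(7.3, 1.2) for _ in range(5000))

rank_of_5lb = percentile_rank(weights, 5.0)

# Inverse-CDF sampling: draw p uniformly in [0, 1) and look up the quantile.
resampled = [inverse_cdf(weights, rng.random()) for _ in range(1000)]
median_gap = abs(statistics.median(resampled) - statistics.median(weights))

print(f"percentile rank of 5.0 lb: {rank_of_5lb:.1f}")
print(f"gap between medians: {median_gap:.3f}")
```

Comparing a few quantiles, or overlaying the two CDFs, is a simple way to judge whether the resampled values match the original distribution.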

29.5 Chapter 5 — Modelling

Fit a normal distribution to pregnancy length. Make a normal probability plot. Is the fit good?
Which fits better for birth weight — Normal or Lognormal? Use the KS test to decide.

29.6 Chapter 6 — PDF

Compute skewness and kurtosis for birth weight and mother’s age.
Plot a KDE of birth weight with three different bandwidths. Which bandwidth gives the most plausible curve, and why?
Implement a Gaussian KDE from scratch (not using scipy). Verify against scipy’s output on a sample.
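For the from-scratch KDE exercise, a minimal sketch using only the standard library is shown below. The bandwidth rule (sample standard deviation times n^(-1/5), often called Scott's rule) is an assumption chosen because, to my understanding, it matches scipy's default for one-dimensional data; verify that against scipy before relying on it. The sample is synthetic.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def scott_bandwidth(sample):
    """Assumed bandwidth rule: sample std dev times n ** (-1/5)."""
    return statistics.stdev(sample) * len(sample) ** (-1 / 5)


# Synthetic stand-in for birth weights in pounds (NOT NSFG data).
rng = random.Random(7)
sample = [rng.gauss(7.3, 1.2) for _ in range(200)]
kde = gaussian_kde(sample, scott_bandwidth(sample))

# Sanity check: a density should integrate to roughly 1.
step = 0.05
grid = [i * step for i in range(301)]  # 0 to 15 lb
area = sum(kde(x) for x in grid) * step
print(f"approximate area under the KDE: {area:.4f}")
```

Evaluating `kde` and scipy's estimator on the same grid and comparing the largest absolute difference is one way to carry out the verification step.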

29.7 Chapter 7 — Relationships

Compute Pearson’s r for pregnancy length vs birth weight. Is this correlation stronger than the one between mother’s age and birth weight?
Plot a scatter of birth order vs birth weight. Is there a trend?
Compute Spearman’s $\rho$ for mother’s age vs pregnancy length.
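The two correlation coefficients in these exercises are closely related: Spearman's coefficient is Pearson's r applied to ranks. The sketch below illustrates that relationship with hand-written helpers (`pearson_r`, `ranks`, and `spearman_rho` are illustrative names) on a small made-up example where the relationship is monotonic but not linear.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation computed from centred sums of products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Made-up data: y grows monotonically but non-linearly with x.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

A gap between the two coefficients on real data hints at non-linearity or outliers, which is worth checking with a scatter plot.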

29.8 Chapter 8 — Estimation

Show that the 1/n variance estimator is biased using simulation (n=10, 10 000 repetitions).
Bootstrap the median pregnancy length for first babies. What is the 95 % CI?
Compute mean birth weight with and without survey weights. How large is the difference?
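The bias of the 1/n variance estimator can be checked with a simulation along the following lines. The choice of a normal population with standard deviation 2 (true variance 4) is an assumption made so that the expected answers are known: about 3.6 for the 1/n estimator and about 4.0 for the 1/(n-1) estimator when n = 10.

```python
import random
import statistics

rng = random.Random(1)
n, reps = 10, 10_000
sigma = 2.0  # assumed population standard deviation; true variance is 4.0

biased, unbiased = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)

print(f"mean of 1/n estimates:     {mean_biased:.3f}")
print(f"mean of 1/(n-1) estimates: {mean_unbiased:.3f}")
```

The 1/n estimator is expected to fall short of the true variance by a factor of (n-1)/n on average, which is why the gap shrinks as n grows.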

29.9 Chapter 9 — Hypothesis Testing

Run a permutation test for the difference in mean birth weight (first babies vs others). What is the p-value?
Run a permutation test for the difference in medians of pregnancy length.
Simulate Type I error rate: how often does a permutation test give p < 0.05 when there is truly no effect?
Subsample the data at increasing sizes. At roughly what sample size does the first-baby effect in pregnancy length become statistically significant (p < 0.05)?
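A two-sided permutation test for a difference in means can be sketched as follows. The test statistic (absolute difference in means), the function name, and the toy groups are assumptions for illustration; swapping in the median gives the variant asked for in the second exercise.

```python
import random
import statistics


def permutation_p_value(group1, group2, iters=1000, rng=random):
    """Fraction of shuffles whose |mean difference| reaches the observed one."""
    observed = abs(statistics.fmean(group1) - statistics.fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)  # simulate the null: group labels are arbitrary
        diff = abs(statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:]))
        if diff >= observed:
            count += 1
    return count / iters


# Toy groups with an obvious difference (NOT NSFG data).
rng = random.Random(5)
low = list(range(10))
high = list(range(100, 110))
p_clear = permutation_p_value(low, high, iters=200, rng=rng)

# Identical groups: every shuffle is at least as extreme as a difference of 0.
p_null = permutation_p_value([1, 2, 3], [1, 2, 3], iters=50, rng=rng)

print(f"p-value, clearly different groups: {p_clear}")
print(f"p-value, identical groups:         {p_null}")
```

For the Type I error exercise, the same function can be called repeatedly on pairs of samples drawn from a single distribution, counting how often the result falls below 0.05.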

29.10 Chapter 10 — Least Squares

Fit a line predicting birth weight from pregnancy length. Report the slope, intercept, and $R^2$.
Bootstrap the slope 1 000 times. What is the 95 % CI?
Make a residual plot. Is there any pattern?
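The fit and the bootstrap in these exercises can be prototyped without any fitting library. The sketch below assumes simple least squares on one predictor and a percentile bootstrap that resamples (x, y) pairs; the data are synthetic, generated with a known slope of 0.2 so the output can be sanity-checked.

```python
import random
import statistics


def least_squares(xs, ys):
    """Return (intercept, slope) minimising squared residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = statistics.fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def bootstrap_slope_ci(xs, ys, iters=1000, rng=random):
    """Percentile 95% CI for the slope, resampling (x, y) pairs."""
    n = len(xs)
    slopes = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        _, s = least_squares([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(s)
    slopes.sort()
    return slopes[int(0.025 * iters)], slopes[int(0.975 * iters) - 1]


# Synthetic data with a known slope of 0.2 (NOT NSFG data).
rng = random.Random(11)
weeks = [rng.gauss(38.5, 2.7) for _ in range(300)]
pounds = [-0.4 + 0.2 * w + rng.gauss(0.0, 1.0) for w in weeks]

intercept, slope = least_squares(weeks, pounds)
r2 = r_squared(weeks, pounds, intercept, slope)
ci_low, ci_high = bootstrap_slope_ci(weeks, pounds, iters=1000, rng=rng)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
print(f"bootstrap 95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling pairs keeps each x with its own y, so the interval reflects sampling variability without assuming the residuals are normal.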

29.11 Chapter 11 — Regression

Fit multiple regression: totalwgt_lb ~ prglngth + agepreg + birthord. Which predictor has the largest coefficient?
Does adding a quadratic age term improve adjusted $R^2$?
Fit logistic regression predicting preterm birth (prglngth < 37) from mother’s age. What is the odds ratio for age?

29.12 Chapter 12 — Time Series

Aggregate mean birth weight by year. Is there an upward or downward trend?
Compute a 3-year moving average and overlay it on the raw series.
Compute the ACF of yearly birth weight. Is there meaningful serial correlation?
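The moving average and the autocorrelation function are both short computations once the yearly series exists. The sketch below uses hand-written helpers and a made-up series; the ACF normalisation (lagged sum of products divided by the total sum of squares) is one common convention and is assumed here.

```python
import statistics


def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        statistics.fmean(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (assumed normalisation)."""
    mean = statistics.fmean(series)
    denom = sum((v - mean) ** 2 for v in series)
    num = sum(
        (series[i] - mean) * (series[i + lag] - mean)
        for i in range(len(series) - lag)
    )
    return num / denom


# Made-up yearly mean birth weights in pounds (NOT NSFG data).
yearly = [7.21, 7.25, 7.19, 7.28, 7.30, 7.27, 7.33, 7.31]

smoothed = moving_average(yearly, window=3)
acf = [autocorrelation(yearly, lag) for lag in range(4)]
print("3-year moving average:", [round(v, 3) for v in smoothed])
print("ACF at lags 0-3:", [round(v, 3) for v in acf])
```

The smoothed list has `window - 1` fewer points than the raw series, so when overlaying the two, each average should be plotted against the year at the centre (or end) of its window.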

29.13 Chapter 13 — Survival Analysis

Compute inter-pregnancy intervals. What fraction of women are censored?
Implement the Kaplan-Meier estimator from scratch. What fraction of women have their next pregnancy within 18 months?
How does the naive mean (ignoring censoring) compare to the KM estimate?
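A from-scratch Kaplan-Meier estimator can be sketched as below. The convention that a subject censored at time t still counts as at risk at t is an assumption, as are the function names and the six made-up intervals (in months).

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before any event)."""
    value = 1.0
    for time, s in curve:
        if time <= t:
            value = s
    return value


# Made-up inter-pregnancy intervals in months (NOT NSFG data).
# observed=False means the interval was still open at interview (censored).
durations = [6, 12, 12, 18, 24, 30]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(durations, observed)
km_within_18 = 1 - survival_at(curve, 18)
naive_within_18 = sum(
    1 for d, e in zip(durations, observed) if e and d <= 18
) / len(durations)

print(f"KM estimate, next pregnancy within 18 months: {km_within_18:.3f}")
print(f"naive fraction ignoring censoring:            {naive_within_18:.3f}")
```

On this toy data the Kaplan-Meier estimate (5/9) is higher than the naive fraction (3/6), because the woman censored at 12 months is removed from the risk set instead of being counted as never having a next pregnancy.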

29.14 Chapter 14 — Analytic Methods

Demonstrate the CLT: show that sampling distributions of the mean converge to normal as n grows.
Compare t-test and permutation test p-values for pregnancy length. Do they agree?
Compute the analytic 95 % CI for mean birth weight. Compare to bootstrap CI.
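One way to demonstrate the CLT numerically is to track the skewness of the sampling distribution of the mean as n grows. The sketch below assumes an exponential population (strongly skewed, so convergence is visible) and uses illustrative helper names; for a normal limit the skewness should shrink toward 0.

```python
import random
import statistics


def skewness(values):
    """Third standardised moment of the values."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from Exponential(1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(3)
skews = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 10, 100)}

for n, s in skews.items():
    print(f"n = {n:3d}: skewness of sample means = {s:.3f}")
```

A normal probability plot of each set of sample means shows the same convergence graphically and ties this exercise back to Chapter 5.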

29.15 Synthesis problems

S1 — The full pipeline. Using NSFG: compute a summary statistic of your choice; bootstrap its confidence interval; run a permutation test; compute Cohen’s d. Write one paragraph interpreting all four results together.

S2 — Model comparison. Fit three models for birth weight:

totalwgt_lb ~ prglngth
totalwgt_lb ~ prglngth + agepreg
totalwgt_lb ~ prglngth + agepreg + birthord

For each, report $R^2$, adjusted $R^2$, and AIC. Which model would you choose and why?
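If the fitting library in use does not report adjusted R^2 or AIC directly, both can be derived from quantities it does report. The sketch below assumes ordinary least squares with Gaussian errors, where `n` is the number of observations, `k` the number of predictors excluding the intercept, and `rss` the residual sum of squares. Packages differ in how they count parameters for AIC; this version counts the k + 1 coefficients, which is an assumption to check against your library.

```python
import math


def adjusted_r2(r2, n, k):
    """Penalise R^2 for the number of predictors k (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def gaussian_aic(rss, n, k):
    """AIC for a least-squares fit under a Gaussian likelihood.

    Assumption: the parameter count is k + 1 (coefficients incl. intercept).
    """
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * (k + 1) - 2 * log_likelihood


# Placeholder inputs only, to show the calling convention (not real results).
example_adj = adjusted_r2(0.5, n=101, k=1)
penalty = gaussian_aic(50.0, n=100, k=2) - gaussian_aic(50.0, n=100, k=1)
print(f"adjusted R^2 for the placeholder inputs: {example_adj:.4f}")
print(f"AIC cost of one extra predictor at equal RSS: {penalty:.1f}")
```

Both measures reward fit and penalise extra predictors, so a model with more terms is preferred only if its residual sum of squares drops by enough to offset the penalty.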

S3 — The full story. You now have all the tools to answer: “Are first babies born later, and does it matter?”

Write a 200-word analysis (as if reporting to a non-technical audience) that covers:

The observed difference
Whether it is statistically real (hypothesis test)
Whether it is practically important (effect size)
What other factors might explain it (regression)
Your conclusion