29 Exercises — Across the Statistics Track
This chapter collects end-of-chapter exercises plus cross-chapter synthesis problems. All of them use the National Survey of Family Growth (NSFG) dataset.
29.1 Chapter 1 — EDA
- Load the NSFG data. How many total pregnancies? How many live births?
- What is the mean age of mothers at the end of their first recorded pregnancy?
- What fraction of live births are first babies?
- Plot a histogram of agepreg for first-time mothers only. What shape do you see?
29.2 Chapter 2 — Distributions
- Compute Cohen’s d for birth weight (first babies vs others). Is the effect larger or smaller than for pregnancy length?
- What fraction of pregnancies last less than 37 weeks (premature)?
- Plot normalised histograms of pregnancy length for three outcomes: live birth, stillbirth, and induced abortion. What differences do you see?
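As a starting point for the Cohen’s d exercise, here is a minimal sketch that uses one common convention: the difference in means divided by a pooled standard deviation built from size-weighted population variances. The function name `cohens_d` and the sample values are assumptions made for illustration; the numbers are invented and are not NSFG data.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pooled variance: each group's population variance weighted by its size.
    var1 = statistics.pvariance(group1)
    var2 = statistics.pvariance(group2)
    pooled_var = (n1 * var1 + n2 * var2) / (n1 + n2)
    return mean_diff / math.sqrt(pooled_var)


# Made-up weights in pounds, for illustration only (not NSFG values).
firsts = [7.1, 6.8, 7.4, 7.0, 6.9]
others = [7.3, 7.2, 7.6, 7.1, 7.4]
d = cohens_d(firsts, others)
print(f"Cohen's d = {d:.2f}")
```

Other texts pool with n - 1 divisors instead; for samples as large as the NSFG groups the two conventions should give nearly the same value.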
29.3 Chapter 3 — PMF
- Build a PMF of birth order. What is the most common birth order?
- Implement the size-biased distribution for birth order. How does the mean change?
- At which pregnancy length do first babies and other babies differ most in probability?
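For the size-biased exercise, the sketch below shows one way to represent a PMF as a plain dictionary and reweight it so that each value is oversampled in proportion to its size. The helper names (`make_pmf`, `bias_pmf`, `pmf_mean`) and the birth-order list are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def make_pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    total = len(values)
    return {v: c / total for v, c in sorted(counts.items())}


def bias_pmf(pmf):
    """Weight each probability by its value, then renormalise."""
    weighted = {v: p * v for v, p in pmf.items()}
    total = sum(weighted.values())
    return {v: w / total for v, w in weighted.items()}


def pmf_mean(pmf):
    """Expected value of a PMF stored as {value: probability}."""
    return sum(v * p for v, p in pmf.items())


# Made-up birth orders, for illustration only (not NSFG values).
birth_orders = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
pmf = make_pmf(birth_orders)
biased = bias_pmf(pmf)
print(f"mean: {pmf_mean(pmf):.2f}, size-biased mean: {pmf_mean(biased):.2f}")
```

The size-biased mean equals E[X^2] / E[X], so it can never be smaller than the unbiased mean when all values are positive.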
29.4 Chapter 4 — CDF
- What is the IQR of birth weight for live births?
- A baby weighs 5.0 lbs. What percentile is this? (Use the CDF.)
- Generate 1 000 synthetic birth weights using the inverse CDF. Do they match the original distribution?
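The percentile and inverse-CDF exercises can be approached with an empirical CDF built from the sorted sample. The following is a rough sketch under that assumption; `percentile_rank`, `inverse_cdf`, and `sample_from_cdf` are illustrative names, and the weights are invented rather than taken from the NSFG.

```python
import bisect
import math
import random


def percentile_rank(sorted_values, x):
    """Percentage of values less than or equal to x."""
    return 100.0 * bisect.bisect_right(sorted_values, x) / len(sorted_values)


def inverse_cdf(sorted_values, p):
    """Smallest value whose empirical CDF is at least p, for 0 <= p <= 1."""
    index = max(math.ceil(p * len(sorted_values)) - 1, 0)
    return sorted_values[index]


def sample_from_cdf(sorted_values, n, seed=0):
    """Draw n values by feeding uniform random numbers to the inverse CDF."""
    rng = random.Random(seed)
    return [inverse_cdf(sorted_values, rng.random()) for _ in range(n)]


# Made-up birth weights in pounds (illustrative, not NSFG values).
weights = sorted([5.0, 6.1, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4])
print(percentile_rank(weights, 5.0))

synthetic = sample_from_cdf(weights, 1000)
print(percentile_rank(sorted(synthetic), 7.0))
```

Comparing the percentile ranks of a few values in the original and synthetic samples is one quick check that the two distributions agree; overlaying the two CDFs is another.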
29.5 Chapter 5 — Modelling
- Fit a normal distribution to pregnancy length. Make a normal probability plot. Is the fit good?
- Which fits better for birth weight — Normal or Lognormal? Use the KS test to decide.
29.6 Chapter 6 — PDF
- Compute skewness and kurtosis for birth weight and mother’s age.
- Plot a KDE of birth weight with three different bandwidths. Which one looks right?
- Implement a Gaussian KDE from scratch (not using scipy). Verify against scipy’s output on a sample.
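For the from-scratch KDE exercise, a possible skeleton is shown below: the estimate is the average of Gaussian bumps centred on each data point. The bandwidth helper follows Scott's rule with the n - 1 standard deviation, which is assumed here to correspond to scipy's default for one-dimensional data; that assumption is exactly what the exercise asks you to verify. Function names and data are illustrative.

```python
import math
import statistics


def scott_bandwidth(sample):
    """Scott's rule in one dimension: sample std (n - 1 divisor) times n**(-1/5)."""
    return statistics.stdev(sample) * len(sample) ** (-1 / 5)


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates the kernel density estimate at x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Made-up birth weights in pounds (illustrative, not NSFG values).
weights = [5.0, 6.1, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4]
kde = gaussian_kde(weights, scott_bandwidth(weights))

# A density should integrate to 1; check with a Riemann sum over 0 to 20 lb.
step = 0.01
area = sum(kde(i * step) for i in range(2000)) * step
print(f"density at 7.0 lb: {kde(7.0):.3f}, area under curve: {area:.3f}")
```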
29.7 Chapter 7 — Relationships
- Compute Pearson’s r for pregnancy length vs birth weight. Is it stronger than age vs weight?
- Plot a scatter of birth order vs birth weight. Is there a trend?
- Compute Spearman’s ρ for mother’s age vs pregnancy length.
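The two correlation measures in these exercises are closely related: Spearman's coefficient is Pearson's r applied to ranks. The sketch below illustrates that relationship with hand-rolled helpers (`pearson_r`, `rank`, `spearman_rho` are names chosen here, not from any library), using average ranks for ties. The data are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(rank(xs), rank(ys))


# Made-up ages (years) and pregnancy lengths (weeks), not NSFG values.
ages = [19, 22, 25, 25, 31, 34, 38]
lengths = [39, 40, 38, 39, 39, 37, 40]
print(f"r = {pearson_r(ages, lengths):.3f}, rho = {spearman_rho(ages, lengths):.3f}")
```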
29.8 Chapter 8 — Estimation
- Show that the 1/n variance estimator is biased using simulation (n=10, 10 000 repetitions).
- Bootstrap the median pregnancy length for first babies. What is the 95 % CI?
- Compute mean birth weight with and without survey weights. How large is the difference?
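The bias of the 1/n variance estimator can be seen by simulation: draw many small samples from a distribution with known variance and average each estimator. The sketch below assumes a standard normal population (true variance 1); the function name and the seed are arbitrary choices for illustration.

```python
import random


def simulate_variance_estimators(n=10, reps=10_000, seed=0):
    """Average the 1/n and 1/(n-1) variance estimates over many samples."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_biased += sum_sq / n          # divides by n
        total_unbiased += sum_sq / (n - 1)  # divides by n - 1
    return total_biased / reps, total_unbiased / reps


mean_biased, mean_unbiased = simulate_variance_estimators()
print(f"1/n estimator:     {mean_biased:.3f}")
print(f"1/(n-1) estimator: {mean_unbiased:.3f}")
```

In theory the 1/n estimator has expectation (n - 1)/n times the true variance, so with n = 10 its average should settle near 0.9 while the 1/(n - 1) estimator settles near 1.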
29.9 Chapter 9 — Hypothesis Testing
- Run a permutation test for birth weight (first vs other). What is the p-value?
- Run a permutation test for the difference in medians of pregnancy length.
- Simulate Type I error rate: how often does a permutation test give p < 0.05 when there is truly no effect?
- At what sample size does the first-baby effect in pregnancy length become statistically significant (p < 0.05)?
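A permutation test for a difference in means can be sketched as follows: pool the two groups, shuffle, split at the original group size, and count how often the shuffled difference is at least as large as the observed one. This is a two-sided version with a hypothetical function name and invented data; swapping the mean for the median adapts it to the second exercise.

```python
import random


def permutation_test(group1, group2, n_iter=1000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            count += 1
    return count / n_iter


# Made-up pregnancy lengths in weeks (illustrative, not NSFG values).
firsts = [39, 40, 41, 38, 40, 39, 41, 40]
others = [38, 39, 39, 40, 38, 39, 40, 39]
p_value = permutation_test(firsts, others)
print(f"p-value = {p_value:.3f}")
```

For the Type I error exercise, one approach is to call this function repeatedly on pairs of samples drawn from the same distribution and record how often the result falls below 0.05.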
29.10 Chapter 10 — Least Squares
- Fit a line predicting birth weight from pregnancy length. Report slope, intercept, R^2.
- Bootstrap the slope 1 000 times. What is the 95 % CI?
- Make a residual plot. Is there any pattern?
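The least squares exercises can be prototyped without any fitting library. The sketch below computes the slope and intercept from the usual closed-form expressions, derives R^2 from the residuals, and bootstraps the slope by resampling (x, y) pairs. All function names are illustrative, the percentile interval is only one of several bootstrap interval methods, and the data are invented.

```python
import random


def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def bootstrap_slope_ci(xs, ys, n_iter=1000, seed=0):
    """Percentile 95% interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(n_iter):
        resample = [rng.choice(pairs) for _ in pairs]
        bx, by = zip(*resample)
        if len(set(bx)) < 2:  # all x equal: the slope is undefined, skip
            continue
        slopes.append(least_squares(bx, by)[1])
    slopes.sort()
    m = len(slopes)
    return slopes[int(0.025 * m)], slopes[int(0.975 * m) - 1]


# Made-up pregnancy lengths (weeks) and birth weights (lb), not NSFG values.
weeks = [36, 37, 38, 39, 39, 40, 40, 41]
pounds = [5.9, 6.4, 6.9, 7.2, 7.5, 7.4, 7.8, 8.0]
intercept, slope = least_squares(weeks, pounds)
r2 = r_squared(weeks, pounds, intercept, slope)
low, high = bootstrap_slope_ci(weeks, pounds)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
print(f"95% bootstrap CI for slope: ({low:.3f}, {high:.3f})")
```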
29.11 Chapter 11 — Regression
- Fit the multiple regression totalwgt_lb ~ prglngth + agepreg + birthord. Which predictor has the largest coefficient?
- Does adding a quadratic age term improve adjusted R^2?
- Fit logistic regression predicting preterm birth (prglngth < 37) from mother’s age. What is the odds ratio for age?
29.12 Chapter 12 — Time Series
- Aggregate mean birth weight by year. Is there an upward or downward trend?
- Compute a 3-year moving average and overlay it on the raw series.
- Compute the ACF of yearly birth weight. Is there meaningful serial correlation?
29.13 Chapter 13 — Survival Analysis
- Compute inter-pregnancy intervals. What fraction of women are censored?
- Implement the Kaplan-Meier estimator from scratch. What fraction of women have their next pregnancy within 18 months?
- How does the naive mean (ignoring censoring) compare to the KM estimate?
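One way to structure a from-scratch Kaplan-Meier estimator is shown below. At each distinct event time the survival estimate is multiplied by the fraction of at-risk subjects who did not have the event; censored subjects stay in the risk set until their censoring time. The function names, the data layout (parallel lists of durations and event flags), and the sample values are assumptions for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring.
    observed:  True if the event happened, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """S(t): the estimate at the last event time <= t (1.0 before any event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Made-up intervals in months; False marks a censored interval (not NSFG data).
months = [6, 12, 12, 18, 24, 30]
had_next = [True, True, False, True, False, True]
curve = kaplan_meier(months, had_next)
within_18 = 1.0 - survival_at(curve, 18)
print(f"estimated fraction with next pregnancy within 18 months: {within_18:.3f}")
```

This quadratic-time version favours readability; for the full dataset a single pass over sorted durations would be faster.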
29.14 Chapter 14 — Analytic Methods
- Demonstrate the CLT: show that sampling distributions of the mean converge to normal as n grows.
- Compare t-test and permutation test p-values for pregnancy length. Do they agree?
- Compute the analytic 95 % CI for mean birth weight. Compare to bootstrap CI.
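For the analytic confidence interval, a minimal sketch based on the normal approximation is given below: the sample mean plus or minus a z quantile times the standard error. The function name is hypothetical and the data are invented. For small samples a t quantile would be more appropriate; with a sample as large as the NSFG live births the two should be nearly identical.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    std_err = stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - z * std_err, centre + z * std_err


# Made-up birth weights in pounds (illustrative, not NSFG values).
weights = [5.0, 6.1, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4]
low, high = normal_ci(weights)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```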
29.15 Synthesis problems
S1 — The full pipeline. Using NSFG: compute a summary statistic of your choice; bootstrap its confidence interval; run a permutation test; compute Cohen’s d. Write one paragraph interpreting all four results together.
S2 — Model comparison. Fit three models for birth weight:
- totalwgt_lb ~ prglngth
- totalwgt_lb ~ prglngth + agepreg
- totalwgt_lb ~ prglngth + agepreg + birthord
For each, report R^2, adjusted R^2, AIC. Which model would you choose and why?
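If your regression library does not report these quantities directly, they can be computed from R^2 and the residual sum of squares. The sketch below uses the standard adjusted R^2 formula and the AIC of a linear model with Gaussian errors. The function names are illustrative, and conventions differ on whether the error variance counts towards k; counting it would shift every model's AIC by the same constant, so the ranking of models would not change.

```python
import math


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2 log L for a linear model with Gaussian errors.

    rss is the residual sum of squares; k counts the fitted coefficients,
    including the intercept.
    """
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    return 2.0 * k - 2.0 * log_likelihood
```

Lower AIC indicates a better trade-off between fit and complexity; a predictor that barely reduces the residual sum of squares raises AIC by close to 2.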
S3 — The full story. You now have all the tools to answer: “Are first babies born later, and does it matter?”
Write a 200-word analysis (as if reporting to a non-technical audience) that covers:
- The observed difference
- Whether it is statistically real (hypothesis test)
- Whether it is practically important (effect size)
- What other factors might explain it (regression)
- Your conclusion