¡Down with bar charts!

  1. Calculate mean (proportion) of posts rated as is_asshole.

  2. Calculate mean (proportion) of posts rated as ~is_asshole.

  3. Make a scatter plot of these proportions.

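import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the AITA posts; aita_url must point to the CSV file.
aita_url = ""
dfa = pd.read_csv(aita_url)
# Replace missing post bodies with empty strings (assign back instead of chained inplace).
dfa["body"] = dfa["body"].fillna("")

# The mean of is_asshole is the proportion of posts rated A-hole.
p = np.mean(dfa["is_asshole"])
data = {"Label": ["A-hole", "Not A-hole"], "Proportion": [p, 1 - p]}
df = pd.DataFrame(data).round(2)

# Scatter plot of the two proportions: A-hole at x = 1, Not A-hole at x = 0.
y = df["Proportion"]
plt.scatter([1, 0], y, c = ["blue", "orange"]);
plt.ylim([0, 1]);
plt.xlim([-0.5, 1.5]);
plt.xticks([1, 0], df["Label"]);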

Aggregate / Group by

  1. Find and load into Python the data set bike.csv found on

  2. Read about the data set using the help file bike.txt.

  3. Calculate the median of the variable cnt by groups of your choice; choose a reasonable variable to group by.

  4. Make a scatter plot of your grouped medians: grouping variable on the x-axis and medians on the y-axis.

  5. Challenge: Plot all the grouped data in the background of your grouped medians.
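
One possible shape for steps 3 to 5 is sketched below. It is only an illustration: the small data frame is a made-up stand-in for bike.csv, and the grouping column `season` is an assumption, so check bike.txt for the variables that actually exist and swap in `pd.read_csv` with the real file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in rows; replace with pd.read_csv(<path to bike.csv>).
bike = pd.DataFrame({
    "season": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "cnt": [10, 20, 60, 30, 50, 40, 90, 70, 80],
})

group_col = "season"  # assumed grouping variable; choose one described in bike.txt

# Step 3: median of cnt within each group.
medians = bike.groupby(group_col)["cnt"].median()

fig, ax = plt.subplots()
# Step 5 (challenge): every observation, faded, in the background.
ax.scatter(bike[group_col], bike["cnt"], color="gray", alpha=0.3, label="all data")
# Step 4: grouped medians on top.
ax.scatter(medians.index, medians.values, color="blue", zorder=3, label="median")
ax.set_xticks(medians.index)
ax.set_xlabel(group_col)
ax.set_ylabel("cnt")
ax.legend()
```

Drawing the raw points with a low `alpha` keeps the medians readable while still showing the spread inside each group.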

Interactive Distributions

  1. Use the notebook Distributions to get started.

  2. Make an interactive plot of the Bernoulli density function

\[f(x | p) = p^x (1 - p)^{(1 - x)}\]

where \(x \in \{0, 1\}\), \(0 \leq p \leq 1\).
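
A minimal sketch of one way to make this plot interactive, using a matplotlib `Slider` for \(p\). The Distributions notebook may use a different tool, so treat the widget choice, the helper name `bernoulli_pmf`, and the starting value `p0` as assumptions. In a notebook the slider only responds when an interactive matplotlib backend is active.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

def bernoulli_pmf(x, p):
    """f(x | p) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    x = np.asarray(x)
    return p**x * (1 - p)**(1 - x)

x = np.array([0, 1])
p0 = 0.5  # arbitrary starting value for the slider

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)  # leave room for the slider
(points,) = ax.plot(x, bernoulli_pmf(x, p0), "o")
ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.05, 1.05)
ax.set_xticks(x)
ax.set_xlabel("x")
ax.set_ylabel("f(x | p)")

# Slider for p on its own small axes below the plot.
slider_ax = fig.add_axes([0.2, 0.1, 0.6, 0.03])
p_slider = Slider(slider_ax, "p", 0.0, 1.0, valinit=p0)

def update(p):
    # Recompute both probabilities and move the two points.
    points.set_ydata(bernoulli_pmf(x, p))
    fig.canvas.draw_idle()

p_slider.on_changed(update)
```

Moving the slider to \(p = 0.8\), for example, places the point at \(x = 1\) at height 0.8 and the point at \(x = 0\) at height 0.2.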

  1. Read

  2. Read