## DataFrame Manipulation

### Why does `round`ing 0.5 sometimes round down?

#### Question

Sometimes when I try to use `Series.round()` or `np.round()` on a number that’s exactly x.5, it rounds **down**. Why is this?

#### Answer

This is expected behavior in `pandas` and `numpy`. Python 3’s built-in `round()` function also rounds exact ties to the nearest even integer. From the NumPy documentation:

> For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.
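As a quick check of the rule quoted above, here is a minimal sketch that rounds a few exact halves with both libraries. The array `halves` is just illustrative data, not something from lecture.

```python
import numpy as np
import pandas as pd

# Illustrative values that sit exactly halfway between two integers.
halves = np.array([-0.5, 0.5, 1.5, 2.5, 3.5])

# Both calls send each tie to the nearest even integer.
numpy_rounded = np.round(halves)
pandas_rounded = pd.Series(halves).round()

print(numpy_rounded.tolist())
print(pandas_rounded.tolist())
```

Notice that 1.5 and 2.5 both land on 2.0, while 3.5 goes up to 4.0: the direction depends on which neighbor is even.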

One reason to do this is to avoid biasing a dataset’s average upwards by always rounding up at 0.5. From a great StackOverflow answer:

> This kind of rounding is called rounding to even (or banker’s rounding). It is the case because if we always round 0.5 up to the next largest number, then the average of a large data set rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate.
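To see the drift described above, here is a small sketch on a deliberately extreme, made-up dataset in which every value is an exact tie. The "always round up" rule is emulated with `np.floor(values + 0.5)`, which is an assumption made for illustration rather than anything `numpy` or `pandas` does by default.

```python
import numpy as np

# Made-up data: 0.5, 1.5, 2.5, ..., 999.5 (every value is an exact tie).
values = np.arange(0.5, 1000.5, 1.0)

# Emulate "always round .5 up" by adding 0.5 and flooring.
half_up = np.floor(values + 0.5)

# NumPy's default: round ties to the nearest even integer.
half_even = np.round(values)

print("mean of original values: ", values.mean())
print("mean after rounding up:  ", half_up.mean())
print("mean after round-to-even:", half_even.mean())
```

Rounding every tie up shifts the mean by a full 0.5 here, while rounding to even leaves it unchanged because the upward and downward moves cancel. Real datasets contain far fewer exact ties, so the effect is usually much smaller.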

### Why do we pass in just `iqr` to `agg`?

#### Question

In lecture, we defined `iqr` as a function that takes in a Series. Why don’t we pass an argument explicitly here, as in `agg(iqr(s))`, where `s` is the Series we get from `groupby('species')['body_mass_g']`?

```python
import numpy as np

def iqr(s):
    # s is a Series
    # return the interquartile range for s
    return np.percentile(s, 75) - np.percentile(s, 25)

# Here, the argument to agg is a function that
# takes in a Series and returns a scalar.
# (penguins is assumed to be already loaded as a DataFrame.)
(
    penguins
    .groupby('species')
    ['body_mass_g']
    .agg(iqr)
)
```

#### Answer

There’s a subtle difference between `.agg(iqr)` and `.agg(iqr(s))`. If you actually tried `.agg(iqr(s))`, you’d get an error saying `s` is not defined. Python evaluates `iqr(s)` first, before anything is passed to `.agg`, and in the global scope of your notebook there (most likely) aren’t any variables named `s`. (There is an `s`, but it’s the parameter of `iqr`, so it only exists inside that function.)
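The scoping point can be checked without `penguins` at all. The sketch below assumes a fresh session in which no global variable named `s` exists, and simply tries to evaluate `iqr(s)` on its own, which is what Python would do before ever reaching `.agg`.

```python
import numpy as np

def iqr(s):
    # s only exists as a name inside this function body.
    return np.percentile(s, 75) - np.percentile(s, 25)

try:
    # Assumes no global variable named s has been defined.
    iqr(s)
except NameError as err:
    error_message = str(err)
    print(error_message)
```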

But also, `.agg` takes a function as input. `iqr` is a function, which is why we call `.agg(iqr)`. Even if `s` were a Series defined in your notebook, so that `iqr(s)` worked and returned the difference between the 75th and 25th percentiles of this globally-defined `s`, then `.agg(iqr(s))` would end up being something like `.agg(17.39)`. The input to `.agg` would then be a number rather than a function, which is not what we need.
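Here is a minimal sketch of the contrast, using a tiny made-up DataFrame named `toy` in place of `penguins` (the species labels and masses are invented for illustration). The exact exception raised when a number is passed may differ between pandas versions, so the sketch only records that the call fails.

```python
import numpy as np
import pandas as pd

def iqr(s):
    # s is a Series; return its interquartile range.
    return np.percentile(s, 75) - np.percentile(s, 25)

# Made-up stand-in for the penguins data.
toy = pd.DataFrame({
    'species': ['A', 'A', 'A', 'B', 'B', 'B'],
    'body_mass_g': [3000, 3500, 4000, 5000, 5200, 5400],
})
by_species = toy.groupby('species')['body_mass_g']

# Passing the function itself: pandas calls iqr once per group.
result = by_species.agg(iqr)
print(result)

# Passing a number, which is what .agg(iqr(s)) would boil down to.
number_failed = False
try:
    by_species.agg(17.39)
except Exception as err:
    number_failed = True
    print(type(err).__name__, err)
```

In the first call, pandas supplies each group's Series as `s` on our behalf, which is why we never write the argument ourselves.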