diff --git a/lessons/pydata/pandas_types/index.ipynb b/lessons/pydata/pandas_types/index.ipynb
new file mode 100644
index 0000000000..bb925d07bd
--- /dev/null
+++ b/lessons/pydata/pandas_types/index.ipynb
@@ -0,0 +1,6941 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Pandas - data types and column manipulation\n",
+ "\n",
+ "In the previous lesson we introduced the pandas library and its basic classes: `Series`, `DataFrame` and `Index`. However, we treated them as static objects that we only looked at.\n",
+ "\n",
+ "In this lesson we will start modifying existing tables. We will show:\n",
+ "\n",
+ "* how to add or remove columns and rows\n",
+ "* how to change the value of a particular cell\n",
+ "* which data types suit which purpose\n",
+ "* the arithmetic and logical operations that can be performed on columns\n",
+ "* filtering and sorting rows\n",
+ "\n",
+ "And since you surely don't want to lose the results of your work, saving them to external files will come in handy at the end."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
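+ "## Manipulating DataFrames\n",
+ "\n",
+ "To warm up, we will work with a small table containing some basic information about the planets, which you can easily find e.g. on [Wikipedia](https://en.wikipedia.org/wiki/Planet). The column `obezna_poloosa` holds the semi-major axis of the orbit in astronomical units and `obezna_doba` the orbital period in Earth years; `jmeno` is the planet's name."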
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24\n",
+ "Venuše ♀ 0.72 0.62\n",
+ "Země ⊕ 1.00 1.00\n",
+ "Mars ♂ 1.52 1.88\n",
+ "Jupiter ♃ 5.20 11.86\n",
+ "Saturn ♄ 9.54 29.46\n",
+ "Uran ♅ 19.22 84.01\n",
+ "Neptun ♆ 30.06 164.80"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety = pd.DataFrame({\n",
+ " \"jmeno\": [\"Merkur\", \"Venuše\", \"Země\", \"Mars\", \"Jupiter\", \"Saturn\", \"Uran\", \"Neptun\"],\n",
+ " \"symbol\": [\"☿\", \"♀\", \"⊕\", \"♂\", \"♃\", \"♄\", \"♅\", \"♆\"],\n",
+ " \"obezna_poloosa\": [0.39, 0.72, 1.00, 1.52, 5.20, 9.54, 19.22, 30.06],\n",
+ " \"obezna_doba\": [0.24, 0.62, 1, 1.88, 11.86, 29.46, 84.01, 164.8],\n",
+ "})\n",
+ "planety = planety.set_index(\"jmeno\") # A name-based index makes the table easier to work with\n",
+ "planety"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Adding a new column\n",
+ "\n",
+ "To add a new column (a `Series`), we assign it to the `DataFrame` the way we would assign a value to a dictionary, that is, in square brackets with the column name. The good news is that, just like in the constructor, `pandas` can handle both a `Series` and a plain list. A list is matched to the rows by position and must have as many items as the table has rows, whereas a `Series` is matched by its index labels.\n",
+ "\n",
+ "In our case we will look up and add the number of known moons (large and small), stored in the column `mesice`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0\n",
+ "Venuše ♀ 0.72 0.62 0\n",
+ "Země ⊕ 1.00 1.00 1\n",
+ "Mars ♂ 1.52 1.88 2\n",
+ "Jupiter ♃ 5.20 11.86 79\n",
+ "Saturn ♄ 9.54 29.46 82\n",
+ "Uran ♅ 19.22 84.01 27\n",
+ "Neptun ♆ 30.06 164.80 14"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mesice = [0, 0, 1, 2, 79, 82, 27, 14] # Alternatively: pd.Series([...], index=planety.index)\n",
+ "planety[\"mesice\"] = mesice\n",
+ "planety"
+ ]
+ },
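+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of how column assignment treats a `Series` differently from a list. The small table `demo` and the column name `bez_indexu` are made up for this illustration. A `Series` with the default numeric index shares no labels with the table, so the new column ends up filled with missing values (`NaN`); reusing the table's own index fixes that:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical three-row table, used only for this illustration\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"obezna_doba\": [0.24, 0.62, 1.0]},\n",
+ "    index=[\"Merkur\", \"Venuše\", \"Země\"],\n",
+ ")\n",
+ "\n",
+ "# The default index 0, 1, 2 shares no label with `demo`, so nothing lines up\n",
+ "demo[\"bez_indexu\"] = pd.Series([0, 0, 1])\n",
+ "\n",
+ "# Reusing the table's own index lines the values up as intended\n",
+ "demo[\"mesice\"] = pd.Series([0, 0, 1], index=demo.index)\n",
+ "demo"
+ ]
+ },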
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "💡 In this case we modified the existing `DataFrame` directly. Most methods and operations in `pandas` (you already know e.g. `set_index`) return a new object by default. This is a good habit that we will stick to. Column assignment is one of the exceptions to this otherwise respected rule; it exists mainly for convenience.\n",
+ "\n",
+ "TODO: As I write this, it does not seem that self-evident to me after all. I would like to phrase this better somehow.\n",
+ "\n",
+ "`DataFrame` also offers the `assign` method, which does not modify the table but creates a copy of it with columns added (or replaced):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_stavebnice \\\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 True \n",
+ "Venuše ♀ 0.72 0.62 0 False \n",
+ "Země ⊕ 1.00 1.00 1 False \n",
+ "Mars ♂ 1.52 1.88 2 False \n",
+ "Jupiter ♃ 5.20 11.86 79 False \n",
+ "Saturn ♄ 9.54 29.46 82 False \n",
+ "Uran ♅ 19.22 84.01 27 False \n",
+ "Neptun ♆ 30.06 164.80 14 False \n",
+ "\n",
+ " ma_vztah_k_vestonicim \n",
+ "jmeno \n",
+ "Merkur False \n",
+ "Venuše True \n",
+ "Země False \n",
+ "Mars False \n",
+ "Jupiter False \n",
+ "Saturn False \n",
+ "Uran False \n",
+ "Neptun False "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
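+ "# A new temporary DataFrame\n",
+ "# (Merkur is also a Czech construction set; the Venus of Dolní Věstonice is a famous figurine)\n",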
+ "planety.assign(\n",
+ " je_stavebnice=[True, False, False, False, False, False, False, False],\n",
+ " ma_vztah_k_vestonicim=[False, True, False, False, False, False, False, False],\n",
+ ")\n",
+ "\n",
+ "# The `planety` object itself remains unchanged."
+ ]
+ },
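+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`assign` also accepts a function that receives the table and returns the new column, which can be handy when the new column is computed from existing ones. A small sketch on a made-up table (`demo`, `s_dny` and the column `doba_ve_dnech` are illustrative names; the factor 365.25 assumes that `obezna_doba` is given in Earth years):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"obezna_doba\": [0.24, 0.62, 1.0]},\n",
+ "    index=[\"Merkur\", \"Venuše\", \"Země\"],\n",
+ ")\n",
+ "\n",
+ "# The lambda receives the DataFrame and returns the values of the new column\n",
+ "s_dny = demo.assign(doba_ve_dnech=lambda df: df[\"obezna_doba\"] * 365.25)\n",
+ "\n",
+ "# `assign` returned a copy; the original table has no such column\n",
+ "print(\"doba_ve_dnech\" in demo.columns)   # False\n",
+ "print(\"doba_ve_dnech\" in s_dny.columns)  # True"
+ ]
+ },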
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise**: Try adding (one way or the other) a column with the year of discovery (`\"objeveno\"`). You can find the data e.g. here: https://cs.wikipedia.org/wiki/Slune%C4%8Dn%C3%AD_soustava."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "It is not practical all that often, but a single scalar value can also be used for a new column; it is then repeated in every row:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "planety[\"je_planeta\"] = True"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Adding a new row\n",
+ "\n",
+ "If we take a time machine back to the childhood (or early adulthood) of the authors of these materials, that is, before 2006, when the astronomical congress held in Prague defined the term \"planet\" (but not before 1930!), we gain a new planet: Pluto.\n",
+ "\n",
+ "We will insert it into our table using the `loc` indexer, which we previously used for \"peeking\" into the table:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_planeta\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 True\n",
+ "Venuše ♀ 0.72 0.62 0 True\n",
+ "Země ⊕ 1.00 1.00 1 True\n",
+ "Mars ♂ 1.52 1.88 2 True\n",
+ "Jupiter ♃ 5.20 11.86 79 True\n",
+ "Saturn ♄ 9.54 29.46 82 True\n",
+ "Uran ♅ 19.22 84.01 27 True\n",
+ "Neptun ♆ 30.06 164.80 14 True\n",
+ "Pluto ♇ 39.48 247.94 5 True"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety.loc[\"Pluto\"] = [\"♇\", 39.48, 247.94, 5, True] # List of the row's values, in column order\n",
+ "planety"
+ ]
+ },
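+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A plain list relies on the values being given in column order. As a possible alternative, the row can be passed as a `Series` keyed by column name, which pandas aligns to the columns. The table `demo` below is a made-up stand-in for `planety`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"symbol\": [\"☿\", \"♀\"], \"obezna_poloosa\": [0.39, 0.72], \"mesice\": [0, 0]},\n",
+ "    index=[\"Merkur\", \"Venuše\"],\n",
+ ")\n",
+ "\n",
+ "# Keys are matched to column names, so their order does not matter\n",
+ "demo.loc[\"Země\"] = pd.Series({\"mesice\": 1, \"symbol\": \"⊕\", \"obezna_poloosa\": 1.0})\n",
+ "demo"
+ ]
+ },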
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Changing the value of a cell\n",
+ "\n",
+ "The \"indexers\" `.loc` and `.iloc` with two arguments in square brackets refer directly to a particular cell, and assigning to them (again, much like with a dictionary) writes the value to that position. You just need to keep the order (row, column).\n",
+ "\n",
+ "Let us return to the present and strip Pluto of its privileges:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_planeta\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 True\n",
+ "Venuše ♀ 0.72 0.62 0 True\n",
+ "Země ⊕ 1.00 1.00 1 True\n",
+ "Mars ♂ 1.52 1.88 2 True\n",
+ "Jupiter ♃ 5.20 11.86 79 True\n",
+ "Saturn ♄ 9.54 29.46 82 True\n",
+ "Uran ♅ 19.22 84.01 27 True\n",
+ "Neptun ♆ 30.06 164.80 14 True\n",
+ "Pluto ♇ 39.48 247.94 5 False"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety.loc[\"Pluto\", \"je_planeta\"] = False\n",
+ "planety"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**⚠ Warning:** Just like with a dictionary, but perhaps somewhat unintuitively, it is possible to write a value into a row and a column that do not exist yet! Both are then created."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_planeta planeta\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0.0 True NaN\n",
+ "Venuše ♀ 0.72 0.62 0.0 True NaN\n",
+ "Země ⊕ 1.00 1.00 1.0 True NaN\n",
+ "Mars ♂ 1.52 1.88 2.0 True NaN\n",
+ "Jupiter ♃ 5.20 11.86 79.0 True NaN\n",
+ "Saturn ♄ 9.54 29.46 82.0 True NaN\n",
+ "Uran ♅ 19.22 84.01 27.0 True NaN\n",
+ "Neptun ♆ 30.06 164.80 14.0 True NaN\n",
+ "Pluto ♇ 39.48 247.94 5.0 False NaN\n",
+ "Zeme NaN NaN NaN NaN NaN True"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
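+ "planety_bad = planety.copy() # Make a copy, just to be safe\n",
+ "\n",
+ "# Typos in both labels (\"Zeme\" instead of \"Země\", \"planeta\" instead of \"je_planeta\")\n",
+ "planety_bad.loc[\"Zeme\", \"planeta\"] = True\n",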
+ "planety_bad"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "💡 You are surely wondering what **NaN** means in the table. This value, spelled out as \"not a number\", marks a missing, invalid or unknown value (in our case we did not supply it, so it is no surprise). Note also that the `mesice` column changed from integers to floats (`0.0`, `1.0`, ...), because the default integer type cannot hold NaN. We will talk about missing values (and how to fix them) some other time; for now, don't let them make you nervous."
+ ]
+ },
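+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because `.loc` silently creates labels that do not exist (as with the mistyped `\"Zeme\"` above), it can be worth checking the labels before writing. A minimal sketch of one way to do that; the helper `nastav_bunku` and the table `demo` are hypothetical names for this illustration, not part of pandas:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "\n",
+ "def nastav_bunku(df, radek, sloupec, hodnota):\n",
+ "    \"\"\"Write into an existing cell; raise KeyError instead of creating new labels.\"\"\"\n",
+ "    if radek not in df.index:\n",
+ "        raise KeyError(f\"Unknown row: {radek!r}\")\n",
+ "    if sloupec not in df.columns:\n",
+ "        raise KeyError(f\"Unknown column: {sloupec!r}\")\n",
+ "    df.loc[radek, sloupec] = hodnota\n",
+ "\n",
+ "\n",
+ "demo = pd.DataFrame({\"je_planeta\": [True, True]}, index=[\"Země\", \"Mars\"])\n",
+ "nastav_bunku(demo, \"Mars\", \"je_planeta\", False)  # fine, the cell exists\n",
+ "\n",
+ "try:\n",
+ "    nastav_bunku(demo, \"Zeme\", \"je_planeta\", True)  # typo in the row label\n",
+ "except KeyError as chyba:\n",
+ "    print(chyba)"
+ ]
+ },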
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "It is also possible to assign to index ranges. You only need to make sure that the assigned value is either a scalar or has the same shape as the region being assigned to. Note that label slices in `.loc` include both endpoints:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_planeta je_obr\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 True False\n",
+ "Venuše ♀ 0.72 0.62 0 True False\n",
+ "Země ⊕ 1.00 1.00 1 True False\n",
+ "Mars ♂ 1.52 1.88 2 True False\n",
+ "Jupiter ♃ 5.20 11.86 79 True True\n",
+ "Saturn ♄ 9.54 29.46 82 True True\n",
+ "Uran ♅ 19.22 84.01 27 True True\n",
+ "Neptun ♆ 30.06 164.80 14 True True\n",
+ "Pluto ♇ 39.48 247.94 5 False NaN"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety.loc[\"Merkur\":\"Mars\", \"je_obr\"] = False\n",
+ "planety.loc[\"Jupiter\":\"Neptun\", \"je_obr\"] = [True, True, True, True]\n",
+ "planety"
+ ]
+ },
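+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A sketch of what happens when the shapes do not match, on a made-up table `demo`: the slice below covers three rows, so a list of two values is rejected with a `ValueError`, whereas a scalar or a list of three values would be accepted."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"symbol\": [\"☿\", \"♀\", \"⊕\"], \"mesice\": [0, 0, 1]},\n",
+ "    index=[\"Merkur\", \"Venuše\", \"Země\"],\n",
+ ")\n",
+ "\n",
+ "try:\n",
+ "    # Three rows are selected (label slices include both ends) but only two values are given\n",
+ "    demo.loc[\"Merkur\":\"Země\", \"mesice\"] = [5, 6]\n",
+ "except ValueError as chyba:\n",
+ "    print(\"ValueError:\", chyba)"
+ ]
+ },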
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise:** By coincidence (or is it an astronomical inevitability?), all the giant planets have at least some ring. Can you easily create a `\"ma_prstenec\"` column?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Removing a row\n",
+ "\n",
+ "The `drop` method removes a column or a row from a `DataFrame`. Its first argument expects the label (index) of one or more rows or columns that you want to remove. The `axis` argument specifies the dimension along which the operation is applied (0 or 1). The number matches the order in which keys are given when referring to cells.\n",
+ "\n",
+ "Axis (`axis`):\n",
+ "\n",
+ "- 0 = rows\n",
+ "- 1 = columns\n",
+ "\n",
+ "(Many other methods and functions use this argument too, so make sure you understand it.)\n",
+ "\n",
+ "Now that we are back in the future (or rather the present), let us deal with Pluto mercilessly (for the `drop` method the default value of `axis` is 0, so we do not have to write it):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_planeta je_obr\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 True False\n",
+ "Venuše ♀ 0.72 0.62 0 True False\n",
+ "Země ⊕ 1.00 1.00 1 True False\n",
+ "Mars ♂ 1.52 1.88 2 True False\n",
+ "Jupiter ♃ 5.20 11.86 79 True True\n",
+ "Saturn ♄ 9.54 29.46 82 True True\n",
+ "Uran ♅ 19.22 84.01 27 True True\n",
+ "Neptun ♆ 30.06 164.80 14 True True"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety = planety.drop(\"Pluto\") # Add axis=0 if you want to be explicit\n",
+ "planety"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Removing a column\n",
+ "\n",
+ "For a column the `drop` method works very similarly, only this time we do have to pass `axis=1` (or use the `columns` keyword instead).\n",
+ "\n",
+ "Let us remove a useless column whose informational value is on the level of \"the wipers wipe, the horn honks\"..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_obr\n",
+ "jmeno \n",
+ "Merkur ☿ 0.39 0.24 0 False\n",
+ "Venuše ♀ 0.72 0.62 0 False\n",
+ "Země ⊕ 1.00 1.00 1 False\n",
+ "Mars ♂ 1.52 1.88 2 False\n",
+ "Jupiter ♃ 5.20 11.86 79 True\n",
+ "Saturn ♄ 9.54 29.46 82 True\n",
+ "Uran ♅ 19.22 84.01 27 True\n",
+ "Neptun ♆ 30.06 164.80 14 True"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety = planety.drop(\"je_planeta\", axis=1)\n",
+ "planety"
+ ]
+ },
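+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Instead of `axis`, `drop` also accepts the keyword arguments `index` and `columns`, which some readers find easier to read. A short sketch on a made-up table `demo` (all variable names here are illustrative):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"mesice\": [1, 2, 5], \"je_planeta\": [True, True, False]},\n",
+ "    index=[\"Země\", \"Mars\", \"Pluto\"],\n",
+ ")\n",
+ "\n",
+ "bez_pluta = demo.drop(index=\"Pluto\")           # same as demo.drop(\"Pluto\", axis=0)\n",
+ "bez_sloupce = demo.drop(columns=\"je_planeta\")  # same as demo.drop(\"je_planeta\", axis=1)\n",
+ "\n",
+ "# Several labels can be dropped at once by passing a list\n",
+ "jen_zeme = demo.drop(index=[\"Mars\", \"Pluto\"])\n",
+ "jen_zeme"
+ ]
+ },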
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "⛧ The `drop` method, in line with the convention mentioned above, returns a new `DataFrame` (which is why we have to assign the result of the operation back to `planety`). If you want to operate directly on the table, you can use the `del` statement (it works the same way as with a dictionary; note that it only removes columns) or ask the panda gods (and the authors of these materials) for forgiveness and add the argument `inplace=True`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Alternative 1)\n",
+ "# del planety[\"je_planeta\"]\n",
+ "\n",
+ "# Alternative 2)\n",
+ "# planety.drop(\"je_planeta\", axis=1, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Data types\n",
+ "\n",
+ "As we hinted earlier, data types in pandas differ a little from the types in Python and are not classes in the true sense of the word, but fortunately conversion between them is often automatic and \"behaves as expected\"."
+ ]
+ },
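+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see which type pandas chose for each column, you can inspect the `dtypes` attribute, and `astype` performs an explicit conversion. A minimal sketch on a made-up table `demo` (the variable names are illustrative):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "demo = pd.DataFrame(\n",
+ "    {\"symbol\": [\"☿\", \"♀\"], \"obezna_doba\": [0.24, 0.62], \"mesice\": [0, 0]},\n",
+ "    index=[\"Merkur\", \"Venuše\"],\n",
+ ")\n",
+ "\n",
+ "print(demo.dtypes)  # one dtype per column\n",
+ "\n",
+ "# Explicit conversion of a single column returns a new Series\n",
+ "mesice_float = demo[\"mesice\"].astype(\"float64\")\n",
+ "print(mesice_float.dtype)  # float64"
+ ]
+ },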
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Data preparation\n",
+ "\n",
+ "In the data course we will use various data sets (usually larger ones, where writing them out in full in a constructor is not practical). We now leave the planets and look at some interesting characteristics of countries around the world (since the definition of what a country is is rather vague, we consider UN members), captured for one particular year of the past decade (because not all figures are always available, we take the latest year for which enough indicators are known). The data come mostly from the [Gapminder](https://www.gapminder.org/) project; we only supplemented them with a few more pieces of information from Wikipedia.\n",
+ "\n",
+ "TODO: Update the URL according to where the data will eventually live."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " iso world_6region world_4region income_groups \\\n",
+ "name \n",
+ "Afghanistan AFG south_asia asia low_income \n",
+ "Albania ALB europe_central_asia europe upper_middle_income \n",
+ "Algeria DZA middle_east_north_africa africa upper_middle_income \n",
+ "Andorra AND europe_central_asia europe high_income \n",
+ "Angola AGO sub_saharan_africa africa upper_middle_income \n",
+ "... ... ... ... ... \n",
+ "Venezuela VEN america americas upper_middle_income \n",
+ "Vietnam VNM east_asia_pacific asia lower_middle_income \n",
+ "Yemen YEM middle_east_north_africa asia lower_middle_income \n",
+ "Zambia ZMB sub_saharan_africa africa lower_middle_income \n",
+ "Zimbabwe ZWE sub_saharan_africa africa low_income \n",
+ "\n",
+ " is_eu is_oecd eu_accession year area population \\\n",
+ "name \n",
+ "Afghanistan False False NaN 2018 652860.0 34500000.0 \n",
+ "Albania False False NaN 2018 28750.0 3238000.0 \n",
+ "Algeria False False NaN 2018 2381740.0 36980000.0 \n",
+ "Andorra False False NaN 2017 470.0 88910.0 \n",
+ "Angola False False NaN 2018 1246700.0 20710000.0 \n",
+ "... ... ... ... ... ... ... \n",
+ "Venezuela False False NaN 2018 912050.0 30340000.0 \n",
+ "Vietnam False False NaN 2018 330967.0 90660000.0 \n",
+ "Yemen False False NaN 2018 527970.0 26360000.0 \n",
+ "Zambia False False NaN 2018 752610.0 14310000.0 \n",
+ "Zimbabwe False False NaN 2018 390760.0 13330000.0 \n",
+ "\n",
+ " alcohol_adults bmi_men bmi_women car_deaths_per_100000_people \\\n",
+ "name \n",
+ "Afghanistan 0.03 20.62 21.07 NaN \n",
+ "Albania 7.29 26.45 25.66 5.978 \n",
+ "Algeria 0.69 24.60 26.37 NaN \n",
+ "Andorra 10.17 27.63 26.43 NaN \n",
+ "Angola 5.57 22.25 23.48 NaN \n",
+ "... ... ... ... ... \n",
+ "Venezuela 7.60 27.45 28.13 7.332 \n",
+ "Vietnam 3.91 20.92 21.07 NaN \n",
+ "Yemen 0.20 24.44 26.11 NaN \n",
+ "Zambia 3.56 20.68 23.05 11.260 \n",
+ "Zimbabwe 4.96 22.03 24.65 20.850 \n",
+ "\n",
+ " calories_per_day infant_mortality life_expectancy \\\n",
+ "name \n",
+ "Afghanistan 2090.0 66.3 58.69 \n",
+ "Albania 3193.0 12.5 78.01 \n",
+ "Algeria 3296.0 21.9 77.86 \n",
+ "Andorra NaN 2.1 82.55 \n",
+ "Angola 2473.0 96.0 65.19 \n",
+ "... ... ... ... \n",
+ "Venezuela 2631.0 12.9 75.91 \n",
+ "Vietnam 2745.0 17.3 74.88 \n",
+ "Yemen 2223.0 33.8 67.14 \n",
+ "Zambia 1930.0 43.3 59.45 \n",
+ "Zimbabwe 2110.0 46.6 60.18 \n",
+ "\n",
+ " life_expectancy_female life_expectancy_male un_accession \n",
+ "name \n",
+ "Afghanistan 65.812 63.101 1946-11-19 \n",
+ "Albania 80.737 76.693 1955-12-14 \n",
+ "Algeria 77.784 75.279 1962-10-08 \n",
+ "Andorra NaN NaN 1993-07-28 \n",
+ "Angola 64.939 59.213 1976-12-01 \n",
+ "... ... ... ... \n",
+ "Venezuela 79.079 70.950 1945-11-15 \n",
+ "Vietnam 81.203 72.003 1977-09-20 \n",
+ "Yemen 66.871 63.875 1947-09-30 \n",
+ "Zambia 65.362 59.845 1964-12-01 \n",
+ "Zimbabwe 63.944 60.120 1980-08-25 \n",
+ "\n",
+ "[193 rows x 20 columns]"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "url = \"https://raw.githubusercontent.com/janpipek/data-pro-pyladies/master/data/countries.csv\"\n",
+ "countries = pd.read_csv(url, index_col=\"name\") # instead of calling `set_index` afterwards\n",
+ "countries = countries.sort_index()\n",
+ "countries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's pick a country at random and see what data the table holds for it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "iso CZE\n",
+ "world_6region europe_central_asia\n",
+ "world_4region europe\n",
+ "income_groups high_income\n",
+ "is_eu True\n",
+ "is_oecd True\n",
+ "eu_accession 2004-05-01\n",
+ "year 2018\n",
+ "area 78870\n",
+ "population 1.059e+07\n",
+ "alcohol_adults 16.47\n",
+ "bmi_men 27.91\n",
+ "bmi_women 26.51\n",
+ "car_deaths_per_100000_people 5.72\n",
+ "calories_per_day 3256\n",
+ "infant_mortality 2.8\n",
+ "life_expectancy 79.37\n",
+ "life_expectancy_female 81.858\n",
+ "life_expectancy_male 76.148\n",
+ "un_accession 1993-01-19\n",
+ "Name: Czechia, dtype: object"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries.loc[\"Czechia\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Even at first glance, the fields are of different types. But which ones? The `dtypes` attribute of our table answers that (for a `Series`, use `dtype`, or preferably `dtype.name` if you want an equally readable string representation)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "iso object\n",
+ "world_6region object\n",
+ "world_4region object\n",
+ "income_groups object\n",
+ "is_eu bool\n",
+ "is_oecd bool\n",
+ "eu_accession object\n",
+ "year int64\n",
+ "area float64\n",
+ "population float64\n",
+ "alcohol_adults float64\n",
+ "bmi_men float64\n",
+ "bmi_women float64\n",
+ "car_deaths_per_100000_people float64\n",
+ "calories_per_day float64\n",
+ "infant_mortality float64\n",
+ "life_expectancy float64\n",
+ "life_expectancy_female float64\n",
+ "life_expectancy_male float64\n",
+ "un_accession object\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries.dtypes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Types in pandas build on the way the `numpy` library defines them (`numpy` is generally useful for working with numerical arrays and provides vectorized operations that run orders of magnitude faster than plain Python). Above all, `numpy` needs to know how to allocate an array for elements of a given type, and therefore how many bytes each element occupies, so that the elements can be laid out efficiently one after another. In doing so it mirrors the \"native\" data types you may know from a language such as C, if you have that kind of experience. Memory layout is something we usually don't deal with in Python, but fast computation cannot do without it. We won't go into detail, but the demand for speed will surface now and then, and we will stress doing operations in a \"vectorized\" way, \"at the numpy level\".\n",
+ "\n",
+ "The somewhat cryptic type system of `numpy` (described in the [documentation](https://docs.scipy.org/doc/numpy/user/basics.types.html)) is fortunately (slightly) simplified in `pandas`, which offers just a few useful basic types (or type families). We introduce them now."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Integers\n",
+ "\n",
+ "Python has exactly one type for integers: `int`, which handles arbitrarily large whole numbers (0, -58 or even 123456789012345678901234567890). In `pandas` you can come across `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32` and `uint64`. They all share the same basic behaviour, and each can store only a limited range of numbers. They differ in how much memory one number takes (the number in the name is the bit count) and in whether negative numbers are supported (the `u` prefix means unsigned: only zero and positive numbers).\n",
+ "\n",
+ "Ranges:\n",
+ "\n",
+ "- `int8`: -128 to 127\n",
+ "- `uint8`: 0 to 255\n",
+ "- `int16`: -32 768 to 32 767\n",
+ "- `uint16`: 0 to 65 535\n",
+ "- `int32`: -2 147 483 648 to 2 147 483 647 (roughly +/- 2 billion)\n",
+ "- `uint32`: 0 to 4 294 967 295 (roughly 4 billion)\n",
+ "- `int64`: -9 223 372 036 854 775 808 to 9 223 372 036 854 775 807 (roughly +/- 9 quintillion)\n",
+ "- `uint64`: 0 to 18 446 744 073 709 551 615 (roughly 18 quintillion)\n",
+ "\n",
+ "💡 As if that weren't enough, each `intX` / `uintY` type also has a counterpart that allows missing values in the column (represented as `pd.NA`). Its name starts with a capital letter instead of the lowercase `i` or `u` (for example `Int64` or `UInt8`). This feature (so-called \"nullable integer types\") is fairly useful but still somewhat experimental. We won't use it in this course.\n",
+ "\n",
+ "A detailed explanation of how integers are represented in computer memory can be found, for example, on [Wikipedia](https://cs.wikipedia.org/wiki/Integer) (in Czech)."
+ ]
+ },
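+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The ranges listed above do not have to be memorized. The following is a minimal sketch, assuming `numpy` is importable (it is installed together with `pandas`); the names `type_names` and `ranges` are only illustrative. It asks `numpy` for the limits of each integer type:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "type_names = [\"int8\", \"uint8\", \"int16\", \"uint16\", \"int32\", \"uint32\", \"int64\", \"uint64\"]\n",
+ "\n",
+ "# np.iinfo reports the smallest and largest value an integer dtype can hold\n",
+ "ranges = {\n",
+ "    name: (np.iinfo(np.dtype(name)).min, np.iinfo(np.dtype(name)).max)\n",
+ "    for name in type_names\n",
+ "}\n",
+ "\n",
+ "for name, (lowest, highest) in ranges.items():\n",
+ "    print(f\"{name}: {lowest} to {highest}\")"
+ ]
+ },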
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The default integer type in `pandas` is `int64`. Unless you say otherwise, it is used automatically for integers (in most cases it is a suitable choice):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 2018\n",
+ "Albania 2018\n",
+ "Algeria 2018\n",
+ "Andorra 2017\n",
+ "Angola 2018\n",
+ " ... \n",
+ "Venezuela 2018\n",
+ "Vietnam 2018\n",
+ "Yemen 2018\n",
+ "Zambia 2018\n",
+ "Zimbabwe 2018\n",
+ "Name: year, Length: 193, dtype: int64"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[\"year\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 0\n",
+ "1 123\n",
+ "2 12345\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([0, 123, 12345])\n",
+ "\n",
+ "# pd.Series([0, 123, 12345], dtype=\"int64\") # the same"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With the `dtype` argument, however, you can specify exactly which integer type you want:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 0\n",
+ "1 123\n",
+ "2 12345\n",
+ "dtype: int16"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([0, 123, 12345], dtype=\"int16\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**⚠ Warning:** When you pick a specific integer type, you have to watch the ranges yourself. In the pandas version used here, `pandas` does not warn you when one of your values does not fit into the range and silently discards the part of the binary representation that overflows (newer versions of pandas and numpy may raise an error instead):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 0\n",
+ "1 123\n",
+ "2 57\n",
+ "dtype: int8"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([0, 123, 12345], dtype=\"int8\")"
+ ]
+ },
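+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One way to avoid the silent overflow shown above is to check the values against the limits of the target type before converting. This is only a sketch: `fits_in` is a hypothetical helper written for this example, not part of `pandas`. As an alternative, `pd.to_numeric` with `downcast=\"integer\"` picks the smallest signed integer type that can hold the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "\n",
+ "def fits_in(values, dtype_name):\n",
+ "    # Hypothetical helper: True if all values lie within the range of the integer dtype\n",
+ "    limits = np.iinfo(np.dtype(dtype_name))\n",
+ "    return bool(values.min() >= limits.min and values.max() <= limits.max)\n",
+ "\n",
+ "\n",
+ "numbers = pd.Series([0, 123, 12345])\n",
+ "print(fits_in(numbers, \"int8\"))\n",
+ "print(fits_in(numbers, \"int16\"))\n",
+ "\n",
+ "# Let pandas choose the smallest signed integer type that fits\n",
+ "smallest = pd.to_numeric(numbers, downcast=\"integer\")\n",
+ "print(smallest.dtype)"
+ ]
+ },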
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Fortunately, this does not apply to the type with the widest range (`int64`). Let's try to store a huge number in it (say 123456789012345678901234567890) and see what happens:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 0\n",
+ "1 123\n",
+ "2 123456789012345678901234567890\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# This raises an exception:\n",
+ "# pd.Series([0, 123, 123456789012345678901234567890], dtype=\"int64\")\n",
+ "\n",
+ "# This works, but the result is no longer int64:\n",
+ "pd.Series([0, 123, 123456789012345678901234567890])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- If we request `int64` explicitly, an exception is raised.\n",
+ "- If we let `pandas` do its job, it falls back to the general `object` type and we lose some of the benefits: the column takes several times more memory and arithmetic on it is one to two orders of magnitude slower. As long as performance is not a top priority, that is not much of a problem.\n",
+ "\n",
+ "In general, we therefore recommend sticking with `int64`, or rather letting `pandas` choose it for us automatically. Only when strict memory requirements demand it does it pay off to look for the best-fitting type."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Floating-point numbers (floats)\n",
+ "\n",
+ "As with integers, a single Python type (`float`) corresponds to several types in `pandas`: `float16`, `float32` and `float64`. The name again includes the number of bits needed to store one number. Fortunately, `float64` behaves exactly like Python's `float`. The other two types are less precise and have a smaller range, so apart from reducing memory use for specific kinds of data you will probably not need them.\n",
+ "\n",
+ "More theoretical reading on the representation of floating-point numbers is available on [Wikipedia](https://cs.wikipedia.org/wiki/Pohybliv%C3%A1_%C5%99%C3%A1dov%C3%A1_%C4%8D%C3%A1rka) (in Czech)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 20.62\n",
+ "Albania 26.45\n",
+ "Algeria 24.60\n",
+ "Andorra 27.63\n",
+ "Angola 22.25\n",
+ " ... \n",
+ "Venezuela 27.45\n",
+ "Vietnam 20.92\n",
+ "Yemen 24.44\n",
+ "Zambia 20.68\n",
+ "Zimbabwe 22.03\n",
+ "Name: bmi_men, Length: 193, dtype: float64"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[\"bmi_men\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 3.141593\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# A fairly precise pi\n",
+ "pd.Series([3.14159265])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 3.140625\n",
+ "dtype: float16"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# A much less precise pi\n",
+ "pd.Series([3.14159265], dtype=\"float16\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logical values (booleans)\n",
+ "\n",
+ "This is probably the least surprising data type. It behaves essentially like Python's `bool`: it takes the values `True` and `False` (which some operations also treat as 1 and 0). It has one more great property: both `Series` and `DataFrame` objects can be filtered with a column of boolean type (more on that below)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Slovakia True\n",
+ "Slovenia True\n",
+ "Solomon Islands False\n",
+ "Somalia False\n",
+ "South Africa False\n",
+ "South Korea False\n",
+ "South Sudan False\n",
+ "Spain True\n",
+ "Sri Lanka False\n",
+ "Sudan False\n",
+ "Suriname False\n",
+ "Swaziland False\n",
+ "Sweden True\n",
+ "Switzerland False\n",
+ "Syria False\n",
+ "Name: is_eu, dtype: bool"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[\"is_eu\"][\"Slovakia\":\"Syria\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 True\n",
+ "1 False\n",
+ "2 False\n",
+ "dtype: bool"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([True, False, False])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This works as well:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 True\n",
+ "1 False\n",
+ "2 False\n",
+ "dtype: bool"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([1, 0, 0], dtype=\"bool\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Objects and strings (objects)\n",
+ "\n",
+ "This will probably surprise you: in the pandas version used here, strings do not get a data type of their own by default! Together with other unspecified or unrecognized values, they fall into the `object` category, which lets a column hold anything you know from Python. Such a column behaves much like an ordinary list, with the advantages (no odd conversions, no ranges to watch, ...) and the disadvantages (it is slower than it could be, and nothing guarantees that the column contains only strings)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan AFG\n",
+ "Albania ALB\n",
+ "Algeria DZA\n",
+ "Andorra AND\n",
+ "Angola AGO\n",
+ " ... \n",
+ "Venezuela VEN\n",
+ "Vietnam VNM\n",
+ "Yemen YEM\n",
+ "Zambia ZMB\n",
+ "Zimbabwe ZWE\n",
+ "Name: iso, Length: 193, dtype: object"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[\"iso\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 pes\n",
+ "1 kočka\n",
+ "2 křeček\n",
+ "3 tarantule\n",
+ "4 hroznýš\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Pets\n",
+ "pd.Series([\"pes\", \"kočka\", \"křeček\", \"tarantule\", \"hroznýš\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 1\n",
+ "1 dvě\n",
+ "2 3\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([1, \"dvě\", 3.0]) # A string and other \"junk\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Note that even a list can be a value in a column of type `object`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Eva [řízek, brambory, cola]\n",
+ "Evelína [smažák, hranolky]\n",
+ "Evženie [sodovka]\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Orders\n",
+ "pd.Series(\n",
+ " [[\"řízek\", \"brambory\", \"cola\"], [\"smažák\", \"hranolky\"], [\"sodovka\"]],\n",
+ " index=[\"Eva\", \"Evelína\", \"Evženie\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Date / time (datetime)\n",
+ "\n",
+ "One of the following lessons covers time data in more detail, but the countries table already contains some, so for completeness here is what `pandas` offers in this area:\n",
+ "\n",
+ "- Date or time values (*datetime*) as points on the time axis.\n",
+ "\n",
+ "- Time values with a time zone attached (*datetimes with time zone*).\n",
+ "\n",
+ "- Time spans (*timedeltas*), which give the length of an interval (counted in nanoseconds).\n",
+ "\n",
+ "- Periods (*periods*), which denote a specific stretch of time (such as \"February 2020\")."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "💡 The `to_datetime` function converts a wide variety of formats to date / time. We use it in the following example:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 1946-11-19\n",
+ "Albania 1955-12-14\n",
+ "Algeria 1962-10-08\n",
+ "Andorra 1993-07-28\n",
+ "Angola 1976-12-01\n",
+ " ... \n",
+ "Venezuela 1945-11-15\n",
+ "Vietnam 1977-09-20\n",
+ "Yemen 1947-09-30\n",
+ "Zambia 1964-12-01\n",
+ "Zimbabwe 1980-08-25\n",
+ "Name: un_accession, Length: 193, dtype: datetime64[ns]"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.to_datetime(countries[\"un_accession\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Categorical (category)\n",
+ "\n",
+ "To work efficiently with columns whose values repeat often (especially strings), we can encode them as categories. This often saves memory and speeds up some operations. During the conversion, `pandas` finds all unique values in the column, stores them in a separate list, and keeps only indices into that list in the column itself. Everything behaves transparently, so in everyday use you usually cannot even tell whether a column is of type `object` or `category`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "💡 The `astype` method converts between data types. It takes the name of the target dtype as its argument:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan low_income\n",
+ "Albania upper_middle_income\n",
+ "Algeria upper_middle_income\n",
+ "Andorra high_income\n",
+ "Angola upper_middle_income\n",
+ " ... \n",
+ "Venezuela upper_middle_income\n",
+ "Vietnam lower_middle_income\n",
+ "Yemen lower_middle_income\n",
+ "Zambia lower_middle_income\n",
+ "Zimbabwe low_income\n",
+ "Name: income_groups, Length: 193, dtype: category\n",
+ "Categories (4, object): [high_income, low_income, lower_middle_income, upper_middle_income]"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[\"income_groups\"].astype(\"category\")"
+ ]
+ },
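+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see what the conversion does under the hood, here is a small sketch on made-up data (the `groups` column is hypothetical, not taken from the `countries` table). It compares memory use before and after the conversion and shows the list of unique values together with the integer codes stored in place of the strings. The exact byte counts depend on your pandas version, so none are quoted here."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical column with a few values repeated many times\n",
+ "groups = pd.Series([\"low_income\", \"high_income\", \"upper_middle_income\"] * 1000)\n",
+ "as_category = groups.astype(\"category\")\n",
+ "\n",
+ "# deep=True also counts the memory taken by the strings themselves\n",
+ "print(groups.memory_usage(deep=True))\n",
+ "print(as_category.memory_usage(deep=True))\n",
+ "\n",
+ "# The unique values, and the integer codes stored instead of the strings\n",
+ "print(as_category.cat.categories)\n",
+ "print(as_category.cat.codes.head())"
+ ]
+ },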
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise:** Can you think of columns in the `countries` table that we should convert to a different type?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Math\n",
+ "\n",
+ "Arithmetic with `Series` in `pandas` is designed to surprise you as little as possible. Columns can appear in arithmetic expressions together with scalar values, other columns, `numpy` arrays of a matching shape, and even lists."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 21421.85\n",
+ "Albania 28473.65\n",
+ "Algeria 28418.90\n",
+ "Andorra 30130.75\n",
+ "Angola 23794.35\n",
+ " ... \n",
+ "Venezuela 27707.15\n",
+ "Vietnam 27331.20\n",
+ "Yemen 24506.10\n",
+ "Zambia 21699.25\n",
+ "Zimbabwe 21965.70\n",
+ "Name: life_expectancy, Length: 193, dtype: float64"
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Life expectancy in days\n",
+ "countries[\"life_expectancy\"] * 365"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 52.844408\n",
+ "Albania 112.626087\n",
+ "Algeria 15.526464\n",
+ "Andorra 189.170213\n",
+ "Angola 16.611855\n",
+ " ... \n",
+ "Venezuela 33.265720\n",
+ "Vietnam 273.924591\n",
+ "Yemen 49.927079\n",
+ "Zambia 19.013832\n",
+ "Zimbabwe 34.113011\n",
+ "Length: 193, dtype: float64"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Population density\n",
+ "countries[\"population\"] / countries[\"area\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "řízek 129.9\n",
+ "smažák 109.9\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# How our lunches got more expensive\n",
+ "pd.Series([109, 99], index=[\"řízek\", \"smažák\"]) + [20.9, 10.9] # adding a list"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise**: Compute the total number of road traffic deaths in each country (use the \"population\" and \"car_deaths_per_100000_people\" columns and simple arithmetic). Does the result for Czechia look right?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan 26700 days 00:54:33.011664\n",
+ "Albania 23388 days 00:54:33.011664\n",
+ "Algeria 20898 days 00:54:33.011664\n",
+ "Andorra 9647 days 00:54:33.011664\n",
+ "Angola 15730 days 00:54:33.011664\n",
+ " ... \n",
+ "Venezuela 27069 days 00:54:33.011664\n",
+ "Vietnam 15437 days 00:54:33.011664\n",
+ "Yemen 26385 days 00:54:33.011664\n",
+ "Zambia 20113 days 00:54:33.011664\n",
+ "Zimbabwe 14367 days 00:54:33.011664\n",
+ "Name: un_accession, Length: 193, dtype: timedelta64[ns]"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# How long have they been in the UN?\n",
+ "from datetime import datetime\n",
+ "datetime.now() - pd.to_datetime(countries[\"un_accession\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "💡 Floating-point numbers can also hold the special values \"not a number\" and plus or minus infinity. They arise, for example, from an ill-advised division by zero:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 NaN\n",
+ "1 -inf\n",
+ "2 inf\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([0, -1, 1]) / pd.Series([0, 0, 0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Warning:** Be careful when working with the limited integer types. Just as with an ill-chosen conversion, the result of an operation can \"overflow\" and show bogus values. One more reason to stick with `int64`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 14\n",
+ "1 28\n",
+ "2 42\n",
+ "dtype: int8"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.Series([7, 14, 149], dtype=\"int8\") * 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Comparison"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`Series` supports not only arithmetic operators but also comparison operators. The result is then not a single logical value but a column of logical values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan False\n",
+ "Albania False\n",
+ "Algeria False\n",
+ "Andorra False\n",
+ "Angola False\n",
+ " ... \n",
+ "Venezuela False\n",
+ "Vietnam False\n",
+ "Yemen False\n",
+ "Zambia False\n",
+ "Zimbabwe False\n",
+ "Name: alcohol_adults, Length: 193, dtype: bool"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We will treat 15 litres of pure alcohol as the threshold for excessive drinking (not consulted with addiction specialists!)\n",
+ "# Where do people drink a lot?\n",
+ "countries[\"alcohol_adults\"] > 15"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Almost nowhere. And how are we doing at home?\n",
+ "(countries[\"alcohol_adults\"] > 15).loc[\"Czechia\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan False\n",
+ "Albania True\n",
+ "Algeria False\n",
+ "Andorra True\n",
+ "Angola False\n",
+ " ... \n",
+ "Venezuela False\n",
+ "Vietnam False\n",
+ "Yemen False\n",
+ "Zambia False\n",
+ "Zimbabwe False\n",
+ "Length: 193, dtype: bool"
+ ]
+ },
+ "execution_count": 41,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Do men have a higher BMI than women in each country?\n",
+ "countries[\"bmi_men\"] > countries[\"bmi_women\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise**: Find out whether men or women live longer in each country."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan False\n",
+ "Albania False\n",
+ "Algeria True\n",
+ "Andorra False\n",
+ "Angola True\n",
+ " ... \n",
+ "Venezuela False\n",
+ "Vietnam False\n",
+ "Yemen False\n",
+ "Zambia True\n",
+ "Zimbabwe True\n",
+ "Name: world_4region, Length: 193, dtype: bool"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Is the country in Africa?\n",
+ "countries[\"world_4region\"] == \"africa\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As in Python, conditions can be combined with operators. Because of Python's syntax rules, however, you have to replace the familiar logical operators with their alternatives: `&` (instead of `and`), `|` (instead of `or`) and `~` (instead of `not`). They have a different precedence than their classic siblings (they bind more tightly than comparisons such as `>` or `==`), so always wrap each condition in parentheses when you combine them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Afghanistan False\n",
+ "Albania True\n",
+ "Algeria True\n",
+ "Andorra False\n",
+ "Angola False\n",
+ " ... \n",
+ "Venezuela False\n",
+ "Vietnam False\n",
+ "Yemen False\n",
+ "Zambia False\n",
+ "Zimbabwe False\n",
+ "Length: 193, dtype: bool"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Where do both women and men live past 75?\n",
+ "(countries[\"life_expectancy_male\"] > 75) & (countries[\"life_expectancy_female\"] > 75)"
+ ]
+ },
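+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The other two operators, `|` and `~`, follow the same pattern. The sketch below uses a tiny made-up table (`people` and its columns are hypothetical) so that the results are easy to verify by eye. Note the parentheses around every comparison."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical mini-table\n",
+ "people = pd.DataFrame(\n",
+ "    {\"age\": [15, 32, 70], \"member\": [True, False, True]},\n",
+ "    index=[\"Ann\", \"Bob\", \"Cid\"],\n",
+ ")\n",
+ "\n",
+ "adult_or_member = (people[\"age\"] >= 18) | people[\"member\"]   # or\n",
+ "not_member = ~people[\"member\"]                               # not\n",
+ "working_age = (people[\"age\"] >= 18) & (people[\"age\"] < 65)   # and\n",
+ "\n",
+ "print(adult_or_member)\n",
+ "print(not_member)\n",
+ "print(working_age)"
+ ]
+ },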
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Filtering\n",
+ "\n",
+ "To select the rows of a table that satisfy some criterion, you first have to express that criterion as a column of logical values (which is not always hard :-)). You then put this column (the column itself, not its name!) into square brackets as the index of the `DataFrame`.\n",
+ "\n",
+ "For example, if you want information only about EU members, you can directly use the \"is_eu\" column, which already contains logical values:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " iso | \n",
+ " world_6region | \n",
+ " world_4region | \n",
+ " income_groups | \n",
+ " is_eu | \n",
+ " is_oecd | \n",
+ " eu_accession | \n",
+ " year | \n",
+ " area | \n",
+ " population | \n",
+ " alcohol_adults | \n",
+ " bmi_men | \n",
+ " bmi_women | \n",
+ " car_deaths_per_100000_people | \n",
+ " calories_per_day | \n",
+ " infant_mortality | \n",
+ " life_expectancy | \n",
+ " life_expectancy_female | \n",
+ " life_expectancy_male | \n",
+ " un_accession | \n",
+ "
\n",
+ " \n",
+ " | name | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | Austria | \n",
+ " AUT | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1995-01-01 | \n",
+ " 2018 | \n",
+ " 83879.0 | \n",
+ " 8441000.0 | \n",
+ " 12.40 | \n",
+ " 26.47 | \n",
+ " 25.09 | \n",
+ " 3.541 | \n",
+ " 3768.0 | \n",
+ " 2.9 | \n",
+ " 81.84 | \n",
+ " 84.249 | \n",
+ " 79.585 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Belgium | \n",
+ " BEL | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 30530.0 | \n",
+ " 10820000.0 | \n",
+ " 10.41 | \n",
+ " 26.76 | \n",
+ " 25.14 | \n",
+ " 5.427 | \n",
+ " 3733.0 | \n",
+ " 3.3 | \n",
+ " 81.23 | \n",
+ " 83.751 | \n",
+ " 79.131 | \n",
+ " 1945-12-27 | \n",
+ "
\n",
+ " \n",
+ " | Bulgaria | \n",
+ " BGR | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2007-01-01 | \n",
+ " 2018 | \n",
+ " 111000.0 | \n",
+ " 7349000.0 | \n",
+ " 11.40 | \n",
+ " 26.54 | \n",
+ " 25.52 | \n",
+ " 9.662 | \n",
+ " 2829.0 | \n",
+ " 9.3 | \n",
+ " 75.32 | \n",
+ " 78.485 | \n",
+ " 71.618 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Croatia | \n",
+ " HRV | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2013-01-01 | \n",
+ " 2018 | \n",
+ " 56590.0 | \n",
+ " 4379000.0 | \n",
+ " 15.00 | \n",
+ " 26.60 | \n",
+ " 25.18 | \n",
+ " 6.434 | \n",
+ " 3059.0 | \n",
+ " 3.6 | \n",
+ " 77.66 | \n",
+ " 81.167 | \n",
+ " 74.701 | \n",
+ " 1992-05-22 | \n",
+ "
\n",
+ " \n",
+ " | Cyprus | \n",
+ " CYP | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 9250.0 | \n",
+ " 1141000.0 | \n",
+ " 8.84 | \n",
+ " 27.42 | \n",
+ " 25.93 | \n",
+ " 6.419 | \n",
+ " 2649.0 | \n",
+ " 2.5 | \n",
+ " 80.79 | \n",
+ " 82.918 | \n",
+ " 78.734 | \n",
+ " 1960-09-20 | \n",
+ "
\n",
+ " \n",
+ " | Czechia | \n",
+ " CZE | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 78870.0 | \n",
+ " 10590000.0 | \n",
+ " 16.47 | \n",
+ " 27.91 | \n",
+ " 26.51 | \n",
+ " 5.720 | \n",
+ " 3256.0 | \n",
+ " 2.8 | \n",
+ " 79.37 | \n",
+ " 81.858 | \n",
+ " 76.148 | \n",
+ " 1993-01-19 | \n",
+ "
\n",
+ " \n",
+ " | Denmark | \n",
+ " DNK | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1973-01-01 | \n",
+ " 2018 | \n",
+ " 42922.0 | \n",
+ " 5611000.0 | \n",
+ " 12.02 | \n",
+ " 26.13 | \n",
+ " 25.11 | \n",
+ " 3.481 | \n",
+ " 3367.0 | \n",
+ " 2.9 | \n",
+ " 81.10 | \n",
+ " 82.878 | \n",
+ " 79.130 | \n",
+ " 1945-10-24 | \n",
+ "
\n",
+ " \n",
+ " | Estonia | \n",
+ " EST | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 45230.0 | \n",
+ " 1339000.0 | \n",
+ " 17.24 | \n",
+ " 26.26 | \n",
+ " 25.19 | \n",
+ " 5.896 | \n",
+ " 3253.0 | \n",
+ " 2.3 | \n",
+ " 77.66 | \n",
+ " 82.111 | \n",
+ " 73.201 | \n",
+ " 1991-09-17 | \n",
+ "
\n",
+ " \n",
+ " | Finland | \n",
+ " FIN | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1995-01-01 | \n",
+ " 2018 | \n",
+ " 338420.0 | \n",
+ " 5419000.0 | \n",
+ " 13.10 | \n",
+ " 26.73 | \n",
+ " 25.58 | \n",
+ " 3.615 | \n",
+ " 3368.0 | \n",
+ " 1.9 | \n",
+ " 82.06 | \n",
+ " 84.423 | \n",
+ " 78.934 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | France | \n",
+ " FRA | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 549087.0 | \n",
+ " 63780000.0 | \n",
+ " 12.48 | \n",
+ " 25.85 | \n",
+ " 24.83 | \n",
+ " 2.491 | \n",
+ " 3482.0 | \n",
+ " 3.5 | \n",
+ " 82.62 | \n",
+ " 85.747 | \n",
+ " 79.991 | \n",
+ " 1945-10-24 | \n",
+ "
\n",
+ " \n",
+ " | Germany | \n",
+ " DEU | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 357380.0 | \n",
+ " 81800000.0 | \n",
+ " 12.14 | \n",
+ " 27.17 | \n",
+ " 25.74 | \n",
+ " 3.280 | \n",
+ " 3499.0 | \n",
+ " 3.1 | \n",
+ " 81.25 | \n",
+ " 83.632 | \n",
+ " 79.060 | \n",
+ " 1973-09-18 | \n",
+ "
\n",
+ " \n",
+ " | Greece | \n",
+ " GRC | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1981-01-01 | \n",
+ " 2018 | \n",
+ " 131960.0 | \n",
+ " 11450000.0 | \n",
+ " 11.01 | \n",
+ " 26.34 | \n",
+ " 24.92 | \n",
+ " 9.175 | \n",
+ " 3400.0 | \n",
+ " 3.6 | \n",
+ " 81.34 | \n",
+ " 84.071 | \n",
+ " 79.129 | \n",
+ " 1945-10-25 | \n",
+ "
\n",
+ " \n",
+ " | Hungary | \n",
+ " HUN | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 93030.0 | \n",
+ " 9934000.0 | \n",
+ " 16.12 | \n",
+ " 27.12 | \n",
+ " 25.98 | \n",
+ " 5.234 | \n",
+ " 3037.0 | \n",
+ " 5.3 | \n",
+ " 75.90 | \n",
+ " 79.557 | \n",
+ " 72.610 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Ireland | \n",
+ " IRL | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1973-01-01 | \n",
+ " 2018 | \n",
+ " 70280.0 | \n",
+ " 4631000.0 | \n",
+ " 14.92 | \n",
+ " 27.65 | \n",
+ " 26.62 | \n",
+ " 3.768 | \n",
+ " 3600.0 | \n",
+ " 3.0 | \n",
+ " 81.49 | \n",
+ " 83.737 | \n",
+ " 79.885 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Italy | \n",
+ " ITA | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 301340.0 | \n",
+ " 61090000.0 | \n",
+ " 9.72 | \n",
+ " 26.48 | \n",
+ " 24.79 | \n",
+ " 3.778 | \n",
+ " 3579.0 | \n",
+ " 2.9 | \n",
+ " 82.62 | \n",
+ " 85.435 | \n",
+ " 81.146 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Latvia | \n",
+ " LVA | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 64490.0 | \n",
+ " 2226000.0 | \n",
+ " 13.45 | \n",
+ " 26.46 | \n",
+ " 25.62 | \n",
+ " 8.275 | \n",
+ " 3174.0 | \n",
+ " 6.9 | \n",
+ " 75.13 | \n",
+ " 79.498 | \n",
+ " 69.882 | \n",
+ " 1991-09-17 | \n",
+ "
\n",
+ " \n",
+ " | Lithuania | \n",
+ " LTU | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 65286.0 | \n",
+ " 3278000.0 | \n",
+ " 16.30 | \n",
+ " 26.86 | \n",
+ " 26.01 | \n",
+ " 8.090 | \n",
+ " 3417.0 | \n",
+ " 3.3 | \n",
+ " 75.31 | \n",
+ " 80.060 | \n",
+ " 69.554 | \n",
+ " 1991-09-17 | \n",
+ "
\n",
+ " \n",
+ " | Luxembourg | \n",
+ " LUX | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 2590.0 | \n",
+ " 530000.0 | \n",
+ " 12.84 | \n",
+ " 27.43 | \n",
+ " 26.09 | \n",
+ " 5.971 | \n",
+ " 3539.0 | \n",
+ " 1.5 | \n",
+ " 82.39 | \n",
+ " 84.227 | \n",
+ " 79.981 | \n",
+ " 1945-10-24 | \n",
+ "
\n",
+ " \n",
+ " | Malta | \n",
+ " MLT | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 320.0 | \n",
+ " 420600.0 | \n",
+ " 4.10 | \n",
+ " 27.68 | \n",
+ " 27.05 | \n",
+ " 2.228 | \n",
+ " 3378.0 | \n",
+ " 5.1 | \n",
+ " 81.75 | \n",
+ " 82.724 | \n",
+ " 79.570 | \n",
+ " 1964-12-01 | \n",
+ "
\n",
+ " \n",
+ " | Netherlands | \n",
+ " NLD | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1952-07-23 | \n",
+ " 2018 | \n",
+ " 41540.0 | \n",
+ " 16760000.0 | \n",
+ " 9.75 | \n",
+ " 26.02 | \n",
+ " 25.47 | \n",
+ " 2.237 | \n",
+ " 3228.0 | \n",
+ " 3.2 | \n",
+ " 81.92 | \n",
+ " 83.841 | \n",
+ " 80.440 | \n",
+ " 1945-12-10 | \n",
+ "
\n",
+ " \n",
+ " | Poland | \n",
+ " POL | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 312680.0 | \n",
+ " 38330000.0 | \n",
+ " 14.43 | \n",
+ " 26.67 | \n",
+ " 25.92 | \n",
+ " 7.675 | \n",
+ " 3451.0 | \n",
+ " 4.5 | \n",
+ " 78.19 | \n",
+ " 81.732 | \n",
+ " 74.043 | \n",
+ " 1945-10-24 | \n",
+ "
\n",
+ " \n",
+ " | Portugal | \n",
+ " PRT | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1986-01-01 | \n",
+ " 2018 | \n",
+ " 92225.0 | \n",
+ " 10700000.0 | \n",
+ " 13.89 | \n",
+ " 26.68 | \n",
+ " 26.18 | \n",
+ " 5.078 | \n",
+ " 3477.0 | \n",
+ " 3.0 | \n",
+ " 81.30 | \n",
+ " 84.372 | \n",
+ " 78.685 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Romania | \n",
+ " ROU | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2007-01-01 | \n",
+ " 2018 | \n",
+ " 238390.0 | \n",
+ " 21340000.0 | \n",
+ " 16.15 | \n",
+ " 25.41 | \n",
+ " 25.22 | \n",
+ " 8.808 | \n",
+ " 3358.0 | \n",
+ " 9.7 | \n",
+ " 75.53 | \n",
+ " 79.158 | \n",
+ " 72.265 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Slovakia | \n",
+ " SVK | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 49035.0 | \n",
+ " 5489000.0 | \n",
+ " 13.31 | \n",
+ " 26.93 | \n",
+ " 26.32 | \n",
+ " 6.746 | \n",
+ " 2944.0 | \n",
+ " 5.8 | \n",
+ " 77.16 | \n",
+ " 80.511 | \n",
+ " 73.589 | \n",
+ " 1993-01-19 | \n",
+ "
\n",
+ " \n",
+ " | Slovenia | \n",
+ " SVN | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 20270.0 | \n",
+ " 2045000.0 | \n",
+ " 14.94 | \n",
+ " 27.44 | \n",
+ " 26.58 | \n",
+ " 5.315 | \n",
+ " 3168.0 | \n",
+ " 2.1 | \n",
+ " 81.12 | \n",
+ " 84.017 | \n",
+ " 78.499 | \n",
+ " 1992-05-22 | \n",
+ "
\n",
+ " \n",
+ " | Spain | \n",
+ " ESP | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1986-01-01 | \n",
+ " 2018 | \n",
+ " 505940.0 | \n",
+ " 47040000.0 | \n",
+ " 11.83 | \n",
+ " 27.50 | \n",
+ " 26.31 | \n",
+ " 5.146 | \n",
+ " 3174.0 | \n",
+ " 3.5 | \n",
+ " 83.23 | \n",
+ " 86.119 | \n",
+ " 80.694 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Sweden | \n",
+ " SWE | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1995-01-01 | \n",
+ " 2018 | \n",
+ " 447420.0 | \n",
+ " 9546000.0 | \n",
+ " 9.50 | \n",
+ " 26.38 | \n",
+ " 25.15 | \n",
+ " 2.737 | \n",
+ " 3179.0 | \n",
+ " 2.4 | \n",
+ " 82.37 | \n",
+ " 84.443 | \n",
+ " 81.126 | \n",
+ " 1946-11-19 | \n",
+ "
\n",
+ " \n",
+ " | United Kingdom | \n",
+ " GBR | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " True | \n",
+ " True | \n",
+ " 1973-01-01 | \n",
+ " 2018 | \n",
+ " 243610.0 | \n",
+ " 63180000.0 | \n",
+ " 13.24 | \n",
+ " 27.39 | \n",
+ " 26.94 | \n",
+ " 3.377 | \n",
+ " 3424.0 | \n",
+ " 3.5 | \n",
+ " 81.19 | \n",
+ " 83.558 | \n",
+ " 80.127 | \n",
+ " 1945-10-24 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " iso world_6region world_4region income_groups \\\n",
+ "name \n",
+ "Austria AUT europe_central_asia europe high_income \n",
+ "Belgium BEL europe_central_asia europe high_income \n",
+ "Bulgaria BGR europe_central_asia europe upper_middle_income \n",
+ "Croatia HRV europe_central_asia europe high_income \n",
+ "Cyprus CYP europe_central_asia europe high_income \n",
+ "Czechia CZE europe_central_asia europe high_income \n",
+ "Denmark DNK europe_central_asia europe high_income \n",
+ "Estonia EST europe_central_asia europe high_income \n",
+ "Finland FIN europe_central_asia europe high_income \n",
+ "France FRA europe_central_asia europe high_income \n",
+ "Germany DEU europe_central_asia europe high_income \n",
+ "Greece GRC europe_central_asia europe high_income \n",
+ "Hungary HUN europe_central_asia europe upper_middle_income \n",
+ "Ireland IRL europe_central_asia europe high_income \n",
+ "Italy ITA europe_central_asia europe high_income \n",
+ "Latvia LVA europe_central_asia europe high_income \n",
+ "Lithuania LTU europe_central_asia europe high_income \n",
+ "Luxembourg LUX europe_central_asia europe high_income \n",
+ "Malta MLT europe_central_asia europe high_income \n",
+ "Netherlands NLD europe_central_asia europe high_income \n",
+ "Poland POL europe_central_asia europe high_income \n",
+ "Portugal PRT europe_central_asia europe high_income \n",
+ "Romania ROU europe_central_asia europe upper_middle_income \n",
+ "Slovakia SVK europe_central_asia europe high_income \n",
+ "Slovenia SVN europe_central_asia europe high_income \n",
+ "Spain ESP europe_central_asia europe high_income \n",
+ "Sweden SWE europe_central_asia europe high_income \n",
+ "United Kingdom GBR europe_central_asia europe high_income \n",
+ "\n",
+ " is_eu is_oecd eu_accession year area population \\\n",
+ "name \n",
+ "Austria True True 1995-01-01 2018 83879.0 8441000.0 \n",
+ "Belgium True True 1952-07-23 2018 30530.0 10820000.0 \n",
+ "Bulgaria True False 2007-01-01 2018 111000.0 7349000.0 \n",
+ "Croatia True False 2013-01-01 2018 56590.0 4379000.0 \n",
+ "Cyprus True False 2004-05-01 2018 9250.0 1141000.0 \n",
+ "Czechia True True 2004-05-01 2018 78870.0 10590000.0 \n",
+ "Denmark True True 1973-01-01 2018 42922.0 5611000.0 \n",
+ "Estonia True True 2004-05-01 2018 45230.0 1339000.0 \n",
+ "Finland True True 1995-01-01 2018 338420.0 5419000.0 \n",
+ "France True True 1952-07-23 2018 549087.0 63780000.0 \n",
+ "Germany True True 1952-07-23 2018 357380.0 81800000.0 \n",
+ "Greece True True 1981-01-01 2018 131960.0 11450000.0 \n",
+ "Hungary True True 2004-05-01 2018 93030.0 9934000.0 \n",
+ "Ireland True True 1973-01-01 2018 70280.0 4631000.0 \n",
+ "Italy True True 1952-07-23 2018 301340.0 61090000.0 \n",
+ "Latvia True True 2004-05-01 2018 64490.0 2226000.0 \n",
+ "Lithuania True True 2004-05-01 2018 65286.0 3278000.0 \n",
+ "Luxembourg True True 1952-07-23 2018 2590.0 530000.0 \n",
+ "Malta True False 2004-05-01 2018 320.0 420600.0 \n",
+ "Netherlands True True 1952-07-23 2018 41540.0 16760000.0 \n",
+ "Poland True True 2004-05-01 2018 312680.0 38330000.0 \n",
+ "Portugal True True 1986-01-01 2018 92225.0 10700000.0 \n",
+ "Romania True False 2007-01-01 2018 238390.0 21340000.0 \n",
+ "Slovakia True True 2004-05-01 2018 49035.0 5489000.0 \n",
+ "Slovenia True True 2004-05-01 2018 20270.0 2045000.0 \n",
+ "Spain True True 1986-01-01 2018 505940.0 47040000.0 \n",
+ "Sweden True True 1995-01-01 2018 447420.0 9546000.0 \n",
+ "United Kingdom True True 1973-01-01 2018 243610.0 63180000.0 \n",
+ "\n",
+ " alcohol_adults bmi_men bmi_women \\\n",
+ "name \n",
+ "Austria 12.40 26.47 25.09 \n",
+ "Belgium 10.41 26.76 25.14 \n",
+ "Bulgaria 11.40 26.54 25.52 \n",
+ "Croatia 15.00 26.60 25.18 \n",
+ "Cyprus 8.84 27.42 25.93 \n",
+ "Czechia 16.47 27.91 26.51 \n",
+ "Denmark 12.02 26.13 25.11 \n",
+ "Estonia 17.24 26.26 25.19 \n",
+ "Finland 13.10 26.73 25.58 \n",
+ "France 12.48 25.85 24.83 \n",
+ "Germany 12.14 27.17 25.74 \n",
+ "Greece 11.01 26.34 24.92 \n",
+ "Hungary 16.12 27.12 25.98 \n",
+ "Ireland 14.92 27.65 26.62 \n",
+ "Italy 9.72 26.48 24.79 \n",
+ "Latvia 13.45 26.46 25.62 \n",
+ "Lithuania 16.30 26.86 26.01 \n",
+ "Luxembourg 12.84 27.43 26.09 \n",
+ "Malta 4.10 27.68 27.05 \n",
+ "Netherlands 9.75 26.02 25.47 \n",
+ "Poland 14.43 26.67 25.92 \n",
+ "Portugal 13.89 26.68 26.18 \n",
+ "Romania 16.15 25.41 25.22 \n",
+ "Slovakia 13.31 26.93 26.32 \n",
+ "Slovenia 14.94 27.44 26.58 \n",
+ "Spain 11.83 27.50 26.31 \n",
+ "Sweden 9.50 26.38 25.15 \n",
+ "United Kingdom 13.24 27.39 26.94 \n",
+ "\n",
+ " car_deaths_per_100000_people calories_per_day \\\n",
+ "name \n",
+ "Austria 3.541 3768.0 \n",
+ "Belgium 5.427 3733.0 \n",
+ "Bulgaria 9.662 2829.0 \n",
+ "Croatia 6.434 3059.0 \n",
+ "Cyprus 6.419 2649.0 \n",
+ "Czechia 5.720 3256.0 \n",
+ "Denmark 3.481 3367.0 \n",
+ "Estonia 5.896 3253.0 \n",
+ "Finland 3.615 3368.0 \n",
+ "France 2.491 3482.0 \n",
+ "Germany 3.280 3499.0 \n",
+ "Greece 9.175 3400.0 \n",
+ "Hungary 5.234 3037.0 \n",
+ "Ireland 3.768 3600.0 \n",
+ "Italy 3.778 3579.0 \n",
+ "Latvia 8.275 3174.0 \n",
+ "Lithuania 8.090 3417.0 \n",
+ "Luxembourg 5.971 3539.0 \n",
+ "Malta 2.228 3378.0 \n",
+ "Netherlands 2.237 3228.0 \n",
+ "Poland 7.675 3451.0 \n",
+ "Portugal 5.078 3477.0 \n",
+ "Romania 8.808 3358.0 \n",
+ "Slovakia 6.746 2944.0 \n",
+ "Slovenia 5.315 3168.0 \n",
+ "Spain 5.146 3174.0 \n",
+ "Sweden 2.737 3179.0 \n",
+ "United Kingdom 3.377 3424.0 \n",
+ "\n",
+ " infant_mortality life_expectancy life_expectancy_female \\\n",
+ "name \n",
+ "Austria 2.9 81.84 84.249 \n",
+ "Belgium 3.3 81.23 83.751 \n",
+ "Bulgaria 9.3 75.32 78.485 \n",
+ "Croatia 3.6 77.66 81.167 \n",
+ "Cyprus 2.5 80.79 82.918 \n",
+ "Czechia 2.8 79.37 81.858 \n",
+ "Denmark 2.9 81.10 82.878 \n",
+ "Estonia 2.3 77.66 82.111 \n",
+ "Finland 1.9 82.06 84.423 \n",
+ "France 3.5 82.62 85.747 \n",
+ "Germany 3.1 81.25 83.632 \n",
+ "Greece 3.6 81.34 84.071 \n",
+ "Hungary 5.3 75.90 79.557 \n",
+ "Ireland 3.0 81.49 83.737 \n",
+ "Italy 2.9 82.62 85.435 \n",
+ "Latvia 6.9 75.13 79.498 \n",
+ "Lithuania 3.3 75.31 80.060 \n",
+ "Luxembourg 1.5 82.39 84.227 \n",
+ "Malta 5.1 81.75 82.724 \n",
+ "Netherlands 3.2 81.92 83.841 \n",
+ "Poland 4.5 78.19 81.732 \n",
+ "Portugal 3.0 81.30 84.372 \n",
+ "Romania 9.7 75.53 79.158 \n",
+ "Slovakia 5.8 77.16 80.511 \n",
+ "Slovenia 2.1 81.12 84.017 \n",
+ "Spain 3.5 83.23 86.119 \n",
+ "Sweden 2.4 82.37 84.443 \n",
+ "United Kingdom 3.5 81.19 83.558 \n",
+ "\n",
+ " life_expectancy_male un_accession \n",
+ "name \n",
+ "Austria 79.585 1955-12-14 \n",
+ "Belgium 79.131 1945-12-27 \n",
+ "Bulgaria 71.618 1955-12-14 \n",
+ "Croatia 74.701 1992-05-22 \n",
+ "Cyprus 78.734 1960-09-20 \n",
+ "Czechia 76.148 1993-01-19 \n",
+ "Denmark 79.130 1945-10-24 \n",
+ "Estonia 73.201 1991-09-17 \n",
+ "Finland 78.934 1955-12-14 \n",
+ "France 79.991 1945-10-24 \n",
+ "Germany 79.060 1973-09-18 \n",
+ "Greece 79.129 1945-10-25 \n",
+ "Hungary 72.610 1955-12-14 \n",
+ "Ireland 79.885 1955-12-14 \n",
+ "Italy 81.146 1955-12-14 \n",
+ "Latvia 69.882 1991-09-17 \n",
+ "Lithuania 69.554 1991-09-17 \n",
+ "Luxembourg 79.981 1945-10-24 \n",
+ "Malta 79.570 1964-12-01 \n",
+ "Netherlands 80.440 1945-12-10 \n",
+ "Poland 74.043 1945-10-24 \n",
+ "Portugal 78.685 1955-12-14 \n",
+ "Romania 72.265 1955-12-14 \n",
+ "Slovakia 73.589 1993-01-19 \n",
+ "Slovenia 78.499 1992-05-22 \n",
+ "Spain 80.694 1955-12-14 \n",
+ "Sweden 81.126 1946-11-19 \n",
+ "United Kingdom 80.127 1945-10-24 "
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries[countries[\"is_eu\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "You don't have to use a column that already exists in the table; any computed boolean value of the same shape works as well:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " iso | \n",
+ " world_6region | \n",
+ " world_4region | \n",
+ " income_groups | \n",
+ " is_eu | \n",
+ " is_oecd | \n",
+ " eu_accession | \n",
+ " year | \n",
+ " area | \n",
+ " population | \n",
+ " alcohol_adults | \n",
+ " bmi_men | \n",
+ " bmi_women | \n",
+ " car_deaths_per_100000_people | \n",
+ " calories_per_day | \n",
+ " infant_mortality | \n",
+ " life_expectancy | \n",
+ " life_expectancy_female | \n",
+ " life_expectancy_male | \n",
+ " un_accession | \n",
+ "
\n",
+ " \n",
+ " | name | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | Andorra | \n",
+ " AND | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 470.0 | \n",
+ " 88910.0 | \n",
+ " 10.17 | \n",
+ " 27.63 | \n",
+ " 26.43 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2.10 | \n",
+ " 82.55 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1993-07-28 | \n",
+ "
\n",
+ " \n",
+ " | Antigua and Barbuda | \n",
+ " ATG | \n",
+ " america | \n",
+ " americas | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2018 | \n",
+ " 440.0 | \n",
+ " 91400.0 | \n",
+ " 8.17 | \n",
+ " 25.77 | \n",
+ " 27.51 | \n",
+ " NaN | \n",
+ " 2417.0 | \n",
+ " 5.80 | \n",
+ " 77.60 | \n",
+ " 79.028 | \n",
+ " 74.154 | \n",
+ " 1981-11-11 | \n",
+ "
\n",
+ " \n",
+ " | Dominica | \n",
+ " DMA | \n",
+ " america | \n",
+ " americas | \n",
+ " upper_middle_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 750.0 | \n",
+ " 67700.0 | \n",
+ " 8.68 | \n",
+ " 24.57 | \n",
+ " 28.78 | \n",
+ " NaN | \n",
+ " 2931.0 | \n",
+ " 19.60 | \n",
+ " 73.01 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1978-12-18 | \n",
+ "
\n",
+ " \n",
+ " | Liechtenstein | \n",
+ " LIE | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 160.0 | \n",
+ " 36870.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1.76 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1990-09-18 | \n",
+ "
\n",
+ " \n",
+ " | Marshall Islands | \n",
+ " MHL | \n",
+ " east_asia_pacific | \n",
+ " asia | \n",
+ " upper_middle_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 180.0 | \n",
+ " 56690.0 | \n",
+ " NaN | \n",
+ " 29.37 | \n",
+ " 31.39 | \n",
+ " 1.800 | \n",
+ " NaN | \n",
+ " 29.60 | \n",
+ " 65.00 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1991-09-17 | \n",
+ "
\n",
+ " \n",
+ " | Monaco | \n",
+ " MCO | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 2.0 | \n",
+ " 35460.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2.80 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1993-05-28 | \n",
+ "
\n",
+ " \n",
+ " | Nauru | \n",
+ " NRU | \n",
+ " east_asia_pacific | \n",
+ " asia | \n",
+ " NaN | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2015 | \n",
+ " 20.0 | \n",
+ " 10440.0 | \n",
+ " 4.81 | \n",
+ " 33.90 | \n",
+ " 35.02 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 29.10 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1999-09-14 | \n",
+ "
\n",
+ " \n",
+ " | Palau | \n",
+ " PLW | \n",
+ " east_asia_pacific | \n",
+ " asia | \n",
+ " upper_middle_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 460.0 | \n",
+ " 20920.0 | \n",
+ " 9.86 | \n",
+ " 30.38 | \n",
+ " 31.85 | \n",
+ " 10.730 | \n",
+ " NaN | \n",
+ " 14.20 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1994-12-15 | \n",
+ "
\n",
+ " \n",
+ " | Saint Kitts and Nevis | \n",
+ " KNA | \n",
+ " america | \n",
+ " americas | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 260.0 | \n",
+ " 54340.0 | \n",
+ " 10.62 | \n",
+ " 28.23 | \n",
+ " 30.51 | \n",
+ " NaN | \n",
+ " 2492.0 | \n",
+ " 8.40 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1983-09-23 | \n",
+ "
\n",
+ " \n",
+ " | San Marino | \n",
+ " SMR | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " high_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 60.0 | \n",
+ " 32160.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 5.946 | \n",
+ " NaN | \n",
+ " 2.60 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1992-03-02 | \n",
+ "
\n",
+ " \n",
+ " | Seychelles | \n",
+ " SYC | \n",
+ " sub_saharan_africa | \n",
+ " africa | \n",
+ " upper_middle_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2018 | \n",
+ " 460.0 | \n",
+ " 87420.0 | \n",
+ " 12.11 | \n",
+ " 25.56 | \n",
+ " 27.97 | \n",
+ " 11.700 | \n",
+ " NaN | \n",
+ " 11.70 | \n",
+ " 74.23 | \n",
+ " 78.730 | \n",
+ " 69.693 | \n",
+ " 1976-09-21 | \n",
+ "
\n",
+ " \n",
+ " | Tuvalu | \n",
+ " TUV | \n",
+ " east_asia_pacific | \n",
+ " asia | \n",
+ " upper_middle_income | \n",
+ " False | \n",
+ " False | \n",
+ " NaN | \n",
+ " 2017 | \n",
+ " 30.0 | \n",
+ " 9888.0 | \n",
+ " 2.14 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 22.80 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2000-09-05 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " iso world_6region world_4region \\\n",
+ "name \n",
+ "Andorra AND europe_central_asia europe \n",
+ "Antigua and Barbuda ATG america americas \n",
+ "Dominica DMA america americas \n",
+ "Liechtenstein LIE europe_central_asia europe \n",
+ "Marshall Islands MHL east_asia_pacific asia \n",
+ "Monaco MCO europe_central_asia europe \n",
+ "Nauru NRU east_asia_pacific asia \n",
+ "Palau PLW east_asia_pacific asia \n",
+ "Saint Kitts and Nevis KNA america americas \n",
+ "San Marino SMR europe_central_asia europe \n",
+ "Seychelles SYC sub_saharan_africa africa \n",
+ "Tuvalu TUV east_asia_pacific asia \n",
+ "\n",
+ " income_groups is_eu is_oecd eu_accession year \\\n",
+ "name \n",
+ "Andorra high_income False False NaN 2017 \n",
+ "Antigua and Barbuda high_income False False NaN 2018 \n",
+ "Dominica upper_middle_income False False NaN 2017 \n",
+ "Liechtenstein high_income False False NaN 2017 \n",
+ "Marshall Islands upper_middle_income False False NaN 2017 \n",
+ "Monaco high_income False False NaN 2017 \n",
+ "Nauru NaN False False NaN 2015 \n",
+ "Palau upper_middle_income False False NaN 2017 \n",
+ "Saint Kitts and Nevis high_income False False NaN 2017 \n",
+ "San Marino high_income False False NaN 2017 \n",
+ "Seychelles upper_middle_income False False NaN 2018 \n",
+ "Tuvalu upper_middle_income False False NaN 2017 \n",
+ "\n",
+ " area population alcohol_adults bmi_men bmi_women \\\n",
+ "name \n",
+ "Andorra 470.0 88910.0 10.17 27.63 26.43 \n",
+ "Antigua and Barbuda 440.0 91400.0 8.17 25.77 27.51 \n",
+ "Dominica 750.0 67700.0 8.68 24.57 28.78 \n",
+ "Liechtenstein 160.0 36870.0 NaN NaN NaN \n",
+ "Marshall Islands 180.0 56690.0 NaN 29.37 31.39 \n",
+ "Monaco 2.0 35460.0 NaN NaN NaN \n",
+ "Nauru 20.0 10440.0 4.81 33.90 35.02 \n",
+ "Palau 460.0 20920.0 9.86 30.38 31.85 \n",
+ "Saint Kitts and Nevis 260.0 54340.0 10.62 28.23 30.51 \n",
+ "San Marino 60.0 32160.0 NaN NaN NaN \n",
+ "Seychelles 460.0 87420.0 12.11 25.56 27.97 \n",
+ "Tuvalu 30.0 9888.0 2.14 NaN NaN \n",
+ "\n",
+ " car_deaths_per_100000_people calories_per_day \\\n",
+ "name \n",
+ "Andorra NaN NaN \n",
+ "Antigua and Barbuda NaN 2417.0 \n",
+ "Dominica NaN 2931.0 \n",
+ "Liechtenstein NaN NaN \n",
+ "Marshall Islands 1.800 NaN \n",
+ "Monaco NaN NaN \n",
+ "Nauru NaN NaN \n",
+ "Palau 10.730 NaN \n",
+ "Saint Kitts and Nevis NaN 2492.0 \n",
+ "San Marino 5.946 NaN \n",
+ "Seychelles 11.700 NaN \n",
+ "Tuvalu NaN NaN \n",
+ "\n",
+ " infant_mortality life_expectancy \\\n",
+ "name \n",
+ "Andorra 2.10 82.55 \n",
+ "Antigua and Barbuda 5.80 77.60 \n",
+ "Dominica 19.60 73.01 \n",
+ "Liechtenstein 1.76 NaN \n",
+ "Marshall Islands 29.60 65.00 \n",
+ "Monaco 2.80 NaN \n",
+ "Nauru 29.10 NaN \n",
+ "Palau 14.20 NaN \n",
+ "Saint Kitts and Nevis 8.40 NaN \n",
+ "San Marino 2.60 NaN \n",
+ "Seychelles 11.70 74.23 \n",
+ "Tuvalu 22.80 NaN \n",
+ "\n",
+ " life_expectancy_female life_expectancy_male \\\n",
+ "name \n",
+ "Andorra NaN NaN \n",
+ "Antigua and Barbuda 79.028 74.154 \n",
+ "Dominica NaN NaN \n",
+ "Liechtenstein NaN NaN \n",
+ "Marshall Islands NaN NaN \n",
+ "Monaco NaN NaN \n",
+ "Nauru NaN NaN \n",
+ "Palau NaN NaN \n",
+ "Saint Kitts and Nevis NaN NaN \n",
+ "San Marino NaN NaN \n",
+ "Seychelles 78.730 69.693 \n",
+ "Tuvalu NaN NaN \n",
+ "\n",
+ " un_accession \n",
+ "name \n",
+ "Andorra 1993-07-28 \n",
+ "Antigua and Barbuda 1981-11-11 \n",
+ "Dominica 1978-12-18 \n",
+ "Liechtenstein 1990-09-18 \n",
+ "Marshall Islands 1991-09-17 \n",
+ "Monaco 1993-05-28 \n",
+ "Nauru 1999-09-14 \n",
+ "Palau 1994-12-15 \n",
+ "Saint Kitts and Nevis 1983-09-23 \n",
+ "San Marino 1992-03-02 \n",
+ "Seychelles 1976-09-21 \n",
+ "Tuvalu 2000-09-05 "
+ ]
+ },
+ "execution_count": 45,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+    "# Tiny countries\n",
+ "countries[countries[\"population\"] < 100_000]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "...and, of course, you can combine conditions:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " iso | \n",
+ " world_6region | \n",
+ " world_4region | \n",
+ " income_groups | \n",
+ " is_eu | \n",
+ " is_oecd | \n",
+ " eu_accession | \n",
+ " year | \n",
+ " area | \n",
+ " population | \n",
+ " alcohol_adults | \n",
+ " bmi_men | \n",
+ " bmi_women | \n",
+ " car_deaths_per_100000_people | \n",
+ " calories_per_day | \n",
+ " infant_mortality | \n",
+ " life_expectancy | \n",
+ " life_expectancy_female | \n",
+ " life_expectancy_male | \n",
+ " un_accession | \n",
+ "
\n",
+ " \n",
+ " | name | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | Bulgaria | \n",
+ " BGR | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2007-01-01 | \n",
+ " 2018 | \n",
+ " 111000.0 | \n",
+ " 7349000.0 | \n",
+ " 11.40 | \n",
+ " 26.54 | \n",
+ " 25.52 | \n",
+ " 9.662 | \n",
+ " 2829.0 | \n",
+ " 9.3 | \n",
+ " 75.32 | \n",
+ " 78.485 | \n",
+ " 71.618 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Hungary | \n",
+ " HUN | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " True | \n",
+ " 2004-05-01 | \n",
+ " 2018 | \n",
+ " 93030.0 | \n",
+ " 9934000.0 | \n",
+ " 16.12 | \n",
+ " 27.12 | \n",
+ " 25.98 | \n",
+ " 5.234 | \n",
+ " 3037.0 | \n",
+ " 5.3 | \n",
+ " 75.90 | \n",
+ " 79.557 | \n",
+ " 72.610 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ " | Romania | \n",
+ " ROU | \n",
+ " europe_central_asia | \n",
+ " europe | \n",
+ " upper_middle_income | \n",
+ " True | \n",
+ " False | \n",
+ " 2007-01-01 | \n",
+ " 2018 | \n",
+ " 238390.0 | \n",
+ " 21340000.0 | \n",
+ " 16.15 | \n",
+ " 25.41 | \n",
+ " 25.22 | \n",
+ " 8.808 | \n",
+ " 3358.0 | \n",
+ " 9.7 | \n",
+ " 75.53 | \n",
+ " 79.158 | \n",
+ " 72.265 | \n",
+ " 1955-12-14 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " iso world_6region world_4region income_groups is_eu \\\n",
+ "name \n",
+ "Bulgaria BGR europe_central_asia europe upper_middle_income True \n",
+ "Hungary HUN europe_central_asia europe upper_middle_income True \n",
+ "Romania ROU europe_central_asia europe upper_middle_income True \n",
+ "\n",
+ " is_oecd eu_accession year area population alcohol_adults \\\n",
+ "name \n",
+ "Bulgaria False 2007-01-01 2018 111000.0 7349000.0 11.40 \n",
+ "Hungary True 2004-05-01 2018 93030.0 9934000.0 16.12 \n",
+ "Romania False 2007-01-01 2018 238390.0 21340000.0 16.15 \n",
+ "\n",
+ " bmi_men bmi_women car_deaths_per_100000_people calories_per_day \\\n",
+ "name \n",
+ "Bulgaria 26.54 25.52 9.662 2829.0 \n",
+ "Hungary 27.12 25.98 5.234 3037.0 \n",
+ "Romania 25.41 25.22 8.808 3358.0 \n",
+ "\n",
+ " infant_mortality life_expectancy life_expectancy_female \\\n",
+ "name \n",
+ "Bulgaria 9.3 75.32 78.485 \n",
+ "Hungary 5.3 75.90 79.557 \n",
+ "Romania 9.7 75.53 79.158 \n",
+ "\n",
+ " life_expectancy_male un_accession \n",
+ "name \n",
+ "Bulgaria 71.618 1955-12-14 \n",
+ "Hungary 72.610 1955-12-14 \n",
+ "Romania 72.265 1955-12-14 "
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Poorer EU countries\n",
+ "countries[countries[\"is_eu\"] & (countries[\"income_groups\"] != \"high_income\")]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " iso world_6region world_4region income_groups is_eu \\\n",
+ "name \n",
+ "Estonia EST europe_central_asia europe high_income True \n",
+ "Hungary HUN europe_central_asia europe upper_middle_income True \n",
+ "Latvia LVA europe_central_asia europe high_income True \n",
+ "Lithuania LTU europe_central_asia europe high_income True \n",
+ "Mexico MEX america americas upper_middle_income False \n",
+ "Slovakia SVK europe_central_asia europe high_income True \n",
+ "\n",
+ " is_oecd eu_accession year area population alcohol_adults \\\n",
+ "name \n",
+ "Estonia True 2004-05-01 2018 45230.0 1339000.0 17.24 \n",
+ "Hungary True 2004-05-01 2018 93030.0 9934000.0 16.12 \n",
+ "Latvia True 2004-05-01 2018 64490.0 2226000.0 13.45 \n",
+ "Lithuania True 2004-05-01 2018 65286.0 3278000.0 16.30 \n",
+ "Mexico True NaN 2018 1964380.0 117500000.0 8.55 \n",
+ "Slovakia True 2004-05-01 2018 49035.0 5489000.0 13.31 \n",
+ "\n",
+ " bmi_men bmi_women car_deaths_per_100000_people calories_per_day \\\n",
+ "name \n",
+ "Estonia 26.26 25.19 5.896 3253.0 \n",
+ "Hungary 27.12 25.98 5.234 3037.0 \n",
+ "Latvia 26.46 25.62 8.275 3174.0 \n",
+ "Lithuania 26.86 26.01 8.090 3417.0 \n",
+ "Mexico 27.42 28.74 9.468 3072.0 \n",
+ "Slovakia 26.93 26.32 6.746 2944.0 \n",
+ "\n",
+ " infant_mortality life_expectancy life_expectancy_female \\\n",
+ "name \n",
+ "Estonia 2.3 77.66 82.111 \n",
+ "Hungary 5.3 75.90 79.557 \n",
+ "Latvia 6.9 75.13 79.498 \n",
+ "Lithuania 3.3 75.31 80.060 \n",
+ "Mexico 11.3 76.78 79.880 \n",
+ "Slovakia 5.8 77.16 80.511 \n",
+ "\n",
+ " life_expectancy_male un_accession \n",
+ "name \n",
+ "Estonia 73.201 1991-09-17 \n",
+ "Hungary 72.610 1955-12-14 \n",
+ "Latvia 69.882 1991-09-17 \n",
+ "Lithuania 69.554 1991-09-17 \n",
+ "Mexico 75.120 1945-11-07 \n",
+ "Slovakia 73.589 1993-01-19 "
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Which OECD countries have a life expectancy below 78 years?\n",
+ "countries[countries[\"is_oecd\"] & (countries[\"life_expectancy\"] < 78)]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because this way of filtering is somewhat clumsy, there is also the `query` method, which selects rows based on a string expression, typically a comparison between column names and numeric values (this works in many cases, though not in all)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " iso world_6region world_4region income_groups \\\n",
+ "name \n",
+ "Bangladesh BGD south_asia asia low_income \n",
+ "Brazil BRA america americas upper_middle_income \n",
+ "China CHN east_asia_pacific asia upper_middle_income \n",
+ "India IND south_asia asia lower_middle_income \n",
+ "Indonesia IDN east_asia_pacific asia lower_middle_income \n",
+ "Japan JPN east_asia_pacific asia high_income \n",
+ "Mexico MEX america americas upper_middle_income \n",
+ "Nigeria NGA sub_saharan_africa africa lower_middle_income \n",
+ "Pakistan PAK south_asia asia lower_middle_income \n",
+ "Russia RUS europe_central_asia europe high_income \n",
+ "United States USA america americas high_income \n",
+ "\n",
+ " is_eu is_oecd eu_accession year area population \\\n",
+ "name \n",
+ "Bangladesh False False NaN 2018 147630.0 1.544000e+08 \n",
+ "Brazil False False NaN 2018 8515770.0 2.001000e+08 \n",
+ "China False False NaN 2018 9562911.0 1.359000e+09 \n",
+ "India False False NaN 2018 3287259.0 1.275000e+09 \n",
+ "Indonesia False False NaN 2018 1910931.0 2.472000e+08 \n",
+ "Japan False True NaN 2018 377962.0 1.263000e+08 \n",
+ "Mexico False True NaN 2018 1964380.0 1.175000e+08 \n",
+ "Nigeria False False NaN 2018 923770.0 1.709000e+08 \n",
+ "Pakistan False False NaN 2018 796100.0 1.832000e+08 \n",
+ "Russia False False NaN 2018 17098250.0 1.426000e+08 \n",
+ "United States False True NaN 2018 9831510.0 3.185000e+08 \n",
+ "\n",
+ " alcohol_adults bmi_men bmi_women \\\n",
+ "name \n",
+ "Bangladesh 0.17 20.40 20.55 \n",
+ "Brazil 10.08 25.79 25.99 \n",
+ "China 5.56 22.92 22.91 \n",
+ "India 2.69 20.96 21.31 \n",
+ "Indonesia 0.56 21.86 22.99 \n",
+ "Japan 7.79 23.50 21.87 \n",
+ "Mexico 8.55 27.42 28.74 \n",
+ "Nigeria 12.72 23.03 23.67 \n",
+ "Pakistan 0.05 22.30 23.45 \n",
+ "Russia 16.23 26.01 27.21 \n",
+ "United States 9.70 28.46 28.34 \n",
+ "\n",
+ " car_deaths_per_100000_people calories_per_day \\\n",
+ "name \n",
+ "Bangladesh 4.401 2450.0 \n",
+ "Brazil 1.872 3263.0 \n",
+ "China 3.590 3108.0 \n",
+ "India 3.034 2459.0 \n",
+ "Indonesia 1.232 2777.0 \n",
+ "Japan 1.381 2726.0 \n",
+ "Mexico 9.468 3072.0 \n",
+ "Nigeria NaN 2700.0 \n",
+ "Pakistan NaN 2440.0 \n",
+ "Russia 14.380 3361.0 \n",
+ "United States 9.523 3682.0 \n",
+ "\n",
+ " infant_mortality life_expectancy life_expectancy_female \\\n",
+ "name \n",
+ "Bangladesh 30.7 73.41 74.937 \n",
+ "Brazil 14.6 75.70 79.527 \n",
+ "China 9.2 76.92 78.163 \n",
+ "India 37.9 69.10 70.678 \n",
+ "Indonesia 22.8 72.03 71.742 \n",
+ "Japan 2.0 84.17 87.244 \n",
+ "Mexico 11.3 76.78 79.880 \n",
+ "Nigeria 69.4 66.14 55.158 \n",
+ "Pakistan 65.8 67.96 67.869 \n",
+ "Russia 8.2 71.07 76.882 \n",
+ "United States 5.6 79.14 81.942 \n",
+ "\n",
+ " life_expectancy_male un_accession \n",
+ "name \n",
+ "Bangladesh 71.484 1974-09-17 \n",
+ "Brazil 72.340 1945-10-24 \n",
+ "China 75.096 1945-10-24 \n",
+ "India 67.538 1945-10-30 \n",
+ "Indonesia 67.426 1950-09-28 \n",
+ "Japan 80.803 1956-12-18 \n",
+ "Mexico 75.120 1945-11-07 \n",
+ "Nigeria 53.512 1960-10-07 \n",
+ "Pakistan 65.750 1947-09-30 \n",
+ "Russia 65.771 1945-10-24 \n",
+ "United States 77.429 1945-10-24 "
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Really large countries (population over 100 million)\n",
+ "countries.query(\"population > 100_000_000\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " iso world_6region world_4region income_groups is_eu \\\n",
+ "name \n",
+ "Austria AUT europe_central_asia europe high_income True \n",
+ "Belgium BEL europe_central_asia europe high_income True \n",
+ "Ireland IRL europe_central_asia europe high_income True \n",
+ "Italy ITA europe_central_asia europe high_income True \n",
+ "Luxembourg LUX europe_central_asia europe high_income True \n",
+ "\n",
+ " is_oecd eu_accession year area population alcohol_adults \\\n",
+ "name \n",
+ "Austria True 1995-01-01 2018 83879.0 8441000.0 12.40 \n",
+ "Belgium True 1952-07-23 2018 30530.0 10820000.0 10.41 \n",
+ "Ireland True 1973-01-01 2018 70280.0 4631000.0 14.92 \n",
+ "Italy True 1952-07-23 2018 301340.0 61090000.0 9.72 \n",
+ "Luxembourg True 1952-07-23 2018 2590.0 530000.0 12.84 \n",
+ "\n",
+ " bmi_men bmi_women car_deaths_per_100000_people \\\n",
+ "name \n",
+ "Austria 26.47 25.09 3.541 \n",
+ "Belgium 26.76 25.14 5.427 \n",
+ "Ireland 27.65 26.62 3.768 \n",
+ "Italy 26.48 24.79 3.778 \n",
+ "Luxembourg 27.43 26.09 5.971 \n",
+ "\n",
+ " calories_per_day infant_mortality life_expectancy \\\n",
+ "name \n",
+ "Austria 3768.0 2.9 81.84 \n",
+ "Belgium 3733.0 3.3 81.23 \n",
+ "Ireland 3600.0 3.0 81.49 \n",
+ "Italy 3579.0 2.9 82.62 \n",
+ "Luxembourg 3539.0 1.5 82.39 \n",
+ "\n",
+ " life_expectancy_female life_expectancy_male un_accession \n",
+ "name \n",
+ "Austria 84.249 79.585 1955-12-14 \n",
+ "Belgium 83.751 79.131 1945-12-27 \n",
+ "Ireland 83.737 79.885 1955-12-14 \n",
+ "Italy 85.435 81.146 1955-12-14 \n",
+ "Luxembourg 84.227 79.981 1945-10-24 "
+ ]
+ },
+ "execution_count": 49,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# In which EU countries do people eat a lot?\n",
+ "countries.query(\"is_eu & (calories_per_day > 3500)\")"
+ ]
+ },
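+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the relationship between the two filtering styles concrete, here is a minimal sketch on a tiny made-up table (the name `toy` and its values are illustrative assumptions, not part of the `countries` data). A boolean mask and the corresponding `query` string should select the same rows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical toy table, used only for illustration\n",
+ "toy = pd.DataFrame({\"is_eu\": [True, True, False], \"calories_per_day\": [3600.0, 3100.0, 3700.0]})\n",
+ "\n",
+ "# The same condition written as a boolean mask and as a query string\n",
+ "by_mask = toy[toy[\"is_eu\"] & (toy[\"calories_per_day\"] > 3500)]\n",
+ "by_query = toy.query(\"is_eu & (calories_per_day > 3500)\")\n",
+ "\n",
+ "# Check whether both approaches returned identical tables\n",
+ "by_mask.equals(by_query)"
+ ]
+ },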
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task**: Which is the only African country that belongs to the high-income group?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task**: In which countries do people drink really heavily? (Use the criterion mentioned above or any other one.)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Sorting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the introductory `pandas` lesson we already showed how to sort rows by index using the `sort_index` method. Since `countries` is already sorted, we will try it once more on the planets:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " symbol obezna_poloosa obezna_doba mesice je_obr\n",
+ "jmeno \n",
+ "Jupiter ♃ 5.20 11.86 79 True\n",
+ "Mars ♂ 1.52 1.88 2 False\n",
+ "Merkur ☿ 0.39 0.24 0 False\n",
+ "Neptun ♆ 30.06 164.80 14 True\n",
+ "Saturn ♄ 9.54 29.46 82 True\n",
+ "Uran ♅ 19.22 84.01 27 True\n",
+ "Venuše ♀ 0.72 0.62 0 False\n",
+ "Země ⊕ 1.00 1.00 1 False"
+ ]
+ },
+ "execution_count": 50,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "planety.sort_index()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To sort the values in a `Series`, use the `sort_values` method:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Tuvalu 9888.0\n",
+ "Nauru 10440.0\n",
+ "Palau 20920.0\n",
+ "San Marino 32160.0\n",
+ "Monaco 35460.0\n",
+ "Liechtenstein 36870.0\n",
+ "Saint Kitts and Nevis 54340.0\n",
+ "Marshall Islands 56690.0\n",
+ "Dominica 67700.0\n",
+ "Seychelles 87420.0\n",
+ "Name: population, dtype: float64"
+ ]
+ },
+ "execution_count": 51,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# 10 countries with the smallest population\n",
+ "countries[\"population\"].sort_values().head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The optional `ascending` argument determines the sort direction. Its default value is `True`; setting it to `False` sorts from largest to smallest:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 52,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name\n",
+ "Russia 17098250.0\n",
+ "Canada 9984670.0\n",
+ "United States 9831510.0\n",
+ "China 9562911.0\n",
+ "Brazil 8515770.0\n",
+ "Australia 7741220.0\n",
+ "India 3287259.0\n",
+ "Argentina 2780400.0\n",
+ "Kazakhstan 2724902.0\n",
+ "Algeria 2381740.0\n",
+ "Name: area, dtype: float64"
+ ]
+ },
+ "execution_count": 52,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# The 10 largest countries by area\n",
+ "countries[\"area\"].sort_values(ascending=False).head(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For a table (`DataFrame`), the first argument must be the name of the column (or a list of columns) to sort by:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " iso world_6region world_4region income_groups \\\n",
+ "name \n",
+ "Moldova MDA europe_central_asia europe lower_middle_income \n",
+ "South Korea KOR east_asia_pacific asia high_income \n",
+ "Belarus BLR europe_central_asia europe upper_middle_income \n",
+ "North Korea PRK east_asia_pacific asia low_income \n",
+ "Ukraine UKR europe_central_asia europe lower_middle_income \n",
+ "Estonia EST europe_central_asia europe high_income \n",
+ "Czechia CZE europe_central_asia europe high_income \n",
+ "Uganda UGA sub_saharan_africa africa low_income \n",
+ "Lithuania LTU europe_central_asia europe high_income \n",
+ "Russia RUS europe_central_asia europe high_income \n",
+ "\n",
+ " is_eu is_oecd eu_accession year area population \\\n",
+ "name \n",
+ "Moldova False False NaN 2018 33850.0 3496000.0 \n",
+ "South Korea False True NaN 2018 100280.0 48770000.0 \n",
+ "Belarus False False NaN 2018 207600.0 9498000.0 \n",
+ "North Korea False False NaN 2018 120540.0 24650000.0 \n",
+ "Ukraine False False NaN 2018 603550.0 44700000.0 \n",
+ "Estonia True True 2004-05-01 2018 45230.0 1339000.0 \n",
+ "Czechia True True 2004-05-01 2018 78870.0 10590000.0 \n",
+ "Uganda False False NaN 2018 241550.0 36760000.0 \n",
+ "Lithuania True True 2004-05-01 2018 65286.0 3278000.0 \n",
+ "Russia False False NaN 2018 17098250.0 142600000.0 \n",
+ "\n",
+ " alcohol_adults bmi_men bmi_women car_deaths_per_100000_people \\\n",
+ "name \n",
+ "Moldova 23.01 24.24 27.06 5.529 \n",
+ "South Korea 19.15 23.99 23.33 4.319 \n",
+ "Belarus 18.85 26.16 26.64 8.454 \n",
+ "North Korea 18.28 22.02 21.25 NaN \n",
+ "Ukraine 17.47 25.42 26.23 8.771 \n",
+ "Estonia 17.24 26.26 25.19 5.896 \n",
+ "Czechia 16.47 27.91 26.51 5.720 \n",
+ "Uganda 16.40 22.36 22.48 13.690 \n",
+ "Lithuania 16.30 26.86 26.01 8.090 \n",
+ "Russia 16.23 26.01 27.21 14.380 \n",
+ "\n",
+ " calories_per_day infant_mortality life_expectancy \\\n",
+ "name \n",
+ "Moldova 2714.0 13.6 72.41 \n",
+ "South Korea 3334.0 2.9 81.35 \n",
+ "Belarus 3250.0 3.4 73.76 \n",
+ "North Korea 2094.0 19.7 71.13 \n",
+ "Ukraine 3138.0 7.7 72.29 \n",
+ "Estonia 3253.0 2.3 77.66 \n",
+ "Czechia 3256.0 2.8 79.37 \n",
+ "Uganda 2130.0 37.7 62.86 \n",
+ "Lithuania 3417.0 3.3 75.31 \n",
+ "Russia 3361.0 8.2 71.07 \n",
+ "\n",
+ " life_expectancy_female life_expectancy_male un_accession \n",
+ "name \n",
+ "Moldova 76.090 67.544 1992-03-02 \n",
+ "South Korea 85.467 79.456 1991-09-17 \n",
+ "Belarus 78.583 67.693 1945-10-24 \n",
+ "North Korea 75.512 68.450 1991-09-17 \n",
+ "Ukraine 77.067 67.246 1945-10-24 \n",
+ "Estonia 82.111 73.201 1991-09-17 \n",
+ "Czechia 81.858 76.148 1993-01-19 \n",
+ "Uganda 62.667 58.252 1962-10-25 \n",
+ "Lithuania 80.060 69.554 1991-09-17 \n",
+ "Russia 76.882 65.771 1945-10-24 "
+ ]
+ },
+ "execution_count": 53,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+    "# The 10 countries with the highest alcohol consumption per capita\n",
+ "countries.sort_values(\"alcohol_adults\", ascending=False).head(10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 54,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " eu_accession un_accession\n",
+ "name \n",
+ "France 1952-07-23 1945-10-24\n",
+ "Luxembourg 1952-07-23 1945-10-24\n",
+ "Netherlands 1952-07-23 1945-12-10\n",
+ "Belgium 1952-07-23 1945-12-27\n",
+ "Italy 1952-07-23 1955-12-14\n",
+ "Germany 1952-07-23 1973-09-18\n",
+ "Denmark 1973-01-01 1945-10-24\n",
+ "United Kingdom 1973-01-01 1945-10-24\n",
+ "Ireland 1973-01-01 1955-12-14\n",
+ "Greece 1981-01-01 1945-10-25\n",
+ "Portugal 1986-01-01 1955-12-14\n",
+ "Spain 1986-01-01 1955-12-14\n",
+ "Sweden 1995-01-01 1946-11-19\n",
+ "Austria 1995-01-01 1955-12-14\n",
+ "Finland 1995-01-01 1955-12-14\n",
+ "Poland 2004-05-01 1945-10-24\n",
+ "Hungary 2004-05-01 1955-12-14\n",
+ "Cyprus 2004-05-01 1960-09-20\n",
+ "Malta 2004-05-01 1964-12-01\n",
+ "Estonia 2004-05-01 1991-09-17\n",
+ "Latvia 2004-05-01 1991-09-17\n",
+ "Lithuania 2004-05-01 1991-09-17\n",
+ "Slovenia 2004-05-01 1992-05-22\n",
+ "Czechia 2004-05-01 1993-01-19\n",
+ "Slovakia 2004-05-01 1993-01-19\n",
+ "Bulgaria 2007-01-01 1955-12-14\n",
+ "Romania 2007-01-01 1955-12-14\n",
+ "Croatia 2013-01-01 1992-05-22"
+ ]
+ },
+ "execution_count": 54,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(\n",
+    "    # Consider only EU members\n",
+ " countries[countries[\"is_eu\"]]\n",
+ " \n",
+    "    # Sort first by EU accession date, then by UN accession date\n",
+ " .sort_values([\"eu_accession\", \"un_accession\"])\n",
+ "\n",
+    "    # Show only those two columns\n",
+ " [[\"eu_accession\", \"un_accession\"]]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "💡 By the way, you can sort not only rows but also columns. The following example orders the columns by their names (that is, by the column index). As in many similar cases, the `axis` argument does the job."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " alcohol_adults area bmi_men bmi_women calories_per_day \\\n",
+ "name \n",
+ "Afghanistan 0.03 652860.0 20.62 21.07 2090.0 \n",
+ "Albania 7.29 28750.0 26.45 25.66 3193.0 \n",
+ "Algeria 0.69 2381740.0 24.60 26.37 3296.0 \n",
+ "Andorra 10.17 470.0 27.63 26.43 NaN \n",
+ "Angola 5.57 1246700.0 22.25 23.48 2473.0 \n",
+ "... ... ... ... ... ... \n",
+ "Venezuela 7.60 912050.0 27.45 28.13 2631.0 \n",
+ "Vietnam 3.91 330967.0 20.92 21.07 2745.0 \n",
+ "Yemen 0.20 527970.0 24.44 26.11 2223.0 \n",
+ "Zambia 3.56 752610.0 20.68 23.05 1930.0 \n",
+ "Zimbabwe 4.96 390760.0 22.03 24.65 2110.0 \n",
+ "\n",
+ " car_deaths_per_100000_people eu_accession income_groups \\\n",
+ "name \n",
+ "Afghanistan NaN NaN low_income \n",
+ "Albania 5.978 NaN upper_middle_income \n",
+ "Algeria NaN NaN upper_middle_income \n",
+ "Andorra NaN NaN high_income \n",
+ "Angola NaN NaN upper_middle_income \n",
+ "... ... ... ... \n",
+ "Venezuela 7.332 NaN upper_middle_income \n",
+ "Vietnam NaN NaN lower_middle_income \n",
+ "Yemen NaN NaN lower_middle_income \n",
+ "Zambia 11.260 NaN lower_middle_income \n",
+ "Zimbabwe 20.850 NaN low_income \n",
+ "\n",
+ " infant_mortality is_eu is_oecd iso life_expectancy \\\n",
+ "name \n",
+ "Afghanistan 66.3 False False AFG 58.69 \n",
+ "Albania 12.5 False False ALB 78.01 \n",
+ "Algeria 21.9 False False DZA 77.86 \n",
+ "Andorra 2.1 False False AND 82.55 \n",
+ "Angola 96.0 False False AGO 65.19 \n",
+ "... ... ... ... ... ... \n",
+ "Venezuela 12.9 False False VEN 75.91 \n",
+ "Vietnam 17.3 False False VNM 74.88 \n",
+ "Yemen 33.8 False False YEM 67.14 \n",
+ "Zambia 43.3 False False ZMB 59.45 \n",
+ "Zimbabwe 46.6 False False ZWE 60.18 \n",
+ "\n",
+ " life_expectancy_female life_expectancy_male population \\\n",
+ "name \n",
+ "Afghanistan 65.812 63.101 34500000.0 \n",
+ "Albania 80.737 76.693 3238000.0 \n",
+ "Algeria 77.784 75.279 36980000.0 \n",
+ "Andorra NaN NaN 88910.0 \n",
+ "Angola 64.939 59.213 20710000.0 \n",
+ "... ... ... ... \n",
+ "Venezuela 79.079 70.950 30340000.0 \n",
+ "Vietnam 81.203 72.003 90660000.0 \n",
+ "Yemen 66.871 63.875 26360000.0 \n",
+ "Zambia 65.362 59.845 14310000.0 \n",
+ "Zimbabwe 63.944 60.120 13330000.0 \n",
+ "\n",
+ " un_accession world_4region world_6region year \n",
+ "name \n",
+ "Afghanistan 1946-11-19 asia south_asia 2018 \n",
+ "Albania 1955-12-14 europe europe_central_asia 2018 \n",
+ "Algeria 1962-10-08 africa middle_east_north_africa 2018 \n",
+ "Andorra 1993-07-28 europe europe_central_asia 2017 \n",
+ "Angola 1976-12-01 africa sub_saharan_africa 2018 \n",
+ "... ... ... ... ... \n",
+ "Venezuela 1945-11-15 americas america 2018 \n",
+ "Vietnam 1977-09-20 asia east_asia_pacific 2018 \n",
+ "Yemen 1947-09-30 asia middle_east_north_africa 2018 \n",
+ "Zambia 1964-12-01 africa sub_saharan_africa 2018 \n",
+ "Zimbabwe 1980-08-25 africa sub_saharan_africa 2018 \n",
+ "\n",
+ "[193 rows x 20 columns]"
+ ]
+ },
+ "execution_count": 55,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "countries.sort_index(axis=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task:** Sort the countries of the world by population density."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task:** Which countries have a problem with excess weight (the average BMI of men and women is above 25)?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task:** In which 20 countries do the most people die in car accidents in absolute terms?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "## Save your results!\n",
+    "\n",
+    "And with that we are slowly wrapping up. But we have done a (nearly) non-trivial amount of work, and it would be lost by next time. Fortunately, writing a `DataFrame` to an external file in one of the common formats is not complicated at all. The `pd.read_XXX` family of functions has counterparts named `DataFrame.to_XXX`. They differ in their parameters, but basic usage is very simple:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "planety.to_csv(\"planety.csv\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "planety.to_excel(\"planety.xlsx\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Another option is to produce an HTML table (which can also be styled in various ways, but let's leave that for another time or for homework; see the [\"Styling\" documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html)). Unfortunately, the default `to_html` cannot cope with \"non-Western\" symbols (such as ☿), so in our particular case we have to pass it a correctly opened file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# planety.to_html(\"planety.html\")   # This does not work :-(\n",
+ "\n",
+ "with open(\"planety.html\", \"w\", encoding=\"utf-8\") as out:\n",
+ " planety.to_html(out)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "countries.to_html(\"countries.html\")   # No exotic symbols :-)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task**: Have a look at what you find in the output files."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task**: Look at the list of available output formats and try writing the planets or the countries to one of them: https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#serialization-io-conversion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "And that really is everything. 👋"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/lessons/pydata/visualization_basics/index.ipynb b/lessons/pydata/visualization_basics/index.ipynb
new file mode 100644
index 0000000000..5b89b04aa3
--- /dev/null
+++ b/lessons/pydata/visualization_basics/index.ipynb
@@ -0,0 +1,612 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "# Visualization basics - in pandas and for pandas\n",
+    "\n",
+    "A single picture (or chart) can sometimes say more than a thousand words. In (exploratory) data analysis this is doubly true. (And just as an article of a thousand words can be manipulative, a \"suitably\" prepared chart can be even more so.)\n",
+    "\n",
+    "In this lesson we will show how to draw some basic chart types (bar, line and scatter) from data that you already know how to load and do arithmetic with."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "## The diverse world of Python visualization libraries\n",
+    "\n",
+    "For everyday processing of tabular data there is a consensus: when exploring small to medium-sized data of a not too exotic kind, analysts almost always reach for `pandas`. For data visualization, by contrast, there is a vast number of libraries, each with its own strengths and weaknesses. In the EDA lessons we will mention these three (and focus mainly on how to use them together with pandas):\n",
+    "\n",
+    "- `matplotlib` - Probably the most widespread and in many respects the most flexible library. It is the default choice if you need good-looking static charts that work almost everywhere. The price of its considerable flexibility is function and argument names that are sometimes not entirely intuitive. Pandas uses it internally (so with a bit of effort you can pretend you do not know it exists). See https://matplotlib.org/.\n",
+    "\n",
+    "- `seaborn` - This library aims to help mainly with statistical charts. It builds on matplotlib but gives it a \"human\" face. We will use it to visualize more complex relationships among several variables. See https://seaborn.pydata.org/.\n",
+    "\n",
+    "- `plotly` (and especially its subset `plotly.express`) - You will reach for this library mainly when you want your visualization to be interactive. Interactivity is of course hard to print on paper, but especially in a Jupyter notebook it lets you explore everything much faster. See https://plot.ly/python/.\n",
+    "\n",
+    "For a more detailed explanation, we recommend the (by now somewhat older) talk by J. VanderPlas, The Python Visualization Landscape (https://www.youtube.com/watch?v=FytuB8nFHPQ), which summarizes the main features of the individual libraries.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%matplotlib inline\n",
+ "\n",
+    "# What is this supposed to mean!?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "If you have so far pretended not to know that matplotlib exists, you no longer can :-). This mysterious line (in fact an \"IPython magic command\") says that all charts are rendered automatically right in the notebook (this is not a given, and often we do not even want it, for example when we want to save charts straight to a file or show them interactively outside the notebook).\n",
+    "\n",
+    "For more, see https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-matplotlib.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "## Preparation - the data source\n",
+    "\n",
+    "First we load the data on the countries of the world, which we already know. We also add a table with the development of some indicators over time for the Czech Republic (and take a look at it right away)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+    "# TODO: fix once it is clear how this will be set up\n",
+    "\n",
+    "# World data\n",
+ "url = \"https://raw.githubusercontent.com/janpipek/data-pro-pyladies/master/data/countries.csv\"\n",
+ "countries = pd.read_csv(url).set_index(\"name\")\n",
+ "\n",
+    "# Czech data\n",
+ "url = \"https://raw.githubusercontent.com/janpipek/data-pro-pyladies/master/data/cze.csv\"\n",
+ "czech = pd.read_csv(url)\n",
+ "czech"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "## Bar chart (bar plot)\n",
+    "\n",
+    "The simplest chart you can create is the **bar chart**. You draw bars side by side, each as tall as the value of the property you are interested in. It shows the values of a single variable without processing them statistically in any way or comparing them with another variable.\n",
+    "\n",
+    "In `pandas`, plotting functions are reached through the so-called **accessor** `.plot`. It is a hybrid object: you can call it like a method (`Series.plot()` uses the default chart type), or you can use another dot to reach its own methods, which draw the various chart types. For \"pedagogical reasons\" (which are often incomprehensible) we want to start with the bar chart, which is not the default, so we call `Series.plot.bar()`."
+ ]
+ },
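+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal sketch of what \"hybrid\" means here, on a tiny made-up `Series` (the name `sample` and its values are illustrative assumptions, not part of the lesson data). Assuming a reasonably recent `pandas`, the two calls below should be equivalent ways of asking for a bar chart:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Made-up values, for illustration only\n",
+    "sample = pd.Series([3, 1, 2], index=[\"a\", \"b\", \"c\"])\n",
+    "\n",
+    "sample.plot(kind=\"bar\")  # call the accessor itself and name the chart type\n",
+    "sample.plot.bar();       # or call the accessor's dedicated method"
+   ]
+  },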
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "countries[\"life_expectancy\"].plot.bar()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Phew, that is not exactly readable. Let's try the same thing for the countries of the European Union only (of which there were still 28 both when this material was written and when the course started)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "eu_countries = countries.query(\"is_eu\")      # Filtering -> the result is again a DataFrame\n",
+ "eu_countries[\"life_expectancy\"].plot.bar();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "That is not easy to compare: do people live longer in the United Kingdom or in Germany? What if we sorted the values first (a refresher from last time) and only then plotted them?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "eu_countries[\"life_expectancy\"].sort_values(ascending=False).plot.bar();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "And we actually have to tilt our heads to find our own home country (or someone else's). We can try a horizontal bar chart, `.plot.barh`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "eu_countries[\"life_expectancy\"].sort_values(ascending=False).plot.barh();"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "The plotting functions offer many parameters that are not very well documented and are tied quite closely to how the `matplotlib` library works. We will introduce them gradually as they come in handy. Our chart could use a bit more height. Also, the values do not differ much from one another, so setting a custom range on the x axis would help emphasize the differences. On top of that, we add a little formatting.\n",
+    "\n",
+    "- `figsize` specifies the size of the chart as a tuple of sizes in inches, in the order (width, height). Simply experiment in the notebook to find the ideal value.\n",
+    "- `xlim` specifies the range of values on the x axis as a tuple (minimum, maximum)\n",
+    "- `color` specifies the color: it can be a name or a hexadecimal RGB code\n",
+    "- `edgecolor` says which color the bars should be outlined with\n",
+    "- `title` sets the title of the whole chart"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "eu_countries[\"life_expectancy\"].sort_values().plot.barh(\n",
+ " figsize=(6, 8),\n",
+ " xlim=(75, 85),\n",
+ " color=\"yellow\",\n",
+    "    edgecolor=\"#888888\",      # medium grey\n",
+    "    title=\"Life expectancy (years)\"\n",
+ ");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "💡 Starting bar charts (and many other charts) somewhere other than zero helps you notice even tiny differences, so in the exploratory phase it is certainly a good idea. When presenting results, however, exaggerated differences can mislead the audience and give the impression that an effect is much stronger than it really is. The less intuitive the presented data, the stronger the manipulative effect. In this case hardly anyone would believe that people in Spain live sixty times longer than in Latvia, because that contradicts common expectations, yet at first glance the situation still looks very dramatic (we leave it to you to judge whether the difference between 75 and 83, i.e. roughly 10 %, is huge or not). Journalists mislead in this way fairly often, whether deliberately or by mistake."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "We can, however, very easily show several quantities in one chart if we create it from a `DataFrame` rather than from a `Series`. Just supply several columns instead of one (for example by selecting them from a `DataFrame`), and for every row several bars are drawn one below another.\n",
+    "\n",
+    "In our case we will look at how long men and women live, separately. We choose gender-stereotypical colors (it is sometimes easier to read), but of course you can change them as you like."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "eu_countries.sort_values(\"life_expectancy\")[[\"life_expectancy_male\", \"life_expectancy_female\"]].plot.barh(\n",
+ " figsize=(8, 10),\n",
+    "    xlim=(68, 88),                      # axis range\n",
+    "    color=[\"blue\", \"red\"],              # two different colors for the two columns\n",
+    "    edgecolor=\"#888888\",                # medium grey\n",
+    "    title=\"Life expectancy (years)\"\n",
+ ");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Task:** Try drawing a bar chart of some of the other characteristics (\"columns\") of the countries (either the European ones, or filtered by some region) and think about how informative such a chart is (sometimes hardly at all)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "## Scatter plot\n",
+    "\n",
+    "A scatter plot is the simplest way to compare two different quantities. In a coordinate system of the kind used in mathematics, each row corresponds to one point (drawn as a symbol, most often a circle), and the values of two columns encode its `x` and `y` coordinates. This is reflected in how a scatter plot is created in `pandas`.\n",
+    "\n",
+    "We call the `plot.scatter` method of our table (note: a scatter plot cannot easily be created from a `Series`) and pass it, as the `x` and `y` arguments, the names of the columns to use for the coordinates:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# Relationship between drinking and life expectancy\n",
+ "countries.plot.scatter(\n",
+ " x=\"life_expectancy\",\n",
+ " y=\"alcohol_adults\");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "💡 We will talk about causality, correlation and relationships between quantities another time, but can you also not shake off the impression that the more people drink somewhere, the longer they live there?\n",
+    "\n",
+    "Even without mathematical rigor, though, we can probably tell where the catch is. Let's color the individual regions of the world with different (stereotypical?) colors. Along the way we learn the handy `map` function, which replaces the values in a `Series` according to a from->to dictionary (and returns a new `Series` instance). The `world_4region` column contains exactly 4 different regions (\"continents\"), so a very simple dictionary will do.\n",
+    "\n",
+    "We will show several more arguments (which are really arguments of the `matplotlib` library, so we cannot simply pass a column name :-( ):\n",
+    "- `s` is the square of the symbol size in points, i.e. its area (it can be a single value or a column/array of values)\n",
+    "- `marker` is the shape of the symbol, usually given by a single letter, see the [list of options](https://matplotlib.org/3.1.1/api/markers_api.html)\n",
+    "- `alpha` is the opacity of the symbol (0 = fully transparent and invisible, 1 = opaque, intense, hiding everything \"behind\" it). It is useful when the chart has many symbols and we want to let them overlap."
+ ]
+ },
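+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before drawing the chart, here is a minimal sketch of how `map` behaves, using a tiny made-up `Series` (`regions` and `colors` are illustrative names, not lesson data). Values that have no key in the dictionary should come out as `NaN`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Made-up values, for illustration only\n",
+    "regions = pd.Series([\"europe\", \"asia\", \"europe\", \"oceania\"])\n",
+    "colors = {\"europe\": \"blue\", \"asia\": \"yellow\"}\n",
+    "\n",
+    "# Each value is looked up in the dictionary; \"oceania\" has no key, so it becomes NaN\n",
+    "regions.map(colors)"
+   ]
+  },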
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# Relationship between drinking and life expectancy\n",
+ "import numpy as np\n",
+ "\n",
+ "barvy_kontinentu = {\n",
+ " \"europe\": \"blue\",\n",
+ " \"asia\": \"yellow\",\n",
+ " \"africa\": \"black\",\n",
+ " \"americas\": \"red\"\n",
+ "}\n",
+ "barva = countries[\"world_4region\"].map(barvy_kontinentu) \n",
+    "# barva now holds a column full of colors\n",
+ "\n",
+ "countries.plot.scatter(\n",
+ " figsize=(7, 7),\n",
+ " x=\"life_expectancy\",\n",
+ " y=\"alcohol_adults\",\n",
+    "    marker=\"h\",                        # Symbol shape: (h)exagon\n",
+    "    color=barva,                       # Unfortunately a column name alone does not work; we must pass the whole \"array\" of values\n",
+    "    s=countries[\"population\"] / 1e6,    # Symbol size (squared) scaled by population\n",
+    "    edgecolor=\"black\",                 # Edge color\n",
+    "    alpha=0.5                          # Semi-transparent symbols\n",
+ ");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "So it actually looks as though people in Asia generally drink little and in the Americas moderately, while in Africa people live shorter lives; at first glance, though, we see no trend within these groups of countries. The only continent that stands out is Europe, where people both drink a lot and live long, but both are most likely consequences of a modern way of life. And on closer inspection it seems that, within Europe, more drinking goes with a shorter life."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "It often happens that values are hard to compare on a single scale. For example, in terms of area or population the world contains both miniature and gigantic countries, with differences of several orders of magnitude:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "countries.plot.scatter(\n",
+ " x=\"area\",\n",
+ " y=\"population\",\n",
+ " figsize=(6,6)\n",
+ ") \n",
+    "# The semicolon is omitted here on purpose"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Not great: we can make out roughly 7 to 20 separate points, and the rest merges into one big \"blob\". In such a case it pays to abandon the usual **linear scale** and use a **logarithmic scale** instead.\n",
+    "\n",
+    "`pandas` has shortcuts for this (the `logx`, `logy` and `loglog` arguments of `plot`), but here we will, like it or not (and we surely like it, because we are curious!), touch the objects of the `matplotlib` library itself. Notice that the call to `plot` returned a `matplotlib.axes._subplots.AxesSubplot`. This is the class representing the chart itself, and it has further methods for adjusting the chart. To change the scale, use the `set_xscale` and `set_yscale` methods:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ax = countries.plot.scatter(\n",
+ " x=\"area\",\n",
+ " y=\"population\",\n",
+ " color=\"black\",\n",
+ " alpha=0.5,\n",
+ " figsize=(6, 6)\n",
+ ") \n",
+    "# ax holds the \"chart\" object, more precisely an instance of the `AxesSubplot` class\n",
+    "\n",
+    "# Using the methods of the `AxesSubplot` object we set both axes to a logarithmic scale\n",
+ "ax.set_xscale(\"log\")\n",
+ "ax.set_yscale(\"log\")"
+ ]
+ },
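+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a complement to the cell above, here is a minimal sketch of a shortcut route: the `plot` methods also accept a `loglog` argument (as well as `logx` and `logy`) that switches the axes to a logarithmic scale without calling `matplotlib` methods yourself. The small `demo` table and its numbers are made up purely for illustration; they are not the `countries` data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical values spanning several orders of magnitude\n",
+ "demo = pd.DataFrame({\n",
+ "    \"area\": [2, 300, 45_000, 7_000_000],\n",
+ "    \"population\": [40_000, 500_000, 9_000_000, 150_000_000],\n",
+ "})\n",
+ "\n",
+ "# loglog=True sets both axes to a logarithmic scale\n",
+ "ax = demo.plot.scatter(x=\"area\", y=\"population\", loglog=True, figsize=(6, 6))\n",
+ "\n",
+ "# The returned axes object reports the scale of each axis\n",
+ "print(ax.get_xscale(), ax.get_yscale())"
+ ]
+ },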
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise:** Try plotting some other pairs of quantities. Which of them show interesting results?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Line plot\n",
+ "\n",
+ "This kind of chart makes sense mainly when one variable evolves continuously as a function of another. Time series are a great example (whether for the relationship between time and a quantity, or between two quantities that both evolve over the same time).\n",
+ "\n",
+ "You create a line plot with the `plot.line` method. It also happens to be the default chart type in `pandas`, so it is enough to call `plot` itself as a method. Its parameters are similar to those of `scatter`.\n",
+ "\n",
+ "Let's look, for example, at how life expectancy in Czechia has evolved since the early 1980s:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "czech.plot.line(x=\"year\", y=\"life_expectancy\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Of course, we can again plot several columns at once."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "czech.plot(x=\"year\", y=[\"life_expectancy_female\", \"life_expectancy_male\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Line plots have several interesting arguments:\n",
+ "\n",
+ "- `lw` sets the line width in points\n",
+ "- `style` is the line style: \"-\" is solid, \":\" dotted, \"--\" dashed, \"-.\" dash-dotted\n",
+ "- `markersize` is the size of the symbol that can optionally accompany the line"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "czech.plot.line(\n",
+ " x=\"year\",\n",
+ " y=[\"bmi_men\", \"bmi_women\"],\n",
+ " lw=1,\n",
+ " style=\"--\",\n",
+ " marker=\"o\", # Add round markers for the values from the table\n",
+ " markersize=3);"
+ ]
+ },
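+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `style` argument can also be given as a list with one entry per plotted column, which helps tell the lines apart. Below is a small sketch of that usage; the `demo` table, its column names and its values are invented for illustration only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical yearly values for two series\n",
+ "demo = pd.DataFrame({\n",
+ "    \"year\": [2000, 2001, 2002, 2003],\n",
+ "    \"a\": [1.0, 1.5, 1.7, 2.4],\n",
+ "    \"b\": [2.0, 1.8, 1.9, 1.6],\n",
+ "})\n",
+ "\n",
+ "# One style string per column: solid for \"a\", dotted for \"b\"\n",
+ "demo.plot.line(x=\"year\", y=[\"a\", \"b\"], style=[\"-\", \":\"], lw=2);"
+ ]
+ },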
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A line plot makes little sense when two variables do not depend directly on each other or do not evolve together. Let's try, for example, to draw a line plot of the relationship between alcohol consumption and life expectancy across countries:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "countries.plot.line(x=\"life_expectancy\", y=\"alcohol_adults\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We got a scribble from which nothing at all can be read. You may object that the values are not sorted and that things would look better if we sorted the countries, say, by life expectancy. Well, let's try it:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sorted_countries = countries.sort_values(\"life_expectancy\")\n",
+ "sorted_countries.plot.line(x=\"life_expectancy\", y=\"alcohol_adults\")\n",
+ "sorted_countries[[\"life_expectancy\", \"alcohol_adults\"]]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Does it make sense? The line no longer flies across the whole chart, \"only\" up and down, but it is still nonsense, because no \"natural\" ordering of countries exists and there is no point in forcing one. In this case, the scatter plot was a much better choice."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Bonus: How to plot with other libraries?\n",
+ "\n",
+ "That is really all for the basics of visualization; we will show other chart types another time.\n",
+ "\n",
+ "If that was not enough for you, we will also show how the scatter plot of life expectancy against the amount of pure alcohol consumed would be created in three other visualization libraries. We will not comment on it much, though."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Bonus 1: \"pure\" matplotlib\n",
+ "\n",
+ "Because the default plotting in `pandas` builds on the `matplotlib` library, merely wrapping its functions and making work with columns more pleasant, the function parameters are mostly similar (the main difference is that they do not take column names; you have to pass the column itself)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(7,7))\n",
+ "\n",
+ "# TODO: change\n",
+ "ax.scatter(\n",
+ " countries[\"life_expectancy\"],\n",
+ " countries[\"alcohol_adults\"],\n",
+ " s=countries[\"population\"] / 1e6,\n",
+ " color=countries[\"world_4region\"].map({\"europe\": \"blue\", \"asia\": \"yellow\", \"africa\": \"black\", \"americas\": \"red\"}),\n",
+ " edgecolor=\"black\"\n",
+ ");\n",
+ "\n",
+ "# Axis labels have to be added manually\n",
+ "ax.set_xlabel(\"life_expectancy\")\n",
+ "ax.set_ylabel(\"alcohol_adults\");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The gallery of `matplotlib` examples is inexhaustible: https://matplotlib.org/3.1.1/gallery/index.html"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Bonus 2: seaborn\n",
+ "\n",
+ "Seaborn is best suited for more complex statistical charts, but it also provides its own functions that wrap calls to `matplotlib`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import seaborn as sns\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(8,8))\n",
+ "\n",
+ "sns.scatterplot(\n",
+ " data=countries, # Works with a DataFrame\n",
+ " x=\"life_expectancy\", # Understands column names :-)\n",
+ " y=\"alcohol_adults\",\n",
+ " size=\"population\", # Marker size taken from a column (not very suitable here)\n",
+ " hue=\"world_4region\", # Can assign colors by a category\n",
+ " marker=\"h\"\n",
+ ");"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can find many example visualizations on the project's own site: https://seaborn.pydata.org/examples/index.html"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Bonus 3: plotly.(express)\n",
+ "\n",
+ "`plotly` stands out because it can display interactive charts directly in the notebook; you can zoom in them freely, and hovering over a point shows useful supplementary tooltips. Since version 4.0, this is moreover available through the very elegantly designed functions of the bundled `plotly.express` package."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import plotly.express as px\n",
+ "\n",
+ "px.scatter(\n",
+ " countries.reset_index(),\n",
+ " x=\"life_expectancy\",\n",
+ " y=\"alcohol_adults\",\n",
+ " size=\"population\",\n",
+ " color=\"world_4region\",\n",
+ " hover_name=\"name\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "And what would you say to a world map with countries colored by life expectancy?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "px.choropleth(countries.reset_index(), locations=\"iso\", color=\"life_expectancy\", hover_name=\"name\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can find many examples, including several with the countries of the world, on the project's site: https://plot.ly/python/plotly-express/"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}