{
"cells": [
{
"cell_type": "markdown",
"id": "e67d0edb",
"metadata": {},
"source": [
"# Clase 13: introducción a los datos\n",
"En esta clase revisaremos como importar datos, diferentes formatos que podemos utilizar y algunos problemas típicos a la hora del manejo de información. \n",
"\n",
"\n",
"La librería que vamos a ocupar para el manejo de datos es `pandas`. \n",
"- La documentación de `pandas` la pueden encontrar en el link 1: https://pandas.pydata.org/docs/\n",
"- La documentación de read_csv la encuentran en el link 2: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html\n",
"\n",
"Conceptos clave: \n",
"- Pandas\n",
"- DataFrame\n",
"- Delimitador de miles y decimales\n",
"- Tipo de variable\n",
"- Index"
]
},
{
"cell_type": "markdown",
"id": "fc3d179d",
"metadata": {},
"source": [
"## 1. Introducción"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "c68d9523",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "5ac7f25e",
"metadata": {},
"source": [
"Para importar una base de datos en formato `csv`vamos a utilizar `pd.read_csv(ruta/archivo)`\n",
"\n",
"En el caso de Windows llamamos la ruta con doble \"$\\backslash \\backslash$\", por ejemplo: \"C:$\\backslash \\backslash$Users$\\backslash \\backslash$...\"\n",
"\n",
"Vamos a utilizar una base csv del banco central con información de empleo. Lo primero que llama la atención es que la base importa mal debido al delimitador, la base viene con \";\" y por default viene \",\". "
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "03924724",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Periodo;1.Total;2.Empleadores;3.Cuenta Propia;4.Asalariados;5.Personal de servicio;6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" mar.2010;7.156 \n",
" 21;318 \n",
" 32;1.289 \n",
" 68;5.141 \n",
" 76;325 \n",
" 38;81 \n",
" 8 \n",
" \n",
" \n",
" abr.2010;7.198 \n",
" 78;324 \n",
" 94;1.332 \n",
" 33;5.114 \n",
" 80;331 \n",
" 31;95 \n",
" 39 \n",
" \n",
" \n",
" may.2010;7.181 \n",
" 90;326 \n",
" 95;1.346 \n",
" 54;5.080 \n",
" 65;328 \n",
" 56;99 \n",
" 21 \n",
" \n",
" \n",
" jun.2010;7.221 \n",
" 58;328 \n",
" 03;1.384 \n",
" 28;5.074 \n",
" 00;327 \n",
" 60;107 \n",
" 68 \n",
" \n",
" \n",
" jul.2010;7.256 \n",
" 52;333 \n",
" 82;1.390 \n",
" 03;5.081 \n",
" 93;339 \n",
" 44;111 \n",
" 29 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" nov.2020;7.916 \n",
" 72;248 \n",
" 89;1.568 \n",
" 05;5.833 \n",
" 68;188 \n",
" 43;77 \n",
" 66 \n",
" \n",
" \n",
" dic.2020;8.026 \n",
" 22;234 \n",
" 57;1.588 \n",
" 14;5.927 \n",
" 28;194 \n",
" 91;81 \n",
" 32 \n",
" \n",
" \n",
" ene.2021;8.121 \n",
" 42;237 \n",
" 25;1.610 \n",
" 63;6.000 \n",
" 74;197 \n",
" 43;75 \n",
" 36 \n",
" \n",
" \n",
" feb.2021;8.167 \n",
" 62;245 \n",
" 25;1.634 \n",
" 08;6.018 \n",
" 35;198 \n",
" 73;71 \n",
" 20 \n",
" \n",
" \n",
" mar.2021;8.148 \n",
" 21;246 \n",
" 92;1.646 \n",
" 38;5.978 \n",
" 29;204 \n",
" 48;72 \n",
" 14 \n",
" \n",
" \n",
"
\n",
"
133 rows × 1 columns
\n",
"
"
],
"text/plain": [
" Periodo;1.Total;2.Empleadores;3.Cuenta Propia;4.Asalariados;5.Personal de servicio;6.Familiar no remunerado\n",
"mar.2010;7.156 21;318 32;1.289 68;5.141 76;325 38;81 8 \n",
"abr.2010;7.198 78;324 94;1.332 33;5.114 80;331 31;95 39 \n",
"may.2010;7.181 90;326 95;1.346 54;5.080 65;328 56;99 21 \n",
"jun.2010;7.221 58;328 03;1.384 28;5.074 00;327 60;107 68 \n",
"jul.2010;7.256 52;333 82;1.390 03;5.081 93;339 44;111 29 \n",
"... ... \n",
"nov.2020;7.916 72;248 89;1.568 05;5.833 68;188 43;77 66 \n",
"dic.2020;8.026 22;234 57;1.588 14;5.927 28;194 91;81 32 \n",
"ene.2021;8.121 42;237 25;1.610 63;6.000 74;197 43;75 36 \n",
"feb.2021;8.167 62;245 25;1.634 08;6.018 35;198 73;71 20 \n",
"mar.2021;8.148 21;246 92;1.646 38;5.978 29;204 48;72 14 \n",
"\n",
"[133 rows x 1 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"/home/felix/Dropbox/Computational_Economics/Intro_python/2021_S2/Clases/clase13_base1.csv\")"
]
},
{
"cell_type": "markdown",
"id": "629a0a98",
"metadata": {},
"source": [
"Para cambiar el delimitador vamos a usar `delimiter`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "45603e41",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" 1.Total \n",
" 2.Empleadores \n",
" 3.Cuenta Propia \n",
" 4.Asalariados \n",
" 5.Personal de servicio \n",
" 6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7.156,21 \n",
" 318,32 \n",
" 1.289,68 \n",
" 5.141,76 \n",
" 325,38 \n",
" 81,08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7.198,78 \n",
" 324,94 \n",
" 1.332,33 \n",
" 5.114,80 \n",
" 331,31 \n",
" 95,39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7.181,90 \n",
" 326,95 \n",
" 1.346,54 \n",
" 5.080,65 \n",
" 328,56 \n",
" 99,21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7.221,58 \n",
" 328,03 \n",
" 1.384,28 \n",
" 5.074,00 \n",
" 327,60 \n",
" 107,68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7.256,52 \n",
" 333,82 \n",
" 1.390,03 \n",
" 5.081,93 \n",
" 339,44 \n",
" 111,29 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 128 \n",
" nov.2020 \n",
" 7.916,72 \n",
" 248,89 \n",
" 1.568,05 \n",
" 5.833,68 \n",
" 188,43 \n",
" 77,66 \n",
" \n",
" \n",
" 129 \n",
" dic.2020 \n",
" 8.026,22 \n",
" 234,57 \n",
" 1.588,14 \n",
" 5.927,28 \n",
" 194,91 \n",
" 81,32 \n",
" \n",
" \n",
" 130 \n",
" ene.2021 \n",
" 8.121,42 \n",
" 237,25 \n",
" 1.610,63 \n",
" 6.000,74 \n",
" 197,43 \n",
" 75,36 \n",
" \n",
" \n",
" 131 \n",
" feb.2021 \n",
" 8.167,62 \n",
" 245,25 \n",
" 1.634,08 \n",
" 6.018,35 \n",
" 198,73 \n",
" 71,20 \n",
" \n",
" \n",
" 132 \n",
" mar.2021 \n",
" 8.148,21 \n",
" 246,92 \n",
" 1.646,38 \n",
" 5.978,29 \n",
" 204,48 \n",
" 72,14 \n",
" \n",
" \n",
"
\n",
"
133 rows × 7 columns
\n",
"
"
],
"text/plain": [
" Periodo 1.Total 2.Empleadores 3.Cuenta Propia 4.Asalariados \\\n",
"0 mar.2010 7.156,21 318,32 1.289,68 5.141,76 \n",
"1 abr.2010 7.198,78 324,94 1.332,33 5.114,80 \n",
"2 may.2010 7.181,90 326,95 1.346,54 5.080,65 \n",
"3 jun.2010 7.221,58 328,03 1.384,28 5.074,00 \n",
"4 jul.2010 7.256,52 333,82 1.390,03 5.081,93 \n",
".. ... ... ... ... ... \n",
"128 nov.2020 7.916,72 248,89 1.568,05 5.833,68 \n",
"129 dic.2020 8.026,22 234,57 1.588,14 5.927,28 \n",
"130 ene.2021 8.121,42 237,25 1.610,63 6.000,74 \n",
"131 feb.2021 8.167,62 245,25 1.634,08 6.018,35 \n",
"132 mar.2021 8.148,21 246,92 1.646,38 5.978,29 \n",
"\n",
" 5.Personal de servicio 6.Familiar no remunerado \n",
"0 325,38 81,08 \n",
"1 331,31 95,39 \n",
"2 328,56 99,21 \n",
"3 327,60 107,68 \n",
"4 339,44 111,29 \n",
".. ... ... \n",
"128 188,43 77,66 \n",
"129 194,91 81,32 \n",
"130 197,43 75,36 \n",
"131 198,73 71,20 \n",
"132 204,48 72,14 \n",
"\n",
"[133 rows x 7 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"/home/felix/Dropbox/Computational_Economics/Intro_python/2021_S2/Clases/clase13_base1.csv\", delimiter=\";\")"
]
},
{
"cell_type": "markdown",
"id": "62f460b5",
"metadata": {},
"source": [
"Si el archivo se encuentra en la misma carpeta que el Jupyter se puede llamar sólo con el nombre del csv. "
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a582bb2f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" 1.Total \n",
" 2.Empleadores \n",
" 3.Cuenta Propia \n",
" 4.Asalariados \n",
" 5.Personal de servicio \n",
" 6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7.156,21 \n",
" 318,32 \n",
" 1.289,68 \n",
" 5.141,76 \n",
" 325,38 \n",
" 81,08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7.198,78 \n",
" 324,94 \n",
" 1.332,33 \n",
" 5.114,80 \n",
" 331,31 \n",
" 95,39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7.181,90 \n",
" 326,95 \n",
" 1.346,54 \n",
" 5.080,65 \n",
" 328,56 \n",
" 99,21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7.221,58 \n",
" 328,03 \n",
" 1.384,28 \n",
" 5.074,00 \n",
" 327,60 \n",
" 107,68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7.256,52 \n",
" 333,82 \n",
" 1.390,03 \n",
" 5.081,93 \n",
" 339,44 \n",
" 111,29 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 128 \n",
" nov.2020 \n",
" 7.916,72 \n",
" 248,89 \n",
" 1.568,05 \n",
" 5.833,68 \n",
" 188,43 \n",
" 77,66 \n",
" \n",
" \n",
" 129 \n",
" dic.2020 \n",
" 8.026,22 \n",
" 234,57 \n",
" 1.588,14 \n",
" 5.927,28 \n",
" 194,91 \n",
" 81,32 \n",
" \n",
" \n",
" 130 \n",
" ene.2021 \n",
" 8.121,42 \n",
" 237,25 \n",
" 1.610,63 \n",
" 6.000,74 \n",
" 197,43 \n",
" 75,36 \n",
" \n",
" \n",
" 131 \n",
" feb.2021 \n",
" 8.167,62 \n",
" 245,25 \n",
" 1.634,08 \n",
" 6.018,35 \n",
" 198,73 \n",
" 71,20 \n",
" \n",
" \n",
" 132 \n",
" mar.2021 \n",
" 8.148,21 \n",
" 246,92 \n",
" 1.646,38 \n",
" 5.978,29 \n",
" 204,48 \n",
" 72,14 \n",
" \n",
" \n",
"
\n",
"
133 rows × 7 columns
\n",
"
"
],
"text/plain": [
" Periodo 1.Total 2.Empleadores 3.Cuenta Propia 4.Asalariados \\\n",
"0 mar.2010 7.156,21 318,32 1.289,68 5.141,76 \n",
"1 abr.2010 7.198,78 324,94 1.332,33 5.114,80 \n",
"2 may.2010 7.181,90 326,95 1.346,54 5.080,65 \n",
"3 jun.2010 7.221,58 328,03 1.384,28 5.074,00 \n",
"4 jul.2010 7.256,52 333,82 1.390,03 5.081,93 \n",
".. ... ... ... ... ... \n",
"128 nov.2020 7.916,72 248,89 1.568,05 5.833,68 \n",
"129 dic.2020 8.026,22 234,57 1.588,14 5.927,28 \n",
"130 ene.2021 8.121,42 237,25 1.610,63 6.000,74 \n",
"131 feb.2021 8.167,62 245,25 1.634,08 6.018,35 \n",
"132 mar.2021 8.148,21 246,92 1.646,38 5.978,29 \n",
"\n",
" 5.Personal de servicio 6.Familiar no remunerado \n",
"0 325,38 81,08 \n",
"1 331,31 95,39 \n",
"2 328,56 99,21 \n",
"3 327,60 107,68 \n",
"4 339,44 111,29 \n",
".. ... ... \n",
"128 188,43 77,66 \n",
"129 194,91 81,32 \n",
"130 197,43 75,36 \n",
"131 198,73 71,20 \n",
"132 204,48 72,14 \n",
"\n",
"[133 rows x 7 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(\"clase13_base1.csv\", delimiter=\";\")"
]
},
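{
"cell_type": "markdown",
"id": "b7e1a2c4",
"metadata": {},
"source": [
"Regarding the earlier note on Windows paths, the following is a minimal sketch of three equivalent ways to write one. The folder `C:\\Users\\ana\\data` is hypothetical and only serves as an illustration; `PureWindowsPath` is used so that the comparison behaves the same on any operating system.\n",
"\n",
"```python\n",
"from pathlib import PureWindowsPath\n",
"\n",
"# Hypothetical path, used only for illustration\n",
"escaped = \"C:\\\\Users\\\\ana\\\\data\\\\clase13_base1.csv\"  # doubled backslashes\n",
"raw = r\"C:\\Users\\ana\\data\\clase13_base1.csv\"  # raw string, single backslashes\n",
"joined = str(PureWindowsPath(\"C:/Users/ana/data\") / \"clase13_base1.csv\")\n",
"\n",
"# All three spellings produce the same string\n",
"print(escaped == raw == joined)\n",
"```\n",
"\n",
"Without the doubling or the `r` prefix, Python would try to read `\\U` as the start of an escape sequence and reject the string."
]
},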
{
"cell_type": "markdown",
"id": "a067be00",
"metadata": {},
"source": [
"El resultado es una estructura del tipo `fila-columna` que podemos guardar en una variable que llamaremos `DataFrame`. \n",
"\n",
"Un `DataFrame` corresponde a una estructura de datos del tipo fila-columna (similar a una hoja de excel) en el que podemos guardar información de diferentes `types`. Los DataFrame tienen un índice en la primera columna que parte en 0. \n",
"\n",
"Nuestro DataFrame tiene 133 filas y 7 columnas, donde el índice va de 0 a 132. \n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3d93a8f7",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"clase13_base1.csv\", delimiter=\";\")"
]
},
{
"cell_type": "markdown",
"id": "475b4393",
"metadata": {},
"source": [
"## 2. Accediendo a un DataFrame\n",
"La primera mirada a nustros datos se la vamos a dar con la función `head()`, esta nos muestra un resumen de la tabla. Para esto ponemos \"nombre del DataFrame\"+ \".\" + \"head()\".\n",
"\n",
"Esto nos va a mostrar las columnas de la base y las primeras 5 filas. "
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ac3a0641",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" 1.Total \n",
" 2.Empleadores \n",
" 3.Cuenta Propia \n",
" 4.Asalariados \n",
" 5.Personal de servicio \n",
" 6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7.156,21 \n",
" 318,32 \n",
" 1.289,68 \n",
" 5.141,76 \n",
" 325,38 \n",
" 81,08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7.198,78 \n",
" 324,94 \n",
" 1.332,33 \n",
" 5.114,80 \n",
" 331,31 \n",
" 95,39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7.181,90 \n",
" 326,95 \n",
" 1.346,54 \n",
" 5.080,65 \n",
" 328,56 \n",
" 99,21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7.221,58 \n",
" 328,03 \n",
" 1.384,28 \n",
" 5.074,00 \n",
" 327,60 \n",
" 107,68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7.256,52 \n",
" 333,82 \n",
" 1.390,03 \n",
" 5.081,93 \n",
" 339,44 \n",
" 111,29 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Periodo 1.Total 2.Empleadores 3.Cuenta Propia 4.Asalariados \\\n",
"0 mar.2010 7.156,21 318,32 1.289,68 5.141,76 \n",
"1 abr.2010 7.198,78 324,94 1.332,33 5.114,80 \n",
"2 may.2010 7.181,90 326,95 1.346,54 5.080,65 \n",
"3 jun.2010 7.221,58 328,03 1.384,28 5.074,00 \n",
"4 jul.2010 7.256,52 333,82 1.390,03 5.081,93 \n",
"\n",
" 5.Personal de servicio 6.Familiar no remunerado \n",
"0 325,38 81,08 \n",
"1 331,31 95,39 \n",
"2 328,56 99,21 \n",
"3 327,60 107,68 \n",
"4 339,44 111,29 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "ef5a33e8",
"metadata": {},
"source": [
"Podemos decir específicamente cuántas filas queremos ver colocando el número dentro del paréntesis de head(10)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "7de37363",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" 1.Total \n",
" 2.Empleadores \n",
" 3.Cuenta Propia \n",
" 4.Asalariados \n",
" 5.Personal de servicio \n",
" 6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7.156,21 \n",
" 318,32 \n",
" 1.289,68 \n",
" 5.141,76 \n",
" 325,38 \n",
" 81,08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7.198,78 \n",
" 324,94 \n",
" 1.332,33 \n",
" 5.114,80 \n",
" 331,31 \n",
" 95,39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7.181,90 \n",
" 326,95 \n",
" 1.346,54 \n",
" 5.080,65 \n",
" 328,56 \n",
" 99,21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7.221,58 \n",
" 328,03 \n",
" 1.384,28 \n",
" 5.074,00 \n",
" 327,60 \n",
" 107,68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7.256,52 \n",
" 333,82 \n",
" 1.390,03 \n",
" 5.081,93 \n",
" 339,44 \n",
" 111,29 \n",
" \n",
" \n",
" 5 \n",
" ago.2010 \n",
" 7.289,22 \n",
" 333,77 \n",
" 1.430,97 \n",
" 5.080,20 \n",
" 339,49 \n",
" 104,79 \n",
" \n",
" \n",
" 6 \n",
" sep.2010 \n",
" 7.389,47 \n",
" 339,06 \n",
" 1.477,55 \n",
" 5.130,21 \n",
" 338,02 \n",
" 104,63 \n",
" \n",
" \n",
" 7 \n",
" oct.2010 \n",
" 7.414,43 \n",
" 343,68 \n",
" 1.485,61 \n",
" 5.150,80 \n",
" 331,05 \n",
" 103,29 \n",
" \n",
" \n",
" 8 \n",
" nov.2010 \n",
" 7.503,09 \n",
" 347,12 \n",
" 1.486,07 \n",
" 5.216,86 \n",
" 341,96 \n",
" 111,08 \n",
" \n",
" \n",
" 9 \n",
" dic.2010 \n",
" 7.572,32 \n",
" 340,32 \n",
" 1.486,32 \n",
" 5.294,66 \n",
" 342,20 \n",
" 108,82 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Periodo 1.Total 2.Empleadores 3.Cuenta Propia 4.Asalariados \\\n",
"0 mar.2010 7.156,21 318,32 1.289,68 5.141,76 \n",
"1 abr.2010 7.198,78 324,94 1.332,33 5.114,80 \n",
"2 may.2010 7.181,90 326,95 1.346,54 5.080,65 \n",
"3 jun.2010 7.221,58 328,03 1.384,28 5.074,00 \n",
"4 jul.2010 7.256,52 333,82 1.390,03 5.081,93 \n",
"5 ago.2010 7.289,22 333,77 1.430,97 5.080,20 \n",
"6 sep.2010 7.389,47 339,06 1.477,55 5.130,21 \n",
"7 oct.2010 7.414,43 343,68 1.485,61 5.150,80 \n",
"8 nov.2010 7.503,09 347,12 1.486,07 5.216,86 \n",
"9 dic.2010 7.572,32 340,32 1.486,32 5.294,66 \n",
"\n",
" 5.Personal de servicio 6.Familiar no remunerado \n",
"0 325,38 81,08 \n",
"1 331,31 95,39 \n",
"2 328,56 99,21 \n",
"3 327,60 107,68 \n",
"4 339,44 111,29 \n",
"5 339,49 104,79 \n",
"6 338,02 104,63 \n",
"7 331,05 103,29 \n",
"8 341,96 111,08 \n",
"9 342,20 108,82 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "17389080",
"metadata": {},
"source": [
"La función `dtypes` nos va a describir la información dentro de la base de datos."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "bd42a4c4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Periodo object\n",
"1.Total object\n",
"2.Empleadores object\n",
"3.Cuenta Propia object\n",
"4.Asalariados object\n",
"5.Personal de servicio object\n",
"6.Familiar no remunerado object\n",
"dtype: object"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"id": "f3370db9",
"metadata": {},
"source": [
"El type `object` corresponde a un dato del tipo texto, como una palabra. En este caso es poco intuitivo frente al tipo de datos que estamos usando. Deberíamos esperar que la base fuese en su mayoría del tipo numérico (Float, Int). \n",
"\n",
"Para esto podemos especificar dos cosas: \n",
"- Decimal: usamos el argumento `decimal=\"separador\"`. \n",
"- Separador de miles: usamos el argumento `thousands=\"separador\"`\n",
"\n",
"Esto es relevante porque según la configuración del computador e idioma las bases pueden venir con separadores \".\" o \",\". En nuestro caso la base viene con separador de decimal \",\" y con separador de miles \".\". "
]
},
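{
"cell_type": "markdown",
"id": "c3d9e5f1",
"metadata": {},
"source": [
"As a minimal sketch of what these two arguments do, the example below reads a made-up two-row sample (not the real file) from memory with `io.StringIO`, once without and once with the separators declared. It uses `sep`, of which `delimiter` is an alias.\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Made-up sample that imitates the layout of the file\n",
"sample = \"Periodo;Total\\nmar.2010;7.156,21\\nabr.2010;7.198,78\\n\"\n",
"\n",
"# Without the separators the numbers are read as text\n",
"as_text = pd.read_csv(io.StringIO(sample), sep=\";\")\n",
"\n",
"# Declaring both separators lets pandas parse them as floats\n",
"as_number = pd.read_csv(io.StringIO(sample), sep=\";\", decimal=\",\", thousands=\".\")\n",
"\n",
"print(as_text[\"Total\"].dtype)\n",
"print(as_number[\"Total\"].dtype)  # float64\n",
"print(as_number[\"Total\"].sum())\n",
"```"
]
},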
{
"cell_type": "code",
"execution_count": 23,
"id": "31f8de1a",
"metadata": {},
"outputs": [],
"source": [
"#1. Guarda el DataFrame, muestra las columnas y la cantidad de filas y columnas"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "d56722e3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Periodo object\n",
"1.Total float64\n",
"2.Empleadores float64\n",
"3.Cuenta Propia float64\n",
"4.Asalariados float64\n",
"5.Personal de servicio float64\n",
"6.Familiar no remunerado float64\n",
"dtype: object\n",
"Index(['Periodo', '1.Total', '2.Empleadores', '3.Cuenta Propia',\n",
" '4.Asalariados', '5.Personal de servicio', '6.Familiar no remunerado'],\n",
" dtype='object')\n",
"(133, 7)\n"
]
}
],
"source": [
"#IMportar datos\n",
"df = pd.read_csv(\"clase13_base1.csv\", delimiter=\";\", decimal=\",\", thousands='.')\n",
"#Muestra los tipos\n",
"print(df.dtypes)\n",
"#Muestra columnas\n",
"print(df.columns)\n",
"#Mostrar N fila- M columna\n",
"print(df.shape)"
]
},
{
"cell_type": "markdown",
"id": "0a50faa9",
"metadata": {},
"source": [
"Para ver las columnas del DataFrame usamos `columns`"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "72cfb751",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Periodo', '1.Total', '2.Empleadores', '3.Cuenta Propia',\n",
" '4.Asalariados', '5.Personal de servicio', '6.Familiar no remunerado'],\n",
" dtype='object')"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"id": "e402d915",
"metadata": {},
"source": [
"Las dimensiones fila-columna las podemos ver mediante `shape`. Esta viene en formato tupla (fila,columna)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "6cf31910",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(133, 7)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"id": "1e9ca181",
"metadata": {},
"source": [
"Para ver el final de la tabla podemos usar `tail()`"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "1f8481d9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" 1.Total \n",
" 2.Empleadores \n",
" 3.Cuenta Propia \n",
" 4.Asalariados \n",
" 5.Personal de servicio \n",
" 6.Familiar no remunerado \n",
" \n",
" \n",
" \n",
" \n",
" 128 \n",
" nov.2020 \n",
" 7916.72 \n",
" 248.89 \n",
" 1568.05 \n",
" 5833.68 \n",
" 188.43 \n",
" 77.66 \n",
" \n",
" \n",
" 129 \n",
" dic.2020 \n",
" 8026.22 \n",
" 234.57 \n",
" 1588.14 \n",
" 5927.28 \n",
" 194.91 \n",
" 81.32 \n",
" \n",
" \n",
" 130 \n",
" ene.2021 \n",
" 8121.42 \n",
" 237.25 \n",
" 1610.63 \n",
" 6000.74 \n",
" 197.43 \n",
" 75.36 \n",
" \n",
" \n",
" 131 \n",
" feb.2021 \n",
" 8167.62 \n",
" 245.25 \n",
" 1634.08 \n",
" 6018.35 \n",
" 198.73 \n",
" 71.20 \n",
" \n",
" \n",
" 132 \n",
" mar.2021 \n",
" 8148.21 \n",
" 246.92 \n",
" 1646.38 \n",
" 5978.29 \n",
" 204.48 \n",
" 72.14 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Periodo 1.Total 2.Empleadores 3.Cuenta Propia 4.Asalariados \\\n",
"128 nov.2020 7916.72 248.89 1568.05 5833.68 \n",
"129 dic.2020 8026.22 234.57 1588.14 5927.28 \n",
"130 ene.2021 8121.42 237.25 1610.63 6000.74 \n",
"131 feb.2021 8167.62 245.25 1634.08 6018.35 \n",
"132 mar.2021 8148.21 246.92 1646.38 5978.29 \n",
"\n",
" 5.Personal de servicio 6.Familiar no remunerado \n",
"128 188.43 77.66 \n",
"129 194.91 81.32 \n",
"130 197.43 75.36 \n",
"131 198.73 71.20 \n",
"132 204.48 72.14 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "markdown",
"id": "7f902b5c",
"metadata": {},
"source": [
"Para revisar una columna en específico podemos usar diferentes mecanismos"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "56a047d5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 mar.2010\n",
"1 abr.2010\n",
"2 may.2010\n",
"3 jun.2010\n",
"4 jul.2010\n",
" ... \n",
"128 nov.2020\n",
"129 dic.2020\n",
"130 ene.2021\n",
"131 feb.2021\n",
"132 mar.2021\n",
"Name: Periodo, Length: 133, dtype: object"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Caso 1\n",
"df['Periodo']\n",
"#Caso 2\n",
"df.Periodo"
]
},
{
"cell_type": "markdown",
"id": "a754cbf2",
"metadata": {},
"source": [
"¿Qué pasa cuando el nombre de nuestra columna viene con espacios? ¿Podemos usar el caso 2? "
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "e6cf0e5b",
"metadata": {},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "invalid syntax (, line 4)",
"output_type": "error",
"traceback": [
"\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m4\u001b[0m\n\u001b[0;31m df.'3.Cuenta Propia'\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
]
}
],
"source": [
"#Funciona\n",
"df['3.Cuenta Propia']\n",
"#No Funciona\n",
"df.'3.Cuenta Propia'"
]
},
{
"cell_type": "markdown",
"id": "c06f4841",
"metadata": {},
"source": [
"Por esta razón es fundamental que los nombres sean simples, en caso que tengan más de una palabra separar con \"_\". "
]
},
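{
"cell_type": "markdown",
"id": "d4f2b8a6",
"metadata": {},
"source": [
"One way to apply this advice is to clean all the column names in a single step. The following is a minimal sketch on a toy DataFrame (`df_demo` and its values are made up); the regular expression strips a numeric prefix such as `1.` when one is present, as in our file.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy DataFrame with names in the same style as our file\n",
"df_demo = pd.DataFrame({\"1.Total\": [1.0], \"3.Cuenta Propia\": [2.0]})\n",
"\n",
"# Drop the leading \"N.\" prefix, replace spaces with \"_\" and use lowercase\n",
"df_demo.columns = (\n",
"    df_demo.columns\n",
"    .str.replace(r\"^\\d+\\.\", \"\", regex=True)\n",
"    .str.replace(\" \", \"_\", regex=False)\n",
"    .str.lower()\n",
")\n",
"\n",
"print(list(df_demo.columns))  # ['total', 'cuenta_propia']\n",
"print(df_demo.cuenta_propia)  # attribute access now works\n",
"```"
]
},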
{
"cell_type": "markdown",
"id": "fe135172",
"metadata": {},
"source": [
"## 3. Manipulando el DataFrame"
]
},
{
"cell_type": "markdown",
"id": "ab493bf9",
"metadata": {},
"source": [
"Lo primero que haremos es modificar el nombre de las variables. Para esto podemos usar el la función `rename()` y un `diccionario` con {'nombre_antiguo':'nombre_nuevo'}."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "c8fd9c5f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" TOT \n",
" EMP \n",
" CP \n",
" ASA \n",
" PdS \n",
" FnR \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7156.21 \n",
" 318.32 \n",
" 1289.68 \n",
" 5141.76 \n",
" 325.38 \n",
" 81.08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7198.78 \n",
" 324.94 \n",
" 1332.33 \n",
" 5114.80 \n",
" 331.31 \n",
" 95.39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7181.90 \n",
" 326.95 \n",
" 1346.54 \n",
" 5080.65 \n",
" 328.56 \n",
" 99.21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7221.58 \n",
" 328.03 \n",
" 1384.28 \n",
" 5074.00 \n",
" 327.60 \n",
" 107.68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7256.52 \n",
" 333.82 \n",
" 1390.03 \n",
" 5081.93 \n",
" 339.44 \n",
" 111.29 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 128 \n",
" nov.2020 \n",
" 7916.72 \n",
" 248.89 \n",
" 1568.05 \n",
" 5833.68 \n",
" 188.43 \n",
" 77.66 \n",
" \n",
" \n",
" 129 \n",
" dic.2020 \n",
" 8026.22 \n",
" 234.57 \n",
" 1588.14 \n",
" 5927.28 \n",
" 194.91 \n",
" 81.32 \n",
" \n",
" \n",
" 130 \n",
" ene.2021 \n",
" 8121.42 \n",
" 237.25 \n",
" 1610.63 \n",
" 6000.74 \n",
" 197.43 \n",
" 75.36 \n",
" \n",
" \n",
" 131 \n",
" feb.2021 \n",
" 8167.62 \n",
" 245.25 \n",
" 1634.08 \n",
" 6018.35 \n",
" 198.73 \n",
" 71.20 \n",
" \n",
" \n",
" 132 \n",
" mar.2021 \n",
" 8148.21 \n",
" 246.92 \n",
" 1646.38 \n",
" 5978.29 \n",
" 204.48 \n",
" 72.14 \n",
" \n",
" \n",
"
\n",
"
133 rows × 7 columns
\n",
"
"
],
"text/plain": [
" Periodo TOT EMP CP ASA PdS FnR\n",
"0 mar.2010 7156.21 318.32 1289.68 5141.76 325.38 81.08\n",
"1 abr.2010 7198.78 324.94 1332.33 5114.80 331.31 95.39\n",
"2 may.2010 7181.90 326.95 1346.54 5080.65 328.56 99.21\n",
"3 jun.2010 7221.58 328.03 1384.28 5074.00 327.60 107.68\n",
"4 jul.2010 7256.52 333.82 1390.03 5081.93 339.44 111.29\n",
".. ... ... ... ... ... ... ...\n",
"128 nov.2020 7916.72 248.89 1568.05 5833.68 188.43 77.66\n",
"129 dic.2020 8026.22 234.57 1588.14 5927.28 194.91 81.32\n",
"130 ene.2021 8121.42 237.25 1610.63 6000.74 197.43 75.36\n",
"131 feb.2021 8167.62 245.25 1634.08 6018.35 198.73 71.20\n",
"132 mar.2021 8148.21 246.92 1646.38 5978.29 204.48 72.14\n",
"\n",
"[133 rows x 7 columns]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.rename(columns={'1.Total': 'TOT', '2.Empleadores':'EMP', '3.Cuenta Propia':'CP', '4.Asalariados':'ASA', '5.Personal de servicio':'PdS', '6.Familiar no remunerado':'FnR'})"
]
},
{
"cell_type": "markdown",
"id": "65268ef6",
"metadata": {},
"source": [
"Si hacemos sólo `df.rename()` no se modifica el DataFrame, entonces tenemos dos opciones: 1) creamos uno nuevo o 2) modificamos el que ya existe. "
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "1e08da99",
"metadata": {},
"outputs": [],
"source": [
"#1. Creamos lun DF nuevo \n",
"df2 = df.rename(columns={'1.Total': 'TOT', '2.Empleadores':'EMP', '3.Cuenta Propia':'CP', '4.Asalariados':'ASA', '5.Personal de servicio':'PdS', '6.Familiar no remunerado':'FnR'})\n",
"\n",
"#2. Modificamos el que existe\n",
"df = df.rename(columns={'1.Total': 'TOT', '2.Empleadores':'EMP', '3.Cuenta Propia':'CP', '4.Asalariados':'ASA', '5.Personal de servicio':'PdS', '6.Familiar no remunerado':'FnR'})"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "d9fb6db7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Periodo \n",
" TOT \n",
" EMP \n",
" CP \n",
" ASA \n",
" PdS \n",
" FnR \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" mar.2010 \n",
" 7156.21 \n",
" 318.32 \n",
" 1289.68 \n",
" 5141.76 \n",
" 325.38 \n",
" 81.08 \n",
" \n",
" \n",
" 1 \n",
" abr.2010 \n",
" 7198.78 \n",
" 324.94 \n",
" 1332.33 \n",
" 5114.80 \n",
" 331.31 \n",
" 95.39 \n",
" \n",
" \n",
" 2 \n",
" may.2010 \n",
" 7181.90 \n",
" 326.95 \n",
" 1346.54 \n",
" 5080.65 \n",
" 328.56 \n",
" 99.21 \n",
" \n",
" \n",
" 3 \n",
" jun.2010 \n",
" 7221.58 \n",
" 328.03 \n",
" 1384.28 \n",
" 5074.00 \n",
" 327.60 \n",
" 107.68 \n",
" \n",
" \n",
" 4 \n",
" jul.2010 \n",
" 7256.52 \n",
" 333.82 \n",
" 1390.03 \n",
" 5081.93 \n",
" 339.44 \n",
" 111.29 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Periodo TOT EMP CP ASA PdS FnR\n",
"0 mar.2010 7156.21 318.32 1289.68 5141.76 325.38 81.08\n",
"1 abr.2010 7198.78 324.94 1332.33 5114.80 331.31 95.39\n",
"2 may.2010 7181.90 326.95 1346.54 5080.65 328.56 99.21\n",
"3 jun.2010 7221.58 328.03 1384.28 5074.00 327.60 107.68\n",
"4 jul.2010 7256.52 333.82 1390.03 5081.93 339.44 111.29"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "3fba8e83",
"metadata": {},
"source": [
"Una mirada inicial a una variable podemos darla con la función `describe()`. Esta función nos entrega una resumen estadístico de la variable. "
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "9ce90f2c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 133.000000\n",
"mean 8187.411504\n",
"std 504.204366\n",
"min 7073.190000\n",
"25% 7844.780000\n",
"50% 8202.890000\n",
"75% 8535.210000\n",
"max 9118.180000\n",
"Name: TOT, dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#1. Llamamos una variable df.variable\n",
"df.TOT.describe()\n",
"\n",
"#2. Llamamos df['variable']\n",
"df['TOT'].describe()"
]
},
{
"cell_type": "markdown",
"id": "518e6902",
"metadata": {},
"source": [
"Podemos sacar una estadística en particular"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "a18493f9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"min: 7073.19\n",
"max: 9118.18\n",
"mean: 8187.411503759398\n",
"std: 504.20436565714255\n",
"count: 133\n"
]
}
],
"source": [
"print(\"min:\", df.TOT.min())\n",
"print(\"max:\", df.TOT.max())\n",
"print(\"mean:\", df.TOT.mean())\n",
"print(\"std:\", df.TOT.std())\n",
"print(\"count:\", df.TOT.count())"
]
},
{
"cell_type": "markdown",
"id": "c3673349",
"metadata": {},
"source": [
"La variable `Periodo` sigue siendo del tipo Objeto (texto), podemos crear una variable del tipo fecha. Para esto vamos a hacer dos cosas: 1) crear una variable en formato fecha y 2) agregar esta variable al DataFrame. \n",
"\n",
"- Para crear una variable del tipo fecha podemos usar la función `date_range(fecha_inicio, periodos, frecuencia)`. En el siguiente link (link 3) encuentran detalle de como variables del tipo fecha en un DataFrame: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. \n",
"- Para anexar una variable al DataFrame colocamos `df['nombre_variable'] = variable`."
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "4c6c2477",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Periodo object\n",
"TOT float64\n",
"EMP float64\n",
"CP float64\n",
"ASA float64\n",
"PdS float64\n",
"FnR float64\n",
"Date datetime64[ns]\n",
"dtype: object"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Creamos una variable del tipo fecha con la función\n",
"Date = pd.date_range(\"2010-03-01\", periods=133, freq=\"M\")\n",
"\n",
"#Anexamos la variable nueva\n",
"df['Date'] = Date\n",
"\n",
"#Vemos el resultado\n",
"df.dtypes"
]
},
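{
"cell_type": "markdown",
"id": "e5a3c7b9",
"metadata": {},
"source": [
"An alternative to generating the dates with `date_range` is to parse the `Periodo` text itself. The following is a minimal sketch, assuming every value has the form `mmm.yyyy` with Spanish month abbreviations; the `MONTHS` dictionary and the sample values are written for this illustration. Note that it produces the first day of each month, whereas `freq=\"M\"` produces the last day.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Spanish month abbreviations mapped to month numbers (written for this example)\n",
"MONTHS = {\"ene\": 1, \"feb\": 2, \"mar\": 3, \"abr\": 4, \"may\": 5, \"jun\": 6,\n",
"          \"jul\": 7, \"ago\": 8, \"sep\": 9, \"oct\": 10, \"nov\": 11, \"dic\": 12}\n",
"\n",
"periodo = pd.Series([\"mar.2010\", \"abr.2010\", \"dic.2020\", \"ene.2021\"])\n",
"\n",
"# Split \"mar.2010\" into the month abbreviation and the year\n",
"parts = periodo.str.split(\".\", expand=True)\n",
"\n",
"# to_datetime can assemble dates from year/month/day columns\n",
"dates = pd.to_datetime(pd.DataFrame({\n",
"    \"year\": parts[1].astype(int),\n",
"    \"month\": parts[0].map(MONTHS),\n",
"    \"day\": 1,\n",
"}))\n",
"\n",
"print(dates.dt.strftime(\"%Y-%m-%d\").tolist())\n",
"```\n",
"\n",
"Parsing the text keeps each date tied to its own row, so it does not depend on the rows being complete and in order."
]
},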
{
"cell_type": "markdown",
"id": "1db536ae",
"metadata": {},
"source": [
"Vamos a guardar los meses y años por separados en el DataFrame. Para esto utilizamos `DatetimeIndex` (abreviamos dt) que nos permite extraer el segundos/dia/mes/año de una variable del tipo `datetime`, por ejemplo usando `month` y `year`."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "78b75d8c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Periodo object\n",
"TOT float64\n",
"EMP float64\n",
"CP float64\n",
"ASA float64\n",
"PdS float64\n",
"FnR float64\n",
"Date datetime64[ns]\n",
"mes int64\n",
"year int64\n",
"dtype: object\n"
]
},
{
"data": {
"text/plain": [
" Periodo TOT EMP CP ASA PdS FnR Date \\\n",
"0 mar.2010 7156.21 318.32 1289.68 5141.76 325.38 81.08 2010-03-31 \n",
"1 abr.2010 7198.78 324.94 1332.33 5114.80 331.31 95.39 2010-04-30 \n",
"2 may.2010 7181.90 326.95 1346.54 5080.65 328.56 99.21 2010-05-31 \n",
"3 jun.2010 7221.58 328.03 1384.28 5074.00 327.60 107.68 2010-06-30 \n",
"4 jul.2010 7256.52 333.82 1390.03 5081.93 339.44 111.29 2010-07-31 \n",
"\n",
" mes year \n",
"0 3 2010 \n",
"1 4 2010 \n",
"2 5 2010 \n",
"3 6 2010 \n",
"4 7 2010 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Store the month in a new column\n",
"df['mes'] = df['Date'].dt.month\n",
"\n",
"# Store the year in a new column\n",
"df['year'] = df['Date'].dt.year\n",
"\n",
"# Check the result\n",
"print(df.dtypes)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "3dbabc89",
"metadata": {},
"source": [
"We can group the rows by a variable with `groupby`. We can then apply basic aggregation functions such as `mean()`, `std()`, `sum()`, etc. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "8f87f1e3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" TOT EMP CP ASA PdS \\\n",
"year \n",
"2010 7318.352000 333.601000 1410.938000 5136.587000 334.501000 \n",
"2011 7676.545833 349.602500 1488.466667 5382.360833 359.097500 \n",
"2012 7858.610833 323.438333 1460.165833 5626.480000 353.575000 \n",
"2013 8023.047500 334.418333 1484.029167 5766.081667 336.561667 \n",
"2014 8150.107500 338.969167 1569.675000 5800.595833 336.685000 \n",
"2015 8273.739167 344.422500 1593.640833 5919.330833 319.370833 \n",
"2016 8391.921667 334.027500 1686.861667 5945.020000 330.251667 \n",
"2017 8574.332500 371.210833 1777.026667 6008.260833 323.455000 \n",
"2018 8773.940000 363.481667 1800.942500 6185.576667 323.193333 \n",
"2019 8953.679167 368.760000 1855.963333 6320.787500 319.305833 \n",
"2020 7932.822500 274.867500 1495.655000 5878.350000 214.355000 \n",
"2021 8145.750000 243.140000 1630.363333 5999.126667 200.213333 \n",
"\n",
" FnR mes \n",
"year \n",
"2010 102.726000 7.5 \n",
"2011 97.016667 6.5 \n",
"2012 94.954167 6.5 \n",
"2013 101.956667 6.5 \n",
"2014 104.185000 6.5 \n",
"2015 96.973333 6.5 \n",
"2016 95.760000 6.5 \n",
"2017 94.375833 6.5 \n",
"2018 100.750000 6.5 \n",
"2019 88.863333 6.5 \n",
"2020 69.597500 6.5 \n",
"2021 72.900000 2.0 "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
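],
"source": [
"# Group by year; numeric_only=True skips the non-numeric Periodo and Date columns\n",
"df_agrupado = df.groupby('year')\n",
"df_agrupado.mean(numeric_only=True)"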
]
},
{
"cell_type": "markdown",
"id": "88c9792e",
"metadata": {},
"source": [
"The same can be done on one variable or on a specific group of variables. To do so: \n",
"- Select the variables to work with by passing a list: `df[['TOT', 'year']]`. It must include both the variable to analyze (`TOT`) and the variable to group by (`year`). \n",
"- Apply the grouping function with the grouping variable: `groupby('year')`. "
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "707f372b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" TOT\n",
"year \n",
"2010 7318.352000\n",
"2011 7676.545833\n",
"2012 7858.610833\n",
"2013 8023.047500\n",
"2014 8150.107500\n",
"2015 8273.739167\n",
"2016 8391.921667\n",
"2017 8574.332500\n",
"2018 8773.940000\n",
"2019 8953.679167\n",
"2020 7932.822500\n",
"2021 8145.750000"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_agrupado = df[['TOT', 'year']].groupby('year')\n",
"df_agrupado.mean()"
]
},
{
"cell_type": "markdown",
"id": "1a9be3fd",
"metadata": {},
"source": [
"The result of `df_agrupado.mean()` has two elements: \n",
"- Index: the variable the data was grouped by, accessed with `.index`.\n",
"- Relevant variables: the ones the analysis was run on, in this case `TOT`, accessed with `['TOT']`."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "293802cb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"index: Int64Index([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,\n",
" 2021],\n",
" dtype='int64', name='year')\n",
"TOT: year\n",
"2010 7318.352000\n",
"2011 7676.545833\n",
"2012 7858.610833\n",
"2013 8023.047500\n",
"2014 8150.107500\n",
"2015 8273.739167\n",
"2016 8391.921667\n",
"2017 8574.332500\n",
"2018 8773.940000\n",
"2019 8953.679167\n",
"2020 7932.822500\n",
"2021 8145.750000\n",
"Name: TOT, dtype: float64\n"
]
}
],
"source": [
"#Index\n",
"print(\"index:\", df_agrupado.mean().index)\n",
"\n",
"#TOT\n",
"print(\"TOT:\", df_agrupado.mean()['TOT'])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "f21b815f",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAP0UlEQVR4nO3df6zddX3H8efLVhBkTJBCSsssyzq3QjKBhqEuixszVN1WMkdSE6VbcJ0EN12WbO2yxC3aBBdnlDnYGtmATCX1x0I3gxvpJJuGwC5ChFKRKgwqHVxdnOgWFPbeH+fT5Nje9p5Lzz23936ej+TkfM/7fL/f83lzzn2dbz/fcw6pKiRJfXjRQg9AkjQ5hr4kdcTQl6SOGPqS1BFDX5I6snyhBzCbM844o9asWbPQw5CkReXee+/9ZlWtOLR+3If+mjVrmJqaWuhhSNKikuQ/Zqo7vSNJHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR057r+RK2lpWrP1s2Pf52PXvmns+1xqPNKXpI4Y+pLUEUNfkjrinL6kH+Jc+9Lmkb4kdcTQl6SOGPqS1BHn9KVFwrl2jYNH+pLUEUNfkjpi6EtSRwx9SeqIJ3KlMRj3SVZPsGq+eKQvSR3xSF9Lmh9zlH6YR/qS1BFDX5I64vSOFoTTLlpqFsvJ/JGO9JP8XpI9SR5M8okkL0lyepI7kjzSrk8bWn9bkn1JHk5y2VD9oiQPtPuuS5L5aEqSNLNZQz/JKuB3gfVVdT6wDNgEbAV2V9VaYHe7TZJ17f7zgA3A9UmWtd3dAGwB1rbLhrF2I0k6qlGnd5YDJyX5AXAy8CSwDXhdu/9m4E7gD4GNwK1V9SzwaJJ9wMVJHgNOraq7AJLcAlwO3D6ORjQeTrtIS9usR/pV9Q3gA8DjwAHgv6vqn4GzqupAW+cAcGbbZBXwxNAu9rfaqrZ8aP0wSbYkmUoyNT09PbeOJElHNOuRfpur3wicC3wb+GSStx5tkxlqdZT64cWqHcAOgPXr18+4jiSNYrGcYJ2UUU7k/hLwaFVNV9UPgM8ArwGeSrISoF0/3dbfD5wztP1qBtNB+9vyoXVJ0oSMMqf/OHBJkpOB/wUuBaaA7wGbgWvb9W1t/V3Ax5N8EDibwQnbe6rq+STPJLkEuBu4EviLcTaz1HnEIulYzRr6VXV3kk8BXwKeA+5jMPVyCrAzyVUM3hiuaOvvSbITeKitf01VPd92dzVwE3ASgxO4nsSVpAka6dM7VfUe4D2HlJ9lcNQ/0/rbge0z1KeA8+c4RknSmPgzDJLUEX+GYQyca5e0WHikL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SerIkv5ylv9DEEn6YR7pS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6shIoZ/kZUk+leQrSfYmeXWS05PckeSRdn3a0PrbkuxL8nCSy4bqFyV5oN13XZLMR1OSpJmNeqT/YeBzVfVTwM8Ae4GtwO6qWgvsbrdJsg7YBJwHbACuT7Ks7ecGYAuwtl02jKkPSdIIZg39JKcCPw/cCFBV36+qbwMbgZvbajcDl7fljcCtVfVsVT0K7AMuTrISOLWq7qqqAm4Z2kaSNAGjHOn/ODAN/G2S+5J8NMlLgbOq6gBAuz6zrb8KeGJo+/2ttqotH1o/TJItSaaSTE1PT8+pIUnSkY0S+suBC4EbquoC4Hu0qZwjmGmevo5SP7xYtaOq1lfV+hUrVowwREnSKEYJ/f3A/qq6u93+FIM3gafalA3t+umh9c8Z2n418GSrr56hLkmakFlDv6r+E3giyStb6VLgIWAXsLnVNgO3teVdwK
YkJyY5l8EJ23vaFNAzSS5pn9q5cmgbSdIELB9xvd8BPpbkBODrwG8yeMPYmeQq4HHgCoCq2pNkJ4M3hueAa6rq+bafq4GbgJOA29tFkjQhI4V+Vd0PrJ/hrkuPsP52YPsM9Sng/DmMT5I0Rn4jV5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSMjh36SZUnuS/KP7fbpSe5I8ki7Pm1o3W1J9iV5OMllQ/WLkjzQ7rsuScbbjiTpaOZypP8uYO/Q7a3A7qpaC+xut0myDtgEnAdsAK5PsqxtcwOwBVjbLhuOafSSpDkZKfSTrAbeBHx0qLwRuLkt3wxcPlS/taqerapHgX3AxUlWAqdW1V1VVcAtQ9tIkiZg1CP9DwF/APzfUO2sqjoA0K7PbPVVwBND6+1vtVVt+dD6YZJsSTKVZGp6enrEIUqSZjNr6Cf5ZeDpqrp3xH3ONE9fR6kfXqzaUVXrq2r9ihUrRnxYSdJslo+wzmuBX03yRuAlwKlJ/g54KsnKqjrQpm6ebuvvB84Z2n418GSrr56hLkmakFmP9KtqW1Wtrqo1DE7Q/ktVvRXYBWxuq20GbmvLu4BNSU5Mci6DE7b3tCmgZ5Jc0j61c+XQNpKkCRjlSP9IrgV2JrkKeBy4AqCq9iTZCTwEPAdcU1XPt22uBm4CTgJubxdJ0oTMKfSr6k7gzrb8LeDSI6y3Hdg+Q30KOH+ug5QkjYffyJWkjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SerIrKGf5Jwkn0+yN8meJO9q9dOT3JHkkXZ92tA225LsS/JwksuG6hcleaDdd12SzE9bkqSZjHKk/xzw+1X108AlwDVJ1gFbgd1VtRbY3W7T7tsEnAdsAK5Psqzt6wZgC7C2XTaMsRdJ0ixmDf2qOlBVX2rLzwB7gVXARuDmttrNwOVteSNwa1U9W1WPAvuAi5OsBE6tqruqqoBbhraRJE3AnOb0k6wBLgDuBs6qqgMweGMAzmyrrQKeGNpsf6utasuH1iVJEzJy6Cc5Bfg08O6q+s7RVp2hVkepz/RYW5JMJZmanp4edYiSpFmMFPpJXswg8D9WVZ9p5afalA3t+ulW3w+cM7T5auDJVl89Q/0wVbWjqtZX1foVK1aM2oskaRajfHonwI3A3qr64NBdu4DNbXkzcNtQfVOSE5Ocy+CE7T1tCuiZJJe0fV45tI0kaQKWj7DOa4G3AQ8kub/V/gi4FtiZ5CrgceAKgKrak2Qn8BCDT/5cU1XPt+2uBm4CTgJubxdJ0oTMGvpV9QVmno8HuPQI22wHts9QnwLOn8sAJUnj4zdyJakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHDH1J6oihL0kdMfQlqSOGviR1xNCXpI4Y+pLUEUNfkjpi6EtSRwx9SeqIoS9JHTH0Jakjhr4kdcTQl6SOGPqS1BFDX5I6YuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjhj6ktQRQ1+SOmLoS1JHJh76STYkeTjJviRbJ/34ktSziYZ+kmXAXwJvANYBb0mybpJjkKSeTfpI/2JgX1V9va
q+D9wKbJzwGCSpW6mqyT1Y8uvAhqp6e7v9NuBnq+qdh6y3BdjSbr4SeHieh3YG8M15foxJWkr9LKVewH6OZ0upF4BXVNWKQ4vLJzyIzFA77F2nqnYAO+Z/OANJpqpq/aQeb74tpX6WUi9gP8ezpdTL0Ux6emc/cM7Q7dXAkxMegyR1a9Kh/+/A2iTnJjkB2ATsmvAYJKlbE53eqarnkrwT+CdgGfA3VbVnkmM4golNJU3IUupnKfUC9nM8W0q9HNFET+RKkhaW38iVpI4Y+pLUkSUZ+knOSfL5JHuT7EnyrlY/PckdSR5p16e1+svb+t9N8pFD9nVRkgfaz0Zcl2Smj50uin6SnJzks0m+0vZz7WLt5ZB97kry4CT7GHrscb7WTkiyI8lX23P05kXez1va386Xk3wuyRnHeS+vT3JvG/O9SX5xaF8LngNjU1VL7gKsBC5syz8CfJXBzz78GbC11bcC72/LLwV+DngH8JFD9nUP8GoG3zG4HXjDYu0HOBn4hbZ8AvBvk+5nnM9Nu//XgI8DDy6B19qfAu9ryy8Czlis/TD4kMjTB3to2//Jcd7LBcDZbfl84BtD+1rwHBjbf5eFHsCEnvzbgNcz+GbvyqEXxMOHrPcbh7xwVwJfGbr9FuCvF2s/M+znw8BvLdZegFOAL7Q/5AUJ/TH38wTw0oXuYRz9AC8GpoFXtKD8K2DLYuil1QN8CzjxeM2BF3pZktM7w5KsYfAOfjdwVlUdAGjXZ86y+SoGXyg7aH+rLZhj7Gd4Py8DfgXYPf5RjjyGNRxbL+8F/hz4n/ka41wcSz/t+QB4b5IvJflkkrPmcbizOpZ+quoHwNXAAwy+gLkOuHE+x3s0L6CXNwP3VdWzHIc5cCyWdOgnOQX4NPDuqvrOC9nFDLUF+4zrGPo5uJ/lwCeA66rq6+Ma3xzHcEy9JHkV8BNV9ffjHtsLMYbnZjmDb6h/saouBO4CPjDGIc7JGJ6fFzMI/QuAs4EvA9vGOsjRxzKnXpKcB7wf+O2DpRlWW7SfdV+yod9edJ8GPlZVn2nlp5KsbPevZDDneDT7GfwhHrRgPxsxpn4O2gE8UlUfGvtARzCmXl4NXJTkMQZTPD+Z5M75GfHRjamfbzH4F8vBN7FPAhfOw3BnNaZ+XgVQVV+rwZzITuA18zPiI5trL0lWM3gOrqyqr7XycZMD47AkQ7+dWb8R2FtVHxy6axewuS1vZjDHd0Ttn37PJLmk7fPK2baZD+Pqp+3rfcCPAu8e8zBHMsbn5oaqOruq1jA4kfjVqnrd+Ed8dGPsp4B/AF7XSpcCD411sCMY42vtG8C6JAd/5fH1wN5xjnU2c+2lTbF9FthWVV88uPLxkgNjs9AnFebjwiAEisE/Ke9vlzcCL2cwh/1Iuz59aJvHgP8CvsvgnX1dq68HHgS+BnyE9i3mxdgPgyOUYvDHd3A/b1+MvRyyzzUs3Kd3xvlaewXwr21fu4EfW+T9vKO91r7M4A3t5cdzL8AfA98bWvd+4Mx234LnwLgu/gyDJHVkSU7vSJJmZuhLUkcMfUnqiKEvSR0x9CWpI4a+JHXE0Jekjvw/VPyfjkeWhTIAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Compute the yearly means once and plot TOT against the year\n",
"medias = df_agrupado.mean()\n",
"x = medias.index\n",
"y = medias['TOT']\n",
"plt.bar(x, y)\n",
"plt.show()"
]
},
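{
"cell_type": "markdown",
"id": "c3d95f12",
"metadata": {},
"source": [
"The earlier cells only use `mean()`. As a small sketch of the other aggregations mentioned (`std()`, `sum()`), several of them can be requested in one call with `agg`. The data here is a made-up toy table, not the employment series, and `datos` and `resumen` are illustrative names.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data (invented values) with the same column names as the notebook\n",
"datos = pd.DataFrame({'year': [2010, 2010, 2011, 2011],\n",
"                      'TOT': [1.0, 3.0, 10.0, 20.0]})\n",
"\n",
"# One row per year, one column per aggregation function\n",
"resumen = datos.groupby('year')['TOT'].agg(['mean', 'sum', 'std'])\n",
"print(resumen)\n",
"```\n",
"\n",
"Because `agg` keeps the year as the index, `resumen.index` and `resumen['mean']` can be passed to a plot in the same way as the grouped means."
]
},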
{
"cell_type": "markdown",
"id": "c472c936",
"metadata": {},
"source": [
"**With this we can start working with a dataset and show some results!** \n",
"- To read Excel data we can use `pd.read_excel`.\n",
"- To read Stata data we can use `pd.read_stata`. \n"
]
},
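{
"cell_type": "markdown",
"id": "e58a0b47",
"metadata": {},
"source": [
"As a minimal sketch of `pd.read_stata`, the snippet below writes a tiny made-up table to a temporary `.dta` file and reads it back, so it does not depend on any file from the course. The file name `empleo.dta` and the variable names are hypothetical. `pd.read_excel` follows the same pattern but needs an optional Excel engine such as `openpyxl` installed, so it is left out here.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"import pandas as pd\n",
"\n",
"# Small illustrative table; the values are copied from the yearly means above\n",
"datos = pd.DataFrame({'year': [2010, 2011], 'TOT': [7318.352, 7676.545833]})\n",
"\n",
"with tempfile.TemporaryDirectory() as carpeta:\n",
"    ruta = os.path.join(carpeta, 'empleo.dta')\n",
"    # write_index=False avoids saving the row index as an extra column\n",
"    datos.to_stata(ruta, write_index=False)\n",
"    # read_stata returns a regular DataFrame\n",
"    leido = pd.read_stata(ruta)\n",
"\n",
"print(leido)\n",
"```"
]
},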
{
"cell_type": "code",
"execution_count": 42,
"id": "a91bcb72",
"metadata": {},
"outputs": [],
"source": [
"df_agrupado = df[['TOT', 'year']].groupby('year')\n",
"# df_agrupado.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9d2d4ef",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "972d2a49",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}