{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary of the tutorial\n", "\n", "### Bogumił Kamiński" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we finish, let us summarize the major functions that DataFrames.jl provides:\n", "1. A data frame is a matrix-like data structure. You can index it just like a matrix. The differences are:\n", " - you can use strings or `Symbol`s to select columns\n", " - if you use `!` as the row selector, you get the whole column of the data frame without copying\n", "2. You can quickly summarize the contents of a data frame using the `describe` function\n", "3. You can add rows to a data frame in-place using `push!` (similarly, `append!` allows you to add multiple rows at the same time); `repeat`/`repeat!`, `hcat`, and `vcat` are also provided\n", "4. You can work on a grouped data frame that is created using the `groupby` function. It is a view and works as if you had created a lookup index for a data frame.\n", "5. There are `select`/`select!`/`transform`/`transform!`/`combine` functions that allow you to quickly transform/aggregate columns of a data frame or grouped data frame; there are also `mapcols`/`mapcols!` functions for quick aggregation of columns of a data frame\n", "6. You can filter rows of a data frame using the `filter` and `filter!` functions\n", "7. Use the `sort` and `sort!` functions to sort data frames\n", "8. We have not discussed this, but you can join multiple data frames using the `innerjoin`, `outerjoin`, `leftjoin`, `rightjoin`, `semijoin`, `antijoin`, and `crossjoin` functions (they work as you would expect if you know SQL)\n", "9. If you want to iterate over rows or columns of a data frame, use the `eachrow` and `eachcol` functions (we have not discussed them, but they work exactly like in Julia Base)\n", "10. 
You can change names of columns in a data frame using the `rename` and `rename!` functions; to get names of columns of a data frame use `names` (strings) or `propertynames` (`Symbol`s)\n", "11. To get the number of rows and columns of a data frame use the `nrow` and `ncol` functions\n", "12. To flatten nested columns of a data frame use `flatten`\n", "13. You can easily allow/disallow missing values in columns of a data frame using the `allowmissing`/`allowmissing!`/`disallowmissing`/`disallowmissing!` functions (similar functionality is provided for making columns categorical using the `categorical`/`categorical!` functions)\n", "14. You can drop rows with missing data using the `dropmissing`/`dropmissing!` functions\n", "15. You can switch between [long and wide](https://en.wikipedia.org/wiki/Wide_and_narrow_data) representations of a data frame using `stack` and `unstack`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Additionally, we have covered `freqtable` from FreqTables.jl, `@pipe` from Pipe.jl, and `lm` from GLM.jl, which are often useful when wrangling data.\n", "\n", "Finally, we have shown how to integrate DataFrames.jl with plotting using PyPlot.jl." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, this course was just an introduction.\n", "\n", "You can find overviews of the functionality of DataFrames.jl in:\n", "* the official manual at https://juliadata.github.io/DataFrames.jl/stable/\n", "* a tutorial going through all functionalities of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial\n", "* documentation strings of the respective functions" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.4.1", "language": "julia", "name": "julia-1.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.4.1" } }, "nbformat": 4, "nbformat_minor": 4 }