{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Environment setup for data frames tutorial\n", "\n", "## Bogumił Kamiński" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to this introduction to DataFrames.jl!\n", "\n", "This set of Jupyter notebooks is intended to give you an overview of the functionality DataFrames.jl offers, based on practical examples.\n", "\n", "You can find overviews of DataFrames.jl functionality (organized by type of task, rather than as exercises like this tutorial) in the following locations:\n", "* the official manual at https://juliadata.github.io/DataFrames.jl/stable/\n", "* a tutorial covering all the functionality of DataFrames.jl at https://github.com/bkamins/Julia-DataFrames-Tutorial\n", "\n", "We also assume that you have a basic knowledge of the Julia language and the Julia ecosystem. There are great tutorials on this topic at [JuliaAcademy](https://juliaacademy.com/), so I encourage you to check them out.\n", "\n", "As this is a hands-on tutorial, you can expect the examples to be implemented the way I would write them in an actual project." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The notebooks were prepared under Julia 1.4.1. If you have a different version of Julia installed, change the kernel using the *Kernel/Change kernel* menu option (assuming you are on Julia 1.x, all examples should work without a problem)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "v\"1.4.1\"" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "VERSION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jupyter Notebook automatically activates the project environment if one is found in the working directory.\n", "\n", "So first let us check that the Project.toml and Manifest.toml files are present (they should be present if you cloned the repository of this tutorial)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2-element BitArray{1}:\n", " 1\n", " 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isfile.([\"Project.toml\", \"Manifest.toml\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should get `1` printed (meaning `true`) in both entries of the vector.\n", "\n", "Now we are sure that you are going to use exactly the same versions of the packages that I use when running this tutorial.\n", "\n", "Let us check which packages (and which versions) we will use." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32m\u001b[1mStatus\u001b[22m\u001b[39m `D:\\DataFrames\\Project.toml`\n", " \u001b[90m [336ed68f]\u001b[39m\u001b[37m CSV v0.6.2\u001b[39m\n", " \u001b[90m [a93c6f00]\u001b[39m\u001b[37m DataFrames v0.21.0\u001b[39m\n", " \u001b[90m [da1fdf0e]\u001b[39m\u001b[37m FreqTables v0.3.3\u001b[39m\n", " \u001b[90m [38e38edf]\u001b[39m\u001b[37m GLM v1.3.9\u001b[39m\n", " \u001b[90m [b98c9c47]\u001b[39m\u001b[37m Pipe v1.2.0\u001b[39m\n", " \u001b[90m [d330b81b]\u001b[39m\u001b[37m PyPlot v2.9.0\u001b[39m\n" ] } ], "source": [ "] status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the command above gives a warning that some of the packages are not downloaded, run the `instantiate` command in the following cell." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [], "source": [ "] instantiate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we will use the following packages:\n", "* DataFrames.jl: the major package that is the subject of this tutorial; it is used for data manipulation; we use version 0.21.0 of this package\n", "* CSV.jl: a package for reading and writing CSV files\n", "* FreqTables.jl: a very useful package for 
creating frequency tables\n", "* GLM.jl: a package for fitting Generalized Linear Models (as no data science tutorial would be complete without building some predictive model)\n", "* PyPlot.jl: a package for plotting; there are many options in the Julia ecosystem to choose from; in this tutorial we use PyPlot.jl because it is based on Matplotlib, so if you have experience with the Python data science technology stack it should be familiar\n", "* Pipe.jl: a package that makes chaining of operations super powerful (which is something you probably know from `%>%` in R)" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.4.1", "language": "julia", "name": "julia-1.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.4.1" } }, "nbformat": 4, "nbformat_minor": 4 }