Let’s start by creating some directories where we will store data during the project:
# go to the user's home directory
cd ~
# create a directory in the Documents folder
cd Documents/
mkdir data_science_project_1
# create another directory to store data:
cd data_science_project_1
mkdir data
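The same layout can also be created in one step. The following is a minimal sketch, assuming the Documents folder lives directly under your home directory; PROJECT_DIR is a variable name introduced here only for illustration.

```shell
# PROJECT_DIR is a hypothetical variable name, used only in this sketch.
PROJECT_DIR="$HOME/Documents/data_science_project_1"

# -p creates any missing parent directories and does not fail
# if the directories already exist.
mkdir -p "$PROJECT_DIR/data"

# confirm that the data directory is there
ls -d "$PROJECT_DIR/data"
```

Because of the -p option, the command can be run again later without errors, which is handy if you turn these steps into a setup script.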
To manage environments for our Python code, we will install Miniconda and use conda. First, download the installer and follow the instructions at https://conda.io/projects/conda/en/latest/user-guide/install/linux.html. Then open a new terminal on your Linux system; you will probably see (base) at the start of the prompt, like this:
(base) fabio@fabio-K50IJ:~$
That means that the (base) environment is activated by default. We will create a new environment and install some packages we will use in the project.
# create a new environment named 'datasc'
conda create --name datasc
# switch to the new environment
conda activate datasc
# now you should see something like this:
(datasc) fabio@fabio-K50IJ:~$
# list the packages installed in the environment (now it is empty):
conda list
# list the versions of a package available for installation:
conda search scikit-learn
# At the time of writing, the latest release of scikit-learn is 0.23.2;
# it appears at the end of the list
# install a package:
conda install scikit-learn
# now it (and its dependencies) will appear in the list of installed packages;
# note that the latest release of the package has been installed;
# moreover, note that the installation has set up a Python version as well:
conda list
# install other packages
conda install numpy
conda install pandas
conda install pymongo
conda install matplotlib
conda install -c conda-forge jupyterlab
# let's install Spyder as well, a useful IDE for Python
conda install spyder
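Once the installations have finished, it can be worth checking that the libraries really are importable from the active environment. The snippet below is only a sketch: check_module is a hypothetical helper written for this example, and it assumes that the datasc environment is active so that python points to the interpreter conda installed. Note that scikit-learn is imported under the name sklearn.

```shell
# check_module is a hypothetical helper, not part of conda:
# it reports whether the active Python interpreter can import a module.
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: MISSING"
    fi
}

# import names of the libraries installed above
for module in sklearn numpy pandas pymongo matplotlib; do
    check_module "$module"
done
```

If a module is reported as MISSING, first check that the prompt shows (datasc) and then repeat the corresponding conda install command.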
We may need to create more directories or install more packages later, but for now we are ready to start the project.