To set up a deep learning server via Docker instead, go here. This guide assumes that you already have a machine running Linux/Ubuntu.
First, type the following command to list the recommended drivers for your PC.
ubuntu-drivers devices
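If you want to pick the recommended driver out of that listing automatically, a minimal sketch follows. It assumes the usual output layout, where each driver line looks like "driver : nvidia-driver-470 - distro non-free recommended", so the package name is the third whitespace-separated field. The function name recommended_driver is only an illustration.

```shell
# Print the package name from the first line marked "recommended".
# Reads the output of `ubuntu-drivers devices` on standard input.
recommended_driver() {
    awk '/recommended/ { print $3; exit }'
}

# Only query the real tool when it is available on this machine.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | recommended_driver
fi
```

If the function prints nothing, no line was marked as recommended and you should read the full listing yourself.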
Now install the GPU driver. I will install the 470 driver, the version recommended in the output above, so let's proceed.
# Update the repository.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# Check that the recommended driver is available.
apt-cache search nvidia | grep nvidia-driver-470
Now let's install the driver using APT.
# Install the driver with apt.
sudo apt-get install nvidia-driver-470
# Reboot.
sudo reboot
※ If an error occurs during the NVIDIA installation, you cannot proceed, or the desired version does not show up or run, uninstall the driver completely with
sudo apt --purge autoremove 'nvidia*'
After installation, verify the driver by running
nvidia-smi
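For a scriptable version of this check, nvidia-smi can print selected fields as CSV. The sketch below prints one line per GPU and extracts the major driver version. The helper name driver_major is hypothetical, and the guard lets the snippet run without error on a machine that has no driver yet.

```shell
# Extract the major driver version, e.g. "470.42.01" -> "470".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU: name, driver version, total memory.
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
        || echo "nvidia-smi failed; the driver may not be loaded"
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
    echo "Driver major version: $(driver_major "$version")"
else
    echo "nvidia-smi not found; the driver is not installed or not on PATH"
fi
```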
To use TensorFlow or PyTorch we need to install the CUDA toolkit. The important thing here is that each version of these libraries requires specific versions of the CUDA toolkit and cuDNN to be compatible.
We will install CUDA 11.4, the version used in the commands below, which ships with the 470 driver series installed above. Follow this link to get your desired version, and select the appropriate options.
Then type the commands shown on that page, or the ones below (these are for Ubuntu 18.04).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.0-470.42.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.0-470.42.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
The third command downloads the installer, so it might take a while depending on your internet speed.
Next, set the environment variables in ~/.bashrc. Open the file with
gedit ~/.bashrc
You can use vim or another editor too, but I am more comfortable with this one. Then add the following lines at the end of the file.
# CUDA path setup
export CUDA_HOME=/usr/local/cuda-11.4
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# end
Change 11.4 to your installed version, then reboot so the new variables take effect.
sudo reboot
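After logging back in, you can confirm that the variables were picked up. The following is a small sketch, assuming the CUDA 11.4 paths used above. The helper path_contains is a hypothetical name, and it simply checks whether a directory is one of the colon-separated entries of a search path.

```shell
# Return success if directory $1 is an entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

cuda_bin=/usr/local/cuda-11.4/bin   # adjust to your installed version
if path_contains "$cuda_bin" "$PATH"; then
    echo "PATH contains $cuda_bin"
else
    echo "PATH is missing $cuda_bin; re-check ~/.bashrc"
fi
```

The same function works for LD_LIBRARY_PATH, for example path_contains /usr/local/cuda-11.4/lib64 "$LD_LIBRARY_PATH".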
If everything went well, you can check your CUDA version by typing
nvcc --version
# or
nvcc -V
and you will see the installed release (11.4 in this guide) in the output.
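If you only need the release number, for instance in a setup script, it can be pulled out of the nvcc output. This sketch assumes the output contains a line of the form "Cuda compilation tools, release 11.4, V11.4.48". The function name cuda_release is an illustration only.

```shell
# Print the "major.minor" release found in `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | cuda_release
else
    echo "nvcc not found; check that the CUDA bin directory is on PATH"
fi
```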
If not, you will have to uninstall everything related to NVIDIA using the command mentioned above and debug. Also remove the CUDA toolkit with the command below and delete any CUDA files remaining in /usr/local.
sudo apt-get autoremove --purge cuda
You need an NVIDIA account before downloading cuDNN. Each CUDA toolkit has its compatible cuDNN versions, so keep that in mind. After logging in, follow this link to download cuDNN. I will download cuDNN v8.2.4 as it is compatible with CUDA 11.4.
There are many ways to install cuDNN; I will show you one method which I think is easy.
Download the cuDNN Library for Linux [x86_64] as a .tgz archive. Then cd to the download directory and type the following commands.
# This will extract all the files into the same directory.
tar -xzvf <full name of the file>.tgz
Then copy some files to where CUDA is installed by typing the commands below. In newer versions the lib64 directory might be replaced by just lib; in that case update the commands by removing 64 before copying the files.
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
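Then check the cuDNN installation by typing
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
Since cuDNN 8 the version macros are defined in cudnn_version.h rather than cudnn.h, so the same command run against cudnn.h might not output anything. As long as it does not give an error, just proceed.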
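To get the version as a single string, the macros can be read directly from the header. This is a sketch that assumes the headers were copied to /usr/local/cuda/include as above. It tries cudnn_version.h first (cuDNN 8 and later) and falls back to cudnn.h (older releases). The function name cudnn_version is hypothetical.

```shell
# Print "major.minor.patch" from the #define lines of a cuDNN header file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Report the first header that actually defines the version macros.
for header in /usr/local/cuda/include/cudnn_version.h /usr/local/cuda/include/cudnn.h; do
    if [ -f "$header" ]; then
        found=$(cudnn_version "$header")
        if [ -n "$found" ]; then
            echo "cuDNN $found (from $header)"
            break
        fi
    fi
done
```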
Download the Anaconda installer from the link. The installer is a bash script, so navigate (cd) to the download directory and run it with
bash Anaconda3-2020.02-Linux-x86_64.sh
Check the name of the downloaded file and adjust the command accordingly.
During the Anaconda installation you might have to press Enter multiple times, and it will ask for permission several times; just go with the flow and allow the default installation to proceed.
Then restart your terminal and you will see (base) at the start of your prompt.
Then verify that your shell's configuration file (e.g., .bashrc, .bash_profile, or .zshrc) contains the necessary lines to initialize Conda. Open the configuration file with a text editor and check for lines like:
# Anaconda3
export PATH="/home/your_username/anaconda3/bin:$PATH"
Then reopen the terminal and run conda init.
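Checking the configuration file can also be scripted. The sketch below assumes a recent conda release, where conda init writes a block that starts with the marker "# >>> conda initialize >>>"; older setups may only have the export PATH line shown above. The function name has_conda_init is hypothetical.

```shell
# Return success if the given shell rc file contains the block written by `conda init`.
has_conda_init() {
    grep -qF '>>> conda initialize >>>' "$1" 2>/dev/null
}

rc_file="$HOME/.bashrc"   # use ~/.zshrc or ~/.bash_profile if that is your shell's file
if has_conda_init "$rc_file"; then
    echo "conda is initialized in $rc_file"
else
    echo "no conda block in $rc_file; run: conda init bash"
fi
```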
We will create two environments with conda, one for TensorFlow and one for PyTorch.
To create an environment, type
conda create -n <env_name> python=x.x
# activate it with
conda activate <env_name>
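Then install TensorFlow via pip.
# pip ships with conda environments; install it system-wide only if it is missing
sudo apt install python3-pip
# install TensorFlow
pip install tensorflow-gpu==2.x.x
Test your installation with
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
If it prints True and you can see the names of your GPUs and their memory in the output, then your installation is successful. In recent TensorFlow releases this call is deprecated in favor of tf.config.list_physical_devices('GPU'), which returns a non-empty list when a GPU is visible.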
Install PyTorch with
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Then test the installation via
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name())"
and it will print True followed by the name of the current GPU in the machine.
To start the SSH server, follow the steps below.
sudo apt update
sudo apt install openssh-server
When you install SSH, it runs automatically. You can check if SSH is running with the following command:
sudo systemctl status ssh
If it shows active (running), then it is running.
If it is not running, enable and start it with the following commands.
sudo systemctl enable ssh
sudo systemctl start ssh
If you are using a firewall, make sure to allow ssh. If your firewall is disabled, you can ignore it.
sudo ufw allow ssh
Firewall is disabled by default, and you can check the status with the following command.
sudo ufw status
You can also change the port of your SSH server if you want. To do that, open the configuration file with
sudo gedit /etc/ssh/sshd_config
and locate the line
#Port 22
Uncomment it and change the port number.
Port <new port>
# Reboot the system for the change to take effect.
sudo reboot
Restarting the SSH server on Linux, without a full reboot:
sudo /etc/init.d/ssh restart
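The port edit can also be done without opening an editor. The following is a minimal sketch, not the only way to do it: it writes a modified copy of the configuration to a scratch file so the original stays untouched until you have checked the result. The function name set_ssh_port and the port 2222 are placeholders.

```shell
# Write a copy of sshd_config $1 to $3 with the (possibly commented) Port line set to $2.
set_ssh_port() {
    sed "s/^#\{0,1\}[[:space:]]*Port[[:space:]][[:space:]]*[0-9][0-9]*[[:space:]]*\$/Port $2/" "$1" > "$3"
}

if [ -r /etc/ssh/sshd_config ]; then
    candidate=$(mktemp)
    set_ssh_port /etc/ssh/sshd_config 2222 "$candidate"
    grep '^Port' "$candidate" || echo "no Port line found in /etc/ssh/sshd_config"
    echo "Candidate written to $candidate"
    echo "Validate it with: sudo sshd -t -f $candidate"
fi
```

Once the candidate file passes sshd -t, copy it over /etc/ssh/sshd_config with sudo and restart the SSH server. If ufw is enabled, the new port also has to be allowed, for example with sudo ufw allow 2222/tcp.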
To customize the welcome screen shown at SSH login, you need to edit two files:
/etc/motd (Message of the Day)
/etc/ssh/sshd_config: here, uncomment the PrintLastLog setting and change it to no; this will disable the "Last login" message.
Then restart sshd.
Spyder is the simplest, easiest IDE available for data science projects. If you are new to ML/DL, then this is the best IDE for you. After creating your env, you can install Spyder via the following command
conda install spyder
and to use it, just activate your env and type spyder in the terminal.
To install VS Code on a Linux server via snap, type
sudo snap install code --classic
and then run code from the terminal, and you are all set.
Follow the instructions in this BlogPost to connect to your SSH server via VS Code. It is in Korean, so turn on Google Translate.
Just type ssh username@your_ip_address and press Enter.
Then VS Code will ask you for the password. If it gives an error, your server's port may be something other than the default port 22; in that case specify the custom port by editing the config file of VS Code as below. You can edit it by clicking the dialog box which opens when you enter your user name and IP.
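As an illustration of what such a config entry can look like, the sketch below prints a host block in the standard OpenSSH client format (Host, HostName, User, Port), which is the format the VS Code Remote-SSH extension reads. The alias dl-server, the address 192.168.0.10, the user name and the port 2222 are all placeholders, and ssh_host_entry is a hypothetical helper.

```shell
# Print an SSH client config block: alias, host address, user, port.
ssh_host_entry() {
    printf 'Host %s\n    HostName %s\n    User %s\n    Port %s\n' "$1" "$2" "$3" "$4"
}

# Preview the entry; append it to ~/.ssh/config once it looks right:
#   ssh_host_entry dl-server 192.168.0.10 username 2222 >> ~/.ssh/config
ssh_host_entry dl-server 192.168.0.10 username 2222
```

With this entry in place, ssh dl-server from a terminal uses the custom port automatically. For a one-off connection, ssh -p 2222 username@your_ip_address does the same without a config file.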
Follow steps here: Link
Install gpustat via
pip install gpustat
# then, to see GPU usage, type
gpustat -cp
You will see a compact view of GPU usage; to watch it continuously, type
watch -c gpustat -cp --color
Reference:
https://gist.github.com/denguir/b21aa66ae7fb1089655dd9de8351a202
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu