Keith Kim : Notes on CUDA and Tensorflow

This is a note to myself. I just had to re-install TensorFlow and wanted to put some notes for the record.

This is about installing CUDA, Anaconda, and TensorFlow.

Environment: Windows 10 Pro 64-bit

I have two old graphics cards:

Tesla C2050 - https://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf
Quadro 600 - https://www.nvidia.com/object/product-quadro-600-us.html

I bought the Tesla for GPU programming many years ago, before TensorFlow came out, and paid good money for it. Now it goes for only $60-$70 on eBay.

I can still use these old cards with frameworks other than TensorFlow. To stay compatible with all my cards, I have to stick with CUDA 8, the last release that supports the older cards' Fermi architecture (compute capability 2.x). The latest CUDA is 10.1, and it requires newer GPUs.
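As a rough illustration of that reasoning, here is a small sketch that picks the newest CUDA toolkit able to target every card. The compute capabilities come from the deviceQuery output later in this note. The per-toolkit minimums are my own assumption (CUDA 8.0 still targets 2.x, CUDA 9.0 and later start at 3.0), so check them against NVIDIA's release notes. The names `MIN_CAPABILITY`, `MY_CARDS` and `newest_toolkit_for` are made up for this example.

```python
# Lowest compute capability each CUDA toolkit can target.
# Assumption from memory: CUDA 8.0 still supports Fermi (2.x),
# CUDA 9.0 and later start at 3.0.
MIN_CAPABILITY = {
    "8.0": (2, 0),
    "9.0": (3, 0),
    "10.1": (3, 0),
}

# Compute capabilities of my cards, taken from the deviceQuery output.
MY_CARDS = {
    "Quadro P2000": (6, 1),
    "Tesla C2050": (2, 0),
    "Quadro 600": (2, 1),
}


def version_key(version):
    """Turn "10.1" into (10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_toolkit_for(cards, toolkits=MIN_CAPABILITY):
    """Return the newest toolkit that supports every card, or None."""
    oldest = min(cards.values())
    usable = [v for v, floor in toolkits.items() if floor <= oldest]
    return max(usable, key=version_key, default=None)


if __name__ == "__main__":
    print(newest_toolkit_for(MY_CARDS))
```

With all three cards the oldest capability is 2.0, so only CUDA 8.0 qualifies. With the P2000 alone, the same function would pick 10.1.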

Because of some TensorFlow work I had to do last year, I bought a more recent graphics card for it:

Quadro P2000 - https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/documents/Quadro-P2000-US-03Feb17.pdf

TensorFlow's GPU build requires a minimum compute capability of 3.0, and the Quadro P2000 (compute capability 6.1) performs decently for experimental work. I used it alongside AWS. When I compared the P2000 against a small AWS GPU instance, taking into account how long the work would run, disk space, and so on, buying the card turned out to be a good decision. For larger projects with a good budget, I would explore the AWS option.

So there are three graphics cards: two are connected to actual monitors, and the P2000 is used only for GPU computing with TensorFlow.

To use TensorFlow on the Quadro P2000 and stay backward compatible with the older cards for other frameworks, C++, etc.:

Install CUDA 8.0 + cuDNN 6
Install Anaconda

Next, set up Python 3.5 for TensorFlow v1.3 with CUDA 8. Open the Anaconda console. By the way, I use ConEmu, so I use this entry for the Anaconda task:

%windir%\System32\cmd.exe "/K" C:\opt\Anaconda3\Scripts\activate.bat C:\opt\Anaconda3

Then, in the Anaconda console, create an environment for TensorFlow:

(base) C:\Users\kkim> conda create --name tensorflow python=3.5
(base) C:\Users\kkim> activate tensorflow

Then install all the required packages:

(tensorflow) C:\Users\kkim>conda install pandas matplotlib jupyter notebook scipy scikit-learn numpy nb_conda pillow h5py pyhamcrest cython

Now install TensorFlow and Keras. If you don't want Keras, you can install TF only. To install Keras, you have to follow these odd steps: install TF, install Keras, uninstall TF, and then install TF again.

Keras needs TF to be installed, but installing Keras breaks something in the TF installation. The fix is to uninstall TF and install it again. See Reference #4:

(tensorflow) C:\Users\kkim>pip install tensorflow-gpu==1.3
(tensorflow) C:\Users\kkim>pip install keras
(tensorflow) C:\Users\kkim>pip uninstall tensorflow-gpu
(tensorflow) C:\Users\kkim>pip install tensorflow-gpu==1.3
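After the re-install, a quick check that TF can actually see a GPU is worth doing. Below is a minimal sketch, assuming the TF 1.x helper `device_lib.list_local_devices()` is available in this version. The function `gpu_names` is just a name I picked for the example. The import is wrapped so the script still runs (and says so) if TensorFlow is missing.

```python
def gpu_names(devices):
    """Return the names of the entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def main():
    try:
        # TF 1.x helper that lists the devices TensorFlow can see.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
        return
    print(gpu_names(device_lib.list_local_devices()))


if __name__ == "__main__":
    main()
```

Run it inside the activated tensorflow environment. An empty list would suggest that the CPU-only package got installed, or that CUDA or cuDNN cannot be found.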

Because of CUDA 8 and cuDNN 6, I pin TF to v1.3. TF v1.5 and later are built against CUDA 9 or newer (see Reference #2).
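To keep the version matching straight, it can help to write the table down as data. This is only a sketch: the entries are from my memory of the table in Reference #2 and cover just a few releases, so treat them as assumptions and check the reference before relying on them. By that table, v1.4 appears to be listed against CUDA 8 and cuDNN 6 as well, although this note only covers v1.3. `TF_BUILDS` and `tf_versions_for` are hypothetical names.

```python
# (CUDA, cuDNN) that the prebuilt tensorflow-gpu packages were built against.
# Assumption: written from memory of the table in Reference #2.
TF_BUILDS = {
    "1.3": ("8.0", "6"),
    "1.4": ("8.0", "6"),
    "1.5": ("9.0", "7"),
    "1.13": ("10.0", "7.4"),
}


def version_key(version):
    """Turn "1.13" into (1, 13) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def tf_versions_for(cuda, cudnn, builds=TF_BUILDS):
    """Return the TF versions in the table built against this CUDA/cuDNN pair."""
    matches = [tf for tf, built in builds.items() if built == (cuda, cudnn)]
    return sorted(matches, key=version_key)


if __name__ == "__main__":
    print(tf_versions_for("8.0", "6"))
```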

All done. Now it's time to have fun with Jupyter and TF. TF will use only the Quadro P2000, but the CUDA SDK and other frameworks can use all three cards.
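If you would rather make the card selection explicit than rely on TF skipping the old cards, the standard CUDA_VISIBLE_DEVICES environment variable can hide them. Here is a sketch under a few assumptions: the device indices match the deviceQuery output in this note (the default CUDA ordering), the 3.0 minimum is the one stated above, and `DEVICES` and `visible_devices` are names invented for the example.

```python
import os

# Device index -> (name, compute capability), copied from deviceQuery.
DEVICES = {
    0: ("Quadro P2000", (6, 1)),
    1: ("Tesla C2050", (2, 0)),
    2: ("Quadro 600", (2, 1)),
}

# Minimum compute capability for TensorFlow GPU, as stated in this note.
TF_MIN_CAPABILITY = (3, 0)


def visible_devices(devices, floor=TF_MIN_CAPABILITY):
    """Build a CUDA_VISIBLE_DEVICES value listing cards at or above floor."""
    keep = [str(i) for i, (_, cap) in sorted(devices.items()) if cap >= floor]
    return ",".join(keep)


if __name__ == "__main__":
    # Must be set before TensorFlow (or any CUDA library) initializes.
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(DEVICES)
    print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The variable only affects the process that sets it, so other frameworks started from a different console still see all three cards.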

Just a note...

Here is the output of deviceQuery, an example program that comes with the CUDA SDK from NVIDIA:

deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 3 CUDA Capable device(s)

Device 0: "Quadro P2000"
CUDA Driver Version / Runtime Version          8.0 / 8.0
CUDA Capability Major/Minor version number:    6.1
Total amount of global memory:                 5120 MBytes (5368709120 bytes)
( 8) Multiprocessors, (128) CUDA Cores/MP:     1024 CUDA Cores
GPU Max Clock rate:                            1481 MHz (1.48 GHz)
Memory Clock rate:                             3504 Mhz
Memory Bus Width:                              160-bit
L2 Cache Size:                                 1310720 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 65536
Warp size:                                     32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
Run time limit on kernels:                     Yes
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA):      Yes
Device PCI Domain ID / Bus ID / location ID:   0 / 9 / 0
Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla C2050"
CUDA Driver Version / Runtime Version          8.0 / 8.0
CUDA Capability Major/Minor version number:    2.0
Total amount of global memory:                 3072 MBytes (3221225472 bytes)
(14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
GPU Max Clock rate:                            1147 MHz (1.15 GHz)
Memory Clock rate:                             1500 Mhz
Memory Bus Width:                              384-bit
L2 Cache Size:                                 786432 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 32768
Warp size:                                     32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
Run time limit on kernels:                     Yes
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA):      Yes
Device PCI Domain ID / Bus ID / location ID:   0 / 5 / 0
Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "Quadro 600"
CUDA Driver Version / Runtime Version          8.0 / 8.0
CUDA Capability Major/Minor version number:    2.1
Total amount of global memory:                 1024 MBytes (1073741824 bytes)
( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
GPU Max Clock rate:                            1280 MHz (1.28 GHz)
Memory Clock rate:                             800 Mhz
Memory Bus Width:                              128-bit
L2 Cache Size:                                 131072 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 32768
Warp size:                                     32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
Run time limit on kernels:                     Yes
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA):      Yes
Device PCI Domain ID / Bus ID / location ID:   0 / 6 / 0
Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 3, Device0 = Quadro P2000, Device1 = Tesla C2050, Device2 = Quadro 600
Result = PASS
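The device names and compute capabilities can be pulled out of that output with a few lines of Python, which is handy when comparing machines. This is a minimal sketch that assumes the line formats shown above. `SAMPLE` is a trimmed copy of the output, and `parse_device_query` is a hypothetical helper, not something that ships with the CUDA SDK.

```python
import re

# Trimmed copy of the deviceQuery output above.
SAMPLE = '''\
Device 0: "Quadro P2000"
CUDA Capability Major/Minor version number:    6.1
Device 1: "Tesla C2050"
CUDA Capability Major/Minor version number:    2.0
Device 2: "Quadro 600"
CUDA Capability Major/Minor version number:    2.1
'''

DEVICE_RE = re.compile(r'^\s*Device (\d+): "(.+)"\s*$')
CAPABILITY_RE = re.compile(
    r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)"
)


def parse_device_query(text):
    """Return {index: (name, (major, minor))} from deviceQuery output."""
    devices = {}
    current = None  # (index, name) of the device block being read
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = (int(match.group(1)), match.group(2))
            continue
        match = CAPABILITY_RE.search(line)
        if match and current is not None:
            index, name = current
            devices[index] = (name, (int(match.group(1)), int(match.group(2))))
            current = None
    return devices


if __name__ == "__main__":
    for index, (name, cap) in sorted(parse_device_query(SAMPLE).items()):
        print(index, name, "%d.%d" % cap)
```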

References

1. https://ulrik.is/writing/keras-tensorflow-with-cuda-8-and-cudnn-on-windows-10/
2. CUDA and TensorFlow version compatibility: https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible#50622526
3. https://medium.com/@minhplayer95/how-to-install-tensorflow-with-gpu-support-on-windows-10-with-anaconda-4e80a8beaaf0
4. Installing TensorFlow with Keras: https://github.com/keras-team/keras/issues/5776

Keith Kim

March 8, 2019
