GPU training of neural network with parallel computing toolbox unreasonably slow, what am I missing?

I’m trying to speed up the training of some NARNET neural networks using the GPU support in the Parallel Computing Toolbox, but so far I haven’t gotten it to work well. Or rather, it works, but it’s unreasonably slow. According to the documentation, training on a GPU instead of the CPU shouldn’t be any harder than adding the arguments 'useGPU','yes' to the training command. However, if I create some dummy data, for example a sine wave with 901 values, and train a NARNET on it using the CPU like so:

%CPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 ); 
[ Xs, Xsi, Asi, Ts] = preparets( net, {}, {}, T );
rng(0)
net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'showResources','yes' );
toc %2.77

The training takes less than 3 seconds. But when I do the exact same thing on a CUDA-capable GTX 760 GPU:

%GPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 ); 
[ Xs, Xsi, Asi, Ts] = preparets( net, {}, {}, T );
rng(0)
net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'useGPU','yes','showResources','yes' );
toc % 1247.6

Incredibly, the training takes over 20 minutes!

I’ve read through MathWorks’ fairly extensive documentation on parallel and GPU computing with the Neural Network Toolbox and seen that there are a few things that can or should be done when computing on a GPU: for example, converting the input and target data to GPU arrays with the nndata2gpu command before training, and replacing any tansig activation functions with elliotsig. Doing both does speed up the training a bit:

%Improved GPU training
T = num2cell(sin(1:0.01:10));
net = narnet( 1:2, 10 ); 
[ Xs, Xsi, Asi, Ts ] = preparets( net, {}, {}, T );
rng(0)

net = configure(net,Xs,Ts); 
Xs = nndata2gpu(Xs);
Ts = nndata2gpu(Ts);
Xsi = nndata2gpu(Xsi);

for i=1:net.numLayers
  if strcmp(net.layers{i}.transferFcn,'tansig')
    net.layers{i}.transferFcn = 'elliotsig';
  end
end

net.trainFcn = 'trainscg';
tic
net = train(net,Xs,Ts,'showResources','yes' );
toc  %70.79

The training here takes only about 70 seconds, but that is still many times slower than just doing it on my CPU. I’ve tried several different data series sizes and network architectures, but I’ve never seen GPU training compete with the CPU, which is strange since, as I understand it, most professional ANN research is done on GPUs.

What am I doing wrong here? Clearly I must be missing something fundamental.

ANSWER

Getting a speedup from a GPU requires a couple of things:

1) The time spent in gradient calculations (which happen on the CPU or GPU, as you request) must be significant compared to the training step update (which still happens on the CPU).

2) The problem must allow enough parallelism to run efficiently on the GPU's cores, which are individually much slower than CPU cores but far more numerous.

Both requirements are easier to meet with a larger dataset and a larger neural network: more parallelism can be exploited, and a greater percentage of the calculations is in the gradient, so the training steps are not a speed bottleneck.
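As a rough illustration of the two requirements above, here is a minimal sketch that times CPU and GPU training on a much larger problem than the NAR example. The dataset size, input dimension, hidden layer size, epoch count, and target function are all assumptions chosen for illustration, not values from the original answer, and whether the GPU actually comes out ahead will depend on your hardware.

%Hypothetical CPU vs GPU timing on a larger static problem (all sizes are assumptions)
rng(0)
numSamples = 200000;                    % assumed dataset size, adjust to fit GPU memory
numInputs  = 50;                        % assumed input dimension
X = rand(numInputs, numSamples);
T = sin(sum(X, 1));                     % arbitrary smooth target, for illustration only

net = feedforwardnet(100, 'trainscg');  % 100 hidden neurons (assumed)
net.divideFcn = 'dividetrain';          % train on all data, so no validation stops
net.trainParam.epochs = 20;             % short run with the same epoch limit for both
net.trainParam.showWindow = false;      % keep the training GUI out of the timing
net = configure(net, X, T);             % initialize weights once so both runs start equal

tic
netCpu = train(net, X, T);
tCpu = toc;

tGpu = NaN;                             % stays NaN if no supported GPU is found
if gpuDeviceCount > 0                   % requires Parallel Computing Toolbox
  tic
  netGpu = train(net, X, T, 'useGPU', 'yes');
  tGpu = toc;
end
fprintf('CPU: %.2f s, GPU: %.2f s\n', tCpu, tGpu);

Shrinking numSamples and the hidden layer size back toward the 899 steps and 10 neurons of the NAR example is one way to see how the balance shifts as the problem gets smaller.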

The NAR problem you defined only has 899 steps with a 10 neuron network. The fact that both dataset

See the complete answer at the link below:

https://www.matlabsolutions.com/resources/gpu-training-of-neural-network-with-parallel-computing-toolbox-unreasonably-slow-what-am-i-missing-.php
