Bobby Anguelov's Blog

A day in the life of a wannabe game developer

Basic Neural Network Tutorial : C++ Implementation and Source Code

Introduction

So I’ve now finished the first version of my second neural network tutorial, covering the implementation and training of a neural network. I noticed mistakes and better ways of phrasing things in the first tutorial (thanks for the comments guys) and rewrote large sections. This will probably occur with this tutorial in the coming week so please bear with me. I’m pretty overloaded with work and assignments so I haven’t been able to dedicate as much time as I would have liked to this tutorial. Even so, I feel it’s rather complete, and any gaps will be filled in by my source code.

Implementation

Okay so how do we implement our neural network? I’m not going to cover every aspect in great detail since you can just look at my source code. I’m just going to go over the very basic architecture. So what critical data storage do we need for our neural network?

  • Our neuron values
  • Our weights
  • Our weight changes
  • Our error gradients

Now I’ve seen various implementations and wait for it… here comes an OO rant: I don’t understand why people feel the need to encapsulate everything in classes. The most common implementation of a neural network I’ve come across is the one where each neuron is modeled as an object and the neural network stores all these objects. Now explain to me the advantages of such an approach? All you’re doing is wasting memory and code space. There is no sane reason to do this apart from trying to be super OO.

The other common approach is to model each layer as an object. Again, what the hell are people smoking? I guess I can blame this on idiot professors that know the theory but can’t code their way out of a paper bag. For example, my father is a maths professor and is absolutely brilliant at developing algorithms and seeing elegant solutions to problems, but he cannot write code at all. I’ve had to convert his original discrete pulse transform Matlab code into C++, and in the process of converting/optimizing it, I ended up developing a whole new approach and technique that wasn’t possible to achieve directly from the maths – the so called fast discrete pulse transform.

I also tend to be a bit of a perfectionist and am a firm believer in Occam’s razor (well, a modified version) – “the simplest solution is usually the best solution”. Okay, enough ranting – the point is don’t use classes when you don’t need to! Useless encapsulation is a terrible, terrible mistake, one that most college students make since it’s drilled into them during their years of study!

So below is how I structured my neural network and afaik it’s as efficient as possible. If anyone can further optimize my implementation please do!  I’d love to see your techniques (I love learning how to make code faster).

A neural network is a very simple thing and so must be implemented very simply. All the above values are just numbers, so why can’t I just use multi-dimensional arrays? The short answer is that I can and I do. Storing each of those sets of numbers in multi-dimensional arrays is the simplest and most efficient way of doing it, with the added benefit that if you use the index variables i/j/k for input/hidden/output loops respectively, then your code will almost exactly resemble the formulas in the theory tutorial and in this one.

So now that the major architecture design decision has been mentioned, here’s what the end result looks like:
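A minimal sketch of that array-based layout follows. It is not the code from the zip file: the struct and member names are hypothetical, bias neurons are left out for brevity, and flat std::vector storage with manual i/j/k indexing stands in for raw multi-dimensional arrays.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the four kinds of data listed above.
// Bias neurons are left out to keep the sketch short.
struct Network {
    std::size_t nInput, nHidden, nOutput;

    // Neuron values
    std::vector<double> inputNeurons, hiddenNeurons, outputNeurons;
    // Weights, flattened row-major: wInputHidden[i * nHidden + j]
    std::vector<double> wInputHidden, wHiddenOutput;
    // Weight changes (same layout as the weights)
    std::vector<double> deltaInputHidden, deltaHiddenOutput;
    // Error gradients
    std::vector<double> hiddenErrorGradients, outputErrorGradients;

    Network(std::size_t in, std::size_t hid, std::size_t out)
        : nInput(in), nHidden(hid), nOutput(out),
          inputNeurons(in, 0.0), hiddenNeurons(hid, 0.0), outputNeurons(out, 0.0),
          wInputHidden(in * hid, 0.0), wHiddenOutput(hid * out, 0.0),
          deltaInputHidden(in * hid, 0.0), deltaHiddenOutput(hid * out, 0.0),
          hiddenErrorGradients(hid, 0.0), outputErrorGradients(out, 0.0) {}

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Feed one pattern forward; i/j/k index input/hidden/output neurons.
    void feedForward(const std::vector<double>& pattern) {
        for (std::size_t i = 0; i < nInput; ++i) inputNeurons[i] = pattern[i];

        for (std::size_t j = 0; j < nHidden; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < nInput; ++i)
                sum += inputNeurons[i] * wInputHidden[i * nHidden + j];
            hiddenNeurons[j] = sigmoid(sum);
        }

        for (std::size_t k = 0; k < nOutput; ++k) {
            double sum = 0.0;
            for (std::size_t j = 0; j < nHidden; ++j)
                sum += hiddenNeurons[j] * wHiddenOutput[j * nOutput + k];
            outputNeurons[k] = sigmoid(sum);
        }
    }
};
```

Constructing Network nn(16, 10, 1) gives 16 x 10 input-to-hidden weights and 10 x 1 hidden-to-output weights, all zero until they are initialized.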

Initialization of Neural Network

Okay so how do we initialize our neural network? Well again it’s super simple: we set all the values of the inputs, deltas and error gradients to 0. What about the weights?

The weights between the layers have to be randomly initialized. They are usually set to small values, often in the range [-0.5, 0.5], but there are many other initialization strategies available. For example, we can scale the range by the number of inputs, changing it to [-2.4/n, 2.4/n], where n is the number of input neurons.
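Both ranges can be sketched with the standard <random> header. The function names below are hypothetical and the choice of std::mt19937 as the generator is an assumption, not something taken from the project code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fill `weights` with uniform random values in [-range, range).
void initWeights(std::vector<double>& weights, double range, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-range, range);
    for (double& w : weights) w = dist(rng);
}

// Range scaled by the number of input neurons: 2.4 / n.
double scaledRange(std::size_t nInputs) {
    return 2.4 / static_cast<double>(nInputs);
}
```

For the 16-input problem used later, initWeights(w, scaledRange(16), rng) draws every weight from roughly [-0.15, 0.15), while initWeights(w, 0.5, rng) gives the plain [-0.5, 0.5) range.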

Technically you can set the weights to a fixed value, but this will be terribly inefficient: if every weight starts out identical, all the hidden neurons receive identical updates and have a hard time learning anything different from one another. The whole point of setting the initialization range is to reduce the number of epochs required for training. Using weights closer to the required ones will obviously need fewer changes than weights that differ greatly.

If, for example, you have an implementation where the incoming training data is always very similar (say, an image recognition system that trains on various silhouettes and their classifications), then after a few training sessions with different data you can observe the range in which the final weights are found and initialize the weights randomly within that range. This technique will further reduce the number of epochs required to train the network.

Training data and training problem

So we’ve created our neural network but what good does that do without data to train it on? I’ve selected a letter classification problem as an example. I downloaded a letter recognition dataset from the UCI machine learning repository. ( http://archive.ics.uci.edu/ml/datasets/Letter+Recognition ).

I’ll be using this test data to explain the basics of training your neural network. The first thing is to format the dataset. In our case we had a list of 16 attributes and the letter those attributes represent. To make life simple, the first problem I selected was to distinguish the letter ‘a’ from the rest. So I simply replaced the letter with a 1 for an “a” and a 0 for everything else; this effectively reduces the output to a single Boolean value.

Another problem I’ve included is a vowel recognition problem, where the output neuron is 1 for (a,e,i,o,u) and 0 for all other letters. This is a more complex problem and as such the accuracy achieved will be much lower.
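The two label conversions are simple enough to sketch directly. The function names are hypothetical, and treating upper and lower case letters alike is an assumption about how the letter column is stored.

```cpp
#include <cctype>
#include <string>

// Target for the first problem: 1 for the letter 'a', 0 for everything else.
double targetIsA(char letter) {
    return std::tolower(static_cast<unsigned char>(letter)) == 'a' ? 1.0 : 0.0;
}

// Target for the vowel problem: 1 for a, e, i, o, u and 0 for all other letters.
double targetIsVowel(char letter) {
    const std::string vowels = "aeiou";
    const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(letter)));
    return vowels.find(c) != std::string::npos ? 1.0 : 0.0;
}
```

Each row of the dataset then becomes 16 input values plus one of these targets.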

So you can already see that we have 16 inputs (the attributes) and a single output, so most of our NN architecture is done. How many hidden neurons do we need? Well, we don’t know; we have to play around with different numbers of hidden neurons till we get a good result.

I’ve also noticed that there seems to be an accuracy ceiling for every NN’s architecture. A good method to find out how many hidden neurons you need would be to: train the network several times (10+) and see what the average accuracy is at the end and then use this accuracy as a measure to compare different architectures.

Something that was asked in the comments section was what the range for the input values is. Well, the sigmoid function is continuous over the range (-inf, inf) so any values will work, but it is a good idea to keep the values within the active region (region of greatest change / steepest gradient) of your activation function.

Okay, moving along. We’ve “formatted” our data, so the next step is to split it up so that we can train our neural network with it.

The training data sets

So in our case we have this massive data set of 20000 patterns (entries). We can’t just stick all of the data into our network, since the network will learn that data and we have no way of checking how well the network will do with unseen data. This problem is referred to as over-fitting: basically the network starts remembering the input data and will fail to correctly handle unseen data. I don’t want to go into too much detail here as this is something of an advanced research topic. Once you are familiar with neural networks you can read up on over-fitting on your own.

So we don’t want the network to memorize the input data, so obviously we’ll have to separate our training set. Classically we split the data set into three parts: the training data, the generalization data and the validation data. It is also often recommended to shuffle the initial dataset before splitting it to ensure that your data sets are different each time.

  • The training data is what we use to train the network and update the weights with, so it must be the largest chunk.
  • Generalization data is data that we’ll run through the NN at the end of each epoch to see how well the network manages to handle unseen data.
  • The validation data will be run through the neural network once training has completed (i.e. the stopping conditions have been met); this gives us our final validation error.

The classic split of the dataset is 60%, 20%, 20% for the training, generalization and validation data sets respectively. Let’s ignore the validation set for now; the training set is used to train the network, so if you remember from the last tutorial, for every piece of data (pattern) in the training set the following occurs:

  • Feed pattern through NN
  • Check errors and calculate error gradients
  • Update weights according to error gradients (back propagation)

Once we have processed all the patterns in the training data set, the process begins again from the start. Each run through of all the patterns in the training set is called an epoch. This brings us to the question how do we decide how many epochs to run?
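The shuffle and the 60/20/20 split described earlier in this section can be sketched as follows. The Pattern and DataSplit types and the splitData function are hypothetical names used only for this illustration; they are not the data reader class from the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// One entry of the dataset: input attributes plus desired output(s).
struct Pattern {
    std::vector<double> inputs;
    std::vector<double> targets;
};

struct DataSplit {
    std::vector<Pattern> training, generalization, validation;
};

// Shuffle a copy of the data, then cut it 60% / 20% / 20%.
DataSplit splitData(std::vector<Pattern> data, std::mt19937& rng) {
    std::shuffle(data.begin(), data.end(), rng);

    const auto nTrain = static_cast<std::ptrdiff_t>(data.size() * 60 / 100);
    const auto nGen = static_cast<std::ptrdiff_t>(data.size() * 20 / 100);

    DataSplit split;
    split.training.assign(data.begin(), data.begin() + nTrain);
    split.generalization.assign(data.begin() + nTrain, data.begin() + nTrain + nGen);
    split.validation.assign(data.begin() + nTrain + nGen, data.end());  // remainder
    return split;
}
```

With the 20000 patterns of the letter dataset this yields 12000 training, 4000 generalization and 4000 validation patterns. One epoch is then a single pass over split.training.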

Stopping Conditions

There are several measures used to decide when to stop training. I’m going to list the various measures, what they mean and how to calculate them.

  • Maximum Epochs Reached – The first measure is really easy, all it means is that the NN will stop once a set number of epochs have elapsed.
  • Training Set Accuracy – This is the number of correctly classified patterns over the total number of patterns in the training set for an epoch. Obviously the higher the accuracy the better. But remember that this is the accuracy on previously seen patterns, so you can’t use this alone to stop your training.
  • Generalization Set Accuracy – This is the number of correctly classified patterns over the total number of patterns in the generalization set for an epoch. This gets calculated at the end of an epoch once all the weight changes have been completed. This represents the accuracy of the network in dealing with unseen patterns. Again you can’t use this measure alone, since it could be much higher than the training set accuracy.
  • Training Set Mean Squared Error (MSE) – This is the average of the squared errors, (desired – actual) squared, over all the patterns in the training set. This gives a more detailed measure of the current network’s accuracy; the smaller the MSE, the better the network performs.
  • Generalization Set Mean Squared Error (MSE) – This is the average of the squared errors, (desired – actual) squared, over all the patterns in the generalization set.
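As a rough sketch, both measures can be computed in one pass over a data set. The names are hypothetical, a single output neuron is assumed, and the 0.5 threshold used to turn the raw output into a 0/1 class is an assumption made for this example.

```cpp
#include <cstddef>
#include <vector>

struct SetMetrics {
    double accuracy;  // fraction of correctly classified patterns, 0..1
    double mse;       // mean of (desired - actual)^2
};

// desired[p] is the 0/1 target and actual[p] the raw network output for pattern p.
SetMetrics evaluate(const std::vector<double>& desired, const std::vector<double>& actual) {
    if (desired.empty()) return {0.0, 0.0};

    std::size_t correct = 0;
    double sumSquaredError = 0.0;
    for (std::size_t p = 0; p < desired.size(); ++p) {
        const double error = desired[p] - actual[p];
        sumSquaredError += error * error;

        // Step function: treat outputs of 0.5 and above as class 1.
        const double predicted = actual[p] >= 0.5 ? 1.0 : 0.0;
        if (predicted == desired[p]) ++correct;
    }

    const double n = static_cast<double>(desired.size());
    return {static_cast<double>(correct) / n, sumSquaredError / n};
}
```

Calling evaluate once on the training set and once on the generalization set at the end of an epoch gives all four accuracy and MSE figures listed above.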

So now that we have these measures, how do we stop the network? Well, what I use is either the training and generalization accuracies or the MSEs, in addition to a maximum epoch. So I’ll stop once both my training and generalization set accuracies are above some value or the MSEs are below some value. This way you cover all your bases.
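A minimal sketch of the accuracy-based variant of that check is shown below. StopCriteria and shouldStop are hypothetical names; the MSE-based variant would compare against an upper bound instead.

```cpp
// Hypothetical stopping thresholds; the values are chosen by the user.
struct StopCriteria {
    unsigned maxEpochs;
    double desiredAccuracy;  // fraction in the range 0..1
};

// Stop when both accuracies reach the target, or when the epoch limit is hit.
bool shouldStop(unsigned epoch, double trainingAccuracy, double generalizationAccuracy,
                const StopCriteria& criteria) {
    const bool accurateEnough = trainingAccuracy >= criteria.desiredAccuracy &&
                                generalizationAccuracy >= criteria.desiredAccuracy;
    return accurateEnough || epoch >= criteria.maxEpochs;
}
```

The check would run once at the end of every epoch, after the generalization set has been evaluated.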

Just remember that while your accuracies might be the same over several epochs, the MSE may have changed. The MSE is a measure with a much higher resolution and is often a better choice to use for stopping conditions.

Now we come to an important point that people often overlook: every time you train the network the data sets are different (if you shuffled the initial dataset as recommended earlier) and the weights are random, so obviously your results will differ each time you train the network. Usually the only difference is the number of epochs required to reach the NN’s accuracy ceiling.

Advanced Data Partitioning Methods

There are several advanced concepts I want to deal with here. Firstly, when dealing with large datasets, pushing a massive training dataset through the network may not be the best solution. Since the NN will have to process all the patterns in the training set over and over, and there could be tens of thousands of patterns, you can imagine how slow this will be.
So there are several techniques developed to partition the training set to provide better performance in regards to both time taken to train and accuracy. I’m only going to cover two basic ones that I’ve implemented into my data set reader found below.

Growing Dataset:

This approach involves training with a growing subset of the training set; this subset will increase by a fixed percentage each time training has completed (stopping conditions for the neural network have been met). This carries on until the training data set contains all the training patterns.

Windowed Data set:

This approach creates a window of fixed size and moves it across the dataset. This move occurs once training has completed for that window. You’ll specify the window size and the step size. This method stops once the window has covered the entire training data set.
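Both partitioning schemes boil down to choosing which slice of the training set to use at each stage. The sketch below uses hypothetical names and expresses each slice as a half-open index range; it is an illustration of the idea, not the data set reader from the project.

```cpp
#include <algorithm>
#include <cstddef>

// Half-open range [begin, end) of pattern indices in the training set.
struct Range {
    std::size_t begin, end;
};

// Growing dataset: stage 0 uses the first stepPercent% of the patterns,
// and every later stage adds another stepPercent% until all are included.
Range growingRange(std::size_t total, std::size_t stepPercent, std::size_t stage) {
    const std::size_t step = std::max<std::size_t>(1, total * stepPercent / 100);
    return {0, std::min(total, step * (stage + 1))};
}

// Windowed dataset: a window of windowSize patterns that moves forward
// by stepSize patterns after each completed training stage.
Range windowRange(std::size_t total, std::size_t windowSize, std::size_t stepSize,
                  std::size_t stage) {
    const std::size_t begin = std::min(total, stage * stepSize);
    return {begin, std::min(total, begin + windowSize)};
}
```

Training stops for the growing scheme once end reaches the full set size, and for the windowed scheme once the window has reached the end of the training set.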

Momentum

Momentum is a technique used to speed up the training of a BPN. Remember that the weight update is a move along the gradient of the error, in the direction that reduces the output error. Momentum is just that: it basically keeps you moving in the direction of the previous step. This also has the added bonus that when you change direction you don’t immediately jump in the opposite direction; your initial step is a small one. Basically, if you overshoot you’ve missed your ideal point and you don’t wish to overshoot it again, and momentum helps to prevent that.

The only change to the implementation is in regards to the weight updates. Remember from part one that the weight updates were as follows:
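In the usual notation, the plain update and the update with momentum are commonly written as below. The symbols are assumptions and not necessarily the ones used in part one: η is the learning rate, α the momentum, x_i the value of neuron i, δ_j the error gradient of neuron j, and t the current update step.

```latex
\begin{align}
\Delta w_{ij}(t) &= \eta \, x_i \, \delta_j
  && \text{(without momentum)} \\
\Delta w_{ij}(t) &= \eta \, x_i \, \delta_j + \alpha \, \Delta w_{ij}(t-1)
  && \text{(with momentum)} \\
w_{ij} &\leftarrow w_{ij} + \Delta w_{ij}(t)
\end{align}
```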

Momentum is usually set to a high value between 0 and 1. You’ll notice that with momentum set to 0 the weight updates are identical to the original ones. The effect of this momentum term is shown below for our training problem with the learning rate set to 0.01 and the momentum set to 0.8.

As you can see from the graphs, the BPN will usually converge a lot faster with momentum added than without it. It also has the added benefit of allowing the back-propagation to avoid local minima in the search space and to traverse areas where the error space doesn’t change. This is evident from all the fluctuations seen in the graph.

The momentum formulas shown above are also known as the generalized delta rule.
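A sketch of the weight update with a momentum term, for the weights between one pair of layers, could look like this. The function and parameter names are hypothetical, and the flat row-major layout of weights and deltas is an assumption made for the example.

```cpp
#include <cstddef>
#include <vector>

// weights and deltas are flattened row-major: index = i * nTarget + j,
// where i runs over the source layer and j over the target layer.
void updateWeightsWithMomentum(std::vector<double>& weights,
                               std::vector<double>& deltas,
                               const std::vector<double>& sourceValues,
                               const std::vector<double>& targetGradients,
                               double learningRate, double momentum) {
    const std::size_t nTarget = targetGradients.size();
    for (std::size_t i = 0; i < sourceValues.size(); ++i) {
        for (std::size_t j = 0; j < nTarget; ++j) {
            const std::size_t idx = i * nTarget + j;
            // New change = gradient step + a fraction of the previous change.
            deltas[idx] = learningRate * sourceValues[i] * targetGradients[j] +
                          momentum * deltas[idx];
            weights[idx] += deltas[idx];
        }
    }
}
```

With momentum set to 0 the second term disappears and the update falls back to the plain gradient step, which matches the remark above about momentum 0.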

Batch / Online learning

The last thing I want to go over is the difference between batch learning and stochastic or on-line learning.
Stochastic learning occurs when the neuron weights are updated after each individual piece of data is passed through the system. The FFNN therefore changes with every piece of data and is in a constant state of change during training. This is the way we’ve been doing it up to now.

Batch Learning on the other hand stores each neuron weight change when it occurs, and only at the end of each epoch does it update the weights with the net change over the training set. This means the neural network will only update once at the end of each epoch. Implementation wise this change is minor as all you need to do is just store the weight changes (the delta w values) and just update the weights at the end of the epoch.
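The difference between the two modes can be sketched with three small helpers. The names are hypothetical, and change stands for the per-pattern weight changes (the delta w values) computed by back propagation.

```cpp
#include <cstddef>
#include <vector>

// Stochastic (on-line) learning: apply each pattern's changes immediately.
void applyStochastic(std::vector<double>& weights, const std::vector<double>& change) {
    for (std::size_t n = 0; n < weights.size(); ++n) weights[n] += change[n];
}

// Batch learning, step 1: only accumulate the changes for each pattern.
void accumulateBatch(std::vector<double>& pending, const std::vector<double>& change) {
    for (std::size_t n = 0; n < pending.size(); ++n) pending[n] += change[n];
}

// Batch learning, step 2: at the end of the epoch apply the net change
// and reset the accumulator for the next epoch.
void applyBatch(std::vector<double>& weights, std::vector<double>& pending) {
    for (std::size_t n = 0; n < weights.size(); ++n) {
        weights[n] += pending[n];
        pending[n] = 0.0;
    }
}
```

In batch mode the weights stay fixed while the patterns of an epoch are processed, so every pattern in that epoch is evaluated against the same network.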

The effects of this I’ll leave up to you guys to discover for yourselves. I can’t spoon feed you everything.

Source code and Implementation

In the zip file below you’ll find the complete Visual Studio 2008 project for the following:

  • My neural network class (has CSV logging, and has support for momentum and batch/stochastic learning)
  • My CSV data reader class (loads CSV files, has several data partitioning approaches built in)
  • The test data files for the above problem
  • A test implementation of the above training problem

My code is highly commented and very easy to read so any questions I haven’t answered should hopefully be answered by the code. I hope this has helped you guys understand neural networks.

Neural Network Project and C++ Source code : nnImplementation.zip
My First Neural Network Tutorial : Theory of a Neural Network

UPDATE:

I’ve updated the source code to use a better architectural design; it’s cleaner, simpler and easier to understand: nnImplementationV2.zip

23 April 2008 - Posted by Bobby | Artificial Intelligence, Neural Networks, Programming | 49 Comments

49 Comments »

  1. [...] Tutorial Continues in Part 2 : implementation and c++ source code – NN Tutorial Part 2 [...]

    Pingback by Basic Neural Network Tutorial - Theory « Bobby Anguelov’s Blog | 23 April 2008

  2. [...] Basic Neural Network Tutorial : C++ Implementation and Source Code [...]

    Pingback by Basic Neural Network Tutorial : C++ Implementation and Source Code « gmgPuBlog | 6 May 2008

  3. Somehow i missed the point. Probably lost in translation :) Anyway … nice blog to visit.

    cheers, Vamper!!

    Comment by Vamper | 23 June 2008

  4. One of the best reasons to model a neuron as an object is that you can extend the parent class from neuron to create “sub networks” that are specially trained, with one manager network on top. And, what are you coding on? An apple II+? Memory’s like 10 bucks a gig, and your neuron class takes less than a K of extra space… WAY less. OOP’s the way to go. Accept it.

    Comment by NegaScout | 27 June 2008

  5. To create what you want I’d just need several full networks and can avoid any inheritance. Also just because you have memory and resources doesn’t mean you should use them. I always try to aim for the smallest, neatest, most efficient approach to anything.

    It just adds unnecessary complexity, apart from the case where you’d want each neuron in a single network to use a different activation function or input aggregate (since that’s all a neuron contains), but till now I haven’t seen any academic literature supporting such a network.

    Explain to me the benefits again? Separation of the training was a good idea since it allows multiple ways of training the network and reduced the amount of code. Encapsulating neurons isn’t.

    Also I’d love to know where you live cause memory is around $60 a gig (for the good quality stuff) and i’m running a monster: e8400 @ 4.3ghz, 8800gtx @ 8800ultra speeds, 1.3tb of hard drive space and 4 gigs of ddr2-1200 ram…

    Comment by Bobby | 28 June 2008

  6. I’ve been reading through NN tutorials for the past few weeks, this is by far the best. Many thanks Bobby.

    Comment by dan | 11 September 2008

  7. in defense of the author, for systems such as microcontrollers, or GPU’s, you need to be really conservative with memory

    Comment by s | 14 September 2008

  8. thanks a lot, it’s good to me. Can you send me some data about “Basic Neural Network Tutorial : C++ Implementation and Source Code”?

    Comment by chi | 27 September 2008

  9. nice post bro..

    Comment by ump student | 8 October 2008

  10. I do design work and would like to ask the author of takinginitiative.wordpress.com to send the template to my email :) I’m willing to pay… [translated from Russian]

    Comment by spenly | 14 October 2008

  11. hey, my russian is terrible, from what i gather you want me to send you an email about something?

    Comment by Bobby | 14 October 2008

  12. hi, these tutorials are quite awesome, many thanks for them
    but i do have a question, i read you were doing some image processing with neural networks, could you explain a bit how do you do that?
    i know some basics of image processing etc, but i have no idea how to apply neural networks with image processing
    for example, if you have some picture with some red rectangle, what is the input for neural network (whole picture, part of picture, filtered picture, etc) ?

    thanks in advance
    cheers!

    ps: is it possible, that the c implementation is almost two times as fast as c++? (i might have bugs in my program, but i think stuff works just fine, oh and i compiled both the same way)

    Comment by Lightenix | 8 November 2008

  13. i used it for the classification of images, basically it told me what the image represented.

    Your input to a NN is entirely dependent on what you want to put into the network and what you want out, you need to define your inputs and outputs accordingly.

    as for the implementation speed, what do you mean its two times faster? does it learn twice as fast or does it take half the clock cycles per epoch? Remember that the learning time varies every time you run it, so for one run you might get it to reach 90% in 50 epochs and for the next run it might take 400. you cannot make any sort of learning time expectation…

    Comment by Bobby | 8 November 2008

  14. i used srand( 1 ) for initializations, soo all weights would be the same at the start and both programs were run 150 epochs, also weights in both programs were initialized the same way, soo i don’t think there should be any differences each run
    for timing i used this:
    time_t secondsStart, secondsEnd;
    secondsStart = time (NULL);
    which gives accuracy of a second, anyway, programs returned 26s and 42s, soo greater accuracy shouldn’t be needed
    i started measuring time at the start of main and ended it at the end. This way it is measuring whole initialization, even saving weights (which i don’t have).

    anyway, bugs are not excluded =)

    cheers

    Comment by Lightenix | 9 November 2008

  15. that timing method is inaccurate, if you want use my timing class for the timing and let me know what results you get.

    wanna send me your code I’m curious to see what you’ve done? banguelov (@) cs.up.ac.za

    Comment by Bobby | 9 November 2008

  16. hi bobby, this really helps me as a complete beginner in NN, but why i can’t download the zip files?

    Comment by liyen | 9 November 2008

  17. Hey. Love the tutorial. Although I use a different programming language so would it be ok if I emailed you with specific questions? If so, could you email me.

    So hard to find in depth NN tutorials. This is definitely the best. Thanks!

    Comment by Louis | 15 November 2008

  18. hey bro, thanks for the kind words, I’m crazy busy at the moment but you can mail me any questions you want but I’ll only be able to reply mid next week.

    Comment by Bobby | 15 November 2008

  19. Thanks mate. Can i have your email address pwease? I couldn’t find it on the site :P

    The current question I have at the moment regarding Bias and threshold. Should there be a bias neuron in each hidden layer. Say I have 2 hidden layers, does that mean I need 2 bias neurons?

    Comment by Louis | 16 November 2008

  20. Hey, Excellent tutorial, you explained the concept very well!

    I have an issue with character recognition using Matlab NN toolbox, how do I pass an image to a Neural Network? I’ve done the feature extraction using image processing but i dont know how to pass the image to the train function. you can reach me at cyber_kobe@hotmail.com
    thanx in advance,

    Comment by jago | 17 November 2008

  21. Great work, man. I have one question: after computing the weights and all the NN parameters, how do you apply this latter to a new data without training?

    Thanks.

    Comment by melfatao | 20 November 2008

  22. I’m taking an Introduction to Neural Networks module and intend to try and implement one myself, this is a lot less high brow than the daunting notes made on the mathematics of it all.

    Thanks for summing it up in English for us mere mortals :-)

    Comment by Adrian | 10 December 2008

  23. Hey….
    how can i run this program in visual studio 6(visual C++) ?
    thanks in advance…

    Comment by Jewel | 21 December 2008

  24. just create a new empty c++ project and import the files, then compile. should work perfectly :P

    Comment by Bobby | 23 December 2008

  25. G’day Bobby,

    Thanks for the C++ code and the tutorial. Still trying to get my head around NN’s, and your tutorial was most helpful.

    I want to try using a NN to predict horse racing results…

    What I was thinking of doing was to compare two horses at a time – have x number of inputs and 1 output, i.e. horseA beat horseB (0/1). What I’m trying to work out is how to organise the input. I have a comprehensive database going back several years, so I have plenty of data to play with. I can extract whatever I like using SQL and Python to massage the query result sets and create the csv file. No problem there. What I would like to know is how to lay out the input data – two horses per line, each line containing back history and stats for both horses… But how do I tell the NN which data belongs to which horse? Or am I barking up the wrong tree?

    Thanks again and kind regards

    Comment by Peter | 17 February 2009

  26. Greetings,

    For being a perfectionist, you surely do have a lot of spelling and/or typos in this post. Hehe … sorry to bust your balls about that one, but I would like to say that this code is great and is exactly what I was looking for. You have a talent of expressing complex situations in a simple manner. Usually when I read some source code I find on the internet, I find myself running around their code wondering why the hell it’s so complicated. With yours I opened it, compiled it even after switching the .sln file to MS Visual Studio 2005 on the first try, and ran it in the debugger very smoothly. The code is a little slow, but I have a feeling some of the time is spent printing out all the cout’s but I’ll about that one soon.

    Anyways, great work and I’ll leave some more feedback when I get more into the code and seeing how extendible it is.

    ,Brian T. Hannan

    Skype: brianthannan
    email: bnubbz@hotmail.com or bnubbz@gmail.com

    Comment by Brian T. Hannan | 20 February 2009

  27. haha, yeh I don’t bother proof reading the tutorials, I neither have the time nor the patience for it.

    The code would probably be slow exactly for the reason you mentioned, it’s meant to be example code more so than a fully working implementation.

    As for the simplicity, most things are really simple but are just poorly explained, or the people explaining them over complicate things to make themselves seem smarter.

    anyhows, glad the tut helped you!

    Comment by Bobby | 21 February 2009

  28. Bobby,

    one thing I still can’t grasp when it comes to neural nets is the expected output. What is this? Is there a node per possible identification? for example, to recognise between an ‘x’ and an ‘o’ do you need 3 outputs. One at 100% confidence of an ‘x’, one for an ‘o’ and a third for garbage?

    Thanks for a great tutorial, its been a lot of help

    Scott

    Comment by Scott Higgins | 24 February 2009

  29. yeh scott, that’s exactly right, the output is basically a big selector with a node for each type. In some cases you might use both states for your output.

    like there was a case where i had 6 classes that i needed to distinguish from, so i used 3 outputs and then used the binary representations of 1-6 on them for the output:

    ie: no class (000), class 3(011), class 5(101) etc…

    Comment by Bobby | 26 February 2009

  30. fantastic, I have managed to get your neural net to work perfectly with my mouse gesture recognition program. It is very good code and easy to follow, thank you.

    I have, however, found a bug with your code. I am going to try and fix it, and if I do I will let you know.

    Basically, if I extend the number of input nodes to 32 and hidden nodes to 20, then train the network and save the weights. Running the program again, loading in the weights, appears to have a memory leak? I can feed forward once maybe twice and then it has a moan about the heap.

    I will let you know how I get on,

    Thank you again,

    Scott

    Comment by Scott Higgins | 27 February 2009

  31. it’s very possible, the saving and loading of weights was an afterthought and i did it in a rush, its possible I’ve done something silly, I’ll see if i can find the time to track it down.

    Comment by Bobby | 28 February 2009

  32. I like the code and layout, and am intrigued by the simplicity and speed, but it seems that the code is suited for only one purpose: classification. And given that neural networks are not the best for classification, I was hoping this could be more general (real valued outputs, being able to run exactly one iteration of backprop) Also, you only allow one hidden layer? I think maybe 2 hidden layers max is good, but only one isn’t using the full strength of neural networks (it will have proven limits, and isn’t necessarily the best for every situation). Would adding these features significantly degrade the performance? I would change the code myself, but I am afraid to, and if you do it, it will be more self consistent, and other users will be able to benefit. Thanks.

    Comment by Trevor Standley | 2 March 2009

  33. why do you say neural networks are not the best at classification? A neural net is by definition a classifier, and as such is what it’s most commonly used for. You give it inputs and it give you an output based on the inputs, it classifies the inputs into a set output.

    Adding an extra hidden layer is easy, but once again do you have any evidence that adding this extra layer improves the accuracy of the net?

    Adding an extra layer will negatively affect the total iterations needed for training as the number of weight changes are greater.

    The outputs are real valued, i just manually force them to be binary using a basic step function. Remember that your neural net will only be as good as the training data, and you have control over the training data.

    Comment by Bobby | 2 March 2009

  34. “A neural net is by definition a classifier” I don’t think so. I think it is more general, it is a function approximator. It is capable of learning a real valued output function.

    What I meant was that support vector machines and Bayesian networks are often preferred to neural networks for classification, boosting is also a common technique for classification.

    The project I am working on is a reinforcement learning algorithm, and I am only using back propagation to match a small set of seed data.

    I know that adding a second layer makes training much slower, but the function I am trying to learn comes from a complicated space, and the network is empirically unable to learn to match it with a single layer.

    Comment by Trevor Standley | 4 March 2009

  35. a neural network is a pattern identifier and as such can be seen as a classifier. Whether the separate classes can be approximated by a function isn’t guaranteed. I don’t have the literature handy but my professor said there is academic literature that proves that you should never need more than a single hidden layer in any NN.

    If you are having trouble training the network it might be advisable to change your activation function or training method rather than adding an extra layer. There is also a possibility that you have too many hidden neurons in your hidden layer, there is no method to calculate the optimum architecture of a NN.

    I don’t have much experience with Bayesian networks or SVMs so I can’t comment on that.

    Comment by Bobby | 4 March 2009

  36. “I don’t have the literature handy but my professor said there is academic literature that proves that you should never need more than a single hidden layer in any NN.”

    While it is true that any single valued function can be approximated to an arbitrary degree of accuracy using a single hidden layer, the decision regions used for classification are defined by multi-valued functions, and with only a single hidden layer it is not possible to represent concave decision regions or regions with holes (a bullseye for example requires 2 hidden layers to represent).

    The representation power of the neural network is very important for certain problems, especially ones in which the output of the NN is to be a vector of real values.

    Comment by Trevor Standley | 9 March 2009 | Reply

  37. Please sir… I can’t see the C++ code. I’m a very source-oriented person; a piece of source code is all I need. I have been reading the first part as well. Thanks for the tute…

    Comment by Asitha | 28 April 2009 | Reply

  38. a neural network

    Comment by Lal | 25 May 2009 | Reply

  39. Awesome article! Although I’ve read several introductory texts about neural networks, none of them actually showed any calculations, strategies, or gave sample code; congratulations on a fine piece of work!

    Comment by Kenton | 4 June 2009 | Reply

  40. If you want to make it even faster you should consider programming in asm.

    Comment by David | 11 June 2009 | Reply

  41. bro, this tute is awesome..but i can’t download your source code…kindly post it again in this article or email me at ememeke@yahoo.com
    thanks and more power…get neuralized…

    Comment by kim | 25 June 2009 | Reply

  42. how would you tackle an optical character recognition? i mean you assumed 1 for a letter a and 0 for the rest. but i dont think it is valid for an OCR. can you help pls? i really like your work.

    Comment by Xer | 29 July 2009 | Reply

  43. see Neural Networks in action:
    http://www.sharktime.com/snn

    Comment by SharkTime | 18 August 2009 | Reply

  44. It’s nice, but since I agree with you that a simple solution is the best solution, why not explain how a simple OR gate is implemented, or take a simple character recognition problem and show how the network is trained using simple C language code? Anyway, thank you for your valuable ideas.

    Comment by pankaj | 19 September 2009 | Reply

  45. honestly both those are really simple and I’m sure you can figure it out, I’m not in the habit of spoon feeding people.

    this tutorial has source code that is pretty simple to understand.

    Comment by Bobby | 19 September 2009 | Reply

  46. Hi Bobby!

    First, I would like to thank you for this great blog. I find it best among others I have found online.
    Actually, I have used your example in implementing my master’s thesis. Of course, I had to optimize some things, but generally, I used your idea. I wanted to ask: why not use momentum for batch learning? I think it could be used, if the error is less than the previous epoch’s error…

    Comment by Dalcek | 28 October 2009 | Reply

  47. The momentum term is added to stochastic/online learning to overcome the fluctuations in the sign of the error. The neurons keep going back and forth, unlearning and relearning with every new piece of data, and momentum helps to curb that fluctuation.

    Offline/batch learning doesn’t have this problem, so the momentum term is unnecessary since all the weight changes are averaged before being applied. Also, due to the weights being averaged, you cannot assume that pushing the error in the same direction as on the past epoch will be a better solution.
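
As a rough sketch of the momentum term described above (this is not taken from the tutorial's source; the function name `applyMomentumUpdate` and all of its parameters are made up for illustration), an online weight update with momentum might look something like this. It assumes `gradientTerm[i]` already holds the input value times the error gradient for weight `i`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: one online weight update
// with a momentum term. prevDelta stores the change applied to each
// weight on the previous pattern and is overwritten with the new change.
void applyMomentumUpdate(std::vector<double>& weights,
                         std::vector<double>& prevDelta,
                         const std::vector<double>& gradientTerm,
                         double learningRate,
                         double momentum)
{
    assert(weights.size() == prevDelta.size());
    assert(weights.size() == gradientTerm.size());

    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        // new change = learning-rate step + a fraction of the last change
        double delta = learningRate * gradientTerm[i] + momentum * prevDelta[i];
        weights[i] += delta;
        prevDelta[i] = delta; // remembered for the next pattern
    }
}
```

With a learning rate of 1.0 and a momentum of 0.5, a step of +0.5 followed by a step of -0.5 leaves the weight at 0.25 instead of back at 0, which is the damping of sign flips that the comment refers to.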

    I hope this helps to answer your question?

    Comment by Bobby | 29 October 2009 | Reply

  48. Yes, thanks! I use your example, had to change it.. It works well for simple functions.. both online and batch learning, but unfortunately for me, doesn’t work well for problem i have to solve, some predictions.

    Average relative error is about 20%, where maximum allowed is 10. I am not sure why, but i am trying to figure out. I think I should try with bias separated for each neuron.

    I am not sure if the way you implemented is the same as without neurons. Are you?

    Comment by Dalcek | 30 October 2009 | Reply

  49. hi Bobby
    Do you have any MATLAB code for back propagation batch learning? I have a problem with it: I wrote the program, but instead of reducing the error it increases it. Could you please help me find what is wrong?

    Comment by zara | 21 November 2009 | Reply

