What is an Activation Function in a Neural Network?
This article is conceptual only: it covers just the very basics of what an activation function is.
This article is fantastic and was my starting point for the content of this post. It’s much more detailed and is good for people who already have some fundamental understanding.
My goal is to put things into a context where I can better understand them as a person coming from a weaker mathematics background.
What Does an Activation Function Do?
Let’s imagine two pipes meant to carry fluid from place to place. We want the contents of pipe A to empty into pipe B, and we need to make a junction so that this fluid can travel between the pipes.
But there is a catch: we want the fluid entering pipe B to be transformed into the kind of fluid that pipe B is meant to carry, so that it’s in the ‘best state’ possible when it exits the system.
The activation function in this analogy would be the junction, and it would change the state of the fluid so it’s more useful to the larger system. For this example, we could say it heats or cools the fluid, or maybe pressurizes or depressurizes it. Whatever it’s set up to do, it applies some change to the values passing through it.
In a previous post reviewing the TensorFlow tutorial, I used a ReLU activation function. The maths behind it looks like this:
f(x) = max(0, x)
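As a minimal sketch of what that formula does in practice (using NumPy, applied element-wise):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative values become 0,
    # positive values pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# [0.  0.  0.  1.5 3. ]
```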
So in our pipes example, the layers of the model are our pipes, and the ReLU activation function is the junction between the pipes that mutates values passing through it for purposes that are specific to that model. Could be normalization, could be simulating real-world behaviors.
The end goal is to make the model as useful as possible for the purpose it serves, by transforming values as they pass between nodes or layers.
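To make the pipe analogy concrete, here’s a minimal sketch of how that junction might look in a Keras model (the layer sizes are arbitrary, just for illustration):

```python
import tensorflow as tf

# Two "pipes" (Dense layers) with a ReLU "junction" between them.
# Writing the activation as its own layer makes the junction explicit.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, input_shape=(4,)),  # pipe A
    tf.keras.layers.ReLU(),                       # the junction
    tf.keras.layers.Dense(1),                     # pipe B
])
```

Keras also lets you attach the activation directly to a layer, as in `tf.keras.layers.Dense(16, activation='relu')`; the two forms are equivalent.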
Why is this useful?
The primary reason to use activation functions appears to be introducing non-linearity. Without them, stacking layers doesn’t actually buy you anything: a chain of purely linear layers collapses into a single linear transformation, so the model could only ever learn straight-line relationships. It seems that this non-linearity is a key ingredient in making the system capable of the tasks it is frequently given.
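Here’s a small NumPy sketch of that collapse (weights only, with biases left out for brevity): two linear layers with no activation between them are mathematically identical to one linear layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" as plain weight matrices
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

x = rng.normal(size=(1, 4))

# Passing through two linear layers with no activation in between...
two_layers = (x @ W1) @ W2

# ...is exactly the same as one linear layer with the combined weights
one_layer = x @ (W1 @ W2)

print(np.allclose(two_layers, one_layer))  # True
```

Dropping a non-linearity like ReLU between the two layers breaks this equivalence, which is what lets depth add real modelling power.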
Beyond that, I don’t have a lot to add here because I’m inexperienced, and I don’t yet have a good understanding of what can be done to make the data flowing through a model look more like reality.
How to Choose an Activation Function?
Now that we understand what an activation function is supposed to do, and have at least a vague reason to use one, how do we decide which function to use?
I don’t know! This is going to be a separate post on its own. Again, I can only defer to the blog linked earlier in this post and the information it contains. I will have to invest more time to understand how to differentiate use cases for activation functions. There seems to be a consensus that ReLU is a good place to start, but I can’t verify even that just yet.
I’ll come back and update this post once I learn this.
Conclusion
An activation function is a piece of the ML puzzle that typically sits between layers in a model, transforming data as it moves between the layers according to which function is selected. The purpose of doing this is mostly to introduce non-linearity, which makes the model better at understanding data generated from real-world sources. There are different functions that are better in certain cases, but I haven’t quite learned what those are just yet.