What is a Dense Layer in Neural Networks?
In a neural network, there are layers of nodes or neurons. These layers are arranged and connected in different ways depending on the model and the use case.
In a dense layer, or fully connected layer, each node is connected to the output of every neuron in the previous layer, and each node's output is in turn connected to every node in the next layer.
A dense layer commonly has an activation function, but there are also instances where no activation function is present for a dense layer.
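In code, that fully connected structure is just a weighted sum plus a bias, optionally passed through an activation. Here's a minimal NumPy sketch (the function name `dense` and the toy shapes are my own, for illustration):

```python
import numpy as np

def dense(x, W, b, activation=None):
    """Fully connected layer: every input feeds every output unit."""
    z = x @ W + b          # weighted sum of all inputs, plus bias
    if activation is not None:
        z = activation(z)  # optional non-linearity
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # 3 input features -> 2 units
b = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])

out = dense(x, W, b, activation=np.tanh)
print(out.shape)  # (2,)
```

Note that `W` has one weight per (input, unit) pair, which is exactly why every input influences every output.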
What are Dense Layers Used to Achieve?
My first step in understanding something that is pretty foreign to me is to figure out what it actually does. The invisible nature of ML processes (and of programming and data work in general) makes this difficult, so I'll aggregate some Google research here to understand more about what dense layers are meant to do.
Simplicity (relatively): Dense layers are straightforward to understand, so it's easier to implement them across a wider range of problems. They combine weighted input features, which helps the network learn patterns and relationships in the data.
Versatility: Their relative simplicity makes it possible for them to be good choices to use in a pretty large range of network types.
Aggregating Information: Dense layers can aggregate information from previous layers or inputs, making them useful for combining and processing information from different parts of the network. (Layers don't strictly have to be connected sequentially, either — frameworks like Keras support branching, non-sequential architectures through their functional APIs.)
Non-linearity: By using activation functions in Dense layers, nonlinearity is introduced into the network, enabling it to learn complex relationships in the input data. This nonlinearity is essential for capturing intricate patterns and approximating complex functions.
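One way to see why the activation function matters: without it, stacking dense layers buys you nothing, because a composition of linear maps is itself a single linear map. A quick NumPy check (random toy weights, my own construction):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)
x = rng.normal(size=4)

# Two dense layers with no activation...
linear_stack = (x @ W1 + b1) @ W2 + b2

# ...collapse into one dense layer with combined weights:
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b
print(np.allclose(linear_stack, single))  # True

# Inserting a ReLU between them breaks the equivalence,
# which is exactly what lets depth add expressive power:
relu_stack = np.maximum(x @ W1 + b1, 0) @ W2 + b2
print(np.allclose(relu_stack, single))    # False
```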
Uncommon Correlation Classifications: The fully connected structure of Dense layers allows them to consider all possible combinations of input features, potentially capturing relationships that other layer types might miss.
Output Layer: Dense layers are often used as the output layer of a neural network, producing the final predictions or classifications. The choice of activation function in the output Dense layer depends on the specific problem, such as using a sigmoid activation for binary classification or a softmax activation for multi-class classification.
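Those two output activations are simple enough to sketch directly. Sigmoid squashes a single logit into a probability for binary classification; softmax turns a vector of logits into a distribution that sums to 1 for multi-class problems (the example logits are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Binary classification: one logit -> one probability
print(sigmoid(np.array([2.0])))  # ~0.88

# Multi-class classification: probabilities over all classes
logits = np.array([2.0, -1.0, 0.5])
probs = softmax(logits)
print(probs.sum())  # 1.0
```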
What are some things a Dense Layer should not be used for?
Computationally expensive applications: if the application has very large-scale inputs, the fully connected structure means the parameter count grows with (inputs × units), which quickly gets expensive in both memory and compute. It makes sense to avoid dense layers as much as possible in these situations.
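The parameter count is easy to work out by hand, and it shows how fast dense layers blow up on large inputs. Flattening even a modest 224×224 RGB image into a 1000-unit dense layer (sizes chosen just as an illustration) already costs about 150 million parameters:

```python
# Parameters in a dense layer = (inputs * units) weights + units biases.
inputs = 224 * 224 * 3           # 150,528 input features after flattening
units = 1000
params = inputs * units + units  # weights + biases
print(params)                    # 150529000
```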
When Specific (local) Connectivity Matters: In instances where the model needs to capture local structure in the data (such as neighboring pixels in an image), a dense layer can cause issues because each node is connected to every input node. This one is harder for me to understand fully, but from what I can tell, this total interconnectivity means the notion of 'locality' gets flattened away — which is a big part of why convolutional layers exist.
Graph Structured Data: This piggybacks off the previous point, but graph data is specifically all about the relationships encoded in its edges. A dense layer could technically be applied to this sort of data, but it has no built-in way to account for the graph's structure, so I think it's probably going to cause issues.
I’m sure there are more places where these dense layers may not be a good fit. I plan to tinker with this stuff a bit more to get a more contextual understanding, so that I can properly demonstrate my understanding vs. regurgitating other stuff I’m reading. This is a part of my learning process though, so I don’t want to skip it.
Conclusion
Dense (fully connected) layers in Neural Networks have quite a lot of use-cases. They are good at revealing unexpected correlations, creating non-linear outputs, and are among the simplest of layer types to implement.
There are also plenty of reasons to avoid using them, especially when computational cost is a concern, or when preserving the relationships between nodes in a graph matters to the final output.