Activation functions play a crucial role in artificial neural networks (ANNs) by introducing nonlinearity into the model, enabling it to learn complex patterns and solve a wide range of problems. Each activation function transforms a neuron's input signal (the weighted sum of its inputs) into an output signal. Here, we discuss a few common activation functions and their mathematical formulations:

- Sigmoid (logistic) function: The sigmoid function maps input values to the range (0, 1), making it suitable for binary classification tasks or probabilities. Mathematically, it is defined as:

σ(x) = 1 / (1 + exp(-x))
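As a minimal sketch, the sigmoid can be written directly from its definition (the function name `sigmoid` is our own choice):

```python
import math

def sigmoid(x):
    # 1 / (1 + e^(-x)): squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```

For example, `sigmoid(0)` returns exactly 0.5, the midpoint of the output range.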

- Hyperbolic tangent (tanh) function: The tanh function is similar to the sigmoid function but maps input values to the range (-1, 1), providing a balanced output with a mean of 0. It is defined as:

tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
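A direct translation of this formula into Python (in practice one would call the standard library's `math.tanh`, which is numerically more robust for large inputs):

```python
import math

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x): squashes input into (-1, 1), centered at 0
    ex, emx = math.exp(x), math.exp(-x)
    return (ex - emx) / (ex + emx)
```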

- Rectified Linear Unit (ReLU) function: ReLU is a popular activation function due to its simplicity and computational efficiency. It simply sets negative input values to 0, while positive values remain unchanged. Mathematically, it is defined as:

ReLU(x) = max(0, x)
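ReLU is a one-liner, which is part of its appeal (again, the function name is our own):

```python
def relu(x):
    # max(0, x): negative inputs become 0, positive inputs pass through unchanged
    return max(0.0, x)
```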

- Leaky Rectified Linear Unit (Leaky ReLU) function: Leaky ReLU is a modification of the ReLU function that allows a small negative slope when the input is negative, which can help mitigate the “dying ReLU” problem (neurons becoming inactive during training). It is defined as:

Leaky_ReLU(x) = max(αx, x), where α is a small constant (e.g., 0.01)
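A sketch of Leaky ReLU with the conventional default slope α = 0.01 (note that `max(αx, x)` equals `x` for positive inputs and `αx` for negative ones, provided α < 1):

```python
def leaky_relu(x, alpha=0.01):
    # max(alpha * x, x): positive inputs unchanged, negative inputs scaled by alpha
    return max(alpha * x, x)
```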

- Exponential Linear Unit (ELU) function: The ELU function is similar to the ReLU but has a smooth exponential curve for negative input values, which can help alleviate the vanishing gradient problem. It is defined as:

ELU(x) = x, if x > 0
ELU(x) = α(exp(x) - 1), if x ≤ 0, where α is a scaling constant (e.g., 1)
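The piecewise ELU definition maps onto a simple conditional; for large negative inputs the output saturates toward -α rather than growing without bound:

```python
import math

def elu(x, alpha=1.0):
    # x for x > 0; alpha * (e^x - 1) for x <= 0, saturating at -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```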

These are just a few examples of activation functions used in neural networks. The choice of activation function depends on the specific problem and the desired properties of the network, such as smoothness, computational efficiency, and output range.