panther.nn.linear_kernels package
Submodules
panther.nn.linear_kernels.backward module
panther.nn.linear_kernels.forward module
Linear Kernel Implementation for Panther Neural Network
This file contains optimized CUDA kernels, implemented with Triton, that perform the linear operations of a structured matrix multiplication. The implementation splits the computation into two passes for better performance:
- First pass: computes intermediate values from the hidden input and the structured matrices.
- Second pass: combines these intermediate values to produce the final output.
- panther.nn.linear_kernels.forward.first_pass(hin, S1s, U2s)
Perform the first pass of the structured matrix multiplication.
This function sets up and launches the first_pass_kernel to compute two intermediate results by multiplying the hidden input with structured matrices S1s and U2s.
- Parameters:
hin – Hidden input tensor of shape [BSIZE, d2]
S1s – First structured matrix of shape [L, d2, K]
U2s – Second structured matrix of shape [L, d2, K]
- Returns:
(out1, out2): two output tensors, each of shape [L, BSIZE, K]
- Return type:
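To make the shapes above concrete, here is a minimal, dependency-free sketch of what the first pass computes, assuming it amounts to one plain matrix product per structured matrix. The names `matmul` and `first_pass_reference` are hypothetical helpers written for this illustration; they are not part of Panther, and the real function runs a Triton kernel on GPU tensors rather than nested lists.

```python
def matmul(a, b):
    """Multiply nested-list matrices: [m, n] @ [n, p] -> [m, p]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def first_pass_reference(hin, S1s, U2s):
    """Hypothetical CPU reference (assumption: one matmul per structured matrix)."""
    out1 = [matmul(hin, S1) for S1 in S1s]  # [L, BSIZE, K]
    out2 = [matmul(hin, U2) for U2 in U2s]  # [L, BSIZE, K]
    return out1, out2


# Tiny example with BSIZE=2, d2=3, L=2, K=2.
hin = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
S1s = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
       [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
U2s = [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
       [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]]
out1, out2 = first_pass_reference(hin, S1s, U2s)
```

A slow reference of this kind can be useful for checking a kernel's output on small inputs before trusting it on large ones.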
- panther.nn.linear_kernels.forward.second_pass(in1, in2, U1s, S2s, bias)
Perform the second pass of the structured matrix multiplication.
This function sets up and launches the second_pass_kernel to compute the final output by combining the intermediate results from first_pass with structured matrices U1s and S2s.
- Parameters:
in1 – First intermediate tensor returned by first_pass, of shape [L, BSIZE, K]
in2 – Second intermediate tensor returned by first_pass, of shape [L, BSIZE, K]
U1s – Third structured matrix of shape [L, K, d1]
S2s – Fourth structured matrix of shape [L, K, d1]
bias – Bias tensor of shape [1, d1]
- Returns:
Output tensor of shape [BSIZE, d1]
- Return type:
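The documentation gives the shapes for the second pass but not the exact combination rule. The sketch below shows one reading that is consistent with those shapes: multiply each intermediate by its matching structured matrix, sum the products over the L terms, and add the bias row to every batch row. This is an assumption, not the confirmed kernel behavior. In particular, any normalization the real kernel applies is unknown here, so it is exposed as a hypothetical `scale` parameter. `matmul` and `second_pass_reference` are illustrative names, not Panther APIs.

```python
def matmul(a, b):
    """Multiply nested-list matrices: [m, n] @ [n, p] -> [m, p]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def second_pass_reference(in1, in2, U1s, S2s, bias, scale=1.0):
    """Hypothetical CPU reference; the sum-over-L rule and `scale` are assumptions."""
    bsize, d1 = len(in1[0]), len(bias[0])
    # Start every output row from the broadcast bias row ([1, d1] -> [BSIZE, d1]).
    out = [list(bias[0]) for _ in range(bsize)]
    for a, b, U1, S2 in zip(in1, in2, U1s, S2s):
        p1 = matmul(a, U1)  # [BSIZE, K] @ [K, d1] -> [BSIZE, d1]
        p2 = matmul(b, S2)  # [BSIZE, K] @ [K, d1] -> [BSIZE, d1]
        for i in range(bsize):
            for j in range(d1):
                out[i][j] += scale * (p1[i][j] + p2[i][j])
    return out


# Tiny example with L=2, BSIZE=2, K=2, d1=2.
in1 = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
in2 = [[[0.0, 1.0], [1.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
U1s = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
S2s = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
bias = [[10.0, 20.0]]
out = second_pass_reference(in1, in2, U1s, S2s, bias)
```

Keeping `scale` explicit makes it easy to compare against the real kernel: if the outputs differ by a constant factor on the non-bias part, the normalization assumption is the first thing to check.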