panther.nn.linear_kernels package

Submodules

panther.nn.linear_kernels.backward module

panther.nn.linear_kernels.forward module

Linear Kernel Implementation for Panther Neural Network

This module contains optimized GPU kernels written in Triton for the linear operations in a structured matrix multiplication. The implementation uses a two-pass approach for performance:

  • First pass: computes two intermediate results from the hidden input and the structured matrices S1s and U2s.

  • Second pass: combines these intermediate results with the structured matrices U1s and S2s to produce the final output.
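Read together with the parameter shapes documented for first_pass and second_pass, the two passes appear to compute the following. This is a sketch, not a statement of the kernel's exact arithmetic: X stands for hin and b for bias, the pairing of S1s with U1s and of U2s with S2s is an assumption based on argument order, and any normalization constant is not stated on this page and is omitted.

```latex
\begin{aligned}
A_\ell &= X\,S1_\ell, \qquad B_\ell = X\,U2_\ell
  && \text{(first pass, } \ell = 1,\dots,L\text{)} \\
Y &= \sum_{\ell=1}^{L} \left( A_\ell\,U1_\ell + B_\ell\,S2_\ell \right) + b
  && \text{(second pass)}
\end{aligned}
```

With X of shape [BSIZE, d2], each A_l and B_l has shape [BSIZE, K], and Y has shape [BSIZE, d1].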

panther.nn.linear_kernels.forward.first_pass(hin, S1s, U2s)

Perform the first pass of the structured matrix multiplication.

This function sets up and launches the first_pass_kernel to compute two intermediate results by multiplying the hidden input with structured matrices S1s and U2s.

Parameters:
  • hin – Hidden input tensor of shape [BSIZE, d2]

  • S1s – First structured matrix of shape [L, d2, K]

  • U2s – Second structured matrix of shape [L, d2, K]

Returns:

(out1, out2): two output tensors, each of shape [L, BSIZE, K]

Return type:

tuple
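To check shapes or to compare results on a small input, the first pass can be mirrored in plain PyTorch. The sketch below assumes the kernel computes one matrix product per structured term, that is out1[l] = hin @ S1s[l] and out2[l] = hin @ U2s[l]. The name first_pass_reference is hypothetical and not part of panther.

```python
import torch


def first_pass_reference(hin, S1s, U2s):
    """Hypothetical pure-PyTorch stand-in for first_pass (not part of panther).

    Assumes out1[l] = hin @ S1s[l] and out2[l] = hin @ U2s[l].
    """
    # hin: [BSIZE, d2]; S1s, U2s: [L, d2, K] -> each result: [L, BSIZE, K]
    out1 = torch.einsum("bd,ldk->lbk", hin, S1s)
    out2 = torch.einsum("bd,ldk->lbk", hin, U2s)
    return out1, out2


# Small example with L=3, BSIZE=4, d2=8, K=2
hin = torch.randn(4, 8)
S1s = torch.randn(3, 8, 2)
U2s = torch.randn(3, 8, 2)
out1, out2 = first_pass_reference(hin, S1s, U2s)
print(out1.shape, out2.shape)
```

Because the reference runs on the CPU, it may be useful as a baseline when testing the Triton kernel with torch.allclose and a tolerance suited to the dtype.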

panther.nn.linear_kernels.forward.second_pass(in1, in2, U1s, S2s, bias)

Perform the second pass of the structured matrix multiplication.

This function sets up and launches the second_pass_kernel to compute the final output by combining the intermediate results from first_pass with structured matrices U1s and S2s.

Parameters:
  • in1 – First output of first_pass, of shape [L, BSIZE, K]

  • in2 – Second output of first_pass, of shape [L, BSIZE, K]

  • U1s – Third structured matrix of shape [L, K, d1]

  • S2s – Fourth structured matrix of shape [L, K, d1]

  • bias – Bias tensor of shape [1, d1]

Returns:

Output tensor of shape [BSIZE, d1]

Return type:

torch.Tensor
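The documented shapes suggest how the second pass reduces its inputs: each [L, BSIZE, K] tensor is multiplied by an [L, K, d1] matrix stack, and the L terms collapse into a single [BSIZE, d1] output. The sketch below assumes in1 pairs with U1s, in2 pairs with S2s, and the terms are summed without scaling. None of this is confirmed by the page itself, and second_pass_reference is a hypothetical name, not part of panther.

```python
import torch


def second_pass_reference(in1, in2, U1s, S2s, bias):
    """Hypothetical pure-PyTorch stand-in for second_pass (not part of panther).

    Assumes an unscaled sum over the L terms; the real kernel may
    apply a normalization factor that this page does not document.
    """
    # in1, in2: [L, BSIZE, K]; U1s, S2s: [L, K, d1]
    term1 = torch.bmm(in1, U1s)  # [L, BSIZE, d1]
    term2 = torch.bmm(in2, S2s)  # [L, BSIZE, d1]
    # Reduce over L, then broadcast the [1, d1] bias over the batch
    return (term1 + term2).sum(dim=0) + bias  # [BSIZE, d1]


# Small example with L=3, BSIZE=4, K=2, d1=5
in1 = torch.randn(3, 4, 2)
in2 = torch.randn(3, 4, 2)
U1s = torch.randn(3, 2, 5)
S2s = torch.randn(3, 2, 5)
bias = torch.randn(1, 5)
out = second_pass_reference(in1, in2, U1s, S2s, bias)
print(out.shape)
```

If the kernel's result differs from this reference by a constant factor, that would point to a normalization step the sketch leaves out.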

Module contents