single precision
[Level-2 BLAS]

Functions

void magma_sgemv (magma_trans_t transA, magma_int_t m, magma_int_t n, float alpha, const float *dA, magma_int_t ldda, const float *dx, magma_int_t incx, float beta, float *dy, magma_int_t incy)
 Perform matrix-vector product.
void magma_sger (magma_int_t m, magma_int_t n, float alpha, const float *dx, magma_int_t incx, const float *dy, magma_int_t incy, float *dA, magma_int_t ldda)
 Perform rank-1 update, $ A = \alpha x y^H + A $.
void magma_ssymv (magma_uplo_t uplo, magma_int_t n, float alpha, const float *dA, magma_int_t ldda, const float *dx, magma_int_t incx, float beta, float *dy, magma_int_t incy)
 Perform symmetric matrix-vector product, $ y = \alpha A x + \beta y $.
void magma_ssyr (magma_uplo_t uplo, magma_int_t n, float alpha, const float *dx, magma_int_t incx, float *dA, magma_int_t ldda)
 Perform symmetric rank-1 update, $ A = \alpha x x^H + A $.
void magma_ssyr2 (magma_uplo_t uplo, magma_int_t n, float alpha, const float *dx, magma_int_t incx, const float *dy, magma_int_t incy, float *dA, magma_int_t ldda)
 Perform symmetric rank-2 update, $ A = \alpha x y^H + conj(\alpha) y x^H + A $.
void magma_strmv (magma_uplo_t uplo, magma_trans_t trans, magma_diag_t diag, magma_int_t n, const float *dA, magma_int_t ldda, float *dx, magma_int_t incx)
 Perform triangular matrix-vector product.
void magma_strsv (magma_uplo_t uplo, magma_trans_t trans, magma_diag_t diag, magma_int_t n, const float *dA, magma_int_t ldda, float *dx, magma_int_t incx)
 Solve triangular matrix-vector system (one right-hand side).
void magmablas_sgemv_conjv (magma_int_t m, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy)
 SGEMV_CONJV performs the matrix-vector operation.
void magmablas_sgemv_tesla (magma_trans_t trans, magma_int_t m, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy)
 This routine computes: 1) y = A x if trans == 'N' or 'n', alpha == 1, beta == 0, and incx == incy == 1 (using magmablas code) 2) y = alpha A^T x if trans == 'T' or 't', beta == 0, and incx == incy == 1 (using magmablas code) 3) y = alpha A^TRANS x + beta y otherwise, using CUBLAS.
void magmablas_sswapblk (magma_order_t order, magma_int_t n, float *dA1T, magma_int_t lda1, float *dA2T, magma_int_t lda2, magma_int_t i1, magma_int_t i2, const magma_int_t *ipiv, magma_int_t inci, magma_int_t offset)
magma_int_t magmablas_ssymv_work (magma_uplo_t uplo, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy, float *dwork, magma_int_t lwork)
 magmablas_ssymv_work performs the matrix-vector operation:
magma_int_t magmablas_ssymv (magma_uplo_t uplo, magma_int_t n, float alpha, const float *A, magma_int_t lda, const float *x, magma_int_t incx, float beta, float *y, magma_int_t incy)
 magmablas_ssymv performs the matrix-vector operation:
magma_int_t magmablas_ssymv_mgpu_offset (magma_uplo_t uplo, magma_int_t n, float alpha, float **A, magma_int_t lda, float **x, magma_int_t incx, float beta, float **y, magma_int_t incy, float **work, magma_int_t lwork, magma_int_t num_gpus, magma_int_t nb, magma_int_t offset, magma_queue_t stream[][10])
 magmablas_ssymv performs the matrix-vector operation:

Function Documentation

void magma_sgemv ( magma_trans_t  transA,
magma_int_t  m,
magma_int_t  n,
float  alpha,
const float *  dA,
magma_int_t  ldda,
const float *  dx,
magma_int_t  incx,
float  beta,
float *  dy,
magma_int_t  incy 
)

Perform matrix-vector product.

$ y = \alpha A x + \beta y $ (transA == MagmaNoTrans), or
$ y = \alpha A^T x + \beta y $ (transA == MagmaTrans), or
$ y = \alpha A^H x + \beta y $ (transA == MagmaConjTrans).

Parameters:
[in] transA Operation to perform on A.
[in] m Number of rows of A. m >= 0.
[in] n Number of columns of A. n >= 0.
[in] alpha Scalar $ \alpha $
[in] dA REAL array of dimension (ldda,n), ldda >= max(1,m). The m-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
[in] dx REAL array on GPU device. If transA == MagmaNoTrans, the n element vector x of dimension (1 + (n-1)*incx);
otherwise, the m element vector x of dimension (1 + (m-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
[in] beta Scalar $ \beta $
[in,out] dy REAL array on GPU device. If transA == MagmaNoTrans, the m element vector y of dimension (1 + (m-1)*incy);
otherwise, the n element vector y of dimension (1 + (n-1)*incy).
[in] incy Stride between consecutive elements of dy. incy != 0.
void magma_sger ( magma_int_t  m,
magma_int_t  n,
float  alpha,
const float *  dx,
magma_int_t  incx,
const float *  dy,
magma_int_t  incy,
float *  dA,
magma_int_t  ldda 
)

Perform rank-1 update, $ A = \alpha x y^H + A $.

Parameters:
[in] m Number of rows of A. m >= 0.
[in] n Number of columns of A. n >= 0.
[in] alpha Scalar $ \alpha $
[in] dx REAL array on GPU device. The m element vector x of dimension (1 + (m-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
[in] dy REAL array on GPU device. The n element vector y of dimension (1 + (n-1)*incy).
[in] incy Stride between consecutive elements of dy. incy != 0.
[in,out] dA REAL array on GPU device. The m-by-n matrix A of dimension (ldda,n), ldda >= max(1,m).
[in] ldda Leading dimension of dA.
void magma_ssymv ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
const float *  dA,
magma_int_t  ldda,
const float *  dx,
magma_int_t  incx,
float  beta,
float *  dy,
magma_int_t  incy 
)

Perform symmetric matrix-vector product, $ y = \alpha A x + \beta y $.

Parameters:
[in] uplo Whether the upper or lower triangle of A is referenced.
[in] n Number of rows and columns of A. n >= 0.
[in] alpha Scalar $ \alpha $
[in] dA REAL array of dimension (ldda,n), ldda >= max(1,n). The n-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
[in] dx REAL array on GPU device. The m element vector x of dimension (1 + (m-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
[in] beta Scalar $ \beta $
[in,out] dy REAL array on GPU device. The n element vector y of dimension (1 + (n-1)*incy).
[in] incy Stride between consecutive elements of dy. incy != 0.
void magma_ssyr ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
const float *  dx,
magma_int_t  incx,
float *  dA,
magma_int_t  ldda 
)

Perform symmetric rank-1 update, $ A = \alpha x x^H + A $.

Parameters:
[in] uplo Whether the upper or lower triangle of A is referenced.
[in] n Number of rows and columns of A. n >= 0.
[in] alpha Scalar $ \alpha $
[in] dx REAL array on GPU device. The n element vector x of dimension (1 + (n-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
[in,out] dA REAL array of dimension (ldda,n), ldda >= max(1,n). The n-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
void magma_ssyr2 ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
const float *  dx,
magma_int_t  incx,
const float *  dy,
magma_int_t  incy,
float *  dA,
magma_int_t  ldda 
)

Perform symmetric rank-2 update, $ A = \alpha x y^H + conj(\alpha) y x^H + A $.

Parameters:
[in] uplo Whether the upper or lower triangle of A is referenced.
[in] n Number of rows and columns of A. n >= 0.
[in] alpha Scalar $ \alpha $
[in] dx REAL array on GPU device. The n element vector x of dimension (1 + (n-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
[in] dy REAL array on GPU device. The n element vector y of dimension (1 + (n-1)*incy).
[in] incy Stride between consecutive elements of dy. incy != 0.
[in,out] dA REAL array of dimension (ldda,n), ldda >= max(1,n). The n-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
void magma_strmv ( magma_uplo_t  uplo,
magma_trans_t  trans,
magma_diag_t  diag,
magma_int_t  n,
const float *  dA,
magma_int_t  ldda,
float *  dx,
magma_int_t  incx 
)

Perform triangular matrix-vector product.

$ x = A x $ (trans == MagmaNoTrans), or
$ x = A^T x $ (trans == MagmaTrans), or
$ x = A^H x $ (trans == MagmaConjTrans).

Parameters:
[in] uplo Whether the upper or lower triangle of A is referenced.
[in] trans Operation to perform on A.
[in] diag Whether the diagonal of A is assumed to be unit or non-unit.
[in] n Number of rows and columns of A. n >= 0.
[in] dA REAL array of dimension (ldda,n), ldda >= max(1,n). The n-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
[in] dx REAL array on GPU device. The n element vector x of dimension (1 + (n-1)*incx).
[in] incx Stride between consecutive elements of dx. incx != 0.
void magma_strsv ( magma_uplo_t  uplo,
magma_trans_t  trans,
magma_diag_t  diag,
magma_int_t  n,
const float *  dA,
magma_int_t  ldda,
float *  dx,
magma_int_t  incx 
)

Solve triangular matrix-vector system (one right-hand side).

$ A x = b $ (trans == MagmaNoTrans), or
$ A^T x = b $ (trans == MagmaTrans), or
$ A^H x = b $ (trans == MagmaConjTrans).

Parameters:
[in] uplo Whether the upper or lower triangle of A is referenced.
[in] trans Operation to perform on A.
[in] diag Whether the diagonal of A is assumed to be unit or non-unit.
[in] n Number of rows and columns of A. n >= 0.
[in] dA REAL array of dimension (ldda,n), ldda >= max(1,n). The n-by-n matrix A, on GPU device.
[in] ldda Leading dimension of dA.
[in,out] dx REAL array on GPU device. On entry, the n element RHS vector b of dimension (1 + (n-1)*incx). On exit, overwritten with the solution vector x.
[in] incx Stride between consecutive elements of dx. incx != 0.
void magmablas_sgemv_conjv ( magma_int_t  m,
magma_int_t  n,
float  alpha,
const float *  A,
magma_int_t  lda,
const float *  x,
magma_int_t  incx,
float  beta,
float *  y,
magma_int_t  incy 
)

SGEMV_CONJV performs the matrix-vector operation.

y := alpha*A*conj(x) + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.

Parameters:
[in] m INTEGER On entry, m specifies the number of rows of the matrix A.
[in] n INTEGER On entry, n specifies the number of columns of the matrix A
[in] alpha REAL On entry, ALPHA specifies the scalar alpha.
[in] A REAL array of dimension ( LDA, n ) on the GPU.
[in] lda INTEGER LDA specifies the leading dimension of A.
[in] x REAL array of dimension n
[in] incx Specifies the increment for the elements of X. INCX must not be zero.
[in] beta DOUBLE REAL On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[out] y REAL array of dimension m
[in] incy Specifies the increment for the elements of Y. INCY must not be zero.
void magmablas_sgemv_tesla ( magma_trans_t  trans,
magma_int_t  m,
magma_int_t  n,
float  alpha,
const float *  A,
magma_int_t  lda,
const float *  x,
magma_int_t  incx,
float  beta,
float *  y,
magma_int_t  incy 
)

This routine computes: 1) y = A x if trans == 'N' or 'n', alpha == 1, beta == 0, and incx == incy == 1 (using magmablas code) 2) y = alpha A^T x if trans == 'T' or 't', beta == 0, and incx == incy == 1 (using magmablas code) 3) y = alpha A^TRANS x + beta y otherwise, using CUBLAS.

Parameters:
[in] trans magma_trans_t On entry, TRANS specifies the operation to be performed as follows:

  • = MagmaNoTrans: y := alpha*A *x + beta*y
  • = MagmaTrans: y := alpha*A^T*x + beta*y
  • = MagmaConjTrans: y := alpha*A^T*x + beta*y
[in] m INTEGER On entry, M specifies the number of rows of the matrix A.
[in] n INTEGER On entry, N specifies the number of columns of the matrix A
[in] alpha REAL On entry, ALPHA specifies the scalar alpha.
[in] A REAL array of dimension (LDA, N) on the GPU.
[in] lda INTEGER LDA specifies the leading dimension of A.
[in] x REAL array of dimension n if trans == 'n' m if trans == 't'
[in] incx Specifies the increment for the elements of X. INCX must not be zero.
[in] beta REAL On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[out] y REAL array of dimension m if trans == 'n' n if trans == 't'
[in] incy Specifies the increment for the elements of Y. INCY must not be zero.
void magmablas_sswapblk ( magma_order_t  order,
magma_int_t  n,
float *  dA1T,
magma_int_t  lda1,
float *  dA2T,
magma_int_t  lda2,
magma_int_t  i1,
magma_int_t  i2,
const magma_int_t *  ipiv,
magma_int_t  inci,
magma_int_t  offset 
)
See also:
magmablas_sswapblk_q
magma_int_t magmablas_ssymv ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
const float *  A,
magma_int_t  lda,
const float *  x,
magma_int_t  incx,
float  beta,
float *  y,
magma_int_t  incy 
)

magmablas_ssymv performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.

Parameters:
[in] uplo magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:

  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in] n INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in] alpha REAL. On entry, ALPHA specifies the scalar alpha.
[in] A REAL array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in] lda INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. LDA must be at least max( 1, n ). It is recommended that lda is multiple of 16. Otherwise performance would be deteriorated as the memory accesses would not be fully coalescent.
[in] x REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in] incx INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in] beta REAL. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out] y REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in] incy INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
magma_int_t magmablas_ssymv_mgpu_offset ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
float **  A,
magma_int_t  lda,
float **  x,
magma_int_t  incx,
float  beta,
float **  y,
magma_int_t  incy,
float **  work,
magma_int_t  lwork,
magma_int_t  num_gpus,
magma_int_t  nb,
magma_int_t  offset,
magma_queue_t  stream[][10] 
)

magmablas_ssymv performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.

Parameters:
[in] uplo magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:

  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in] n INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in] alpha REAL. On entry, ALPHA specifies the scalar alpha.
[in] A REAL array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in] lda INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. LDA must be at least max( 1, n ). It is recommended that lda is multiple of 16. Otherwise performance would be deteriorated as the memory accesses would not be fully coalescent.
[in] x REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in] incx INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in] beta REAL. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out] y REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in] incy INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
magma_int_t magmablas_ssymv_work ( magma_uplo_t  uplo,
magma_int_t  n,
float  alpha,
const float *  A,
magma_int_t  lda,
const float *  x,
magma_int_t  incx,
float  beta,
float *  y,
magma_int_t  incy,
float *  dwork,
magma_int_t  lwork 
)

magmablas_ssymv_work performs the matrix-vector operation:

y := alpha*A*x + beta*y,

where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.

Parameters:
[in] uplo magma_uplo_t. On entry, UPLO specifies whether the upper or lower triangular part of the array A is to be referenced as follows:

  • = MagmaUpper: Only the upper triangular part of A is to be referenced.
  • = MagmaLower: Only the lower triangular part of A is to be referenced.
[in] n INTEGER. On entry, N specifies the order of the matrix A. N must be at least zero.
[in] alpha REAL. On entry, ALPHA specifies the scalar alpha.
[in] A REAL array of DIMENSION ( LDA, n ). Before entry with UPLO = MagmaUpper, the leading n by n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. Before entry with UPLO = MagmaLower, the leading n by n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. Note that the imaginary parts of the diagonal elements need not be set and are assumed to be zero.
[in] lda INTEGER. On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. LDA must be at least max( 1, n ). It is recommended that lda is multiple of 16. Otherwise performance would be deteriorated as the memory accesses would not be fully coalescent.
[in] x REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCX ) ). Before entry, the incremented array X must contain the n element vector x.
[in] incx INTEGER. On entry, INCX specifies the increment for the elements of X. INCX must not be zero.
[in] beta REAL. On entry, BETA specifies the scalar beta. When BETA is supplied as zero then Y need not be set on input.
[in,out] y REAL array of dimension at least ( 1 + ( n - 1 )*abs( INCY ) ). Before entry, the incremented array Y must contain the n element vector y. On exit, Y is overwritten by the updated vector y.
[in] incy INTEGER. On entry, INCY specifies the increment for the elements of Y. INCY must not be zero.
[in] dwork (workspace) REAL array on the GPU, dimension (MAX(1, LWORK)),
[in] lwork INTEGER. The dimension of the array DWORK. LWORK >= LDA * ceil( N / NB_X ), where NB_X = 64.

MAGMA implements ssymv through two steps: 1) perform the multiplication in each thread block and put the intermediate value in dwork. 2) sum the intermediate values and store the final result in y.

magamblas_ssymv_work requires users to provide a workspace, while magmablas_ssymv is a wrapper routine allocating the workspace inside the routine and provides the same interface as cublas.

If users need to call ssymv frequently, we suggest using magmablas_ssymv_work instead of magmablas_ssymv. As the overhead to allocate and free in device memory in magmablas_ssymv would hurt performance. Our tests show that this penalty is about 10 Gflop/s when the matrix size is around 10000.


Generated on 17 Sep 2014 for MAGMA by  doxygen 1.6.1