NN-512

Activation
FromTensor=from
ToTensor=to
Kind=ReLU
Param=0

Generate code to apply an elementwise activation function.

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Kind= The kind of activation function to apply. ReLU computes F(X)=X when X
is positive and F(X)=X*C otherwise, where C is the constant negative slope
parameter given by Param.

Param= A parameter for the activation function. For ReLU this is the
negative slope parameter (0 gives standard ReLU, 0.1 gives a typical leaky
ReLU, -1 gives absolute value, 1 gives the identity function, etc.). Must be
a simple float: ^-?(0|[1-9][0-9]*)(\.[0-9]+)?$
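
As a point of reference, the scalar semantics can be sketched in C99. This
is illustrative only (the names here are invented); the generated code is
vectorized AVX-512 and organized very differently:

    /* Scalar ReLU reference: param is the negative slope
       (0 gives standard ReLU, 0.1 gives a typical leaky ReLU). */
    static float reluRef(float x, float param)
    {
        return x > 0.0f ? x : x * param;
    }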

----

Add
FromTensor1=from1
FromTensor2=from2
ToTensor=to

Generate code for the elementwise addition of two data tensors. FromTensor1,
FromTensor2, and ToTensor are all structurally identical (same number of
channels, same height, same width).

FromTensor1= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

FromTensor2= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$
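
A minimal scalar sketch of the semantics, assuming fully packed tensors of
count = channels*height*width floats (illustrative names, not the generated
API):

    #include <stddef.h>

    /* Elementwise addition over fully packed CHW tensors. */
    static void addRef(float *to, const float *from1,
                       const float *from2, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            to[i] = from1[i] + from2[i];
        }
    }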

----

BatchNorm
FromTensor=from
ToTensor=to
Epsilon=0.001

Generate code to apply batch normalization with per-channel mean, variance,
scale, and shift parameters. Let X be an element of FromTensor and let Y be the
corresponding element of ToTensor that will be computed. X and Y are at the same
CHW coordinate in their respective tensors and the channel part of that
coordinate selects a mean M, a variance V, a scale S, and a shift H. Then
Y=S*(X-M)/SQRT(V+E)+H where E is the constant epsilon parameter (to avoid
division by zero).

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the
mean, variance, scale, and shift parameter tensors into the generated
initialization code through struct fields that have this same name but with
"Means", "Variances", "Scales", and "Shifts" appended (each of these
parameter tensors is an array of 32-bit floats, one float per data tensor
channel). Must be a letter followed by zero or more letters/digits:
^[a-zA-Z][a-zA-Z0-9]*$

Epsilon= A small positive number added to the variance to avoid division by
zero. Should match the value that was used for this purpose during training.
Must be a simple float: ^-?(0|[1-9][0-9]*)(\.[0-9]+)?$
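
A scalar sketch of the formula above, assuming per-channel parameter arrays
as described (illustrative names, not the generated API):

    #include <math.h>
    #include <stddef.h>

    /* Y = S*(X-M)/SQRT(V+E)+H for one element at channel c. */
    static float batchNormRef(float x, size_t c, float epsilon,
                              const float *means, const float *variances,
                              const float *scales, const float *shifts)
    {
        return scales[c] * (x - means[c])
               / sqrtf(variances[c] + epsilon) + shifts[c];
    }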

----

Concat
FromTensor1=from1
FromTensor2=from2
ToTensor=to

Generate code to concatenate two tensors along the channel dimension.
FromTensor1 and FromTensor2 must have matching spatial extents (the same height
H and the same width W). If FromTensor1 has C1 channels and FromTensor2 has C2
channels then ToTensor has C1+C2 channels, height H, and width W. The feature
maps of FromTensor1 go first (they are assigned channel numbers starting with
zero) and the feature maps of FromTensor2 go next (they are assigned channel
numbers starting with C1).

FromTensor1= Read from a pre-existing data tensor with this name. The
feature maps of this tensor get the low channel numbers in ToTensor. Must be
a letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

FromTensor2= Read from a pre-existing data tensor with this name. The
feature maps of this tensor get the high channel numbers in ToTensor. Must
be a letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$
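
Because the tensors are fully packed in CHW order, concatenation along the
channel dimension amounts to two contiguous copies. A sketch, assuming C1
and C2 channels of H*W floats each (illustrative names, not the generated
API):

    #include <stddef.h>
    #include <string.h>

    /* from1's feature maps become channels 0..c1-1 of to;
       from2's feature maps become channels c1..c1+c2-1. */
    static void concatRef(float *to, const float *from1, size_t c1,
                          const float *from2, size_t c2,
                          size_t h, size_t w)
    {
        memcpy(to, from1, c1 * h * w * sizeof(float));
        memcpy(to + c1 * h * w, from2, c2 * h * w * sizeof(float));
    }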

----

Config
Prefix=NN512
Platform=AVX512Float32
L1DataCachePerThread=32KiB
L2CachePerThreadExL1=960KiB
L3CachePerThreadExL1L2=1408KiB

Settings for the code generator.

Prefix= A string used for filenames, function names, etc. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Platform= The kind of C99 code to generate. AVX512Float32 denotes x86-64
AVX-512 Foundation (AVX512F) and 32-bit floating point.

L1DataCachePerThread= Size in bytes of each L1D cache divided by the number
of threads that share each L1D cache. A positive integer with an optional
suffix like k, K, KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024.
The M suffixes multiply by the square of 1024. After conversion to
lowercase: ^([1-9][0-9]*)([km](i?b)?)?$

L2CachePerThreadExL1= Size in bytes of each L2 cache divided by the number
of threads that share each L2 cache. This size must exclude the L1 overlap
if L2 is inclusive. A positive integer with an optional suffix like k, K,
KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024. The M suffixes
multiply by the square of 1024. After conversion to lowercase:
^([1-9][0-9]*)([km](i?b)?)?$

L3CachePerThreadExL1L2= Size in bytes of the L3 cache divided by the number
of threads that share the L3 cache. This size must exclude the L1/L2 overlap
if L3 is inclusive. A positive integer with an optional suffix like k, K,
KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024. The M suffixes
multiply by the square of 1024. After conversion to lowercase:
^([1-9][0-9]*)([km](i?b)?)?$
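
For example, the settings above resolve to 32*1024 = 32768 bytes of L1D
cache per thread, 960*1024 = 983040 bytes of L2 cache per thread, and
1408*1024 = 1441792 bytes of L3 cache per thread. 32KiB, 32K, 32k, and
32768 are equivalent spellings of the same size.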

----

Conv
FromTensor=from
ToTensor=to
ToChannels=64
FilterH=3
FilterW=3
StrideH=1
StrideW=1
PaddingH=1
PaddingW=1
DilationH=1
DilationW=1
Groups=1

Generate code to perform cross-correlation. Suppose FromTensor has C channels,
height H, and width W. ToTensor has K (= ToChannels) channels. A formula for the
height of ToTensor is ((H+2*PaddingH)-(1+(FilterH-1)*DilationH))/StrideH+1 in
which the division truncates toward zero and the dividend must not be negative.
The width of ToTensor is calculated analogously. There are K filters in the
weight parameter tensor and each of them has C/Groups channels, a height of
FilterH, and a width of FilterW. The weight parameter tensor is in KCHW format,
32-bit floating point, fully packed (filter number is the outermost/slowest
dimension and otherwise the layout is just like an input data tensor). The bias
parameter tensor is an array of K 32-bit floats (one float for each filter),
fully packed.
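
For example, the defaults above (FilterH=3, StrideH=1, PaddingH=1,
DilationH=1) preserve the spatial extent: with H=56 the height of ToTensor
is ((56+2*1)-(1+(3-1)*1))/1+1 = (58-3)/1+1 = 56. The same calculation can
be written as a small C99 helper (illustrative only):

    /* Output extent along one axis; integer division truncates toward
       zero and the dividend must not be negative. */
    static int convOutExtent(int in, int pad, int filter,
                             int stride, int dilation)
    {
        return ((in + 2 * pad) - (1 + (filter - 1) * dilation))
               / stride + 1;
    }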

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the
weight parameter tensor into the generated initialization code through a
struct field that has this same name but with "Weights" appended. Similarly
the bias parameter tensor ("Biases" is appended). Must be a letter followed
by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToChannels= The number of feature maps in ToTensor. This is also the number
of filters in the weight parameter tensor (the K in KCHW) and the number of
biases in the bias parameter tensor. Must be a positive integer:
^[1-9][0-9]*$

FilterH= The undilated spatial height of each filter in the weight parameter
tensor (the H in KCHW). Must be a positive integer: ^[1-9][0-9]*$

FilterW= The undilated spatial width of each filter in the weight parameter
tensor (the W in KCHW). Must be a positive integer: ^[1-9][0-9]*$

StrideH= The heightwise step between adjacent rows of the filtering position
grid (the heightwise subsampling ratio). Must be a positive integer:
^[1-9][0-9]*$

StrideW= The widthwise step between adjacent columns of the filtering
position grid (the widthwise subsampling ratio). Must be a positive integer:
^[1-9][0-9]*$

PaddingH= Implicit heightwise padding of FromTensor. This is the number of
all-zero rows to implicitly concatenate at the top of each feature map,
before the first explicit row. The same number of all-zero rows is
implicitly concatenated at the bottom of each feature map, after the last
explicit row. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

PaddingW= Implicit widthwise padding of FromTensor. This is the number of
all-zero columns to implicitly concatenate on the left side of each feature
map, before the first explicit column. The same number of all-zero columns
is implicitly concatenated on the right side of each feature map, after the
last explicit column. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

DilationH= The heightwise filter dilation factor. 1 means no dilation
(ordinary cross-correlation). 2 means the filter is multiplied against
FromTensor in a spatially sparse (spread out) way just as if one all-zero
row had been inserted between each pair of adjacent rows in the filter. 3 is
like if two all-zero rows had been inserted. And so on. Must be a positive
integer: ^[1-9][0-9]*$

DilationW= The widthwise filter dilation factor. 1 means no dilation
(ordinary cross-correlation). 2 means the filter is multiplied against
FromTensor in a spatially sparse (spread out) way just as if one all-zero
column had been inserted between each pair of adjacent columns in the
filter. 3 is like if two all-zero columns had been inserted. And so on. Must
be a positive integer: ^[1-9][0-9]*$

Groups= The number of disjoint cross-correlation operations to perform (no
shared data, no shared filters). Suppose FromTensor has C channels and
ToTensor has K channels (ToChannels is K). Let G be the number of groups
(both C and K must be divisible by G). Then there are K filters in the
weight parameter tensor and each of them has C/G channels. The first
operation applies the first K/G filters to the first C/G FromTensor channels
to produce the first K/G ToTensor channels. The second operation applies the
second K/G filters to the second C/G FromTensor channels to produce the
second K/G ToTensor channels. And so on. Must be a positive integer:
^[1-9][0-9]*$
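
For example, a depthwise convolution (one filter channel per data channel)
sets Groups equal to the channel count. In this hypothetical stanza the
tensor names are invented for illustration and the data tensor has 64
channels:

    Conv
    FromTensor=dw
    ToTensor=dwOut
    ToChannels=64
    FilterH=3
    FilterW=3
    StrideH=1
    StrideW=1
    PaddingH=1
    PaddingW=1
    DilationH=1
    DilationW=1
    Groups=64

Here C=K=G=64, so each of the 64 filters has C/G = 1 channel and the Nth
filter is applied to the Nth FromTensor channel alone to produce the Nth
ToTensor channel.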

----

FullyConnected
FromTensor=from
ToTensor=to
ToChannels=1000

Generate code to implement a fully connected layer. Suppose FromTensor has C
channels, height H, and width W. The weight parameter tensor consists of K
filters (where K is the ToChannels parameter) and each filter is structurally
identical to FromTensor (C channels, height H, width W). The weight parameter
tensor is in KCHW format, 32-bit floating point, fully packed (filter number is
the outermost/slowest dimension; the rest is like an input data tensor). Each
filter element is multiplied by the FromTensor element that has the same CHW
coordinate. The bias parameter tensor is an array of K 32-bit floats (one float
for each filter), fully packed. ToTensor has K channels, height 1, and width 1.

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the
weight parameter tensor into the generated initialization code through a
struct field that has this same name but with "Weights" appended. Similarly
the bias parameter tensor ("Biases" is appended). Must be a letter followed
by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToChannels= The number of feature maps in ToTensor (each feature map has
height 1 and width 1). This is also the number of filters in the weight
parameter tensor (the K in KCHW) and the number of biases in the bias
parameter tensor. Must be a positive integer: ^[1-9][0-9]*$
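
A scalar sketch of the semantics, assuming fully packed tensors with
n = C*H*W elements per filter (illustrative names, not the generated API):

    #include <stddef.h>

    /* to receives k floats: one dot product plus bias per filter.
       weights holds k filters of n floats each, in KCHW order. */
    static void fullyConnectedRef(float *to, const float *from,
                                  const float *weights,
                                  const float *biases,
                                  size_t k, size_t n)
    {
        for (size_t i = 0; i < k; i++) {
            float sum = biases[i];
            for (size_t j = 0; j < n; j++) {
                sum += weights[i * n + j] * from[j];
            }
            to[i] = sum;
        }
    }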

----

Input
ToTensor=image
Channels=3
Height=224
Width=224

Declare an input data tensor parameter for the generated inference function.
Input data must be in CHW format, 32-bit floating point, fully packed. The
inference code reads the input tensor memory but never writes to it.

ToTensor= A name for this input data tensor. The corresponding inference
function parameter in the generated code has the same name but with "Data"
appended. Must be a letter followed by zero or more letters/digits:
^[a-zA-Z][a-zA-Z0-9]*$

Channels= The number of feature maps for this input data tensor. This is the
C in CHW (the outermost/slowest dimension) and has a stride of
H*W*sizeof(float) bytes. Must be a positive integer: ^[1-9][0-9]*$

Height= The spatial height dimension of this input data tensor. For an image
tensor the height is usually the number of pixel rows. This is the H in CHW
(the outermost/slowest spatial dimension) and has a stride of
W*sizeof(float) bytes. Must be a positive integer: ^[1-9][0-9]*$

Width= The spatial width dimension of this input data tensor. For an image
tensor the width is usually the number of pixels per row. This is the W in
CHW (the innermost/fastest dimension) and has a stride of sizeof(float)
bytes. Must be a positive integer: ^[1-9][0-9]*$
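
Under the stanza above the corresponding inference function parameter is
named imageData, and the strides just described put the element at CHW
coordinate (c, h, w) at the following offset (a sketch, not the generated
API):

    #include <stddef.h>

    /* Element offset within a fully packed CHW tensor. */
    static size_t chwOffset(size_t c, size_t h, size_t w,
                            size_t height, size_t width)
    {
        return c * height * width + h * width + w;
    }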

----

Output
FromTensor=prob

Declare an output data tensor parameter for the generated inference function.
The user allocates memory for the output tensor and passes a pointer to it
into the inference function, which writes the output data in CHW format,
32-bit floating point, fully packed.

FromTensor= The name of a data tensor that will be written back to the user
as output. Must not be the name of an input tensor and must not be the same
as another output. The corresponding inference function parameter in the
generated code has a matching name but with "Data" appended. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

----

Pooling
FromTensor=from
ToTensor=to
Kind=Max2x2Stride2
PaddingH=0
PaddingW=0

Generate code to apply a standard window pooling or global pooling operation.
Padding affects window placement but padding values never participate in max/avg
calculations. Therefore the padding must be small enough that every window will
contain at least one non-padding value. Each (H+2*PaddingH)x(W+2*PaddingW)
feature map in FromTensor yields a corresponding feature map in ToTensor. For
RxR window pooling with a stride of S the height of every feature map in
ToTensor is ((H+2*PaddingH)-R)/S+1 where the division by S truncates toward
zero; the dividend must not be negative. The width formula is analogous. For
global pooling there is no padding and every feature map in ToTensor is 1x1.

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Kind= The kind of pooling operation to apply. Max2x2Stride2 and
Avg2x2Stride2 produce a single value for each 2x2 window and there is no
overlap between adjacent windows. Max3x3Stride2 and Avg3x3Stride2 produce a
single value for each 3x3 window and adjacent windows overlap. MaxGlobal and
AvgGlobal produce a single value for each feature map.

PaddingH= Implicit heightwise padding of FromTensor. This is the number of
all-zero rows to implicitly concatenate at the top of each feature map,
before the first explicit row. The same number of all-zero rows is
implicitly concatenated at the bottom of each feature map, after the last
explicit row. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

PaddingW= Implicit widthwise padding of FromTensor. This is the number of
all-zero columns to implicitly concatenate on the left side of each feature
map, before the first explicit column. The same number of all-zero columns
is implicitly concatenated on the right side of each feature map, after the
last explicit column. Must be a non-negative integer: ^(0|[1-9][0-9]*)$
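
For example, Max2x2Stride2 with zero padding maps a 224x224 feature map to
((224+2*0)-2)/2+1 = 112 rows and columns, and Max3x3Stride2 with
PaddingH=PaddingW=1 maps a 112x112 feature map to ((112+2*1)-3)/2+1 = 56
(111/2 truncates to 55).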

----

Softmax
FromTensor=from
ToTensor=to

Generate code to compute softmax along the channel dimension independently for
each spatial (height, width) location. FromTensor and ToTensor have the same
number of channels, the same height, and the same width.

FromTensor= Read from a pre-existing data tensor with this name. Must be a
letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter
followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$
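
A scalar sketch of softmax at one (h, w) location of a fully packed CHW
tensor. The max subtraction is the standard guard against overflow in expf;
it is assumed here, not quoted from NN-512 (illustrative names, not the
generated API):

    #include <math.h>
    #include <stddef.h>

    static void softmaxRef(float *to, const float *from, size_t channels,
                           size_t height, size_t width, size_t h, size_t w)
    {
        size_t stride = height * width; /* channel stride in floats */
        size_t at = h * width + w;      /* offset of this location */
        float max = from[at];
        for (size_t c = 1; c < channels; c++) {
            float x = from[c * stride + at];
            if (x > max) max = x;
        }
        float sum = 0.0f;
        for (size_t c = 0; c < channels; c++) {
            float e = expf(from[c * stride + at] - max);
            to[c * stride + at] = e;
            sum += e;
        }
        for (size_t c = 0; c < channels; c++) {
            to[c * stride + at] /= sum;
        }
    }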
