Activation

FromTensor=from

ToTensor=to

Kind=ReLU

Param=0

Generate code to apply an elementwise activation function.

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Kind= The kind of activation function to apply. ReLU means that if X is

positive then F(X)=X else F(X)=X*C where C is the constant negative slope

parameter.

Param= A parameter for the activation function. For ReLU this is the

negative slope parameter (0 gives standard ReLU, 0.1 gives a typical leaky

ReLU, -1 gives absolute value, 1 gives the identity function, etc.). Must be

a simple float: ^-?(0|[1-9][0-9]*)(\.[0-9]+)?$
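The Kind/Param semantics above can be sketched as a plain reference implementation (illustrative Python operating on a flat list of floats, not the generated AVX-512 C99):

```python
def activation_relu(x, param):
    # Elementwise F(X) = X if X > 0, else X * C,
    # where C is the negative slope parameter.
    return [v if v > 0.0 else v * param for v in x]

# param=0 gives standard ReLU, 0.1 a typical leaky ReLU,
# -1 absolute value, 1 the identity function.
```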

----

Add

FromTensor1=from1

FromTensor2=from2

ToTensor=to

Generate code for the elementwise addition of two data tensors. FromTensor1,

FromTensor2, and ToTensor are all structurally identical (same number of

channels, same height, same width).
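A minimal sketch of the operation (illustrative Python over tensors flattened to float lists, not the generated C99):

```python
def add_tensors(a, b):
    # Elementwise sum of two structurally identical tensors,
    # here represented as flat lists of the same length.
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]
```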

FromTensor1= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

FromTensor2= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

----

BatchNorm

FromTensor=from

ToTensor=to

Epsilon=0.001

Generate code to apply batch normalization with per-channel mean, variance,

scale, and shift parameters. Let X be an element of FromTensor and let Y be the

corresponding element of ToTensor that will be computed. X and Y are at the same

CHW coordinate in their respective tensors and the channel part of that

coordinate selects a mean M, a variance V, a scale S, and a shift H. Then

Y=S*(X-M)/SQRT(V+E)+H where E is the constant epsilon parameter (to avoid

division by zero).
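The per-channel formula above can be written out as a reference implementation (illustrative Python over a CHW tensor stored as nested lists, not the generated C99):

```python
import math

def batchnorm(x_chw, means, variances, scales, shifts, epsilon):
    # Y = S*(X-M)/SQRT(V+E) + H, with M, V, S, H selected
    # by the channel coordinate of each element.
    out = []
    for c, plane in enumerate(x_chw):
        m, v, s, h = means[c], variances[c], scales[c], shifts[c]
        inv = 1.0 / math.sqrt(v + epsilon)
        out.append([[s * (x - m) * inv + h for x in row] for row in plane])
    return out
```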

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the

mean, variance, scale, and shift parameter tensors into the generated

initialization code through struct fields that have this same name but with

"Means", "Variances", "Scales", and "Shifts" appended (each of these

parameter tensors is an array of 32-bit floats, one float per data tensor

channel). Must be a letter followed by zero or more letters/digits:

^[a-zA-Z][a-zA-Z0-9]*$

Epsilon= A small positive number added to the variance to avoid division by

zero. Should match the value that was used for this purpose during training.

Must be a simple float: ^-?(0|[1-9][0-9]*)(\.[0-9]+)?$

----

Concat

FromTensor1=from1

FromTensor2=from2

ToTensor=to

Generate code to concatenate two tensors along the channel dimension.

FromTensor1 and FromTensor2 must have matching spatial extents (the same height

H and the same width W). If FromTensor1 has C1 channels and FromTensor2 has C2

channels then ToTensor has C1+C2 channels, height H, and width W. The feature

maps of FromTensor1 go first (they are assigned channel numbers starting with

zero) and the feature maps of FromTensor2 go next (they are assigned channel

numbers starting with C1).
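With tensors as nested CHW lists the channel assignment above reduces to list concatenation (an illustrative sketch, not the generated C99):

```python
def concat_channels(a_chw, b_chw):
    # a's feature maps get channels 0..C1-1,
    # b's feature maps get channels C1..C1+C2-1.
    # Spatial extents (height and width) must match.
    assert len(a_chw[0]) == len(b_chw[0])
    assert len(a_chw[0][0]) == len(b_chw[0][0])
    return a_chw + b_chw
```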

FromTensor1= Read from a pre-existing data tensor with this name. The

feature maps of this tensor get the low channel numbers in ToTensor. Must be

a letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

FromTensor2= Read from a pre-existing data tensor with this name. The

feature maps of this tensor get the high channel numbers in ToTensor. Must

be a letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

----

Config

Prefix=NN512

Platform=AVX512Float32

L1DataCachePerThread=32KiB

L2CachePerThreadExL1=960KiB

L3CachePerThreadExL1L2=1408KiB

Settings for the code generator.

Prefix= A string used for filenames, function names, etc. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Platform= The kind of C99 code to generate. AVX512Float32 denotes x86-64

AVX-512 Foundation (AVX512F) and 32-bit floating point.

L1DataCachePerThread= Size in bytes of each L1D cache divided by the number

of threads that share each L1D cache. A positive integer with an optional

suffix like k, K, KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024.

The M suffixes multiply by the square of 1024. After conversion to

lowercase: ^([1-9][0-9]*)([km](i?b)?)?$

L2CachePerThreadExL1= Size in bytes of each L2 cache divided by the number

of threads that share each L2 cache. This size must exclude the L1 overlap

if L2 is inclusive. A positive integer with an optional suffix like k, K,

KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024. The M suffixes

multiply by the square of 1024. After conversion to lowercase:

^([1-9][0-9]*)([km](i?b)?)?$

L3CachePerThreadExL1L2= Size in bytes of the L3 cache divided by the number

of threads that share the L3 cache. This size must exclude the L1/L2 overlap

if L3 is inclusive. A positive integer with an optional suffix like k, K,

KB, KiB, m, M, MB, MiB. The K suffixes multiply by 1024. The M suffixes

multiply by the square of 1024. After conversion to lowercase:

^([1-9][0-9]*)([km](i?b)?)?$
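The cache size syntax shared by the three parameters above can be parsed as follows (an illustrative Python sketch of the documented rules, not code from the NN-512 sources):

```python
import re

# A positive integer with an optional k/m suffix, matched after
# conversion to lowercase; k multiplies by 1024, m by 1024*1024.
_SIZE_RE = re.compile(r'^([1-9][0-9]*)([km](i?b)?)?$')

def parse_cache_size(text):
    m = _SIZE_RE.match(text.lower())
    if m is None:
        raise ValueError('bad cache size: %r' % text)
    n = int(m.group(1))
    suffix = m.group(2)
    if suffix:
        n *= 1024 if suffix[0] == 'k' else 1024 * 1024
    return n
```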

----

Conv

FromTensor=from

ToTensor=to

ToChannels=64

FilterH=3

FilterW=3

StrideH=1

StrideW=1

PaddingH=1

PaddingW=1

DilationH=1

DilationW=1

Groups=1

Generate code to perform cross-correlation. Suppose FromTensor has C channels,

height H, and width W. ToTensor has K (= ToChannels) channels. A formula for the

height of ToTensor is ((H+2*PaddingH)-(1+(FilterH-1)*DilationH))/StrideH+1 in

which the division truncates toward zero and the dividend must not be negative.

The width of ToTensor is calculated analogously. There are K filters in the

weight parameter tensor and each of them has C/Groups channels, a height of

FilterH, and a width of FilterW. The weight parameter tensor is in KCHW format,

32-bit floating point, fully packed (filter number is the outermost/slowest

dimension and otherwise the layout is just like an input data tensor). The bias

parameter tensor is an array of K 32-bit floats (one float for each filter),

fully packed.
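The output height formula above can be checked with a small helper (illustrative Python; the width calculation is the same function with the widthwise parameters):

```python
def conv_out_dim(in_dim, filter_dim, stride, padding, dilation):
    # ((H + 2*PaddingH) - (1 + (FilterH-1)*DilationH)) / StrideH + 1,
    # where the division truncates toward zero and the dividend
    # must not be negative (so // matches truncation here).
    effective = 1 + (filter_dim - 1) * dilation  # dilated filter extent
    dividend = in_dim + 2 * padding - effective
    if dividend < 0:
        raise ValueError('filter does not fit')
    return dividend // stride + 1
```

For example, a 3x3 filter with stride 1, padding 1, dilation 1 preserves the spatial extent, and a 7x7 filter with stride 2, padding 3 halves it.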

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the

weight parameter tensor into the generated initialization code through a

struct field that has this same name but with "Weights" appended. Similarly

the bias parameter tensor ("Biases" is appended). Must be a letter followed

by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToChannels= The number of feature maps in ToTensor. This is also the number

of filters in the weight parameter tensor (the K in KCHW) and the number of

biases in the bias parameter tensor. Must be a positive integer:

^[1-9][0-9]*$

FilterH= The undilated spatial height of each filter in the weight parameter

tensor (the H in KCHW). Must be a positive integer: ^[1-9][0-9]*$

FilterW= The undilated spatial width of each filter in the weight parameter

tensor (the W in KCHW). Must be a positive integer: ^[1-9][0-9]*$

StrideH= The heightwise step between adjacent rows of the filtering position

grid (the heightwise subsampling ratio). Must be a positive integer:

^[1-9][0-9]*$

StrideW= The widthwise step between adjacent columns of the filtering

position grid (the widthwise subsampling ratio). Must be a positive integer:

^[1-9][0-9]*$

PaddingH= Implicit heightwise padding of FromTensor. This is the number of

all-zero rows to implicitly concatenate at the top of each feature map,

before the first explicit row. The same number of all-zero rows is

implicitly concatenated at the bottom of each feature map, after the last

explicit row. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

PaddingW= Implicit widthwise padding of FromTensor. This is the number of

all-zero columns to implicitly concatenate on the left side of each feature

map, before the first explicit column. The same number of all-zero columns

is implicitly concatenated on the right side of each feature map, after the

last explicit column. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

DilationH= The heightwise filter dilation factor. 1 means no dilation

(ordinary cross-correlation). 2 means the filter is multiplied against

FromTensor in a spatially sparse (spread out) way just as if one all-zero

row had been inserted between each pair of adjacent rows in the filter. 3 is

like if two all-zero rows had been inserted. And so on. Must be a positive

integer: ^[1-9][0-9]*$

DilationW= The widthwise filter dilation factor. 1 means no dilation

(ordinary cross-correlation). 2 means the filter is multiplied against

FromTensor in a spatially sparse (spread out) way just as if one all-zero

column had been inserted between each pair of adjacent columns in the

filter. 3 is like if two all-zero columns had been inserted. And so on. Must

be a positive integer: ^[1-9][0-9]*$

Groups= The number of disjoint cross-correlation operations to perform (no

shared data, no shared filters). Suppose FromTensor has C channels and

ToTensor has K channels (ToChannels is K). Let G be the number of groups

(both C and K must be divisible by G). Then there are K filters in the

weight parameter tensor and each of them has C/G channels. The first

operation applies the first K/G filters to the first C/G FromTensor channels

to produce the first K/G ToTensor channels. The second operation applies the

second K/G filters to the second C/G FromTensor channels to produce the

second K/G ToTensor channels. And so on. Must be a positive integer:

^[1-9][0-9]*$
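The group-to-channel assignment described above can be made concrete (illustrative Python; each pair gives the half-open FromTensor channel range a group reads and the ToTensor channel range, equivalently the filter range, it produces):

```python
def group_mapping(c, k, g):
    # Group i reads FromTensor channels [i*C/G, (i+1)*C/G) and
    # applies filters [i*K/G, (i+1)*K/G) to produce the ToTensor
    # channels with those same numbers. C and K must be divisible by G.
    assert c % g == 0 and k % g == 0
    return [((i * c // g, (i + 1) * c // g),
             (i * k // g, (i + 1) * k // g)) for i in range(g)]
```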

----

FullyConnected

FromTensor=from

ToTensor=to

ToChannels=1000

Generate code to implement a fully connected layer. Suppose FromTensor has C

channels, height H, and width W. The weight parameter tensor consists of K

filters (where K is the ToChannels parameter) and each filter is structurally

identical to FromTensor (C channels, height H, width W). The weight parameter

tensor is in KCHW format, 32-bit floating point, fully packed (filter number is

the outermost/slowest dimension; the rest is like an input data tensor). Each

filter element is multiplied by the FromTensor element that has the same CHW

coordinate. The bias parameter tensor is an array of K 32-bit floats (one float

for each filter), fully packed. ToTensor has K channels, height 1, and width 1.
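Because each filter is multiplied against the whole of FromTensor at matching CHW coordinates, the layer reduces to one dot product per filter (an illustrative Python sketch with FromTensor and each filter flattened in packed CHW order, not the generated C99):

```python
def fully_connected(x_flat, weights, biases):
    # One output channel per filter: dot(filter, input) + bias.
    # x_flat has C*H*W floats; each filter in weights has the same
    # length; the result has K values (each a 1x1 feature map).
    return [sum(w * x for w, x in zip(filt, x_flat)) + b
            for filt, b in zip(weights, biases)]
```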

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. The user passes the

weight parameter tensor into the generated initialization code through a

struct field that has this same name but with "Weights" appended. Similarly

the bias parameter tensor ("Biases" is appended). Must be a letter followed

by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToChannels= The number of feature maps in ToTensor (each feature map has

height 1 and width 1). This is also the number of filters in the weight

parameter tensor (the K in KCHW) and the number of biases in the bias

parameter tensor. Must be a positive integer: ^[1-9][0-9]*$

----

Input

ToTensor=image

Channels=3

Height=224

Width=224

Declare an input data tensor parameter for the generated inference function.

Input data must be in CHW format, 32-bit floating point, fully packed. The

inference code reads the input tensor memory but never writes to it.

ToTensor= A name for this input data tensor. The corresponding inference

function parameter in the generated code has the same name but with "Data"

appended. Must be a letter followed by zero or more letters/digits:

^[a-zA-Z][a-zA-Z0-9]*$

Channels= The number of feature maps for this input data tensor. This is the

C in CHW (the outermost/slowest dimension) and has a stride of

H*W*sizeof(float) bytes. Must be a positive integer: ^[1-9][0-9]*$

Height= The spatial height dimension of this input data tensor. For an image

tensor the height is usually the number of pixel rows. This is the H in CHW

(the outermost/slowest spatial dimension) and has a stride of

W*sizeof(float) bytes. Must be a positive integer: ^[1-9][0-9]*$

Width= The spatial width dimension of this input data tensor. For an image

tensor the width is usually the number of pixels per row. This is the W in

CHW (the innermost/fastest dimension) and has a stride of sizeof(float)

bytes. Must be a positive integer: ^[1-9][0-9]*$
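The strides listed above imply the following byte-offset calculation for a fully packed CHW tensor of 32-bit floats (illustrative):

```python
def chw_offset(c, h, w, height, width):
    # Channel stride is H*W*sizeof(float) bytes, row stride is
    # W*sizeof(float) bytes, element stride is sizeof(float) bytes.
    return ((c * height + h) * width + w) * 4
```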

----

Output

FromTensor=prob

Declare an output data tensor parameter for the generated inference function.

The user allocates output tensor memory and passes a pointer into the inference

function. There, output data is written in CHW format, 32-bit floating point,

fully packed.

FromTensor= The name of a data tensor that will be written back to the user

as output. Must not be the name of an input tensor and must not be the same

as another output. The corresponding inference function parameter in the

generated code has a matching name but with "Data" appended. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

----

Pooling

FromTensor=from

ToTensor=to

Kind=Max2x2Stride2

PaddingH=0

PaddingW=0

Generate code to apply a standard window pooling or global pooling operation.

Padding affects window placement but padding values never participate in max/avg

calculations. Therefore the padding must be small enough that every window will

contain at least one non-padding value. Each (H+2*PaddingH)x(W+2*PaddingW)

feature map in FromTensor yields a corresponding feature map in ToTensor. For

RxR window pooling with a stride of S the height of every feature map in

ToTensor is ((H+2*PaddingH)-R)/S+1 where the division by S truncates toward

zero; the dividend must not be negative. The width formula is analogous. For

global pooling there is no padding and every feature map in ToTensor is 1x1.
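The window placement and the rule that padding never participates can be sketched for one feature map (illustrative Python, not the generated C99; for avg pooling the divisor is the count of non-padding values in the window):

```python
def pool_plane(plane, r, s, pad, kind):
    # RxR window pooling with stride s on one feature map
    # (a list of rows). Padded positions shift the window grid
    # but are excluded from the max/avg itself.
    h, w = len(plane), len(plane[0])
    out_h = (h + 2 * pad - r) // s + 1
    out_w = (w + 2 * pad - r) // s + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            vals = [plane[y][x]
                    for y in range(oy * s - pad, oy * s - pad + r)
                    for x in range(ox * s - pad, ox * s - pad + r)
                    if 0 <= y < h and 0 <= x < w]
            row.append(max(vals) if kind == 'max' else sum(vals) / len(vals))
        out.append(row)
    return out
```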

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

Kind= The kind of pooling operation to apply. Max2x2Stride2 and

Avg2x2Stride2 produce a single value for each 2x2 window and there is no

overlap between adjacent windows. Max3x3Stride2 and Avg3x3Stride2 produce a

single value for each 3x3 window and adjacent windows overlap. MaxGlobal and

AvgGlobal produce a single value for each feature map.

PaddingH= Implicit heightwise padding of FromTensor. This is the number of

all-zero rows to implicitly concatenate at the top of each feature map,

before the first explicit row. The same number of all-zero rows is

implicitly concatenated at the bottom of each feature map, after the last

explicit row. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

PaddingW= Implicit widthwise padding of FromTensor. This is the number of

all-zero columns to implicitly concatenate on the left side of each feature

map, before the first explicit column. The same number of all-zero columns

is implicitly concatenated on the right side of each feature map, after the

last explicit column. Must be a non-negative integer: ^(0|[1-9][0-9]*)$

----

Softmax

FromTensor=from

ToTensor=to

Generate code to compute softmax along the channel dimension independently for

each spatial (height, width) location. FromTensor and ToTensor have the same

number of channels, the same height, and the same width.
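A reference implementation of the channelwise computation (illustrative Python over nested CHW lists; subtracting the per-location maximum before exponentiating is a common overflow guard, an implementation detail rather than something the spec mandates):

```python
import math

def softmax_channels(x_chw):
    # Softmax along the channel dimension, computed independently
    # at each (height, width) location.
    c = len(x_chw)
    h, w = len(x_chw[0]), len(x_chw[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for y in range(h):
        for x in range(w):
            col = [x_chw[ch][y][x] for ch in range(c)]
            m = max(col)  # overflow guard
            exps = [math.exp(v - m) for v in col]
            total = sum(exps)
            for ch in range(c):
                out[ch][y][x] = exps[ch] / total
    return out
```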

FromTensor= Read from a pre-existing data tensor with this name. Must be a

letter followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$

ToTensor= Write to a new data tensor with this name. Must be a letter

followed by zero or more letters/digits: ^[a-zA-Z][a-zA-Z0-9]*$