Model feature-map dimensions across common 2D operations. Review channel counts, effective kernel sizes, and memory estimates at a glance. Avoid silent tensor mismatches during prototyping and deployment planning.
| Case | Operation | Input | Kernel / Stride / Pad | Filters | Repeats | Output | Parameters | MACs Per Batch |
|---|---|---|---|---|---|---|---|---|
| Vision Stem | Convolution 2D | 224 × 224 × 3 | 7×7 / 2×2 / 3×3 | 32 | 1 | 112 × 112 × 32 | 4,736 | 59,006,976 |
| Residual Stack | Convolution 2D | 56 × 56 × 64 | 3×3 / 1×1 / 1×1 | 128 | 3 | 56 × 56 × 128 | 368,640 | 1,156,055,040 |
| Decoder Upsample | Transposed Convolution 2D | 28 × 28 × 128 | 4×4 / 2×2 / 1×1 | 64 | 2 | 112 × 112 × 64 | 196,736 | 1,233,125,376 |
| Pooling Pyramid | Max Pooling 2D | 64 × 64 × 64 | 2×2 / 2×2 / 0×0 | 64 | 2 | 16 × 16 × 64 | 0 | 327,680 |
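As a sanity check, the Vision Stem row can be reproduced with a minimal PyTorch sketch (PyTorch itself and the batch size of 1 are illustrative assumptions, not part of the table):

```python
import torch
from torch import nn

# Vision Stem row: 224 × 224 × 3 input, 7×7 kernel, stride 2, padding 3, 32 filters.
stem = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=7, stride=2, padding=3)
out = stem(torch.zeros(1, 3, 224, 224))  # batch size 1 chosen only for the demo

print(tuple(out.shape))                           # (1, 32, 112, 112)
print(sum(p.numel() for p in stem.parameters()))  # 4736 (4704 weights + 32 biases)
```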
For convolution and pooling, output height uses floor(((input height + 2 × padding height − effective kernel height) ÷ stride height) + 1). Width uses the same structure.
Effective kernel height equals dilation height × (kernel height − 1) + 1. Effective kernel width follows the same rule.
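A minimal pure-Python sketch of these two formulas (the function name is illustrative):

```python
import math

def conv_output_size(size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """One spatial dimension (height or width) for convolution and pooling."""
    effective_kernel = dilation * (kernel - 1) + 1
    return math.floor((size + 2 * padding - effective_kernel) / stride) + 1

# Vision Stem row: 224 input, 7×7 kernel, stride 2, padding 3
print(conv_output_size(224, kernel=7, stride=2, padding=3))  # 112
```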
For transposed convolution, output height equals ((input height − 1) × stride height) − (2 × padding height) + effective kernel height + output padding height.
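The same style of sketch for the transposed-convolution formula (again, the function name is illustrative):

```python
def transposed_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0,
                           dilation: int = 1, output_padding: int = 0) -> int:
    """One spatial dimension (height or width) for transposed convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size - 1) * stride - 2 * padding + effective_kernel + output_padding

# Decoder Upsample row: 28 input, 4×4 kernel, stride 2, padding 1
print(transposed_output_size(28, kernel=4, stride=2, padding=1))  # 56
print(transposed_output_size(56, kernel=4, stride=2, padding=1))  # 112 after the repeat
```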
Trainable parameters for grouped convolution equal filters × (input channels ÷ groups) × kernel height × kernel width, plus bias when enabled.
Estimated MACs per batch equal output height × output width × output channels × kernel height × kernel width × (input channels ÷ groups). For pooling, the same column counts window operations (output elements × window height × window width), since pooling has no learned parameters.
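Both formulas in one illustrative helper, checked against the Vision Stem row:

```python
def conv_params_and_macs(in_ch: int, out_ch: int, kernel_h: int, kernel_w: int,
                         out_h: int, out_w: int, groups: int = 1, bias: bool = True):
    """Trainable parameters and per-batch MACs for a (grouped) 2D convolution."""
    params = out_ch * (in_ch // groups) * kernel_h * kernel_w + (out_ch if bias else 0)
    macs = out_h * out_w * out_ch * kernel_h * kernel_w * (in_ch // groups)
    return params, macs

# Vision Stem row: 3 → 32 channels, 7×7 kernel, 112 × 112 output
print(conv_params_and_macs(3, 32, 7, 7, 112, 112))  # (4736, 59006976)
```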
Approximate output memory equals batch size × output height × output width × output channels × bytes per value.
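For example, the Vision Stem output at an assumed batch size of 32 and float32 storage (4 bytes per value):

```python
def activation_bytes(batch: int, out_h: int, out_w: int, out_ch: int,
                     bytes_per_value: int = 4) -> int:
    """Approximate output-activation memory; 4 bytes per value assumes float32."""
    return batch * out_h * out_w * out_ch * bytes_per_value

print(activation_bytes(32, 112, 112, 32))  # 51380224 bytes, about 51.4 MB
```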
The calculator estimates output tensor shape, trainable parameters, approximate MACs, activation memory, effective kernel size, and repeated-layer shape changes for common 2D operations.
Use transposed convolution when you need learned upsampling in decoders, generators, or segmentation heads. It expands spatial dimensions while still applying trainable kernels.
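A minimal PyTorch sketch of such a learned-upsampling layer, matching one stage of the Decoder Upsample row (PyTorch is an assumption; the table does not prescribe a framework):

```python
import torch
from torch import nn

# One Decoder Upsample stage: 28 × 28 × 128 → 56 × 56 × 64 with trainable 4×4 kernels.
up = nn.ConvTranspose2d(in_channels=128, out_channels=64,
                        kernel_size=4, stride=2, padding=1)
print(tuple(up(torch.zeros(1, 128, 28, 28)).shape))  # (1, 64, 56, 56)
```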
Groups split channels into smaller paths. They reduce parameter count and compute, and they support depthwise or channel-partitioned designs.
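A quick illustration of the savings, comparing a dense convolution with a depthwise one (groups equal to the channel count); the layer sizes here are illustrative:

```python
from torch import nn

dense = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)                # groups=1
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False)

print(sum(p.numel() for p in dense.parameters()))      # 36864 = 64 × 64 × 3 × 3
print(sum(p.numel() for p in depthwise.parameters()))  # 576 = 64 × (64 ÷ 64) × 3 × 3
```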
A large kernel, heavy dilation, or small input can push the computed spatial size below one, which means the operation cannot produce a valid output. Adjust the kernel, stride, padding, or input size until the result is at least one.
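For example, plugging an 8-pixel input and a dilated 7×7 kernel into the output-size formula above (the specific numbers are illustrative):

```python
# Effective kernel: 2 × (7 − 1) + 1 = 13, larger than the unpadded 8-pixel input.
effective_kernel = 2 * (7 - 1) + 1
print((8 + 2 * 0 - effective_kernel) // 1 + 1)  # -4, not a valid spatial size
```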
MACs are not exactly FLOPs. Many practitioners treat one MAC as roughly two FLOPs, one for the multiply and one for the add. This page reports MACs directly, so the Vision Stem row's 59,006,976 MACs correspond to roughly 118 million FLOPs.
The memory estimate does not cover full training. It focuses on output and input activations; training usually needs additional memory for gradients, parameters, and optimizer state.
Max pooling and average pooling both use the same output-size logic here, but neither adds trainable parameters.
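A quick PyTorch check on the Pooling Pyramid input (one stage shown; PyTorch is assumed):

```python
import torch
from torch import nn

x = torch.zeros(1, 64, 64, 64)  # Pooling Pyramid input: 64 × 64 × 64
for pool in (nn.MaxPool2d(2, stride=2), nn.AvgPool2d(2, stride=2)):
    print(tuple(pool(x).shape), sum(p.numel() for p in pool.parameters()))
# Both print (1, 64, 32, 32) 0: identical shapes, zero trainable parameters
```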
Repeated layers help you inspect stacked blocks quickly. You can see shrinking or expanding shapes, cumulative parameters, and total compute in one pass.
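For instance, the Residual Stack row can be rebuilt as a three-layer stack; assuming bias is disabled (which matches that row's parameter count), a PyTorch sketch reproduces both totals:

```python
import torch
from torch import nn

# Residual Stack row: 56 × 56 × 64 input, three 3×3 convolutions, 128 filters, no bias.
stack = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
)
print(tuple(stack(torch.zeros(1, 64, 56, 56)).shape))  # (1, 128, 56, 56)
print(sum(p.numel() for p in stack.parameters()))      # 368640
```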