
Architecture

Static module composition and source map of CIMFlow-Simulator

The simulator runtime is organized in three levels: Chip → Core → Units. This page focuses on static composition and ownership boundaries.

Chip Level and Core Level

The two outermost levels define the overall simulation topology.

Chip
One global clock, N cores, global memory, and network
Core
Decoder, registers, local memory, switch, and execute units

At chip construction, each core receives its instruction stream and connects its local switch to the shared network. The global memory switch is also bound to the same network.
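The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the simulator's actual code: every type and member name here (`Switch`, `Network::bind`, `GlobalMemory`, `InstructionStream`, and so on) is a hypothetical stand-in, and the clock is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real CIMFlow-Simulator classes may look different.
using InstructionStream = std::vector<std::uint32_t>;

struct Switch {
    int id;  // endpoint id on the network (assumed)
};

struct Network {
    std::vector<Switch*> endpoints;  // non-owning: each switch is owned by its parent
    void bind(Switch& sw) { endpoints.push_back(&sw); }
};

struct Core {
    Switch local_switch;
    InstructionStream program;

    // A core takes its instruction stream and attaches its switch to the network.
    Core(int id, InstructionStream stream, Network& net)
        : local_switch{id}, program(std::move(stream)) {
        net.bind(local_switch);
    }
};

struct GlobalMemory {
    Switch mem_switch;
    GlobalMemory(int id, Network& net) : mem_switch{id} { net.bind(mem_switch); }
};

struct Chip {
    Network network;  // declared first so it exists before anything binds to it
    std::vector<std::unique_ptr<Core>> cores;
    std::unique_ptr<GlobalMemory> global_memory;

    explicit Chip(std::vector<InstructionStream> streams) {
        assert(!streams.empty() && "a chip needs at least one core");
        int id = 0;
        for (auto& stream : streams) {
            cores.push_back(std::make_unique<Core>(id++, std::move(stream), network));
        }
        // Global memory joins the same network as the cores.
        global_memory = std::make_unique<GlobalMemory>(id, network);
    }
};
```

In this sketch the chip owns the cores and the global memory, while the network only holds non-owning pointers to their switches. Cores are heap-allocated so that the switch addresses stay stable as the vector of cores grows.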

Each Core contains the following subcomponents:

Decoder
Instruction fetch, decode, and issue
RegUnit
Register file access (GRF and SRF)
MemoryUnit
Local memory backends (RAM, RegBuffer)
Switch
Local endpoint for NoC communication
Execute Units
Scalar, SIMD, Reduce, Transfer, CIM Compute, CIM Control
CimUnit
CIM macro array mapped into local memory

Execute Units

Each core contains six specialized execute units:

CimComputeUnit
Matrix-vector multiply over CIM macro groups
CimControlUnit
Activation mask and output-side CIM control
SIMDUnit
Vector operations via configurable functor pipelines
ReduceUnit
Aggregation operations (max, sum) across vectors
ScalarUnit
Scalar arithmetic, load/store, and register updates
TransferUnit
Local/global memory transfers, inter-core communication, sync
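One way to picture the division of labor among the six units is a small routing table from operation category to unit. The sketch below is an assumption-laden illustration: the `OpClass` categories are hypothetical labels derived from the descriptions above, not the simulator's real instruction classes, and `unit_for` is not a function from the codebase.

```cpp
#include <cassert>
#include <string>

// Hypothetical operation categories, one per unit description in the list above.
enum class OpClass { MatVecMul, CimControl, Vector, Reduce, Scalar, Transfer };

// One entry per execute unit named on this page.
enum class UnitKind { CimCompute, CimControl, Simd, Reduce, Scalar, Transfer };

// Maps an operation category to the unit that would handle it.
inline UnitKind unit_for(OpClass op) {
    switch (op) {
        case OpClass::MatVecMul:  return UnitKind::CimCompute;  // matrix-vector multiply
        case OpClass::CimControl: return UnitKind::CimControl;  // activation mask, output control
        case OpClass::Vector:     return UnitKind::Simd;        // vector operations
        case OpClass::Reduce:     return UnitKind::Reduce;      // max, sum
        case OpClass::Scalar:     return UnitKind::Scalar;      // scalar arithmetic, load/store
        case OpClass::Transfer:   return UnitKind::Transfer;    // memory transfers, sync
    }
    assert(false && "unhandled OpClass");
    return UnitKind::Scalar;
}

// Returns the unit's class name as it appears in the list above.
inline std::string unit_name(UnitKind kind) {
    switch (kind) {
        case UnitKind::CimCompute: return "CimComputeUnit";
        case UnitKind::CimControl: return "CimControlUnit";
        case UnitKind::Simd:       return "SIMDUnit";
        case UnitKind::Reduce:     return "ReduceUnit";
        case UnitKind::Scalar:     return "ScalarUnit";
        case UnitKind::Transfer:   return "TransferUnit";
    }
    return "unknown";
}
```

For example, `unit_name(unit_for(OpClass::Reduce))` yields `"ReduceUnit"`. The actual decoder may classify instructions differently; the table only restates the responsibilities listed above.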

Control Path Split

Branch and jump targets are resolved in the decoder. TAG and WAIT are also decoded as control opcodes, but they are executed by the TransferUnit.
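The split can be illustrated with a small decoder step function. This is a sketch under stated assumptions, not the simulator's implementation: the `Opcode` enum, the `DecodedInst` fields (a relative `offset` and a pre-evaluated `taken` flag), and the `TransferUnit::execute` method are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcode set: the control opcodes named above plus a placeholder.
enum class Opcode { Branch, Jump, Tag, Wait, Other };

struct DecodedInst {
    Opcode op;
    int offset;  // relative target for Branch/Jump (assumed encoding)
    bool taken;  // stand-in for an already evaluated branch condition
};

struct TransferUnit {
    std::vector<Opcode> received;  // records what the decoder forwarded
    void execute(const DecodedInst& inst) { received.push_back(inst.op); }
};

struct Decoder {
    int pc = 0;
    TransferUnit& transfer;

    explicit Decoder(TransferUnit& unit) : transfer(unit) {}

    void step(const DecodedInst& inst) {
        switch (inst.op) {
            case Opcode::Jump:  // target resolved here, nothing is issued
                pc += inst.offset;
                break;
            case Opcode::Branch:  // target resolved here as well
                pc += inst.taken ? inst.offset : 1;
                break;
            case Opcode::Tag:
            case Opcode::Wait:  // control opcodes, but executed by TransferUnit
                transfer.execute(inst);
                pc += 1;
                break;
            case Opcode::Other:  // would be issued to another unit; omitted here
                pc += 1;
                break;
        }
        assert(pc >= 0 && "program counter must stay non-negative");
    }
};
```

In this sketch, Jump and Branch only update the program counter inside the decoder, while Tag and Wait pass through the decoder and reach the TransferUnit.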


Source Directory Map

The tree below shows the source directory layout of the CIMFlow-Simulator C++ codebase. Each directory corresponds to a major subsystem documented in this section.

core.h
core.cpp
