Architecture

The simulator runtime is organized in three levels: Chip → Core → Units. This page focuses on static composition and ownership boundaries.

Chip Level and Core Level

The two outermost levels define the overall simulation topology.

Chip: One global clock, N cores, global memory, and network

Core: Decoder, registers, local memory, switch, and execute units

At chip construction, each core receives its instruction stream and connects its local switch to the shared network. The global memory switch is also bound to the same network.
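The construction step described above can be sketched as follows. This is a minimal illustration of the ownership boundaries, not the simulator's actual classes: every type, member name, and the `kGlobalMemorySwitchId` convention is an assumption made for this example.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins for the real simulator types.
using InstructionStream = std::vector<std::uint32_t>;

struct Network {
    std::vector<int> endpoints;  // ids of the switches bound to this network
    void bind(int switch_id) { endpoints.push_back(switch_id); }
};

struct Switch {
    int id;
    Network* network = nullptr;  // non-owning: the chip owns the network
    explicit Switch(int switch_id) : id(switch_id) {}
    void connect(Network& net) {
        network = &net;
        net.bind(id);
    }
};

struct Core {
    int id;
    InstructionStream program;  // each core owns its own instruction stream
    Switch local_switch;
    Core(int core_id, InstructionStream prog)
        : id(core_id), program(std::move(prog)), local_switch(core_id) {}
};

struct Chip {
    static constexpr int kGlobalMemorySwitchId = -1;  // assumed id convention

    Network network;  // shared by all cores and by global memory
    Switch global_memory_switch{kGlobalMemorySwitchId};
    std::vector<std::unique_ptr<Core>> cores;

    explicit Chip(std::vector<InstructionStream> programs) {
        // The global memory switch is bound to the same network as the cores.
        global_memory_switch.connect(network);
        int next_id = 0;
        for (auto& prog : programs) {
            auto core = std::make_unique<Core>(next_id++, std::move(prog));
            core->local_switch.connect(network);
            cores.push_back(std::move(core));
        }
    }

    // Switches hold pointers into this object, so copying and moving are disabled.
    Chip(const Chip&) = delete;
    Chip& operator=(const Chip&) = delete;
};
```

In this sketch the chip owns the network and the cores, while each switch only keeps a non-owning pointer back to the network. Constructing a `Chip` from two instruction streams yields two cores and three bound endpoints (two local switches plus the global memory switch).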

Each Core contains the following subcomponents:

Decoder: Instruction fetch, decode, and issue

RegUnit

MemoryUnit: Local memory backends (RAM, RegBuffer)

Switch: Local endpoint for NoC communication

Execute Units: Scalar, SIMD, Reduce, Transfer, CIM Compute, CIM Control

CimUnit: CIM macro array mapped into local memory

Execute Units

Each core contains six specialized execute units:

CimComputeUnit: Matrix-vector multiply over CIM macro groups

CimControlUnit: Activation mask and output-side CIM control

SIMDUnit: Vector operations via configurable functor pipelines

ReduceUnit: Aggregation operations (max, sum) across vectors

ScalarUnit: Scalar arithmetic, load/store, and register updates

TransferUnit: Local/global memory transfers, inter-core communication, sync
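One way to picture a core owning its six execute units is a fixed-size set indexed by unit kind. The sketch below assumes a shared `ExecuteUnit` base class, a `UnitKind` enum, and an `ExecuteUnitSet` container. All three are hypothetical, and the unit classes here are empty placeholders that only reuse the names listed above.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical common interface; the real units may not share a base class.
struct ExecuteUnit {
    virtual ~ExecuteUnit() = default;
    virtual std::string name() const = 0;
};

// Placeholder units: only the names come from the documentation.
struct CimComputeUnit : ExecuteUnit {
    std::string name() const override { return "CimComputeUnit"; }
};
struct CimControlUnit : ExecuteUnit {
    std::string name() const override { return "CimControlUnit"; }
};
struct SIMDUnit : ExecuteUnit {
    std::string name() const override { return "SIMDUnit"; }
};
struct ReduceUnit : ExecuteUnit {
    std::string name() const override { return "ReduceUnit"; }
};
struct ScalarUnit : ExecuteUnit {
    std::string name() const override { return "ScalarUnit"; }
};
struct TransferUnit : ExecuteUnit {
    std::string name() const override { return "TransferUnit"; }
};

// Count is a sentinel that gives the number of real unit kinds.
enum class UnitKind : std::size_t {
    CimCompute, CimControl, Simd, Reduce, Scalar, Transfer, Count
};

// Owns exactly one instance of each execute unit.
class ExecuteUnitSet {
public:
    ExecuteUnitSet() {
        slot(UnitKind::CimCompute) = std::make_unique<CimComputeUnit>();
        slot(UnitKind::CimControl) = std::make_unique<CimControlUnit>();
        slot(UnitKind::Simd) = std::make_unique<SIMDUnit>();
        slot(UnitKind::Reduce) = std::make_unique<ReduceUnit>();
        slot(UnitKind::Scalar) = std::make_unique<ScalarUnit>();
        slot(UnitKind::Transfer) = std::make_unique<TransferUnit>();
    }

    ExecuteUnit& unit(UnitKind kind) { return *slot(kind); }

    static constexpr std::size_t size() {
        return static_cast<std::size_t>(UnitKind::Count);
    }

private:
    std::unique_ptr<ExecuteUnit>& slot(UnitKind kind) {
        return units_[static_cast<std::size_t>(kind)];
    }

    std::array<std::unique_ptr<ExecuteUnit>,
               static_cast<std::size_t>(UnitKind::Count)> units_;
};
```

Holding the units through `std::unique_ptr` makes the ownership boundary explicit: the set is the sole owner, and callers receive references through `unit(...)`.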

Control Path Split

Branch and jump targets are resolved in the decoder. TAG/WAIT are decoded as control opcodes but executed by TransferUnit.
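The split can be illustrated with a small dispatch sketch. The opcode subset, the `Instruction` layout, the absolute-target encoding, and the `branch_condition` flag are all assumptions made for this example. Only the routing rule (branch and jump handled in the decoder, TAG and WAIT forwarded to TransferUnit) comes from the text above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opcode subset for illustration only.
enum class Opcode { ScalarAdd, Jump, Branch, Tag, Wait };

struct Instruction {
    Opcode op;
    std::size_t target;  // absolute instruction index for Jump/Branch (assumed encoding)
};

struct TransferUnit {
    std::vector<Opcode> executed;  // records what this unit was asked to run
    void execute(const Instruction& inst) { executed.push_back(inst.op); }
};

struct ScalarUnit {
    int executed = 0;
    void execute(const Instruction&) { ++executed; }
};

struct Decoder {
    TransferUnit& transfer;
    ScalarUnit& scalar;
    std::size_t pc = 0;
    bool branch_condition = false;  // stand-in for a register-derived condition

    Decoder(TransferUnit& t, ScalarUnit& s) : transfer(t), scalar(s) {}

    // Decode one instruction and either resolve it locally or issue it to a unit.
    void step(const Instruction& inst) {
        switch (inst.op) {
        case Opcode::Jump:
            pc = inst.target;  // resolved in the decoder, no unit involved
            return;
        case Opcode::Branch:
            pc = branch_condition ? inst.target : pc + 1;
            return;
        case Opcode::Tag:
        case Opcode::Wait:
            transfer.execute(inst);  // control opcode, executed by TransferUnit
            break;
        case Opcode::ScalarAdd:
            scalar.execute(inst);
            break;
        }
        ++pc;  // non-control-flow instructions fall through to the next pc
    }
};
```

In this sketch a `Jump` changes `pc` without touching any execute unit, whereas `Tag` and `Wait` advance `pc` by one and appear in `TransferUnit::executed`.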

Source Directory Map

The tree below shows the source directory layout of the CIMFlow-Simulator C++ codebase. Each directory corresponds to a major subsystem documented in this section.

core.h

core.cpp
