Bread Crumbs: A Passive Compression Probe for Structural Mapping in Large Language Model Weight Files
Author: Susan L. Gardner
ORCID: 0009-0002-5372-5454
Independent Researcher
April 2026

Gardner, S. (2026). Bread Crumbs: A Passive Compression Probe for Structural Mapping in Large Language Model Weight Files (v1.0). Zenodo. https://doi.org/10.5281/zenodo.19582324

Abstract
Large language model weight files remain largely opaque despite their central role in modern AI systems. We introduce Bread Crumbs, a low-cost, CPU-only method for probing structural regularity in weight files by measuring local compressibility. The method scans model binaries in chunks and records compression ratios using standard lossless compressors (zstd, brotli), producing a coarse-grained structural map without loading the model into memory or executing inference.

Across multiple models (DeepSeek R1 Distill 8B, Gemma 3 4B/12B, Gemma 2 9B, GPT-OSS 20B), we observe consistent separation between tensor types under per-tensor chunking, with dense weight matrices exhibiting high compression ratios (that is, compressing little) and normalization layers compressing to significantly lower ratios. Fixed-window and Fibonacci chunking produce similar global statistics, while per-tensor alignment reveals structure that blind chunking obscures.

A second-stage coordinate-sensitivity analysis evaluates whether these signals depend on traversal order. Compression ratios remain stable under native and reversed traversals, while rising substantially for 2D tensors under transposition and random permutation. Fibonacci block grouping introduces only minor shifts, suggesting that the signal depends mainly on which axis is serialized contiguously.

These results indicate that compression ratio functions as a practical proxy for local statistical regularity in weight tensors. Bread Crumbs provides a lightweight structural screening method that can guide more expensive interpretability analyses.

1. Introduction
The internal structure of large language model weights remains difficult to inspect directly. Models are typically treated as opaque artifacts: loaded, quantized, and executed without direct examination of how learned information is distributed within the weight space.

This work explores a simple alternative: using lossless compression as a passive probe of structural regularity. If a region of a weight file contains repeated or predictable patterns, it should compress more efficiently than regions with higher variability. By scanning a model file in chunks and recording compression ratios, it is possible to construct a coarse structural map of the model without executing it.

The goal is not to replace existing interpretability methods, but to provide a low-cost first-pass tool that highlights structurally distinct regions of large models.

2. Theoretical Framing
Lossless compression exploits redundancy. Regions of data that contain regular patterns can be encoded more compactly, while regions with high variability resist compression. The theoretical connection between compressibility and information content is well established: Shannon's formulation of entropy as the lower bound on lossless encoding [1] links compression ratio to statistical regularity, while Kolmogorov complexity [2] frames the minimum description length of a string as a measure of its intrinsic structure. Modern general-purpose compressors such as zstd and brotli do not compute either quantity directly, but the ratios they produce serve as tractable proxies for local statistical regularity in the data they encode.

In this work, compression ratio is used as such a proxy for local statistical regularity, not as a direct measure of Shannon entropy or Kolmogorov complexity. The method does not attempt to recover semantic meaning from weights. Instead, it identifies regions where statistical structure differs, offering a coarse signal that can be inspected further using targeted methods. 
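The distinction between a compressor's ratio and Shannon entropy can be illustrated with a small sketch. It computes the empirical order-0 byte entropy of a buffer and compares it with a compression ratio. The function names are illustrative, and zlib from the Python standard library is used as a stand-in for zstd so the example runs without extra dependencies; it is not the tooling used in the paper.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy_bits(data: bytes) -> float:
    """Empirical order-0 Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compression_ratio(data: bytes, level: int = 6) -> float:
    """compressed_size / original_size, with zlib standing in for zstd."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    samples = {
        "constant": bytes(64 * 1024),            # one symbol repeated
        "repeating": bytes(range(256)) * 256,    # every byte value, in a fixed cycle
        "random": os.urandom(64 * 1024),         # no exploitable pattern
    }
    for name, blob in samples.items():
        print(f"{name:10s} entropy={byte_entropy_bits(blob):.3f} bits/byte "
              f"ratio={compression_ratio(blob):.3f}")
```

The "repeating" sample has the maximum order-0 entropy of 8 bits per byte, yet it compresses to a small fraction of its size because the compressor exploits the repetition across positions. This is one reason the ratio is best read as a proxy for local regularity rather than as an entropy estimate.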

This framing is intentionally modest: compression provides a signal, not an explanation. The contribution of Bread Crumbs is methodological — a lightweight, execution-free screening tool that can guide where more expensive analyses should be applied.

3. Method
3.1 Stage 1 — Structural Scan
Weight files in GGUF format were analyzed under three chunking schemes: fixed-window (1 MB default), Fibonacci (cycling between ~256 KB and ~8 MB), and per-tensor (aligned to GGUF tensor boundaries). Each chunk was compressed using zstd at level 3, with brotli, gzip, xz, and bzip2 used for validation in selected runs.

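The compression ratio was defined as the compressed size divided by the original size, so lower values indicate more compressible data and values near 1.00 indicate data that is nearly incompressible:
compressed_size / original_size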

Per-chunk ratios were recorded along with metadata, producing a full-file structural map.  Unless otherwise stated, reported ratios refer to zstd level 3.
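A minimal sketch of the blind-chunking part of the scan is given here under stated assumptions. zlib stands in for zstd level 3 so the code needs only the standard library. The Fibonacci schedule (a 256 KiB unit multiplied by 1, 2, 3, 5, ..., 34, then repeated) is a hypothetical reading of "cycling between ~256 KB and ~8 MB", not the author's exact schedule. The function names are illustrative.

```python
import zlib
from itertools import cycle
from typing import Iterator, List, Tuple

KIB = 1024
MIB = 1024 * KIB


def fixed_sizes(chunk_size: int = MIB) -> Iterator[int]:
    """Yield the same chunk size forever (fixed-window scheme)."""
    return cycle([chunk_size])


def fibonacci_sizes(unit: int = 256 * KIB, max_multiple: int = 34) -> Iterator[int]:
    """Yield unit * (1, 2, 3, 5, 8, ...) up to max_multiple, then start over.

    Assumed schedule: the source states only the approximate size range.
    """
    sizes = []
    a, b = 1, 2
    while a <= max_multiple:
        sizes.append(a * unit)
        a, b = b, a + b
    return cycle(sizes)


def scan(path: str, sizes: Iterator[int], level: int = 3) -> List[Tuple[int, int, float]]:
    """Return (offset, chunk_length, ratio) for consecutive chunks of a file.

    ratio = compressed_size / original_size for each chunk on its own.
    """
    results = []
    offset = 0
    with open(path, "rb") as f:
        for size in sizes:
            chunk = f.read(size)
            if not chunk:  # end of file
                break
            ratio = len(zlib.compress(chunk, level)) / len(chunk)
            results.append((offset, len(chunk), ratio))
            offset += len(chunk)
    return results
```

Calling `scan(path, fixed_sizes())` or `scan(path, fibonacci_sizes())` reads the file one chunk at a time, so memory use is bounded by the largest chunk rather than by the model size.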

3.2 Stage 2 — Coordinate Sensitivity
Selected tensors were dequantized to fp32 and re-serialized under five traversal schemes: native order, reversed, transposed, random permutation, and Fibonacci block grouping. Each variant was compressed using zstd at level 3 and the resulting ratios compared.
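The five traversal schemes can be sketched on a small in-memory matrix. This is an illustration under assumptions, not the author's code: the matrix is a plain list of rows, values are serialized as 32-bit floats with the standard `array` module, zlib stands in for zstd, and the Fibonacci variant (column blocks of width 1, 2, 3, 5, ... emitted in reverse block order within each row) is one hypothetical reading of "Fibonacci block grouping".

```python
import random
import zlib
from array import array
from typing import Callable, Dict, List

Matrix = List[List[float]]


def native(m: Matrix) -> List[float]:
    """Row-major order, as stored."""
    return [x for row in m for x in row]


def reversed_order(m: Matrix) -> List[float]:
    """Row-major order read back to front."""
    return native(m)[::-1]


def transposed(m: Matrix) -> List[float]:
    """Column-major order."""
    return [x for col in zip(*m) for x in col]


def permuted(m: Matrix, seed: int = 0) -> List[float]:
    """The same values in a seeded random order."""
    flat = native(m)
    random.Random(seed).shuffle(flat)
    return flat


def fibonacci_blocks(m: Matrix) -> List[float]:
    """Split columns into Fibonacci-width blocks; emit blocks in reverse order per row."""
    n_cols = len(m[0])
    bounds, start, a, b = [], 0, 1, 2
    while start < n_cols:
        end = min(start + a, n_cols)
        bounds.append((start, end))
        start, a, b = end, b, a + b
    out: List[float] = []
    for row in m:
        for lo, hi in reversed(bounds):
            out.extend(row[lo:hi])
    return out


def ratio(values: List[float], level: int = 3) -> float:
    """compressed_size / original_size of the values serialized as fp32."""
    raw = array("f", values).tobytes()
    return len(zlib.compress(raw, level)) / len(raw)


TRAVERSALS: Dict[str, Callable[[Matrix], List[float]]] = {
    "native": native,
    "reversed": reversed_order,
    "transposed": transposed,
    "random": permuted,
    "fibonacci": fibonacci_blocks,
}


def compare(m: Matrix) -> Dict[str, float]:
    """Compression ratio of one matrix under each traversal."""
    return {name: ratio(fn(m)) for name, fn in TRAVERSALS.items()}
```

Every traversal emits exactly the same multiset of values, so any difference in ratio comes from ordering alone.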

4. Results
4.1 Global Chunking Behavior
Across all tested models, fixed-window and Fibonacci chunking produced nearly identical global compression ratios, with no consistent advantage observed for Fibonacci at the whole-file level. This suggests that blind chunking primarily reflects aggregate file statistics rather than internal structure.

4.2 Per-Tensor Structure
Per-tensor chunking revealed a consistent separation between dense matrix tensors and normalization tensors across all tested models. Dense weight matrices (attention projections, feedforward layers, and embeddings) compressed to high ratios, meaning they compressed little, while normalization tensors compressed to much lower ratios, with a gap of roughly 40-60 percentage points between the two populations in every model tested. The absolute ratios depended on the quantization format: Q4_0 dense matrices clustered near 0.94, Q4_K_M dense matrices near 0.97-0.99, and GPT-OSS's F16 attention tensors near 0.78. Normalization tensors clustered between 0.30 and 0.45 regardless of quantization. The separation was consistent across DeepSeek R1 Distill (Qwen 2 7B base), Gemma 2 9B, Gemma 3 4B and 12B, and GPT-OSS 20B, spanning three organizations, three architecture families, and multiple quantization regimes.

We additionally observed a depth-dependent pattern: normalization layers become progressively more compressible in deeper transformer blocks, with the magnitude of this drift varying by architecture. In Gemma 2 9B, the post-feedforward normalization of block 37 compressed to a ratio of 0.32, compared to roughly 0.47 in early blocks, a drop of approximately 15 percentage points across depth. Similar directional drifts were observed in all other tested models, though with varying magnitudes.
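The per-tensor comparison can be sketched as follows, assuming the tensor bytes have already been extracted by a GGUF parser (not shown). The grouping rule, which treats any tensor whose name contains "norm" as a normalization tensor, is an assumption made for illustration, as are the function names. zlib again stands in for zstd.

```python
import statistics
import zlib
from typing import Dict, Iterable, List, Tuple


def tensor_kind(name: str) -> str:
    """Classify a tensor by name (assumed convention: 'norm' marks normalization)."""
    return "norm" if "norm" in name else "dense"


def per_tensor_ratios(tensors: Iterable[Tuple[str, bytes]], level: int = 3) -> Dict[str, float]:
    """Map tensor name to compressed_size / original_size, skipping empty tensors."""
    return {
        name: len(zlib.compress(data, level)) / len(data)
        for name, data in tensors
        if data
    }


def summarize(ratios: Dict[str, float]) -> Dict[str, float]:
    """Median ratio per tensor kind, for a quick view of the two populations."""
    groups: Dict[str, List[float]] = {}
    for name, r in ratios.items():
        groups.setdefault(tensor_kind(name), []).append(r)
    return {kind: statistics.median(values) for kind, values in groups.items()}
```

Given `(name, raw_bytes)` pairs, `summarize(per_tensor_ratios(pairs))` returns one median per population, which makes a gap like the one reported above easy to check per model.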

4.3 Embedding Layer Behavior
In the Gemma-family models, the token embedding tensor consistently exhibited near-maximal incompressibility in its packed quantized form, with ratios at or near 1.00. In models without weight tying between input embedding and output head, the packed embedding sat below this ceiling, suggesting that information density is distributed differently across these tensors.  

A follow-up experiment on Gemma 2 9B showed that this ceiling is partially a property of the packed representation rather than the underlying learned values. When the token embedding was dequantized to fp32 and compressed in its native row-major order, the ratio dropped from approximately 1.00 in the packed Q6_K representation to 0.71 in the dequantized fp32 representation, indicating substantial compressible structure that the packed byte layout obscures.

4.4 Cross-Model Variation
While structural separation between tensor types was consistent, overall compression profiles differed between models in ways that revealed architectural and implementation details. GPT-OSS 20B showed a distinctive two-band structure in its compression ratios, reflecting OpenAI's mixed-precision storage strategy: expert feedforward tensors at Q4_1 compressed near 0.96, while attention projections, embeddings, and output head, stored at F16, compressed near 0.78. This mixed-precision regime was visible in the probe output without prior knowledge of the storage format.

Other architectural signatures were similarly detectable. The 8:1 ratio between query and key/value projection sizes in GPT-OSS was visible directly from tensor byte counts, indicating extreme grouped-query attention. The Gemma 3 models contained a complete vision encoder (27 transformer blocks with independent attention and feedforward layers) alongside the language model, detectable from tensor names and their independent bimodal compression signature. Weight-tying between input embedding and output head was distinguishable from untied architectures by comparing the compressibility of the respective tensors. These results indicate that compressibility patterns are sensitive to architecture, tensor role, and storage/quantization format in ways that can be read from a static weight file without model execution.
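Reading a grouped-query attention ratio from byte counts amounts to a single division, sketched below. The tensor names and byte counts are hypothetical values chosen to reproduce an 8:1 ratio. The comparison is only meaningful when the query and key/value tensors share a storage type and input width.

```python
def gqa_group_size(q_bytes: int, kv_bytes: int) -> float:
    """Ratio of query projection size to key (or value) projection size."""
    if kv_bytes <= 0:
        raise ValueError("kv_bytes must be positive")
    return q_bytes / kv_bytes


# Hypothetical byte counts for one block, all stored at the same precision.
sizes = {
    "blk.0.attn_q.weight": 23_592_960,
    "blk.0.attn_k.weight": 2_949_120,
    "blk.0.attn_v.weight": 2_949_120,
}

if __name__ == "__main__":
    group = gqa_group_size(sizes["blk.0.attn_q.weight"], sizes["blk.0.attn_k.weight"])
    print(f"query heads per key/value head: {group:.1f}")
```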

4.5 Coordinate Sensitivity (Stage 2)
To test whether Stage 1 signals reflected intrinsic tensor structure or artifacts of serialization order, we evaluated three tensors from Gemma 2 9B (the token embedding, the block 20 feedforward-down matrix, and the block 37 post-feedforward normalization vector) under five traversal orders after dequantization to fp32.

Compression ratios were stable under native and reversed traversals for all three tensors, with differences under 0.001. Fibonacci block grouping with reversed group ordering produced similarly small changes, indicating that this particular structured re-traversal did not reveal additional structure beyond the native row-major layout at the tested scale.

Transposition produced substantial changes in the 2D tensors: the block 20 feedforward matrix shifted from ratio 0.339 in native order to 0.459 when transposed, a change of approximately 35% in relative compressed size. The token embedding shifted from 0.711 to 0.753. Random permutation produced the largest shifts, with the feedforward matrix reaching ratio 0.463 and the embedding reaching 0.766. The 1D normalization vector was essentially invariant under all transforms, as expected given its lack of a secondary axis.

Taken together, these results indicate that compressibility of 2D weight tensors depends strongly on the choice of serialization axis, with row-major and column-major traversals capturing substantially different structural signals. For the feedforward matrix tested, transposition was nearly as destructive to zstd-visible structure as random permutation, suggesting that the native row-major layout captures most of the locally exploitable structure at this scale. Simple Fibonacci column re-grouping was too gentle a transform to reveal additional structure beyond this axis effect.

These findings provide narrow empirical support for the general claim that measured compressibility depends on the traversal order used to observe the data, while also clarifying that surfacing structure beyond the primary axis effect would require more radical transformations than simple reorderings of existing values.


5. Discussion
Three conclusions follow from these results. First, compression ratio functions as a useful structural signal, separating tensor roles and revealing organization that whole-file statistics obscure. Second, alignment matters more than chunk shape: per-tensor chunking exposes structure that fixed and Fibonacci chunking smooth over. Third, the signal depends on how local adjacency is exposed to the compressor: 2D tensors changed substantially under transposition and randomization, indicating that compressibility reflects directional local structure rather than a traversal-invariant property of the values alone.

The method is intentionally lightweight. It is best understood as a screening tool that identifies regions of interest before applying more computationally expensive interpretability methods.

6. Limitations
Several limitations should be noted. First, results are restricted to the specific models and quantization formats tested and may not generalize across architectures, training regimes, or bit-widths. While consistent patterns were observed across multiple model families, broader validation is required to establish generality.

Second, compression ratio is used as a proxy for local statistical regularity and cannot distinguish between structured complexity and true randomness. The method identifies differences in compressibility, but does not by itself provide semantic interpretation of high- or low-compressibility regions.

Third, the approach depends on access to tensor boundaries for per-tensor analysis, which currently requires GGUF parsing support and may not extend directly to other weight formats without additional tooling.

Fourth, the Stage 2 coordinate-sensitivity experiment showed that simple structured reordering (such as Fibonacci block grouping) was insufficient to reveal structure beyond the dominant row/column axis effects in 2D tensors. This suggests that more substantial transformations would be required to surface additional representational structure beyond what is captured by standard serialization layouts.

Finally, fine-tuning and reasoning-distillation signatures were not detectable at the resolution of the current probe. In particular, the DeepSeek R1 distillation on top of Qwen 2 produced a compression profile indistinguishable from the base model under the tested conditions, indicating that the method may not be sensitive to higher-level training differences at the tensor-scale granularity examined here.

7. Future Work
Future work will expand cross-model and cross-quantization testing, including same-model comparisons across quantization formats (for example, Q4_0, Q4_K_M, Q8_0, and fp16) to disentangle quantization artifacts from learned-weight structure. A longitudinal study using the Pythia training checkpoint suite could characterize how the structural signatures observed here emerge during training, and a cross-version comparison within model families (such as Llama 1/2/3 or Gemma 2/3) could reveal how architectural and training-recipe changes propagate to measurable compressibility patterns. Transformations beyond simple reordering — including spectral, wavelet, and phase-space re-encodings — warrant investigation as tests of whether additional structure can be surfaced beyond the native axis effect documented here. Finally, the pairing of the Bread Crumbs coarse map with targeted lightweight pattern-matching on identified regions of interest offers a natural compounding interpretability pipeline, where cheap localization guides expensive identification.

8. Conclusion
Weight files contain detectable structure even when treated as raw binary data. By measuring compressibility across chunks, it is possible to construct a structural map without executing the model. The resulting signals are reproducible and sensitive to tensor type, model architecture, and traversal choice.

Bread Crumbs demonstrates that passive probing can provide useful insight into model structure at minimal cost, offering a practical entry point for further analysis.

References
[1] Shannon, C. E.: A Mathematical Theory of Communication. Bell System Technical Journal 27(3), 379-423 (1948)
[2] Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer (2008)
[3] Gerganov, G.: GGUF: A file format for distributing large language models. llama.cpp project (2023). https://github.com/ggerganov/llama.cpp
[4] Collet, Y., Kucherawy, M.: Zstandard Compression and the application/zstd Media Type. RFC 8878, Internet Engineering Task Force (2021)