quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "completetinymodelraven_top",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # Required for Raven architecture
)
tokenizer = AutoTokenizer.from_pretrained("completetinymodelraven_top")
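To get a feel for what 4-bit loading saves, here is a rough back-of-the-envelope sketch. It assumes a model of about 1B parameters and counts the weights only; quantization constants, activations, and the KV cache are ignored, so real usage will be somewhat higher. The helper name weight_memory_gib is made up for this illustration.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params = 1_000_000_000  # assumption: roughly 1B parameters

fp16 = weight_memory_gib(params, 16)  # half-precision baseline
int4 = weight_memory_gib(params, 4)   # 4-bit quantized weights

print(f"fp16: {fp16:.2f} GiB, 4-bit: {int4:.2f} GiB")
# prints "fp16: 1.86 GiB, 4-bit: 0.47 GiB"
```

The 4x ratio follows directly from 16 bits versus 4 bits per weight; it says nothing about output quality after quantization.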
How did they fit a Raven-level reasoner into 1B parameters? The paper mentions a novel head called the G Laplacian Top. In graph theory, the Laplacian matrix represents connectivity. This model dynamically rewires its attention heads based on the topological complexity of the prompt.
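For readers who have not met the graph Laplacian before, the following is a minimal sketch of the textbook definition, L = D - A (degree matrix minus adjacency matrix), for a small undirected graph. It only illustrates the graph-theory concept; it is not the model's attention head, and the function name graph_laplacian is an assumption made for this example.

```python
def graph_laplacian(num_nodes, edges):
    """Return L = D - A for an undirected, unweighted graph."""
    # Build the symmetric adjacency matrix A.
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u][v] = 1
        adjacency[v][u] = 1

    # Diagonal entries hold the node degree, off-diagonal entries hold -A[i][j].
    laplacian = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        degree = sum(adjacency[i])
        for j in range(num_nodes):
            laplacian[i][j] = degree if i == j else -adjacency[i][j]
    return laplacian


# A path graph 0 - 1 - 2 - 3
path_edges = [(0, 1), (1, 2), (2, 3)]
for row in graph_laplacian(4, path_edges):
    print(row)
```

Every row of a Laplacian sums to zero, which is a quick sanity check when building one by hand.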
Practical Implication: When you ask the Raven Top a question, it doesn't search its memory for an answer. It visualizes the problem as a graph (Nodes = Concepts, Edges = Relationships) and solves for the shortest path. This is remarkably close to how human working memory functions.
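The "nodes and edges" picture can be made concrete with a toy example. The sketch below runs a plain breadth-first search over a hand-written concept graph; it illustrates the shortest-path idea only and makes no claim about what happens inside the model. The graph contents and the function name shortest_path are invented for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = []
                while neighbor is not None:
                    path.append(neighbor)
                    neighbor = previous[neighbor]
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical concept graph: nodes are concepts, edges are relationships.
concepts = {
    "clouds": ["rain", "evaporation"],
    "rain": ["clouds", "wet ground"],
    "evaporation": ["clouds"],
    "wet ground": ["rain", "slippery road"],
    "slippery road": ["wet ground"],
}

print(shortest_path(concepts, "clouds", "slippery road"))
# prints ['clouds', 'rain', 'wet ground', 'slippery road']
```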
Because the CompleteTinyModelRaven Top runs locally, there is no data leakage to API endpoints. However, the model is not aligned against harmful content by default. The base "Raven Top" was trained on a filtered Common Crawl subset, but developers should implement their own safety guardrails if deploying in public-facing applications.
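As a starting point for your own guardrails, here is a minimal sketch of a wrapper that screens both the prompt and the reply against a pattern list before anything reaches the user. Everything in it is an assumption for illustration: the make_guarded helper, the refusal text, and the placeholder patterns are not part of the repository, and a regex blocklist is far weaker than a proper moderation model.

```python
import re
from typing import Callable, Iterable

REFUSAL = "Sorry, I can't help with that request."


def make_guarded(generate: Callable[[str], str],
                 blocked_patterns: Iterable[str]) -> Callable[[str], str]:
    """Wrap a text-generation callable with input and output screening."""
    compiled = [re.compile(p, re.IGNORECASE) for p in blocked_patterns]

    def is_blocked(text: str) -> bool:
        return any(p.search(text) for p in compiled)

    def guarded(prompt: str) -> str:
        if is_blocked(prompt):   # screen the request first
            return REFUSAL
        reply = generate(prompt)
        if is_blocked(reply):    # then screen what the model produced
            return REFUSAL
        return reply

    return guarded


# Stand-in for the real model call, so the sketch runs on its own.
def fake_generate(prompt: str) -> str:
    return f"Echo: {prompt}"


# Placeholder patterns; replace with rules that match your own policy.
guarded_generate = make_guarded(fake_generate, [r"\bpassword dump\b", r"\bcard numbers?\b"])

print(guarded_generate("Summarize this article"))
print(guarded_generate("Find me a password dump"))
```

Wrapping the callable rather than the model object keeps the check independent of how generation is implemented, so the same wrapper works for a local pipeline or a stub in tests.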
A lightweight safety filter is included in the safety/ folder of the repository. Enable it via:
model.enable_safety_filter(threshold=0.85)
Solution: Update your transformers library, for example with pip install --upgrade transformers. The Raven architecture was merged in PR #28745.
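Before and after upgrading, it can help to confirm which version is actually installed in the active environment. A small sketch using only the standard library follows; the helper name installed_version is made up here, and no minimum version is assumed because the article does not state one.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("transformers")
if version is None:
    print("transformers is not installed in this environment")
else:
    print(f"transformers {version} is installed")
```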
If you’ve picked up the top-rated Raven model, painting it can be a joy due to its size. Here is how to make your "complete" model stand out: