Artificial Intelligence AI News
رفتن به کانال در Telegram
We are a community of machine learning enthusiasts/researchers/journalists/writers who share interesting news and articles about the applications of AI. You will never miss any updates on ML/AI/CV/NLP fields because we post them daily. JOIN NOW
نمایش بیشترکشور مشخص نشده استفناوری و برنامهها24 572
3 156
مشترکین
+224 ساعت
+197 روز
+6030 روز
آرشیو پست ها
NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks
Most robot-coding agents throw away everything they learn. Solve a task, discard the fix, start the next one cold — the agent on its 100th task is no smarter than on its first. NVIDIA's ASPIRE draws a clean line between that and an agent whose experience actually compounds.
They introduced ASPIRE (Agentic Skill Programming through Iterative Robot Exploration) — a code-as-policy system where a coding agent (Claude Code, Claude Opus 4.6, 1M-token context) writes and debugs its own robot programs against a fixed perception/planning/control API, and distills every validated fix into a reusable skill library, with no fixed perception-plan-execute pipeline anywhere in the loop.
Here's what's actually interesting:
→ The execution engine logs per-primitive multimodal traces — RGB keyframes, grasp candidates, object poses, motion plans, return status — so the agent localizes the failing primitive, not just the failed rollout
→ Validated fixes distill into a text skill library (failure signature + when-to-apply guard + repair sketch), not weights — and the agent is barred from reading sim ground truth, so the skills transfer to real hardware
→ Evolutionary search proposes K candidate programs per round, conditioned on surviving programs + residual failure traces — beyond single-trajectory tuning
→ LIBERO-Pro Object under perturbation: 98 vs 22 for CaP-Agent0
→ Robosuite bimanual handover: 92 vs 20 for CaP-Agent0
→ LIBERO-Pro Long zero-shot: 31 vs 4 for prior methods (skills learned on LIBERO-90, no test-time retries)
On a real bimanual robot with a different embodiment and API (OpenAI Codex GPT-5.5), transferred skills took soda-can lifting to 19/20 at ~10x fewer tokens, and drawer opening from 0/20 to 11/20.
Full analysis: https://www.marktechpost.com/2026/07/03/nvidia-ai-introduces-aspire-a-self-improving-robotics-framework-reaching-31-zero-shot-on-libero-pro-long-tasks/
Paper: https://arxiv.org/pdf/2607.00272
Project page: https://research.nvidia.com/labs/gear/aspire/
Mistral AI Releases Leanstral 1.5: An Apache-2.0 Lean 4 Code Agent Model Solving 587 of 672 PutnamBench Problems
Most AI theorem proving is a language model generating a proof in one shot, with a verifier bolted on at the end to check it. That's autocomplete with a grader — and Mistral just drew a clear line between that and an actual proof agent.
They released Leanstral 1.5 — a 119B MoE with 6.5B active parameters, trained as a code agent that lives inside the Lean 4 compiler loop: propose a proof, read the compiler's goals and errors, refine, repeat until it compiles or the budget runs out. Verification isn't the eval here. It's the training signal.
Here's what's actually interesting:
→ Test-time scaling behaves like a dial: PutnamBench Pass@8 climbs 44 → 244 → 493 → 587 solved as the per-attempt token budget moves 50k → 200k → 1M → 4M
→ 587/672 on PutnamBench at ~$4 per problem, versus an estimated $300+ for Seed-Prover 1.5 high (a 10 H20-days-per-problem budget)
→ Saturates miniF2F: 100% on both validation and test sets
→ Two RL environments in training — a multiturn prover, and a raw-filesystem code agent that edits files, runs bash, and queries the Lean language server for live goals and types
→ Not just math: an Aeneas (Rust → Lean) pipeline flagged 11 genuine bugs across 57 repos, 5 previously unreported — including an integer overflow in datrs/varinteger when (value + 1) hits Std.U64.MAX
Apache 2.0 weights, free API endpoint
Full analysis: https://www.marktechpost.com/2026/07/03/mistral-ai-releases-leanstral-1-5-an-apache-2-0-lean-4-code-agent-model-solving-587-of-672-putnambench-problems/
Model weights: https://huggingface.co/mistralai/Leanstral-1.5-119B-A6B
Project: https://docs.mistral.ai/models/model-cards/leanstral-1-5
Technical Details: https://mistral.ai/news/leanstral-1-5/
Meet WebBrain: An Open-Source, Local-First AI Browser Agent That Reads Pages and Automates Tasks in Chrome and Firefox
WebBrain lives inside your browser and can run entirely on your own local model — no cloud, no account, no data leaving your machine.
Most "AI browser agents" are a chat box that pastes your page into someone else's server. That's not an agent that lives where you browse — and WebBrain draws a very clear line between the two.
It's an open-source (MIT), local-first browser agent for Chrome and Firefox. It runs inside your existing authenticated session, on a model you pick — so with llama.cpp or Ollama, nothing leaves your machine.
Here's what's actually interesting:
→ Two modes, cleanly separated. Ask reads the page (read-only, content scripts). Act clicks and types through the Chrome DevTools Protocol (chrome.debugger) — trusted input events that modern sites honor, reaching cross-origin iframes and shadow DOM.
→ UI-first by design. For anything that submits, sends, or buys, it drives the visible UI and refuses to hit REST/GraphQL endpoints directly. It starts read-only and asks before consequential actions.
→ Bring any model. llama.cpp, Ollama, LM Studio, vLLM — or OpenAI, Claude, Gemini, DeepSeek, Groq, OpenRouter. Recommended local: Qwen 3.6 35B (Qwen3.6-35B-A3B), which beat Gemma 4 on the project's screenshot benchmark.
→ Tuned for cost and privacy. Token-conscious screenshots, oldest-first context trimming, a dedicated vision model, 40+ tools (~20 in Compact mode). No telemetry. No accounts.
Full analysis: https://www.marktechpost.com/2026/07/02/meet-webbrain-an-open-source-local-first-ai-browser-agent-that-reads-pages-and-automates-tasks-in-chrome-and-firefox/
GitHub Repo: https://pxllnk.co/wdva98c
Chrome Extension: https://pxllnk.co/p4mn8
Firefox Add-on: https://pxllnk.co/m6k7c5w9
Portal: https://pxllnk.co/rlifl7h
Most browser agents drive the page from the outside. An external process, a headless browser, screenshots piped to a multimodal model. That's automation pointed at your app — not something living inside it. Alibaba's Page Agent flips the direction.
They shipped an open-source JavaScript GUI agent that runs client-side, inside the webpage itself. It reads the live DOM as text — a "dehydrated" FlatDomTree — then clicks, types, and scrolls as the real user. No screenshots, no multimodal model, no backend rewrite, and it inherits the user's existing session and auth.
Here's what's actually interesting:
→ Text-only DOM, not pixels — a strong text model is enough, so you skip multimodal cost and latency
→ Model-agnostic through any OpenAI-compatible endpoint (qwen3.5-plus in the docs)
→ Guardrails are built in: operation allowlists, data masking for fields like passwords, and approval before critical actions
→ Single-page by default; optional Chrome extension for multi-tab, plus a beta MCP server so external agents can drive it
→ MIT-licensed, 17K+ GitHub stars, built on browser-use — its DOM processing and prompt are derived from it
Full analysis: https://www.marktechpost.com/2026/07/02/meet-alibabas-page-agent-a-javascript-in-page-gui-agent-that-controls-web-interfaces-with-natural-language-through-the-dom/
GitHub Repo: https://github.com/alibaba/page-agent
Most diffusion language models make one network do two jobs at once — represent the clean context and denoise the noisy tokens. Those two goals pull the same weights in different directions. NVIDIA just split them apart.
They released Nemotron-Labs-TwoTower — a block-wise autoregressive diffusion model built on the Nemotron-3-Nano-30B-A3B hybrid Mamba-2/attention/MoE backbone. It runs two towers: a frozen autoregressive context tower that processes clean tokens causally, and a trainable diffusion denoiser tower that refines noisy blocks via cross-attention to that context. Only the denoiser is trained — on ~2.1T tokens, a fraction of the backbone's 25T.
Here's what's actually interesting:
→ Two towers, not one: a frozen AR context tower and a trainable diffusion denoiser, connected layer-by-layer — denoiser layer i attends to context layer i, not just the last hidden state
→ 98.7% of the autoregressive baseline's quality at 2.42× generation throughput (γ=0.8, block size 16, 2×H100)
→ It commits multiple tokens per denoising step early in decoding — that's where the one-token-per-step AR bottleneck breaks
→ One checkpoint, three decoding modes: mask diffusion, mock-AR, and standard AR
→ Ablations: causal Mamba beats bidirectional Mamba, and tying the two towers under a joint loss is substantially worse
Full analysis: https://www.marktechpost.com/2026/07/01/nvidia-releases-nemotron-labs-twotower/
Paper: https://arxiv.org/pdf/2606.26493
Weights: https://huggingface.co/collections/nvidia/nemotron-labs-twotower
Most "cheaper model" launches just trade quality for price. Claude Sonnet 5 is doing something more specific — it's collapsing the gap between Anthropic's mid-tier and its flagship.
Anthropic shipped it today as its most agentic Sonnet yet: it plans, drives browsers and terminals, and runs autonomously across long tasks. It's now the default for Free and Pro, and the launch post says it ships with a 1M-token context window.
Here's what's actually interesting:
→ SWE-bench Pro (agentic coding): 63.2% vs 58.1% for Sonnet 4.6 — Opus 4.8 still leads at 69.2%
→ OSWorld-Verified (computer use): 81.2% vs 78.5%
→ Terminal-Bench 2.1: 80.4% vs 67.0%
→ Knowledge work (GDPval-AA v2): 1,618 — edging out Opus 4.8's 1,615
→ Effort levels (low → xhigh) are the real lever: value lives at low/medium; at xhigh, cost can pass Opus 4.8
→ Intro pricing: $2/$10 per MTok through Aug 31, then $3/$15 — Opus 4.8 is $5/$25
→ One catch: the new tokenizer can map the same text to ~1.0–1.35× more tokens, so watch the bill
Full analysis: https://www.marktechpost.com/2026/06/30/anthropic-claude-sonnet-5-vs-sonnet-4-6-vs-opus-4-8-agentic-coding-benchmarks-api-pricing-and-cost-performance-tradeoffs-compared/
Technical Details: https://www.anthropic.com/news/claude-sonnet-5
OpenClaw Releases iOS and Android Companion Node Apps That Connect a Phone to a Self-Hosted AI Agent Gateway
Most "AI assistant" apps are a chatbot in a sandbox, calling someone else's API. OpenClaw's iOS and Android apps draw a very clear line away from that model.
They're companion nodes, not standalone apps. Each phone pairs to a self-hosted OpenClaw Gateway over a WebSocket (default port 18789) with role: "node". The Gateway — the single control plane for sessions, routing, channels, and events — runs on macOS, Linux, or Windows (WSL2). The phone gives the agent a body: camera, location, voice, notifications, and a live Canvas.
Here's what's actually interesting:
→ The assistant runs on your machine — chat messages land on the Gateway, never on the phone
→ Nodes expose a command surface (canvas., camera., device., notifications., system.*) through node.invoke
→ Privacy-heavy commands like camera.snap and screen.record stay off until you allowlist them via gateway.nodes.allowCommands
→ Camera and screen capture run foreground-only; pairing needs explicit approval (openclaw devices approve)
→ Both store listings declare no data collection; ws:// is LAN-only, remote needs a wss:// TLS endpoint via Tailscale
Full analysis: https://www.marktechpost.com/2026/06/29/openclaw-releases-ios-and-android-companion-node-apps-that-connect-a-phone-to-a-self-hosted-ai-agent-gateway/
Android app: https://play.google.com/store/apps/details?id=ai.openclaw.app
iOS App: https://apps.apple.com/us/app/openclaw-ai-that-does-things/id6780396132
Most "AI for science" is a general coding agent pointed at biology and asked to find a drug. That's not an AI scientist — and NVIDIA just drew a clear line between the two.
They released the BioNeMo Agent Toolkit — an open-source library of "skills" that turn biomolecular models like OpenFold3, DiffDock, and GenMol into documented, callable tools, each carrying its own inputs, expected artifacts, and failure modes, so an agent can pick the right model, format the request, and read the result with no glue code anywhere in the loop.
Here's what's actually interesting:
→ Two layers: an accelerated tool layer (NIM + open models, sped up by cuEquivariance and Parabricks) and a skill layer that teaches the agent how to call each one
→ Same prompt pattern across protein folding, docking, generative chemistry, and genomics — run hosted on build.nvidia.com or local NIM, the agent's choice
→ Task completion: 100% vs. 57.1% for the same agent without skills (Codex CLI + GPT-5.5 fast)
→ 2x more passing assertions per 1k tokens, measured across all ten NIM skills
Full analysis: https://www.marktechpost.com/2026/06/29/nvidia-bionemo-agent-toolkit-turns-biomolecular-models-into-callable-skills-for-ai-agents-in-drug-discovery/
Repo: https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit
Technical Details: https://developer.nvidia.com/blog/build-an-ai-scientist-for-life-science-discovery-with-nvidia-bionemo-agent-toolkit/?ncid=so-twit-130411
Most speculative decoding makes you pick one: a fast parallel drafter, or an accurate sequential one. is that a false choice? — and DeepSeek's DSpark just showed why.
They released DSpark — a speculative decoding framework, not a new model — that attaches a draft module to existing DeepSeek-V4 weights. It pairs a heavy parallel draft backbone with a tiny Markov head that nudges each token's logits using only t-1, then schedules how many tokens get verified based on real-time GPU load.
Here's what's actually interesting:
→ Semi-autoregressive drafting: parallel backbone for speed, lightweight sequential head to cut suffix decay — the rank-256 Markov head adds almost nothing to latency (0.2–1.3%)
→ Confidence-scheduled verification: a calibrated confidence head plus a hardware-aware scheduler verify more tokens when GPUs are idle, fewer when they're busy
→ Accepted length: +26–31% over Eagle3 and +16–18% over DFlash across Qwen3-4B / 8B / 14B
→ Production on DeepSeek-V4: 57–85% faster per-user generation over the MTP-1 baseline at matched throughput
→ Output stays lossless, and the training repo (DeepSpec) ships under MIT
Full analysis: https://www.marktechpost.com/2026/06/27/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1/
Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf
GitHub Repo: https://github.com/deepseek-ai/DeepSpec
Model weights on HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
Running Linux containers on a Mac has always meant one big shared VM, with every container packed inside it. Apple just inverted that model with the release of container 1.0.
container is an open-source CLI written in Swift and optimized for Apple silicon. It runs each Linux container inside its own lightweight virtual machine, and it consumes and produces standard OCI images.
Here's what's actually interesting:
→ Each container gets its own lightweight VM — isolation at the VM boundary, not a shared kernel
→ Built on Apple's open-source Containerization package, using the Virtualization and vmnet frameworks
→ OCI-compatible: pull from any standard registry (Docker Hub, GHCR), push back the same way, no conversion
→ New "container machines": persistent Linux environments with your home directory mounted and the login user matching your Mac account
→ 1.0 also moved settings to a TOML config and added structured JSON/YAML/TOML output for list and inspect
→ Apache 2.0, Apple silicon only, best on macOS 26 — and past 30,000 GitHub stars within days of release
Full analysis: https://www.marktechpost.com/2026/06/26/meet-container-apples-open-source-swift-tool-for-running-linux-containers-as-lightweight-vms-on-apple-silicon/
GitHub: https://github.com/apple/container
Most end-to-end OCR models slow down the longer they read. Every token they generate adds to the KV cache — so memory climbs and parsing dozens of pages becomes impractical. Baidu's Unlimited OCR attacks that at the attention layer, not with engineering workarounds.
They open-sourced Unlimited OCR — a 3B MoE model with 500M active parameters, built on DeepSeek OCR, that replaces every decoder attention layer with Reference Sliding Window Attention (R-SWA). Each token attends to all reference tokens (visual tokens + prompt) plus only the last 128 generated tokens. Everything older is evicted, so the KV cache stays constant instead of growing with output length. MIT-licensed, weights public.
Here's what's actually interesting:
→ The full decode runs on a constant KV cache (L_m + n) — memory and per-step latency stay flat the whole way
→ DeepEncoder compresses a 1024×1024 page to 256 visual tokens (16×), so the prefill stays small
→ Continue-trained from the DeepSeek OCR checkpoint for just 4,000 steps with the encoder frozen
— the gains come from R-SWA, not scale
→ OmniDocBench v1.5: 93.23 vs. 87.01 for the DeepSeek OCR baseline (+6.22)
→ 40+ pages parsed in one forward pass, edit distance still under 0.11; 35% throughput lead at 6,000 output tokens
Full analysis: https://www.marktechpost.com/2026/06/24/baidu-releases-unlimited-ocr-a-3b-model-that-keeps-the-kv-cache-flat-for-long-document-parsing/
Paper: https://arxiv.org/pdf/2606.23050
Model weights on HF: https://huggingface.co/baidu/Unlimited-OCR
Repo: https://github.com/baidu/Unlimited-OCR
Most speculative decoding still drafts tokens one at a time. That's not parallel generation — it just hides the serial loop behind a smaller model.
UC San Diego's z-lab just drew a clear line between the two. They released DFlash — a lightweight block diffusion model that drafts a whole block of tokens in a single forward pass, then lets the target model verify the block in parallel. Up to 15× higher throughput for gpt-oss-120b on NVIDIA Blackwell. No token-by-token drafting anywhere in the speculative path.
Here's what's actually interesting:
→ The drafter is conditioned on the target model's own hidden features, injected into the Key/Value cache of every draft layer — so acceptance length scales with draft depth instead of diluting away
→ A 5-layer drafter replaces the 7B diffusion drafters that capped earlier methods near 3–4×
→ MATH-500 speedup: 6.08× vs. 1.81× for EAGLE-3 (4.86× average vs. 1.76×, Qwen3-8B, greedy)
→ Up to 15× higher throughput for gpt-oss-120b on NVIDIA Blackwell — at the same interactivity target
→ Lossless: the target still verifies every token, so output quality is preserved
Full analysis: https://www.marktechpost.com/2026/06/24/dflash-speculative-decoding-drafts-whole-token-blocks-in-parallel-for-up-to-15x-higher-throughput-on-nvidia-blackwell/
Paper: https://arxiv.org/pdf/2602.06036
NVIDIA's metrics: https://developer.nvidia.com/blog/boost-inference-performance-up-to-15x-on-nvidia-blackwell-using-dflash-speculative-decoding/
Project: https://z-lab.ai/projects/dflash/
Model weights: https://huggingface.co/collections/z-lab/dflash
Repo: https://github.com/z-lab/dflash
Most "structured extraction" is a general LLM asked nicely to return JSON, with a retry loop bolted on. That's not a guarantee — and Datalab just drew a very clear line between the two.
They just released lift as open weights — a 9B vision model that decodes directly against your JSON schema, so the output is valid by construction. It reads whole multi-page documents in a single pass, including values that span pages. The structural guarantee lives in the decoder, so you don't need a parse-validate-retry loop to get well-formed JSON.
Here's what's actually interesting:
→ Schema-constrained decoding: your schema is compiled to a grammar, and tokens that would break it are masked at every step. Structure is enforced as it generates, not validated after the fact.
→ It guarantees shape, not meaning — a field typed "number" holds a number, just not necessarily the right one. Validity ≠ correctness.
→ Trained abstention: every field is made nullable, so it returns null instead of hallucinating a tax ID that isn't on the page.
→ The trap: hand it enum / $ref / anyOf and the schema won't compile — lift silently drops the guarantee and free-generates. No hard error. Validate downstream.
→ 90.2% field accuracy on a 225-doc, ~11,000-field adversarial benchmark — the highest of any self-hostable model they tested.
→ 9.5s median/doc: ~3x faster than Gemini Flash 3.5, and within a point of it on field accuracy.
→ Built on Qwen 3.5 — the base scores 76.3%, lift hits 90.2%. Same size, so the gain is the training, not the parameters.
→ The honest catch: full-document accuracy is 20.9% — near the bottom of the table. Getting every field right across a 64-page doc is brutal; even the hosted leaders top out at 44.4% / 40.0%.
Full analysis: https://www.marktechpost.com/2026/06/23/datalab-releases-lift-a-9b-open-weights-vision-model-that-extracts-structured-json-from-pdfs-using-schemas/
Repo: https://pxllnk.co/nmpjxqn
Model weights on HF: https://pxllnk.co/t0x8a0r
Playground: https://pxllnk.co/mf4o7kl
Most fast attention kernels on AMD get there by hand-writing GCN assembly. That's a maintenance tax most teams can't pay — and MoonMath.ai just showed you don't have to.
They open-sourced a bf16 forward attention kernel for AMD MI300X (CDNA3, gfx942), written entirely in HIP, not assembly. It beats AITER v3 — AMD's own assembly-tuned kernel — on every shape and every rounding mode across an 8K–128K token sweep.
Here's what's actually interesting:
→ One-instruction asm wrappers: you pick the exact opcode, the compiler still allocates the registers — instruction-level control without the assembly tax
→ Eight waves in two groups, two barriers per iteration — one group saturates the matrix core while the other runs softmax and prefetches the next loads
→ Most of the win is memory placement, not a clever instruction — K in LDS, V kept hot in L1, Q and accumulators in registers
→ Geomean 1.18× / 1.15× / 1.08× vs AITER (RTNE/RTNA/RTZ), up to 1.26×; 1.37–1.59× vs Modular MAX
→ Already merged into SGLang diffusion: 1.23× faster Wan2.1 video generation on MI300X, with no visible quality regression
The core bet: give the compiler a hand-built framework, then let it do what it's good at — optimize locally inside it.
Full analysis: https://www.marktechpost.com/2026/06/22/moonmath-ai-open-sources-a-hip-attention-kernel-for-amd-mi300x-that-beats-aiter-v3-on-every-shape-and-rounding-mode/
Technical details: https://moonmath.ai/cdna3attention/
Most 'deep research' tools optimize for speed — an answer in a few minutes.
Sakana AI just shipped the opposite, and it's a deliberate design choice.
They launched Sakana Marlin — the company's first commercial product, an autonomous B2B research agent positioned as a "Virtual CSO." Give it one prompt and it runs autonomously for up to ~8 hours, then returns a detailed report and presentation slides with no human in the loop.
Here's what's actually interesting:
→ AB-MCTS (Adaptive Branching Monte Carlo Tree Search) decides at each step whether to go 'wider' (a new candidate) or 'deeper' (refine a line) — NeurIPS 2025 Spotlight
→ Paired with a workflow-automation core from the AI Scientist project, published in Nature
→ One run issues hundreds to thousands of queries on Sakana's proprietary long-term inference architecture — not a third-party model router
→ Press hands-on: reports ran 60–100 pages citing 60–80 sources; slides generated with image-gen AI
→ Refined through a ~300-professional closed beta across finance, consulting, and think tanks
The core bet: spend hours of inference-time compute on depth, not seconds on a summary.
Full analysis: https://www.marktechpost.com/2026/06/15/sakana-ai-marlin/
Technical details: https://sakana.ai/marlin-release/
Product page: https://sakana.ai/marlin/
Databricks Open-Sources Omnigent: The "Meta-Harness" Layer for AI Agents
Juggling multiple AI agent frameworks like Claude Code, Codex, or Pi often means dealing with fragmented environments, manual context switching, and fragile prompt-based guardrails.
To solve this, Databricks team has built Omnigent (under the Apache 2.0 license)—a powerful meta-harness built that standardizes how we compose, govern, and share AI agents.
If you run more than one coding agent, it's worth a look.
Quick framing: a harness is the wrapper that turns a model into an agent — Claude Code, Codex, Pi. Omnigent sits one level above them.
Here are takeaways:
1. One layer over every harness → Claude Code, Codex, Pi, and custom YAML agents in the same session → Swap a harness or model with a one-line change → The same session is reachable from terminal, web, desktop, and phone
2. Control through policies, not prompts → A cost policy can pause an agent after every $100 it spends → A contextual policy can require approval to git push after an npm install → Its OS sandbox injects secrets like a GitHub token only at the egress proxy
3. Collaboration that isn't copy-paste → Share a live agent session by URL → Teammates watch it work, comment on files, co-drive, or fork the conversation
4. Two example agents ship with it → Polly: delegates to coding sub-agents in parallel git worktrees, then routes each diff to a reviewer from a different vendor than the writer → Debby: sends every question to both Claude and GPT and lets them debate
It's Apache 2.0
Full analysis: https://www.marktechpost.com/2026/06/13/databricks-open-sources-omnigent-a-meta-harness-that-composes-governs-and-shares-ai-agents-across-claude-code-codex-and-pi/
Repo: https://github.com/omnigent-ai/omnigent
Technical details: https://www.databricks.com/blog/introducing-omnigent-meta-harness-combine-control-and-share-your-agents
We have created small demo to show how the research works: https://ai-paper-demos.vercel.app/omnigent-demo.html
The US government ordered Anthropic to disable its two most capable models. Three days after launch, Claude Fable 5 and Mythos 5 went dark for everyone.
Here's what actually happened:
1. The order→ Arrived June 12, 5:21pm ET, as an export control directive → Cited national security authorities → Banned access for any foreign national, inside or outside the US
2. Why both models went fully offline: Anthropic can't separate foreign nationals from US users in real time. So it disabled Fable 5 and Mythos 5 for all customers. Every other Claude model, including Opus 4.8, stays online.
3. The trigger→ Another company claimed it jailbroke Mythos (per Axios) → The administration had earlier tried to delay the launch → Anthropic declined, and the letter followed
4. Anthropic's response: The company is complying but disputing the rationale. It calls the cited jailbreak narrow and non-universal. It says the same capability is widely available elsewhere, including OpenAI's GPT-5.5.
5. Why builders should care→ Calls to claude-fable-5 now return errors → Route to Opus 4.8 as a fallback for now → A government can pull a live model by directive
This looks like the first government-forced takedown of a publicly deployed frontier model.
Full analysis: https://marktechpost.com/2026/06/13/anthropic-disables-claude-fable-5-and-mythos-5-after-us-government-order/
Anthropic's blog: https://anthropic.com/news/fable-mythos-access
Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6
Here's what's actually in it.
1. It's a coding-focused model built on Mixture-of-Experts, 1T total parameters, 32B active. 256K context window. Open weights under a Modified MIT license on Hugging Face.
2. The benchmark gains are over K2.6 (and company-reported)→ +21.8% on Kimi Code Bench v2 (50.9 → 62.0) → +11.0% on Program Bench → +31.5% on MLS Bench Lite
3. The efficiency number is the one I'd watch→ ~30% lower reasoning-token usage vs K2.6 Reasoning tokens bill as output. Across a long agent run, that compounds into real cost and latency.
4. Against the closed frontier, here's where it actually landsGPT-5.5 leads on all six rows. Claude Opus 4.8 leads on five. K2.7-Code beats Opus 4.8 on MCP Mark Verified (81.1 vs 76.4).
5. Pricing is low for high-volume runs→ $0.19 / 1M cached input → $0.95 / 1M cache-miss input → $4.00 / 1M output
Full analysis: https://www.marktechpost.com/2026/06/12/moonshot-ai-releases-kimi-k2-7-code-a-coding-model-reporting-21-8-on-kimi-code-bench-v2-over-k2-6/
Kimi code: https://www.kimi.com/code?track_id=4fe13f24-6411-4407-be73-38f5fc4a4346
API: https://platform.kimi.ai/
Zyphra Released Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude
It's a family of open vision-language models that swaps the usual dense Transformer backbone for a hybrid one.
Here's what is super interesting
1. The architecture is the actual storyMost open VLMs put a dense Transformer under the vision encoder. Zamba2-VL uses Zamba2 — Mamba2 state-space layers carry most of the compute, with a few shared transformer blocks (each with a per-layer LoRA adapter) kept for in-context retrieval.
2. The payoff is latency, not leaderboards→ Near-linear-time prefill instead of quadratic attention → Fixed-size recurrent state instead of a growing KV cache → Roughly an order-of-magnitude lower time-to-first-token on a 32k-token prefill
The gap is widest at 1.2B and 2.7B — the sizes that matter for on-device and edge.
3. It's competitive, not dominant — and they show where it lags→ Strong on counting: Zamba2-VL-1.2B hits 62.5 on PixMoCount (InternVL3.5-1B: 32.8) → DocVQA holds up at 90.9 for the 2.7B model → But it trails larger models on MMMU (37.7) and MathVista (51.0)
4. Fully open→ 1.2B, 2.7B, 7B under Apache 2.0 → Weights and inference code on Hugging Face and GitHub
Full analysis: https://www.marktechpost.com/2026/06/12/zyphra-release-zamba2-vl-hybrid-mamba2-transformer-vision-language-models-that-cut-time-to-first-token-by-about-an-order-of-magnitude/
Model card: https://huggingface.co/collections/Zyphra/zamba2-vl
Repo: https://github.com/Zyphra/transformers/tree/zamba2-vl
Technical details: https://www.zyphra.com/our-work/zamba2-vl
Cohere just released its first coding model named ‘North Mini Code’
It has 30B parameters. Only 3B Active Parameters for Agentic Coding
The minimum to run it? A single H100.
Five things stand out:
𝟭. 𝗧𝗵𝗲 𝘀𝗽𝗮𝗿𝘀𝗶𝘁𝘆 𝘁𝗿𝗶𝗰𝗸 North Mini Code is a sparse mixture-of-experts model. → 30B total parameters, 3B active per token → 128 experts per block; the router picks 8 → So roughly 6% of experts run on any token → Small active compute, large total capacity
𝟮. 𝗕𝘂𝗶𝗹𝘁 𝗳𝗼𝗿 𝗿𝗲𝗮𝗹 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴, 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗰𝗵𝗮𝘁 → Sub-agent orchestration → Systems architecture mapping → Code reviews and terminal tasks → Native tool use and interleaved thinking → 256K context, 64K max output
𝟯. 𝗧𝗵𝗲 𝘀𝗽𝗲𝗲𝗱 𝗻𝘂𝗺𝗯𝗲𝗿𝘀 (𝗖𝗼𝗵𝗲𝗿𝗲'𝘀 𝗼𝘄𝗻) → Up to 2.8x output throughput vs Devstral Small 2 → 30% better inter-token latency → 33.4 on the Artificial Analysis Coding Index Always re-test on your own workload.
𝟰. 𝗛𝗼𝘄 𝘁𝗼 𝗿𝘂𝗻 𝗶𝘁 → Minimum: one H100 at FP8, BF16 weights → Serve with vLLM; trained for OpenCode → Quantized for Ollama, LM Studio, llama.cpp → Also on Cohere API, Model Vault, OpenRouter
𝟱. 𝗧𝗵𝗲 𝗰𝗮𝘁𝗰𝗵 The blog says Apache 2.0. The Hugging Face card adds a non-commercial, acceptable-use note. → Read both before you ship commercially.
The bigger signal: capable coding models are shrinking.
A single-GPU, open-weight agent changes who can self-host.
Full analysis: https://www.marktechpost.com/2026/06/11/meet-north-mini-code-coheres-30b-open-weight-mixture-of-experts-model-with-3b-active-parameters-for-agentic-coding/
Model weight: https://huggingface.co/CohereLabs/North-Mini-Code-1.0
Technical details: https://cohere.com/blog/north-mini-code
اکنون در دسترس! پژوهش تلگرام ۲۰۲۵ — مهمترین بینشهای سال 
