Architecture Weekly

The Architecture Weekly newsletter originated at https://blog.vvsevolodovich.dev. ~10 articles or videos on solution architecture and system design every week!

2 984 subscribers

People go to technical conferences and the only value they get is free snacks and a few talks, missing the true purpose of such events. I published a guide on how to actually prepare for conferences and what to do there, depending on your career aspirations. https://open.substack.com/pub/softwarearchitectureweekly/p/capturing-value-out-of-technical?r=1m9i62&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Stragglers, Not Failures: How Adaptive Hedged Requests Reduce p99 Latency by 74 Percent 🤓 In fan-out microservice architectures, the dominant cause of high p99 latency is stragglers (requests that complete slowly rather than fail), because one straggler in a fan-out blocks the entire composite response. Retries handle failed requests; stragglers call for a parallel request that is sent once a slow response is detected. The solution brings new issues of its own, such as write amplification, but if the goal is to optimize p99, racing requests is the way to go. The article reports a 74% reduction in p99 latency with zero call-site configuration changes, with a reference implementation available as an open-source Go library. #distributed #architecture #engineering #softwareengineering
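
To make the idea of racing a second request concrete, here is a minimal sketch in Python. It is not the Go library from the article: the names hedged_call, make_request and hedge_delay are hypothetical, and the delay is fixed, whereas an adaptive version would presumably derive it from observed latency.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged_call(make_request: Callable[[], Awaitable[T]], hedge_delay: float) -> T:
    """Send one request; if it is still pending after hedge_delay seconds,
    send a second one and return whichever finishes first."""
    primary = asyncio.ensure_future(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        # Fast path: the first request was not a straggler.
        return primary.result()

    # Slow path: race a backup request against the straggler.
    backup = asyncio.ensure_future(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the loser so it stops consuming resources
    # Note: if the winner failed, its exception propagates to the caller.
    return done.pop().result()


async def demo() -> int:
    """Hypothetical backend: the first call straggles, later calls are fast."""
    calls = 0

    async def backend() -> int:
        nonlocal calls
        calls += 1
        await asyncio.sleep(1.0 if calls == 1 else 0.01)
        return calls

    return await hedged_call(backend, hedge_delay=0.05)
```

Running asyncio.run(demo()) returns the answer of the second, hedged call instead of waiting for the straggler. Because the backend may receive the request twice, this only suits idempotent operations, which is the write amplification caveat mentioned above.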

The Inference Paradox: How Split-Brain LLMs Are Killing Your GPU ROI 🤓 LLM inference has a structural hardware mismatch: the prefill phase is compute-bound (processing all input tokens in a single forward pass) while decode is memory-bandwidth-bound (reading the full KV cache to emit one token per step), so coupling both phases on the same GPU means each permanently starves the other. Kubex's enterprise audits surface average GPU utilization near 5% — monolithic vLLM serializes all prefill before decode can continue, and under high concurrency the delay compounds across every request in the batch. Disaggregating prefill and decode onto separate hardware-optimized node pools — as llm-d (now a CNCF sandbox project) implements on Kubernetes with prefix-cache-aware routing — yields 2–3x throughput at high concurrency by keeping the decode pool continuously active while the prefill pool handles bursts in parallel. #ai #llm #engineering #cloud

https://open.substack.com/pub/softwarearchitectureweekly/p/building-a-stripe-app-for-data-sync?r=1m9i62&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

ScyllaDB Clusters at Discord, Zero Trust for AI Agents, Cloudflare Data Platform and much more in Architecture Weekly #198 https://www.youtube.com/watch?v=71AXRNKlg5c

Zero Trust for AI Agents 🍼 Every new technology brings security risks, and agentic systems are no exception. Common vulnerabilities like supply chain attacks and excessive permissions remain, and new ones arrive: direct and indirect prompt injection, RAG poisoning and others. Follow the Anthropic guide on applying Zero Trust to your agentic systems. #security #ai

Migrating Data Ingestion Systems at Meta Scale 👨‍💼 Big migrations are scary and take a lot of effort. Now imagine one that involves the biggest MySQL deployment in the world, as in Meta’s data ingestion system. The team ran the legacy and new systems side by side in shadow testing mode, diffing outputs for correctness so individual pipelines could be validated and migrated independently. Whenever a partition was flagged as bad, partition-level metadata flags automatically halted new delta landings and forced merges with known-good partitions, bounding data quality risk during the transition. At @boltapp we followed the same shadow testing approach during our migration from MySQL to TitaniumDB. #db #architecture #distributed #engineering
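
The shadow testing idea (feed the same input to both systems and diff the outputs) can be sketched in a few lines of Python. This is an illustration only, not Meta's tooling: shadow_compare, legacy and candidate are hypothetical names, and a real pipeline would compare partitions rather than single values.

```python
from typing import Any, Callable, Iterable, List, Tuple


def shadow_compare(
    inputs: Iterable[Any],
    legacy: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
) -> List[Tuple[Any, Any, Any]]:
    """Run every input through both systems and collect the disagreements.

    The legacy output is treated as the source of truth; the candidate runs
    in the shadow, so its failures are recorded rather than raised.
    """
    mismatches = []
    for item in inputs:
        expected = legacy(item)
        try:
            actual = candidate(item)
        except Exception as exc:  # a crash in the new system is also a diff
            mismatches.append((item, expected, exc))
            continue
        if actual != expected:
            mismatches.append((item, expected, actual))
    return mismatches


def legacy_pipeline(x: int) -> int:
    return x * 2


def new_pipeline(x: int) -> int:
    # Deliberate bug for input 3, to show what a diff looks like.
    return 7 if x == 3 else x * 2


diffs = shadow_compare(range(5), legacy_pipeline, new_pipeline)
```

An empty diffs list over a representative input set is the signal that a pipeline is ready to be cut over; here it contains the single disagreement for input 3.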

How Discord Automates ScyllaDB Clusters at Scale 🤓 A small infrastructure team operating 20+ ScyllaDB clusters with nearly 500 nodes cannot afford purely manual runbooks for cluster-wide operations. Discord’s Scylla Control Plane (SCP) encodes operations — rolling OS upgrades, cluster expansion, shadow cluster provisioning, node recovery — as YAML-defined workflows with explicit retry counts, parallelism controls, and abort-on-failure semantics; idempotency is a hard requirement for every task so that any retry is safe. Shadow clusters — temporary production replicas receiving real traffic — let the team validate new ScyllaDB versions before they touch live data; SCP automates the full shadow cluster lifecycle, cutting what once required more than a day of continuous engineer attention to largely unattended runs. #db #distributed #architecture #engineering
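
Below is a minimal Python sketch of what a workflow with retry counts and abort-on-failure semantics could look like. It is not Discord's SCP: Step, run_workflow and WorkflowAborted are hypothetical names, the workflow is defined in code rather than YAML, and parallelism controls are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class WorkflowAborted(Exception):
    """Raised when a step exhausts its retries; later steps never run."""


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # must be idempotent: safe to run again after a partial failure
    retries: int = 2  # extra attempts after the first one


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of the completed ones."""
    completed: List[str] = []
    for step in steps:
        last_error: Optional[Exception] = None
        for _attempt in range(step.retries + 1):
            try:
                step.action()
            except Exception as exc:
                last_error = exc  # remember the failure and retry
            else:
                completed.append(step.name)
                break
        else:
            # Every attempt failed: abort instead of continuing half-done.
            raise WorkflowAborted(
                f"step {step.name!r} failed after {step.retries + 1} attempts"
            ) from last_error
    return completed
```

The idempotency requirement is what makes the retry loop safe: a step that already upgraded a node must be able to run again without doing damage.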

I decided to give video versions of Architecture Weekly another shot. Check it out and leave me a comment if you like this format! https://youtu.be/oSkLhBRqWIY

What's Easy Now? What's Hard Now? 👨‍💼 AI coding agents are fundamentally feedback loops built around LLMs, and the quality of available feedback determines where they succeed or fail. Tasks with fast, clear, objective feedback, like building a UI, are much easier for agents than tasks where feedback is delayed, silent, or subjective: writing concurrent code (where bugs manifest as silent data corruption at runtime), or making architecture decisions (where feedback is inherently contextual and often never arrives). Brooker frames this as the most important axis for evaluating agent capability: whether the problem domain supports tight feedback loops. #ai #softwareengineering #engineering #llm

5 Ways to Do CDC with Postgres. A short article that covers not only the CDC blueprint from Pinterest but also 5 different ways to implement Change Data Capture in PostgreSQL. I knew only half of them! #db #cdc

RAG in Production. We faced the problem of choosing data sources for our AI agents, and naturally we started weighing retrieval-augmented generation. It turns out RAG is surprisingly complex: the tokenization process, committing to a model, reindexing cost and much more. Grab a great article on the topic.

Your AI wants to nuke your database 👷‍♂️ AI deleting your production database is no longer just a nightmare: it is a reality that happened to PocketOS, which runs its systems on Railway. To be fair to the latter company, they learned from the incident and implemented a 48-hour window for soft deletion and backups for backups. Learn the full story in the article. #ai #resilience

Structured prompt driven development 👷‍♂️ Everybody who hasn’t spent the last 2 years in a cave has come up with their own way of working in the era of AI for software development. Thoughtworks is no exception. Their core idea is that a prompt should become a first-class citizen, just like code: saved, versioned and reviewed. Based on this idea they developed a structured prompt process and tools, and they show how to implement a feature with it. To my taste the approach is a bit naive, because creating holistic, complete and correct test cases is much more complicated than the “make me test cases, avoid duplicates” shown in the article, but it is an interesting approach anyway. #softwaredevelopment

Harness Design for Long Running Apps 👷‍♂️ A long-running AI agent can fail in multiple ways: building the wrong thing, building it incorrectly, or building it with suboptimal quality. Anthropic experimented extensively with long-running agents and converged on a Planner - Executor - Reviewer architecture. Feel free to steal the approach. #ai #aiagents

Practical Lessons From the Claude Code Leak 🍼 While Axios was compromised on purpose, the Claude Code sources leaked accidentally, as source maps shipped while an update was being published. Still, it is a great chance to pick up best practices on CLAUDE.md, multi-agent orchestration, permissions and much more. #security #ai

Why fakes beat mocks and testcontainers 👷‍♂️ Mocks and Testcontainers are the two tools most developers reach for. Both have fundamental limitations. Testcontainers are binary (the dependency is either up or down), so they cannot reproduce partial failure modes, and mocks test implementation rather than results. Fakes fix both issues: they are in-memory implementations that let you test results while also simulating partial failures. Grab the best practice guide! #testcontainers #qa
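
To illustrate what a fake with partial failures might look like, here is a small Python sketch. Everything in it is hypothetical (FakeBlobStore, fail_writes_for, save_all); it is not taken from the linked guide.

```python
from typing import Dict, List, Set


class FakeBlobStore:
    """In-memory stand-in for a blob store, with per-key failure injection."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._failing_keys: Set[str] = set()

    def fail_writes_for(self, key: str) -> None:
        """Make writes to one key fail while all other keys keep working."""
        self._failing_keys.add(key)

    def put(self, key: str, data: bytes) -> None:
        if key in self._failing_keys:
            raise OSError(f"simulated write failure for {key!r}")
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_all(store: FakeBlobStore, items: Dict[str, bytes]) -> List[str]:
    """Code under test: save what it can and report the keys that failed."""
    failed = []
    for key, data in items.items():
        try:
            store.put(key, data)
        except OSError:
            failed.append(key)
    return failed


store = FakeBlobStore()
store.fail_writes_for("b")
failed_keys = save_all(store, {"a": b"1", "b": b"2", "c": b"3"})
```

The test then asserts on results (which keys ended up stored and which were reported as failed) rather than on which methods were called, and the failure affects only one key, which an all-or-nothing container cannot easily reproduce.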

Compromised Axios 👷‍♂️ Another week, another compromised npm package. Axios, the most popular JS library for HTTP requests, got infected with a malicious dependency via… right, social engineering. Make sure to check out the infection signals and apply all the remediation and prevention steps from the article! #security

Redis Cluster 👷‍♂️ Redis is single-threaded and, despite that, can handle hundreds of thousands of requests per second while guaranteeing atomicity. This is enough for the majority of systems, but not for Stripe. They moved from a single hot node to a 10-node cluster and explained how Redis Cluster operates in this well-written article. #redis #performance
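
One detail of how Redis Cluster operates is easy to show in code: per the Redis Cluster specification, every key maps to one of 16384 hash slots via CRC16 of the key modulo 16384, and a non-empty {hash tag} inside the key restricts the hash to the tag. The Python sketch below follows that rule; key_hash_slot is a hypothetical helper name, and the post above does not say which of these details the article covers.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of slots in Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key.

    If the key contains '{...}' with at least one character between the
    first '{' and the next '}', only that substring is hashed, so related
    keys can be forced into the same slot.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1 : end]
    # binascii.crc_hqx with initial value 0 is the CRC16 (XMODEM) variant
    # that the specification describes.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


same_slot = key_hash_slot(b"{user1000}.following") == key_hash_slot(b"{user1000}.followers")
```

Each node in the cluster owns a subset of the slots, so moving from one hot node to several spreads keys (and load) across nodes according to this function.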

First of all, writing code was never the bottleneck: product understanding, architecture and quality control were. Second, AI will generate the code, but why didn't the engineer review it? How did it pass peer review with such grave architecture problems? Those are my questions to the article, but it is good anyway: whatever entropy you have in your codebase, AI will only amplify it, not solve it. https://ctosub.com/p/the-ctos-entropy-war