DevOps & SRE notes


Helpful articles and tools for DevOps&SRE WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F For paid consultation (RU/EN), contact: @tutunak All ways to support https://telegra.ph/How-support-the-channel-02-19


📈 Telegram channel analytics: DevOps & SRE notes

The DevOps & SRE notes channel (@devops_sre_notes) is active in the English-language segment. The community currently has 12,632 subscribers and ranks 10,059 in the Technology & Apps category and 2,997 in the United States region.

📊 Audience metrics and dynamics

Since its creation (date unknown), the project has grown quickly and attracted 12,632 subscribers.

According to the latest data as of 08 June 2026, the channel shows steady activity. Membership changed by 223 over the past 30 days and by 7 over the past 24 hours, and the channel continues to maintain broad reach.

  • Verification status: not verified
  • Engagement rate (ER): average audience engagement is 19.00%, and in the first 24 hours after publication, content typically receives 4.69% reactions relative to the total number of subscribers.
  • Post reach: each post receives 2,400 views on average, of which 593 are typically collected on the first day.
  • Reactions and engagement: the audience actively supports the channel; the average number of reactions per post is 3.
  • Topical interests: content focuses on key topics such as kubernetes, cluster, author, engineering, monitoring.

📝 Description and content policy

The author describes this space as a place for expressing personal views:
Helpful articles and tools for DevOps&SRE WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F For paid consultation (RU/EN), contact: @tutunak All ways to support https://telegra.ph/How-support-the-channel-02-19

Thanks to frequent updates (latest data as of 09 June 2026), the channel stays current and has high reach. Analytics show that the audience actively engages with the content, making the channel an important point of influence in the Technology & Apps category.

12,632 subscribers
+7 in 24 hours
+70 in 7 days
+223 in 30 days
Post archive
What is the difference between a good and a bad commit message? A good commit message tells you the reason why the change was made, while a bad one tells you nothing or repeats the same changes in the code. This commit doesn't explain anything and provides zero motivation; it is just a fact not attached to any reasoning. https://github.com/kubernetes/kubernetes/commit/94f7f922054d0aa4aa07d572a940ec0dda842646#diff-b2ad44e189798d2d03c3b05e0334899474353de68e03d71653b69ea5fd807c87L287-L387

Rust is a language for rewriting things. kdash is like k9s, but in Rust. https://github.com/kdash-rs/kdash

Repost from DevOps & SRE notes
Ingress Nginx will be retired, so it is time to choose a Gateway API implementation. Gateway API Benchmarks provides a common set of tests to evaluate a Gateway API implementation. https://github.com/howardjohn/gateway-api-bench

A small reminder: Ingress Nginx will be retired soon (in less than two weeks), so you can choose the Gateway API instead.

Finally, Grafana has addressed the elephant in the room. Let's be honest, the previous "Grafana as Code" management was terrible. Whether it was the clunky provisioning system or the need for endless sidecars and scripts, it always felt like a hack. They have now introduced Grafana Git Sync. You can connect a repository directly to Grafana, and it natively syncs your dashboards and data sources from Git. No more API workarounds or messy provisioning files. It looks like the GitOps workflow for observability might finally become usable. It’s about time. https://grafana.com/blog/git-sync-grafana/

The article details Salesforce’s transition from the traditional AWS Cluster Autoscaler (based on Auto Scaling Groups) to Karpenter. To manage this at a massive scale, Salesforce built custom automation tools to handle non-disruptive migrations, mapped over 1,180 diverse node pool configurations, and implemented a phased rollout that reduced operational overhead by 80% and improved scaling speed from minutes to seconds. https://aws.amazon.com/blogs/architecture/how-salesforce-migrated-from-cluster-autoscaler-to-karpenter-across-their-fleet-of-1000-eks-clusters/

Sometimes finding a good solution for backups is a difficult task, but for many years one of the main tools I’ve used for backing up my workstation is Restic. I’ve used it on Linux, macOS, and Windows, and it works perfectly — delivering backups to HDDs and Backblaze. I can recommend it to everyone: it’s quite fast, reliable, and an optimal solution for most file-backup cases. https://github.com/restic/restic What backup solution do you use? Share it in the comments 👇

Not only has this article been updated, but the post "How does the Kubernetes scheduler work?" has been as well. https://learnkube.com/kubernetes-scheduler-explained

"What happens inside the Kubernetes API server?" has been updated. It is a good starting point for preparing for your next K8s job interview https://learnkube.com/kubernetes-api-explained

Anyone who has been on call at night knows that it's impossible to react within minutes and triage an incident fast enough, especially if you end up in such situations very rarely. When you are paged once a quarter or once a year, your dashboards are outdated, your diagnostic skills are rusty, and the system has changed a great deal since you last looked at it. In those cases, a current AI agent can be useful. Judging by this article, by the time you get paged, wake up, turn on your laptop, and try to open your eyes, the agent can already have triaged the incident and produced a full report with recommendations. Yes, we still need a human to approve those changes or make them manually (as with planes, people prefer to see a live human pilot, but autopilots are already better than humans). https://www.opsworker.ai/blog/agent-driven-sre-investigations-a-practical-deep-dive-into-multi-agent-incident-response/

Recently I looked for a simple way to notify developers about changes in ConfigMaps. To my surprise, I found only one straightforward tool that does just this one thing: Kubewatch. So, if you want a simple tool for notifications about changes to objects in your K8s cluster, choose Kubewatch. https://github.com/robusta-dev/kubewatch
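To illustrate the kind of check a change notifier performs, here is a minimal Python sketch that compares two snapshots of a ConfigMap's data and reports what changed. It is not how Kubewatch is implemented (Kubewatch watches the Kubernetes API itself); the function name `diff_configmap` and the sample data are hypothetical.

```python
from typing import Dict, List


def diff_configmap(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return change messages between two ConfigMap `data` maps (hypothetical helper)."""
    messages = []
    # Walk the union of keys in sorted order so the output is deterministic.
    for key in sorted(set(old) | set(new)):
        if key not in old:
            messages.append(f"added: {key}")
        elif key not in new:
            messages.append(f"removed: {key}")
        elif old[key] != new[key]:
            messages.append(f"changed: {key}")
    return messages


# Hypothetical snapshots of the same ConfigMap before and after an update.
before = {"LOG_LEVEL": "info", "TIMEOUT": "30"}
after = {"LOG_LEVEL": "debug", "RETRIES": "3"}
changes = diff_configmap(before, after)
for message in changes:
    print(message)
```

In a real setup the messages would be sent to a chat or webhook instead of printed.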

Today I read the article “What Would a Kubernetes 2.0 Look Like?”, which shares thoughts on what the next major version might be, and found this :)
YAML is just too much for what we're trying to do with k8s and it's not a safe enough format. Indentation is error-prone, the files don't scale great (you really don't want a super long YAML file), debugging can be annoying. YAML has so many subtle behaviors outlined in its spec.
HCL is already the format for Terraform, so at least we'd only have to hate one configuration language instead of two. It's strongly typed with explicit types. There's already good validation mechanisms. It is specifically designed to do the job that we are asking YAML to do and it's not much harder to read.
and realized that Kubernetes developers had the same thoughts about YAML, but instead of adopting HCL, they invented their own HCL-like language: KYAML.

The article clarifies the distinction between Platform Engineering (focused on velocity and Developer Experience/DevEx) and Site Reliability Engineering (focused on stability and production health). It argues that while their daily tasks differ, they must be integrated: Platform Engineers build the "golden paths" that abstract infrastructure complexity, while SREs ensure those paths are robust, scalable, and monitored. https://octopus.com/devops/platform-engineering/platform-engineering-vs-sre/

If you, like me, use linters in the pipeline for GitOps repositories, this repo is the best thing you can use. It contains popular Kubernetes CRDs (CustomResourceDefinition) in JSON schema format. https://github.com/datreeio/CRDs-catalog
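As a rough illustration of what schema-based linting does with such files, here is a minimal Python sketch that checks a manifest against a tiny subset of JSON Schema (`type`, `required`, `properties`). The `validate` function, the schema, and the `Widget` kind are hypothetical and not taken from the catalog; a real pipeline would use a full JSON Schema validator with the published schemas.

```python
from typing import Any, Dict, List

# Maps JSON Schema type names to Python types (only the subset used in this sketch).
_TYPES = {"object": dict, "string": str, "integer": int, "array": list, "boolean": bool}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of validation errors for `value` (hypothetical minimal validator)."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for "integer" explicitly.
        is_bool_as_int = expected == "integer" and isinstance(value, bool)
        if not isinstance(value, _TYPES[expected]) or is_bool_as_int:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required field '{name}'")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub_schema, f"{path}.{name}"))
    return errors


# Hypothetical schema for a made-up custom resource.
schema = {
    "type": "object",
    "required": ["apiVersion", "kind", "spec"],
    "properties": {
        "apiVersion": {"type": "string"},
        "kind": {"type": "string"},
        "spec": {
            "type": "object",
            "required": ["replicas"],
            "properties": {"replicas": {"type": "integer"}},
        },
    },
}

# The manifest has a typical mistake: replicas given as a string.
manifest = {"apiVersion": "example.com/v1", "kind": "Widget", "spec": {"replicas": "3"}}
errors = validate(manifest, schema)
print(errors)
```

Without a schema for the custom resource, a linter cannot tell that `replicas` has the wrong type, which is why a catalog of CRD schemas is useful.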

In November, a new major version of Helm was released, but for me and my colleagues it didn’t cause any excitement. I checked the changelog and realized that there were no new features that would make my life easier or improve stability. I talked to people who use Argo, and the response was the same: it’s just another release. It even feels like it could have been a minor update rather than a major one. https://t.me/devops_sre_notes/2512 Why did this happen? I think the current version of Helm is already good enough, especially if you are using GitOps with Argo CD or FluxCD. If you didn’t like Helm 3, you probably won’t change your opinion with the Helm 4.0 release.

The author conducts a side-by-side security experiment using Minikube to compare a standard root-privileged container against a custom non-root Alpine container. Through three distinct attack vectors, the article illustrates how non-root configurations actively block common exploitation attempts that succeed in root-privileged environments.
Key Insights:
- Tooling Denial: In a root container, an attacker can easily install missing utilities (like curl) to fetch malicious payloads. The non-root container blocks package installation and unauthorized data fetching.
- Host Path Protection: The author demonstrates that if a sensitive host directory (like /etc/kubernetes/manifests) is mounted, a root user can write to it to deploy malicious static pods (e.g., crypto miners) or read sensitive host files (/etc/passwd). The non-root user is successfully denied permission to modify these files or inject new manifests.
- Privilege Escalation Barrier: The experiment shows that standard attempts to switch users (e.g., using su) inside a non-root container fail immediately, limiting an attacker's ability to escalate privileges or move laterally without explicit sudo misconfigurations.
https://medium.com/@marcin.wasiucionek/why-is-running-as-root-in-kubernetes-containers-dangerous-e5f1a116080e
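One way to act on these findings is to check pod specs for containers that are still allowed to run as root. The following is a minimal Python sketch, not something from the article: the function name and the sample pod spec are hypothetical, and the rule is simplified (it only looks at the `runAsNonRoot` and `runAsUser` fields of `securityContext`, and ignores the user configured in the image).

```python
from typing import Any, Dict, List


def containers_allowed_to_run_as_root(pod_spec: Dict[str, Any]) -> List[str]:
    """Return names of containers not guaranteed to run as non-root (simplified rule)."""
    pod_ctx = pod_spec.get("securityContext", {})
    flagged = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        # Container-level settings override pod-level ones.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        # Flag the container unless non-root is enforced or a non-zero UID is set.
        if not run_as_non_root and (run_as_user is None or run_as_user == 0):
            flagged.append(container["name"])
    return flagged


# Hypothetical pod spec, already parsed into a dict.
pod_spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {"name": "app"},
        {"name": "debug", "securityContext": {"runAsNonRoot": False, "runAsUser": 0}},
        {"name": "worker", "securityContext": {"runAsNonRoot": False, "runAsUser": 1000}},
    ],
}
flagged = containers_allowed_to_run_as_root(pod_spec)
print(flagged)
```

Here only `debug` is flagged: `app` inherits the pod-level `runAsNonRoot`, and `worker` sets a non-zero UID.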

You can do everything right but still be hacked through the official SDK. A couple of mistakes (a CI/CD misconfiguration and unanchored regular expressions) in AWS's own configuration of AWS CodeBuild, combined with predictable identifier generation in GitHub, made it possible to gain admin access to the AWS GitHub account, as the Wiz team reported. But how many companies have made similar mistakes, and has a hacker already used them to inject vulnerabilities into widely used libraries? https://www.wiz.io/blog/wiz-research-codebreach-vulnerability-aws-codebuild
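To show why an unanchored regular expression is dangerous in an allowlist, here is a minimal Python sketch. The IDs, the pattern, and the function names are hypothetical and are not taken from the Wiz report; the sketch only demonstrates the general class of mistake.

```python
import re

# Hypothetical allowlist of trusted numeric user IDs, written as one pattern.
ALLOWED_IDS_PATTERN = "12345|67890"


def is_allowed_unanchored(actor_id: str) -> bool:
    # re.search matches anywhere in the string, so a longer ID that merely
    # contains an allowed ID as a substring also passes the check.
    return re.search(ALLOWED_IDS_PATTERN, actor_id) is not None


def is_allowed_anchored(actor_id: str) -> bool:
    # re.fullmatch requires the whole string to match one of the alternatives.
    return re.fullmatch(ALLOWED_IDS_PATTERN, actor_id) is not None


# An attacker-controlled ID that contains the trusted ID "12345".
attacker_id = "9912345"
print(is_allowed_unanchored(attacker_id))
print(is_allowed_anchored(attacker_id))
print(is_allowed_anchored("67890"))
```

If identifiers are predictable, an attacker can keep creating accounts until one of them gets an ID that contains a trusted one, which is why the pattern has to match the whole value.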

Repost from DevOps & SRE notes
Looking for a hosting platform to practice with Linux, Kubernetes, etc.? Register using my referral link on DigitalOcean and get $200 in credit for 60 days. By registering through my referral link, you also support this Telegram channel. 👉 Register

The more operator tools you use, the more time you will spend replacing them after deprecation. Your processes might be well-optimized, but a chain of deprecations can force you to spend time solving problems you have already solved, and the resulting changes could make your system less stable than before. The year 2025 was a year of deprecation:
1. Kaniko was deprecated (link). Our team spent quite some time finding a solution with similar performance to avoid increasing pipeline build times.
2. NGINX Unit (link) was discontinued. Similarly, we had to find an application server that could handle high loads without slowing down.
3. Ingress-NGINX (link) was discontinued, which was the most impactful. The options were either to migrate to another solution or start using an API gateway.
Finding an “ideal” solution that fits your current needs doesn’t guarantee stability in the long term. One day, you might have to migrate to something new, introducing potential instability to your system.

This is a not-so-positive 'fragment' from Martin Fowler about AI and code quality; it summarizes other studies. If what it says is true, I can say that tech professionals are protected and will not lose their jobs in the new year. The demand for skilled professionals might actually increase or shift toward higher-level maintenance and architecture. Someone needs to clean up the "mess" AI might be creating, ensure long-term maintainability, and provide the deep understanding that AI currently lacks. https://martinfowler.com/articles/20251204-frags.html