Data science/ML/AI


Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatascientist


Network: Programming, data science, ML - free courses by Big Data Specialist · India: 31 693 · Technology & Applications: 9 381...

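📈 Analytics overview of the Telegram channel Data science/ML/AI

The channel Data science/ML/AI (@datascience_bds) is an active participant in the English-language segment. The community currently has 13 667 subscribers, ranks 9 381 in the Technology & Applications category, and ranks 31 693 in India.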

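📊 Audience metrics and growth dynamics

Since its creation (date unknown), the project has kept growing and has attracted 13 667 subscribers.

According to the latest data, from 08 June 2026, the channel is running steadily. The subscriber count changed by 150 over the past 30 days and by 4 over the past 24 hours, and overall reach remains substantial.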

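Verification status: Not verified
Engagement rate (ER): The average audience engagement rate is 7.97%. Within 24 hours of publication, a post typically reaches an engagement equal to 2.27% of the total subscriber count.
Post reach: Each post receives 1 089 views on average, about 310 of them on the first day.
Interaction and feedback: The audience participates actively, with an average of 5 reactions per post.
Topic focus: Content centers on core topics such as panda, learning, row, api, ethic.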
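
The page does not state how it computes the engagement rate. The figures above are nevertheless consistent with a simple ratio of views to subscribers, so the following is a minimal sketch under that assumption; the function name `engagement_rate` is illustrative and not part of any analytics API.

```python
def engagement_rate(views: int, subscribers: int) -> float:
    """Return views as a percentage of subscribers, rounded to two decimals.

    Assumption: ER = views / subscribers. The analytics page does not
    document its formula; this ratio merely reproduces the listed numbers.
    """
    return round(100 * views / subscribers, 2)


subscribers = 13667
average_views_per_post = 1089
views_in_first_24_hours = 310

overall_er = engagement_rate(average_views_per_post, subscribers)
first_day_er = engagement_rate(views_in_first_24_hours, subscribers)
print(overall_er, first_day_er)
```

Under this assumption the two results match the 7.97% and 2.27% reported in the list above.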

📝 Description and content strategy

The author positions the channel as a platform for expressing subjective views:
“Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatasci...”

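With frequent updates (latest data collected on 09 June 2026), the channel stays fresh and keeps a high reach. The analysis shows active audience engagement, which makes the channel a key point of influence in the Technology & Applications category.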

Subscribers: 13 667 (+4 in 24 hours, +43 in 7 days, +150 in 30 days)
Post views: 1 089 (~310 in 24 hours, ~458 in 48 hours)
Engagement rate: 7.97%
Posts per day: ~1


Post archive

13 666

60 Generative AI Project Ideas


📚 Data Science Riddle Why is data versioning (e.g., DVC, LakeFS) essential in ML workflows?

Anonymous voting


6 Steps of Data Cleaning Every Data Analyst Should Know


Instead of starting every project from scratch, use this template to build AI apps with structure and speed


https://jakevdp.github.io/PythonDataScienceHandbook/


📚 Data Science Riddle In A/B testing, why is random assignment of users essential?

Anonymous voting


⚡ Parallelism In Databricks ⚡

1️⃣ DEFINITION
Parallelism = running many tasks 🏃‍♂️🏃‍♀️ at the same time (instead of one by one 🐢).
In Databricks (via Apache Spark), data is split into 📦 partitions, and each partition is processed simultaneously across worker nodes 💻💻💻.

2️⃣ KEY CONCEPTS
🔹 Partition = one chunk of data 📦
🔹 Task = work done on a partition 🛠️
🔹 Stage = group of tasks that run in parallel ⚙️
🔹 Job = complete action (made of stages + tasks) 📊

3️⃣ HOW IT WORKS
✅ Step 1: Dataset ➡️ divided into partitions 📦📦📦
✅ Step 2: Each partition ➡️ assigned to a worker 💻
✅ Step 3: Workers run tasks in parallel ⏩
✅ Step 4: Results ➡️ combined into final output 🎯

4️⃣ EXAMPLES
# `spark` is the SparkSession that Databricks notebooks provide by default.

# Increase parallelism by repartitioning
df = spark.read.csv("/data/huge_file.csv")
df = df.repartition(200)  # ⚡ 200 partitions → up to 200 parallel tasks

# Spark DataFrame ops run in parallel by default 🚀
result = df.groupBy("category").count()

# Parallelize small Python objects 📂
rdd = spark.sparkContext.parallelize(range(1000), numSlices=50)
rdd.map(lambda x: x * 2).collect()

# Parallel workflows in Jobs UI ⚡
# Independent tasks = run at the same time.

5️⃣ BEST PRACTICES
⚖️ Balance partitions → not too few, not too many
📉 Avoid data skew → partitions should be even
🗃️ Cache data if reused often
💪 Scale cluster → more workers = more parallelism

====================================================
📌 SUMMARY
Parallelism in Databricks = split data 📦 → assign tasks 🛠️ → run them at the same time ⏩ → faster results 🚀
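
The four steps in the post (partition, assign, run, combine) can be imitated without a cluster. The sketch below is a pure-Python analogy, not Spark code: it uses a local thread pool in place of worker nodes, and the names `split_into_partitions`, `task`, and `run_job` are made up for illustration. Because of CPython's global interpreter lock, threads here show the structure of the pattern rather than a real CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Step 1: divide the dataset into roughly equal chunks."""
    # Ceiling division; max(1, ...) avoids a zero step for empty input.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def task(partition):
    """The work done on one partition: double each value and sum."""
    return sum(x * 2 for x in partition)


def run_job(data, num_partitions=4, num_workers=4):
    """Steps 2-4: hand partitions to workers, run tasks, combine results."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(task, partitions))
    return sum(partial_results)


total = run_job(list(range(1000)))
print(total)
```

The per-partition results are combined only at the end, which mirrors why evenly sized partitions matter: the job finishes when the slowest task does.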


📚 Data Science Riddle You train a CNN for image classification but loss stops decreasing early. What's your next step?

Anonymous voting


Feature Engineering: The Hidden Skill That Makes or Breaks ML Models

Most people chase better algorithms. Professionals chase better features. Because no matter how fancy your model is, if the data doesn’t speak the right language, it won’t learn anything meaningful.

🔍 So What Exactly Is Feature Engineering?
It’s not just cleaning data. It’s translating raw, messy reality into something your model can understand. You’re basically asking:

“How can I represent the real world in numbers, without losing its meaning?”

Example:
➖ “Date of birth” → Age (time-based insight)
➖ “Text review” → Sentiment score (emotional signal)
➖ “Price” → log(price) (stabilized distribution)
Every transformation teaches your model how to see the world more clearly.

⚙️ Why It Matters More Than the Model
You can’t outsmart bad features. A simple linear model trained on smartly engineered data can outperform a deep neural net trained on noise. Kaggle winners know this. They spend 80% of their time creating and refining features, not tuning hyperparameters. Why? Because models don’t create intelligence. They extract it from what you feed them.

🧩 The Core Idea: Add Signal, Remove Noise
Feature engineering is about sculpting your data so patterns stand out. You do that by:
✔️ Transforming data (scale, encode, log).
✔️ Creating new signals (ratios, lags, interactions).
✔️ Reducing redundancy (drop correlated or useless columns).
Every step should make learning easier, not prettier.

⚠️ Beware of Data Leakage
Here’s the silent trap: using future information when building features. For example, when predicting loan default, if you include “payment status after 90 days,” your model will look brilliant in training and fail in production.
Golden rule: 👉 A feature is valid only if it’s available at prediction time.

🧠 Think Like a Domain Expert
Anyone can code transformations. But great data scientists understand context. They ask:
❔ What actually influences this outcome in real life?
❔ How can I capture that influence as a feature?
When you merge domain intuition with technical precision, feature engineering becomes your superpower.

⚡️ Final Takeaway
The model is the student. The features are the teacher. And no matter how capable the student, if the teacher explains things poorly, learning fails.

Feature engineering isn’t preprocessing. It’s the art of teaching your model how to understand the world.
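
Two of the transformations from the post (date of birth → age, price → log(price)) and the golden rule about leakage can be sketched with the standard library alone. This is an illustrative sketch, not the author's code: the function names, the record layout, and the `(observed_on, value)` convention are all assumptions made for the example.

```python
import math
from datetime import date


def age_in_years(date_of_birth: date, as_of: date) -> int:
    """Whole years between date_of_birth and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (date_of_birth.month, date_of_birth.day)
    return as_of.year - date_of_birth.year - (0 if had_birthday else 1)


def build_features(raw: dict, prediction_date: date) -> dict:
    """Turn a raw record into model-ready numbers."""
    return {
        "age": age_in_years(raw["date_of_birth"], prediction_date),
        "log_price": math.log(raw["price"]),  # requires price > 0
    }


def drop_leaky_fields(observations: dict, prediction_date: date) -> dict:
    """Keep only values that were already known at prediction time.

    `observations` maps a field name to an (observed_on, value) pair.
    """
    return {
        name: value
        for name, (observed_on, value) in observations.items()
        if observed_on <= prediction_date
    }


features = build_features(
    {"date_of_birth": date(1990, 6, 15), "price": 100.0},
    prediction_date=date(2024, 6, 14),
)
safe = drop_leaky_fields(
    {
        "income": (date(2024, 1, 10), 52000),
        "payment_status_after_90_days": (date(2024, 9, 12), "late"),
    },
    prediction_date=date(2024, 6, 14),
)
print(features, safe)
```

In the usage above, the payment status is recorded after the prediction date, so the filter removes it before it can leak into training.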


📚 Data Science Riddle You have messy CSVs arriving daily. What's your first production step?

Anonymous voting


3 Common Questions About Data and Analytics


🚀 Databricks Tip: REPLACE vs MERGE

When updating Delta tables, you’ve got two powerful options:

🔹 REPLACE TABLE … ON
📚 Like throwing away the entire library and rebuilding it.
- Drops the old table & recreates it.
- Schema + data = fully replaced.
- ⚡ Super fast but destructive (old data gone).
- ✅ Best for full refreshes or schema changes.

🔹 MERGE
📖 Like updating only the books that changed.
- Works row by row.
- Updates, inserts, or deletes specific records.
- 🔍 Preserves unchanged data.
- ✅ Best for incremental updates or CDC (Change Data Capture).

⚖️ Key Difference
- REPLACE = Start fresh with a new table.
- MERGE = Surgically update rows without losing the rest.

👉 Rule of thumb: Use REPLACE for full rebuilds, Use MERGE for incremental upserts.

#Databricks #DeltaLake
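
The difference between the two strategies can be shown on plain dictionaries. The following is a conceptual sketch only: it does not call Delta Lake or Databricks, and the helper names and the `_deleted` marker are hypothetical conventions chosen for this example.

```python
def replace_table(new_rows, key="id"):
    """Full refresh: the result contains only the new rows."""
    return {row[key]: dict(row) for row in new_rows}


def merge_table(target, changes, key="id"):
    """Incremental upsert: touch only the rows named in `changes`.

    A change carrying a truthy "_deleted" flag removes the row; any other
    change updates the matching row or inserts a new one.
    """
    merged = {k: dict(v) for k, v in target.items()}  # leave input untouched
    for change in changes:
        row_key = change[key]
        if change.get("_deleted"):
            merged.pop(row_key, None)
            continue
        values = {c: v for c, v in change.items() if c != "_deleted"}
        merged[row_key] = {**merged.get(row_key, {}), **values}
    return merged


table = {
    1: {"id": 1, "title": "Dune"},
    2: {"id": 2, "title": "Emma"},
    3: {"id": 3, "title": "Ulysses"},
}
changes = [
    {"id": 2, "title": "Emma (2nd ed.)"},  # update
    {"id": 4, "title": "Beloved"},         # insert
    {"id": 3, "_deleted": True},           # delete
]

merged = merge_table(table, changes)
rebuilt = replace_table([{"id": 9, "title": "Hamlet"}])
print(merged, rebuilt)
```

Row 1 survives the merge unchanged, while the replace result keeps nothing from the old table, which is the trade-off the rule of thumb describes.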


🤖 AI that creates AI: ASI-ARCH finds 106 new SOTA architectures

ASI-ARCH: experimental ASI that autonomously researches and designs neural nets. It hypothesizes, codes, trains & tests models.

💡 Scale:
1,773 experiments → 20,000+ GPU-hours.
Stage 1 (20M params, 1B tokens): 1,350 candidates beat DeltaNet.
Stage 2 (340M params): 400 models → 106 SOTA winners.
Top 5 trained on 15B tokens vs Mamba2 & Gated DeltaNet.

📊 Results:
PathGateFusionNet: 48.51 avg (Mamba2: 47.84, Gated DeltaNet: 47.32).
BoolQ: 60.58 vs 60.12 (Gated DeltaNet).
Consistent gains across tasks.

🔍 Insights:
Prefers proven tools (gating, convs), refines them iteratively.
Ideas come from: 51.7% literature, 38.2% self-analysis, 10.1% originality.
SOTA share: self-analysis ↑ to 44.8%, literature ↓ to 48.6%.

📎 Project page | Arxiv | GitHub

#AI #ML #Research #ASIARCH @datascience_bds


📚 Data Science Riddle Your object detection model misses small objects. Easiest fix?

Anonymous voting


LLM Cheatsheet


📚 Data Science Riddle Why do we use Batch Normalization?

Anonymous voting


Basic SQL Commands


Excel Vs SQL Vs Python


Statistical Moments (M1, M2) for Data Analysis

Here are 5 curated PDFs diving into the mean (M1), variance (M2), and their applications in crafting research questions and sourcing data.

A channel member requested resources on this topic and we delivered. If you have a topic you want resources on, let us know and we’ll make it happen! @datascience_bds
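
For readers who want the two moments in concrete form before opening the PDFs, here is a minimal sketch. It assumes M1 means the arithmetic mean and M2 the second central moment (population variance, dividing by n); the PDFs themselves may use a different convention, such as the sample variance that divides by n - 1.

```python
def first_moment(values):
    """M1: the arithmetic mean."""
    return sum(values) / len(values)


def second_central_moment(values):
    """M2: the average squared deviation from the mean (population variance)."""
    mean = first_moment(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
m1 = first_moment(data)
m2 = second_central_moment(data)
print(m1, m2)
```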


📚 Data Science Riddle Model Accuracy improves after dropping half the features. Why?

Anonymous voting