Data science/ML/AI


Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatascientist


Network: Programming, data science, ML - free courses by Big Data Specialist | India: 31 743 | Technology & Applications: 9 391...

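📈 Analytics overview of the Telegram channel Data science/ML/AI

The channel Data science/ML/AI (@datascience_bds) is an active participant in the English-language segment. The community currently has 13 660 subscribers, ranks 9 391st in the Technology & Applications category, and ranks 31 743rd in India.

📊 Audience metrics and growth dynamics

Since its creation (date unknown), the project has maintained rapid growth and attracted 13 660 subscribers.

According to the latest data from 07 June 2026, the channel is operating steadily. The subscriber count changed by 151 over the past 30 days and by -5 over the past 24 hours, and overall reach remains substantial.

Verification status: Not verified
Engagement rate (ER): The average audience engagement rate is 7.92%. Within 24 hours of publication, a post typically draws reactions from 2.33% of all subscribers.
Post reach: Each post averages 1 082 views and typically accumulates 318 views on its first day.
Interaction and feedback: The audience participates actively, with an average of 5 reactions per post.
Topic focus: Content centers on core topics such as panda, learning, row, api, ethic.

📝 Description and content strategy

The author positions the channel as a platform for expressing subjective views:
“Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatasci...”

With frequent updates (latest data collected on 08 June 2026), the channel stays fresh and keeps its reach high. The analysis shows active audience engagement, making it a key point of influence in the Technology & Applications category.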

Subscribers: 13 660 (24 hours: -5 | 7 days: +52 | 30 days: +151)

Post views: 1 082 (24 hours: ~318 | 48 hours: ~464)

Engagement rate: 7.92%

Posts per day: ~1


Post archive


Repost from Data science research papers

TradingAgents: Multi-Agents LLM Financial Trading Framework
📅 Publication Date: Dec 28, 2024
📑 Paper: https://arxiv.org/pdf/2412.20138
🔗 Code: https://github.com/tauricresearch/tradingagents
🚀 Spaces citing this paper:
• https://huggingface.co/spaces/shanghengdu/LLM-Agent-Optimization-PaperList
• https://huggingface.co/spaces/tahp0604/ai-stock-watchlist
📝 Description: The paper introduces TradingAgents, a multi-agent framework that utilizes large language models for stock trading, simulating the collaborative dynamics of real-world trading firms. The framework consists of various agents, including fundamental analysts, sentiment analysts, technical analysts, and traders with different risk profiles, all powered by large language models. These agents work together to assess market conditions, manage risk, and make informed trading decisions. The framework also includes researcher agents that evaluate market conditions and a risk management team that monitors exposure.


What is Agentic AI?


🧠 The Statistical Illusion: Simpson’s Paradox 🎭

Imagine you are choosing a hospital for a surgery.
• Hospital A has a higher survival rate than Hospital B for "Easy" cases.
• Hospital A also has a higher survival rate for "Hard" cases.

Common sense says: Choose Hospital A. But when you look at the total combined data, Hospital B actually has a higher survival rate. 🤯

This is Simpson’s Paradox: A trend appears in several different groups but disappears or reverses when these groups are combined.

🔍 Why does this happen?
It happens because of a Lurking Variable (a hidden factor). In this case, Hospital A is a world-class facility, so it takes on way more "Hard" cases than Hospital B. Even though they are better at both types, the high volume of risky surgeries drags their overall average down.

🐍 See the Paradox in Code
Let's simulate this "impossible" scenario using Python:

import pandas as pd

# Data: survivors and total cases per hospital and case type.
# Hospital A handles mostly "Hard" cases; Hospital B handles mostly "Easy" ones.
data = {
    'Hospital': ['A', 'A', 'B', 'B'],
    'Case_Type': ['Easy', 'Hard', 'Easy', 'Hard'],
    'Survived': [95, 100, 900, 7],
    'Total': [100, 1000, 1000, 100]
}
df = pd.DataFrame(data)

# 1. Check rates per group
df['Rate'] = df['Survived'] / df['Total']
print("--- Rates by Group ---")
print(df[['Hospital', 'Case_Type', 'Rate']])

# 2. Check overall rates (sum only the count columns, not the per-group rates)
overall = df.groupby('Hospital')[['Survived', 'Total']].sum()
overall['Overall_Rate'] = overall['Survived'] / overall['Total']
print("\n--- Overall Rates (The Paradox!) ---")
print(overall['Overall_Rate'])

The Result:
• A is better at Easy (95% vs 90%).
• A is better at Hard (10% vs 7%).
• BUT... Overall, B wins (about 82% vs 18%) because B mostly did "Easy" cases.

🛠 How to avoid being fooled?
1. Don't trust the aggregate: When analyzing data, always try to "segment" or "drill down" into sub-groups.
2. Look for the Weight: Ask yourself: "Is one group disproportionately represented in the total?"
3. Identify the Lurking Variable: What context is missing? (e.g., Age, Severity, Time of Day).

🎯 The Takeaway
In Data Science, the "Big Picture" can sometimes be a big lie. If your analysis produces a result that defies logic, you might be looking at a Simpson’s Paradox. Always slice your data before you trust it.
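The "Look for the Weight" tip can be made concrete: each hospital's overall rate is a case-weighted average of its per-group rates. The sketch below is only an illustration in plain Python. The counts are hypothetical, chosen to match the per-group rates quoted above (95% and 10% for A, 90% and 7% for B), and the helper name `overall_rate` is made up for this example.

```python
# Hypothetical (survived, total) counts per hospital and case type.
cases = {
    "A": {"Easy": (95, 100), "Hard": (100, 1000)},
    "B": {"Easy": (900, 1000), "Hard": (7, 100)},
}

def overall_rate(groups):
    """Overall survival rate as a weighted average of per-group rates."""
    total_cases = sum(total for _, total in groups.values())
    # weight = share of all cases in the group; rate = survival rate in the group
    return sum((total / total_cases) * (survived / total)
               for survived, total in groups.values())

rates = {hospital: overall_rate(groups) for hospital, groups in cases.items()}
hard_share = {
    hospital: groups["Hard"][1] / sum(total for _, total in groups.values())
    for hospital, groups in cases.items()
}

for hospital in cases:
    print(f"Hospital {hospital}: hard-case share {hard_share[hospital]:.0%}, "
          f"overall rate {rates[hospital]:.1%}")
```

Because about 91% of Hospital A's cases are "Hard" in this made-up data, its overall rate sits close to its "Hard" rate, while Hospital B's sits close to its "Easy" rate.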


Hey everyone 👋 I know I promised to create a Data Science course. I was working on it late last year, but since early 2026 I’ve had some health issues, so it got postponed. I’ll get back to it as soon as I’m better 🙌 In the meantime, I launched this ☝️ today: https://learndevs.com/ I started building this back in the 2020s, together with many of you. It’s not perfect yet, but better to have it now than wait forever. Would love your feedback ❤️


Repost from Programming, data science, ML - free courses by Big Data Specialist

We’re live 🚀 After 4 years of work, I finally launched: 👉 learndevs.com

Goal: one place for everything a developer needs (free courses, tech news, job offers, manually written blogs, the best GitHub repos, etc.)

A lot of you contributed by writing code or adding courses and knowledge along the way. This is as much yours as it is mine 🙌

And I’m already working on:
• Personalized roadmaps
• Live chat
• Better job search & placement

Try it and please tell me: What would you add next?

Reminder: if you want early access to new features, join our beta testers group. We're looking for people who will explore, break things, and share honest feedback.


Repost from Programming Quiz Channel

Which trade-off is common in database indexing?

Anonymous voting


Heart of Data Science


🗺 The 5 W's of Data Visualization: Why, Who, What, When, Where

Creating a chart is easy. Creating a good chart, one that actually communicates an insight and isn't just a pretty picture, requires thinking like a detective. You need to answer the "5 W's" before you even pick a chart type. Every great visualization tells a story, and you need to know the plot points.

🤔 1. WHY: What is the Goal?
Before you draw anything, ask:
• What question am I trying to answer? (e.g., "How do sales change over time?", "Which region performs best?")
• What insight do I want the viewer to gain? (e.g., "Sales are growing rapidly," "Region X is underperforming.")
• What decision will this chart help make? (e.g., "Should we invest more in Region Y?")
Your chart's purpose dictates everything from chart type to color choices.

👥 2. WHO: Who is the Audience?
Consider who will be looking at your chart:
• Technical Experts: Can handle complex plots, statistical jargon, and detailed axes.
• Business Stakeholders: Need clear, high-level insights. Focus on the "so what?" Avoid jargon.
• General Public: Keep it simple, use intuitive charts, and provide clear titles and labels.
A chart for an AI researcher is vastly different from one for a marketing team.

📊 3. WHAT: What Data is Relevant?
• What variables (columns) are needed? Don't include everything just because it's there.
• What time frame or subset of data is required? (e.g., Q3 sales only, data for specific countries).
• What are the units? ($, %, kg, units, etc.) Crucial for labels!

⏰ 4. WHEN: When is the Data Important?
This is about the time or sequence of your data:
• Trends over time? (Line charts, area charts)
• Comparisons at a specific point? (Bar charts, pie charts - use sparingly!)
• Distribution within a period? (Histograms, box plots)
• Relationships at any time? (Scatter plots)
The "when" helps you choose the chart type that best shows change or static comparison.

🗺 5. WHERE: Where Does the Data Live?
• Geographical Data: If your data is tied to locations (countries, states, cities), use maps!
  • Choropleth Maps: Color-coding regions based on a value.
  • Point Maps: Showing locations with markers.
• Hierarchical Data: If your data has levels (e.g., Company > Department > Team), use treemaps or sunburst charts.

💡 The Golden Rule of Visualization: The chart should make the insight obvious, not require the viewer to dig for it. If you're not sure, ask someone from your target audience to look at it and tell you what they see.

🎯 What you should do
✔️ Clarify your chart's purpose (WHY).
✔️ Tailor your visuals to your audience (WHO).
✔️ Select only the necessary data (WHAT).
✔️ Choose chart types that reflect time/sequence (WHEN).
✔️ Use maps or hierarchical charts for spatial/structural data (WHERE).


▎Common MLOps Terms
1. MLOps: A set of practices that automates and standardizes the lifecycle of Machine Learning models, from experimentation and development to deployment and maintenance.
2. Model Training: The process of feeding data to an ML algorithm to learn patterns and make predictions, resulting in a trained model.
3. Feature Store: A centralized repository for storing, serving, and managing features for Machine Learning models, ensuring consistency between training and inference.
4. Data Versioning: The practice of tracking changes to datasets over time, ensuring reproducibility and allowing rollbacks to previous versions.
5. Model Versioning: Managing different iterations of a Machine Learning model, tracking changes, performance, and metadata.
6. Experiment Tracking: Recording all details of an ML experiment (code, hyperparameters, data, metrics) to compare results and ensure reproducibility.
7. Model Registry: A centralized hub to manage the lifecycle of ML models, including versioning, metadata, and status (e.g., "staging," "production").
8. Model Deployment: The process of making a trained ML model available for predictions in a production environment, often via an API endpoint.
9. Inference: The process of using a deployed ML model to make predictions on new, unseen data.
10. Model Monitoring: Continuously tracking the performance, health, and behavior of deployed ML models to detect issues like data drift or performance degradation.
11. Continuous Training (CT): The practice of automatically retraining and updating ML models in production based on new data or performance metrics.
12. Reproducibility: The ability to achieve the same results (model, predictions) from an ML experiment given the same data, code, and environment.
13. Data Drift: A change in the distribution of input data to an ML model, which can cause performance degradation.
14. Concept Drift: A change in the underlying relationship between the input data and the target variable, leading to model inaccuracy over time.
15. Bias Detection: Identifying and mitigating unfair or discriminatory patterns in ML models or their data, ensuring ethical AI outcomes.
16. ML Pipeline: An automated workflow for running an ML task, encompassing data ingestion, feature engineering, model training, evaluation, and deployment steps.
17. Orchestration: Managing and coordinating the automated tasks within an ML pipeline to ensure they run in the correct sequence and handle dependencies.
18. Explainable AI (XAI): Tools and techniques that make the decisions and predictions of ML models understandable to humans.
19. Serving Infrastructure: The systems and platforms used to host and serve ML models in production, optimized for low-latency inference (e.g., REST APIs, specialized model servers).
20. ML Metadata Management: Storing and organizing information about ML artifacts (datasets, models, features, experiments) to provide lineage and ensure governance.
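To make "Data Drift" (term 13) and "Model Monitoring" (term 10) a little more tangible, here is one possible way to flag a shifted input distribution: compare a feature's training-time values against recent production values with a two-sample Kolmogorov-Smirnov statistic. This is a minimal sketch, not a production monitor. The sample values, the function name `ks_statistic`, and the `DRIFT_THRESHOLD` cutoff are all assumptions made for illustration.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap

DRIFT_THRESHOLD = 0.3  # hypothetical cutoff; tune per feature in practice

# Hypothetical values of one input feature (customer age).
training_ages = [22, 25, 27, 30, 31, 35, 38, 40, 42, 45]
production_ages = [age + 15 for age in training_ages]  # simulated shift

drift_score = ks_statistic(training_ages, production_ages)
no_drift_score = ks_statistic(training_ages, training_ages)

print(f"KS statistic vs. shifted data: {drift_score:.2f}")
print("Drift suspected" if drift_score > DRIFT_THRESHOLD else "No drift flagged")
```

A check like this only looks at the inputs, so it can hint at data drift but says nothing about concept drift (term 14), which needs labeled outcomes to detect.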


Repost from Programming Quiz Channel

Which metric is best for regression problems?

Anonymous voting


Software Engineer to AI Engineer: 2026 Practical Roadmap


Repost from Programming Quiz Channel

Which concept helps reduce variance in machine learning models?

Anonymous voting


📉 The Art of the Dashboard: Choosing the Right Chart Type 🖼

You have clean data, you've tested your hypotheses, and now you need to show your findings. But which chart do you use? A bar chart? A line chart? A pie chart (gulp)? Choosing the wrong chart can obscure your message or even mislead your audience. Choosing the right one makes your data sing.

1. To Show a Trend Over Time 📈
Best For: Seeing how something changes day-to-day, month-to-month, year-to-year.
Chart Types:
- Line Chart: Classic, great for continuous data. Shows direction.
- Area Chart: Like a line chart, but the area under the line is filled. Good for showing total volume over time.
- Bar Chart (Time Series): Use if you have discrete time periods (e.g., yearly sales) and want to compare exact values.

# Example Use Case: Monthly Website Traffic
# Chart: Line Chart

2. To Compare Categories 📊
Best For: Showing differences in size or value across distinct groups.
Chart Types:
- Bar Chart (Vertical/Column): Most common. Great for comparing quantities across groups. Easy to read exact values.
- Bar Chart (Horizontal): Better when you have many categories or long category names.
- Grouped Bar Chart: Compares sub-categories within main categories.
- Stacked Bar Chart: Shows total for a category AND how it's made up of sub-categories.

# Example Use Case: Sales per Region
# Chart: Horizontal Bar Chart

3. To Show Composition (Part-to-Whole) 🍕
Best For: Displaying how a total is divided into parts. Use with caution!
Chart Types:
- Pie Chart: Only use if you have few categories (max 5-6) and you want to show proportions of a whole. The *largest* slice is easiest to read.
- Donut Chart: Similar to pie, but the center is cut out (can sometimes display a total value).
- Stacked Bar Chart (100%): Shows proportions across categories, but as bars, which are often easier to compare than pie slices.

# Example Use Case: Market Share (if only 3 companies)
# Chart: Pie Chart (if few companies) or 100% Stacked Bar

Warning: Humans are bad at comparing slice angles. Bar charts are usually better for precise comparisons.

4. To Show Relationships (Correlation) 🔗
Best For: Seeing if two numerical variables are connected and how strongly.
Chart Types:
- Scatter Plot: The go-to. Each dot is an observation, showing the values of two variables. Look for patterns (linear, curved, clusters).
- Bubble Chart: A scatter plot where the size of the "bubble" (dot) represents a third numerical variable.

# Example Use Case: Does Experience correlate with Salary?
# Chart: Scatter Plot

5. To Show Distribution 📦
Best For: Understanding the range, spread, and central tendency of a single numerical variable.
Chart Types:
- Histogram: Shows frequency counts within bins (ranges) of your data. Great for spotting skewness or multi-modal distributions.
- Box Plot (Whisker Plot): Shows median, quartiles, and potential outliers. Excellent for comparing distributions across categories.

# Example Use Case: Distribution of customer ages
# Chart: Histogram or Box Plot (if comparing age by product)

💡 The Ultimate Rule: Keep it simple. The chart should tell the story quickly. If your audience has to stare at it for five minutes to figure out what's going on, it's not working.

🎯 Today's Goal (What you should do)
✔️ Know which chart excels at showing trends vs. comparisons vs. relationships.
✔️ Use bar charts for categories and line charts for time.
✔️ Be very cautious with pie charts!
✔️ Use scatter plots to find connections.
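The five rules above can be condensed into a small lookup helper. This is only a sketch of the post's guidance, not a complete chart chooser: the function name `suggest_chart`, the goal labels, and the `MANY_CATEGORIES` cutoff are assumptions introduced here, while the pie-chart limit follows the "max 5-6" advice from section 3.

```python
MAX_PIE_SLICES = 6     # the post suggests at most 5-6 categories for a pie chart
MANY_CATEGORIES = 10   # hypothetical cutoff for "many categories"

def suggest_chart(goal, n_categories=None):
    """Return a default chart type for a goal, following the rules of thumb above."""
    goal = goal.lower()
    if goal == "trend":
        return "line chart"
    if goal == "comparison":
        # Horizontal bars read better when there are many categories.
        if n_categories is not None and n_categories >= MANY_CATEGORIES:
            return "horizontal bar chart"
        return "vertical bar chart"
    if goal == "composition":
        # Pie charts only when there are few slices; otherwise prefer bars.
        if n_categories is not None and n_categories <= MAX_PIE_SLICES:
            return "pie chart"
        return "100% stacked bar chart"
    if goal == "relationship":
        return "scatter plot"
    if goal == "distribution":
        return "histogram"
    raise ValueError(f"Unknown goal: {goal!r}")

print(suggest_chart("trend"))                         # monthly website traffic
print(suggest_chart("comparison", n_categories=25))   # sales per region
print(suggest_chart("composition", n_categories=3))   # market share, 3 companies
```

The helper returns a single default per goal; the alternatives listed in each section (area, grouped bar, donut, bubble, box plot) are still worth considering case by case.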


💎 5 Rare But High-Value Sites for Data Scientists

If you’re tired of the same surface-level tutorials, these five "hidden gems" provide deep technical value you'll refer to for the rest of your career:

1️⃣ Deep Learning Drizzle
A massive, curated database of free, high-quality university courses (Stanford, MIT, CMU) covering every niche in AI and ML.
🔗 https://deep-learning-drizzle.github.io/

2️⃣ Distill.pub
It uses incredible interactive visualizations to explain complex machine learning research papers that are usually very hard to digest.
🔗 https://distill.pub/

3️⃣ Connected Papers
It creates a visual map of how academic papers are linked so you can find the "ancestors" of any specific algorithm.
🔗 https://www.connectedpapers.com/

4️⃣ ML-Ops.org
While everyone focuses on building models, this site teaches you the "production" side: how to actually deploy, monitor, and manage models in the real world.
🔗 https://ml-ops.org/

5️⃣ Explained.ai
Provides the most intuitive, deep-dive explanations on the internet for how specific algorithms (like Random Forests or Gradient Boosting) actually work under the hood.
🔗 https://explained.ai/

Save these for your next deep-work session! 🚀


Power BI DAX Formulas Handbook.pdf (0.12 KB)


⚖️ Hypothesis Testing & P-values 🧑‍⚖️📊

You've run an A/B test. Your new website design (Version B) got 12% more clicks than the old one (Version A). Great, right? But is that 12% a real improvement, or just a lucky fluctuation in your data?

This is where Hypothesis Testing and the notorious P-value come in. They help you decide whether your observed difference is strong enough evidence to base a big decision on, or whether it could plausibly be random chance.

🏛 The Courtroom Scenario
Imagine a trial:
• Default Assumption (Null Hypothesis, H0): The defendant is NOT GUILTY. (Our designs are the same, the 12% is luck.)
• What We're Trying to Prove (Alternative Hypothesis, H1): The defendant IS GUILTY. (Version B is better than A.)
• The Evidence (Your Data): The 12% difference in clicks.
• The Judge's Question (P-value): How likely is it that we'd see this "evidence" (12% difference) if the defendant were truly not guilty (designs truly the same)?

1. The Null (H0) & Alternative (H1) Hypotheses
• Null Hypothesis (H0): There is no difference between the two groups/variables. (e.g., "New design has no effect on clicks." or "Mean sales for region X is 100.")
• Alternative Hypothesis (H1): There is a difference or relationship. (e.g., "New design increases clicks." or "Mean sales for region X is not 100.")
Our goal is usually to reject H0 in favor of H1.

2. The P-value: What It Actually Means
The P-value is the probability of observing data as extreme as (or more extreme than) your current data, assuming the Null Hypothesis is true.
• Small P-value (e.g., 0.01): "It's highly unlikely we'd see this much difference if the new design had no effect. So, we'll reject the null and conclude the new design is better."
• Large P-value (e.g., 0.60): "There's a good chance we'd see this difference just by luck, even if the new design had no real effect. So, we fail to reject the null."

3. The Significance Level (Alpha, α)
This is your cutoff point. Most commonly, α = 0.05 (5%).
• P-value ≤ α: Reject the Null Hypothesis. (Your result is "statistically significant.")
• P-value > α: Fail to Reject the Null Hypothesis. (Your result is not statistically significant.)

4. The Biggest Misconceptions (DON'T DO THIS!)
• P-value is NOT the probability that H0 is true.
• P-value is NOT the probability that H1 is false.
• A "significant" P-value doesn't mean the effect is large or important in the real world. (A tiny, unimportant difference can be statistically significant if you have a huge dataset.)

🎯 Today's Goal (What you should do)
✔️ Formulate clear Null and Alternative Hypotheses.
✔️ Understand the P-value as the probability of seeing data at least as extreme as yours if the Null were true.
✔️ Use a significance level (alpha) to make decisions.
✔️ AVOID common P-value misinterpretations!

👉 P-values don't tell you if your hypothesis is true, but they do tell you how surprising your data would be if the Null Hypothesis were true.
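As an illustration of the A/B example, the sketch below computes a two-sided P-value with a pooled two-proportion z-test, one common choice for comparing click rates (the post itself does not name a specific test). The visitor and click counts are hypothetical, picked so that Version B gets 12% more clicks in both scenarios; only the sample size differs.

```python
import math

def two_proportion_p_value(clicks_a, visitors_a, clicks_b, visitors_b):
    """Two-sided P-value for H0: both versions share the same click rate."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Under H0 the best estimate of the common rate pools both groups.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # significance level

# Hypothetical counts: 10.0% vs 11.2% click rate (a 12% relative lift).
small_test = two_proportion_p_value(100, 1_000, 112, 1_000)
large_test = two_proportion_p_value(10_000, 100_000, 11_200, 100_000)

for label, p_value in [("1,000 visitors per version", small_test),
                       ("100,000 visitors per version", large_test)]:
    verdict = "reject H0" if p_value <= ALPHA else "fail to reject H0"
    print(f"{label}: p = {p_value:.4g} -> {verdict}")
```

With these made-up numbers, the same 12% lift is not significant at 1,000 visitors per version but is at 100,000, which shows how much the P-value depends on sample size as well as effect size.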


How to Choose Your ML Research Topic: Step by Step Framework


Repost from Programming Quiz Channel

Which activation function outputs values between 0 and 1?

Anonymous voting


Repost from Programming, data science, ML - free courses by Big Data Specialist

Azure Data Engineering: A comprehensive Roadmap Including Professional Notes + Interview Guide