Data science/ML/AI


Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatascientist


Network: Programming, data science, ML - free courses by Big Data Specialist | India: 31 771 | Technology and apps: 9 387...

📈 Telegram channel analysis: Data science/ML/AI

The Data science/ML/AI channel (@datascience_bds) is an active player in the English-language segment. The community currently has 13 663 subscribers and ranks 9 387 in the Technology and apps category and 31 771 in the India region.

📊 Audience metrics and dynamics

Since its creation (date unknown), the project has grown rapidly and attracted 13 663 subscribers.

According to the latest data, dated 05 June 2026, the channel shows stable activity. The subscriber count changed by 171 over the last 30 days and by 1 over the last 24 hours, and the channel has kept a wide reach.

Verification status: not verified
Engagement rate (ER): average audience engagement is 7.95%, and in the first 24 hours after publication content typically earns 2.46% reactions relative to the total number of subscribers.
Post reach: each post receives 1 086 views on average. The first day typically brings in 336 views.
Reactions and engagement: the audience is actively supportive; the average number of reactions per post is 5.
Topical interests: content focuses on key topics such as panda, learning, row, api, ethic.
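
The page does not say how these percentages are derived. As a rough check, and purely as an assumption, the sketch below treats each rate as average views divided by subscribers. With the figures quoted above, that reproduces both 7.95% and 2.46%. The function name `rate_percent` is illustrative and not part of any analytics API.

```python
def rate_percent(views: int, subscribers: int) -> float:
    """Return views as a percentage of subscribers, rounded to 2 decimals."""
    return round(views / subscribers * 100, 2)


subscribers = 13_663    # subscriber count quoted above
avg_views = 1_086       # average views per post
first_day_views = 336   # average views in the first 24 hours

overall_rate = rate_percent(avg_views, subscribers)
first_day_rate = rate_percent(first_day_views, subscribers)
print(overall_rate, first_day_rate)
```

If the analytics service uses a different definition (for example, one based on reactions rather than views), the match here would be a coincidence, so read it as a plausibility check only.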

📝 Description and content policy

The author describes this space as a place for expressing personal views:
“Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatasci...”

Thanks to frequent updates (latest data dated 06 June 2026), the channel stays current and has high reach. The analysis shows that the audience actively engages with the content, which has made the channel an important point of influence in the Technology and apps category.

Subscribers: 13 663 (+1 in 24 hours, +59 in 7 days, +171 in 30 days)

Post views: 1 086 (~336 in 24 hours, ~499 in 48 hours)

Engagement rate: 7.95%

Posts per day: ~1


Post archive

13 668

Hey everyone 👋 I know I promised to create a Data Science course. I was working on it late last year, but since early 2026 I’ve had some health issues, so it got postponed. I’ll get back to it as soon as I’m better 🙌 In the meantime, I launched this ☝️ today: https://learndevs.com/ I started building this back in the 2020s, together with many of you. It’s not perfect yet, but better to have it now than wait forever. Would love your feedback ❤️


Repost from Programming, data science, ML - free courses by Big Data Specialist

We’re live 🚀 After 4 years of work, I finally launched: 👉 learndevs.com

Goal: one place for everything a developer needs (free courses, tech news, job offers, manually written blogs, best GitHub repos, etc.)

A lot of you contributed by writing code or adding courses and knowledge along the way. This is as much yours as it is mine 🙌

And I’m already working on:
• Personalized roadmaps
• Live chat
• Better job search & placement

Try it and please tell me: What would you add next?

Reminder: if you want early access to new features, join our beta testers group. We're looking for people who will explore, break things, and share honest feedback.


Repost from Programming Quiz Channel

Which trade-off is common in database indexing?

Anonymous voting


Heart of Data Science


🗺 The 5 W's of Data Visualization: Why, Who, What, When, Where

Creating a chart is easy. Creating a good chart, one that actually communicates an insight and isn't just a pretty picture, requires thinking like a detective. You need to answer the "5 W's" before you even pick a chart type. Every great visualization tells a story, and you need to know the plot points.

🤔 1. WHY: What is the Goal?
Before you draw anything, ask:
• What question am I trying to answer? (e.g., "How do sales change over time?", "Which region performs best?")
• What insight do I want the viewer to gain? (e.g., "Sales are growing rapidly," "Region X is underperforming.")
• What decision will this chart help make? (e.g., "Should we invest more in Region Y?")
Your chart's purpose dictates everything from chart type to color choices.

👥 2. WHO: Who is the Audience?
Consider who will be looking at your chart:
• Technical Experts: Can handle complex plots, statistical jargon, and detailed axes.
• Business Stakeholders: Need clear, high-level insights. Focus on the "so what?" Avoid jargon.
• General Public: Keep it simple, use intuitive charts, and provide clear titles and labels.
A chart for an AI researcher is vastly different from one for a marketing team.

📊 3. WHAT: What Data is Relevant?
• What variables (columns) are needed? Don't include everything just because it's there.
• What time frame or subset of data is required? (e.g., Q3 sales only, data for specific countries).
• What are the units? ($, %, kg, units, etc.) – Crucial for labels!

⏰ 4. WHEN: When is the Data Important?
This is about the time or sequence of your data:
• Trends over time? (Line charts, area charts)
• Comparisons at a specific point? (Bar charts, pie charts - use sparingly!)
• Distribution within a period? (Histograms, box plots)
• Relationships at any time? (Scatter plots)
The "when" helps you choose the chart type that best shows change or static comparison.

🗺 5. WHERE: Where Does the Data Live?
• Geographical Data: If your data is tied to locations (countries, states, cities), use maps!
  • Choropleth Maps: Color-coding regions based on a value.
  • Point Maps: Showing locations with markers.
• Hierarchical Data: If your data has levels (e.g., Company > Department > Team), use treemaps or sunburst charts.

💡 The Golden Rule of Visualization: The chart should make the insight obvious, not require the viewer to dig for it. If you're not sure, ask someone from your target audience to look at it and tell you what they see.

🎯 What you should do
✔️ Clarify your chart's purpose (WHY).
✔️ Tailor your visuals to your audience (WHO).
✔️ Select only the necessary data (WHAT).
✔️ Choose chart types that reflect time/sequence (WHEN).
✔️ Use maps or hierarchical charts for spatial/structural data (WHERE).


▎Common MLOps Terms

1. MLOps: A set of practices that automates and standardizes the lifecycle of Machine Learning models, from experimentation and development to deployment and maintenance.
2. Model Training: The process of feeding data to an ML algorithm to learn patterns and make predictions, resulting in a trained model.
3. Feature Store: A centralized repository for storing, serving, and managing features for Machine Learning models, ensuring consistency between training and inference.
4. Data Versioning: The practice of tracking changes to datasets over time, ensuring reproducibility and allowing rollbacks to previous versions.
5. Model Versioning: Managing different iterations of a Machine Learning model, tracking changes, performance, and metadata.
6. Experiment Tracking: Recording all details of an ML experiment (code, hyperparameters, data, metrics) to compare results and ensure reproducibility.
7. Model Registry: A centralized hub to manage the lifecycle of ML models, including versioning, metadata, and status (e.g., "staging," "production").
8. Model Deployment: The process of making a trained ML model available for predictions in a production environment, often via an API endpoint.
9. Inference: The process of using a deployed ML model to make predictions on new, unseen data.
10. Model Monitoring: Continuously tracking the performance, health, and behavior of deployed ML models to detect issues like data drift or performance degradation.
11. Continuous Training (CT): The practice of automatically retraining and updating ML models in production based on new data or performance metrics.
12. Reproducibility: The ability to achieve the same results (model, predictions) from an ML experiment given the same data, code, and environment.
13. Data Drift: A change in the distribution of input data to an ML model, which can cause performance degradation.
14. Concept Drift: A change in the underlying relationship between the input data and the target variable, leading to model inaccuracy over time.
15. Bias Detection: Identifying and mitigating unfair or discriminatory patterns in ML models or their data, ensuring ethical AI outcomes.
16. ML Pipeline: An automated workflow for running an ML task, encompassing data ingestion, feature engineering, model training, evaluation, and deployment steps.
17. Orchestration: Managing and coordinating the automated tasks within an ML pipeline to ensure they run in the correct sequence and handle dependencies.
18. Explainable AI (XAI): Tools and techniques that make the decisions and predictions of ML models understandable to humans.
19. Serving Infrastructure: The systems and platforms used to host and serve ML models in production, optimized for low-latency inference (e.g., REST APIs, specialized model servers).
20. ML Metadata Management: Storing and organizing information about ML artifacts (datasets, models, features, experiments) to provide lineage and ensure governance.
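
To make the "Data Drift" entry concrete, here is a minimal sketch of one possible drift check: the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distributions of a feature at training time and in production. The post does not prescribe any particular method; the function name and the sample values are made up for illustration, and real monitoring setups usually add a decision threshold or a significance test on top.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = no overlap)."""
    ref, cur = sorted(reference), sorted(current)
    largest_gap = 0.0
    # The maximum gap between two step functions occurs at one of the sample points.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


# Hypothetical feature values: ages seen in training vs. two production batches
training_ages = [22, 25, 31, 38, 44, 52]
same_batch = [22, 25, 31, 38, 44, 52]
shifted_batch = [61, 64, 67, 70, 73, 79]

print(ks_statistic(training_ages, same_batch))     # 0.0
print(ks_statistic(training_ages, shifted_batch))  # 1.0
```

A value near 0 suggests the production batch looks like the training data for that feature; a value near 1 means the two samples barely overlap.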


Repost from Programming Quiz Channel

Which metric is best for regression problems?

Anonymous voting


Software Engineer to AI Engineer: 2026 Practical Roadmap


Repost from Programming Quiz Channel

Which concept helps reduce variance in machine learning models?

Anonymous voting


📉 The Art of the Dashboard: Choosing the Right Chart Type 🖼

You have clean data, you've tested your hypotheses, and now you need to show your findings. But which chart do you use? A bar chart? A line chart? A pie chart (gulp)? Choosing the wrong chart can obscure your message or even mislead your audience. Choosing the right one makes your data sing.

1. To Show a Trend Over Time 📈
Best For: Seeing how something changes day-to-day, month-to-month, year-to-year.
Chart Types:
- Line Chart: Classic, great for continuous data. Shows direction.
- Area Chart: Like a line chart, but the area under the line is filled. Good for showing total volume over time.
- Bar Chart (Time Series): Use if you have discrete time periods (e.g., yearly sales) and want to compare exact values.

# Example Use Case: Monthly Website Traffic
# Chart: Line Chart

2. To Compare Categories 📊
Best For: Showing differences in size or value across distinct groups.
Chart Types:
- Bar Chart (Vertical/Column): Most common. Great for comparing quantities across groups. Easy to read exact values.
- Bar Chart (Horizontal): Better when you have many categories or long category names.
- Grouped Bar Chart: Compares sub-categories within main categories.
- Stacked Bar Chart: Shows total for a category AND how it's made up of sub-categories.

# Example Use Case: Sales per Region
# Chart: Horizontal Bar Chart

3. To Show Composition (Part-to-Whole) 🍕
Best For: Displaying how a total is divided into parts. Use with caution!
Chart Types:
- Pie Chart: Only use if you have few categories (max 5-6) and you want to show proportions of a whole. The largest slice is easiest to read.
- Donut Chart: Similar to pie, but the center is cut out (can sometimes display a total value).
- Stacked Bar Chart (100%): Shows proportions across categories, but as bars, which are often easier to compare than pie slices.

# Example Use Case: Market Share (if only 3 companies)
# Chart: Pie Chart (if few companies) or 100% Stacked Bar

Warning: Humans are bad at comparing slice angles. Bar charts are usually better for precise comparisons.

4. To Show Relationships (Correlation) 🔗
Best For: Seeing if two numerical variables are connected and how strongly.
Chart Types:
- Scatter Plot: The go-to. Each dot is an observation, showing the values of two variables. Look for patterns (linear, curved, clusters).
- Bubble Chart: A scatter plot where the size of the "bubble" (dot) represents a third numerical variable.

# Example Use Case: Does Experience correlate with Salary?
# Chart: Scatter Plot

5. To Show Distribution 📦
Best For: Understanding the range, spread, and central tendency of a single numerical variable.
Chart Types:
- Histogram: Shows frequency counts within bins (ranges) of your data. Great for spotting skewness or multi-modal distributions.
- Box Plot (Whisker Plot): Shows median, quartiles, and potential outliers. Excellent for comparing distributions across categories.

# Example Use Case: Distribution of customer ages
# Chart: Histogram or Box Plot (if comparing age by product)

💡 The Ultimate Rule: Keep it simple. The chart should tell the story quickly. If your audience has to stare at it for five minutes to figure out what's going on, it's not working.

🎯 Today's Goal (What you should do)
✔️ Know which chart excels at showing trends vs. comparisons vs. relationships.
✔️ Use bar charts for categories and line charts for time.
✔️ Be very cautious with pie charts!
✔️ Use scatter plots to find connections.
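
The five goals in this post can be condensed into a small lookup table. The sketch below is only one way to encode that guidance; the dictionary keys and the `suggest_charts` helper are hypothetical names chosen for illustration, not part of any plotting library.

```python
# Hypothetical lookup table summarising the chart guidance from this post
CHART_BY_GOAL = {
    "trend": ["line chart", "area chart", "bar chart (time series)"],
    "comparison": ["bar chart", "horizontal bar chart",
                   "grouped bar chart", "stacked bar chart"],
    "composition": ["pie chart", "donut chart", "100% stacked bar chart"],
    "relationship": ["scatter plot", "bubble chart"],
    "distribution": ["histogram", "box plot"],
}


def suggest_charts(goal: str) -> list[str]:
    """Return candidate chart types for a goal; the first entry is the default pick."""
    key = goal.strip().lower()
    if key not in CHART_BY_GOAL:
        raise ValueError(f"Unknown goal {goal!r}; expected one of {sorted(CHART_BY_GOAL)}")
    return CHART_BY_GOAL[key]


print(suggest_charts("trend")[0])         # line chart
print(suggest_charts("Relationship")[0])  # scatter plot
```

A table like this is a starting point, not a rule: the audience and the number of categories still decide which candidate actually works.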



💎 5 Rare But High-Value Sites for Data Scientists

If you’re tired of the same surface-level tutorials, these five "hidden gems" provide deep technical value you'll refer to for the rest of your career:

1️⃣ Deep Learning Drizzle
A massive, curated database of free, high-quality university courses (Stanford, MIT, CMU) covering every niche in AI and ML.
🔗 https://deep-learning-drizzle.github.io/

2️⃣ Distill pub
It uses incredible interactive visualizations to explain complex machine learning research papers that are usually very hard to digest.
🔗 https://distill.pub/

3️⃣ Connected Papers
It creates a visual map of how academic papers are linked so you can find the "ancestors" of any specific algorithm.
🔗 https://www.connectedpapers.com/

4️⃣ ML-Ops org
While everyone focuses on building models, this site teaches you the "production" side: how to actually deploy, monitor, and manage models in the real world.
🔗 https://ml-ops.org/

5️⃣ Explained ai
Provides the most intuitive, deep-dive explanations on the internet for how specific algorithms (like Random Forests or Gradient Boosting) actually work under the hood.
🔗 https://explained.ai/

Save these for your next deep-work session! 🚀


Power BI Dax Formulas Handbook.pdf (0.12 KB)


⚖️ Hypothesis Testing & P-values 🧑‍⚖️📊

You've run an A/B test. Your new website design (Version B) got 12% more clicks than the old one (Version A). Great, right? But is that 12% a real improvement, or just a lucky fluctuation in your data?

This is where Hypothesis Testing and the notorious P-value come in. They help you decide if your observed data is significant enough to make a big decision, or if it's just random chance.

🏛 The Courtroom Scenario
Imagine a trial:
• Default Assumption (Null Hypothesis, H0): The defendant is NOT GUILTY. (Our designs are the same, the 12% is luck.)
• What We're Trying to Prove (Alternative Hypothesis, H1): The defendant IS GUILTY. (Version B is better than A.)
• The Evidence (Your Data): The 12% difference in clicks.
• The Judge's Decision (P-value): How likely is it that we'd see this "evidence" (12% difference) if the defendant were truly not guilty (designs truly the same)?

1. The Null (H0) & Alternative (H1) Hypotheses
• Null Hypothesis (H0): There is no difference between the two groups/variables. (e.g., "New design has no effect on clicks." or "Mean sales for region X is 100.")
• Alternative Hypothesis (H1): There is a difference or relationship. (e.g., "New design increases clicks." or "Mean sales for region X is not 100.")
Our goal is usually to reject H0 in favor of H1.

2. The P-value: What It Actually Means
The P-value is the probability of observing data as extreme as (or more extreme than) your current data, assuming the Null Hypothesis is true.
• Small P-value (e.g., 0.01): "It's highly unlikely we'd see this much difference if the new design had no effect. So, we'll reject the null and conclude the new design is better."
• Large P-value (e.g., 0.60): "There's a good chance we'd see this difference just by luck, even if the new design had no real effect. So, we fail to reject the null."

3. The Significance Level (Alpha, α)
This is your cutoff point. Most commonly, α = 0.05 (5%).
• P-value ≤ α: Reject the Null Hypothesis. (Your result is "statistically significant.")
• P-value > α: Fail to Reject the Null Hypothesis. (Your result is not statistically significant.)

4. The Biggest Misconceptions (DON'T DO THIS!)
• P-value is NOT the probability that H0 is true.
• P-value is NOT the probability that H1 is false.
• A "significant" P-value doesn't mean the effect is large or important in the real world. (A tiny, unimportant difference can be statistically significant if you have a huge dataset.)

🎯 Today's Goal (What you should do)
✔️ Formulate clear Null and Alternative Hypotheses.
✔️ Understand the P-value as the probability of seeing data at least as extreme as yours if the Null were true.
✔️ Use a significance level (alpha) to make decisions.
✔️ AVOID common P-value misinterpretations!

👉 P-values don't tell you if your hypothesis is true, but they do tell you how surprising your data would be if the Null Hypothesis were true.
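
As an illustration of the A/B scenario, the sketch below runs a two-sided two-proportion z-test using only the standard library. The post does not name a specific test, so this choice is an assumption, and the visitor and click counts are invented: both scenarios show 12% more clicks for Version B, but with different sample sizes.

```python
import math


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Return (z, two-sided p-value) for H0: both versions have the same click rate."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Under H0 both groups share one click rate, estimated from the pooled data.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ALPHA = 0.05

# Hypothetical data: the same 12% relative lift at two sample sizes
z_small, p_small = two_proportion_z_test(100, 1_000, 112, 1_000)
z_large, p_large = two_proportion_z_test(1_000, 10_000, 1_120, 10_000)

for label, p in [("1,000 visitors per arm", p_small), ("10,000 visitors per arm", p_large)]:
    verdict = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

With these made-up counts, the smaller experiment fails to reject H0 while the larger one rejects it, even though the observed lift is identical. That is the sample-size effect described under the misconceptions above.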


How to Choose Your ML Research Topic: Step by Step Framework


Repost from Programming Quiz Channel

Which activation function outputs values between 0 and 1?

Anonymous voting


Repost from Programming, data science, ML - free courses by Big Data Specialist

Azure Data Engineering: A comprehensive Roadmap Including Professional Notes + Interview Guide


AI Agents vs LLM vs RAG vs Agentic AI


📏 Feature Scaling (Standardization vs. Normalization) ⚖️

Imagine you're trying to compare apples and oranges... or rather, "Age" measured in years (0-100) and "Salary" measured in dollars (0-1,000,000). Many Machine Learning algorithms get utterly confused if one feature has a massive range and another is tiny. The larger-ranged feature will dominate the distance calculations or gradient descent, making the model unfairly biased towards it.

👉 This is where Feature Scaling comes in: making all your features play nicely together on the same playground.

Why Do We Need It? 🤔
• Distance-based algorithms: (K-Nearest Neighbors, K-Means Clustering, Support Vector Machines) are very sensitive to the magnitude of features. A small difference in a large-ranged feature can seem more important than a big difference in a small-ranged feature.
• Gradient Descent based algorithms: (Linear Regression, Logistic Regression, Neural Networks) converge much faster when features are on a similar scale.

Two Main Flavors: Standardization & Normalization

1. Standardization (Z-score Normalization) ⚡️
• What it does: Transforms data to have a mean of 0 and a standard deviation of 1. It centers the data around the mean and scales it based on its standard deviation.
• Formula: (x - mean) / standard_deviation
• When to use:
  • When your data follows a Gaussian (Normal) distribution.
  • When your algorithm assumes features are normally distributed.
  • When you have outliers (Standardization is less affected by them than Normalization).
• Vibe: "Let's put everyone on a common baseline relative to the average."

2. Normalization (Min-Max Scaling) ↔️
• What it does: Scales data to a fixed range, usually 0 to 1. It squeezes all values into this specific interval.
• Formula: (x - min) / (max - min)
• When to use:
  • When you know your data doesn't follow a Gaussian distribution.
  • When your algorithm requires inputs to be within a specific range (e.g., some neural network activation functions).
  • When you don't have outliers (Normalization is very sensitive to extreme values).
• Vibe: "Let's squeeze everyone into this exact box, no matter what."

🐍 Code Example: Seeing the Difference with Scikit-learn

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Sample Data: 'Age' (small range) vs. 'Income' (large range)
data = {
    'Age': [25, 30, 45, 60, 20, 70],
    'Income': [40000, 60000, 90000, 150000, 30000, 1000000] # An outlier in income!
}
df = pd.DataFrame(data)

print("Original Data:")
print(df)

# --- 1. Standardization ---
scaler_std = StandardScaler()
df_standardized = scaler_std.fit_transform(df)
print("\nStandardized Data (Mean=0, Std=1):")
print(pd.DataFrame(df_standardized, columns=df.columns))

# --- 2. Normalization ---
scaler_minmax = MinMaxScaler()
df_normalized = scaler_minmax.fit_transform(df)
print("\nNormalized Data (Range 0-1):")
print(pd.DataFrame(df_normalized, columns=df.columns))

Key Observation in Output: Notice how the huge 1,000,000 income outlier in the original data dramatically pulls all other Income values towards 0 for Normalization, making them tiny. Standardization is affected too, because the outlier inflates both the mean and the standard deviation, but its output is not forced into a fixed 0-1 range.

The Takeaway 🧠
There's no single "best" scaling method. Your choice depends on:
1. The distribution of your data.
2. The specific Machine Learning algorithm you're using.
3. The presence of outliers.

Always experiment and evaluate which scaling method performs best for your particular task!
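
The two formulas quoted in this post can also be checked by hand. The sketch below applies them to the same Income values using only the standard library. It assumes the population standard deviation (`pstdev`), which is what scikit-learn's `StandardScaler` uses by default; the helper names are illustrative.

```python
from statistics import fmean, pstdev

# Same Income column as in the scikit-learn example, including the outlier
incomes = [40000, 60000, 90000, 150000, 30000, 1000000]


def standardize(values):
    """(x - mean) / standard_deviation, using the population standard deviation."""
    mean, std = fmean(values), pstdev(values)
    return [(x - mean) / std for x in values]


def min_max(values):
    """(x - min) / (max - min), mapping the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(x - low) / (high - low) for x in values]


z_scores = standardize(incomes)
scaled = min_max(incomes)

print(f"{'income':>9} {'z-score':>8} {'min-max':>8}")
for income, z, m in zip(incomes, z_scores, scaled):
    print(f"{income:>9} {z:>8.3f} {m:>8.3f}")
```

Both transforms are linear, so the spacing between values relative to each other is the same in either column. What differs is the reference point: the mean and standard deviation in one case, the minimum and maximum in the other.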


Machine Learning for Newbies.pdf (2.33 KB)