Data science/ML/AI

Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatascientist

📈 Telegram channel analysis: Data science/ML/AI

The Data science/ML/AI channel (@datascience_bds) is an active player in the English-language segment. The community currently has 13,660 subscribers and ranks 9,391 in the Technology & Apps category and 31,743 in the India region.

📊 Audience metrics and dynamics

Since its creation (date unknown), the project has grown to 13,660 subscribers.

According to the latest data, dated 07 June 2026, the channel shows stable activity. The subscriber count changed by 151 over the past 30 days and by -5 over the past 24 hours, while broad reach was maintained.

  • Verification status: not verified
  • Engagement rate (ER): average audience engagement is 7.92%, and in the first 24 hours after publication, content typically earns a 2.33% response relative to total subscribers.
  • Post reach: each post receives 1,082 views on average. About 318 views are typically collected on the first day.
  • Reactions and engagement: the audience is actively supportive; the average number of reactions per post is 5.
  • Topical interests: content focuses on key topics such as panda, learning, row, api, ethic.

📝 Description and content policy

The author describes this space as a place for expressing personal views:
Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatasci...

Thanks to frequent updates (latest data dated 08 June 2026), the channel stays current and maintains high reach. Analytics indicate that the audience actively engages with the content, making the channel a notable point of influence in the Technology & Apps category.

13,660 subscribers
24 hours: -5
7 days: +52
30 days: +151
Post archive
Why Feature Drift Is Harder Than Data Drift
Data drift = inputs change
Feature drift = the logic that generates the feature changes
Example: Your “active user” feature used to be “clicked in last 7 days.” Marketing redefines it to “clicked in last 3 days.” Your model silently dies because the underlying concept changed.
Feature drift is more dangerous: it happens inside your system, not in external data.
Production ML must version:
▪️feature definitions
▪️transformation logic
▪️data contracts
Otherwise the same model receives different features week to week.
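One way to make the versioning idea concrete is to fingerprint each feature definition and compare the fingerprint at serving time with the one recorded at training time. The sketch below is only an illustration under assumptions: the `FeatureDefinition` class, its fields, and `check_feature_contract` are hypothetical names, not part of any feature-store library.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical record describing how one feature is computed."""
    name: str
    logic: str        # human-readable description of the transformation
    window_days: int  # lookback window used by the transformation

    def fingerprint(self) -> str:
        # Hash the whole definition so any change in logic yields a new ID.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def check_feature_contract(expected: str, current: FeatureDefinition) -> None:
    """Raise if the serving-time definition differs from the training-time one."""
    actual = current.fingerprint()
    if actual != expected:
        raise ValueError(
            f"Feature '{current.name}' drifted: expected {expected}, got {actual}"
        )


# Definition the model was trained with (7-day window)
trained = FeatureDefinition("active_user", "clicked in last N days", window_days=7)
trained_fingerprint = trained.fingerprint()

# Definition after the redefinition to a 3-day window
redefined = FeatureDefinition("active_user", "clicked in last N days", window_days=3)

check_feature_contract(trained_fingerprint, trained)  # same definition: no error

try:
    check_feature_contract(trained_fingerprint, redefined)
    drift_detected = False
except ValueError as err:
    drift_detected = True
    print(err)
```

Storing the fingerprint next to the model artifact turns a silent redefinition into a loud failure at serving time.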

📚 Data Science Riddle - Probability A classifier outputs 0.9 probability for class A, but the real frequency is only 0.7. What is the model lacking?
Anonymous voting

The Real Reason PCA Works: Variance as Signal
Students memorize PCA as “dimensionality reduction.” But the deeper insight is: PCA assumes variance = information.
If a direction in the data has high variance, PCA considers it meaningful. If variance is small, PCA considers it noise.
This is not always true in real systems. PCA fails when:
➖important signals have low variance
➖noise has high variance
➖relationships are nonlinear
That’s why nonlinear methods (autoencoders, UMAP, t-SNE) can outperform PCA on many datasets.
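The failure mode where noise has high variance and the signal has low variance can be reproduced on synthetic data. The sketch below uses made-up data (a noisy column with standard deviation 10 and a class-dependent column with much smaller spread) and computes PCA by hand with NumPy's SVD; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

labels = rng.integers(0, 2, size=n)    # two classes: 0 and 1
noise = rng.normal(0.0, 10.0, size=n)  # high variance, carries no class information
# Low variance, but almost fully determined by the class label
signal = (labels * 2 - 1) * 0.5 + rng.normal(0.0, 0.1, size=n)

X = np.column_stack([noise, signal])
X_centered = X - X.mean(axis=0)

# PCA via SVD: rows of vt are the principal directions, ordered by variance
_, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
first_pc = vt[0]

# Compare how well the first component and the raw signal track the labels
projected = X_centered @ first_pc
corr_pc1 = abs(np.corrcoef(projected, labels)[0, 1])
corr_signal = abs(np.corrcoef(signal, labels)[0, 1])

print("explained variance ratio:", np.round(explained, 4))
print("first principal direction:", np.round(first_pc, 3))
print("|corr(PC1, labels)|:", round(corr_pc1, 3))
print("|corr(signal, labels)|:", round(corr_signal, 3))
```

Keeping only the first component here would keep the noise column and discard nearly all of the class information, even though that component explains almost all of the variance.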

📚 Data Science Riddle - Model Selection Two models have similar accuracy, but one is far simpler. Which should you choose ?
Anonymous voting

Data Cleaning in Python
Data cleaning is the process of detecting and correcting inaccurate, incomplete, or inconsistent data to improve data quality for analysis and modeling. It is a crucial step in any data science workflow.
Handling Missing Values
df.isnull().sum()        # Count missing values per column
df.dropna()              # Return a copy without rows that have missing values
df.fillna(0)             # Return a copy with missing values replaced by 0
Removing Duplicate Data
df.duplicated()          # Flag duplicate rows (boolean Series)
df.drop_duplicates()     # Return a copy without duplicate rows
Correcting Data Types
df.dtypes                                  # Inspect the data type of each column
df["age"] = df["age"].astype(int)          # Convert age to integer (raises if missing values remain)
df["date"] = pd.to_datetime(df["date"])    # Convert date column to datetime
Renaming Columns
df.columns = df.columns.str.lower().str.replace(" ", "_")  # Lower-case names, spaces to underscores
Handling Inconsistent Data
df["gender"] = df["gender"].str.lower()    # Convert to lower case
df["name"] = df["name"].str.strip()        # Remove leading/trailing whitespace
Clean data leads to more accurate analysis and reliable models. Python’s pandas library simplifies cleaning tasks such as handling missing values, duplicates, incorrect types, and inconsistencies.
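The individual steps can be chained into one small pipeline. The sketch below runs on a made-up four-row DataFrame (the column names and values are invented for illustration) and assumes that rows with a missing age should be dropped rather than filled.

```python
import pandas as pd

# Toy data with messy column names, whitespace, mixed case,
# a duplicate row, and a missing value
raw = pd.DataFrame(
    {
        "Full Name": ["  Ana ", "Ben", "Ben", "Cy"],
        "Age": ["31", "42", "42", None],
        "Gender": ["F", "m", "m", "M"],
    }
)

cleaned = (
    raw.rename(columns=lambda c: c.lower().replace(" ", "_"))  # tidy column names
    .assign(
        full_name=lambda d: d["full_name"].str.strip(),  # trim whitespace
        gender=lambda d: d["gender"].str.lower(),        # consistent casing
    )
    .drop_duplicates()       # remove the repeated "Ben" row
    .dropna(subset=["age"])  # drop rows without an age
    .assign(age=lambda d: pd.to_numeric(d["age"]).astype("int64"))  # text to integer
    .reset_index(drop=True)
)

print(cleaned)
```

Each method returns a new DataFrame, so `raw` stays untouched and the cleaned result lives only in `cleaned`.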

Hey Everyone 👋 Should we continue another series on "Data Manipulation with Pandas" just like the previous series?
Anonymous voting

📚 Data Science Riddle - Dimensionality Reduction You want to visualize high-dimensional clusters while keeping neighborhood structure intact. What should you use?
Anonymous voting

Sometimes reality outpaces expectations in the most unexpected ways. While global AI development seems increasingly fragmented, Sber just released Europe's largest open-source AI collection, with full weights, code, and commercial rights included.
✅ No API paywalls.
✅ No usage restrictions.
✅ Just four complete model families ready to run in your private infrastructure, fine-tuned on your data, serving your specific needs.
What makes this release remarkable isn't merely the technical prowess, but the quiet confidence behind sharing it openly when others are building walls. Find out more in the article from the developers.
GigaChat Ultra Preview: 702B-parameter MoE model (36B active per token) with 128K context window. Trained from scratch, it outperforms DeepSeek V3.1 on specialized benchmarks while maintaining faster inference than previous flagships. Enterprise-ready with offline fine-tuning for secure environments. GitHub | HuggingFace
GigaChat Lightning offers the opposite balance: compact yet powerful MoE architecture running on your laptop. It competes with Qwen3-4B in quality, matches the speed of Qwen3-1.7B, yet is significantly smarter and larger in parameter count. Lightning holds its own against the best open-source models in its class, outperforms comparable models on different tasks, and delivers ultra-fast inference, making it ideal for scenarios where Ultra would be overkill and speed is critical. Plus, it features stable expert routing and a welcome bonus: 256K context support. GitHub | Hugging Face
Kandinsky 5.0 brings a significant step forward in open generative models. The flagship Video Pro matches Veo 3 in visual quality and outperforms Wan 2.2-A14B, while Video Lite and Image Lite offer fast, lightweight alternatives for real-time use cases. The suite is powered by K-VAE 1.0, a high-efficiency open-source visual encoder that enables strong compression and serves as a solid base for training generative models. This stack balances performance, scalability, and practicality, whether you're building video pipelines or experimenting with multimodal generation. GitHub | Hugging Face | Technical report
Audio gets its upgrade too: GigaAM-v3 delivers a speech recognition model with 50% lower WER than Whisper-large-v3, trained on 700k hours of audio with punctuation/normalization for spontaneous speech. GitHub | HuggingFace
Every model can be deployed on-premises, fine-tuned on your data, and used commercially. It's not just about catching up – it's about building sovereign AI infrastructure that belongs to everyone who needs it.

Big Data Formats
Big Data formats such as Parquet, ORC, and Feather are designed for efficient storage and fast access when working with large datasets. They are optimized for performance, compression, and scalability, making them ideal for data science and big data applications.
Parquet
Parquet is a columnar storage format widely used in big data ecosystems such as Apache Spark and Hadoop. It allows efficient reading of selected columns and supports strong compression.
import pandas as pd

# Read Parquet file into a DataFrame
df = pd.read_parquet("data.parquet")
ORC (Optimized Row Columnar)
ORC is a columnar format optimized for high-performance analytics and commonly used in Hadoop-based systems.
import pandas as pd

# Read ORC file into a DataFrame
df = pd.read_orc("data.orc")
Feather
Feather is a lightweight binary format designed for fast data exchange between Python and other languages like R.
import pandas as pd

# Read Feather file into a DataFrame
df = pd.read_feather("data.feather")
✅ This concludes our Data Importing Series. 👉Join @datascience_bds for more Part of the @bigdataspecialist family ❤️

HTML Tables
HTML tables are commonly found on websites and can be imported into Python for analysis by extracting table data directly from web pages. This is useful for collecting publicly available data without manually copying it.
Importing HTML Tables Using Pandas
import pandas as pd

# URL of the webpage containing HTML tables
url = "https://example.com/page"

# Read all tables from the webpage
tables = pd.read_html(url)

# Select the first table
df = tables[0]
Next up ➡️ Big Data Formats

Pickle Files
Pickle files (.pkl) are used to store serialized Python objects such as DataFrames, lists, dictionaries, or trained models. They allow quick saving and loading of Python objects without converting them to text formats.
Importing Pickle Files in Python
import pickle  # Library for object serialization

# Open the pickle file in read-binary mode
with open("data.pkl", "rb") as file:
    data = pickle.load(file)  # Load the stored Python object
Using Pickle with Pandas
import pandas as pd

# Load a pickled pandas DataFrame
df = pd.read_pickle("data.pkl")
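To see both directions of the process, the sketch below writes an object to a pickle file and reads it back. The `record` dictionary and the temporary directory are made up for illustration. One caution worth keeping in mind: unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative object to serialize (contents are invented for this example)
record = {"model": "baseline", "params": {"depth": 3}, "scores": [0.81, 0.84]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "data.pkl"

    # Write the object in write-binary mode
    with open(path, "wb") as file:
        pickle.dump(record, file)

    # Read it back in read-binary mode
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored object is an equal copy, not the same object
print(restored == record, restored is record)
```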
Next up ➡️ Importing HTML Tables

API Key Authentication
import requests

# API endpoint
url = "https://api.example.com/data"

# Parameters including the API key for authentication
params = {
    "api_key": "YOUR_API_KEY"  # Replace with your actual API key
}

# Send GET request with parameters
response = requests.get(url, params=params)

# Convert JSON response to Python object
data = response.json()

# Print the data
print(data)
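Hard-coding a key in a script makes it easy to leak through version control. A common alternative is to read it from an environment variable. The sketch below is one possible approach, with assumptions stated plainly: the variable name `EXAMPLE_API_KEY`, the helper `build_request_url`, and the demo key are all hypothetical. It only builds the URL (the same query string that `requests.get(url, params=...)` would append) and sends nothing over the network.

```python
import os
from urllib.parse import urlencode


def build_request_url(base_url: str, env_var: str = "EXAMPLE_API_KEY") -> str:
    """Build a URL whose query string carries an API key read from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return f"{base_url}?{urlencode({'api_key': api_key})}"


# For demonstration only: in practice, set the variable in your shell
# or a secrets manager instead of inside the script.
os.environ["EXAMPLE_API_KEY"] = "demo-key-123"

request_url = build_request_url("https://api.example.com/data")
print(request_url)
```

Whether the key belongs in the query string or in a request header depends on the specific API, so check that API's documentation before choosing.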
Next up ➡️ Importing Pickle files in python

Importing API Data into a Pandas DataFrame
import requests            # Library for making HTTP requests
import pandas as pd        # Library for data manipulation and analysis

# API endpoint
url = "https://api.example.com/users"

# Send request to API
response = requests.get(url)

# Convert JSON response to Python object
data = response.json()

# Convert the JSON data into a pandas DataFrame
df = pd.DataFrame(data)

# Display the first five rows of the DataFrame
print(df.head())
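API responses are often nested, in which case `pd.DataFrame(data)` leaves whole dictionaries inside a single column. The sketch below uses a made-up payload (standing in for what `response.json()` might return) and flattens it with `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical payload, shaped like a typical nested API response
payload = [
    {"id": 1, "name": "Ana", "address": {"city": "Lisbon", "zip": "1000"}},
    {"id": 2, "name": "Ben", "address": {"city": "Oslo", "zip": "0150"}},
]

# Plain constructor: the "address" column holds dictionaries
nested = pd.DataFrame(payload)

# json_normalize expands nested keys into dotted column names
flat = pd.json_normalize(payload)

print(nested.columns.tolist())
print(flat.columns.tolist())
```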
Next up ➡️ API Key Authentication

An API (Application Programming Interface) allows different software systems to communicate with each other. In data science and software development, APIs are commonly used to retrieve data from web services such as social media platforms, financial systems, weather services, and databases hosted online. Python provides powerful libraries that make it easy to import and process data from APIs efficiently.
Making API Requests in Python
HTTP Methods
GET – retrieve data
POST – send data
PUT – update data
DELETE – remove data
Next up ➡️ Importing API Data into a Pandas DataFrame

📚 Data Science Riddle - NLP You want a model to capture meaning similarity between sentences. What representation is best?
Anonymous voting

Loading a JSON file in Python
JSON is the king of APIs, config files, NoSQL databases, and web data. With Python’s built-in json module (or pandas), you go from file to usable data in seconds.
# Import json module (built-in, no install needed!)
import json

# Or import pandas if you want it directly as a DataFrame
import pandas as pd

# Your JSON file path
filename = "data.json"

# Load JSON file into a Python dictionary/list
with open(filename, "r", encoding="utf-8") as file:
    data = json.load(file)

# Quick look at structure and first few items
print(type(data))        # usually dict or list
print(data.keys() if isinstance(data, dict) else len(data))

# Alternatively, load the JSON file directly into a DataFrame
df = pd.read_json(filename)

# Check the first five rows
df.head()

Loading a text file in Python
Text files (.txt) are perfect for logs, books, raw notes, or any unstructured data. With Python's built-in open(), a few lines are enough to load an entire novel, log file, or dataset into a string.
# Loading a text file in Python

filename = 'huck_finn.txt'                  # Name of the file to open

file = open(filename, mode='r')             # Open file in read mode ('r')
                                            # Use encoding='utf-8' if needed

text = file.read()                          # Read entire content into a string

print(file.closed)                          # False → file is still open

file.close()                                # Always close the file when done!
                                            # Releases the file handle and any lock on it

print(file.closed)                          # Now True → file is safely closed

print(text)                                 # Display the full text content
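Two common alternatives avoid the manual `close()` call: a `with` block, which closes the file automatically, and `pathlib.Path.read_text`, which reads the whole file in one line. The sketch below creates its own small sample file in a temporary directory so it can run anywhere; the file name and contents are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small sample file to read (stands in for a real text file)
    path = Path(tmp_dir) / "sample.txt"
    path.write_text("line one\nline two\n", encoding="utf-8")

    # Option 1: context manager, the file is closed when the block ends
    with open(path, mode="r", encoding="utf-8") as file:
        text_from_open = file.read()
    closed_after_with = file.closed

    # Option 2: pathlib opens, reads, and closes in a single call
    text_from_pathlib = path.read_text(encoding="utf-8")

print(closed_after_with)
print(text_from_open == text_from_pathlib)
```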
Next up ➡️ Loading a JSON file in Python

📚 Data Science Riddle - Numerical Optimization Which method uses second-order curvature information?
Anonymous voting

Loading an Excel file in Python
Excel files are often packed with headers, logos, merged cells, and multiple sheets, but pandas handles it all. With just a few extra parameters, you can skip junk rows, pick exact columns, etc.
# Import the pandas library 
import pandas as pd

# Specify the path to your Excel file (.xlsx or .xls)
filename = "data.xlsx"

# Read the Excel file into a DataFrame
# Common options you'll use all the time:
df = pd.read_excel(
    filename,
    sheet_name=0,              # 0 = first sheet
    header=0,                  # Row (0-indexed) to use as column names
    skiprows=4,                # Skip first 4 rows
    nrows=1000,                # Load only first 1000 rows
)
# Check the first five rows
df.head()
Next up ➡️ Loading a text file in Python

Loading a CSV file in Python
CSV stands for Comma-Separated Values, the most common format for tabular data everywhere. With pandas, turning a CSV into a powerful, queryable DataFrame takes just a few clear lines.
# Import the pandas library
import pandas as pd

# Specify the path to your CSV file
filename = "data.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(filename)

# Check the first five rows
df.head()
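Real CSV files often need a few extra options. The sketch below reads made-up, semicolon-separated data from an in-memory string (so no file is required) and shows four commonly used `read_csv` parameters: `sep`, `usecols`, `dtype`, and `parse_dates`.

```python
import io

import pandas as pd

# Made-up CSV content using ";" as the separator
csv_text = """id;signup_date;plan;notes
1;2024-01-05;free;first user
2;2024-02-11;pro;
3;2024-03-20;pro;upgraded
"""

df = pd.read_csv(
    io.StringIO(csv_text),                   # a file path works here as well
    sep=";",                                 # field separator used in the data
    usecols=["id", "signup_date", "plan"],   # load only the needed columns
    dtype={"plan": "category"},              # store repeated labels compactly
    parse_dates=["signup_date"],             # convert text to datetime
)

print(df.dtypes)
print(df.head())
```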
Next up ➡️ Loading an Excel file in Python