Data science/ML/AI


Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatascientist


Network: Programming, data science, ML - free courses by Big Data Specialist · India: 31 771 · Technology and apps: 9 387...

📈 Telegram channel analysis: Data science/ML/AI

The Data science/ML/AI channel (@datascience_bds) is active in the English-language segment. The community currently has 13 663 subscribers, ranking 9 387 in the Technology and apps category and 31 771 in the India region.

📊 Audience metrics and dynamics

The creation date is unknown. Since then the channel has gained 13 663 subscribers.

According to the latest data, dated 05 June 2026, the channel is steadily active. Membership changed by 171 over the past 30 days and by 1 over the past 24 hours.

Verification status: not verified
Engagement rate (ER): average audience engagement is 7.95%. In the first 24 hours after publication, content typically earns a response equal to 2.46% of all subscribers.
Post reach: each post receives 1 086 views on average, of which about 336 usually arrive on the first day.
Reactions and engagement: posts receive 5 reactions on average.
Topical interests: content focuses on key topics such as panda, learning, row, api, ethic.
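The page does not state how the engagement figures are computed. The numbers above are consistent with dividing average views by the subscriber count, so the sketch below assumes that formula. The variable names are illustrative only.

```python
# Assumption: ER = average views per post / subscribers (not confirmed by the page).
subscribers = 13_663
avg_views_per_post = 1_086
avg_views_first_day = 336

er_total = round(avg_views_per_post / subscribers * 100, 2)
er_first_day = round(avg_views_first_day / subscribers * 100, 2)

print(f"ER: {er_total}%")                  # matches the reported 7.95%
print(f"First-day ER: {er_first_day}%")    # matches the reported 2.46%
```

If the assumption holds, the reported ER measures views rather than reactions, which would explain why it is much higher than 5 reactions per post would suggest.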

📝 Description and content policy

The author describes this space as a place for expressing personal views:
“Data science and machine learning hub Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources. For beginners, data scientists and ML engineers 👉 https://rebrand.ly/bigdatachannels DMCA: @disclosure_bds Contact: @mldatasci...”

Thanks to frequent updates (latest data dated 07 June 2026), the channel stays current and keeps a high reach. The analytics indicate that the audience actively engages with the content, which gives the channel influence within the Technology and apps category.

Subscribers: 13 663
+1 in 24 hours
+59 in 7 days
+171 in 30 days

Post views: 1 086
~336 in 24 hours
~499 in 48 hours

Engagement rate: 7.95%

Posts per day: ~1


Post archive

Repost from Programming, data science, ML - free courses by Big Data Specialist

Data Science Interview Questions and Answers.pdf (13.55 MB)

VC Dimension

In theory courses, VC dimension appears abstract. But it answers a deep question:

How complex is your model’s decision boundary?

VC dimension is the size of the largest set of points a model can shatter, that is, classify perfectly under every possible labeling.

Why this is important❔
Two models with similar parameter counts can have very different capacities. For example:
📦 k-NN → very high effective capacity
📐 Linear classifier → limited capacity (VC dimension d + 1 in d dimensions)
🌳 Deep trees → extremely high capacity

What you need to understand
Generalization depends on capacity relative to data size. Too much capacity with little data leads to overfitting.

✅ VC dimension is about expressive power, not just number of parameters.
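To make "shattering" concrete, here is a minimal brute-force sketch for a deliberately simple model class that is not taken from the post: 1-D threshold classifiers of the form h(x) = s if x > t else -s, with s in {+1, -1}. The function names are hypothetical. The check enumerates every labeling this class can produce on a given set of distinct points and compares the count with 2^n.

```python
def threshold_labelings(points):
    """Labelings of `points` reachable by h(x) = s if x > t else -s, s in {+1, -1}."""
    xs = sorted(points)
    # One candidate threshold per gap: below all points, between neighbours, above all.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    reachable = set()
    for t in cuts:
        for s in (1, -1):
            reachable.add(tuple(s if x > t else -s for x in points))
    return reachable


def is_shattered(points):
    """True if all 2^n labelings of the (distinct) points are reachable."""
    return len(threshold_labelings(points)) == 2 ** len(points)


print(is_shattered([0.0, 1.0]))        # True: all 4 labelings are reachable
print(is_shattered([0.0, 1.0, 2.0]))   # False: only 6 of the 8 labelings are reachable
```

Two points can be shattered but these three cannot, because no threshold can give the middle point a label different from both of its neighbours. The same argument applies to any three distinct points, so this class has VC dimension 2.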

Data Lakehouse Architecture for ML Cheat Sheet.pdf (1.04 KB)

Repost from Programming Quiz Channel

Which ML concept refers to splitting data into training and testing subsets?

Anonymous voting


LLMs are getting insanely popular lately and suddenly everyone is talking about AI, chatbots, copilots, agents… so let’s clear it up 👇

So what are LLMs really? 🤔
LLMs = Large Language Models
Think of them as insanely smart text prediction machines that learned from tons of books, code, docs, and conversations 📚💻

Why everyone is obsessed right now 🔥
• They can write code 🧑‍💻
• Explain complex stuff like a friend 🗣
• Analyze data 📊
• Power chatbots, copilots, agents 🤖
• One model, MANY tasks

Why they exploded now 🚀
• GPUs got better and cheaper
• Open source models became really good
• Companies realized: this saves time and money 💰

The most famous LLMs you hear about 👀
• GPT-4 / GPT-4.1 by OpenAI
• Claude 3 by Anthropic
• Gemini by Google
• LLaMA 3 by Meta
• Mistral by Mistral AI

Where LLMs are actually used today 🛠
• Chatbots and AI assistants
• Writing SQL and Python
• Data analysis and reporting
• Customer support automation
• Internal company tools

Important truth 💡
LLMs are not magic 🪄 They are very powerful autocomplete with reasoning skills. Learn how to use them properly and you are already ahead of most people 😉
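The "text prediction machine" idea can be illustrated with a toy that is far simpler than any real LLM: a bigram counter that predicts the next word from the previous one. This is only an analogy. Real LLMs are large neural networks trained on far more context, and the corpus and function name below are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on vastly more text.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1


def predict_next(word):
    """Return the most frequent follower of `word` (word must occur in the corpus)."""
    return next_word_counts[word].most_common(1)[0][0]


print(predict_next("the"))  # cat  ("cat" follows "the" twice, "mat" once)
```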


🧠 LayerNorm vs BatchNorm: Same Goal, Different Behavior

Both techniques normalize activations, but they operate differently.

Batch Normalization
📦 Normalizes across the batch
⚡️ Depends on batch statistics
🖼 Works very well in CNNs
⚠️ Sensitive to small batch sizes

Layer Normalization
🔬 Normalizes across features per sample
📏 Independent of batch size
🤖 Preferred in transformers and NLP
✅ Stable for sequence models

Why transformers use LayerNorm❔
Sequence models often run with variable or small batches. LayerNorm avoids reliance on batch statistics and stays stable.

✅ Rule of thumb
🖼 CNNs → BatchNorm
🤖 Transformers → LayerNorm

📌 They look similar mathematically but normalize along different axes.
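The "different axes" point can be sketched in a few lines of NumPy. This is a simplified illustration on a made-up 2-D array of shape (batch, features). It leaves out the learnable scale and shift parameters and the running statistics that real BatchNorm layers keep for inference.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])  # shape: (batch=2, features=3), made-up values
eps = 1e-5  # small constant for numerical stability

# BatchNorm-style: one mean/variance per feature, computed across the batch (axis 0).
batch_normed = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# LayerNorm-style: one mean/variance per sample, computed across its features (axis 1).
layer_normed = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

# LayerNorm output for the first sample does not depend on the rest of the batch.
first_alone = x[:1]
first_alone_normed = (first_alone - first_alone.mean(axis=1, keepdims=True)) / np.sqrt(
    first_alone.var(axis=1, keepdims=True) + eps
)
print(np.allclose(layer_normed[:1], first_alone_normed))  # True
```

Running the BatchNorm-style line on the single-row batch would give all zeros, since each feature then equals its own batch mean. That is the small-batch sensitivity the post mentions.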

Apache Kafka Cheat Sheet.pdf (0.84 KB)

Generative AI 101 in 10 Terms


⚡️📊 One Line Feature Scaling

Scaling features without touching sklearn 👀

df["age_scaled"] = (df["age"] - df["age"].mean()) / df["age"].std()

Why it is useful:
• Quick experiments
• Better intuition
• No pipeline overhead
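A runnable version of the one-liner, using a small made-up `age` column. One detail to keep in mind: pandas `std()` uses the sample standard deviation (`ddof=1`) by default, while scikit-learn's `StandardScaler` uses the population standard deviation (`ddof=0`), so the two give slightly different values on small data. The `age_scaled_pop` column name is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 47, 51, 60]})  # made-up sample values

mean = df["age"].mean()
std = df["age"].std()  # pandas default: sample std (ddof=1)

# Same formula as the one-liner above.
df["age_scaled"] = (df["age"] - mean) / std

# Variant with the population std (ddof=0), the convention StandardScaler follows.
df["age_scaled_pop"] = (df["age"] - mean) / df["age"].std(ddof=0)

print(df)
```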

Prompt Engineering Cheat Sheet.pdf (0.67 KB)

Python for Data Analytics: The Ultimate Library Ecosystem (2026 Edition)

This wheel shows a recommended Python data stack, from raw scraping to production insights:
➡️ Data Manipulation → Pandas, Polars (a faster alternative), NumPy
➡️ Visualization → Matplotlib, Seaborn, Plotly (interactive dashboards)
➡️ Analysis → SciPy, Statsmodels, Pingouin
➡️ Time Series → Darts, Kats, Tsfresh, sktime
➡️ NLP → NLTK, spaCy, TextBlob, transformers (BERT & friends)
➡️ Web Scraping → BeautifulSoup, Scrapy, Selenium

🔥 Pro tip from real projects:
👉 Switch to Polars when Pandas starts choking on >1 GB datasets
👉 Use Plotly + Dash when stakeholders want interactive reports
👉 Combine Darts + Tsfresh for serious time-series feature engineering


Repost from Programming Quiz Channel

Unsupervised learning often uses:

Anonymous voting

AI Agents Roadmap 2026.pdf (1.66 MB)

Type of Data Professionals


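🤯📈 Detect Outliers in 5 Lines

Simple Z score based outlier detection.

import numpy as np

# Assumes df is an existing pandas DataFrame with a numeric "salary" column.
z = (df["salary"] - df["salary"].mean()) / df["salary"].std()  # z-score of each row
outliers = df[np.abs(z) > 3]  # rows more than 3 standard deviations from the mean

Why this matters:
• Clean data
• Better models
• Fewer surprises in production

Small code. Big impact.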
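A self-contained run of the same idea on made-up salary data, with one deliberately extreme value. The sample needs to be reasonably large: with the sample standard deviation, the largest possible |z| in a set of n values is (n - 1) / sqrt(n), so a threshold of 3 can never fire when n is 10 or fewer.

```python
import numpy as np
import pandas as pd

# 20 ordinary salaries plus one extreme value (all made up for illustration).
salaries = [48_000 + 200 * i for i in range(20)] + [500_000]
df = pd.DataFrame({"salary": salaries})

z = (df["salary"] - df["salary"].mean()) / df["salary"].std()
outliers = df[np.abs(z) > 3]

print(outliers)  # only the 500_000 row is flagged
```

Because the extreme value also inflates the mean and standard deviation, z-scores can hide outliers in heavily contaminated data. Median-based rules are a common alternative in that case.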


Pre-Chunking vs. Post-Chunking (On-Demand Chunking)

This visual breaks down two common ways to chunk documents in Retrieval-Augmented Generation (RAG) systems, and when each makes sense.

Pre-Chunking
Documents are cleaned, split into chunks, embedded, and stored ahead of time.
• Pros: Fast retrieval at query time, simpler runtime pipeline.
• Cons: Rigid; changing chunk size or strategy means reprocessing the entire dataset.
• Best for: Stable datasets, high-throughput apps, predictable queries.

Post-Chunking / On-Demand Chunking
Documents are stored whole; chunking happens after retrieval based on the user’s query.
• Pros: More flexible and query-aware, often more relevant context.
• Cons: Higher latency and infrastructure complexity.
• Best for: Evolving content, exploratory queries, precision-focused use cases.

🔑 Takeaway: There’s no one-size-fits-all. If speed and scale matter most, pre-chunk. If adaptability and relevance are key, post-chunk. Many production systems even combine both.
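The two strategies can be contrasted with a toy sketch. Everything here is an assumption made for illustration: the documents, the function names, fixed-size word chunks, and a keyword-overlap score standing in for the embedding similarity a real RAG system would use.

```python
import re


def tokens(text):
    """Lowercased word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def chunk_words(text, size):
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query, text):
    """Toy relevance: number of distinct query words found in the text."""
    return len(set(tokens(query)) & set(tokens(text)))


docs = {
    "kafka": "Kafka stores events in topics. Consumers read topics in order.",
    "spark": "Spark runs batch jobs. Spark also supports streaming jobs.",
}

# Pre-chunking: every document is split once, before any query arrives.
pre_index = [(doc_id, c) for doc_id, text in docs.items() for c in chunk_words(text, 5)]


def retrieve_pre(query):
    """Pick the best chunk from the prebuilt index."""
    return max(pre_index, key=lambda item: score(query, item[1]))


def retrieve_post(query, size):
    """Post-chunking: pick the best whole document, then chunk it at query time."""
    doc_id = max(docs, key=lambda d: score(query, docs[d]))
    return doc_id, max(chunk_words(docs[doc_id], size), key=lambda c: score(query, c))


print(retrieve_pre("streaming jobs"))
print(retrieve_post("streaming jobs", size=3))
```

In the pre-chunked path the chunk size is fixed when the index is built. In the post-chunked path `size` is chosen per query, at the cost of doing the splitting work while the user waits.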


Layers of AI

Support Vector Machines Cheat Sheet.pdf (1.28 KB)

✅ Natural Language Processing (NLP) Basics You Should Know 🧠💬

Understanding NLP is key to working with language-based AI systems like chatbots, translators, and voice assistants.

1️⃣ What is NLP?
NLP stands for Natural Language Processing. It enables machines to understand, interpret, and respond to human language.

2️⃣ Key NLP Tasks:
- Text classification (spam detection, sentiment analysis)
- Named Entity Recognition (NER) (identifying names, places)
- Tokenization (splitting text into words/sentences)
- Part-of-speech tagging (noun, verb, etc.)
- Machine translation (English → French)
- Text summarization
- Question answering

3️⃣ Tokenization Example:

# One-time setup: nltk.download("punkt") (newer NLTK releases use "punkt_tab")
from nltk.tokenize import word_tokenize
text = "ChatGPT is awesome!"
tokens = word_tokenize(text)
print(tokens)  # ['ChatGPT', 'is', 'awesome', '!']

4️⃣ Sentiment Analysis: Detects the emotion of text (positive, negative, neutral).

from textblob import TextBlob
# Returns Sentiment(polarity, subjectivity): polarity in [-1, 1], subjectivity in [0, 1]
TextBlob("I love AI!").sentiment

5️⃣ Stopwords Removal: Removes common words like “is”, “the”, “a”.

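from nltk.corpus import stopwords  # one-time setup: nltk.download("stopwords")
stop_words = set(stopwords.words("english"))  # build the set once, not once per word
words = ["this", "is", "a", "test"]
filtered = [w for w in words if w not in stop_words]  # ['test']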

6️⃣ Lemmatization vs Stemming:
- Stemming: Cuts off word endings (running → run)
- Lemmatization: Uses vocab & grammar (better results)

7️⃣ Vectorization: Converts text into numbers for ML models.
- Bag of Words
- TF-IDF
- Word Embeddings (Word2Vec, GloVe)

8️⃣ Transformers in NLP: Modern NLP models like BERT, GPT use transformer architecture for deep understanding.

9️⃣ Applications of NLP:
- Chatbots
- Virtual assistants (Alexa, Siri)
- Sentiment analysis
- Email classification
- Auto-correction and translation

🔟 Tools/Libraries:
- NLTK
- spaCy
- TextBlob
- Hugging Face Transformers

💬 Tap ❤️ for more!
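The vectorization step (item 7) can be sketched with a minimal Bag of Words built from the standard library alone. The three sentences are made up, and whitespace tokenization is a simplification. In practice a library vectorizer would usually handle this.

```python
from collections import Counter

docs = ["I love AI", "I love data", "AI loves data"]  # made-up example sentences

# Lowercase and split on whitespace (a deliberately naive tokenizer).
tokenized = [d.lower().split() for d in docs]

# Vocabulary: every distinct word, in a fixed (sorted) order.
vocab = sorted({w for doc in tokenized for w in doc})

# Each document becomes a vector of word counts aligned with the vocabulary.
vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

print(vocab)    # ['ai', 'data', 'i', 'love', 'loves']
print(vectors)  # [[1, 0, 1, 1, 0], [0, 1, 1, 1, 0], [1, 1, 0, 0, 1]]
```

Note that "love" and "loves" land in separate columns. Stemming or lemmatization (item 6) would merge them before counting.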


How To Tell a Data Story