Machine Learning with Python

Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers. Admin: @HusseinSheikho || @Hussein_Sheikho

67,815 subscribers
Post archive
Repost from Machine Learning
Data leakage is one of the main reasons why ML demos look impressive... and then fail in production. 📉 The model didn't become smarter. It just happened to see the correct answers in advance.

In 4 minutes, you'll understand where data leaks hide. 🔍 Let's break it down below: 👇

1. Data Leakage 🕳️
Data leakage occurs when information that won't be available at the time of actual prediction is used during model training. Because of this, validation metrics can look much better than the model's actual quality on new, previously unseen data.

2. Model Evaluation ⚖️
The test set isn't just "additional data". It's a simulation of the future. Only train the model on the information that would have been available to you at the time of prediction. Evaluate it on examples that couldn't have influenced the model during training.

3. Direct Leakage 🚨
This is the most obvious type of leakage. Examples:
- a field with information from the future;
- an ID that encodes the target variable;
- a variable that appears only after an event has occurred;
- duplicate records in both the training and test sets.
If a feature doesn't exist at the time of inference (prediction), then it's likely a source of data leakage.

4. Indirect Leakage 🕵️
This is the type of leakage that most often traps teams. You perform normalization, imputation, feature selection, outlier removal, or dimensionality reduction before splitting the data into a training and test set. The model didn't directly see the data from the test set. But your preprocessing pipeline already saw it.

5. Train/Test Split ✂️
Wrong:
fit the scaler on all data → split the data → evaluate
Right:
split the data → fit the scaler only on the training set → apply it to both the training and test sets
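The "right" order above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn and NumPy are installed; the synthetic data and variable names are made up for the example and are not part of the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration (an assumption, not from the post).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = (X[:, 0] > 5.0).astype(int)

# 1) Split first, so the test rows stay unseen by every fitted step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2) Fit the scaler on the training rows only.
scaler = StandardScaler().fit(X_train)

# 3) Apply the already-fitted scaler to both sets.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The scaler's statistics come from the training set alone.
assert np.allclose(scaler.mean_, X_train.mean(axis=0))
```

The leaky variant would call `StandardScaler().fit(X)` before the split, which lets the test rows shape the mean and standard deviation used during training.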
The same idea applies to imputers, encoders, feature selection, PCA, and any preprocessing step that is fitted on the data.

6. Cross-Validation 🔄
Each fold is a mini-experiment with its own training and test set. Therefore, preprocessing should be fitted within each fold. If you prepared the entire dataset once and then ran cross-validation, each fold would already have had access to its held-out data.

7. Pipelines 🛠️
A pipeline isn't just a way to make the code cleaner. It's also a defense against data leakage. Combine preprocessing, feature selection, and the model into a single pipeline, and then pass this pipeline to cross-validation or hyperparameter search (for example, grid search).

8. AI Engineering Version 🤖
Data leaks also occur in RAG systems and when evaluating LLMs. Leakage occurs when you tune chunks, prompts, re-rankers, thresholds, or examples on the same evaluation dataset that you later present as "held-out". As a result, your benchmark turns into training data.

9. Leakage Checklist ✅
Before trusting a metric, ask yourself:
- Does any feature rely on information that won't exist at the time of prediction?
- Was any transformation step fitted on the test data?
- Was any preprocessing done outside the cross-validation folds instead of inside the pipeline?
- Did we tune parameters on the final evaluation dataset?
If the answer to any of these is "yes", then the metric likely doesn't reflect the actual quality of the model.

#MachineLearning #DataScience #MLOps #DataLeakage #ArtificialIntelligence #TechTips

✨ Join Best TG Channels https://t.me/addlist/0f6vfFbEMdAwODBk
⭐️ Join Our WhatsApp Channel https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
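For the cross-validation and pipeline points in the data leakage post above, here is a small sketch of what "pass the whole pipeline to cross-validation or grid search" can look like. It assumes scikit-learn is installed; the dataset, step names, and parameter grid are illustrative choices, not something the post specifies.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data, used only for illustration (an assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Preprocessing, feature selection, and the model live in one estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Each fold refits the scaler and selector on that fold's training part only.
scores = cross_val_score(pipe, X, y, cv=5)

# Hyperparameter search over the same pipeline keeps tuning leak-free too.
search = GridSearchCV(
    pipe,
    param_grid={"select__k": [3, 5, 10], "model__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)
print(scores.mean(), search.best_params_)
```

Note that `search.best_score_` is itself a tuned number; a final estimate of quality would still need a separate held-out set that the search never touched.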

Stop discovering ML Python libraries one random tutorial at a time 🛑

Best-of Machine Learning with Python is a curated GitHub index of open-source machine learning Python libraries for builders who need a faster way to compare the ecosystem 📚. It helps you shortlist tools by grouping projects into categories and ranking them with a project-quality score based on metrics collected from GitHub and package managers 📊.

Key features:
• 920-project index – a large scan-friendly map of open-source ML Python projects 🗺️
• 34 categories – browse by area like ML frameworks, NLP, image data, AutoML, deployment, interpretability, and more 🧩
• Quality-score ranking – projects are ordered using an automated score from repo and package-manager signals ⚙️
• Rich project metadata – entries show signals like stars, forks, issues, contributors, activity, downloads, and dependencies 📈
• Weekly updates + contributions – the list is updated regularly and can be improved via issues, PRs, or projects.yaml edits 🔄

It’s open-source (CC BY-SA 4.0 license) 📜.

https://github.com/lukasmasuch/best-of-ml-python 🔗

#MachineLearning #Python #ML #OpenSource #DataScience #TechStack

🚀 HelloEncyclo Presale is LIVE! Master the skills that matter — Gen-AI, Data Science, Machine Learning and more — all in one place. 🎁 First 250 members get a flat 40% OFF Use code: PRESALE-BOOK-WAVE-2GFG ✅ 13 full courses live right now ✅ 40+ more dropping in the next 2–3 weeks ✅ Complete library within 2 months — built and refined by industry experts ✅ 15-day money-back guarantee — don't love it? Get a full refund. ⚠️ Coupon works only after you log in with Gmail, and it's valid once per member. 👉 Log in now and start learning: https://helloencyclo.com Don't wait — the 40% deal disappears after the first 250 seats. 🔥

Transformer implementations for vision, audio, and AI agents 🤖👁️🎵 Repo: https://github.com/Nicolepcx/transformers-the-definitive-guide #AI #MachineLearning #Vision #Audio #Agents #Tech

Repost from Machine Learning
🔖 A huge open-source course on AI Engineering from scratch

In the repository, we've collected:
- 435 lessons;
- 320+ hours of content;
- Python, TypeScript, and Rust;
- AI agents, MCP servers, prompts, and AI skills.

Moreover, almost every lesson includes practical tasks, so this isn't just theory, but a full-fledged roadmap for AI Engineering. 🚀

⛓️ Link to the repository: https://github.com/rohitg00/ai-engineering-from-scratch

#AI #MachineLearning #Python #Rust #OpenSource #Tech

Found an easy way to learn math for ML: Mathematics for Machine Learning 🎓📚 This is a curated collection on GitHub, including books, research papers, video lectures, and basic materials on math for studying and reviewing the mathematical foundations of machine learning. 📖📊 It helps build a stronger knowledge base by bringing together trusted resources around topics that machine learning engineers constantly encounter: linear algebra, mathematical analysis, probability theory, statistics, information theory, matrix calculus, and deep learning mathematics. 🧮🤖 Free public repository on GitHub. 💻✨ https://github.com/dair-ai/Mathematics-for-ML #MachineLearning #Mathematics #DataScience #Learning #GitHub #AI

Repost from Data Analytics
Pandas vs Polars vs DuckDB: Which Library Should You Choose? 🤔📊 pandas remains the default choice for notebooks, exploratory analysis, visualization, and machine learning workflows 📝📈. Polars focuses on fast, memory-efficient DataFrame processing ⚡💾, while DuckDB brings a SQL-first approach for querying local files and embedded analytics 🗄️🔍. Each tool fits a different kind of local data workflow 🛠️. In this article, we compare pandas, Polars, and DuckDB across performance, architecture, interoperability, and real-world use cases 🏆🔗. More: https://www.analyticsvidhya.com/blog/2026/05/pandas-vs-polars-vs-duckdb/ 🔗 #DataScience #Pandas #Polars #DuckDB #Python #Analytics

Repost from Machine Learning
🔥 Awesome open-source project to learn more about Transformer Models! 🤖✨ We found this interactive website that shows you visually how transformer models work. 🌐📊 Transformer Explainer: https://poloclub.github.io/transformer-explainer/ #TransformerModels #OpenSource #AI #MachineLearning #DataScience #Tech
