Data Science & Machine Learning


The first channel on Telegram that offers exciting questions, answers, and tests in data science, artificial intelligence, machine learning, and programming languages. For promotions: @love_data

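📈 Analytics overview of the Telegram channel Data Science & Machine Learning

The channel Data Science & Machine Learning (@datascienceinterviews) is an active participant in the English-language segment. The community currently has 27 265 subscribers, ranking 7 190th in the Education category and 15 948th in the India region.

📊 Audience metrics and growth dynamics

Since its creation (date unknown), the project has grown rapidly, attracting 27 265 subscribers.

According to the latest data from 14 June 2026, the channel remains stable. The subscriber count changed by 142 over the past 30 days and by 10 over the past 24 hours, and overall reach remains substantial.

  • Verification status: not verified
  • Engagement rate (ER): the average audience engagement rate is 0.56%. Within 24 hours of publication, content typically receives reactions from 0.53% of all subscribers.
  • Post reach: each post averages 152 views, with 144 views typically accumulated on the first day.
  • Interaction and feedback: the audience participates actively, with an average of 1 reaction per post.
  • Topic focus: content centers on core topics such as insidead, mining, pinix, learning, neo.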

📝 Description and content strategy

The author positions the channel as a platform for expressing subjective opinions:
The first channel on Telegram that offers exciting questions, answers, and tests in data science, artificial intelligence, machine learning, and programming languages. For promotions: @love_data

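With frequent updates (latest data collected on 15 June 2026), the channel stays fresh and maintains high reach. The analysis shows active audience engagement, making it a key point of influence in the Education category.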

27 265 subscribers (+10 in 24 hours, +40 in 7 days, +142 in 30 days)

Post archive
Coffee Break NumPy Christian Mayer, 2018

What is feature selection? Why do we need it? Feature selection is a method for choosing the relevant features for the model to train on. We need it to remove irrelevant features, which cause the model to underperform.
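One simple way to make this concrete is a filter method that ranks features by their absolute Pearson correlation with the target. This is only a minimal sketch of one of many selection strategies, not a method named in the post; the function names `pearson` and `select_top_k` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: one informative column, one noisy, one constant.
features = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [42, 38, 42, 38, 42, 38],
    "constant": [1, 1, 1, 1, 1, 1],
}
target = [52, 58, 61, 70, 74, 81]
selected = select_top_k(features, target, k=1)
```

A correlation filter only detects linear relationships with the target, so in practice it is usually a first pass rather than the whole selection procedure.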

What are the main parameters of the decision tree model?
  • maximum tree depth
  • minimum samples per leaf node
  • impurity criterion

What are decision trees? A decision tree is a supervised learning algorithm that is mostly used for classification problems, although it works for both categorical and continuous dependent variables. The algorithm splits the population into two or more homogeneous sets, based on the most significant attributes (independent variables), to make the groups as distinct as possible. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a value for the target variable. Common splitting criteria include Gini, information gain, chi-square, and entropy.
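To illustrate how a single internal node might choose its test, the sketch below scores candidate thresholds on one numeric attribute by weighted Gini impurity. It is a simplified illustration, not a full tree builder; the names `gini` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best test 'value <= threshold'.

    Returns (None, impurity_of_all_labels) if no split improves on not splitting.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: does a customer of a given age buy the product?
ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
```

A full tree would apply `best_split` recursively to each side, stopping when a limit such as maximum depth or minimum samples per leaf is reached.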

Why is it required to split our data into three parts: train, validation, and test?
  • The training set is used to fit the model, i.e. to train the model on the data.
  • The validation set is then used to evaluate the model while fine-tuning hyperparameters. This improves the generalization of the model.
  • Finally, a test set that the model has never "seen" before should be used for the final evaluation. This allows for an unbiased evaluation of the model.
The evaluation should never be performed on the same data that is used for training. Otherwise the measured performance would not be representative.
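A three-way split can be sketched in a few lines. The 60/20/20 proportions, the fixed seed, and the function name `train_val_test_split` are assumptions chosen for illustration, not values from the post.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once, then cut them into train, validation, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 row ids.
train, val, test = train_val_test_split(range(100))
```

Shuffling before cutting is appropriate for independent rows; for ordered data such as time series, a chronological cut is the usual alternative.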

Can you explain how cross-validation works? Cross-validation is the process of separating your total training set into two subsets, a training set and a validation set, and evaluating your model on the validation set to choose the hyperparameters. You do this iteratively, selecting different training and validation sets each time, in order to reduce the bias you would have from selecting only one validation set.

What is K-fold cross-validation? K-fold cross-validation is a method of cross-validation where we select a hyperparameter k and divide the dataset into k parts. We take the 1st part as the validation set and the remaining k-1 parts as the training set. Then we take the 2nd part as the validation set and the remaining k-1 parts as the training set. Continuing like this, each part is used as the validation set once, with the remaining k-1 parts taken together as the training set. Standard k-fold should not be used on time series data, because the model would then be trained on observations that come after the ones it is validated on.
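The fold rotation described above can be sketched as an index generator. This is a minimal version that assumes contiguous, unshuffled folds; the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 10 samples, 3 folds: every sample is used for validation exactly once.
folds = list(k_fold_indices(10, 3))
```

The model is trained and scored once per fold, and the k validation scores are typically averaged to compare hyperparameter settings.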

What is the bias-variance trade-off?
  • Bias is the error introduced by approximating the true underlying function, which can be quite complex, by a simpler model. Variance is the model's sensitivity to changes in the training dataset.
  • The bias-variance trade-off is the relationship between the expected test error and the variance and the bias. Both contribute to the test error and ideally should be as small as possible: ExpectedTestError = Variance + Bias² + IrreducibleError
  • But as model complexity increases, the bias decreases and the variance increases, which leads to overfitting. Conversely, simplifying the model decreases the variance but increases the bias, which leads to underfitting.
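For squared-error loss, the decomposition above can be written out explicitly. The notation here is an assumption introduced for illustration: the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², the fitted model is f̂, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\sigma^2}_{\text{IrreducibleError}}
```

The terms appear in the same order as in ExpectedTestError = Variance + Bias² + IrreducibleError.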

What is sigmoid? What does it do? A sigmoid function is a type of activation function, and more specifically a squashing function. It limits the output to a range between 0 and 1, which makes it useful for predicting probabilities. Sigmoid(x) = 1/(1+e^{-x})
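The formula translates directly into code. The sketch below uses a two-branch form, an implementation choice not mentioned in the post, so that large negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^{x} is small, so this equivalent form is safe.
    z = math.exp(x)
    return z / (1.0 + z)


midpoint = sigmoid(0)  # the curve crosses 0.5 at x = 0
```

Both branches compute the same function; only the order of operations differs.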

What is overfitting? Overfitting occurs when your model performs very well on the training set but can't generalize to the test set, because it has adjusted too closely to the training data.

How do we evaluate classification models? Depending on the classification problem, we can use the following evaluation metrics:
  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Logistic loss (also known as Cross-entropy loss)
  • Jaccard similarity coefficient score
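Several of these metrics follow directly from the confusion-matrix counts. The sketch below covers the binary case only and omits logistic loss, which needs predicted probabilities rather than labels; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
    }


# Hypothetical labels: 3 true positives, 3 true negatives, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Accuracy can be misleading on imbalanced classes, which is why precision, recall, and F1 are usually reported alongside it.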

Data Science Interview questions.pdf (17.59 MB)

🎓 Amazing Opportunity to start your career in Data Analytics & Data Science 🚀
👩‍💻 Who: Graduates of 2025 or earlier (B.Tech/B.Sc/B.E/BCA/MCA/M.Tech)
📅 Date: 22nd June 2024
🕔 Time: 5PM - 7PM
💡 What: Compete in the Data Analytics Coding Contest. The top 3 performers get internship/job referrals from partner companies.
Apply Link: https://bit.ly/3z7pYMc
Don't miss out on this incredible opportunity! 🌟

[Compilation] 1000+ Data Science Interview Questions/Preparation Resources
Compilation created by Kaggle users
1. GIT interview questions for DS and SQL Interview questions
2. 50 ML questions
3. Four years on interview questions
4. Compilation of pandas interview questions
5. Difference between common ML algorithms
6. Scenario based Data questions
7. Top python interview questions
8. Internship questions for DS interns
9. Questions from DS- Netflix
10. India specific Data science interview questions
11. R interview questions
12. Explain a project in Data science
13. A great collection of cheatsheets, analyzed here
14. A collection of questions on Github here
15. Cheat Sheets for Machine Learning Interview Topics
16. Compiled list of 600+ Q&As for Data Science interview prep 🎉
17. Approaching almost any ML Problem, originally shared on Kaggle
18. A Basics refresher
19. A notebook
20. Companies and Data Science Interview questions Megathread
21. Data Scientist - Interview Question Bank
22. ML Interview questions
23. Machine Learning Interviews Book
https://www.kaggle.com/discussions/questions-and-answers/239533
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
Credits: https://t.me/datasciencefun
Like if you need similar content 😄👍
Hope this helps you 😊

Are you looking to become a machine learning engineer? The algorithm brought you to the right place! 📌 I created a free and comprehensive roadmap. Let's go through this thread and explore what you need to know to become an expert machine learning engineer:

Math & Statistics: Just like most other data roles, machine learning engineering starts with strong foundations in math, specifically linear algebra, probability, and statistics. Here are the units you will need to focus on:
  • Basic probability concepts and statistics
  • Inferential statistics
  • Regression analysis
  • Experimental design and A/B testing
  • Bayesian statistics
  • Calculus
  • Linear algebra

Python: You can choose Python, R, Julia, or any other language, but Python is the most versatile and flexible language for machine learning.
  • Variables, data types, and basic operations
  • Control flow statements (e.g., if-else, loops)
  • Functions and modules
  • Error handling and exceptions
  • Basic data structures (e.g., lists, dictionaries, tuples)
  • Object-oriented programming concepts
  • Basic work with APIs
  • Detailed data structures and algorithmic thinking

Machine Learning Prerequisites:
  • Exploratory Data Analysis (EDA) with NumPy and Pandas
  • Basic data visualization techniques to visualize the variables and features
  • Feature extraction
  • Feature engineering
  • Different types of encoding data

Machine Learning Fundamentals: Using the scikit-learn library in combination with other Python libraries for:
  • Supervised Learning: (Linear Regression, K-Nearest Neighbors, Decision Trees)
  • Unsupervised Learning: (K-Means Clustering, Principal Component Analysis, Hierarchical Clustering)
  • Reinforcement Learning: (Q-Learning, Deep Q Network, Policy Gradients)
Solving two types of problems:
  • Regression
  • Classification

Neural Networks: Neural networks are like computer brains that learn from examples, made up of layers of "neurons" that handle data. They learn without explicit instructions.
Types of Neural Networks:
  • Feedforward Neural Networks: Simplest form, with straight connections and no loops.
  • Convolutional Neural Networks (CNNs): Great for images, learning visual patterns.
  • Recurrent Neural Networks (RNNs): Good for sequences like text or time series, because they remember past information.
In Python, TensorFlow with Keras and PyTorch are good choices for deeper and more complex neural network systems.

Deep Learning: Deep learning is a subset of machine learning that uses multi-layer neural networks, which can learn from unstructured or unlabeled data.
  • Convolutional Neural Networks (CNNs)
  • Recurrent Neural Networks (RNNs)
  • Long Short-Term Memory Networks (LSTMs)
  • Generative Adversarial Networks (GANs)
  • Autoencoders
  • Deep Belief Networks (DBNs)
  • Transformer Models

Machine Learning Project Deployment: Machine learning engineers should also be able to dive into MLOps and project deployment. Here are the things that you should be familiar with:
  • Version Control for Data and Models
  • Automated Testing and Continuous Integration (CI)
  • Continuous Delivery and Deployment (CD)
  • Monitoring and Logging
  • Experiment Tracking and Management
  • Feature Stores
  • Data Pipeline and Workflow Orchestration
  • Infrastructure as Code (IaC)
  • Model Serving and APIs

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
Credits: https://t.me/datasciencefun
Like if you need similar content 😄👍
Hope this helps you 😊

🚀🎢 Welcome to the Crypto Rollercoaster! 🎢🚀 Get ready for the thrill of a lifetime with $TICKET tokens! 🌟
- High Returns: Potential gains up to 386,900% per ride!
- Low Trading Fee: Supporting the project, marketing, and the team.
🔥 Invest Now & Secure Your Ticket to Riches! 🔥
Buy $TICKET | Twitter | Telegram Channel
https://rollercoaster.finance

Ad 👇👇

Which of the following is not a machine learning type?
Anonymous voting

What is the difference between a random forest and a gradient boosting machine?
1. Random forest is an ensemble of decision trees while gradient boosting is a single decision tree
2. Random forest combines decision trees using boosting while gradient boosting combines decision trees using bagging
3. Random forest uses bagging while gradient boosting uses boosting
4. Random forest is used for regression while gradient boosting is used for classification
✅ Correct Response: 3
Explanation: Random forest is an ensemble of decision trees that combines the results of multiple decision trees using bagging. Gradient boosting is also an ensemble of decision trees, but it combines the results of multiple decision trees using boosting.
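The contrast between bagging and boosting can be sketched with one-split regression stumps as the base learner. This is a simplified illustration under stated assumptions, not how any particular library implements either method: a real random forest also subsamples features and grows deeper trees, and the names `fit_stump`, `bagging`, `boosting`, and `mse`, the learning rate, and the toy data are all hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump and return it as a predict function."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical, so fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_models=25, seed=0):
    """Bagging: fit independent stumps on bootstrap samples and average them."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_models=25, learning_rate=0.5):
    """Boosting: fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_models):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)


def mse(model, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy regression problem: y = x squared on 20 points.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
```

The structural difference is visible in the loops: the bagged stumps never see each other's output and could be trained in parallel, while each boosted stump depends on the residuals left by all the previous ones.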

_DATA SCIENCE INTERVIEW _ ♦️♣️.pdf (9.76 KB)