Data Science & Machine Learning


The first channel on Telegram that offers exciting questions, answers, and tests in data science, artificial intelligence, machine learning, and programming languages. For promotions: @love_data


Network: Data Analytics · India: 15 948 · Education: 7 190...

📈 Analysis of the Telegram channel Data Science & Machine Learning

The channel Data Science & Machine Learning (@datascienceinterviews) is active in the English-language segment. The community currently has 27 265 subscribers and ranks 7 190 in the Education category and 15 948 in the India region.

📊 Audience metrics and dynamics

Since its creation (date unknown), the channel has grown to 27 265 subscribers.

According to the latest data, as of 14 June 2026, the channel shows steady activity. The subscriber count changed by 142 over the past 30 days and by 10 over the past 24 hours.

Verification status: not verified
Engagement rate (ER): average audience engagement is 0.56%; in the first 24 hours after publication, a post typically collects reactions from 0.53% of all subscribers.
Post reach: each post receives 152 views on average, 144 of them typically on the first day.
Reactions and engagement: the average number of reactions per post is 1.
Topical interests: content focuses on key topics such as insidead, mining, pinix, learning, neo.

📝 Description and content policy

The author describes this space as a place for expressing personal views:
“The first channel on Telegram that offers exciting questions, answers, and tests in data science, artificial intelligence, machine learning, and programming languages. For promotions: @love_data”

Thanks to frequent updates (latest data as of 15 June 2026), the channel stays current. The analytics indicate that the audience engages with the content, which makes the channel a point of influence in the Education category.

Subscribers: 27 265 (+10 in 24 hours, +40 in 7 days, +142 in 30 days)
Post views: 152 (~144 in 24 hours; no data for 48 hours)
Engagement rate: 0.56%
Posts per day: ~2

Posts archive


scientist interview questions


Virgilio Data Science
This repository contains articles, GitHub repos and Kaggle kernels that provide data science and machine learning projects with code.
Creator: virgili0
Stars ⭐️: 13.9k
Forked by: 2.5k
https://github.com/virgili0/Virgilio


Complete Roadmap to Data Analytics 👇👇 https://youtu.be/1-T-VBjLpJo?si=jDmHiR85vdrDsbja


What ML concepts are commonly asked in data science interviews?
These are fair game in interviews at startups, consulting & large tech.
Fundamentals
- Supervised vs. Unsupervised Learning
- Overfitting and Underfitting
- Cross-validation
- Bias-Variance Tradeoff
- Accuracy vs Interpretability
- Accuracy vs Latency
ML Algorithms
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines
- K-Nearest Neighbors
- Naive Bayes
- Linear Regression
- Ridge and Lasso Regression
- K-Means Clustering
- Hierarchical Clustering
- PCA
Modeling Steps
- EDA
- Data Cleaning (e.g. missing value imputation)
- Data Preprocessing (e.g. scaling)
- Feature Engineering (e.g. aggregation)
- Feature Selection (e.g. variable importance)
- Model Training (e.g. gradient descent)
- Model Evaluation (e.g. AUC vs Accuracy)
- Model Productionization
Hyperparameter Tuning
- Grid Search
- Random Search
- Bayesian Optimization
ML Cases
- [Capital One] Detect credit card fraudsters
- [Amazon] Forecast monthly sales
- [Airbnb] Estimate lifetime value of a guest
I have curated the best interview resources to crack Data Science Interviews 👇👇 https://topmate.io/analyst/1024129
Like if you need similar content 😄👍


DATA SCIENCE INTERVIEW QUESTIONS [PART-4]
Q. Why does overfitting occur?
A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it hurts the model's performance on new data. In other words, noise or random fluctuations in the training data are picked up and learned as concepts by the model.
Q. What is ensemble learning?
A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. It is primarily used to improve the performance of a model (classification, prediction, function approximation, etc.), or to reduce the likelihood of an unfortunate selection of a poor one.
Q. What is F1 score?
A. The F1 score is defined as the harmonic mean of precision and recall. As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. In the F1 score, we compute the harmonic mean of precision and recall: F1 = 2 * precision * recall / (precision + recall).
Q. What is pickling and unpickling?
A. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
Q. What is lambda function?
A. A Python lambda function is an anonymous function, that is, a function without a name. The def keyword is used to define a normal function in Python; similarly, the lambda keyword is used to define an anonymous function.
Q. What is the trade-off between bias and variance?
A. Bias is the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. The trade-off is the tension between the error introduced by the bias and the error introduced by the variance.
ENJOY LEARNING 👍👍
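The F1, pickling and lambda answers above can be tried out in a few lines. This is only a minimal sketch using the Python standard library; the helper name f1_score, the sample precision/recall values and the record dictionary are made up for illustration.

```python
import pickle


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values: the harmonic mean is pulled toward the smaller number.
f1 = f1_score(0.5, 1.0)
arithmetic_mean = (0.5 + 1.0) / 2

# Pickling: object hierarchy -> byte stream; unpickling: byte stream -> object.
record = {"model": "demo", "weights": [0.1, 0.2]}  # hypothetical object
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# A lambda is an anonymous function; here it is bound to a name only for reuse.
square = lambda x: x * x

print(f1, arithmetic_mean, restored == record, square(4))
```

With a precision of 0.5 and a recall of 1.0 the harmonic mean comes out below the arithmetic mean, which is why F1 penalizes a model that is strong on only one of the two metrics.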


What happens when we have correlated features in our data?
In random forest, since each tree is built from a random sample of features, the information carried by a pair of correlated features is about twice as likely to be picked as the information carried by any single other feature. In general, adding correlated features means adding features that (linearly) contain the same information, which reduces the robustness of your model: each time you train, the model might pick one feature or the other to "do the same job", i.e. explain some variance, reduce entropy, etc.
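The "twice as likely" claim can be checked with a small counting argument. The sketch below assumes features are sampled uniformly without replacement (whether per tree or per split depends on the implementation); the function name and the numbers 10 and 3 are illustrative assumptions, not taken from the post.

```python
from math import comb


def inclusion_probability(n_features, n_sampled, n_copies=1):
    """P(at least one of n_copies specific features is in a random sample).

    The sample holds n_sampled features drawn uniformly without replacement
    from n_features candidates.
    """
    missed = comb(n_features - n_copies, n_sampled)
    return 1 - missed / comb(n_features, n_sampled)


# Hypothetical setup: 10 features, 3 sampled each time.
p_single = inclusion_probability(10, 3)             # one ordinary feature
p_pair = inclusion_probability(10, 3, n_copies=2)   # two correlated copies

print(p_single, p_pair, p_pair / p_single)
```

Under these assumptions the duplicated information is somewhat less than twice as likely to be available, because both copies can land in the same sample; the ratio approaches two when the sample is small relative to the number of features.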


What are the main assumptions of linear regression?
There are several assumptions of linear regression. If any of them is violated, model predictions and interpretation may be worthless or misleading.
1) Linear relationship between features and target variable.
2) Additivity: the effect of a change in one feature on the target variable does not depend on the values of the other features. For example, a model for predicting the revenue of a company has two features: the number of items a sold and the number of items b sold. When the company sells more items a, the revenue increases, and this is independent of the number of items b sold. But if customers who buy a stop buying b, the additivity assumption is violated.
3) Features are not correlated (no collinearity), since it can be difficult to separate out the individual effects of collinear features on the target variable.
4) Errors are independently and identically normally distributed (yi = B0 + B1*x1i + ... + errori):
i) No correlation between errors (consecutive errors in the case of time series data).
ii) Constant variance of errors (homoscedasticity). For example, in the case of time series, seasonal patterns can increase errors in seasons with higher activity.
iii) Errors are normally distributed; otherwise some features will have more influence on the target variable than others. If the error distribution is significantly non-normal, confidence intervals may be too wide or too narrow.
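Assumptions 4 i) and 4 ii) can be probed from the residuals of a fitted model. The following is a rough sketch on synthetic data, not a formal test: the data-generating line y = 1 + 2x, the noise level, and the two diagnostics (lag-1 autocorrelation and a variance ratio between the low-x and high-x halves) are choices made for illustration.

```python
import random

random.seed(0)

# Synthetic data that satisfies the assumptions: y = 1 + 2x + Gaussian noise.
xs = [i / 20 for i in range(200)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

# Ordinary least squares for a single feature, written out by hand.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# i) Independence: lag-1 autocorrelation of residuals should be near 0.
lag1_autocorrelation = sum(
    a * b for a, b in zip(residuals, residuals[1:])
) / sum(r * r for r in residuals)

# ii) Homoscedasticity: residual variance should be similar for low and high x.
variance_ratio = variance(residuals[n // 2:]) / variance(residuals[: n // 2])

print(slope, intercept, lag1_autocorrelation, variance_ratio)
```

A lag-1 autocorrelation far from zero, or a variance ratio far from one, would hint that the corresponding assumption deserves a closer look with a proper statistical test.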


Imagine a digital assistant that can: 🤖 Answer any questions 🎨 Create stunning images 🎵 Compose original music ✂️ Edit photos and images All within a single Telegram bot. It sounds like a dream, but it is now a reality. Discover how this AI-powered companion can revolutionize your daily routine, enhance your productivity, and unleash your creativity. 🚀 GST AI Bot 🚀 ChatGPT: Get instant, accurate answers with GPT-3.5 Turbo or GPT-4. Midjourney: Turn text into beautiful illustrations. Suno: Compose original music effortlessly. Stable Diffusion: Generate high-quality images from text prompts. Image Editor: Easily enhance and manipulate photos. This all-in-one AI assistant adapts to your needs, making your daily tasks more efficient and enjoyable. 🌟 Experience the future of personal assistance! 🌟 🚀 Discover the bot here: GST AI bot 🚀


Data Science Interview Questions and Answers.pdf (1.76 MB)


FREE FREE FREE 🔥🔥 💻 Join our Premium Data Science Community for Free ✅ Learn 👇 👉 Data Science 👉 SQL 👉 Machine Learning 👉 Python 👉 Artificial Intelligence Etc. Join Now - https://whatsapp.com/channel/0029Va8v3eo1NCrQfGMseL2D ⚠️ Limited slots available


Repost from Data Science & Machine Learning

1. For the given points, how will you calculate the Euclidean distance in Python?
plot1 = [1, 3]
plot2 = [2, 5]
The Euclidean distance can be calculated as follows:
euclidean_distance = sqrt((plot1[0] - plot2[0])**2 + (plot1[1] - plot2[1])**2)
2. Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
- K-means clustering
- Linear regression
- K-NN (k-nearest neighbor)
- Decision trees
The K nearest neighbor algorithm can be used because it can compute the nearest neighbor, and if a value is missing, it just computes the nearest neighbor based on all the other features. When you're dealing with K-means clustering or linear regression, you need to handle missing values in your pre-processing, otherwise they'll crash. Decision trees also have the same problem, although there is some variance.
3. How are confidence intervals and hypothesis tests similar? How are they different?
Confidence intervals and hypothesis testing are both tools used to make statistical inferences. A confidence interval suggests a range of values for an unknown parameter and is associated with a confidence level that the true parameter is within the suggested range. Confidence intervals are often very important in medical research to provide researchers with a stronger basis for their estimations. A confidence interval can be shown as “10 +/- 0.5” or [9.5, 10.5], to give an example. Hypothesis testing is the basis of any research question and often comes down to trying to prove that something did not happen by chance. For example, you could try to prove that when rolling a die, one number was more likely to come up than the rest.
4. What is the difference between observational and experimental data?
Observational data comes from observational studies, in which you observe certain variables and try to determine if there is any correlation. Experimental data comes from experimental studies, in which you control certain variables and hold them constant to determine if there is any causality. An example of experimental design is the following: split a group up into two. The control group lives their lives normally. The test group is told to drink a glass of wine every night for 30 days. Then research can be conducted to see how wine affects sleep.
ENJOY LEARNING 👍👍
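Questions 1 and 2 above can be illustrated together. This is a minimal sketch, assuming missing values are marked with None and that a continuous gap is filled with the mean of the k nearest rows; the function knn_impute and the sample rows are hypothetical and not from the post.

```python
import math

plot1 = [1, 3]
plot2 = [2, 5]


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = euclidean_distance(plot1, plot2)  # sqrt(1 + 4)


def knn_impute(target, rows, k):
    """Fill None entries of target with the mean of the k nearest rows.

    Distances use only the features present in both target and candidate row.
    """
    filled = list(target)
    for j, value in enumerate(target):
        if value is not None:
            continue
        candidates = []
        for row in rows:
            if row[j] is None:
                continue
            shared = [
                i for i in range(len(target))
                if target[i] is not None and row[i] is not None
            ]
            if not shared:
                continue
            d = euclidean_distance(
                [target[i] for i in shared], [row[i] for i in shared]
            )
            candidates.append((d, row[j]))
        candidates.sort(key=lambda pair: pair[0])
        nearest = [v for _, v in candidates[:k]]
        if nearest:
            filled[j] = sum(nearest) / len(nearest)
    return filled


rows = [[1.0, 2.0, 10.0], [1.1, 2.1, 12.0], [8.0, 9.0, 50.0]]
filled = knn_impute([1.05, 2.05, None], rows, k=2)
print(distance, filled)
```

For a categorical column the same idea would take the most common value among the neighbors instead of the mean.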


Top free Data Science resources @datasciencefun
1. CS109 Data Science: http://cs109.github.io/2015/pages/videos.html
2. Data Science Essentials: https://www.edx.org/course/data-science-essentials
3. Learning From Data from California Institute of Technology: http://work.caltech.edu/telecourse
4. Mathematics for Machine Learning by University of California, Berkeley: https://gwthomas.github.io/docs/math4ml.pdf?fbclid=IwAR2UsBgZW9MRgS3nEo8Zh_ukUFnwtFeQS8Ek3OjGxZtDa7UxTYgIs_9pzSI
5. Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravindran Kannan: https://www.cs.cornell.edu/jeh/book.pdf?fbclid=IwAR19tDrnNh8OxAU1S-tPklL1mqj-51J1EJUHmcHIu2y6yEv5ugrWmySI2WY
6. Python Data Science Handbook: https://jakevdp.github.io/PythonDataScienceHandbook/?fbclid=IwAR34IRk2_zZ0ht7-8w5rz13N6RP54PqjarQw1PTpbMqKnewcwRy0oJ-Q4aM
7. CS 221 - Artificial Intelligence: https://stanford.edu/~shervine/teaching/cs-221/
8. Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science: https://ocw.mit.edu/courses/mathematics/18-s096-topics-in-mathematics-of-data-science-fall-2015/lecture-notes/MIT18_S096F15_TenLec.pdf
9. Python for Data Analysis by Boston University: https://www.bu.edu/tech/files/2017/09/Python-for-Data-Analysis.pptx
10. Data Mining by University of Buffalo: https://cedar.buffalo.edu/~srihari/CSE626/index.html?fbclid=IwAR3XZ50uSZAb3u5BP1Qz68x13_xNEH8EdEBQC9tmGEp1BoxLNpZuBCtfMSE
Share the channel link with friends: http://t.me/datasciencefun


【EU_Exchange】 AI-GPT company recruits HR manager You only need a phone to work from home Age 25 and above Monthly salary $1,800 to $5,000 Main job: 1. Help the company recruit personnel 2. Promote the company's AI smart products 3. If successfully admitted, you may receive a reward of $30+20 4. Telegram channel: https://t.me/EUExchangevip 5. Online customer service: https://chatlink.wchatlink.com/widget/standalone.html?eid=f55ff528d770d699c2cc389645f3577a&language=en


Some useful PYTHON libraries for data science
NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with other low-level languages like Fortran, C and C++.
SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transform, linear algebra, optimization and sparse matrices.
Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in ipython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you omit the inline option, pylab converts the ipython environment to an environment very similar to Matlab. You can also use LaTeX commands to add math to your plot.
Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to Python and has been instrumental in boosting Python’s usage in the data scientist community.
Scikit Learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction.
Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.
Bokeh for creating interactive plots, dashboards and data applications on modern web browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.
Blaze for extending the capability of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.
Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It can start at a website's home URL and then dig through the web pages within the website to gather information.
SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.
Requests for accessing the web. It works similarly to the standard Python library urllib2 but is much easier to code. You will find subtle differences with urllib2, but for beginners Requests might be more convenient.
Additional libraries you might need:
- os for operating system and file operations
- networkx and igraph for graph-based data manipulations
- regular expressions for finding patterns in text data
- BeautifulSoup for scraping the web. It is inferior to Scrapy, as it will extract information from just a single webpage in a run.
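Two of the "additional libraries" in the list above ship with Python itself, so they are easy to try. The snippet below is a small sketch of regular expressions and os path handling; the sample text, the deliberately simple e-mail pattern and the file name are invented for illustration.

```python
import os
import re

# Regular expressions: find e-mail-like patterns in free text.
# The pattern is intentionally simple and not a full address validator.
text = "Contact: alice@example.com, bob@example.org; invalid@"
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# os: split a (hypothetical) file path into its root and extension.
root, extension = os.path.splitext("reports/sales_2024.csv")

print(emails, root, extension)
```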


https://t.me/datasciencej


Coding and Aptitude Round before interview
Coding challenges are meant to test your coding skills (especially if you are applying for an ML engineer role). They can contain algorithm and data structure problems of varying difficulty, and they are timed based on how complicated the questions are. These are intended to test your basic algorithmic thinking. Sometimes a complicated data science question, like making predictions based on Twitter data, is also given. These challenges are hosted on HackerRank, HackerEarth, CoderByte etc. In addition, you may even be asked multiple-choice questions on the fundamentals of data science and statistics. This is a filtering round in which candidates whose fundamentals are a little shaky are eliminated. These rounds are typically conducted without any manual intervention, so it is important to be well prepared.
Sometimes an aptitude test is conducted, either separately or along with the technical round, to assess your aptitude skills. A Data Scientist is expected to have a good aptitude, as this field is continuously evolving and a Data Scientist encounters new challenges every day. If you have appeared for GMAT / GRE or CAT, this should be easy for you.
Resources for Prep: For algorithms and data structures prep, LeetCode and HackerRank are good resources. For aptitude prep, you can refer to IndiaBix and Practice Aptitude. With respect to data science challenges, practice well on GLabs and Kaggle. Brilliant is an excellent resource for tricky math and statistics questions. For practising SQL, SQL Zoo and Mode Analytics are good resources that allow you to solve the exercises in the browser itself.
Things to Note: Ensure that you are calm and relaxed before you attempt the challenge. Read through all the questions before you start answering them. Let your mind go into problem-solving mode before your fingers do! If you finish the test before time, recheck your answers and then submit. Sometimes these rounds don’t go your way: you might have had a brain fade, it was not your day, etc. Don’t worry! Shake it off, for there is always a next time and this is not the end of the world.


Enjoy our content? Advertise on this channel and reach a highly engaged audience! 👉🏻 It's easy with Telega.io. As the leading platform for native ads and integrations on Telegram, it provides user-friendly and efficient tools for quick and automated ad launches. ⚡️ Place your ad here in three simple steps: 1 Sign up: https://telega.io/c/DataScienceInterviews 2 Top up the balance in a convenient way 3 Create your advertising post If your ad aligns with our content, we’ll gladly publish it. Start your promotion journey now!


10 commonly asked data science interview questions along with their answers
1️⃣ What is the difference between supervised and unsupervised learning?
Supervised learning involves learning from labeled data to predict outcomes, while unsupervised learning involves finding patterns in unlabeled data.
2️⃣ Explain the bias-variance tradeoff in machine learning.
The bias-variance tradeoff is a key concept in machine learning. Models with high bias have low complexity and over-simplify, while models with high variance are more complex and over-fit to the training data. The goal is to find the right balance between bias and variance.
3️⃣ What is the Central Limit Theorem and why is it important in statistics?
The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean will be approximately normal regardless of the underlying population distribution, as long as the sample size is sufficiently large. It is important because it justifies the use of normal-based procedures, such as hypothesis tests and confidence intervals, even when the population itself is not normally distributed, provided the sample is large enough.
4️⃣ Describe the process of feature selection and why it is important in machine learning.
Feature selection is the process of selecting the most relevant features (variables) from a dataset. This is important because unnecessary features can lead to over-fitting, slower training times, and reduced accuracy.
5️⃣ What is the difference between overfitting and underfitting in machine learning? How do you address them?
Overfitting occurs when a model is too complex and fits the training data too well, resulting in poor performance on unseen data. Underfitting occurs when a model is too simple and cannot fit the training data well enough, resulting in poor performance on both training and unseen data. Techniques to address overfitting include regularization and early stopping, while techniques to address underfitting include using more complex models or adding more informative features.
6️⃣ What is regularization and why is it used in machine learning?
Regularization is a technique used to prevent overfitting in machine learning. It involves adding a penalty term to the loss function to limit the complexity of the model, effectively reducing the impact of certain features.
7️⃣ How do you handle missing data in a dataset?
Missing data can be handled by deleting the samples with missing values, imputing the missing values, or using models that can handle missing data directly.
8️⃣ What is the difference between classification and regression in machine learning?
Classification is a type of supervised learning where the goal is to predict a categorical or discrete outcome, while regression is a type of supervised learning where the goal is to predict a continuous or numerical outcome.
9️⃣ Explain the concept of cross-validation and why it is used.
Cross-validation is a technique used to evaluate the performance of a machine learning model. It involves splitting the data into training and validation sets, and then training and evaluating the model on multiple such splits. Cross-validation gives a better idea of the model's generalization ability and helps detect over-fitting.
🔟 What evaluation metrics would you use to evaluate a binary classification model?
Some commonly used evaluation metrics for binary classification models are accuracy, precision, recall, F1 score, and ROC-AUC. The choice of metric depends on the specific requirements of the problem.
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
Credits: https://t.me/datasciencefun
Like if you need similar content 😄👍 Hope this helps you 😊
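The cross-validation answer (question 9) can be made concrete with a small k-fold loop. This is a sketch in plain Python under simplifying assumptions: the splits are contiguous and unshuffled, the "model" just predicts the mean of the training targets, and the function name k_fold_indices and the toy targets are invented for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Toy targets; the stand-in model predicts the mean of the training targets.
targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
scores = []
for train, validation in k_fold_indices(len(targets), 3):
    prediction = sum(targets[i] for i in train) / len(train)
    mse = sum((targets[i] - prediction) ** 2 for i in validation) / len(validation)
    scores.append(mse)

mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Each sample is used for validation exactly once, and the averaged score is what would be compared across candidate models. In practice the data is usually shuffled before splitting unless it is ordered in time.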


Repost from Data Science Projects

SQL Interview Question for #DataScience:
A company has provided sales data containing information about customer purchases, as shown in the table below. Your task is to:
- Calculate Total Revenue
- Calculate Total Sales by Product
- Find Top Customers by Revenue
Solve it using SQL
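The table from the original post is not reproduced here, so the sketch below invents one: a hypothetical sales table with customer, product, quantity and unit_price columns, loaded into an in-memory SQLite database through Python's sqlite3 module. It also assumes that "total sales by product" means revenue per product; with the real table, the column names and that interpretation may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; the post's actual table is not available.
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Alice", "Laptop", 1, 1000.0),
        ("Bob", "Mouse", 2, 25.0),
        ("Alice", "Mouse", 1, 25.0),
        ("Carol", "Laptop", 2, 1000.0),
    ],
)

# 1. Total revenue across all purchases.
total_revenue = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM sales"
).fetchone()[0]

# 2. Total sales (revenue) by product.
sales_by_product = conn.execute(
    "SELECT product, SUM(quantity * unit_price) AS revenue "
    "FROM sales GROUP BY product ORDER BY revenue DESC"
).fetchall()

# 3. Top customers by revenue (top 2 here).
top_customers = conn.execute(
    "SELECT customer, SUM(quantity * unit_price) AS revenue "
    "FROM sales GROUP BY customer ORDER BY revenue DESC LIMIT 2"
).fetchall()

print(total_revenue, sales_by_product, top_customers)
conn.close()
```

All three tasks reduce to SUM over quantity times price, with GROUP BY choosing the level of aggregation and ORDER BY plus LIMIT picking the top rows.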


Amazon Interview Process for Data Scientist position
📍 Round 1 - Phone Screen: This was a preliminary round to check my capability, projects, coding, Stats, ML, etc. After clearing this round the technical interview rounds started. There were 5-6 rounds (multiple rounds in one day).
📍 Round 2 - Data Science Breadth: In this round the interviewer tested my knowledge on different kinds of topics.
📍 Round 3 - Depth Round: In this round the interviewers grilled deeper into 1-2 topics. I was asked questions around: Standard ML tech, Linear Equation, Techniques, etc.
📍 Round 4 - Coding Round: This was a Python coding round, which I cleared successfully.
📍 Round 5 - Hiring Manager: My fitment for the team got assessed.
📍 Last Round - Bar Raiser: Very important round; I was asked heavily around Leadership principles & Employee dignity questions.
So, here are my tips if you’re targeting any Data Science role:
-> Never make up stuff & don’t lie in your resume.
-> Study your projects thoroughly.
-> Practice SQL, DSA and coding problems on LeetCode/HackerRank.
-> Download data from Kaggle & build EDA (data manipulation questions are asked).
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
ENJOY LEARNING 👍👍