Data Science & Machine Learning
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free. For collaborations: @love_data
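📈 Analytics overview of the Telegram channel Data Science & Machine Learning
The channel Data Science & Machine Learning (@datasciencefun) is an active participant in the English-language segment. The community currently has 75,818 subscribers, ranking 2,113th in the Education category and 4,286th in the India region.
📊 Audience metrics and growth dynamics
Since its creation (date unknown), the project has maintained rapid growth, attracting 75,818 subscribers.
According to the latest data from 18 June 2026, the channel is operating steadily. The subscriber count changed by 884 over the past 30 days and by 6 over the past 24 hours, and overall reach remains substantial.
- Verification status: Not verified
- Engagement rate (ER): The average audience engagement rate is 3.25%. Within 24 hours of publication, content typically draws a 1.38% response, measured against the total subscriber count.
- Post reach: Each post averages 2,462 views, with about 1,043 views typically accumulated on the first day.
- Interaction and feedback: The audience participates actively; a post averages 4 reactions.
- Topic focus: Content centers on core topics such as learning, accuracy, distribution, panda, dataset.
📝 Description and content strategy
The author positions the channel as a platform for expressing subjective opinions: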
“Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free
For collaborations: @love_data”
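With frequent updates (latest data collected on 19 June 2026), the channel stays fresh and maintains high reach. The analysis shows active audience engagement, making it a key point of influence in the Education category.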
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# Example data
data = {
    'Age': [29, 45, 50, 39, 48, 50, 55, 60, 62, 43],
    'Cholesterol': [220, 250, 230, 180, 240, 290, 310, 275, 300, 280],
    'Max_Heart_Rate': [180, 165, 170, 190, 155, 160, 150, 140, 130, 148],
    'Heart_Disease': [0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
}
df = pd.DataFrame(data)
# Independent variables (features) and dependent variable (target)
X = df[['Age', 'Cholesterol', 'Max_Heart_Rate']]
y = df['Heart_Disease']
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Creating and training the random forest model
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
# Feature importance
feature_importances = pd.DataFrame(model.feature_importances_, index=X.columns, columns=['Importance']).sort_values('Importance', ascending=False)
print(f"Feature Importances:\n{feature_importances}")
# Plotting the feature importances
sns.barplot(x=feature_importances.index, y=feature_importances['Importance'])
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.show()
## Explanation of the Code
1. Libraries: We import necessary libraries like numpy, pandas, sklearn, matplotlib, and seaborn.
2. Data Preparation: We create a DataFrame containing features (Age, Cholesterol, Max_Heart_Rate) and the target variable (Heart_Disease).
3. Feature and Target: We separate the features and the target variable.
4. Train-Test Split: We split the data into training and testing sets.
5. Model Training: We create a RandomForestClassifier model with 100 trees and train it using the training data.
6. Predictions: We use the trained model to predict heart disease for the test set.
7. Evaluation: We evaluate the model using accuracy, confusion matrix, and classification report.
8. Feature Importance: We compute and display the importance of each feature.
9. Visualization: We plot the feature importances to visualize which features contribute most to the model's predictions.
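To make step 5 a bit more concrete, here is a minimal sketch of two ideas behind a random forest: each tree is trained on a bootstrap sample (rows drawn with replacement), and the outputs of the trees are then combined. The helper names `bootstrap_sample` and `majority_vote` and the five tree votes are hypothetical, made up for illustration. Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by hard voting, so this shows the general idea, not the library's internals.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as bagging does for each tree."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the most common class label among the per-tree votes."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(10))  # row indices of a 10-row table like the one above

# Each tree would see a different resampled view of the training rows
sample = bootstrap_sample(rows, rng)
print(f"Bootstrap sample of row indices: {sample}")
print(f"Distinct rows in the sample: {len(set(sample))} of {len(rows)}")

# Hypothetical votes from 5 trees for one patient (1 = heart disease)
tree_votes = [1, 0, 1, 1, 0]
print(f"Combined prediction: {majority_vote(tree_votes)}")
```

Because rows are drawn with replacement, a sample usually repeats some rows and leaves others out, which is what makes the trees differ from each other.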
## Evaluation Metrics
- Accuracy: The proportion of correctly classified instances among the total instances.
- Confusion Matrix: Shows the counts of true positives, true negatives, false positives, and false negatives.
- Classification Report: Provides precision, recall, F1-score, and support for each class.
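As a rough illustration of how these metrics relate to each other, the sketch below computes them by hand in plain Python. The labels in `y_true` and `y_pred` are hypothetical and are not the output of the model above; the helper names `confusion_counts` and `safe_divide` are also made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_divide(numerator, denominator):
    """Return 0.0 instead of failing when a metric has an empty denominator."""
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = safe_divide(tp + tn, len(y_true))       # correct / total
precision = safe_divide(tp, tp + fp)               # how many predicted positives are right
recall = safe_divide(tp, tp + fn)                  # how many actual positives are found
f1 = safe_divide(2 * precision * recall, precision + recall)

print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
print(f"Accuracy={accuracy:.2f}, Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```

With a test set of only two rows, as in the example above, each of these numbers can only take a few values, so they are best read as a demonstration of the workflow rather than a reliable estimate.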
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
Credits: t.me/datasciencefun
ENJOY LEARNING 👍👍
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
# Example data
data = {
    'Age': [25, 45, 35, 50, 23, 37, 32, 28, 40, 27],
    'Income': ['High', 'High', 'High', 'Medium', 'Low', 'Low', 'Low', 'Medium', 'Low', 'Medium'],
    'Student': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
    'Buys_Computer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes']
}
df = pd.DataFrame(data)
# Convert categorical features to numeric
df['Income'] = df['Income'].map({'Low': 1, 'Medium': 2, 'High': 3})
df['Student'] = df['Student'].map({'No': 0, 'Yes': 1})
df['Buys_Computer'] = df['Buys_Computer'].map({'No': 0, 'Yes': 1})
# Independent variables (features) and dependent variable (target)
X = df[['Age', 'Income', 'Student']]
y = df['Buys_Computer']
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Creating and training the decision tree model
model = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
model.fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
# Plotting the decision tree
plt.figure(figsize=(12,8))
plot_tree(model, feature_names=['Age', 'Income', 'Student'], class_names=['No', 'Yes'], filled=True)
plt.title('Decision Tree')
plt.show()
## Explanation of the Code
1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
2. Data Preparation: We create a DataFrame containing features and the target variable. Categorical features are converted to numeric values.
3. Feature and Target: We separate the features (Age, Income, Student) and the target (Buys_Computer).
4. Train-Test Split: We split the data into training and testing sets.
5. Model Training: We create a DecisionTreeClassifier model, specifying the criterion (Gini impurity) and maximum depth of the tree, and train it using the training data.
6. Predictions: We use the trained model to predict whether a person buys a computer for the test set.
7. Evaluation: We evaluate the model using accuracy, confusion matrix, and classification report.
8. Visualization: We plot the decision tree to visualize the decision-making process.
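Step 5 mentions Gini impurity as the splitting criterion. A small sketch of how it can be computed, using the Age and Buys_Computer columns from the example table: the functions `gini_impurity` and `weighted_gini` and the split point of 30 are assumptions chosen for illustration, and the tutorial's model is trained on the training split only, so the splits it actually picks may differ.

```python
def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Age and Buys_Computer (No=0, Yes=1) from the example table
ages = [25, 45, 35, 50, 23, 37, 32, 28, 40, 27]
buys = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]

threshold = 30  # hypothetical split point, chosen only for illustration
left = [b for a, b in zip(ages, buys) if a <= threshold]
right = [b for a, b in zip(ages, buys) if a > threshold]

print(f"Gini before split: {gini_impurity(buys):.3f}")
print(f"Gini after Age <= {threshold}: {weighted_gini(left, right):.3f}")
```

A node with only one class has impurity 0, and a 50/50 node has impurity 0.5. The tree prefers the split that lowers the weighted impurity the most.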
## Evaluation Metrics
- Accuracy: The proportion of correctly classified instances among the total instances.
- Confusion Matrix: Shows the counts of true positives, true negatives, false positives, and false negatives.
- Classification Report: Provides precision, recall, F1-score, and support for each class.
Like if you need similar content 😄👍
Hope this helps you 😊
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
# Example data
data = {
    'Hours_Studied': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Passed': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
# Independent variable (feature) and dependent variable (target)
X = df[['Hours_Studied']]
y = df['Passed']
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Creating and training the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
y_pred_prob = model.predict_proba(X_test)[:, 1]
# Evaluating the model
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
print(f"ROC-AUC: {roc_auc}")
# Plotting the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
## Explanation of the Code
1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
2. Data Preparation: We create a DataFrame containing the hours studied and whether the student passed.
3. Feature and Target: We separate the feature (Hours_Studied) and the target (Passed).
4. Train-Test Split: We split the data into training and testing sets.
5. Model Training: We create a LogisticRegression model and train it using the training data.
6. Predictions: We use the trained model to predict the pass/fail outcome for the test set and also obtain the predicted probabilities.
7. Evaluation: We evaluate the model using the confusion matrix, classification report, and ROC-AUC score.
8. Visualization: We plot the ROC curve to visualize the model's performance.
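Step 6 distinguishes predicted classes from predicted probabilities. The sketch below shows where such a probability can come from: a linear score passed through the sigmoid function, then thresholded at 0.5. The `coef` and `intercept` values are hypothetical and are not the parameters the fitted model would report; `pass_probability` is a made-up helper name.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def pass_probability(hours, coef, intercept):
    """Probability of passing for a given number of study hours."""
    return sigmoid(coef * hours + intercept)


# Hypothetical parameters, NOT the values learned by the model above
coef, intercept = 1.0, -4.5

for hours in (1, 5, 10):
    p = pass_probability(hours, coef, intercept)
    label = 1 if p >= 0.5 else 0  # class prediction by thresholding at 0.5
    print(f"hours={hours}: probability={p:.3f}, predicted class={label}")
```

With these assumed parameters the probability crosses 0.5 at 4.5 hours, which sits between the last failing and the first passing student in the example data.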
## Evaluation Metrics
- Confusion Matrix: Shows the counts of true positives, true negatives, false positives, and false negatives.
- Classification Report: Provides precision, recall, F1-score, and support for each class.
- ROC-AUC: Measures the model's ability to distinguish between the classes. AUC (Area Under the Curve) closer to 1 indicates better performance.
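One way to read ROC-AUC: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way in plain Python. The function name `roc_auc_pairwise` and the labels and scores are hypothetical and are not taken from the model above.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        # AUC is undefined when only one class is present
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(f"ROC-AUC: {roc_auc_pairwise(y_true, scores):.2f}")
```

This also shows why a very small test set is fragile here: if the test rows all happen to belong to one class, there are no pairs to compare and the metric cannot be computed.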
Like if you need similar content 😄👍
Hope this helps you 😊
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Example data
data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}
df = pd.DataFrame(data)
# Independent variable (feature) and dependent variable (target)
X = df[['Size']]
y = df['Price']
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Creating and training the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
# Plotting the results
plt.scatter(X, y, color='blue') # Original data points
plt.plot(X_test, y_pred, color='red', linewidth=2) # Regression line
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Linear Regression: House Prices vs Size')
plt.show()
## Explanation of the Code
1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
2. Data Preparation: We create a DataFrame containing the size and price of houses.
3. Feature and Target: We separate the feature (Size) and the target (Price).
4. Train-Test Split: We split the data into training and testing sets.
5. Model Training: We create a LinearRegression model and train it using the training data.
6. Predictions: We use the trained model to predict house prices for the test set.
7. Evaluation: We evaluate the model using Mean Squared Error (MSE) and R-squared (R²) metrics.
8. Visualization: We plot the original data points and the regression line to visualize the model's performance.
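To see what the model in step 5 is estimating, here is a minimal sketch of the closed-form least-squares fit for a single feature, written in plain Python. The function name `fit_line` is made up for this example, and for simplicity it is applied to the full example table rather than to the training split used above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Size and Price from the example table
sizes = [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400]
prices = [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]

slope, intercept = fit_line(sizes, prices)
print(f"Price is roughly {slope:.1f} * Size + {intercept:.1f}")
print(f"Predicted price for 2500 sq ft: {slope * 2500 + intercept:.0f}")
```

The example prices lie exactly on a straight line (200 per square foot), so the fit is perfect here. Real housing data would scatter around the line.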
## Evaluation Metrics
- Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values. Lower values indicate better performance.
- R-squared (R²): Represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Values closer to 1 indicate a better fit.
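Both metrics can be computed by hand, which may help when reading the printed values. The sketch below uses hypothetical actual and predicted values, not the output of the model above; the function names are made up for this example.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # Assumption of this sketch: constant targets are treated as an error
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 7.5]

print(f"MSE: {mean_squared_error_manual(y_true, y_pred):.4f}")
print(f"R-squared: {r2_score_manual(y_true, y_pred):.4f}")
```

Since the house price example is perfectly linear, its MSE should come out at or very near 0 and its R² at or very near 1. Values like those in this sketch are more typical once the data contains noise.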
Share this channel with your real friends: https://t.me/datasciencefun
Like if you want me to continue this series 😄❤️
ENJOY LEARNING 👍👍
