Data Science & Machine Learning
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free. For collaborations: @love_data
Analytics for the Telegram channel Data Science & Machine Learning
Data Science & Machine Learning (@datasciencefun) is an active channel in the English-language segment. The community currently has 75,660 subscribers and ranks 2,114th in the Education category and 4,359th in the India region.
Audience metrics and dynamics
Since its launch (date unknown), the project has grown quickly to 75,660 subscribers.
According to the latest data from 11 June 2026, the channel shows steady activity. Over the last 30 days the subscriber count changed by 911, and over the last 24 hours by 29; overall reach remains high.
- Verification status: Not verified
- Engagement rate (ER): On average, 3.63% of the audience engages. In the first 24 hours after publication, content typically collects reactions equal to 1.36% of total subscribers.
- Post reach: Each post is viewed 2,747 times on average; about 1,032 views typically accumulate in the first day.
- Reactions and interaction: The audience is active: each post receives 5 reactions on average.
- Topics: Content centers on key themes such as learning, accuracy, distribution, panda, dataset.
Description and content policy
The author describes the resource as a space for expressing personal opinion:
"Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free
For collaborations: @love_data"
Thanks to a high update frequency (latest data collected on 12 June 2026), the channel stays current and keeps a wide reach. The analytics show that the audience actively engages with the content, making the channel an important point of influence in the Education category.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
Step 2. Load data
df = pd.read_csv("house_prices.csv")
df.head()
Step 3. Basic checks
df.shape
df.info()
df.isnull().sum()
Step 4. Data cleaning
Fill missing values.
df.fillna(df.median(numeric_only=True), inplace=True)
Step 5. Encode categorical variables
le = LabelEncoder()
for col in df.select_dtypes(include='object').columns:
    df[col] = le.fit_transform(df[col])
Step 6. Feature scaling
scaler = StandardScaler()
X = df.drop('price', axis=1)
y = df['price']
X_scaled = scaler.fit_transform(X)
Step 7. Train test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)
Step 8. Build model
Linear Regression.
model = LinearRegression()
model.fit(X_train, y_train)
Step 9. Predictions
y_pred = model.predict(X_test)
Step 10. Evaluation
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print("MAE:", mae)
print("RMSE:", rmse)
print("R2:", r2)
Typical results
• R2 between 0.70 and 0.85
• Location and area dominate price
Step 11. Feature importance
importance = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_
}).sort_values(by='Coefficient', ascending=False)
importance
Interpretation: Positive coefficient increases price. Negative reduces price.
Step 12. Model improvements
• Ridge regression for multicollinearity
• Lasso for feature selection
• Random Forest for non-linear patterns
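A minimal sketch of the Ridge and Lasso ideas above, assuming scikit-learn is installed. It uses synthetic data instead of house_prices.csv, and the alpha values are arbitrary illustrations, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house price data (an assumption for illustration):
# only the first two of five features actually drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# alpha values are illustrative, not tuned.
models = {"ridge": Ridge(alpha=1.0), "lasso": Lasso(alpha=0.2)}
rmse = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    rmse[name] = np.sqrt(mean_squared_error(y_test, predictions))
    print(name, "RMSE:", round(rmse[name], 3), "coefficients:", estimator.coef_.round(2))
```

In this setup Lasso is expected to set the three irrelevant coefficients to exactly zero, while Ridge only shrinks them. That difference is why Lasso is listed for feature selection.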
Resume bullet example
• Built house price prediction model using regression
• Achieved R2 score above 0.8
• Identified key price drivers
Interview explanation flow
• Why RMSE matters
• How multicollinearity affects coefficients
• Why tree models sometimes outperform linear models
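One way to illustrate the "why RMSE matters" point is to compare two sets of predictions with the same MAE. The numbers below are made up for the example, and the helper functions are plain NumPy, not part of the walkthrough.

```python
import numpy as np

y_true = np.array([100.0, 100.0, 100.0, 100.0])
pred_even = np.array([110.0, 110.0, 110.0, 110.0])   # four errors of 10
pred_spiky = np.array([100.0, 100.0, 100.0, 140.0])  # one error of 40


def mae(y, p):
    """Mean absolute error: every unit of error counts the same."""
    return np.mean(np.abs(y - p))


def rmse(y, p):
    """Root mean squared error: squaring makes large errors count more."""
    return np.sqrt(np.mean((y - p) ** 2))


print(mae(y_true, pred_even), rmse(y_true, pred_even))    # 10.0 10.0
print(mae(y_true, pred_spiky), rmse(y_true, pred_spiky))  # 10.0 20.0
```

Both prediction sets have the same MAE, but RMSE doubles when the error is concentrated in one large miss. This matters when a single badly mispriced house is costly.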
Mini task for you
• Try Ridge and Lasso
• Compare RMSE
• Plot actual vs predicted
Double Tap ♥️ For More
import pandas as pd
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
nltk.download('stopwords')
Step 2. Load data
df = pd.read_csv("sentiment.csv")
df.head()
Example review: "The movie was amazing" sentiment: 1
Step 3. Basic checks
df.shape
df['sentiment'].value_counts()
Step 4. Text cleaning
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))
def clean_text(text):
    text = text.lower()
    text = re.sub('[^a-z]', ' ', text)
    words = text.split()
    words = [stemmer.stem(w) for w in words if w not in stop_words]
    return ' '.join(words)
df['clean_review'] = df['review'].apply(clean_text)
Step 5. Train test split
X = df['clean_review']
y = df['sentiment']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
Step 6. Text vectorization TF IDF
tfidf = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
Why TF IDF
• Reduces common word weight
• Keeps meaningful words
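A small sketch of how TF-IDF reduces the weight of common words, assuming scikit-learn's default TfidfVectorizer settings. The three toy reviews are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (made up): "the" appears in every review, "amazing" in one.
docs = [
    "the movie was great",
    "the movie was boring",
    "the plot was amazing",
]
vectorizer = TfidfVectorizer()
vectorizer.fit(docs)

# Map each term to its inverse document frequency (higher means rarer).
idf = {term: vectorizer.idf_[i] for term, i in vectorizer.vocabulary_.items()}
for term in ("the", "movie", "amazing"):
    print(term, round(idf[term], 3))
```

Words that appear in every document get the lowest IDF, so they contribute less to the feature vector than rarer, more informative words.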
Step 7. Model building
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)
Step 8. Predictions
y_pred = model.predict(X_test_tfidf)
Step 9. Evaluation
accuracy_score(y_test, y_pred)
confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_pred))
Typical results
• Accuracy 85 to 90 percent
• Precision strong on positive reviews
• Neutral text harder to classify
Step 10. Test on custom text
sample = ["The product quality is terrible"]
sample_clean = [clean_text(sample[0])]
sample_vec = tfidf.transform(sample_clean)
model.predict(sample_vec)
Output: 0 (negative)
Common interview questions
• Why TF IDF over CountVectorizer
• How stopwords affect meaning
• Why Logistic Regression works well
Improvements
• Use n grams
• Try Naive Bayes
• Use LSTM or Transformers
Resume bullet example
• Built sentiment analysis model using TF IDF and Logistic Regression
• Achieved 88 percent accuracy on review data
• Automated text preprocessing pipeline
Mini task for you
• Add bigrams
• Compare Naive Bayes
• Plot ROC curve
Double Tap ♥️ For More
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
Step 2. Load data
df = pd.read_csv("ratings.csv")
df.head()
Example data
user_id | item_id | rating
1 | 101 | 5
1 | 102 | 3
Step 3. Create user item matrix
user_item_matrix = df.pivot_table(
    index='user_id',
    columns='item_id',
    values='rating'
)
Matrix shape
Rows: users
Columns: items
Values: ratings
Step 4. Handle missing values
user_item_matrix.fillna(0, inplace=True)
Why? Cosine similarity needs numbers.
Step 5. Compute user similarity
user_similarity = cosine_similarity(user_item_matrix)
user_similarity_df = pd.DataFrame(
    user_similarity,
    index=user_item_matrix.index,
    columns=user_item_matrix.index
)
Step 6. Find similar users
user_id = 1
similar_users = user_similarity_df[user_id].sort_values(ascending=False)
similar_users.head()
The top result is the user itself, with a score of 1. Ignore it.
Step 7. Recommend items
Get items rated by similar users
similar_users = similar_users[similar_users.index != user_id]
weighted_ratings = user_item_matrix.loc[similar_users.index].T.dot(similar_users)
recommendations = weighted_ratings.sort_values(ascending=False)
Remove already rated items.
already_rated = user_item_matrix.loc[user_id]
already_rated = already_rated[already_rated > 0].index
recommendations = recommendations.drop(already_rated)
recommendations.head(5)
Output: top 5 recommended item IDs.
Step 8. Why cosine similarity
• Focuses on rating pattern
• Ignores scale differences
• Fast and simple
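To make the "ignores scale differences" point concrete, here is a small NumPy sketch with three invented rating vectors. The cosine helper is written out by hand for illustration; the walkthrough itself uses sklearn's cosine_similarity.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


user_a = np.array([1.0, 2.0, 3.0])
user_b = np.array([2.0, 4.0, 6.0])  # same pattern as user_a, double the scale
user_c = np.array([3.0, 2.0, 1.0])  # reversed pattern

print(round(cosine(user_a, user_b), 3))  # 1.0
print(round(cosine(user_a, user_c), 3))  # 0.714
```

A user who rates everything twice as high still counts as a perfect match, because only the direction of the rating vector matters, not its length.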
Limitations
• Cold start problem
• Sparse matrix
• No item features
Improvements
• Item based filtering
• Matrix factorization
• Hybrid models
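As a rough sketch of the item based variant, the same cosine similarity can be computed on the transposed matrix, so that items rather than users are compared. The tiny ratings table below is invented and stands in for ratings.csv.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Invented ratings, in the same user_id / item_id / rating layout as above.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3],
    "item_id": [101, 102, 101, 102, 101, 103],
    "rating": [5, 4, 5, 5, 1, 5],
})
matrix = ratings.pivot_table(
    index="user_id", columns="item_id", values="rating"
).fillna(0)

# Transpose so each row is an item described by the ratings it received.
item_similarity = pd.DataFrame(
    cosine_similarity(matrix.T),
    index=matrix.columns,
    columns=matrix.columns,
)

# Items most similar to item 101, excluding the item itself.
most_similar_to_101 = item_similarity[101].drop(101).sort_values(ascending=False)
print(most_similar_to_101)
```

Item 102 comes out closest to item 101 here because the same users rated both highly. Item to item similarities tend to change more slowly than user to user ones, which is one common reason for preferring this variant.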
Resume bullet example
• Built recommendation system using collaborative filtering
• Used cosine similarity on user item matrix
• Generated personalized item recommendations
Interview explanation flow
• Difference between content based and collaborative
• Why sparsity hurts
• Cold start solutions
Mini task for you
• Convert to item based filtering
• Add minimum similarity threshold
• Evaluate using precision at K
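For the last task, precision at K can be sketched in a few lines of plain Python. The function name and the example item IDs are hypothetical; "relevant" here means items the user actually liked in held-out data.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


recommended = [101, 102, 103, 104, 105]  # ranked output of the recommender
relevant = {102, 105, 999}               # held-out items the user liked

print(precision_at_k(recommended, relevant, 5))  # 0.4
print(precision_at_k(recommended, relevant, 3))
```

This version divides by k even when fewer than k items are recommended, which is one common convention; dividing by the number of items actually returned is another.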
Double Tap ♥️ For More
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error
Step 2. Load Data
df = pd.read_csv("sales.csv")
df.head()
Step 3. Date Handling
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Sort by date
df = df.sort_index()
Step 4. Visualize Sales Trend
plt.plot(df.index, df['Sales'])
plt.title("Sales over time")
plt.show()
What you observe:
- Trend
- Seasonality
- Sudden spikes
Step 5. Decompose Time Series
decomposition = seasonal_decompose(df['Sales'], model='additive')
decomposition.plot()
plt.show()
Insight
- Trend shows long-term growth
- Seasonality repeats yearly or monthly
Step 6. Train Test Split
Split by time.
train = df.iloc[:-12]
test = df.iloc[-12:]
Why? Holding out the last 12 months simulates forecasting the future.
Step 7. Build ARIMA Model
model = ARIMA(train['Sales'], order=(1,1,1))
model_fit = model.fit()
Order meaning
- p: autoregressive
- d: differencing
- q: moving average
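To illustrate what the d term does, here is a tiny pandas sketch of first-order differencing (d=1) on an invented series with a steady upward trend.

```python
import pandas as pd

# Invented monthly sales with a constant upward trend of +2 per step.
sales = pd.Series([10, 12, 14, 16, 18])

# First-order differencing: each value minus the previous one.
differenced = sales.diff().dropna()
print(differenced.tolist())  # [2.0, 2.0, 2.0, 2.0]
```

After differencing, the trend is gone and the series is flat. ARIMA applies this transformation internally when d=1, so the AR and MA terms are fitted to the changes rather than the raw levels.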
Step 8. Forecast
forecast = model_fit.forecast(steps=12)
print(forecast)
Step 9. Plot Forecast vs Actual
plt.plot(train.index, train['Sales'], label='Train')
plt.plot(test.index, test['Sales'], label='Actual')
plt.plot(test.index, forecast, label='Forecast')
plt.legend()
plt.show()
Step 10. Evaluation
mae = mean_absolute_error(test['Sales'], forecast)
rmse = np.sqrt(mean_squared_error(test['Sales'], forecast))
print("MAE:", mae)
print("RMSE:", rmse)
Typical results:
- RMSE depends on scale
- Trend captured well
- Peaks harder to predict
Step 11. Business Interpretation
- Underforecast leads to stockouts
- Overforecast leads to inventory waste
- Accuracy matters near peaks
Model Improvement Ideas
- SARIMA for seasonality
- Prophet for business calendars
- Add promotions and holidays
Resume Bullet Example
- Built time series model to forecast monthly sales
- Used ARIMA with a time-based holdout split
- Reduced forecasting error using trend analysis
Interview Explanation Flow
- Why random split fails
- Importance of seasonality
- Error metrics selection
Mini Task for You
- Try SARIMA
- Forecast next 24 months
- Compare RMSE across models
Double Tap ♥️ For More
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
Step 2. Load data
df = pd.read_csv("creditcard.csv")
df.head()
Step 3. Basic checks
df.shape
df['Class'].value_counts()
Output example:
• Genuine 284315
• Fraud 492
Step 4. Data understanding
Check class imbalance:
sns.countplot(x='Class', data=df)
plt.show()
Insight: the dataset is highly imbalanced.
Step 5. Feature scaling
Scale Amount column:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df['Amount'] = scaler.fit_transform(df[['Amount']])
Drop Time.
df.drop('Time', axis=1, inplace=True)
Step 6. Split features and target
X = df.drop('Class', axis=1)
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
Step 7. Baseline model
Logistic Regression with class weight:
model = LogisticRegression(
    max_iter=1000, class_weight='balanced'
)
model.fit(X_train, y_train)
Why class_weight
• Penalizes fraud mistakes more
• Improves recall
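A short sketch of what class_weight='balanced' works out to, using the formula given in the scikit-learn documentation, n_samples / (n_classes * np.bincount(y)). The class counts are the ones from the output example in Step 3.

```python
import numpy as np

# Class counts from the Step 3 output example: genuine, fraud.
counts = np.array([284315, 492])
n_samples = counts.sum()
n_classes = len(counts)

# 'balanced' weights: inversely proportional to class frequency.
weights = n_samples / (n_classes * counts)
print(weights.round(3))
```

The fraud class ends up with a weight of roughly 289 against roughly 0.5 for genuine transactions, so each misclassified fraud case costs the model far more during training.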
Step 8. Predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:,1]
Step 9. Evaluation
Confusion matrix:
confusion_matrix(y_test, y_pred)
Classification report:
print(classification_report(y_test, y_pred))
ROC AUC:
roc_auc_score(y_test, y_prob)
Typical results
• Accuracy looks high but is misleading here, so ignore it
• Fraud recall improves sharply
• ROC AUC around 0.97
Step 10. Threshold tuning
Increase fraud recall:
y_pred_custom = (y_prob > 0.3).astype(int)
confusion_matrix(y_test, y_pred_custom)
Business logic: a lower threshold catches more fraud, at the cost of accepting more false alerts.
Step 11. Advanced approach
Random Forest:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(
    n_estimators=100, class_weight='balanced', random_state=42
)
rf.fit(X_train, y_train)
rf_prob = rf.predict_proba(X_test)[:,1]
roc_auc_score(y_test, rf_prob)
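The threshold trade-off from Step 10 can be shown on a handful of invented labels and probabilities. The helper name is hypothetical, and the numbers are not from creditcard.csv.

```python
import numpy as np

# Invented example: 6 genuine (0) and 4 fraud (1) transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.25, 0.45, 0.70, 0.90])


def recall_and_false_alerts(threshold):
    """Return (fraud recall, number of false alerts) at a given threshold."""
    y_pred = (y_prob > threshold).astype(int)
    true_pos = int(np.sum((y_pred == 1) & (y_true == 1)))
    false_pos = int(np.sum((y_pred == 1) & (y_true == 0)))
    recall = true_pos / int(np.sum(y_true == 1))
    return recall, false_pos


for threshold in (0.5, 0.3):
    print(threshold, recall_and_false_alerts(threshold))
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 0.75 while false alerts go from 1 to 3. Whether that trade is worth it depends on the cost of a missed fraud versus the cost of reviewing an alert.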
Resume bullet example
- Built fraud detection model on highly imbalanced data
- Improved fraud recall using class weighting and threshold tuning
- Evaluated model using ROC AUC instead of accuracy
Interview explanation flow
- Explain imbalance problem
- Why accuracy fails
- Why recall matters
- How threshold changes business impact
Mini task for you
- Apply SMOTE
- Compare with Isolation Forest
- Plot Precision Recall curve
Double Tap ♥️ For More
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
Step 2. Load data
df = pd.read_csv("customer_churn.csv")
df.head()
Step 3. Basic checks
df.shape
df.info()
df.isnull().sum()
Step 4. Data cleaning
Convert TotalCharges to numeric.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())
Drop customer ID.
df.drop('customerID', axis=1, inplace=True)
Step 5. Exploratory Data Analysis
Churn distribution.
sns.countplot(x='Churn', data=df)
plt.show()
Tenure vs churn.
sns.boxplot(x='Churn', y='tenure', data=df)
plt.show()
Common insights:
• Month-to-month contracts churn more
• Low tenure users churn early
• High monthly charges increase churn
Step 6. Encode categorical variables
le = LabelEncoder()
for col in df.select_dtypes(include='object').columns:
    df[col] = le.fit_transform(df[col])
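Because the loop above refits a single LabelEncoder for every column, the encoder only remembers the last column's mapping once the loop ends. One possible variation, shown on an invented two-column frame, keeps a separate encoder per column so that codes can be mapped back to labels later. The names toy and encoders are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented sample with two categorical columns.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "Churn": ["No", "Yes", "No"],
})

# One encoder per column, stored by column name.
encoders = {}
for col in ["Contract", "Churn"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

print(toy["Contract"].tolist())                              # [0, 1, 0]
print(encoders["Churn"].inverse_transform([0, 1]).tolist())  # ['No', 'Yes']
```

Keeping the encoders is useful when predictions or coefficients need to be reported in terms of the original category names.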
Step 7. Feature scaling
scaler = StandardScaler()
num_cols = ['tenure', 'MonthlyCharges', 'TotalCharges']
df[num_cols] = scaler.fit_transform(df[num_cols])
Step 8. Split data
X = df.drop('Churn', axis=1)
y = df['Churn']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
Step 9. Build model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
Step 10. Predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:,1]
Step 11. Evaluation
confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_pred))
roc_auc_score(y_test, y_prob)
Typical results:
• Accuracy around 78 to 83 percent
• ROC AUC around 0.84
• Recall for churn is key metric
Step 12. Business actions from model
• Target high-risk users
• Offer discounts to month-to-month users
• Push yearly contracts
• Improve onboarding for first 90 days
Resume bullet example:
• Built churn prediction model using Logistic Regression
• Identified contract type and tenure as top churn drivers
• Improved churn recall using class-aware split
Interview explanation flow:
• Revenue loss problem
• Why recall matters more than accuracy
• How features map to actions
Mini task for you:
• Train Random Forest
• Compare ROC AUC
• Tune threshold for higher recall
Double Tap ♥️ For Part-3
