Data Engineers
Free Data Engineering Ebooks & Courses
Ko'proq ko'rsatish📈 Telegram kanali Data Engineers analitikasi
Data Engineers (@sql_engineer) Ingliz til segmentidagi kanali faol ishtirokchi. Hozirda hamjamiyat 10 371 obunachidan iborat bo'lib, Taʼlim toifasida 19 370-o'rinni va Hindiston mintaqasida 40 181-o'rinni egallagan.
📊 Auditoriya ko‘rsatkichlari va dinamika
невідомо sanasidan buyon loyiha tez o‘sib, 10 371 obunachiga ega bo‘ldi.
08 Iyun, 2026 dagi oxirgi ma’lumotlarga ko‘ra kanal barqaror faollikka ega. Oxirgi 30 kunda obunachilar soni 245 ga, so‘nggi 24 soatda esa 13 ga o‘zgardi va umumiy qamrov yuqori darajada qolmoqda.
- Tasdiqlash holati: Tasdiqlanmagan
- Jalb etish (ER): Auditoriya o‘rtacha 10.67% darajada jalb etiladi. Nashrdan keyingi dastlabki 24 soatda kontent odatda umumiy obunachilar sonining 2.43% ini tashkil etuvchi reaksiyalarni to‘playdi.
- Post qamrovi: Har bir post o‘rtacha 1 106 marta ko‘riladi; birinchi sutkada odatda 252 ta ko‘rish yig‘iladi.
- Reaksiyalar va o‘zaro ta’sir: Auditoriya faol: har bir postga o‘rtacha 5 ta reaksiya keladi.
- Tematik yo‘nalishlar: Kontent sql, learning, analytic, engineer, link:- kabi asosiy mavzularga jamlangan.
📝 Tavsif va kontent siyosati
Muallif resursni shaxsiy fikrni ifoda etish maydoni sifatida ta’riflaydi:
“Free Data Engineering Ebooks & Courses”
Yuqori yangilanish chastotasi (oxirgi ma’lumot 09 Iyun, 2026 da olingan) sababli kanal doimo dolzarb va katta qamrovli bo‘lib qoladi. Analitika auditoriya kontent bilan faol hamkorlik qilishini, uni Taʼlim toifasidagi muhim ta’sir nuqtasiga aylantirishini ko‘rsatadi.
# Load the DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
# Handle missing values
df_filled = df.fillna(0)
# Aggregate data
from pyspark.sql.functions import sum, col
df_aggregated = df_filled.groupBy("category", "region").agg(sum(col("sales")).alias("total_sales"))
# Sort the results
df_aggregated_sorted = df_aggregated.orderBy("total_sales", ascending=False)
# Save the aggregated DataFrame
df_aggregated_sorted.write.csv("path/to/aggregated/data.csv", header=True)
Scenario 2: Data Transformation
Interviewer: "How would you transform a DataFrame by converting a column to timestamp, handling invalid dates and extracting specific date components?"
Candidate:
# Load the DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
# Convert column to timestamp
from pyspark.sql.functions import to_timestamp, col
df_transformed = df.withColumn("date_column", to_timestamp(col("date_column"), "yyyy-MM-dd"))
# Handle invalid dates
df_transformed_filtered = df_transformed.filter(col("date_column").isNotNull())
# Extract date components
from pyspark.sql.functions import year, month, dayofmonth
df_transformed_extracted = df_transformed_filtered.withColumn("year", year(col("date_column"))).withColumn("month", month(col("date_column"))).withColumn("day", dayofmonth(col("date_column")))
# Save the transformed DataFrame
df_transformed_extracted.write.csv("path/to/transformed/data.csv", header=True)
Scenario 3: Data Partitioning
Interviewer: "How would you partition a large DataFrame by date and save it to parquet format, handling data skewness and optimizing storage?"
Candidate:
# Load the DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
# Partition by date
df_partitioned = df.repartitionByRange("date_column")
# Save to parquet format
df_partitioned.write.parquet("path/to/partitioned/data.parquet", partitionBy=["date_column"])
# Optimize storage
df_partitioned.write.option("compression", "snappy").parquet("path/to/partitioned/data.parquet", partitionBy=["date_column"])
Here, you can find Data Engineering Resources 👇
https://topmate.io/analyst/910180
All the best 👍👍from pyspark.sql.functions import when, isnan
# Load the DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
# Check for missing values
missing_count = df.select([count(when(isnan(c), c)).alias(c) for c in df.columns])
# Replace missing values with mean
from pyspark.sql.functions import mean
mean_values = df.agg(*[mean(c).alias(c) for c in df.columns])
df_filled = df.fillna(mean_values)
# Save the cleaned DataFrame
df_filled.write.csv("path/to/cleaned/data.csv", header=True)
Interviewer: "That's correct! Can you explain why you used the fillna() method?"
Candidate: "Yes, fillna() replaces missing values with the specified value, in this case, the mean of each column."
*Scenario 2: Data Aggregation*
Interviewer: "How would you aggregate data by category and calculate the average sales amount?"
Candidate:
# Load the DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
# Aggregate data by category
from pyspark.sql.functions import avg
df_aggregated = df.groupBy("category").agg(avg("sales").alias("avg_sales"))
# Sort the results
df_aggregated_sorted = df_aggregated.orderBy("avg_sales", ascending=False)
# Save the aggregated DataFrame
df_aggregated_sorted.write.csv("path/to/aggregated/data.csv", header=True)
Interviewer: "Great answer! Can you explain why you used the groupBy() method?"
Candidate: "Yes, groupBy() groups the data by the specified column, in this case, 'category', allowing us to perform aggregation operations."
Here, you can find Data Engineering Resources 👇
https://topmate.io/analyst/910180
All the best 👍👍
Endi mavjud! Telegram Tadqiqoti 2025 — yilning asosiy insaytlari 
