Data Scientology

Hot data science related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf

1 151

Subscribers

+1 in 24 hours

+5 in 7 days

+7 in 30 days

208

Post views

~42 in 24 hours

~57 in 48 hours

18.07%

Engagement rate

No data

Posts per day

Ads index

beta

Posts archive

Beginner here: My pothole detection model mistakes the roadside for potholes. https://redd.it/1v90113 @datascientology

30+ officially free AI/ML books, all in one curated repo https://redd.it/1v7cvqr @datascientology

Are there some textbooks that take a primarily engineering approach to machine learning (as opposed to a "scientific" approach)? [D] As someone who studied stats in undergrad and industrial engineering/operations research in grad school, and who thinks about the practical business of ML components in software, I get lost and a bit hopeless when I think about how to make useful software out of ML models in a reasonable amount of time in the current business environment. And when I look at the businesses where I have worked, which have mountains of middle management running tiny bits of the ML model lifecycle (think feature extraction, data ingestion and integration, training infra, hosting infra, more hosting infra, applied science), that only makes my head hurt even more. How do you go about making practical software out of ML components? Edit: I should mention that I mean from-scratch ML components, not just a call to a third-party hosted tool. https://redd.it/1v16l6a @datascientology

SenseNova-Vision is open-sourced: handle every CV task as unified multimodal generation https://redd.it/1uyorje @datascientology

I reviewed that boy. https://redd.it/1uwsmbo @datascientology

Prompt-engineering paper accepted to ICML [R] "Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity" This paper was accepted to ICML this year. Its main idea is a very simple prompt-engineering trick: "changing the prompt this way led to more diverse sampling". Naturally, it is difficult to provide a rigorous theoretical analysis for something like this. Even if it works, I’m not sure this kind of prompt engineering belongs at a top-tier machine learning conference. Some people seem to call this kind of work “modern machine learning”, but I think it belongs in less technical venues. What do you think? Am I being too rigid? https://redd.it/1uv1xb3 @datascientology

Hyperparameter tuning approach question [R] I am doing some work with cell type classification, where I have 4.3 million cells and 512 features (condensed embeddings from the encoder of a transformer). The broader goal is to implement a contextual bandit for augmenting the training set, as it is currently imbalanced, and rare cell type classification was poor when I tried a baseline logistic regression classifier.
Dataset: feature matrix shape (4290471, 512); labels shape (4290471,).
Class distribution:
T cell 1966941
DC 858451
NK cell 561904
Monocyte 411170
B cell 375882
Platelet 54576
Progenitor cell 24689
ILC 24254
Erythrocyte 12604
I didn't do any hyperparameter tuning for the LR classifier, but I want to try other ML models (LightGBM, XGBoost, SVM). However, I face a bottleneck with hyperparameter tuning. I want to do an 80/10/10 train/validate/test split, but the training set is so large that training takes a long time even on an H100. What are some solutions to this? I tried Optuna, but each hyperparameter trial still takes very long. I then tried Optuna again, but instead of using the full 80% for training each time, only 15% of the 80% is used (subsampled from the training set). I'm not sure if this is robust or not. I also couldn't really find anything in the literature. Anyone been in a similar situation? https://redd.it/1usa46w @datascientology
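
One way to make the subsampling idea in the post above less fragile is to subsample per class, so that rare cell types are not dropped from the tuning subset by chance. The following is a minimal sketch under stated assumptions, not the poster's method: the function name `stratified_subsample`, the `min_per_class` floor, and the scaled-down toy counts are all hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, min_per_class=1, seed=0):
    """Return sorted indices of a class-stratified random subsample of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Keep `fraction` of each class, but never fewer than `min_per_class`
        # (capped at the class size) so rare classes stay represented.
        k = max(round(len(indices) * fraction), min_per_class)
        k = min(k, len(indices))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy labels that roughly mimic the imbalance in the post at 1/1000 scale.
# These counts are illustrative placeholders, not the real dataset.
toy_counts = {"T cell": 1967, "DC": 858, "Platelet": 55, "Erythrocyte": 13}
labels = [name for name, n in toy_counts.items() for _ in range(n)]

subset = stratified_subsample(labels, fraction=0.15, min_per_class=10)

subset_counts = defaultdict(int)
for i in subset:
    subset_counts[labels[i]] += 1
print(dict(subset_counts))
```

Under this setup, each tuning trial would train on `subset` only, and the best configuration would then be refit once on the full training split and checked on the untouched validation split. Whether rankings found on 15% of the data carry over to the full set still has to be verified empirically for each model family.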

TorchJD: Training with multiple losses in PyTorch [P] Hi everyone! I wanted to share some recent progress on TorchJD that might be useful to the machine learning community. When training models with multiple losses (multiple tasks, constraints, auxiliary losses, regularization terms, etc.), you typically have two options. Scalarization: combine those losses into a single loss in one of various ways (e.g. average them or combine them with trainable weights); then you can do gradient descent on it. Jacobian descent: compute the Jacobian of the vector of losses (i.e. one gradient per loss), and aggregate it into an update vector that will decrease each individual loss (rather than just the average loss). There are many ways to do this aggregation step. Scalarization methods are generally cheaper in memory, but in some cases there is so much disagreement between your objectives that it's better to use a Jacobian descent method. In any case, thanks to our amazing new contributors, we've now finally implemented most existing methods from the literature in both categories in our library TorchJD, so that you can try any of them by changing just a few lines! Recently, TorchJD has been accepted into the PyTorch ecosystem, and we're trying to make it the go-to library for training with multiple losses. If you'd like to help build the future of the project, come join us on Discord (link can be found in the readme of the repo). New ideas, contributions, bug reports, experiments, and any form of feedback are all welcome. We have many ideas on how to make all this even more efficient, and we will need help for that. If you want to support us, a star on GitHub also helps a lot! https://redd.it/1upzxk2 @datascientology
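
To make the contrast in the post above concrete, here is a small pure-Python sketch of the two options on a hand-written 2x2 Jacobian. It deliberately does not use TorchJD's API. The aggregation shown is a simple PCGrad-style projection chosen for illustration, and the function names `scalarize_mean` and `project_conflicts` are hypothetical. With two losses, the projected update has a non-negative inner product with both gradients, while the mean does not have to.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scalarize_mean(jacobian):
    """Scalarization: the gradient of the average loss is the mean of the rows."""
    n = len(jacobian)
    return [sum(col) / n for col in zip(*jacobian)]


def project_conflicts(jacobian):
    """PCGrad-style aggregation (illustrative stand-in, not TorchJD's API).

    Each gradient has its component along any conflicting gradient removed,
    then the projected gradients are summed.
    """
    projected = []
    for i, g in enumerate(jacobian):
        g_new = list(g)
        for j, other in enumerate(jacobian):
            if i != j and dot(g_new, other) < 0:
                scale = dot(g_new, other) / dot(other, other)
                g_new = [a - scale * b for a, b in zip(g_new, other)]
        projected.append(g_new)
    return [sum(col) for col in zip(*projected)]


# One row per loss: two gradients that disagree strongly (made-up numbers).
jacobian = [[1.0, 0.0], [-2.0, 1.0]]

mean_update = scalarize_mean(jacobian)
projected_update = project_conflicts(jacobian)

# A descent step moves along the negative update, so loss i decreases to
# first order only if dot(update, gradient_i) is positive.
for name, update in [("mean", mean_update), ("projected", projected_update)]:
    print(name, update, [dot(update, g) for g in jacobian])
```

In this toy case the mean update has a negative inner product with the first gradient, so a step along it would increase the first loss to first order. That is the kind of disagreement the post describes as a reason to prefer a Jacobian descent method.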

I trained a local AI model that generated 22,000+ novel drug-like molecules — verified against 4.6M known compounds. Dataset available. Built an 80M parameter causal transformer on consumer hardware (RTX 5070), trained on MOSES + ZINC-250k. Generated and filtered for QED ≥ 0.5, SA ≤ 4.0, MW ≤ 500. Top compound hits QED 0.947. 100% novel against MOSES, ZINC, and ChEMBL. HuggingFace: https://huggingface.co/datasets/MKEChem/mke-novel-druglike-smiles Happy to answer questions about the generation method. https://redd.it/1uojccn @datascientology
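
The filtering and novelty check described in the post above can be sketched as follows. This is only an assumption about the shape of such a pipeline, not the author's code. Property values are taken as precomputed (in practice they would come from a cheminformatics toolkit), novelty is approximated by exact string match on already-canonicalized SMILES, and the `MOL_*` identifiers and their numbers are placeholders rather than real molecules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str  # assumed to be canonicalized already
    qed: float   # drug-likeness score, higher is better
    sa: float    # synthetic accessibility score, lower is easier
    mw: float    # molecular weight


def passes_filters(c, min_qed=0.5, max_sa=4.0, max_mw=500.0):
    """Thresholds quoted in the post: QED >= 0.5, SA <= 4.0, MW <= 500."""
    return c.qed >= min_qed and c.sa <= max_sa and c.mw <= max_mw


def novel_druglike(candidates, known_smiles):
    """Keep candidates that pass the filters, are not known, and are not duplicates."""
    seen = set()
    kept = []
    for c in candidates:
        if c.smiles in known_smiles or c.smiles in seen:
            continue
        if passes_filters(c):
            kept.append(c)
            seen.add(c.smiles)
    return kept


# Placeholder records with made-up property values, for illustration only.
known = {"MOL_K"}
candidates = [
    Candidate("MOL_A", qed=0.90, sa=2.5, mw=320.0),  # passes everything
    Candidate("MOL_K", qed=0.80, sa=2.0, mw=300.0),  # already in the known set
    Candidate("MOL_B", qed=0.40, sa=2.0, mw=300.0),  # QED too low
    Candidate("MOL_C", qed=0.70, sa=4.5, mw=300.0),  # SA too high
    Candidate("MOL_D", qed=0.70, sa=3.0, mw=612.0),  # too heavy
    Candidate("MOL_A", qed=0.90, sa=2.5, mw=320.0),  # duplicate of the first
]

kept = novel_druglike(candidates, known)
print([c.smiles for c in kept])
```

Exact string matching only counts as a novelty check if the generated molecules and the reference sets are canonicalized the same way, which is an assumption here.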

Books/Resources to improve mathematical foundations for ML research [D] I am a mid to late stage PhD student in ML. I've known this for a while, but only recently have I started feeling it urgently: my mathematical foundations are shaky, because I kept "learning-things-as-I-go" when working on various problems. I likely have only a year or two left until I graduate, and before I do so, I want to really dedicate some time and focus to brushing up on the fundamentals. Primarily, I want to improve my knowledge of Linear Algebra, Probability Theory, and Functional Analysis. For Lin. alg., I am looking at "Linear Algebra Done Right", and I think this book is sufficient for the topic, unless anyone thinks otherwise. I am not sure where to start for probability or for functional analysis. Rudin's books give me headaches. I instead started reading "A primer on RKHS" (https://arxiv.org/abs/1408.0952) to "dip my toe" into functional analysis. Apart from the above, I might re-read the PRML book (I've only read specific chapters before), and try to finish Pat Kidger's Just-Know-Stuff list (https://kidger.site/thoughts/just-know-stuff). Thoughts? Anyone have any book/resource recommendations? Someone told me to look into "the bright side of mathematics" on YouTube; has anyone gone through the videos there? I'm aware finding good, digestible resources is less than 10% of the challenge. The difficult part is sticking with it and actually reading/working through these topics, while still juggling other academic responsibilities. https://redd.it/1ulmy9g @datascientology

WIP: Currently building an app to teach (French) sign language using computer vision https://redd.it/1uk1lk7 @datascientology

A physical, working LeNet-1 (1989) built from transparent PCBs, glass and aluminium. https://redd.it/1uhr1g1 @datascientology

ShadeNet 28M — Dual-mode PBR material estimation from any RGB image https://redd.it/1ufmhd4 @datascientology

DeepSWE: new benchmark looking at how well today's frontier models can actually write code [R] DeepSWE delivers four advances over existing public benchmarks: Contamination free: Tasks are written from scratch, not adapted from existing commits or PRs, so no model has seen the solution during pretraining. High diversity: Tasks span a broad pool of 91 repositories across 5 languages. Real-world complexity: Prompts are ~half the length of SWE-bench Pro's, yet solutions require 5.5x more code and ~2x more output tokens. Reliable verification: Verifiers are hand-written to test software behavior rather than implementation details. The result is a benchmark that reflects how today's frontier coding agents actually perform in software engineering work. https://preview.redd.it/lacvagyr159h1.png?width=1373&format=png&auto=webp&s=6514340a15d51d7f03da733f08fb3f6a302cac75 It's open-source: https://github.com/datacurve-ai/deep-swe https://redd.it/1ue0hlp @datascientology

I've also been looking for the plane! https://redd.it/1ucd6rd @datascientology

C++ tracker for small aerial targets https://redd.it/1u9eder @datascientology

Next-Latent Prediction Transformers [R] Microsoft Research Preprint Next-token prediction is myopic. What if transformers learn to predict their own next latent state? Microsoft Research presents Next-Latent Prediction (NextLat): a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding! On top of next-token prediction, NextLat trains the transformer to predict its own next latent state given the current latent state and next token. NextLat has a few key benefits: 1. Representation Learning: NextLat encourages transformers to compress history into compact belief states. 2. Better Data Efficiency: predicting in latent space provides denser supervision than predicting one-hot tokens. 3. Faster Inference: via recursive multi-step lookahead. I'm super excited about this work. Please do check it out below: 💬 Blog: https://jaydenteoh.github.io/blog/2026/nextlat 💻 Code: https://github.com/JaydenTeoh 📝 Paper: https://arxiv.org/abs/2511.05963 https://redd.it/1u84mio @datascientology
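
The training signal described in the post above (next-token prediction plus predicting the next latent state from the current latent state and the next token) can be sketched on toy data as follows. This is only an illustration of the general shape of such an objective. The post does not state the loss, so the squared-error latent term, the `weight` parameter, and the names `nextlat_style_loss` and `toy_predictor` are assumptions and are not taken from the paper or its code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def nextlat_style_loss(latents, logits, tokens, predict_next_latent, weight=1.0):
    """Combine a next-token loss with an assumed squared-error next-latent loss.

    latents[t] is the hidden state after reading tokens[: t + 1];
    logits[t] is the model's prediction for tokens[t + 1].
    """
    steps = len(tokens) - 1
    token_loss = sum(
        cross_entropy(logits[t], tokens[t + 1]) for t in range(steps)
    ) / steps
    latent_loss = sum(
        mse(predict_next_latent(latents[t], tokens[t + 1]), latents[t + 1])
        for t in range(steps)
    ) / steps
    return token_loss + weight * latent_loss, token_loss, latent_loss


def toy_predictor(latent, next_token):
    """Hypothetical latent dynamics that happen to match the toy latents below."""
    return [latent[0] + next_token, latent[1] + (1 - next_token)]


# Hand-made toy sequence over a two-token vocabulary.
tokens = [0, 1, 0]
latents = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]  # uniform predictions

total, token_loss, latent_loss = nextlat_style_loss(
    latents, logits, tokens, toy_predictor
)
print(total, token_loss, latent_loss)
```

Because the toy predictor reproduces each next latent exactly, the latent term is zero here and the total equals the next-token loss. A predictor that drifts from the true latents would add a positive penalty scaled by `weight`.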

How does the ML community view evolutionary algorithm research? Career implications of an EA PhD? [D] How does the ML research community feel about evolutionary algorithms? Should I do a PhD in this area? Quick remark: I know some people in the ML community dunk on evolutionary algorithms because there’s often a better optimizer, but they do have their place, which is what researchers in my community aim to quantify. Background: I just finished my first year as a mathematics master’s student working on the theory of evolutionary algorithms (EAs)/randomized search heuristics. I’m fortunate to be on a research assistantship and have already coauthored several papers in strong conferences in our area. I’ve always been more interested in classical ML/deep learning theory but haven’t had anyone to work with. Researchers in my field, including my advisor, occasionally publish in mainstream ML venues such as AAAI and NeurIPS, but it’s primarily the EA venues. For a while now, I’ve been independently studying deep learning and statistical learning theory, and I have found intersections with my current research that I plan to pursue for my thesis. With my current CV, it’s looking like I could get into some of the best PhD programs in my area, but I’m wondering if I should try to go to a more ML-centric PhD, even if it means going to a less prestigious institution/group for the sake of my career. I’m not sure yet what I want to do after my PhD and a possible postdoc, but I want to keep myself competitive for top-tier opportunities. What implications might doing an EA PhD have for my career? With strong EA publications, could I get into a good ML PhD program if I pitch myself appropriately? Could staying somewhat outside mainstream ML actually be a good career move, given how competitive and crowded ML has become? https://redd.it/1u66q3l @datascientology

Which software or tools are used to make these kinds of diagrams or animations? https://redd.it/1u3bh7r @datascientology