Data science research papers
Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images
Publication date: 20 June 2024
Topic: Semantic Segmentation
Paper: https://arxiv.org/pdf/2406.14086v1
GitHub: https://github.com/zhuqinfeng1999/seg-lstm
Description:
Our study represents the first attempt to evaluate the effectiveness of Vision-LSTM in the semantic segmentation of remotely sensed images. This evaluation is based on a specifically designed encoder-decoder architecture named Seg-LSTM, and comparisons with state-of-the-art segmentation networks. Our study found that Vision-LSTM's performance in semantic segmentation was limited and generally inferior to Vision-Transformers-based and Vision-Mamba-based models in most comparative tests.
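The encoder-decoder pipeline the abstract names can be sketched at the shape level: patchify an image into a token sequence, run a sequence mixer over the tokens, then decode tokens back to a per-pixel class map. The mixer below is a pluggable stand-in for Vision-LSTM blocks, and the random decoder head is purely illustrative — none of this is the paper's actual implementation.

```python
import numpy as np

def seg_lstm_forward(img, mixer, n_classes, patch=4):
    """Shape-level sketch of a Seg-LSTM-style pipeline: patchify, run a
    sequence mixer (stand-in for Vision-LSTM blocks), decode to pixels.
    `mixer` maps a (T, D) token sequence to (T, D), with D = patch*patch*C."""
    H, W, C = img.shape
    h, w = H // patch, W // patch
    # patchify: (h*w, patch*patch*C) token sequence
    tokens = img.reshape(h, patch, w, patch, C).transpose(0, 2, 1, 3, 4)
    tokens = tokens.reshape(h * w, patch * patch * C)
    tokens = mixer(tokens)
    # decoder head (random weights, illustration only): per-pixel class logits
    rng = np.random.default_rng(0)
    head = rng.normal(size=(tokens.shape[1], patch * patch * n_classes))
    logits = (tokens @ head).reshape(h, w, patch, patch, n_classes)
    logits = logits.transpose(0, 2, 1, 3, 4).reshape(H, W, n_classes)
    return logits.argmax(axis=-1)          # (H, W) segmentation map
```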
Фото недоступноПоказать в Telegram
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals
Publication date: 28 May 2024
Topic: Contrastive Learning
Paper: https://arxiv.org/pdf/2405.17766v1.pdf
GitHub: https://github.com/rthapa84/sleepfm-codebase
Description:
We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates.
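The leave-one-out idea can be sketched as follows: instead of contrasting every modality pair, each modality's clip embedding is contrasted against the mean of the remaining modalities' embeddings for the same clip. This is a minimal NumPy interpretation of the abstract, not SleepFM's actual loss implementation.

```python
import numpy as np

def leave_one_out_contrastive(embs, tau=0.1):
    """embs: (M, N, D) — M modalities, N clips, L2-normalized D-dim rows.
    For each modality, contrast each clip against the mean of the *other*
    modalities' embeddings (leave-one-out) rather than all pairwise pairs."""
    M, N, D = embs.shape
    loss = 0.0
    for i in range(M):
        # leave-one-out target: mean of the other modalities, renormalized
        rest = embs[np.arange(M) != i].mean(axis=0)
        rest /= np.linalg.norm(rest, axis=1, keepdims=True)
        logits = embs[i] @ rest.T / tau             # (N, N) similarities
        # cross-entropy with the matching clip on the diagonal
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        loss += -np.mean(np.diag(logp))
    return loss / M
```

Perfectly aligned modalities drive the loss toward zero; unrelated embeddings push it toward log N.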
Фото недоступноПоказать в Telegram
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis
Publication date: 5 June 2024
Topic: Representation Learning
Paper: https://arxiv.org/pdf/2406.03430v1.pdf
GitHub: https://github.com/xmindflow/awesome_mamba
Description:
Capitalizing on the advances in computer vision, medical imaging has heralded a new epoch with Mamba models. Intending to help researchers navigate the surge, this survey seeks to offer an encyclopedic review of Mamba models in medical imaging. Specifically, we start with a comprehensive theoretical review forming the basis of SSMs, including Mamba architecture and its alternatives for sequence modeling paradigms in this context. Next, we offer a structured classification of Mamba models in the medical field and introduce a diverse categorization scheme based on their application, imaging modalities, and targeted organs.
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Publication date: 4 June 2024
Topic: Object Detection
Paper: https://arxiv.org/pdf/2406.02548v1.pdf
GitHub: https://github.com/aminebdj/openyolo3d
Description:
We address this task by generating class-agnostic 3D masks for objects in the scene and associating them with text prompts. We observe that the projection of class-agnostic 3D point cloud instances already holds instance information; thus, using SAM might only result in redundancy that unnecessarily increases the inference time. We empirically find that a better performance of matching text prompts to 3D masks can be achieved in a faster fashion with a 2D object detector. We validate our Open-YOLO 3D on two benchmarks, ScanNet200 and Replica, under two scenarios: (i) with ground truth masks, where labels are required for given object proposals, and (ii) with class-agnostic 3D proposals generated from a 3D proposal network.
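The label-assignment step the abstract describes — projecting class-agnostic 3D instance points into the image and letting 2D detector boxes vote — can be sketched with a pinhole projection. This is a hypothetical simplification assuming camera-frame points and a single view; the paper aggregates over multiple frames.

```python
import numpy as np

def label_3d_masks(points, instance_ids, boxes, box_labels, K):
    """Assign a text-prompt label to each class-agnostic 3D instance by
    projecting its points with intrinsics K and voting over 2D detector boxes.
    points: (P, 3) camera-frame xyz; instance_ids: (P,) mask id per point;
    boxes: (B, 4) xyxy from a 2D open-vocab detector; box_labels: (B,) ids."""
    uv = (K @ points.T).T                    # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    labels = {}
    for inst in np.unique(instance_ids):
        pts = uv[instance_ids == inst]
        votes = np.zeros(len(boxes), dtype=int)
        for b, (x0, y0, x1, y1) in enumerate(boxes):
            inside = (pts[:, 0] >= x0) & (pts[:, 0] <= x1) & \
                     (pts[:, 1] >= y0) & (pts[:, 1] <= y1)
            votes[b] = inside.sum()
        labels[int(inst)] = int(box_labels[votes.argmax()]) if votes.any() else -1
    return labels
```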
Parameter-Inverted Image Pyramid Networks
Publication date: 6 June 2024
Topic: Image Classification
Paper: https://arxiv.org/pdf/2406.04330v1.pdf
GitHub: https://github.com/opengvlab/piip
Description:
We propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks.
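The cross-resolution feature interaction can be sketched minimally: each branch receives the other branches' feature maps resized to its own grid and averaged in. Nearest-neighbor resizing and plain averaging here are stand-ins for the paper's learned interaction units.

```python
import numpy as np

def interact(feats):
    """feats: list of (H_i, W_i, C) feature maps from branches of different
    resolution (larger input -> lighter branch in PIIP). Each branch gets the
    other scales resized to its own grid and averaged in — a minimal stand-in
    for the paper's cross-branch interaction mechanism."""
    def resize(f, h, w):  # nearest-neighbor resize to (h, w)
        ys = np.arange(h) * f.shape[0] // h
        xs = np.arange(w) * f.shape[1] // w
        return f[ys][:, xs]
    out = []
    for i, f in enumerate(feats):
        others = [resize(g, *f.shape[:2]) for j, g in enumerate(feats) if j != i]
        out.append(f + sum(others) / len(others))
    return out
```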
Matching Anything by Segmenting Anything
Publication date: 6 June 2024
Topic: Semantic Segmentation
Paper: https://arxiv.org/pdf/2406.04221v1.pdf
GitHub: https://github.com/siyuanliii/masa
Description:
We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects. Those combinations present strong zero-shot tracking ability in complex domains.
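The instance-association step — matching dense region proposals between two views by embedding similarity — can be sketched as mutual nearest-neighbor matching over L2-normalized region embeddings. This is an illustrative reduction, not MASA's trained adapter.

```python
import numpy as np

def match_regions(emb_a, emb_b):
    """Mutual nearest-neighbor matching of region embeddings from two views
    or frames (L2-normalized rows). Returns (i, j) index pairs that pick each
    other — the correspondences MASA-style training would treat as the same
    instance."""
    sim = emb_a @ emb_b.T
    ab = sim.argmax(axis=1)          # best match in b for each region of a
    ba = sim.argmax(axis=0)          # best match in a for each region of b
    return [(i, int(ab[i])) for i in range(len(emb_a)) if ba[ab[i]] == i]
```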
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Publication date: 3 April 2024
Topic: Image Generation
Paper: https://arxiv.org/pdf/2404.02733v2.pdf
GitHub: https://github.com/instantstyle/instantstyle
Description:
In this paper, we commence by examining several compelling yet frequently overlooked observations. We then proceed to introduce InstantStyle, a framework designed to address these issues through the implementation of two key strategies: 1) A straightforward mechanism that decouples style and content from reference images within the feature space, predicated on the assumption that features within the same space can be either added to or subtracted from one another. 2) The injection of reference image features exclusively into style-specific blocks, thereby preventing style leaks and eschewing the need for cumbersome weight tuning, which often characterizes more parameter-heavy designs.
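Strategy 1 can be taken literally as feature arithmetic: under the abstract's assumption that features in a shared embedding space (e.g. CLIP) compose additively, subtracting the content prompt's embedding from the reference image's embedding leaves a vector dominated by style. A minimal sketch under that assumption:

```python
import numpy as np

def decouple_style(image_feat, content_text_feat):
    """Subtract the content-prompt embedding from the reference-image
    embedding in a shared feature space, keeping mostly style. Inputs are
    assumed L2-normalized; the output is renormalized."""
    style = image_feat - content_text_feat
    return style / np.linalg.norm(style)
```

On a toy mixed feature (content + style, renormalized), the subtracted vector is measurably closer to the style component than the raw image feature is.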
Language Guided Domain Generalized Medical Image Segmentation
Publication date: 1 April 2024
Topic: Contrastive Learning
Paper: https://arxiv.org/pdf/2404.01272v2.pdf
GitHub: https://github.com/shahinakk/lg_sdg
Description:
In this paper, we propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features to learn a more robust feature representation. We assess the effectiveness of our text-guided contrastive feature alignment technique in various scenarios, including cross-modality, cross-sequence, and cross-site settings for different segmentation tasks. Our approach achieves favorable performance against existing methods in the literature.
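A text-guided contrastive alignment term of the kind described can be sketched as an InfoNCE-style loss that pulls each visual feature toward the text-encoder embedding of its class. This is a hedged interpretation of the mechanism, not the paper's exact objective.

```python
import numpy as np

def text_alignment_loss(img_feats, txt_feats, labels, tau=0.07):
    """img_feats: (N, D) L2-normalized visual features; txt_feats: (C, D)
    L2-normalized text-encoder class embeddings; labels: (N,) class ids.
    Pulls each visual feature toward its class's text embedding and pushes
    it away from the other classes' embeddings."""
    logits = img_feats @ txt_feats.T / tau
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(labels)), labels])
```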
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Publication date: 26 March 2024
Topic: Object Detection
Paper: https://arxiv.org/pdf/2403.17695v1.pdf
GitHub: https://github.com/chenhongyiyang/plainmamba
Description:
In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating which enables the model to discern the spatial relations of tokens by encoding directional information. Our architecture is designed to be easy to use and easy to scale, formed by stacking identical PlainMamba blocks, resulting in a model with constant width throughout all layers.
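One plausible instantiation of the continuous 2D scanning property is a snake-order traversal, in which rows alternate direction so consecutive tokens in the scan are always spatially adjacent (the paper's actual scan paths may differ):

```python
def continuous_2d_scan(h, w):
    """Snake-order token indices for an h x w grid: rows alternate direction,
    so every pair of consecutive tokens in the scan sequence is spatially
    adjacent — the continuity property the paper's 2D scan is built around."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend(r * w + c for c in cols)
    return order
```

For a 2x3 grid this yields [0, 1, 2, 5, 4, 3]; a raster scan would instead jump from index 2 back to 3, breaking adjacency at the row boundary.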
Targeted Visualization of the Backbone of Encoder LLMs
Publication date: 26 Mar 2024
Topic: Image Classification
Paper: https://arxiv.org/pdf/2403.18872v1.pdf
GitHub: https://github.com/LucaHermes/DeepView
Description:
We investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain. While in previous work, DeepView has been used to inspect deep image classification models, we demonstrate how to apply it to BERT-based NLP classifiers and investigate its usability in this domain, including settings with adversarially perturbed input samples and pre-trained, fine-tuned, and multi-task models.
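DeepView's idea — showing part of a classifier's decision function together with the data in two dimensions — can be sketched crudely: project the data with PCA, then color a 2D grid by classifying the high-dimensional point nearest to each grid cell. DeepView itself uses a learned projection and inverse projection; the nearest-sample lookup below is only a stand-in.

```python
import numpy as np

def decision_map(X, predict, grid=20):
    """Minimal DeepView-style sketch: project X (N, D) to 2D via PCA, then
    label each cell of a 2D grid by running `predict` on the original
    high-dimensional sample nearest to that cell — a crude stand-in for
    DeepView's learned inverse projection."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:2].T                       # (N, 2) PCA projection
    xs = np.linspace(P[:, 0].min(), P[:, 0].max(), grid)
    ys = np.linspace(P[:, 1].min(), P[:, 1].max(), grid)
    img = np.empty((grid, grid), dtype=int)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            nearest = np.argmin(((P - [x, y]) ** 2).sum(axis=1))
            img[i, j] = predict(X[nearest])
    return img
```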