AI & ML Papers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers. Admin: @HusseinSheikho || @Hussein_Sheikho
📈 Telegram channel analysis: AI & ML Papers
The AI & ML Papers channel (@papernexus) is active in the English-language segment. The community currently has 32,887 subscribers and ranks 4,158 in the Technology & Apps category and 12,536 in the India region.
📊 Audience metrics and dynamics
Since its creation (date unknown), the project has grown quickly and attracted 32,887 subscribers.
According to the latest data, dated June 23, 2026, the channel shows steady activity. Membership changed by 407 over the past 30 days and by 24 over the past 24 hours, and the channel continues to maintain broad reach.
- Verification status: not verified
- Engagement rate (ER): average audience engagement is 1.18%; in the first 24 hours after publication, content typically draws reactions equal to 0.81% of total subscribers.
- Post reach: each post receives 389 views on average. Typically 266 views are collected on the first day.
- Reactions and engagement: the audience actively supports the channel; the average number of reactions per post is 1.
- Topical interests: content focuses on key topics such as summary, apr, huggingface, github, framework.
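The listing does not say how the two percentages are computed. One reading that happens to reproduce the quoted figures is average views divided by total subscribers; this is an assumption made here for illustration, not the analytics site's documented formula. A minimal sketch of that arithmetic:

```python
# Figures quoted in the listing above.
subscribers = 32_887
avg_views_per_post = 389
avg_views_first_day = 266


def percent_of_subscribers(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumed formula: views / subscribers (not confirmed by the source).
overall_rate = percent_of_subscribers(avg_views_per_post, subscribers)
first_day_rate = percent_of_subscribers(avg_views_first_day, subscribers)
print(overall_rate, first_day_rate)
```

Under this reading, both rates describe views rather than reactions, which would also be consistent with the separately reported average of 1 reaction per post.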
📝 Description and content policy
The author describes this space as a place for sharing personal views:
“Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
Admin: @HusseinSheikho || @Hussein_Sheikho”
Thanks to frequent updates (latest data dated June 24, 2026), the channel stays current and maintains high reach. The analysis indicates that the audience actively engages with the content, making the channel an important point of influence in the Technology & Apps category.
💡 The paper introduces AOHP, an open source operating system framework built on the Android Open Source Project, designed to support AI agents as first class entities. The motivation behind AOHP is to address the limitations of existing operating systems, which are application centric and do not provide native support for AI agents, resulting in execution overhead and safety risks. AOHP treats agents as first class OS actors, enabling adaptive user interfaces and agent friendly runtime environments. The framework introduces three agent oriented system mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. The authors evaluated AOHP through preliminary experiments on challenging tasks and found that it shows significant advantages in task completion rate, execution cost, and security policy compliance. Specifically, AOHP achieved a 21.12 percent increase in task completion rate and a 51.55 percent reduction in token cost. The paper contributes to the research community by providing an open testbed to explore the architectural primitives desired for agent mediated interaction, and demonstrates the potential of AOHP to enhance the efficiency, security, and adoption of AI agents in operating systems.
📅 Published on Jun 22
🔗 Links:
• GitHub: https://github.com/huggingface
• Project Page: https://huggingface.co/papers?q=Android%20Open%20Source%20Project
• arXiv: https://arxiv.org/abs/2606.23449
• PDF: https://arxiv.org/pdf/2606.23449
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#OperatingSystemSecurity #AIIntegrationInOS #AgentOrientedProgramming #OpenSourceAndroid #ArtificialIntelligenceInOS
💡 The paper introduces a unified framework called NanoGen for training and evaluating diffusion transformers, which are used in image generation tasks. The current evaluation setup for diffusion transformers is limited to class-conditional generation on ImageNet, which may not reflect real progress in generative modeling. The authors argue that text-to-image generation is a more comprehensive task, but it is often skipped due to perceived high costs and inconvenience. However, the authors show that with NanoGen, training and evaluating text-to-image models requires comparable compute to ImageNet. The NanoGen framework supports various diffusion methods and can be easily configured to train models on both ImageNet and text-to-image tasks. The authors trained 21 latent diffusion models using NanoGen and found that the ranking of methods on ImageNet and text-to-image tasks shows no strong correlation. This suggests that a method that improves performance on ImageNet may not necessarily improve performance on text-to-image generation. To address this issue, the authors propose a holistic benchmark called DiffusionBench, which summarizes results on both ImageNet and text-to-image tasks. The authors recommend reporting DiffusionBench in place of ImageNet alone, as methods that improve DiffusionBench are more likely to reflect broader progress in generative modeling. The main contribution of the paper is the introduction of NanoGen and DiffusionBench, which provide a more comprehensive evaluation setup for diffusion transformers and can help to advance research in generative modeling.
📅 Published on Jun 23
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.24888
• PDF: https://arxiv.org/pdf/2606.24888
• Project Page: https://end2end-diffusion.github.io/diffusion-bench/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#DiffusionTransformers #ImageGenerationTasks #TextToImageGeneration #GenerativeModeling #DiffusionBasedArchitectures
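The NanoGen summary above says method rankings on ImageNet and on text-to-image "show no strong correlation" but does not say which statistic was used. One common way to compare two rankings is Spearman's rank correlation; the sketch below assumes that choice, assumes lower scores are better, and uses made-up scores for five hypothetical methods purely to show the calculation. None of the numbers come from the paper.

```python
def ranks(scores):
    """Rank scores from best (1) to worst (n), lower score being better.

    Assumes there are no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length, tie-free score lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up, illustrative scores for five hypothetical methods (not paper data).
imagenet_scores = [2.1, 2.5, 3.0, 3.4, 4.0]
text_to_image_scores = [8.0, 7.0, 9.0, 7.5, 8.8]

rho = spearman(imagenet_scores, text_to_image_scores)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 would mean the two benchmarks order methods the same way; a value near 0 would mean that the ordering on one says little about the ordering on the other.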
💡 The paper introduces NatureBench, a benchmark of 90 scientific tasks derived from Nature publications, to assess the ability of AI coding agents to achieve discovery rather than just reproduction. The benchmark is built on NatureGym, an automated pipeline that constructs a standardized environment for each task, addressing the environment fragmentation problem that has limited the credibility of prior benchmarks. The authors evaluate ten frontier agent configurations under a strict protocol and find that the strongest model surpasses the state of the art on only 17.8 percent of tasks. The analysis reveals that agents succeed primarily through methodological translation, converting scientific tasks into familiar supervised prediction problems, rather than through genuine scientific innovation. The main reasons for failure are wrong method choice and insufficient compute budget, rather than task misunderstanding. The benchmark, pipeline, and a public leaderboard are released to facilitate further research. The paper contributes to the understanding of the limitations of current AI coding agents in achieving discovery and highlights the need for genuine scientific innovation in AI research.
📅 Published on Jun 23
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.24530
• PDF: https://arxiv.org/pdf/2606.24530
• Project Page: https://frontisai.github.io/NatureBench/
📊 Datasets citing this paper:
• https://huggingface.co/datasets/FrontisAI/NatureBench
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#ArtificialIntelligenceInScience #NatureBench #AICodingAgents #ScientificDiscoveryWithAI #BenchmarkingAIAgents
💡 The paper introduces MobileForge, a system for adapting mobile graphical user interface agents to real target apps without requiring manual annotations. The problem addressed is that current mobile GUI agents require costly and time-consuming human-written tasks, demonstrations, or reward labels to adapt to new apps. Existing annotation-free GUI learning methods lack a unified approach to connect target-app exploration, curriculum mining, rollout execution, and feedback, and policy optimization often relies on isolated rollouts and coarse rewards. MobileForge consists of two main components: MobileGym, which generates tasks and evaluates rollouts based on real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization, which uses trajectory outcomes, step-level process feedback, and corrective hints to update the policy. This approach allows for efficient adaptation of mobile GUI agents to new apps without requiring manual annotations. The results show that MobileForge can adapt a mobile GUI agent to achieve 67.2 percent Pass@3 on AndroidWorld, which is close to the performance of a specialized model trained on closed data. Further adaptation using MobileForge reaches 77.6 percent Pass@3 on AndroidWorld and 41.0 percent success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in the evaluation. 
Overall, MobileForge provides a unified and efficient approach to adapting mobile GUI agents to new apps without requiring manual annotations, making it a significant contribution to the field.
📅 Published on Jun 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.19930
• PDF: https://arxiv.org/pdf/2606.19930
• Project Page: https://mobile-forge.github.io
📊 Datasets citing this paper:
• https://huggingface.co/datasets/lgy0404/mobileforge-exploration-trajectories
• https://huggingface.co/datasets/lgy0404/mobileforge-training-data
• https://huggingface.co/datasets/lgy0404/mobileforge-benchmark-results
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#MobileGUIAgents #HierarchicalFeedbackGuidedPolicyOptimization #AnnotationFreeLearning #MobileGraphicalUserInterface #PolicyOptimizationForMobileApps
💡 The paper introduces MemGUI-Agent, a mobile GUI agent designed to address the limitations of existing agents on long-horizon tasks. Current agents struggle with retaining intermediate facts across many steps and app transitions, leading to unreliable performance. This limitation is attributed to the ReAct-style prompting approach, which passively accumulates per-step records, causing prompt explosion and dilution of critical cross-app facts. To address this issue, the authors propose MemGUI-Agent, which uses proactive context management through Context-as-Action, or ConAct. ConAct casts context management as first-class actions emitted by the same policy that selects UI actions. This approach maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. The authors also introduce MemGUI-3K, a dataset with 2,956 trajectories and full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K results in MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. The contributions of the paper are threefold. Firstly, it identifies the limitations of existing mobile GUI agents on long-horizon tasks and attributes them to the ReAct-style prompting approach. Secondly, it proposes MemGUI-Agent with proactive context management through ConAct, which addresses the limitations of existing agents. Finally, it introduces MemGUI-3K, a dataset for supervised training and offline analysis, and demonstrates the effectiveness of MemGUI-8B-SFT, an 8B MemGUI-Agent trained on this dataset. 
The code, data, and trained models will be released to facilitate further research and development.
📅 Published on Jun 18
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.19926
• PDF: https://arxiv.org/pdf/2606.19926
• Project Page: https://memgui-agent.github.io/
🤖 Models citing this paper:
• https://huggingface.co/lgy0404/MemGUI-8B-SFT
📊 Datasets citing this paper:
• https://huggingface.co/datasets/lgy0404/MemGUI-3K
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#MobileGUIAutomation #LongHorizonTaskLearning #ProactiveContextManagement #ContextAsAction #EndToEndGUIAgents
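The MemGUI-Agent summary above describes context management as actions emitted by the same policy that selects UI actions, over three context fields. The sketch below is one possible reading of that idea, not the authors' implementation: every class name, function name, and example string is hypothetical, and the real ConAct action format is not given in the summary.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Context:
    """The three context fields named in the summary (field names assumed)."""
    folded_action_history: str = ""
    folded_ui_state: str = ""
    recent_step: str = ""


@dataclass(frozen=True)
class UIAction:
    description: str  # e.g. a tap or a text entry


@dataclass(frozen=True)
class FoldHistory:
    summary: str  # policy-written summary that replaces the older history


@dataclass(frozen=True)
class FoldUIState:
    facts: str  # policy-selected UI facts worth carrying forward


# One action space for both UI control and context management (hypothetical).
Action = Union[UIAction, FoldHistory, FoldUIState]


def apply_action(ctx: Context, action: Action) -> Context:
    """Return the context after one action; each field is overwritten, not appended."""
    if isinstance(action, UIAction):
        return replace(ctx, recent_step=action.description)
    if isinstance(action, FoldHistory):
        return replace(ctx, folded_action_history=action.summary)
    if isinstance(action, FoldUIState):
        return replace(ctx, folded_ui_state=action.facts)
    raise TypeError(f"unknown action: {action!r}")


# Illustrative cross-app episode with invented content.
steps = [
    UIAction("open Maps and search 'cafe'"),
    FoldUIState("nearest cafe: 'Blue Door', closes 18:00"),
    UIAction("switch to Messages"),
    FoldHistory("looked up cafe in Maps; now composing message"),
]
ctx = Context()
for step in steps:
    ctx = apply_action(ctx, step)
print(ctx)
```

Because each field is replaced rather than extended, the context in this sketch stays at three short strings however many steps run, in contrast to the per-step accumulation the summary attributes to ReAct-style prompting.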
💡 The paper presents Lift4D, a test-time optimization framework for reconstructing dynamic non-rigid objects from monocular video. The problem addressed is the difficulty in reconstructing 4D representations of dynamic objects from single-view video due to the scarcity of 4D training data and the limitations of prior approaches that either directly predict 4D representations or initialize a 3D representation and refine it based on video evidence. The method involves adapting a single-view 3D reconstruction model to yield temporally consistent per-frame predictions, which provides a coherent initialization for a deformable 3D Gaussian Splatting representation. This representation is then optimized to match the input video through an occlusion-aware optimization that recovers visible surface details and completes unobserved regions using a view-conditioned diffusion prior. The results show that Lift4D improves over prior 4D reconstruction methods, particularly on challenging in-the-wild sequences with severe occlusions and non-rigid motion. The framework effectively handles complex scenarios by integrating visual cues from direct observations with data-driven priors over geometry and appearance, making it a significant contribution to the field of 4D reconstruction from monocular video.
📅 Published on Jun 22
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.23688
• PDF: https://arxiv.org/pdf/2606.23688
• Project Page: https://lift4d.github.io/
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#4DReconstruction #SingleView3DEstimation #MonocularVideoAnalysis #DynamicNonRigidObjectReconstruction #TestTimeOptimizationFrameworks
💡 The paper introduces OpenRath, a programming model for multi-agent systems that addresses the issue of fragmented runtime state. In current agent systems, various aspects such as transcripts, tool effects, and memory events are recorded separately, making it difficult to inspect or reproduce the system's behavior. OpenRath solves this problem by introducing a central runtime abstraction called Session, which is a first-class value that can be passed between agents and workflows. The Session abstraction is designed to be branchable, inspectable, replayable, backend-aware, and composable, allowing it to record various execution state information such as conversation chunks, sandbox placement, and tool evidence. This enables explicit fork, merge, and replay operations as runtime operations rather than reconstructing states from external traces. OpenRath also defines other key concepts such as Sandbox, Tool, Agent, Memory, Workflow, and Selector, which work together to provide a comprehensive programming model for multi-agent systems. The Selector is particularly important as it turns control flow into runtime-routed decisions. The paper presents the programming model, architecture, and evidence protocol of OpenRath, and claims that the Session abstraction provides agent systems with a first-class runtime value for auditable composition. The results of this work are limited to controlled runtime properties, and further evaluation is needed to compare the performance of OpenRath with other systems and to assess its availability and quality. 
Overall, OpenRath contributes a novel programming model for multi-agent systems that provides a unified and explicit way to manage runtime state, making it easier to inspect, reproduce, and debug the behavior of these systems.
📅 Published on Jun 17
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2606.19409
• PDF: https://arxiv.org/pdf/2606.19409
• Project Page: https://docs.openrath.com
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#MultiAgentSystems #RuntimeStateManagement #AgentOrientedProgramming #SessionCenteredArchitecture #DistributedSystemDesign
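To make the OpenRath summary's "fork, merge, and replay as runtime operations" more concrete, here is a minimal sketch of a session treated as an immutable value. It is an illustration under assumptions and is not OpenRath's API: the class, its methods, the event shape, and the merge rule (shared prefix kept once, then each branch's new events) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# (kind, payload), e.g. ("chat", "plan task"); this event shape is assumed.
Event = Tuple[str, str]


@dataclass(frozen=True)
class Session:
    """Hypothetical append-only session value; not OpenRath's actual class."""
    events: Tuple[Event, ...] = ()

    def record(self, kind: str, payload: str) -> "Session":
        """Return a new session with one more event; the original is unchanged."""
        return Session(self.events + ((kind, payload),))

    def fork(self) -> "Session":
        """Branch the session; immutability makes this a cheap copy of the value."""
        return Session(self.events)

    def merge(self, other: "Session") -> "Session":
        """Keep the shared prefix once, then append the other branch's new events."""
        shared = 0
        for mine, theirs in zip(self.events, other.events):
            if mine != theirs:
                break
            shared += 1
        return Session(self.events + other.events[shared:])

    def replay(self, handler: Callable[[Event], None]) -> None:
        """Feed every recorded event, in order, to a handler."""
        for event in self.events:
            handler(event)


base = Session().record("chat", "plan task")
left = base.fork().record("tool", "search -> 3 hits")
right = base.fork().record("memory", "stored note")
merged = left.merge(right)

replayed = []
merged.replay(replayed.append)
print(replayed)
```

Keeping the session immutable is a design choice of this sketch: a fork can never be changed by later writes to its parent, which is what lets replay work from the value itself instead of from external traces.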
💡 The paper presents an efficient method for guiding large language model text generation using regular expressions and context-free grammars. The problem addressed is that guided generation can be impractical due to significant overhead. The authors propose an approach that adds minimal overhead to the token sequence generation process. This method makes guided generation feasible in practice. The approach is implemented in the open source Python library Outlines, providing a practical solution for efficient guided generation. The results indicate that the method is effective, allowing for guided generation with little to no overhead, which is a significant contribution to the field of natural language processing.
📅 Published on Jul 19, 2023
🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2307.09702
• PDF: https://arxiv.org/pdf/2307.09702
━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.me/PaperNexus
#LargeLanguageModels #GuidedTextGeneration #RegularExpressions #ContextFreeGrammars #EfficientGenerationMethods
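The guided-generation summary above does not describe the mechanism. One way to keep per-token overhead low, assumed here for illustration, is to compile the pattern to a finite-state machine and precompute, for each state, which vocabulary tokens are allowed and where they lead. The toy sketch below does not use the Outlines library; the pattern, the tiny vocabulary, and the random choice standing in for a language model are all invented.

```python
import random
import re

PATTERN = r"[0-9]+\.[0-9]+"  # illustrative pattern, not taken from the paper

# Hand-written DFA for PATTERN.
# States: 0 start, 1 integer digits, 2 just after ".", 3 fraction digits.
ACCEPTING = {3}
N_STATES = 4


def step(state, ch):
    """Return the next DFA state, or None if ch is not allowed in this state."""
    if ch in "0123456789":
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None


def walk(state, token):
    """Consume a whole multi-character token; None means the token is rejected."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def build_index(vocab, n_states):
    """Precompute state -> {allowed token: next state} once, before generation."""
    index = {}
    for state in range(n_states):
        index[state] = {}
        for token in vocab:
            nxt = walk(state, token)
            if nxt is not None:
                index[state][token] = nxt
    return index


def generate(index, max_tokens, rng):
    """Sample tokens; each step is a dictionary lookup, not a regex re-check."""
    state, pieces = 0, []
    for _ in range(max_tokens):
        if state in ACCEPTING and rng.random() < 0.5:
            break
        allowed = index[state]
        # rng.choice stands in for sampling from a model's masked logits.
        token = rng.choice(sorted(allowed))
        pieces.append(token)
        state = allowed[token]
    return "".join(pieces), state


VOCAB = ["1", "23", ".", ".5", "4.", "abc", " ", "7.25"]  # made-up vocabulary
INDEX = build_index(VOCAB, N_STATES)
text, final_state = generate(INDEX, max_tokens=8, rng=random.Random(0))
print(text, final_state in ACCEPTING, bool(re.fullmatch(PATTERN, text)))
```

The index is built once per pattern and vocabulary, so the cost during generation does not grow with the length of the text produced so far. If the token budget runs out before an accepting state is reached, the output is a valid prefix and not necessarily a full match.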
