
Big Data Science

Big Data Science channel gathers interesting facts about Data Science.
For cooperation: [email protected]
💼 https://t.me/bds_job: channel about Data Science jobs and careers
💻 https://t.me/bdscience_ru: Big Data Science [RU]

3,743 subscribers (+3 in 24 hours, +21 in 7 days, +99 in 30 days)


💡Another small selection of AI tools for Big Data analytics
KNIME Analytics Platform is a free, open-source data science platform with 300+ connectors to various data sources and integrations with all popular machine learning libraries.
Polymer uses artificial intelligence to turn data into an optimized, flexible database. All a user needs to do is upload a spreadsheet to the platform, which instantly converts it into a database that can be mined for insights.
IBM Cognos Analytics is a modular online business intelligence (BI) service for creating business reports, analyzing data, and monitoring events and metrics to support business decisions.
Akkio is a business intelligence and forecasting tool for analyzing data and predicting potential outcomes. Users upload a dataset and select the variable they want to predict, and Akkio builds a neural network around that variable. Like many other tools, Akkio requires no prior programming experience.
MonkeyLearn uses AI-based analytics to help users visualize and reorganize their data. It can also be used to set up text classifiers and text extractors, which automatically sort data by topic or intent and extract product characteristics or user data.
KNIME Analytics Platform | KNIME

Access data from any data source - your laptop, an application or a data warehouse. Easily blend data of any size and any type - all file formats supported. Aggregate, sort, filter, and join data on your device, in-database, or in distributed big data environments. Explore data with interactive charts and visualizations. Automate spreadsheets or other manual, repetitive data tasks. Create visualizations automatically with a genAI assistant. Choose from a complete range of analytic techniques, with

⚡️A tool to significantly enhance your database
WrenAI is an open-source tool that makes your existing database RAG-ready. It converts natural-language questions into SQL (text-to-SQL), so you can explore data in the database without writing SQL yourself, among other features.
🖥 GitHub 🟡 Documentation
GitHub - Canner/WrenAI: Wren AI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.


⚡️💡💻 MySQL 9.0.0 has been released
Oracle has released MySQL 9.0.0. MySQL Community Server 9.0.0 builds are publicly available for major Linux distributions, FreeBSD, macOS, and Windows.
In 2023, the company announced a change in the MySQL release model. Developers began publishing two types of MySQL branches: Innovation (new features, frequent updates, about three months of support) and LTS (extended support and unchanged behavior).
As the developers note, MySQL 9.0 belongs to the Innovation branch, which will also include the upcoming MySQL 9.1 and 9.2 releases. Innovation releases are recommended for users who want early access to new functionality. They are published every three months and are supported only until the next Innovation release is published (for example, once 9.1 is released, support for 9.0 ends).
Introducing MySQL Innovation and Long-Term Support (LTS) versions

💻High-performance distributed database
YugabyteDB is a high-performance distributed SQL database that is compatible with PostgreSQL. It is well suited for cloud-based OLTP applications (i.e. real-time and business-critical workloads) that require strict data correctness along with scalability or high fault tolerance.
🖥 GitHub 🟡 Documentation
Creating a local YugabyteDB cluster with Docker:
docker run -d --name yugabyte -p7000:7000 -p9000:9000 -p15433:15433 -p5433:5433 -p9042:9042 \
 yugabytedb/yugabyte:2.21.1.0-b271 bin/yugabyted start \
 --background=false
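Once the container is running, the cluster can be queried like a PostgreSQL server on the YSQL port 5433 that the Docker command publishes. The following is a minimal sketch, not an official client example: it assumes the third-party psycopg2 driver is installed and that the default database, user, and password are all "yugabyte", which may differ in your setup. The names YsqlSettings and server_version are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YsqlSettings:
    """Connection settings for the local cluster (defaults are assumptions)."""
    host: str = "localhost"
    port: int = 5433            # YSQL port published by the docker command
    dbname: str = "yugabyte"    # assumed default database
    user: str = "yugabyte"      # assumed default user
    password: str = "yugabyte"  # assumed default password

    def dsn(self) -> str:
        """Build a libpq-style connection string."""
        return (
            f"host={self.host} port={self.port} dbname={self.dbname} "
            f"user={self.user} password={self.password}"
        )


def server_version(settings: YsqlSettings = YsqlSettings()) -> str:
    """Connect through the PostgreSQL wire protocol and return version()."""
    import psycopg2  # third-party driver, imported lazily

    conn = psycopg2.connect(settings.dsn())
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

With the container up, calling server_version() should return the version string reported by the YSQL layer.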
GitHub - yugabyte/yugabyte-db: YugabyteDB - the cloud native distributed SQL database for mission-critical applications.


🎼Datasets and projects for music generation and analysis tasks
MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) contains about 200 hours of annotated recordings from international piano competitions over the past ten years.
NSynth consists of 305,979 musical notes recorded from 1,006 different instruments, such as flute, guitar, piano and organ. The dataset is annotated by instrument type (acoustic, electronic or synthetic) and other sound parameters.
Lakh MIDI v0.1 contains 176,581 MIDI files, of which 45,129 are matched to entries in the Million Song Dataset. It is designed to facilitate large-scale music information retrieval based on symbolic (MIDI) and audio content.
Music21 contains musical performances from 21 categories and is aimed at research problems (for example, answering the question “Which group used these chords for the first time?”).
The MAESTRO Dataset

MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) is a dataset composed of about 200 hours of virtuosic piano performances captured wit...

🌎Top DS events around the world in July
Jul 9 - The Martech Summit - Hong Kong, China - https://themartechsummit.com/hongkong
Jul 9-11 - DATA 2024 - Dijon, France - https://data.scitevents.org/
Jul 9-11 - Transform 2024 - San Francisco, USA - https://transform24.venturebeat.com/
Jul 11-12 - DataConnect Conference - Ohio, United States - https://www.dataconnectconf.com/
Jul 17 - Data & Analytics Live - Online - https://data-analytics-live.coriniumintelligence.com/
Jul 23 - CDAO Indonesia - Indonesia - https://cdao-id.coriniumintelligence.com/
Jul 26 - The MachineCon 2024 - New York, USA - https://machinecon.aimresearch.co/
Jul 29-30 - Gartner Data Analytics Summit - Sydney, Australia - https://www.gartner.com/en/conferences/apac/data-analytics-australia
The MarTech Summit Hong Kong 9 July 2024

Join MarTech leaders from global companies in Hong Kong on 9 July 2024 to learn how to converge marketing & technology for a winning future.

⚡️Hyperconverged cloud open-source database
MatrixOne is a hyperconverged, cloud-native distributed database whose architecture separates storage, compute, and transactions while exposing them through a single HSTAP data engine. This lets one database system handle a variety of business workloads such as OLTP, OLAP, and stream computing. MatrixOne can be deployed in public and private clouds and is compatible with a variety of infrastructures.
🖥 GitHub 🟡 Documentation
GitHub - matrixorigin/matrixone: Hyperconverged cloud-edge native database

⚔️🔎ACID in Kafka vs ACID in Airflow when processing Big Data: advantages and disadvantages
When comparing two popular data tools such as Apache Kafka and Apache Airflow, it is important to understand how each relates to the ACID principles (Atomicity, Consistency, Isolation, Durability). These principles are critical to reliable and predictable data processing.
Advantages of Kafka:
1. Durability: Kafka persists messages to disk, so data survives a system failure.
2. Consistency: When configured correctly, Kafka ensures that all consumers of a partition receive the same data in the same order.
3. Isolation: Messages in Kafka are divided into topics and partitions, which helps isolate processing between different data streams.
Disadvantages of Kafka:
1. Atomicity: Kafka does not guarantee atomicity at the message level by default. Messages can be duplicated or lost unless additional mechanisms such as Kafka transactions are used.
2. Complexity of configuration: Achieving ACID-like properties in Kafka requires careful configuration and management, including replication and transaction settings.
Advantages of Airflow:
1. Atomicity: Airflow provides atomicity at the task level. If a task fails, the entire DAG (Directed Acyclic Graph) can be re-run or resumed from the point of failure.
2. Consistency: Airflow enforces a strict order of tasks, which helps keep data in a consistent state.
3. Dependency management: Airflow lets you manage dependencies between tasks, making it easier to ensure data isolation and consistency.
Disadvantages of Airflow:
1. Performance: Unlike Kafka, Airflow is not designed for real-time data processing. Its main purpose is to manage long-running and complex workflows.
2. Durability: Although Airflow keeps the state of tasks and DAGs, it relies on external data stores (such as databases) for long-term storage, which may require additional effort to ensure durability.
Thus, Apache Kafka is better suited for high-performance, durable real-time data processing, but may require complex tuning to achieve atomicity and consistency. Apache Airflow, in turn, is strong at managing and orchestrating complex workflows with task-level atomicity and consistency, but is not designed for real-time streaming.
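The duplicate-message problem mentioned above can be illustrated without a broker. The following is a simplified sketch in plain Python, not Kafka client code: Record and IdempotentSummer are hypothetical names, and the delivery list simulates an at-least-once consumer that receives one record twice. It assumes records within a partition arrive in offset order, so remembering the highest applied offset per partition is enough to skip redeliveries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int
    value: int


@dataclass
class IdempotentSummer:
    """Sums record values, ignoring records that were already applied."""
    total: int = 0
    # Highest offset applied so far, keyed by partition.
    last_offset: Dict[int, int] = field(default_factory=dict)

    def apply(self, record: Record) -> bool:
        seen = self.last_offset.get(record.partition, -1)
        if record.offset <= seen:
            return False  # duplicate delivery: skip it
        self.total += record.value
        self.last_offset[record.partition] = record.offset
        return True


# Simulated at-least-once delivery: offset 1 of partition 0 arrives twice.
deliveries: List[Record] = [
    Record(0, 0, 10),
    Record(0, 1, 20),
    Record(0, 1, 20),  # redelivered after a simulated consumer restart
    Record(1, 0, 5),
    Record(0, 2, 30),
]

naive_total = sum(r.value for r in deliveries)  # counts the duplicate: 85

summer = IdempotentSummer()
for rec in deliveries:
    summer.apply(rec)
idempotent_total = summer.total  # duplicate skipped: 65
```

In a real pipeline the running total and the last applied offsets would need to be stored together atomically (for example, in one database transaction); otherwise a crash between the two writes reintroduces the problem.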

Apache Kafka: A Distributed Streaming Platform.

📊A huge dataset of images and their captions
Pixel Prose is a dataset that contains over 16 million diverse images from three different web databases (CommonPool, CC12M, RedCaps) with captions created using Google Gemini 1.0 Pro Vision. The following Python script loads the dataset with the Hugging Face datasets library:
from datasets import load_dataset

# download the whole dataset
ds = load_dataset("tomg-group-umd/pixelprose")
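Downloading all 16 million rows just to look at a few of them can be heavy. As a hedged alternative, the sketch below uses the streaming mode of load_dataset to read only the first rows. The split name "train" is an assumption about this repository, and first_n and preview_pixelprose are hypothetical helpers introduced for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def first_n(rows: Iterable[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Take at most n rows from any iterable without consuming the rest."""
    return list(islice(rows, n))


def preview_pixelprose(n: int = 3) -> List[Dict[str, Any]]:
    """Stream the dataset and return its first n rows (needs network access)."""
    from datasets import load_dataset  # third-party, imported lazily

    stream = load_dataset(
        "tomg-group-umd/pixelprose",
        split="train",    # assumed split name
        streaming=True,   # iterate lazily instead of downloading everything
    )
    return first_n(stream, n)
```

Calling preview_pixelprose(3) returns a list of up to three row dictionaries, which is usually enough to inspect the available columns before committing to a full download.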
tomg-group-umd/pixelprose · Datasets at Hugging Face


⚡️💡Open-source container orchestration system for running AI workloads
dstack is an open-source container orchestration engine designed for AI workloads in any cloud or data center. Supported cloud providers include AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, and CUDO. If standard AWS, GCP, Azure or OCI credentials are configured on your machine, the dstack server picks them up automatically.
🖥 GitHub 🟡 Documentation
GitHub - dstackai/dstack: An open-source container orchestration engine for running AI workloads in any cloud or data center. https://discord.gg/u8SmfwPpMd
