There will be no singularity

Smartface, technologies and decay @antonrevyako

1,957 subscribers
Post archive
Eyes Wide Shut Data Lineage №1: DATA LINEAGE IS A LIE
I suddenly discovered that all data lineage tools are missing one important part... Had to fix it 🙂 https://www.linkedin.com/posts/anton-revyako_snowflake-snowflakedatacloud-datalineage-activity-7232405043776282625--ADs/

Repost from DataEng
The Supabase team never stops delighting us! 😲
A few days ago a new service from Supabase went live: https://postgres.new. It is a Postgres-based data modeling service with a built-in AI assistant. Running Postgres right in the browser is possible thanks to https://pglite.dev/, a lightweight build of Postgres packaged in WASM.
Target audience of the service:
- data analysts
- data engineers
- students learning relational databases, PostgreSQL in particular
- developers designing table schemas
See an example of working with the service on YouTube.

Repost from Generative Anton
From the SQLite sources

OtterTune is dead… https://ottertune.com/

Hey friends! We built Releem, a database performance management tool that helps engineers keep database servers fast, secure and reliable.
Why Releem:
- Automatic performance tuning
- Fast identification of slow queries
- Monitoring & tuning in one place
- MySQL / MariaDB & AWS RDS
- Open-source agent
After a few years of development, we launched on Product Hunt. If you are interested in such solutions or have questions or feedback, we'd appreciate your support ❤️ here: https://www.producthunt.com/posts/releem

A friend of mine is on Product Hunt today!

Friends from Luna Park are looking for a Data Engineer. It could be you or someone you know (if you ask me, a DS, Python dev or ML engineer would fit too).
Palabra.ai is the first-ever real-time voice interpreter. Starting as a tiny team of five engineers, they built a speech-to-text solution that works 50 times faster than OpenAI's Whisper! The interpreter prototype will be released in a month and will work in Zoom with a two-second delay; right now it's one of a kind!
They are now looking for a senior engineer who will write and optimize complex multi-node data pipelines, work on large-scale data scraping and manage datasets that include hundreds of thousands of hours of audio data.
Key requirements:
🟣5+ years of industry experience as a software developer/data engineer;
🟡Python skills (including modern backend frameworks, low-level asyncio, multithreading/multiprocessing);
🔵A good grasp of neural networks and related tools (NumPy, PyTorch, audio-related libraries such as torchaudio, librosa, etc.);
🟢Experience in deploying, orchestrating and scaling multi-node pipelines in the cloud.
Nice-to-haves:
🔘Compiled languages such as Go, C/C++ and Rust;
🔘Experience with CUDA or other GPU-related frameworks;
🔘Audio processing experience.
The salary is $70k-100k, plus equity of up to 1%. It's a fully remote position.
To apply or learn more, reach out to my Luna Park buddy Fedya @owlkov

Snowflake's hidden gem: CTE macros
We all like to use CTEs (common table expressions). They make our code cleaner and sometimes help speed up queries. But somehow the Snowflake documentation hides from us one beautiful CTE behavior that will make your life even more convenient. What does the documentation say?
A CTE (common table expression) is a named subquery defined in a WITH clause. You can think of the CTE as a temporary view for use in the statement that defines the CTE
In other words:
- we can only use SELECT in the CTE
- the result of the CTE is a view-like object
Surely many of you have used a CTE to define constants that are used later in the query. For example:
 
WITH
  var_cte AS (
    SELECT 'Snowflake' AS vendor
  )
SELECT *
FROM t
WHERE
  vendor = (SELECT vendor FROM var_cte)
Tolerable, but not perfect, especially the scalar subquery. Scalar subqueries are bad practice; avoid them! Can this be done more elegantly? Yes! Look at this:

WITH
  var_cte AS ('Snowflake')
SELECT *
FROM t
WHERE
  vendor = var_cte;
Wow! The code is now easier to read, and we got rid of the scalar subquery at the same time! What if we want to use several values at once? We can make an object:

WITH
  var_cte AS ({'vendor': 'Snowflake'})
SELECT * FROM t WHERE
  vendor = var_cte:vendor;
Or here's the IN analogue:

WITH
  var_cte AS (['Snowflake', 'Bigquery'])
SELECT * FROM t WHERE
  ARRAY_CONTAINS(vendor::VARIANT, var_cte)
Although it's not so beautiful anymore…
It turns out that a CTE can be not only a view-like object but also a scalar value! Very cool, but that's not all:

CREATE TABLE t AS SELECT 1 AS a, 2 AS b;

WITH
  var_cte AS (a+b)
SELECT
  var_cte
FROM t;
As a result, we get a table with a single var_cte column containing the value 3. In other words, a CTE is not only a view-like object and not only a scalar value, but also an alias for any expression! Here's another example:

WITH
  var_cte AS (SUM(a))
SELECT
  var_cte
FROM t;
Yes, you can put any function call there, including aggregate function calls. Even this works:

WITH
  var_cte1 AS (a),
  var_cte2 AS (var_cte1+b)
SELECT
  var_cte2
FROM t;
And as a function argument:

WITH
  var_cte AS (a + b)
SELECT
  ROUND(var_cte/2, 0)
FROM t;
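All of the examples above behave as if every reference to such a CTE were replaced by the CTE's expression before the query runs, much like a macro. The Python sketch below is a toy model of that idea only: it is an assumption about the observable behavior, not a description of how Snowflake implements it. `expand` and `IDENT` are hypothetical names, the regex tokenizer ignores string literals, quoted identifiers and keywords, and wrapping each expansion in parentheses is a choice made in this sketch (the examples above do not show whether Snowflake does the same).

```python
from __future__ import annotations

import re

# Toy tokenizer: identifiers NOT immediately followed by "(", so function
# names such as ROUND or SUM are left alone. Ignores literals and keywords.
IDENT = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\b(?!\s*\()")


def expand(expr: str, macros: dict[str, str], _stack: tuple[str, ...] = ()) -> str:
    """Recursively replace every macro name in expr with its parenthesized body."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in macros:
            return name  # a plain column reference: keep it unchanged
        if name in _stack:
            raise ValueError(f"cyclic macro definition: {name}")
        # Parenthesize to preserve operator precedence: with var_cte = a + b,
        # var_cte/2 becomes (a + b)/2 rather than a + b/2.
        return "(" + expand(macros[name], macros, _stack + (name,)) + ")"

    return IDENT.sub(substitute, expr)


if __name__ == "__main__":
    # The "function argument" example: var_cte AS (a + b)
    print(expand("ROUND(var_cte/2, 0)", {"var_cte": "a + b"}))
    # The chained example: var_cte1 AS (a), var_cte2 AS (var_cte1+b)
    print(expand("var_cte2", {"var_cte1": "a", "var_cte2": "var_cte1+b"}))
```

Run as a script, this prints `ROUND((a + b)/2, 0)` and `((a)+b)`. Chained macros simply expand recursively, which is why var_cte2 can refer to var_cte1.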
Are there any downsides? Unfortunately, yes… First, CTE macros refuse to work in UNION queries and in subqueries inside FROM (inline views):

-- doesn't work!

WITH
  var_cte AS (a+b)
SELECT var_cte FROM t1
UNION
SELECT var_cte FROM t2
;

-- doesn't work here either

WITH
  var_cte AS (a)
SELECT * FROM (
  SELECT var_cte FROM t
);
Maybe Snowflake engineers will finish this functionality and CTE macros will become usable everywhere.
Second, none of the data lineage tools will show you the dependencies that CTE macros create. The good news is that in dwh.dev we take CTE macros into account at compile time and display all the relevant connections in the lineage!
PS: I found out about this quite by accident, from the last example in the documentation of the ENCRYPT_RAW function.
PPS: thumbs up on LinkedIn
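The lineage problem from the post can be illustrated on the var_cte1/var_cte2 example. A tool that does not know about CTE macros sees var_cte2 in the SELECT list and looks for a column with that name in t; a tool that does know must follow the macro chain down to the real columns a and b. The Python sketch below shows only that resolution step. It is a hypothetical toy (`source_columns` is not part of dwh.dev or any real lineage tool) and uses the same simplified regex tokenizer, which ignores literals, quoted identifiers and keywords.

```python
from __future__ import annotations

import re

# Toy tokenizer: identifiers not followed by "(" (function names are skipped).
IDENT = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\b(?!\s*\()")


def source_columns(
    expr: str, macros: dict[str, str], _seen: frozenset[str] = frozenset()
) -> set[str]:
    """Return the base-table columns that expr depends on, looking through macros."""
    columns: set[str] = set()
    for name in IDENT.findall(expr):
        if name in macros:
            if name in _seen:
                raise ValueError(f"cyclic macro definition: {name}")
            # Follow the macro to its definition instead of treating it as a column.
            columns |= source_columns(macros[name], macros, _seen | {name})
        else:
            columns.add(name)  # not a macro, so assume it is a real column
    return columns


if __name__ == "__main__":
    macros = {"var_cte1": "a", "var_cte2": "var_cte1+b"}
    # A macro-unaware tool would report a nonexistent column t.var_cte2.
    print(IDENT.findall("var_cte2"))
    # Resolving through the macros yields the real dependencies.
    print(sorted(source_columns("var_cte2", macros)))
```

Run as a script, this prints `['var_cte2']` for the naive view and `['a', 'b']` once the macros are resolved.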

2024 MAD (Machine Learning, AI & Data) Landscape
https://mattturck.com/mad2024/
PDF: https://mattturck.com/landscape/mad2024.pdf