Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail to compute pipeline saying Timevector containing nulls #497

Closed
jonatas opened this issue Aug 16, 2022 · 0 comments · Fixed by #503
Closed

Fail to compute pipeline saying Timevector containing nulls #497

jonatas opened this issue Aug 16, 2022 · 0 comments · Fixed by #503
Labels
bug Something isn't working

Comments

@jonatas
Copy link
Contributor

jonatas commented Aug 16, 2022

Relevant system information:

  • OS: Mac OS - (M1 2020 processor)
  • PostgreSQL version: PostgreSQL 14.4 on x86_64-apple-darwin21.5.0, compiled by Apple clang version 13.1.6 (clang-1316.0.21.2.5), 64-bit
  • TimescaleDB Toolkit version: │1.9.0-dev
  • Installation method: "source"

Describe the bug
I'm trying to benchmark and compare how the lambdas are performing against some language-based processing code. I got an error when I introduce a few thousand records.

To Reproduce
Here is a small script that can create a scenario to test:

CREATE TABLE "measurements" ("device_id" integer not null, "val" decimal not null, "ts" timestamp not null);
SELECT create_hypertable('measurements', 'ts', chunk_time_interval => INTERVAL '1 day');

-- Generating some data
INSERT INTO measurements (ts, device_id, val)
  SELECT ts, device_id, random()*80
 FROM generate_series(TIMESTAMP '2022-01-01 00:00:00',
        TIMESTAMP '2022-02-01 00:00:00', INTERVAL '1 second') AS g1(ts),
      generate_series(1, 3) AS g2(device_id);

set search_path to toolkit_experimental, public;


-- Validating if sort is generating some null value
WITH a as (
  SELECT device_id, (timevector(ts, val) -> sort() -> unnest()).*
  FROM "measurements"
  GROUP BY device_id)
select a.* FROM a WHERE a.time is null or a.value is null;

-- Trying only to compute delta
SELECT device_id, timevector(ts, val) -> sort() -> delta()
FROM "measurements"
GROUP BY device_id ;

-- Limiting scope to make it easy to compute 
SELECT timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility
FROM "measurements"
WHERE device_id = 2
LIMIT 10;

Expected behavior
I was expecting to compute deltas as I have no null values in my table either timevector.

Actual behavior
The following SQL:

SELECT device_id, timevector(ts, val) -> sort() -> delta()
FROM "measurements"
GROUP BY device_id ;

returns an error saying

ERROR:  Unable to compute deltas over timevector containing nulls
CONTEXT:  extension/src/time_vector/pipeline/delta.rs:35:9

I have tested several scenarios trying to identify where is the problem, and it seems related to managing a large number of records.

Changing the generate_series to 2 minutes interval between records will generate 66963 records and it works fine on my computer but If I move it to 1 minute it will generate 130k records and then the error pops up. I tried to also increase the number of devices but from 3 to 4 devices it already fails.

ERROR:  Unable to compute stats aggregate over timevector containing nulls
CONTEXT:  extension/src/time_vector/pipeline/aggregation.rs:112:9
@jonatas jonatas added the bug Something isn't working label Aug 16, 2022
@bors bors bot closed this as completed in 56da41d Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant