
Stream Reader: event to detect first byte of each chunk added to internal buffer #1126

Closed
mlasak opened this issue May 11, 2021 · 8 comments

@mlasak

mlasak commented May 11, 2021

After some hours of reading the specs, it looks like there is no way to get an indication of the point in time when the first byte of a transferred chunk was added to the internal buffer of a stream reader.

One use case is measuring throughput in bursty chunked transfers with idle times between the HTTP chunks when using the Fetch API.

Example from the spec of how it IS:

function readAllChunks(readableStream) {
  const reader = readableStream.getReader();
  const chunks = [];

  return pump();

  function pump() {
    return reader.read().then(({ value, done }) => {
      if (done) {
        return chunks;
      }

      chunks.push(value);
      return pump();
    });
  }
}

Example of how it could be (or some other comparable way) to enable the above use case:

function readAllChunks(readableStream) {
  const reader = readableStream.getReader();
  const chunks = [];

  // --- added from here ----
  const chunkStartTimes = [];
  const chunkEndTimes = [];
  const chunkBytes = [];
  // the following line is the essential addition
  reader.addEventListener('readable', () => chunkStartTimes.push(Date.now()) );
  // --- added to here ----

  return pump();

  function pump() {
    return reader.read().then(({ value, done }) => {
      if (done) {
        // --- added from here ----
        // at this point throughput calculation for each chunk is possible
        // --- added to here ----
        return chunks;
      }
      // --- added from here ----
      chunkEndTimes.push(Date.now());
      chunkBytes.push(value.byteLength);
      // --- added to here ----
      chunks.push(value);
      return pump();
    });
  }
}
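With those start/end times and sizes recorded, the per-chunk throughput calculation mentioned in the done branch would be straightforward. A minimal sketch, post-processing the hypothetical arrays above (keep in mind the 'readable' event on a reader does not exist today):

function computeChunkThroughputs(chunkStartTimes, chunkEndTimes, chunkBytes) {
  return chunkBytes.map((bytes, i) => {
    const durationMs = chunkEndTimes[i] - chunkStartTimes[i];
    // Clamp to 1 ms to avoid division by zero at Date.now()'s resolution.
    return bytes / (Math.max(durationMs, 1) / 1000); // bytes per second
  });
}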

Or is there some other way (preferably one browsers already support) to achieve this desired measurement with fetch?

@ricea
Collaborator

ricea commented May 11, 2021

The streams API for non-byte-streams treats chunks as atomic units, so there's no concept of a "first byte".

Eventually I expect fetch will use byte streams instead, but even then, the "time for first byte" will just be the same as "time for first chunk".

I don't know about other browsers, but in Chrome we handle network input in chunks anyway, so there really isn't a concept of "first byte" distinct from "first chunk" (despite what the Resource Timing API may imply).

What this means for measuring throughput is that chunkEndTime will always be the same as chunkStartTime, so exposing them separately would not help you.

@mlasak
Author

mlasak commented May 11, 2021

@ricea thanks for the quick response! Yes, the point of this issue is precisely the lack of an event signaling, separately for each chunk, the start of that chunk's transfer over the network. Without such information, correct throughput measurement seems impossible.

Let me provide a minimal self-contained code example to showcase the problem:

Node.js script to produce chunked-transfer data and serve the index.html below

const http = require('http');
const fs = require('fs');

let index = '';
fs.readFile('index.html', (err, data) => {
    if (err) {
        throw err;
    }
    index = data.toString();
});

const hostname = '127.0.0.1';
const port = 3000;

async function produceData() {
    return new Promise(resolve => {
        setTimeout(resolve.bind(this, Buffer.alloc(1024)), 100);
    });
}

const server = http.createServer(async (req, res) => {
  res.statusCode = 200;
  if (req.url !== '/data') {
    res.setHeader('Content-Type', 'text/html');
    res.end(index);
    return;
  }
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Transfer-Encoding', 'chunked');
  
  for (let iter = 0; iter < 100; iter++) {
    const chunk = await produceData();
    res.write(chunk);
  }
  res.end();
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});

HTML (index.html)

<!DOCTYPE html>
<html lang="en">
    <head>
        <title>fetch api - stream reader - throughput check</title>
    </head>
    <body>
        please open dev tools

        <script>
            fetch('./data')
                    .then(response => response.body)
                    .then(body => {
                        const reader = body.getReader();

                        let timeMark = Date.now();
                        let timeSum = 0;
                        let byteSum = 0;
                        let chunkCount = 0;
                        function pump() {
                            return reader.read().then(({ done, value }) => {
                                if (done) {
                                    console.log(`got all chunks. Prize question: what is the actual network throughput? ${byteSum / (timeSum / 1000)} bytes per second does not seem right!`)
                                    return;
                                }
                                console.log(`got ${++chunkCount}. chunk with ${value.byteLength} bytes, in ${Date.now() - timeMark} ms`);
                                byteSum += value.byteLength;
                                timeSum += (Date.now() - timeMark);
                                timeMark = Date.now();
                                return pump();
                            });
                        }
                        pump();
                    })
        </script>
    </body>
</html>

If you start the above Node script and navigate to http://127.0.0.1:3000/, you should see the following in dev tools:

got 1. chunk with 1024 bytes, in 0 ms
got 2. chunk with 1024 bytes, in 103 ms
got 3. chunk with 1024 bytes, in 101 ms
...
got 98. chunk with 1024 bytes, in 101 ms
got 99. chunk with 1024 bytes, in 103 ms
got 100. chunk with 1024 bytes, in 100 ms
got all chunks. Prize question: what is the actual network throughput? 10123.578843302026 bytes per second does not seem right!
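(For reference: ~10123 bytes/s is almost exactly the server's production rate of 1024 bytes every ~100 ms ≈ 10240 bytes/s. The per-chunk timer spans the idle time between chunks, so the measurement reflects how fast the server produces data, not how fast the network can carry it.)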

So, did I misunderstand something, or is it simply impossible to measure correct network throughput with the Fetch API and the stream reader?

Side note: For more clarity I've changed the issue title and the desired event name to 'chunkTransferStarted'.

@mlasak mlasak changed the title Stream Reader: event to detect first byte added to internal buffer Stream Reader: event to detect first byte of each chunk added to internal buffer May 11, 2021
@ricea
Collaborator

ricea commented May 11, 2021

So, did I misunderstand something, or is it simply impossible to measure correct network throughput with the Fetch API and the stream reader?

It's worse than that: it's impossible to measure correct network throughput in the browser at all. The browser cannot distinguish between slowness caused by the network and slowness caused by the origin server.

@mlasak
Author

mlasak commented May 11, 2021

The reason for the slowness is not an issue. It can even be mixed: slow production at the source AND network limitations on the way to the client. But what we really need is to measure the actual throughput on the client during transfer without the "idle" times between the chunks.

@ricea
Collaborator

ricea commented May 11, 2021

But what we really need is to measure the actual throughput on the client during transfer without the "idle" times between the chunks.

The browser has no way to distinguish between "idle" and "slow". They look the same.

@MattiasBuelens
Collaborator

As far as I know, this is indeed impossible.

I work for a company that builds online video player solutions. In recent years, there has been huge interest in the industry in low-latency live streaming. In such streams, the origin server announces the availability of the next audio/video segment before that segment is fully complete. A low-latency player can already send the request for that segment and start downloading it while it is still being generated.

However, this makes it difficult to do accurate network bandwidth estimations (for adaptive bitrate switching). The player is no longer continuously downloading at the full "line speed" of its network; instead, it receives "bursts" of data as the segment is being generated. A naive implementation that does not take these "bursts" into account would conclude that the bandwidth estimate is always less than or equal to the segment's bitrate. That is: if the player is downloading a 2 Mbps video segment over a 10 Mbps link, it would incorrectly estimate that the network bandwidth is 2 Mbps, and never attempt to switch up to a higher video quality (with a higher bitrate). This makes for a poor viewer experience.

The state-of-the-art is to try to detect which chunks were received without any delay between them (i.e. are part of the same "burst"), and only estimate the bandwidth across those chunks. For example, ACTE does this. From their paper:

At each chunk downloading step i, the average bandwidth is calculated as follows: [...] (1) where Q is the chunk size, and b and e are the beginning and end times of the chunk download, respectively, as illustrated in Figure 4. (1) requires us to know the values for Q, b and e. Q is inferred from the HTTP header and the HTTP Fetch API provides us the value for e. However, with the standard HTTP protocol we have no means to determine the value for b. If there is a non-negligible idle period after a chunk is downloaded (i.e., when b_{n+1} - e_n ≫ 0), that chunk must be disregarded in computing (1). Since we do not know the b values, we have to determine such chunks in a different way.
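To make that burst-filtering idea concrete, here is a minimal sketch (illustrative only, not ACTE's actual algorithm; the sample shape and the IDLE_GAP_MS threshold are invented for this example):

// Sketch: estimate bandwidth only across chunks that arrived back-to-back.
// IDLE_GAP_MS is an arbitrary threshold, not a value from the ACTE paper.
const IDLE_GAP_MS = 10;

function estimateBandwidth(samples) {
  // samples: [{ endTime, bytes }], one entry per resolved read()
  let totalBytes = 0;
  let totalMs = 0;
  for (let i = 1; i < samples.length; i++) {
    const gapMs = samples[i].endTime - samples[i - 1].endTime;
    if (gapMs <= IDLE_GAP_MS) {
      // Chunk i arrived in the same burst as chunk i-1, so this gap is
      // (approximately) pure download time for chunk i's bytes.
      totalBytes += samples[i].bytes;
      totalMs += gapMs;
    }
    // A larger gap contains idle time, so that chunk is disregarded.
  }
  return totalMs > 0 ? (totalBytes * 8 * 1000) / totalMs : NaN; // bits per second
}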

I don't know what your specific use case is, but perhaps it's close enough so that you can borrow some ideas from low-latency video streaming and ACTE? 🙂

@mlasak
Author

mlasak commented May 14, 2021

@ricea that was the motivation for raising this issue. On the client side it should be possible to detect whether transmission is ongoing or idle. My hope (and recommendation/suggestion) is that this missing piece, in the form of an event (see below), will be added to the Streams spec.

@MattiasBuelens Thank you for confirming the current (not very satisfying) situation.

Our use case is exactly the one you have mentioned ;) The group I'm working for is the current maintainer of dash.js. My task is to validate and improve the throughput calculation in low-latency streaming, because we see that current implementations have issues.

I know the paper you mentioned very well; interesting work. However, this approach fails to estimate throughput in my simple code example above, since all chunks are equal. This is the case when an encoder produces chunks at equidistant times, which is very likely in ultra-low latency (ULL) streaming in my opinion.
Moreover, the authors of the paper themselves state what the missing piece in the Fetch API is. The authors write

with the standard HTTP protocol we have no means to determine the value for b

but in fact they mean the Fetch API/Streams API, as the HTTP/1.1 standard does specify the size of each chunk to be sent [1]. So why not fix this in the Streams spec, to allow for simple and exact measurement in the future?

An event announcing the start of chunk transmission would reduce the problem back to the simple formula transferred_bits / transmission_duration. Interestingly, in Node.js this event has existed for a while: it is called readable [2] and offers exactly this missing piece (note: I've changed the example above to use this event name).

Here is a Node.js example consumer that allows for exact throughput measurement without any sophisticated calculations or predictions (make sure the sender from the example above, #1126 (comment), is running):

const http = require('http');

let timeMark = Date.now();
let chunkCount = 0;
http.get('http://localhost:3000/data', (res) => {

    // https://nodejs.org/dist/latest-v14.x/docs/api/stream.html#stream_event_readable
    res.on('readable', () => {
        console.log(`readable`);
        timeMark = Date.now();
        res.read();
    });
    res.on('data', (chunk) => {
        console.log(`got ${++chunkCount}. chunk with ${chunk.length} bytes, in ${Date.now() - timeMark} ms`);
    });
    res.on('end', () => {
        console.log('got all chunks');
    });
}).on('error', (e) => {
    console.error(`Got error: ${e.message}`);
});

The result is what we expect and desire:

readable
got 1. chunk with 1024 bytes, in 0 ms
readable
got 2. chunk with 1024 bytes, in 0 ms
...
readable
got 80. chunk with 1024 bytes, in 1 ms
...
readable
got 99. chunk with 1024 bytes, in 0 ms
readable
got 100. chunk with 1024 bytes, in 0 ms
readable
got all chunks
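From these measurements, throughput reduces to the simple formula transferred_bits / transmission_duration from above. A minimal sketch of the accumulation (recordChunk and throughputBitsPerSecond are hypothetical helpers one would call from the 'data' and 'end' handlers above):

// Sketch: accumulate only measured transfer time (e - b per chunk),
// excluding the idle periods between chunks.
let transferredBytes = 0;
let transferMs = 0;

function recordChunk(byteLength, elapsedMs) {
    transferredBytes += byteLength;
    // Clamp 0 ms readings; Date.now() cannot resolve sub-millisecond transfers.
    transferMs += Math.max(elapsedMs, 1);
}

function throughputBitsPerSecond() {
    return (transferredBytes * 8 * 1000) / transferMs;
}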

Making the readable event available in Web browsers would have huge benefits. Wdyt?

[1] https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
[2] https://nodejs.org/dist/latest-v14.x/docs/api/stream.html#stream_event_readable

@ricea
Collaborator

ricea commented May 17, 2021

@ricea that was the motivation for raising this issue. On the client side it should be possible to detect whether transmission is ongoing or idle.

I think you misunderstood my point, which is that it is not possible at all. No API change can make it possible. The information is simply not available to the client.

An event announcing the start of chunk transmission would reduce the problem back to the simple formula transferred_bits / transmission_duration. Interestingly, in Node.js this event has existed for a while: it is called readable [2] and offers exactly this missing piece

This Node.js code doesn't measure anything meaningful, and certainly not the speed of the network. It's basically just a benchmark of how fast Node can emit the "data" event.
