A computer can only understand binary data, that is, data in the form of 0s and 1s. The sequential flow of this data is called a stream. Streams deliver data in small pieces called chunks; the computer starts processing the data as soon as it receives a chunk, rather than waiting for the whole payload.
In this article we will look at Streams and Buffers. Sometimes the processing speed is slower or faster than the rate at which chunks arrive; in both cases the chunks need to be held somewhere until processing can consume them, and that is exactly what buffers are for.
Buffers
Buffers are an abstraction that allows us to deal with raw binary data in Node.js. They are particularly relevant when dealing with files and networks or I/O in general.
A buffer represents a fixed-size chunk of memory allocated by our program: once created, its size cannot be changed. A buffer is used to store raw bytes.
Let’s create some buffers with some data:
// buffer-data.js
// Let's create some buffers with some data
const bufferFromString = Buffer.from('Ciao human')
const bufferFromByteArray = Buffer.from([67, 105, 97, 111, 32, 104, 117, 109, 97, 110])
const bufferFromHex = Buffer.from('4369616f2068756d616e', 'hex')
const bufferFromBase64 = Buffer.from('Q2lhbyBodW1hbg==', 'base64')
// data is stored in binary format
console.log(bufferFromString) // <Buffer 43 69 61 6f 20 68 75 6d 61 6e>
console.log(bufferFromByteArray) // <Buffer 43 69 61 6f 20 68 75 6d 61 6e>
console.log(bufferFromHex) // <Buffer 43 69 61 6f 20 68 75 6d 61 6e>
console.log(bufferFromBase64) // <Buffer 43 69 61 6f 20 68 75 6d 61 6e>
// Raw buffer data can be "visualized" as a string, as hex or base64
console.log(bufferFromString.toString('utf-8')) // Ciao human ('utf-8' is the default)
console.log(bufferFromString.toString('hex')) // 4369616f2068756d616e
console.log(bufferFromString.toString('base64')) // Q2lhbyBodW1hbg==
// You can get the size of a buffer (in bytes) by using `length`
console.log(bufferFromString.length) // 10
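To make the fixed-size nature of buffers concrete, here is a small illustrative sketch (the file name is hypothetical) that allocates a 4-byte buffer and tries to write more data than fits:
// buffer-fixed-size.js (illustrative sketch)
// Allocate a zero-filled buffer of exactly 4 bytes
const fixedBuffer = Buffer.alloc(4)
console.log(fixedBuffer) // <Buffer 00 00 00 00>
// Writing more data than fits is silently truncated: the size never changes
const bytesWritten = fixedBuffer.write('Ciao human')
console.log(bytesWritten) // 4
console.log(fixedBuffer.length) // 4
console.log(fixedBuffer.toString()) // Ciao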
Now, let’s create a Node.js script to copy a file from one place to another using buffers:
// buffer-copy.js
import {
  readFile,
  writeFile
} from 'fs/promises'
async function copyFile (src, dest) {
  // read the entire file content into memory
  const content = await readFile(src)
  // write that content somewhere else
  return writeFile(dest, content)
}
// `src` is the first argument from the cli, `dest` the second
// (process.argv[0] is the node binary and process.argv[1] is the script path)
const [,, src, dest] = process.argv
// start the copy and handle the result
copyFile(src, dest)
  .then(() => console.log(`${src} copied into ${dest}`))
  .catch((err) => {
    console.error(err)
    process.exit(1)
  })
You can use this script as follows:
node 01-buffer-vs-stream/buffer-copy.js source-file dest-file
But have you ever wondered what happens when you try to copy a big file, say about 3 GB?
What happens is that the script fails dramatically with the following error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3221225472) is greater than 2 GB
at readFileHandle (internal/fs/promises.js:273:11)
at async copyFile (file:///.../streams-workshop/01-buffer-vs-stream/buffer-copy.js:8:19) {
code: 'ERR_FS_FILE_TOO_LARGE'
}
Why is this happening? 😱
Essentially, because when we use fs.readFile we load all the binary content from the file into memory using a Buffer object. Buffers are, by design, limited in size because they live in memory.
✏️ Tip
You can create a buffer with the maximum allowed size with the following code:
// biggest-buffer.js
import buffer from 'buffer'
// Careful, this will allocate a few GBs of memory!
const biggestBuffer = Buffer.alloc(buffer.constants.MAX_LENGTH) // creates a buffer with the maximum possible size
console.log(biggestBuffer) // <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 4294967245 more bytes>
In a way, we can think about streams as an abstraction that allows us to deal with portions of data (chunks) arriving at different moments. Every chunk is a Buffer instance.
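A quick way to convince ourselves of this is to inspect the chunks emitted by a file read stream; the following is just an illustrative sketch (the file name is hypothetical):
// chunk-is-buffer.js (illustrative sketch)
import { createReadStream } from 'fs'
const stream = createReadStream('chunk-is-buffer.js')
stream.on('data', (chunk) => {
  // without an explicit encoding, every chunk emitted by a readable stream is a Buffer
  console.log(Buffer.isBuffer(chunk), chunk.length) // true <number of bytes>
})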
Streams
A stream is an abstract interface for working with streaming data in Node.js, such as a video or a large file. The Node.js stream module provides an API for implementing the stream interface, and Node.js offers many stream objects out of the box: for instance, a request to an HTTP server and process.stdout are both stream instances. Streams can be readable, writable, or both, and all streams are instances of EventEmitter. To access the stream module, use:
const stream = require('stream');
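To see that streams really are event emitters, a quick, purely illustrative check is enough:
// streams-are-event-emitters.js (illustrative sketch)
const { EventEmitter } = require('events');
const { PassThrough } = require('stream');
// process.stdout is a stream, and every stream inherits from EventEmitter
console.log(process.stdout instanceof EventEmitter); // true
console.log(new PassThrough() instanceof EventEmitter); // true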
Types of streams
There are four fundamental stream types within Node.js:
* [`Writable`](https://nodejs.dev/en/api/v18/stream/#class-streamwritable): streams to which data can be written (for example, [`fs.createWriteStream()`](https://nodejs.dev/en/api/v18/fs#fscreatewritestreampath-options)).
* [`Readable`](https://nodejs.dev/en/api/v18/stream/#class-streamreadable): streams from which data can be read (for example, [`fs.createReadStream()`](https://nodejs.dev/en/api/v18/fs#fscreatereadstreampath-options)).
* [`Duplex`](https://nodejs.dev/en/api/v18/stream/#class-streamduplex): streams that are both `Readable` and `Writable` (for example, [`net.Socket`](https://nodejs.dev/en/api/v18/net#class-netsocket)).
* [`Transform`](https://nodejs.dev/en/api/v18/stream/#class-streamtransform): `Duplex` streams that can modify or transform the data as it is written and read (for example, [`zlib.createDeflate()`](https://nodejs.dev/en/api/v18/zlib#zlibcreatedeflateoptions)).
Now, let's rewrite our copy script using streams:
// stream-copy.js
import {
  createReadStream,
  createWriteStream
} from 'fs'
const [,, src, dest] = process.argv
// create the source stream
const srcStream = createReadStream(src)
// create the destination stream
const destStream = createWriteStream(dest)
// when there's data on the source stream,
// write it to the dest stream
// WARNING, this solution is not perfect as we will see later
srcStream.on('data', (chunk) => destStream.write(chunk))
Essentially, we are replacing `readFile` with `createReadStream` and `writeFile` with `createWriteStream`. Those are then used to create two stream instances, `srcStream` and `destStream`. These objects are, respectively, a `Readable` stream (input) and a `Writable` stream (output).
For now, the only important detail to understand is that streams are not eager; they don't read all the data in one go. The data is read in chunks, that is, small portions of data. You can use a chunk as soon as it becomes available through the `data` event. When a new chunk of data is available in the source stream, we immediately write it to the destination stream. This way, we never have to keep all the file content in memory.
Keep in mind that this implementation is not bullet-proof; there are some rough edge cases (for instance, it ignores backpressure), but for now it is good enough to understand the basic principles of stream processing in Node.js!
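As a taste of what a more robust version could look like, here is a minimal sketch that uses `pipeline` from `stream/promises` (which we will meet again later in this article) to wire the two streams together so that errors and backpressure are handled for us; the file name is just a hypothetical label:
// stream-copy-pipeline.js (illustrative sketch)
import { createReadStream, createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'
const [,, src, dest] = process.argv
// pipeline connects the streams and propagates errors and backpressure for us
pipeline(createReadStream(src), createWriteStream(dest))
  .then(() => console.log(`${src} copied into ${dest}`))
  .catch((err) => {
    console.error(err)
    process.exit(1)
  })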
Readable streams → This stream is used to create a data stream for reading, for example, to read a large file in chunks.
Example:
const fs = require('fs');
const readableStream = fs.createReadStream('./article.md', {
  highWaterMark: 10
});
readableStream.on('readable', () => {
  process.stdout.write(`[${readableStream.read()}]`);
});
readableStream.on('end', () => {
  console.log('DONE');
});
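As an aside, the same readable stream can also be consumed with async iteration, which is often more convenient than wiring up events by hand; this is just a sketch, assuming the same `./article.md` file:
// read with async iteration (illustrative sketch)
const fs = require('fs');
async function readWithAsyncIteration() {
  const readableStream = fs.createReadStream('./article.md', {
    highWaterMark: 10
  });
  // each iteration yields the next chunk (a Buffer of up to 10 bytes here)
  for await (const chunk of readableStream) {
    process.stdout.write(`[${chunk.toString()}]`);
  }
  console.log('DONE');
}
readWithAsyncIteration().catch(console.error);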
Writable streams → This creates a stream of data to write, for example, to write a large amount of data to a file.
Example:
const fs = require('fs');
const file = fs.createWriteStream('file.txt');
for (let i = 0; i < 10000; i++) {
  file.write('Hello world ' + i);
}
file.end();
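One caveat with the loop above is that it ignores the return value of `write()`, so data can pile up in the writable stream's internal buffer faster than it is flushed to disk. A minimal sketch of a backpressure-aware variant (same file and message, only the control flow is new) could look like this:
// backpressure-aware writes (illustrative sketch)
const fs = require('fs');
const file = fs.createWriteStream('file.txt');
let i = 0;
function writeNext() {
  while (i < 10000) {
    const canContinue = file.write('Hello world ' + i++);
    if (!canContinue) {
      // the internal buffer is full: wait for 'drain' before writing more
      file.once('drain', writeNext);
      return;
    }
  }
  file.end();
}
writeNext();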
Duplex streams → This creates a stream that is both readable and writable at the same time.
Example:
const http = require('http');
const server = http.createServer((req, res) => {
  let body = '';
  req.setEncoding('utf8');
  req.on('data', (chunk) => {
    body += chunk;
  });
  req.on('end', () => {
    console.log(body);
    try {
      // Send 'Hello World' to the user
      res.write('Hello World');
      res.end();
    } catch (er) {
      res.statusCode = 400;
      return res.end(`error: ${er.message}`);
    }
  });
});
// listen on an arbitrary port so the example can actually be run
server.listen(8000);
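The fourth type from the list above, Transform streams, can be sketched in the same spirit; the following minimal example (purely illustrative, not tied to any particular library) upper-cases whatever flows through it:
// uppercase-transform.js (illustrative sketch)
const { Transform } = require('stream');
// a Transform stream receives chunks on its writable side,
// modifies them, and pushes the result out on its readable side
const uppercase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
});
// pipe stdin through the transform and out to stdout:
// node uppercase-transform.js  (type something and press Enter)
process.stdin.pipe(uppercase).pipe(process.stdout);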
Flowing and Non-flowing
A readable stream in Node.js can operate in one of two modes:
- Flowing — data is read from the underlying source automatically and delivered to your program as quickly as possible, chunk by chunk, through events such as `data`.
- Non-flowing (paused) — data is not pushed automatically; instead it accumulates in the stream's internal buffer, and your program must explicitly call the stream's `read()` method to consume it.
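As an illustration, here is a small sketch that consumes the same (hypothetical) `./article.md` file in both modes:
// flowing-vs-non-flowing.js (illustrative sketch)
const fs = require('fs');
// Flowing mode: attaching a 'data' listener makes chunks arrive automatically
const flowing = fs.createReadStream('./article.md');
flowing.on('data', (chunk) => {
  console.log('flowing chunk of', chunk.length, 'bytes');
});
// Non-flowing (paused) mode: chunks sit in the internal buffer
// until we explicitly pull them out with read()
const paused = fs.createReadStream('./article.md');
paused.on('readable', () => {
  let chunk;
  while ((chunk = paused.read()) !== null) {
    console.log('paused chunk of', chunk.length, 'bytes');
  }
});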
Memory / Time comparison
Let’s see how the two implementations (buffer and streaming) compare regarding memory usage and execution time.
One way to see how much memory a Node.js script is allocating in buffers is to call:
process.memoryUsage().arrayBuffers
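As a rough, hypothetical way to compare the two approaches, we could log that value together with the elapsed time around each copy implementation; the `measure` helper below and the assumption that each implementation exposes a promise-returning `copyFn` are mine, not part of the original scripts:
// measure.js (illustrative sketch)
// Compare memory and time for a promise-returning copy function (buffer- or stream-based)
async function measure(label, copyFn, src, dest) {
  const start = process.hrtime.bigint();
  await copyFn(src, dest);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const bufferBytes = process.memoryUsage().arrayBuffers;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms, ${bufferBytes} bytes allocated in buffers`);
}
Measured this way, the streaming copy typically keeps the buffer figure small and roughly constant, while the buffer-based copy grows with the size of the file. The streaming approach also composes well with other streams, as in the following example, which reads a file, gzips it, and writes the result using pipeline: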
const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');
const zlib = require('node:zlib');
async function run() {
  await pipeline(
    fs.createReadStream('archive.tar'),
    zlib.createGzip(),
    fs.createWriteStream('archive.tar.gz'),
  );
  console.log('Pipeline succeeded.');
}
run().catch(console.error);
Conclusion
We have discussed the four fundamental types of streams available to developers. Along with streams, we have covered what buffers are and how to work with them in Node.js.