I’m continuing with my notes on transferring big files to and from AWS S3 with Node.js.
If you are reading a file from an S3 bucket through a stream that you occasionally pause, be aware that the read stream will be closed after 60 minutes.
If you cannot process the whole file within that time, you will still receive the remaining ‘data’ events and an ‘end’ event, even though you haven’t actually finished reading the file.
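For illustration, the kind of pausing I mean looks roughly like the sketch below. It assumes the aws-sdk v2 S3 client and the fast-csv fromStream API used in the rest of these notes; saveRecord is a hypothetical slow asynchronous operation standing in for whatever per-row work you do, and params / csvParams are placeholders.
// A sketch of the pattern that runs into the timeout (assuming aws-sdk v2 and fast-csv's fromStream)
const AWS = require( 'aws-sdk' );
const fastCsv = require( 'fast-csv' );

const s3 = new AWS.S3();
const s3Stream = s3.getObject( params ).createReadStream();
const csvStream = fastCsv.fromStream( s3Stream, csvParams );

csvStream.on( 'data', ( data ) => {
    // Pause the stream while each record is handled…
    csvStream.pause();
    // …and resume once the (hypothetical, slow) asynchronous work is done
    saveRecord( data ).then( () => csvStream.resume() );
});

// If the pauses add up to more than the timeout, 'end' fires before the whole file has been read
csvStream.on( 'end', () => console.log( 'Import finished' ) );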
One possible solution is to download the file to the local file system before starting the import, process it from there, and delete it once you no longer need it.
// So instead of:
const s3Stream = s3.getObject( params ).createReadStream();
const csvStream = fastCsv.fromStream( s3Stream, csvParams );
/* Do your processing of the csvStream */
// Store your file to the file system first
// ( s3, fastCsv, params and csvParams are the same as in the snippet above )
const fs = require( 'fs' );
const path = require( 'path' );

const s3Stream = s3.getObject( params ).createReadStream();
const localFileWriteStream = fs.createWriteStream( path.resolve( 'tmp', 'big.csv' ) );
s3Stream.pipe( localFileWriteStream );

localFileWriteStream.on( 'close', () => {
    // The whole file is now on disk, so we no longer depend on the S3 connection
    const localReadStream = fs.createReadStream( path.resolve( 'tmp', 'big.csv' ) );
    const csvStream = fastCsv.fromStream( localReadStream, csvParams );

    csvStream.on( 'data', ( data ) => {
        /* Do your processing of the csvStream */
    });

    csvStream.on( 'end', () => {
        // Delete the tmp file once the import is done
        fs.unlink( path.resolve( 'tmp', 'big.csv' ), ( err ) => {
            if ( err ) {
                console.error( err );
            }
        });
    });
});
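Two small caveats the snippet above glosses over: fs.createWriteStream will not create the tmp directory for you, so it has to exist already, and neither stream has an ‘error’ handler, so a dropped connection could leave the import hanging. A minimal sketch of the latter (my addition, not part of the original snippet):
// Abort the local write if the S3 download fails, and log failures of the tmp file write
s3Stream.on( 'error', ( err ) => {
    console.error( 'S3 download failed:', err );
    localFileWriteStream.destroy( err );
});

localFileWriteStream.on( 'error', ( err ) => {
    console.error( 'Could not write the tmp file:', err );
});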