AWS Closes S3 Read Stream Unexpectedly

I’m continuing with my notes on transferring big files to and from AWS S3 with node.js.

If you are reading a file from an S3 bucket through a stream that you occasionally pause, keep in mind that the read stream will be closed after 60 minutes.

If you cannot process the file within that time, you will receive a ‘data’ and an ‘end’ event even though you have not finished processing the file.

One possible solution here is to download the file before starting the import, process it and delete it once we don’t need it any more.

// So instead of:
const s3Stream = s3.getObject( params ).createReadStream();
const csvStream = fastCsv.fromStream( s3Stream, csvParams );
/* Do your processing of the csvStream */


// Store your file to the file system
const s3Stream = s3.getObject( params ).createReadStream();
const localFileWriteStream = fs.createWriteStream( path.resolve( 'tmp' , 'big.csv' ) );
s3Stream.pipe( localFileWriteStream );

localFileWriteStream.on( 'close', () => {
    const localReadStream = fs.createReadStream( path.resolve( 'tmp', 'big.csv' ) );

    const csvStream = fastCsv.fromStream( localReadStream , csvParams );

    csvStream.on( 'data', ( data ) => {
        /* Do your processing of the csvStream */
    });

    csvStream.on( 'end', () => {
        // Delete the tmp file
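        fs.unlink( path.resolve( 'tmp', 'big.csv' ), ( err ) => {
            // fs.unlink needs a callback; report a failed cleanup
            if ( err ) console.error( err );
        });
    });
});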
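
The same download-first idea can be sketched with Node's stream.pipeline, which also surfaces download errors and cleans up the temporary file when processing fails. This is a minimal sketch, not the code we run in production: withLocalCopy and processFile are hypothetical names, and it assumes a Node.js version that has fs.promises.rm (14.14 or later).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Hypothetical helper: copies `sourceStream` into a temp file, hands the
// local path to `processFile`, and always removes the temp directory afterwards.
async function withLocalCopy(sourceStream, processFile) {
    const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'import-'));
    const tmpPath = path.join(dir, 'big.csv');
    try {
        // pipeline() rejects if either stream errors, so a failed download is not silent.
        await pipelineAsync(sourceStream, fs.createWriteStream(tmpPath));
        return await processFile(tmpPath);
    } finally {
        // Runs on success and on failure, so no temp files are left behind.
        await fs.promises.rm(dir, { recursive: true, force: true });
    }
}
```

With the S3 stream from above you would call it as withLocalCopy( s3.getObject( params ).createReadStream(), importCsv ), where importCsv is your own function that opens the local path and returns a promise.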

Node.js Streams and why sometimes they don’t pause()

TL;DR:

If you pipe your node.js streams, make sure you pause the last one in the chain.

const stream1 = s3.getObject( params ).createReadStream();
const stream2 = fastCsv.fromStream( stream1 ); // This pipes stream1 behind the scenes.
// If you want to pause the streams, pause the last in the chain
stream2.pause();

The longer story:

We’re building a node.js application that ingests data from multiple data sources for a client of ours. Since the client is quite big, both in size and in user base (we’re going to process data for ~50 M users from tens of systems), the ingested CSV files are also relatively big (several GB).


We’re using AWS S3 as the glue for the data: the systems upload their data there and we monitor for new data to ingest. We use the aws-sdk node package to read the files as streams, parse them using fastCsv and create an audit log and snapshots for each user in a PG database.

We are batching the inserts and are pausing the data stream for each of the batches, so we don’t end up with a back-pressure problem.

While testing the ingestion of the big files we noticed something peculiar. We thought we had paused the stream, but it continued to push data as if .pause() had not been invoked.

The mistake we made turned out to be quite common when working with streams: we called the .pause() method of the s3Stream, which we had pipe()-d into another stream, the fastCsv one. In this scenario, when the fastCsv stream drained, it called the resume method of the s3Stream.

In order to pause the streams, one must pause() the last piped one (in our case the fastCsv one).
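
The batching described above can be sketched with plain Node streams. This is an illustration under assumptions, not our actual ingestion code: processInBatches and insertBatch are hypothetical names, and lastStream stands for whatever stream ends your pipe chain (the fastCsv stream in our case).

```javascript
// Hypothetical helper: collects rows from the LAST stream in a pipe chain,
// pauses that stream while a batch is being inserted, then resumes it.
function processInBatches(lastStream, batchSize, insertBatch) {
    return new Promise((resolve, reject) => {
        let batch = [];
        let pending = Promise.resolve(); // the insert currently in flight, if any

        lastStream.on('data', (row) => {
            batch.push(row);
            if (batch.length < batchSize) return;

            const full = batch;
            batch = [];
            // Pause the last stream; the streams piped into it then stop
            // through normal back-pressure.
            lastStream.pause();
            pending = pending
                .then(() => insertBatch(full))
                .then(() => lastStream.resume());
            pending.catch(reject);
        });

        lastStream.on('end', () => {
            // Wait for the in-flight insert, then flush the final partial batch.
            pending
                .then(() => (batch.length ? insertBatch(batch) : undefined))
                .then(resolve, reject);
        });

        lastStream.on('error', reject);
    });
}
```

Calling pause() on the source stream instead would not help here, because the piped destination resumes the source as soon as it drains.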

More on back-pressure: While researching our issue I found a very extensive article about back-pressure in node.js.

Testing with Jest in a node and ReactJS monorepo (and getting rid of environment.teardown error)

A large number of the applications we develop have at least one ReactJS UI, which is held in one repo, and an API, held in another. If we need to reuse some part of the code, we do so by moving it to another repository and adding it as a git submodule.

For our latest project we decided to give the monorepo approach a try (we have not yet concluded whether it fits our needs better). The project is a node.js API with a ReactJS app that is based on create-react-app.

The first issue we faced was with testing the node app: tests ran just fine in the react application (/app/), but if you tried to run them for the server, you’d get the following error:

● Test suite failed to run
TypeError: environment.teardown is not a function

  at ../node_modules/jest-runner/build/run_test.js:230:25

In our package.json we had the trivial test definition – just running jest:

"scripts": {
    ...
    "test": "jest"
    ...
}

We hadn’t had an issue with this approach in a node API with no CRA app in it. As it turned out, we had to indicate that the test environment is node.

To do so we created a testconfig.json and passed it to the test script in the package.json:

testconfig.json

{
	"testEnvironment": "node"
}

package.json

{
	"scripts": {
		"test": "jest --config=testconfig.json"
	}
}

If you want jest to monitor your files, change the “test” script to “jest --watchAll --config=testconfig.json”.

JS variable loaded using wp_localize_script is no longer available

TL;DR: Call the wp_localize_script after registering/enqueuing the script you are localizing.

Recently (WP 4.1.4 / WP 4.2) my ajax scripts stopped working. I noticed that the variable for the ajax_url was undefined, so for some reason it was no longer loaded on the page.

The reason turned out to be that I was using wp_localize_script to load the ajax url variable for a script that was not yet registered and enqueued.

Looking at the Codex now, I noticed that they have added: “IMPORTANT! wp_localize_script() MUST be called after the script it’s being attached to has been registered using wp_register_script() or wp_enqueue_script()”. However, I don’t remember this ever failing before, and it did work until now.

So make sure to check every site you have built that uses WP ajax and follows the good practice of loading the ajax url with wp_localize_script, to see if it is still working.

Retrieving data from a form created with Contact Form 7

If you want to get the data from a form created with Contact Form 7, you can use the ‘wpcf7_before_send_mail’ hook. In your functions.php or in your plugin, add an action as follows:

add_action( 'wpcf7_before_send_mail', 'my_plugin_wpcf7_before_send_mail' );

function my_plugin_wpcf7_before_send_mail ( $contact_form ) {
    // TODO: get the data
}

Since version 3.9, Contact Form 7 has removed $contact_form->posted_data, so this hook might seem like it is no longer working.

However, we can still get the data using a slightly different approach provided by the Contact Form 7 API:

function my_plugin_wpcf7_before_send_mail ( $contact_form ) {
    $wpcf7_submission = WPCF7_Submission::get_instance();

    $posted_data = $wpcf7_submission->get_posted_data();

    // in $posted_data you have the things sent via the form
    // try $posted_data[ 'your-email' ] to get the email from a default contact form
}

Using the $posted_data retrieved this way, we can get any of the fields that the user submitted through the form.

geeneric.com is live

We at Shtrak have been working lately on a WordPress and WooCommerce based platform for online shops.
You can create your eCommerce site for free on geeneric.com.

Let me know what you think about our service. I’m open to any ideas for improvement!

The native Facebook app on Android is no longer forcing you to use Messenger

Today I found out that the native Facebook app is no longer forcing me to use Messenger! At least on my Nexus 4.

Seems like in the end they understood that forcing a user to do something is never the right way.

From now on feel free to uninstall the crappy Messenger and use the native fb app instead.

Edit: Seems like it was a mistake with the update. So fb are still stupid.

JavaScript Events (Custom Events) without the DOM nonsense

Events are a very good technique for decoupling the modules in your code. With them you can write significantly better and more maintainable JS.

Even so, there are many cases where the objects in your code do not need to rely on the DOM tree to raise events (for example, when the object itself is represented in several places on the page, or has no representation at all).

That is why I wrote a super simple JS library, eventy.js. With it you can make any object in your code able to raise events and offer a way to subscribe to them.

It even works for classes you have defined yourself, and events can be raised from the class or from an instance of that class. (The same goes for subscribing.)

Try it at https://github.com/ninio/eventy.js

JavaScript Custom Events that are not DOM related

Events are a great technique for loosening code coupling. JS Custom Events were a big step in helping front-end developers write better JS.

Despite this, there are many cases where your app objects are not represented in the DOM tree. Why should you have to trigger an event on the DOM when you don’t want it to be DOM related at all?

This is the reason I wrote the JavaScript library eventy.js. It lets you enable an event interface for any object of your application. It works both for modules and for JS classes.
With classes, you can use the events both from an instance and from the class itself.

Go and check it on https://github.com/ninio/eventy.js
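
To illustrate the general idea of DOM-free events, here is a minimal sketch of an event mixin. It is not the eventy.js API (check the repository for that); makeEventable, on, off and trigger are names I made up for this example.

```javascript
// Hypothetical mixin, NOT the eventy.js API: adds on/off/trigger to any object.
function makeEventable(target) {
    const handlers = new Map(); // event name -> Set of callbacks

    target.on = (name, callback) => {
        if (!handlers.has(name)) handlers.set(name, new Set());
        handlers.get(name).add(callback);
        return target;
    };

    target.off = (name, callback) => {
        if (handlers.has(name)) handlers.get(name).delete(callback);
        return target;
    };

    target.trigger = (name, ...args) => {
        // Copy the set so handlers may unsubscribe while being called.
        for (const callback of [...(handlers.get(name) || [])]) {
            callback.apply(target, args);
        }
        return target;
    };

    return target;
}

// Works on a plain object...
const cart = makeEventable({ items: [] });
cart.on('added', (item) => console.log('added', item));
cart.trigger('added', 'book'); // logs: added book

// ...and on a class itself, since a class is also an object.
class Order {}
makeEventable(Order);
```

No DOM node is involved at any point, so the same code runs in the browser and in node.js.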