There are clear performance issues with the larger datasources: the latter two (500,000 and 1,000,000 entries) are unusable within the current setup, even with the datasource content restricted to name and coordinates. This strongly supports Nick's conclusion that it is the rendering of the markers, rather than the datasources themselves, that is so process-intensive. Also, at the datasource sizes we are using, load times quite rapidly stop increasing linearly.
I have included two smaller datasources, of 10,000 and 50,000 entries, for comparison.
Obviously the times below will vary due to differences in people's machines, but they provide a good indication.