DV footprints on Disk and in Memory, Part 1

More than 2 years ago I estimated the footprints of a sample dataset (428999 rows and 135 columns) when it was encapsulated in a text file, in compressed ZIP format, in Excel 2010, in PowerPivot 2010, Qlikview 10, Spotfire 3.3 and Tableau 6. Since then everything has been upgraded to the “latest versions” and everything is 64-bit now, including Tableau 8.1, Spotfire 5.5 (and 6), Qlikview 11.2, Excel 2013 and PowerPivot 2013.

I decided to use a new dataset with exactly 1000000 rows (1 million rows) and 15 columns with the following diversity of values (Distinct Counts for every Column below):
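Per-column distinct counts like these are easy to reproduce; a minimal sketch in Python with pandas, using a small synthetic stand-in dataset (the real column names and cardinalities from the post are not reproduced here):

```python
import numpy as np
import pandas as pd

# Build a synthetic stand-in for the post's 1,000,000-row dataset;
# column names and value ranges below are illustrative only.
rng = np.random.default_rng(0)
n = 1_000_000
df = pd.DataFrame({
    "Gender": rng.choice(["M", "F"], size=n),                   # 2 distinct values
    "State":  rng.choice([f"S{i}" for i in range(50)], size=n), # up to 50 distinct
    "RowID":  np.arange(n),                                     # all values distinct
})

# nunique() returns the distinct count for every column.
distinct_counts = df.nunique()
print(distinct_counts)
```

The same one-liner (`df.nunique()`) works on any dataset loaded from CSV or Excel, which is how such a "diversity of values" table can be produced.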

Then I put this dataset into every application and format mentioned above – both on disk and in memory. All results are presented below for review by DV blog visitors:

Some comments about application specifics:

  • Excel and PowerPivot XLSX files are ZIP-compressed archives of a bunch of XML files

  • Spotfire DXP is a ZIP archive of a proprietary Spotfire text format

  • QVW is Qlikview’s proprietary, RAM-optimized datastore format

  • TWBX is a Tableau-specific ZIP archive containing a TDE (Tableau Data Extract) and a data-less TWB workbook (an XML format)

  • The footprint in memory I calculated as the RAM difference between the freshly loaded application (without data) and the same application after it loads the appropriate file (XLSX, DXP, QVW or TWBX)
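The RAM-differencing idea in the last bullet can be sketched in a few lines of Python using the third-party psutil library; here a plain list allocation stands in for "loading a workbook", since the actual applications above expose no such hook:

```python
import os
import psutil  # third-party; pip install psutil

proc = psutil.Process(os.getpid())

def rss_mb() -> float:
    """Resident set size (physical RAM used) of this process, in MB."""
    return proc.memory_info().rss / (1024 * 1024)

before = rss_mb()  # baseline: "freshly loaded application, without data"
# Stand-in for loading a data file: allocate a sizable in-memory structure.
data = [list(range(100)) for _ in range(100_000)]
after = rss_mb()   # same process after the "data" is loaded

print(f"Approximate in-memory footprint: {after - before:.1f} MB")
```

For a real desktop application the same measurement is done externally: find the application's PID, read its RSS before and after opening the file, and take the difference. Note that RSS excludes pages swapped out to virtual memory, which matters for the discussion below.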


5 thoughts on “DV footprints on Disk and in Memory, Part 1”

  1. Thanks, interesting stuff. Really surprised about the difference in footprint between Spotfire and Tableau. So does this essentially mean you can fit 3-4x as much data into memory with Tableau as you could with Spotfire, given a machine with the same amount of RAM?

    • Hi Steve: the comparison above is not meant to suggest that Tableau can work with more data in memory. Many other factors are involved, e.g. the usage of disk space as virtual memory. Both Spotfire and Tableau use virtual memory when RAM is not available, and as the dataset grows this will affect the RAM footprint for sure.
      Qlikview does not use virtual memory; until v11.2 Qlikview required all data to be loaded into RAM, but in v11.2 it introduced Direct Discovery, which allows connecting to disk-resident data. Using Direct Discovery can actually slow Qlikview down.
      In any case, the modern proliferation of SSDs can improve the speed of virtual memory and the speed of other DV interactions with disks.

  2. Hello Andrei, can you possibly share your new sample dataset? It would be great if it were available, since it would make your comparative test even more valuable. With the data, people can compare additional products using your results as a benchmark.

  3. Pingback: DV footprints on Disk and in Memory, Part 2 | Data Visualization
