DV footprints on Disk and in Memory, Part 2

My previous blog post, comparing the footprints of the DV Leaders (Tableau 8.1, Qlikview 11.2, Spotfire 6) on disk (the size of the application file with an embedded 1-million-row dataset) and in memory (calculated as the RAM difference between the freshly loaded application without data and the same application after it loads the appropriate application file: XLSX, DXP, QVW or TWBX), got a lot of feedback from DV Blog visitors. It was even mentioned/referenced/quoted in Tableau Weekly #9 here:

http://us7.campaign-archive1.com/?u=f3dd94f15b41de877be6b0d4b&id=26fd537d2d&e=5943cb836b and the full list of Tableau Weekly issues is here: http://us7.campaign-archive1.com/home/?u=f3dd94f15b41de877be6b0d4b&id=d23712a896

The majority of the feedback asked for a similar benchmark: a footprint comparison for a larger dataset, say with 10 million rows. I did that, but it required more time and work, because the footprint in memory for all 3 DV Leaders depends on the number of visualized Datapoints (Spotfire has for years used the term Marks for visible Datapoints, and Tableau adopted this terminology too, so I use it from time to time as well, but I think the correct term here is “Visible Datapoints“).

Basically I used the same dataset as in the previous blog post, with the main difference that I took a subset with 10 million rows as opposed to the 1 million rows in the previous benchmarks. The diversity of the 10-million-row dataset is shown here (each row has 15 fields, as in the previous benchmark):

For the 10-million-row benchmarks I removed Excel 2013 (Excel cannot handle more than 1,048,576 rows per worksheet) and PowerPivot 2013 (it is less relevant for this benchmark). Here are the DV footprints on disk and in memory for the dataset with 10 million rows and different numbers of Datapoints (or Marks: <16, 1000, around 10000, around 100000, around 800000):

The main observations and notes from benchmarking the footprints with 10 million rows are as follows:

  • Tableau 8.1 requires less (almost half as much) disk space for its application file (.TWBX) than Qlikview 11.2 for its application file (.QVW) and/or Spotfire 6 for its application file (.DXP).

  • Tableau 8.1 is much smarter about RAM usage than Qlikview 11.2 and Spotfire 6, because it takes advantage of the number of Marks. For example, for 10000 Visible Datapoints Tableau uses 13 times less RAM than Qlikview and Spotfire, and for 100000 Visible Datapoints Tableau uses 8 times less RAM than Qlikview and Spotfire!

  • The usage of more than, say, 5000 Visible Datapoints (even, say, more than a few hundred Marks) in a particular chart or dashboard is often a sign of bad design or poor understanding of the task at hand; the human eye (of the end user) cannot comprehend too many Marks anyway, so what Tableau does (reducing the footprint in memory when fewer Marks are used) is good design.

  • For Tableau, in the results above I reported the total RAM used by 2 Tableau processes in memory: TABLEAU.EXE itself and the supplemental process TDSERVER64.EXE (this second 64-bit process almost always uses about 21MB of RAM). Note: Russell Christopher also suggested monitoring TABPROTOSRV.EXE, but I could not find any trace of it or its RAM usage during the benchmarks.

  • Qlikview 11.2 and Spotfire 6 have similar footprints in Memory and on Disk.
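As a side note on methodology: the per-process accounting described above (summing the RAM of TABLEAU.EXE and its helper TDSERVER64.EXE while ignoring unrelated processes) can be sketched roughly like this. The snapshot and its numbers below are made up for illustration; in practice you would collect the real values with Task Manager, Process Explorer, or a library such as psutil.

```python
def footprint_mb(snapshot, process_names):
    """Sum resident memory (bytes) of the named processes; return MB."""
    wanted = {name.lower() for name in process_names}
    total_bytes = sum(rss for name, rss in snapshot if name.lower() in wanted)
    return total_bytes / (1024 * 1024)

# Hypothetical snapshot of (process name, resident bytes) pairs:
snapshot = [
    ("TABLEAU.EXE", 850 * 1024 * 1024),    # main Tableau process
    ("TDSERVER64.EXE", 21 * 1024 * 1024),  # helper process (~21 MB)
    ("CHROME.EXE", 400 * 1024 * 1024),     # unrelated, must be excluded
]
print(footprint_mb(snapshot, ["TABLEAU.EXE", "TDSERVER64.EXE"]))  # 871.0
```

The point of summing only the named processes is that a tool's real footprint can be split across several OS processes, so measuring the main executable alone would undercount it.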

DV footprints on Disk and in Memory, Part 1

More than 2 years ago I estimated the footprints of a sample dataset (428999 rows and 135 columns) when encapsulated in a text file, in compressed ZIP format, in Excel 2010, in PowerPivot 2010, Qlikview 10, Spotfire 3.3 and Tableau 6. Since then everything has been upgraded to the “latest versions” and everything is 64-bit now, including Tableau 8.1, Spotfire 5.5 (and 6), Qlikview 11.2, Excel 2013 and PowerPivot 2013.

I decided to use the new dataset with exactly 1000000 rows (1 million rows) and 15 columns with the following diversity of values (Distinct Counts for every Column below):

Then I put this dataset into every application and format mentioned above, both on disk and in memory. All results are presented below for review by DV Blog visitors:

Some comments about application specifics:

  • Excel and PowerPivot XLSX files are ZIP-compressed archives of a bunch of XML files

  • Spotfire DXP is a ZIP archive of proprietary Spotfire text format

  • QVW is Qlikview’s proprietary Datastore-RAM-optimized format

  • TWBX is a Tableau-specific ZIP archive containing a TDE (Tableau Data Extract) and a TWB (XML format) data-less workbook

  • I calculated the footprint in memory as the RAM difference between the freshly loaded application (without data) and the same application after it loads the appropriate application file (XLSX or DXP or QVW or TWBX)
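The RAM-difference method in the last bullet can be expressed as a tiny helper: measure the application's resident memory when freshly started (empty), measure it again after the data file is loaded, and subtract. The numbers below are illustrative, not actual benchmark results.

```python
def data_footprint_mb(baseline_mb, loaded_mb):
    """In-memory footprint of the dataset itself: RAM of the application
    after loading the file, minus RAM of the empty application."""
    if loaded_mb < baseline_mb:
        raise ValueError("loaded measurement should not be below baseline")
    return loaded_mb - baseline_mb

# Hypothetical measurements for one tool:
empty_app_mb = 120.0   # freshly started, no data loaded
with_file_mb = 980.0   # after opening the 1M-row application file
print(data_footprint_mb(empty_app_mb, with_file_mb))  # 860.0
```

Subtracting the empty-application baseline isolates the cost of the data from the fixed cost of the application itself, which is what makes footprints comparable across tools with very different startup sizes.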

Happy Shopping for your Data Visualization Lab!

Since we are approaching (in the USA, that is) Thanksgiving Day 2013 and shopping is not a sin for a few days, multiple blog visitors asked me what hardware advice I can share for their Data Science and Visualization Lab(s). First of all, I hope you get a good turkey for Thanksgiving (below is what I got last year):

Turkey2012

I cannot answer DV Lab questions individually (everybody has their own needs, specifics and budget), but I can share my shopping thoughts about the needs of a Data Visualization Lab (DV Lab). I think a DV Lab needs many different types of devices: smartphones, tablets, a projector (at least 1), maybe a couple of large touchscreen monitors (or LED TVs connectable to PCs), multiple mobile workstations (depending on the size of the DV Lab team), at least one or two super-workstations/servers residing within the DV Lab, etc.

Smartphones and Tablets

I use a Samsung Galaxy S4 as of now, but for DV Lab needs I would consider either the Sony Xperia Z Ultra or the Nokia 1520, with the hope that the Samsung Galaxy S5 will be released soon (and maybe it will be the most appropriate for a DV Lab):

sonyVSnokia

My preference for a tablet would be the upcoming Google Nexus 10 (2013 or 2014 edition; it is not clear, because Google is very secretive about it) and in certain cases the Google Nexus 7 (2013 edition). Until the next-generation Nexus 10 is released, I guess the two leading choices are the ASUS Transformer Pad TF701T

t701

and the Samsung Galaxy Note 10.1 2014 edition (below is a relative comparison of the sizes of these 2 excellent tablets):

AsusVsNote10

Projectors, Monitors and maybe Cameras.

The next piece of hardware on my mind is a projector with support for full HD resolution and large screens. I think there are many good choices here, but my preference is the BenQ W1080ST for $920 (please advise if you have a better projector in mind in the same price range):

benq_W1080ST

So far you cannot find many touchscreen monitors for a reasonable price, so maybe these two 27″ touchscreen monitors (the DELL P2714T for $620 or the Acer T272HL bmidz for $560) are good choices for now:

dell-p2714t-overview1

I also think that a good digital camera can help a Data Visualization Lab, and I am considering something like this for myself (it can be bought for $300): the Panasonic Lumix DMC-FZ72, with 60X optical zoom and the ability to record HD video at 1,920 x 1,080 pixels:

panasonic_lumix_dmc_fz72_08

Mobile and Stationary Workstations and Servers.

If you need to choose a CPU, I suggest starting with Intel’s Processor Feature Filter here: http://ark.intel.com/search/advanced . In terms of mobile workstations, you can get a quad-core notebook (like the Dell Precision M4700 for $2400, the Dell Precision M4800, or the HP ZBook 15 for $3500) with 32 GB RAM and a decent configuration with multiple ports; see a sample here:

m4700

If you are OK with 16GB of RAM for your workstation, you may prefer the Dell M3800, with an excellent touchscreen monitor (3200×1800 resolution) and only 2 kg of weight. For a stationary workstation (or rather a server), good choices are the Dell Precision T7600 or T7610 or the HP Z820 workstation. Any of these workstations (it will cost you!) can support up to 256GB RAM, up to 16 cores (or even 24 in the case of the HP Z820), multiple high-capacity hard disks and SSDs, excellent video controllers and multiple monitors (4 or even 6!). Here is an example of the backplane of the HP Z820 workstation:

HP-z820

I wish the visitors of this blog Happy Holidays and good luck with their DV Lab shopping!

Data to the People: tb8.1 vs qv11.2 vs sf5.5

After the announcement of Tableau 8.1 (and the completion of TCC13) this week, people asked me to refresh my comparison of the leading Data Visualization tools, and I felt it was a good time to do it, because Tableau can finally claim it has a 64-bit platform and is now able to do more advanced analytics thanks to integration with R (both new features need to be benchmarked and tested, but until my benchmarks are completed I tend to believe Tableau’s claims). I actually feel that Tableau may have leapfrogged the competition, and now Qlikview and Spotfire have to do something about it (if they care, of course).

This week I enjoyed Tableau’s pun/wordplay/slogan “Data to the People”: it reminds one, of course, of the other slogan “Power to the People”, but it also indirectly refers to the NYSE symbol “DATA”, which is the ticker of Tableau Software Inc., so it indirectly means “Tableau to the People”:

DataToThePeople2

In fact, the “keynote propaganda” from Christian Chabot and Chris Stolte was so close to what I have been saying for years on this blog that I used their slogan FEBA4A (“Fast, Easy, Beautiful, Anywhere, for Anyone”) as the filter for including in or removing from the comparison any runner-up, traditional, me-too and losing tools and vendors.

For example, despite the huge recent progress Microsoft has made with its BI stack (updates in Office 2013, 365 and SQL 2012/14 to Power Pivot/View/Map/Query, SSAS, Data Explorer, Polybase, Azure services, StreamInsight, in-memory OLTP, columnstore indexing, etc.), I removed Microsoft’s BI stack from the comparison (MSFT is still trying to sell Data Visualization as a set of add-ins to Excel and SQL Server, as opposed to a separate product), because it is not FEBA4A.

For similar reasons I did not include runner-ups like Omniscope, Advizor, Panopticon (now part of Datawatch) and Panorama; traditional BI vendors like IBM, Oracle, SAP, SAS and Microstrategy; and many me-too vendors like Actuate, Pentaho, Information Builders, Jaspersoft, Jedox, Yellowfin, Bime and dozens of others. I was even finally able to rule out wonderful toolkits like D3 (because they are not for “anyone”: they require brilliant people like Mike Bostock to shine).

I was glad to see similar thinking from Tableau’s CEO in his interview yesterday here: http://news.investors.com/091213-670803-tableau-takes-on-big-rivals-oracle-sap-ibm-microsoft.htm?p=full and I quote:

“The current generation of technology that companies and governments use to try to see and understand the data they store in their databases and spreadsheets is without exception complicated, development-intensive, staff-intensive, inflexible, slow-moving and expensive. And every one of those adjectives is true for each of the market-share leaders in our industry.”

Here is my brief and extremely personal (yes, opinionated, but not biased) comparison of the 3 leading Data Visualization (DV Comparison) platforms (if you cannot see it in your browser, see the screenshot of the Google Doc below):

I did not add pricing to the comparison, because I cannot find enough public info about it. This is all I have:

  • https://tableau.secure.force.com/webstore

  • http://www.qlikview.com/us/explore/pricing

  • https://silverspotfire.tibco.com/us/get-spotfire/silver-spotfire-feature-matrix

  • additional pricing info for Tableau Server core licensing: “an 8-core server (enough to support 1,000 users, or 100 concurrent) for Tableau is $180K the first year, about $34K every year after year 1 for maintenance”. With 8-core licensing I actually witnessed support for more than 1,000 users: 1,300+ active interactors, 250+ publishers, 3,000+ viewers. I also witnessed (2+ years ago; the price has grown since then) more than once that negotiation with Tableau Sales can get you down to $160K for an 8-core license with 20% every year after year 1 for maintenance (so in 2010-2011 the total price was about $192K with 1 year of maintenance)

  • Also, one visitor indicated to me that the current price for an 8-core Tableau 8.0 license is now $240K for the 1st year, plus (mandatory?) 20-25% maintenance for the 1st year… However, negotiations are very possible and can get you a 20-25% “discount”. I am aware of recent cases where an 8-core license was sold (after discount) for around $195K, with 1st-year maintenance at about $45K, so the total sale was $240K including 1st-year maintenance (a 25% growth in price over the last 3 years).

Below is a screenshot of the above comparison, because some browsers (e.g. Safari, or Firefox before version 24) cannot display either a Google Doc embedded into WordPress or the Google Doc itself:

DVComparisonSeptember2013

Please note that I did not quantify above which of the 3 tools is better; that is not possible until I repeat all benchmarks and tests (I did many of those in the past; if I have time in the future, I can do it again) once the actual Tableau 8.1 is released (see the latest here: https://licensing.tableausoftware.com/esdalt/ ). However, above I used green for good and red for bad (light-colored backgrounds in the 3 middle columns indicate good/bad). Also keep in mind that Qliktech and TIBCO may release something new soon enough (say Qlikview 12, which they now call Qlikview.Next, and Spotfire 6), so the leapfrogging game may continue.

Update 10/11/13: an interesting article about Tableau (in the context of Qlikview and Spotfire) by Akram Annous from SeekingAlpha: http://seekingalpha.com/article/1738252-tableau-a-perfect-short-with-a-catalyst-to-boot . Akram is a very active visitor to my blog, especially to the article above. His article is only 1 month old but already needs updates due to recent pre-announcements about Qlikview.Next (Qlikview 12) and Spotfire 6, which, as I predicted, show that the leapfrogging game continues at full speed. Akram is brave enough to “target” the price of DATA shares at $55 in 30 days and $35 in 6 months. I am not convinced yet.

frogleap4

If you see the AD below, it is not me, it is wordpress.com…