Happy New 2014!

My Best Wishes for 2014 to all visitors of this Blog!


2013 was a very successful year for the Data Visualization (DV) community, Data Visualization vendors and this Data Visualization Blog (the number of visitors grew from an average of 16000 to 25000+ per month).

From a certain point of view, 2013 was the year of Tableau: it went public, it now has the largest Market Capitalization among DV Vendors (more than $4B as of today), its strategy (Data to the People!) became the most popular among DV users, and it had (again) the largest YoY revenue growth (almost 75%!) among DV Vendors. Tableau already employs more than 1100 people and still has 169+ job openings as of today. I wish Tableau to stay the Leader of our community and to keep its YoY growth above 50%; this will not be easy.

Qliktech is the largest DV Vendor, and in 2014 it will exceed the half-billion-dollar mark in revenue (probably closer to $600M by the end of 2014) and will employ almost 2000 people. Qlikview is one of the best DV products on the market. I wish that in 2014 Qlikview will create Cloud Services similar to Tableau Online and Tableau Public, and I wish Qlikview.Next will keep Qlikview Desktop Professional (in addition to the HTML5 client).

I wish TIBCO will stop trying to improve BI or make it better (you cannot reanimate a dead horse); instead I wish Spotfire will embrace the “Data to the People” approach and act accordingly. For Spotfire my biggest wish is that TIBCO will spin it off the same way EMC did with VMware. And yes, I wish Spotfire Cloud Personal will be free and able to read at least local flat files and local DBs like Access.

2014 (or maybe 2015?) may witness a new, 4th DV player joining the competition: Datawatch recently bought Panopticon, and if it completes the integration of all products correctly and adds features which the other DV vendors above already have (like Cloud Services), it can be a very competitive player. I wish them luck!


Microsoft released in 2013 a lot of advanced and useful DV-related functionality, and I wish (I have been recycling this wish for many years now) that Microsoft will finally package most of its Data Visualization functionality into one DV product and add it to Office 20XX (like they did with Visio) and Office 365, instead of a bunch of plug-ins for Excel and SharePoint.

It is a mystery to me why Panorama, Visokio and Advizor Solutions are still relatively small players, despite all 3 of them having excellent DV features and products. Based on Tableau's 2013 IPO experience, maybe the best way for them is to go public and get new blood? I wish them to learn from the success of Tableau and Qlikview and try this path in 2014-15…

For Microstrategy my wish is very simple: they are the only traditional BI player who realised that BI is dead, and they started in 2013 (actually before 2013) a transition into the DV market. I wish them all the success they can handle!

I also think that a few thousand Tableau, Qlikview and Spotfire customers (say 5% of the customer base) will need (in 2014 and beyond) deeper Analytics, and they will try to complement their Data Visualizations with Advanced Visualization technologies they can get from vendors like http://www.avs.com/

My best wishes to everyone! Happy New Year!



to DV or to D3 – that is the question

The most popular approach to visualization (among business users) is to use a Data Visualization (DV) tool like Tableau (or Qlikview or Spotfire), where a lot of features are already implemented for you. A recent proof of this amazing popularity is that at least 100 million people (as of February 2013) used Tableau Public as their Data Visualization tool of choice, see

http://www.tableausoftware.com/about/blog/2013/2/crossing-100-million-milestone-21304

However, to make your documents and stories (and not just your data visualization applications) driven by your data, you may need the other approach: coding the visualization of your data into your story, where visualization libraries like the popular D3 toolkit can help you. D3 stands for “Data-Driven Documents”. D3's author, Mike Bostock, designs interactive graphics for the New York Times; one of the latest samples is here:

http://www.nytimes.com/interactive/2013/02/20/movies/among-the-oscar-contenders-a-host-of-connections.html

and the NYT allows him to do a lot of Open Source work, which he demonstrates here:

https://github.com/mbostock/d3/wiki/Gallery .


Mike was a “visualization scientist” and a computer science PhD student at Stanford University and a member of the famous group of people now called the “Stanford Visualization Group”:

http://vis.stanford.edu/people/

This Visualization Group was the birthplace of Tableau’s prototype, which they sometimes called “a Visual Interface” for exploring data and which is also known as Polaris:

http://www.graphics.stanford.edu/projects/polaris/

and we know that the creators of Polaris started Tableau Software. Another of the Group’s popular “products” was a graphical toolkit for Visualization (mostly in JavaScript, as opposed to Polaris, which was written in C++) called Protovis:

http://mbostock.github.com/protovis/

and Mike Bostock was one of Protovis’s main co-authors. Less than 2 years ago the Visualization Group suddenly stopped developing Protovis and recommended that everybody switch to the D3 library

https://github.com/mbostock,

authored by Mike. This library is Open Source (only 100KB in ZIP format) and can be downloaded from here:

http://d3js.org/d3.v3.zip


In order to use D3, you need to be comfortable with HTML, CSS, SVG, JavaScript programming, the DOM (and other Web Standards); an understanding of the jQuery paradigm will be useful too. Basically, if you want to be at least partially as good as Mike Bostock, you need to have the mindset of a programmer (I guess in addition to a business user's mindset), like this D3 expert:

http://www.jasondavies.com/

Most successful early D3 adopters combine 3 or more mindsets: programmer, business analyst, data artist and sometimes even data storyteller. For your programmer’s mindset, you may be interested to know that D3 has a large set of Plugins, see:

https://github.com/d3/d3-plugins

and a rich API, see https://github.com/mbostock/d3/wiki/API-Reference

You can find hundreds of D3 demos, samples, examples, tools, products and even a few companies using D3 here: https://github.com/mbostock/d3/wiki/Gallery


Tableau as the front-end for Big Data

Big Data can be useless without multi-layer data aggregations and hierarchical or cube-like intermediary Data Structures, where ONLY a few dozen, a few hundred or a few thousand data points are exposed visually and dynamically at any single viewing moment to analytical eyes, for interactive drill-down-or-up hunting for business value(s) and actionable datum (or “datums”, if the plural means data). One of the best expressions of this concept (at least as I interpreted it) I heard from my new colleague, who flatly said:

“Move the function to the data!”
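One way to read that advice is to let the database compute the aggregates and send back only the handful of rows a chart actually needs. The sketch below is only an illustration of that idea: it uses Python's built-in sqlite3 module with an in-memory database, and the `sales` table, its columns and the sample rows are hypothetical, not taken from any of the projects mentioned here.

```python
import sqlite3


def aggregate_in_database(conn):
    """Ask the database for per-region totals instead of pulling raw rows."""
    cursor = conn.execute(
        "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
        "FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


# Hypothetical sample data standing in for a very large fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10.0), ("East", 5.0), ("West", 7.5), ("West", 2.5), ("West", 1.0)],
)

# Only one row per region crosses the wire, however many raw rows exist.
summary = aggregate_in_database(conn)
print(summary)
```

With a real back-end the same GROUP BY would run on the server, so the visualization layer receives a few summary rows rather than millions of records.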

I recently got involved with multiple projects using large data-sets for Tableau-based Data Visualizations (100+ million rows and even billions of records!). The largest examples I used were one with 800+ million records and another with 2+ billion rows.

So this blog post expresses my thoughts about such Big Data as a back-end for Tableau (on average, the examples above have about 1+ KB per CSV record before compression and other advanced DB tricks, like the columnar storage used by Tableau's Data Engine).

Here are some factors involved in Data Delivery from the main, designated Database (back-ends like Teradata, DB2, SQL Server or Oracle) into “local” Tableau-based Big Data Visualizations (many people are still trying to use Tableau as a Reporting tool as opposed to a (Visual) Analytical Tool):

  • Queuing of thousands of queries at the Database Server. There is no guarantee your Tableau query will be executed immediately; in fact it WILL be delayed.
  • The speed of a Tableau query, once it starts executing, depends on sharing CPU cycles, RAM and other resources with other queries executed SIMULTANEOUSLY with your query.
  • Buffers, pools and other resources available to particular user(s) and queries at your Database Server differ and depend on the privileges and settings given to you as a Database User.
  • Network speed: between some servers it can be 10 Gbit/s (or even more), in most cases it is 1 Gbit/s inside server rooms, and outside of server rooms I have observed in many old buildings (over wired Ethernet) a maximum of 100 Mbit/s coming into the user’s PC; if you are using Wi-Fi it can be even less (say 54 Mbit/s?). If you are using the internet it can be less still (I have observed speeds of 1 Mbit/s or so in some remote offices over old T-1 lines); if you are using VPN it will max out at 4 Mbit/s or less (I observed that in my home office).
  • Utilization of the network. I use Remote Desktop Protocol (RDP) from my workstation or notebook to a VM (a VM or VDI Virtual Machine sitting in the server room) connected to servers at a network speed of 1 Gbit/s, but it still uses at most 3% of the network speed (about 30 Mbit/s, which is roughly 3 to 4 Megabytes of data per second, which is probably a few thousand records per second).

That means that the network may have a problem delivering 100 million records to a “local” report overnight (say 10 hours, 10 million records per hour, about 3000 records per second), partially and probably because of the factors listed above.
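The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are mine, the 1 KB record size is the rough average mentioned earlier in this post, and it ignores protocol overhead, compression and query time.

```python
def required_records_per_second(total_records, hours):
    """Delivery rate needed to move total_records within the time window."""
    return total_records / (hours * 3600)


def link_records_per_second(link_mbit_per_s, utilization, record_bytes):
    """Records per second a link can carry at the given utilization."""
    bytes_per_second = link_mbit_per_s * 1_000_000 * utilization / 8
    return bytes_per_second / record_bytes


# 100 million records overnight (10 hours).
needed = required_records_per_second(100_000_000, 10)

# 1 Gbit/s link used at 3%, with roughly 1 KB (1024 bytes) per record.
available = link_records_per_second(1000, 0.03, 1024)

print(f"needed:    {needed:.0f} records/s")
print(f"available: {available:.0f} records/s")
```

Under these assumptions the link carries only slightly more than the required rate, so any queuing or contention on the database side can easily push an overnight load past its window.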

On top of those factors, please keep in mind that Tableau is a set of 32-bit applications (with the exception of one out of 7 processes on the Server side), each restricted to 2GB of RAM; if the data-set cannot fit into RAM, then the Tableau Data Engine will use the disk as Virtual RAM, which is much, much slower, and for some users such disk space is actually not local to their workstation but mapped to some “remote” network file server.

Tableau Desktop in many cases uses 32-bit ODBC drivers, which may add even more delay to data delivery into the local “Visual Report”. As we learned from Tableau support itself, even with the latest Tableau Server 7.0.X, the RAM allocated for one user session is restricted to 3GB anyway.

Unfortunate Update: Tableau 8.0 will be a 32-bit application again, but maybe a follow-up version 8.x or 9 (I hope) will be ported to 64 bits… It means that Spotfire, Qlikview and even PowerPivot will keep some advantages over Tableau for a while…

Tableau as a Container and as PowerPoint replacement

Often I have used small Tableau workbooks instead of PowerPoint, which proves at least 2 concepts:

  • Tableau can be used as the Web or Desktop Container for Multiple Data Visualizations (it can be used to build hierarchical Container Structures with 3 or more levels; currently 3: Container-Workbooks-Views)
  • It can be used as the replacement for PowerPoint; in the example I embedded into this Container 2 Tableau Workbooks, one Google-based Data Visualization, 3 image-based Slides and a Textual Slide: http://public.tableausoftware.com/views/TableauInsteadOfPowerPoint/1-Introduction
  • Tableau is better than PowerPoint for Presentations and Slides
  • Tableau is the Desktop and the Web Container for Web Pages, Slides, Images, Texts
  • Tableau is a Container for Tableau-based Data Visualizations
  • The sample Tableau Presentation above contains an Introductory Textual Slide
  • The sample Tableau Presentation above contains a few Tableau Visualizations:
    • DrillDown Demo
    • Motion Chart Demo (6 dimensions: X, Y, Shape, Color, Size, Motion in Time)

  • This Tableau Presentation contains a Web Page with Google-based Motion Chart Demo
  • This Tableau Presentation contains a few Image-based Slides:
    • Quick Description of Origins and Evolution of Software and Tools used for Data Visualizations during last 30+ years
    • Description of Multi-level Projection from Multidimensional Data Cloud to Datasets, Multidimensional Cubes and to Chart
    • Description of 6 stages of Software Development Life Cycle for Data Visualizations

Palettes and Colors

I have always been intrigued by colors and their usage, ever since my mom told me that maybe (just maybe, there is no direct proof of it anyway) the Ancient Greeks did not know what the color BLUE is; that puzzled me.

Later in my life, I realized that Colors and Palettes play a huge role in Data Visualization (DV), and it eventually led me to try to understand how they can be used and pre-configured in advanced DV tools to make Data more Visible and to express Data Patterns better. For this post I used Tableau to produce some palettes, but similar techniques can be found in Qlikview, Spotfire etc.

Tableau published a good article on how to create customized palettes here: http://kb.tableausoftware.com/articles/knowledgebase/creating-custom-color-palettes and I followed it below. As this article recommended, I modified the default Preferences.tps file; see it below.

For the first, regular Red-Yellow-Green-Blue Palette, made of known colors with well-established names, I even created a Visualization in order to compare their Red-Green-Blue components, and I tried to place the respective Bubbles on a 2-dimensional surface, even though originally it is clearly a 3-dimensional Dataset.

For the 2nd, Red-Yellow-Green-NoBlue Ordered Sequential Palette, I tried to implement the extended “Set of Traffic Lights without any trace of BLUE Color” (so Homer and Socrates will understand it the same way as we do) while trying to use only web-safe colors. Please keep in mind that Tableau does not have a simple way to have more than 20 colors in one Palette, like Spotfire does.
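The two checks described above (splitting a color into its Red-Green-Blue components, and confirming that a color is web-safe) are easy to script. The following is a minimal Python sketch; the helper names are my own, and “web-safe” is taken to mean that every component is a multiple of 0x33, which is the usual definition of the 216-color web palette.

```python
def hex_to_rgb(color):
    """Split a '#RRGGBB' string into its (red, green, blue) components."""
    value = color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def is_web_safe(color):
    """True when every component is one of 00, 33, 66, 99, CC, FF."""
    return all(component % 0x33 == 0 for component in hex_to_rgb(color))


# The six colors of the RedYellowGreenNoBlueOrdered palette from this post.
no_blue_palette = ["#ff0000", "#cc6600", "#cccc00", "#ffff00", "#99cc00", "#009900"]

for color in no_blue_palette:
    print(color, hex_to_rgb(color), is_web_safe(color))
```

The same helpers can be pointed at any other palette in the Preferences.tps file to see which of its colors fall outside the web-safe set.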

The other 5 Palettes below are useful too, as ordered-diverging, almost “mono-chromatic” palettes (except Red-Green Diverging, which can be used in Scorecards where Red is bad and Green is good). So see below the Preferences.tps file with my 7 custom palettes.

<?xml version='1.0'?> <workbook> <preferences>
<color-palette name="RegularRedYellowGreenBlue" type="regular">
<color>#FF0000</color> <color>#800000</color> <color>#B22222</color>
<color>#E25822</color> <color>#FFA07A</color> <color>#FFFF00</color>
<color>#FF7E00</color> <color>#FFA500</color> <color>#FFD700</color>
<color>#F0e68c</color> <color>#00FF00</color> <color>#008000</color>
<color>#00A877</color> <color>#99cc33</color> <color>#009933</color>
<color>#0000FF</color> <color>#00FFFF</color> <color>#008080</color>
<color>#FF00FF</color> <color>#800080</color>
</color-palette>

<color-palette name="RedYellowGreenNoBlueOrdered" type="ordered-sequential" >
<color>#ff0000</color> <color>#cc6600</color> <color>#cccc00</color>
<color>#ffff00</color> <color>#99cc00</color> <color>#009900</color>
</color-palette>

<color-palette name="RedToGreen" type="ordered-diverging" >
<color>#ff0000</color> <color>#009900</color> </color-palette>

<color-palette name="RedToWhite" type="ordered-diverging" >
<color>#ff0000</color> <color>#ffffff</color></color-palette>

<color-palette name="YellowToWhite" type="ordered-diverging" >
<color>#ffff00</color> <color>#ffffff</color></color-palette>

<color-palette name="GreenToWhite" type="ordered-diverging" >
<color>#00ff00</color> <color>#ffffff</color></color-palette>

<color-palette name="BlueToWhite" type="ordered-diverging" >
<color>#0000ff</color> <color>#ffffff</color> </color-palette>
</preferences> </workbook>
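If you maintain many palettes, typing this XML by hand gets error-prone (a single wrong quote character makes the file unreadable). As a hedged sketch, the same structure can be generated with Python's standard xml.etree.ElementTree module; the `build_preferences` helper is my own naming, and the element and attribute names simply mirror the file shown above.

```python
import xml.etree.ElementTree as ET


def build_preferences(palettes):
    """Build Preferences.tps-style XML from (name, type, colors) tuples."""
    workbook = ET.Element("workbook")
    preferences = ET.SubElement(workbook, "preferences")
    for name, palette_type, colors in palettes:
        palette = ET.SubElement(
            preferences, "color-palette", {"name": name, "type": palette_type}
        )
        for color in colors:
            ET.SubElement(palette, "color").text = color
    return ET.tostring(workbook, encoding="unicode")


# One of the palettes from the file above, used as sample input.
xml_text = build_preferences(
    [("RedToGreen", "ordered-diverging", ["#ff0000", "#009900"])]
)
print(xml_text)
```

The generated text still needs the XML declaration line in front of it before being saved as Preferences.tps, and the result should be checked in Tableau itself.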

If you wish to use colors you like, this site is very useful for exploring the properties of different colors: http://www.perbang.dk/rgb/

Free Tableau Reader enables Server-less Visualization!

Tableau made a couple of brilliant decisions to completely outsmart its competitors and gained extreme popularity, while convincing millions of potential, future and current customers to invest their own time to learn Tableau. The 1st reason is of course Tableau Public (we discuss it in a separate blog post) and the other is the Free Tableau Reader, which provides a full desktop user experience and interactive Data Visualization without any Tableau Server (or any other server) involved, and with better performance and UI than Server-based Visualizations.

While Data Visualizations are designed with Tableau Desktop, most users get their Data Visualizations served by Tableau Server to their Web Browser. However, in both large and small organizations that usage pattern is not always the best fit. Below I discuss a few possible use cases where the usage of the Free Tableau Reader can be appropriate; see it here: http://www.tableausoftware.com/products/reader .

1. Tableau Application Server serves Visualizations well, but not as well as Tableau Reader, because Tableau Reader delivers a true desktop User Experience and UI. The best-known example is a Motion Chart: you can see automatic motion with Tableau Reader, but a Web Browser will force the user to emulate motion manually. In cases like that the user is advised to download the workbook, copy the .TWBX file to their workstation and open it with Tableau Reader.

Here is an example of a Motion Chart done in Tableau, similar to Hans Rosling’s famous presentation of Gapminder’s Motion Chart (and you need the free Tableau Reader or a license for Tableau Desktop to see the automatic motion of the 6-dimensional dataset with all the colored bubbles resizing over time):
http://public.tableausoftware.com/views/MotionChart_0/Motion?:embed=y

Please note that the same Motion Chart using Google Spreadsheets will run in a browser just fine (I guess because Google “bought” Gapminder and kept its code intact):
https://docs.google.com/spreadsheet/ccc?key=0AuP4OpeAlZ3PdC14OXU1RGJsV05uaDlxRV9GLXlTZXc#gid=2

2. When you have hundreds or thousands of Tableau Server users and more than a couple of Admins (users with Administrative privileges), each of the Admins can override viewing privileges for any workbook, regardless of the Users and User Groups designated for that workbook. In such a situation there is a risk of violating the privacy and confidentiality of the data involved, for example in HR Analytics, HR Dashboards and other Visualizations where private, personal and confidential data is used.

Tableau Reader enables an additional, complementary method of delivering Data Visualizations through private channels like password-protected portals, file servers and FTP servers, in certain cases even bypassing Tableau Server entirely.

3. Due to the popularity of Tableau and its ease of use, many groups and teams are considering Tableau as a vehicle for delivering hundreds and even thousands of Visual Reports to hundreds and maybe even thousands of users. That can slow down Tableau Server, degrade the user experience and create even more confidentiality problems, because it may expose confidential data to unintended users, like a report for one store to users from another store.

4. Many small (and not so small) organizations are trying to save on Tableau Server licenses (at least initially), and they can still distribute Tableau-based Data Visualizations: developer(s) will have Tableau Desktop (a relatively small investment) and users, clients and customers will use Tableau Reader, while all TWBX files can be distributed over FTP, portals or file servers or even by email. In my experience, when a Tableau-based business grows enough, it will pay by itself for Tableau Server licenses, so usage of Tableau Reader is in no way a threat to Tableau Software's bottom line!

Update (12/12/12) for even happier usage of Tableau Reader: in the upcoming Tableau 8, all Tableau Data Extracts (TDEs) can be created and used without any Tableau Server involved. Instead, a Developer can create/update a TDE either with Tableau in UI mode, or with the Tableau Command Line Interface, scripting TDEs in batch mode, or programmatically with the new TDE API (Python, C/C++, Java). It means that Tableau workbooks can be automatically refreshed with new data without any Tableau Server and re-delivered to Tableau Reader users over … FTP, portals or file servers or even by email.