- Stephen Few asks a tricky question: Are You a Data Scientist?: http://www.perceptualedge.com/blog/?p=1719 . It seems to me that Stephen agrees with me that three terms (Data Scientist, Data Science and Big Data) are full of BS… or, as our VP would say, “a Bunch of Malarkey”…
- 3 very important articles by Russell Christopher about History Tables in Tableau 8:
- A funny “Preview” by Stephen Few of Tableau 9: Gauges?!: http://www.perceptualedge.com/blog/?p=1706
- Leverage PowerShell and Tableau to Extract Server Datasources: https://www.interworks.com/blogs/mroberts/2013/05/28/leverage-powershell-and-tableau-extract-server-datasources
- Tableau 7 data dictionary: http://tableaulove.tumblr.com/post/50438516817/better-late-than-never-tableau-7-data-dictionary
- Collection of Good Practices, provided by Tableau: http://www.tableausoftware.com/public/community/best-practices
- Tableau Server has many built in features to promote data exploration, collaboration, and security. The Data Server is arguably the most powerful of these tools but is commonly overlooked and underutilized. Answer these questions to see how the Data Server can save you time and increase productivity. http://www.tableausoftware.com/about/blog/2013/4/unleash-tableau-data-server-23038
- Using Tableau 8 to Leverage Your ‘Big-ish’ Data: https://www.interworks.com/blogs/mroberts/2013/04/08/using-tableau-8-leverage-your-big-ish-data
- “New in 8: Word Clouds”: http://www.tableausoftware.com/public/blog/2013/02/new-8-word-clouds-1825
- “What Makes a Chart Boring?”: http://www.perceptualedge.com/blog/?p=1612
- Emotional reactions to Tableau 8: http://www.datarevelations.com/dylans-gone-electric-emotional-reactions-to-tableau-8.html
- Leverage Multiple Tableau Data Extracts for Big Data in Tableau 8: https://www.interworks.com/blogs/mroberts/2013/04/25/leverage-multiple-tableau-data-extracts-big-data-tableau-8
- Leverage the Tableau 8 Tableau Data Extract Command-Line Utility Without Tabcmd : https://www.interworks.com/blogs/mroberts/2013/04/15/leverage-tableau-8-tableau-data-extract-command-line-utility-without-tabcm
- A very popular article from Stephen Few about Tableau losing the clear vision of its youth collected 49 comments from known experts: http://www.perceptualedge.com/blog/?p=1532 and, most interesting, the 50th comment came from Stephen himself: “I recently learned that when my review of Tableau 8 was published, Tableau employees were forbidden from responding publicly”.
- UK Boundary Polygon Data from Information Lab: http://www.theinformationlab.co.uk/2013/03/25/uk-area-polygon-mapping-in-tableau/
- Sample of how to use DataExtract API with C#/.NET: http://community.tableausoftware.com/message/203714#203714
- Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8: http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/
- Tableau Custom SQL connections through the JET driver: https://www.interworks.com/blogs/jwright/2013/02/22/musings-tableau-custom-sql-connections-through-jet-driver
- Software company XL Cubed has incorporated Bandlines into their product, but comments on the post show that Tableau has had it for a while too: http://www.perceptualedge.com/blog/?p=1485
- Stephen Few introduced Bandlines as Sparklines Enriched with Information about Magnitude and Distribution: http://www.perceptualedge.com/articles/visual_business_intelligence/introducing_bandlines.pdf
The most popular approach to visualization (among business users) is to use a Data Visualization (DV) tool like Tableau (or Qlikview or Spotfire), where a lot of features are already implemented for you. Recent proof of this amazing popularity is that at least 100 million people (as of February 2013) used Tableau Public as their Data Visualization tool of choice, see
However, to make your documents and stories (and not just your data visualization applications) driven by your data, you may need the other approach: to code the visualization of your data into your story, and visualization libraries like the popular D3 toolkit can help you. D3 stands for “Data-Driven Documents”. The author of D3, Mr. Mike Bostock, designs interactive graphics for the New York Times; one of the latest samples is here:
and NYT allows him to do a lot of Open Source work, which he demonstrates at his website here:
Mike was a “visualization scientist” and a computer science PhD student at Stanford University and a member of the famous group of people now called the “Stanford Visualization Group”:
This Visualization Group was the birthplace of Tableau’s prototype; sometimes they called it “a Visual Interface” for exploring data, and its other name is Polaris:
– and Mike Bostock was one of Protovis’s main co-authors. Less than 2 years ago the Visualization Group suddenly stopped developing Protovis and recommended that everybody switch to the D3 library
authored by Mike. This library is Open Source (only 100KB in ZIP format) and can be downloaded from here:
Most successful early D3 adopters combine three or more mindsets: programmer, business analyst, data artist and sometimes even data storyteller. For your programmer’s mindset, you may be interested to know that D3 has a large set of Plugins, see:
and a rich API, see https://github.com/mbostock/d3/wiki/API-Reference
You can find hundreds of D3 demos, samples, examples, tools, products and even a few companies using D3 here: https://github.com/mbostock/d3/wiki/Gallery
Big Data can be useless without multi-layer data aggregations and hierarchical or cube-like intermediary Data Structures, where ONLY a few dozen, a few hundred or a few thousand data points are exposed visually and dynamically at any single viewing moment to analytical eyes for interactive drill-down or drill-up hunting for business value and actionable datum (or “datums”, if the plural means data). One of the best expressions of this concept (at least as I interpreted it) I heard from my new colleague, who flatly said:
“Move the function to the data!”
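One way to read this advice is that the roll-up should run next to the data, so that only a handful of aggregated points travel to the visualization. The sketch below is only an illustration of that idea in plain Python: the records, the column layout and the `roll_up` helper are all hypothetical, and in a real setup the same aggregation would run inside the database rather than in a script.

```python
from collections import defaultdict

# Hypothetical raw records: (region, store, sales). In practice these rows
# would stay in the database and the roll-up would run there.
RAW_ROWS = [
    ("East", "Store 1", 120.0),
    ("East", "Store 1", 80.0),
    ("East", "Store 2", 50.0),
    ("West", "Store 3", 200.0),
    ("West", "Store 3", 25.0),
]


def roll_up(rows, depth):
    """Sum the last column by the first `depth` key columns.

    depth=1 groups by region, depth=2 groups by region and store.
    """
    totals = defaultdict(float)
    for row in rows:
        key = row[:depth]
        totals[key] += row[-1]
    return dict(totals)


by_region = roll_up(RAW_ROWS, 1)  # top layer, shown first
by_store = roll_up(RAW_ROWS, 2)   # drill-down layer, fetched on demand
print(by_region)
print(by_store)
```

Each layer holds far fewer points than the raw rows, which is what makes interactive drill-down feasible on large data-sets.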
I recently got involved with multiple projects using large data-sets for Tableau-based Data Visualizations (100+ million rows and even billions of records!). Two of the largest examples I used were one with 800+ million records and another with 2+ billion rows.
So this blog post expresses my thoughts about such Big Data as a back-end for Tableau (on average, the examples above have about 1+ KB per CSV record before compression and other advanced DB tricks, like the columnar storage used by the Data Engine of Tableau).
Here are some factors involved in Data Delivery from the main, designated Database (back-ends like Teradata, DB2, SQL Server or Oracle for Tableau-based Big Data Visualizations) into “local” Tableau Visualizations (many people are still trying to use Tableau as a Reporting tool as opposed to a (Visual) Analytical Tool):
- Queuing thousands of Queries to Database Server. There is no guarantee your Tableau query will be executed immediately; in fact it WILL be delayed.
- The speed of a Tableau query, once it starts executing, depends on sharing CPU cycles, RAM and other resources with other queries executed SIMULTANEOUSLY with your query.
- Buffers, pools and other resources available to particular users and queries on your Database Server differ and depend on the privileges and settings given to you as a Database User.
- Network speed: between some servers it can be 10 Gbit/s (or even more); in most cases it is 1 Gbit/s inside server rooms; outside of server rooms I observed in many old buildings (over wired Ethernet) a maximum of 100 Mbit/s coming into the user’s PC; if you are using Wi-Fi it can be even less (say 54 Mbit/s?). If you are using the internet it can be lower still (I observed speeds of 1 Mbit/s or so in some remote offices over old T-1 lines); if you are using a VPN it will max out at 4 Mbit/s or less (I observed this in my home office).
- Utilization of network. I use Remote Desktop Protocol (RDP) from my workstation or notebook to a VM (a VM or VDI virtual machine sitting in the server room) that is connected to servers at a network speed of 1 Gbit/s, but it still uses at most 3% of that speed (about 30 Mbit/s, which is between 3 and 4 megabytes of data per second, which is probably a few thousand records per second).
That means the network may have a problem delivering 100 million records to a “local” report overnight (say 10 hours, 10 million records per hour, about 3,000 records per second), partly, and probably, because of the factors above.
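The back-of-the-envelope estimate above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: a 1 Gbit/s link used at 3%, and exactly 1,000 bytes per record (the text says "about 1+ KB"); the function names are made up for this illustration.

```python
def records_per_second(link_mbits, utilization, record_bytes):
    """Records delivered per second over a link used at the given fraction."""
    bytes_per_second = link_mbits * 1_000_000 * utilization / 8
    return bytes_per_second / record_bytes


def hours_to_deliver(total_records, link_mbits, utilization, record_bytes):
    """Hours needed to move total_records at the rate computed above."""
    rate = records_per_second(link_mbits, utilization, record_bytes)
    return total_records / rate / 3600


# Assumed inputs, taken loosely from the text: 1 Gbit/s link, 3% utilization,
# about 1 KB (here 1,000 bytes) per record, 100 million records.
rate = records_per_second(1000, 0.03, 1000)
hours = hours_to_deliver(100_000_000, 1000, 0.03, 1000)
print(f"{rate:.0f} records/s, {hours:.1f} hours for 100 million records")
```

With these inputs the transfer alone takes roughly 3,750 records per second and about 7.4 hours, so most of a 10-hour overnight window is gone before any queuing or database-side delay is counted. Larger records or lower utilization push it past the window.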
On top of those factors, please keep in mind that Tableau is a set of 32-bit applications (with the exception of one out of 7 processes on the Server side), each restricted to 2GB of RAM; if the data-set cannot fit into RAM, then the Tableau Data Engine will use the disk as Virtual RAM, which is much, much slower, and for some users that disk space is not actually local to their workstation but mapped to some “remote” network file server.
Tableau Desktop in many cases uses 32-bit ODBC drivers, which may add even more delay to data delivery into the local “Visual Report”. As we learned from Tableau support itself, even with the latest Tableau Server 7.0.X, the RAM allocated for one user session is restricted to 3GB anyway.
Unfortunate Update: Tableau 8.0 will be a 32-bit application again, but maybe a follow-up version 8.x or 9 (I hope) will be ported to 64 bits… It means that Spotfire, Qlikview and even PowerPivot will keep some advantages over Tableau for a while…
I have often used small Tableau workbooks instead of PowerPoint, which proves at least 2 concepts:
- Tableau can be used as the Web or Desktop Container for Multiple Data Visualizations (it can be used to build hierarchical Container Structures with more than 3 levels; currently there are 3: Container-Workbooks-Views)
- It can be used as a replacement for PowerPoint; in this example I embedded into the Container 2 Tableau Workbooks, one Google-based Data Visualization, 3 image-based Slides and a Textual Slide: http://public.tableausoftware.com/views/TableauInsteadOfPowerPoint/1-Introduction
- Tableau is better than PowerPoint for Presentations and Slides
- Tableau is the Desktop and the Web Container for Web Pages, Slides, Images, Texts
- Tableau is a Container for Tableau-based Data Visualizations
- The sample Tableau Presentation above contains an Introductory Textual Slide
- The sample Tableau Presentation above contains a few Tableau Visualizations:
  - DrillDown Demo
  - Motion Chart Demo (6 dimensions: X, Y, Shape, Color, Size, Motion in Time)
- This Tableau Presentation contains a Web Page with a Google-based Motion Chart Demo
- This Tableau Presentation contains a few Image-based Slides:
  - Quick Description of the Origins and Evolution of Software and Tools used for Data Visualizations during the last 30+ years
  - Description of Multi-level Projection from Multidimensional Data Cloud to Datasets, Multidimensional Cubes and to Chart
  - Description of 6 stages of the Software Development Life Cycle for Data Visualizations
I was always intrigued by colors and their usage, ever since my mom told me that maybe (just maybe; there is no direct proof of it anyway) the Ancient Greeks did not know what the color BLUE is. That puzzled me.
Later in my life, I realized that Colors and Palettes play a huge role in Data Visualization (DV), and it eventually led me to try to understand how they can be used and pre-configured in advanced DV tools to make Data more Visible and to express Data Patterns better. For this post I used Tableau to produce some palettes, but similar techniques can be found in Qlikview, Spotfire etc.
Tableau published a good article on how to create customized palettes here: http://kb.tableausoftware.com/articles/knowledgebase/creating-custom-color-palettes and I followed it below. As this article recommended, I modified the default Preferences.tps file; see it below with images of the respective Palettes embedded.
For the first, regular Red-Yellow-Green-Blue Palette, with known colors that have well-established names, I even created a Visualization in order to compare their Red-Green-Blue components, and I tried to place the respective Bubbles on a 2-dimensional surface, even though originally it is clearly a 3-dimensional Dataset:
For the 2nd, Red-Yellow-Green-NoBlue Ordered Sequential Palette, I tried to implement an extended “Set of Traffic Lights without any trace of BLUE Color” (so Homer and Socrates would understand it the same way we do) while trying to use only web-safe colors. Please keep in mind that Tableau does not have a simple way to have more than 20 colors in one Palette, like Spotfire does.
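The two constraints on this palette (web-safe colors only, no blue component) are easy to check mechanically. The sketch below does that for the six colors listed for the RedYellowGreenNoBlueOrdered palette in the Preferences.tps file; the helper names are made up for this illustration, and "web-safe" is taken in its usual sense of every channel being one of 00, 33, 66, 99, CC or FF.

```python
# The six channel values allowed by the classic 216-color web-safe palette.
WEB_SAFE_STEPS = {0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF}


def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of integers."""
    value = color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def is_web_safe(color):
    """True when every channel is one of the six web-safe steps."""
    return all(channel in WEB_SAFE_STEPS for channel in hex_to_rgb(color))


def has_blue(color):
    """True when the blue channel is non-zero."""
    return hex_to_rgb(color)[2] > 0


# Colors of the RedYellowGreenNoBlueOrdered palette as listed in this post.
PALETTE = ["#ff0000", "#cc6600", "#cccc00", "#ffff00", "#99cc00", "#009900"]

for color in PALETTE:
    print(color, hex_to_rgb(color), is_web_safe(color), has_blue(color))
```

All six colors pass both checks. The same helpers also give the Red-Green-Blue components used to compare colors in the first palette.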
The other 5 Palettes below are useful too as ordered-diverging, almost “mono-chromatic” palettes (except Red-Green Diverging, which can be used in Scorecards where Red is bad and Green is good). So see below the Preferences.tps file with my 7 custom palettes.
<?xml version='1.0'?> <workbook> <preferences>
<color-palette name="RegularRedYellowGreenBlue" type="regular">
<color>#FF0000</color> <color>#800000</color> <color>#B22222</color>
<color>#E25822</color> <color>#FFA07A</color> <color>#FFFF00</color>
<color>#FF7E00</color> <color>#FFA500</color> <color>#FFD700</color>
<color>#F0e68c</color> <color>#00FF00</color> <color>#008000</color>
<color>#00A877</color> <color>#99cc33</color> <color>#009933</color>
<color>#0000FF</color> <color>#00FFFF</color> <color>#008080</color>
</color-palette>
<color-palette name="RedYellowGreenNoBlueOrdered" type="ordered-sequential">
<color>#ff0000</color> <color>#cc6600</color> <color>#cccc00</color>
<color>#ffff00</color> <color>#99cc00</color> <color>#009900</color>
</color-palette>
</preferences> </workbook>
If you wish to use colors you like, this site is very useful for exploring the properties of different colors: http://www.perbang.dk/rgb/
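If you maintain several palettes, hand-editing the XML gets error-prone (a missing closing tag or a curly quote is enough to break the file). As a sketch only, the Preferences.tps content can be generated from a Python dictionary with the standard library; the `build_preferences` function and the dictionary layout are my own illustration, while the element and attribute names follow the file shown above.

```python
import xml.etree.ElementTree as ET


def build_preferences(palettes):
    """Build Preferences.tps text from {name: (type, [hex colors])}."""
    workbook = ET.Element("workbook")
    preferences = ET.SubElement(workbook, "preferences")
    for name, (palette_type, colors) in palettes.items():
        palette = ET.SubElement(
            preferences, "color-palette", {"name": name, "type": palette_type}
        )
        for color in colors:
            ET.SubElement(palette, "color").text = color
    body = ET.tostring(workbook, encoding="unicode")
    return "<?xml version='1.0'?>\n" + body


# One palette from this post; add more entries for further palettes.
PALETTES = {
    "RedYellowGreenNoBlueOrdered": (
        "ordered-sequential",
        ["#ff0000", "#cc6600", "#cccc00", "#ffff00", "#99cc00", "#009900"],
    ),
}

xml_text = build_preferences(PALETTES)
print(xml_text)
```

The printed text can be saved as Preferences.tps in place of the hand-edited file; back up the original first.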
Tableau made a couple of brilliant decisions that completely outsmarted its competitors and gained it extreme popularity, while convincing millions of potential, future and current customers to invest their own time in learning Tableau. The 1st reason, of course, is Tableau Public (we discuss it in a separate blog post), and the other is the Free Tableau Reader, which provides a full desktop user experience and interactive Data Visualization without any Tableau Server (or any other server) involved, and with better performance and UI than Server-based Visualizations.
While designing Data Visualizations is done with Tableau Desktop, most users get their Data Visualizations served by Tableau Server to their Web Browser. However, in both large and small organizations that usage pattern is not always the best fit. Below I discuss a few possible use cases where the Free Tableau Reader can be appropriate; see it here: http://www.tableausoftware.com/products/reader .
1. Tableau Application Server serves Visualizations well, but not as well as Tableau Reader, because Tableau Reader delivers a true desktop User Experience and UI. The best-known example is a Motion Chart: you can see automatic motion with Tableau Reader, but a Web Browser will force the user to emulate the motion manually. In cases like that the user is advised to download the workbook, copy the .TWBX file to his/her workstation and open it with Tableau Reader.
Here is an example of a Motion Chart done in Tableau, similar to Hans Rosling’s famous presentation of Gapminder’s Motion Chart (and you need the free Tableau Reader or a license for Tableau Desktop to see the automatic motion of the 6-dimensional dataset, with all the colored bubbles resizing over time):
Please note that the same Motion Chart using Google Spreadsheets will run in browser just fine (I guess because Google “bought” Gapminder and kept its code intact):
2. When you have hundreds or thousands of Tableau Server users and more than a couple of Admins (users with Administrative privileges), each Admin can override viewing privileges for any workbook, regardless of the Users and User Groups designated for that workbook. In such a situation there is a risk of violating the privacy and confidentiality of the data involved, for example with HR Analytics, HR Dashboards and other Visualizations where private, personal and confidential data is used.
Tableau Reader enables an additional, complementary method of delivering Data Visualizations through private channels like password-protected portals, file servers and FTP servers, in certain cases even bypassing Tableau Server entirely.
3. Due to the popularity of Tableau and its ease of use, many groups and teams are considering Tableau as a vehicle for delivering hundreds and even thousands of Visual Reports to hundreds and maybe even thousands of users. That can slow down Tableau Server, degrade the user experience and create even more confidentiality problems, because it may expose confidential data to unintended users, like a report for one store to users from another store.
4. Many small (and not so small) organizations try to save on Tableau Server licenses (at least initially), and they can still distribute Tableau-based Data Visualizations: developers will have Tableau Desktop (a relatively small investment) and users, clients and customers will use Tableau Reader, while all TWBX files can be distributed over FTP, portals or file servers, or even by email. In my experience, when a Tableau-based business grows enough, it will pay for Tableau Server licenses by itself, so the usage of Tableau Reader is in no way a threat to Tableau Software’s bottom line!
Update (12/12/12) for even happier usage of Tableau Reader: in the upcoming Tableau 8, all Tableau Data Extracts (TDEs) can be created and used without any Tableau Server involved. Instead, a Developer can create or update a TDE either with Tableau in UI mode, or with the Tableau Command Line Interface by scripting TDEs in batch mode, or programmatically with the new TDE API (Python, C/C++, Java). It means that Tableau workbooks can be automatically refreshed with new data without any Tableau Server and re-delivered to Tableau Reader users over … FTP, portals or file servers, or even by email.