You need to start thinking about performance right from the start of your design. If you leave it to the end, it is probably too late.
Basic principles – “it sounds like I’m being a parent. I’m just being practical”
1. Everything in moderation
2. If it isn’t fast in the database, it won’t be in Tableau (unless you are using Extracts)
3. If it isn’t fast in desktop, it won’t be fast in Server
.tde’s work well up to a few hundred million rows of data – they don’t replace your data warehousing solution
Flat files are opened in a temporary location, so connecting to them live doesn’t make anything faster – it’s all using RAM. Use an extract to get the benefit of indexing.
Server will only beat Desktop when you are hitting the server cache (remember folks, server caching has improved a lot in v9)
4 major processes in desktop:
1. Connect to data
a. Native connection vs generic ODBC (use the native driver where one exists – it is faster and more robust)
i. Slow loads can be due to a lack of referential integrity
ii. Custom SQL is passed through as-is by Tableau, which prevents optimisations such as join culling
2. Executing Query
a. Aggregations, Calculated Fields and Filters
b. Calcs – use Boolean logic instead of IF statements where you can; avoid string manipulation and DATEPART()
c. Filters – often the culprit of slow performance
3. Computing Layout
a. Marks, Table Calcs and Sorting
b. Adding labels, and working out whether those labels overlap, is likely to take a long time
c. Table Calcs are happening locally so consider pushing back to the data source
4. Computing Quick Filters
a. Populating the list of filter options is expensive if it has to happen on every load, even when the domain isn’t likely to change. Dropdowns and wildcard match filters are better as they don’t need to be pre-populated.
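The “Boolean instead of IF” tip under Executing Query can be sketched as a pair of Tableau calculated fields (the [Sales] field and the 500 threshold are illustrative, borrowed from the Superstore sample data, not from the talk):

```
// Slower: an IF statement that builds and compares strings
IF [Sales] > 500 THEN "Big" ELSE "Small" END

// Faster: a Boolean calculation – databases and the data engine
// evaluate booleans more cheaply than string results
[Sales] > 500
```

On a view, alias the TRUE/FALSE values if you need friendly labels; the query itself stays Boolean.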
Query > Data > Layout > Render
1. Query – query database, cache results
2. Data – Local data joins (location data from Tableau joining together with data set), Local calcs, local filters, Totals, Forecasting, Table Calcs, 2nd Pass filters, Sort
3. Layout – Layout Views, Compute Legends, Encode marks
4. Render – Marks, Selection, Highlighting, Labels
Parallel aggregations in v9 really make a difference
External query cache (aka persistent query cache) – the cache is being written to the disk
Multiple data engines have helped, but Query Fusion will assist further by working out the common dimensions/aggregations across queries and then determining locally what data each visualisation needs
The visual pipeline lets you think about what is happening at each stage.
Putting the measure on the Level of Detail shelf helps with speed of interactivity
Mrunal uses 144 million rows of flight data to explore performance issues
- Shows a full-list quick filter (expensive) plus three other quick filters (all of which have to be queried at each stage)
- Relative date filter or range date filters are faster than date part filters
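The date-filter point above can be sketched as two filter calculations (the [Order Date] field name is an assumption for illustration):

```
// Slower: a date part filter – DATEPART() is evaluated for every row
DATEPART('year', [Order Date]) = 2015

// Faster: a range filter the database can optimise (e.g. via an index)
[Order Date] >= #2015-01-01# AND [Order Date] < #2016-01-01#
```

Relative date filters (“last 12 months”) compile down to the same kind of range comparison, which is why they tend to be quicker than date part filters.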
Using views as filters improves performance, and dashboard actions make life faster
Adding a parameterised filter to the data source moves the filter up in the order of operations, making your data set smaller, sooner
Mrunal and I will disagree about which is the better user experience, filters or actions. When labelled well, I personally think dashboard actions make for a much better experience and keep you focused on the dashboard rather than the tool.
Aggregating data to ‘Visible Dimensions’ is a great data granularity saver, and ‘Hide All Unused Fields’ makes the data set thinner.