You need to start thinking about performance right from the start of your design. If you leave it to the end, it is probably too late.
Basic principles – “it sounds like I’m being a parent. I’m just being practical”
1. Everything in moderation
2. If it isn’t fast in the database, it won’t be fast in Tableau (unless you are using Extracts)
3. If it isn’t fast in Desktop, it won’t be fast in Server
.tde extracts handle up to a few hundred million rows of data – they don’t replace your data warehousing solutions
Flat files are opened in a temporary location, so connecting to them directly doesn’t make anything faster – it’s just using RAM. Use an extract to apply indexing.
Server will only beat Desktop when you are hitting the server cache (remember folks, server caching has improved a lot in v9)
4 major processes in Desktop:
1. Connect to data
a. Native connection vs Generic ODBC (use the native driver so it is fast and robust)
i. Slow loads could be due to a lack of referential integrity
ii. Custom SQL is passed through as written by Tableau, so it bypasses optimisations such as join culling
2. Executing Query
a. Aggregations, Calculated Fields and Filters
b. Calcs – use Booleans instead of IF statements where possible; remove string manipulation and DATEPART() (see the sketch after this list)
c. Filters – often the culprit of slow performance
3. Computing Layout
a. Marks, Table Calcs and Sorting
b. Adding labels and working out whether the labels are overlapping is likely to take a long time
c. Table Calcs happen locally, so consider pushing them back to the data source
4. Computing Quick Filters
a. If something isn’t likely to change, avoid having to populate the list of filter options each time. Dropdowns and wildcard match filters are better as they don’t need to be pre-populated.
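As a rough illustration of point 2b above (the [Sales] field name is mine, not from the talk), a Boolean calc avoids building a string for every row:
    // Slower – an IF that builds a string result for every row
    IF [Sales] > 100 THEN "Big" ELSE "Small" END
    // Faster – a Boolean comparison returns TRUE/FALSE directly
    [Sales] > 100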
Visual Pipeline
Query > Data > Layout > Render
1. Query – query the database, cache the results
2. Data – local data joins (e.g. location data from Tableau joining with your data set), local calcs, local filters, totals, forecasting, table calcs, 2nd pass filters, sort
3. Layout – lay out views, compute legends, encode marks
4. Render – marks, selection, highlighting, labels
Parallel aggregations in v9 really make a difference
External query cache (aka persistent query cache) – the cache is written to disk
Multiple data engines have helped, but Query Fusion will assist by working out the common dimensions / aggregations and then working out locally what data is needed for each visualisation
The visual pipeline allows you to think about what is happening.
Putting the measure on the Level of Detail shelf will help with the speed of interactivity
Mrunal uses 144 million rows of flight data to explore performance issues
- Shows a full list for filtering (expensive) and three quick filters (all having to be queried for each stage)
- Relative date filters or range of date filters are faster than date part filters (see the sketch after these bullets)
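As a sketch of that date filter point (the [Flight Date] field name is my assumption, not Mrunal’s), a date part filter has to evaluate a function for every row, whereas a range filter just compares the stored date values:
    // Date part filter – DATEPART evaluated for every row
    DATEPART('year', [Flight Date]) = 2015
    // Range filter – plain comparison on the dates themselves
    [Flight Date] >= #2015-01-01# AND [Flight Date] < #2016-01-01#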
Using views for filters improves performance, and the use of dashboard actions makes life faster
Adding a parameterised filter to the data source moves it up in the order of operations, making your data source smaller, sooner
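A minimal sketch of such a parameterised data source filter (the [Start Date] parameter and [Flight Date] field are assumptions, not from the talk): create a Boolean calc, add it under Data Source Filters and keep only True, so rows are cut before any worksheet queries run.
    // Boolean calc used as a data source filter, driven by a date parameter
    [Flight Date] >= [Start Date]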
Mrunal and I will disagree about which gives the better user experience, filters or actions. When labelled well, I personally think dashboard actions make for a much better experience and keep you focused on the dashboard rather than the tool.
Aggregate to ‘Visible Dimensions’ is a great data granularity saver. ‘Hide All Unused Fields’ makes the data set thinner.