Technical details of how a Tableau data extract works
In the first section, we learned how to create a data extract in Tableau. To understand how a data extract works and make the best use of it, let’s look at the technical details.
Tableau data extract’s design principles
A Tableau extract (.tde) file is a compressed snapshot of data extracted from a large variety of original data sources (Excel, databases, Salesforce, NoSQL and so on). It is stored on disk and loaded into memory as required to create a Tableau visualization.
There are two design principles of the Tableau extract that make it ideal for data analytics.
- The first principle is that a Tableau extract is a columnar store. A columnar store keeps the values of each column together, rather than keeping the values of each row together. The benefit is that the input/output time required to access or aggregate the values in a column is significantly reduced, because only the columns a query needs have to be read. This is why a Tableau extract is well suited to data analytics.
- The second principle is about how a Tableau extract is structured so that it makes the best use of your computer’s memory. This affects how it is loaded into memory and used by Tableau. To understand this principle better, we need to understand how a Tableau extract is created and then used as the data source for a visualization.
When Tableau creates a data extract, it defines the structure of the .tde file and creates a separate file for each column in the original data source. As Tableau retrieves data from the original data source, it sorts, compresses, and adds the values for each column to that column's file. After that, the individual column files are combined with metadata to form a single file that contains as many individual memory-mapped files as there are columns in the original data source.
Because a Tableau data extract file is a memory-mapped file, when Tableau requests data from a .tde file, the data is loaded directly into memory by the operating system. Tableau does not have to open, process, or decompress the file. If needed, the operating system continues to move data in and out of RAM to ensure that all of the requested data is made available to Tableau. This means that Tableau can query data that is bigger than the RAM on the computer.
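To make the idea of memory mapping concrete, here is a minimal Python sketch. It is not the .tde format: the file layout, the helper names write_columns and read_column, and the sample values are all assumptions made for illustration. It stores each column as one contiguous region of a single file, keeps a small piece of metadata (offset and length per column), and then memory-maps the file so that only the requested column's bytes are read.

```python
import mmap
import os
import struct
import tempfile


def write_columns(path, columns):
    """Write each column of floats as one contiguous region of a single file.

    Returns hypothetical metadata: column name -> (byte offset, value count).
    """
    layout = {}
    with open(path, "wb") as f:
        for name, values in columns.items():
            layout[name] = (f.tell(), len(values))
            # "<{n}d" = n little-endian 64-bit floats, 8 bytes each
            f.write(struct.pack(f"<{len(values)}d", *values))
    return layout


def read_column(path, layout, name):
    """Memory-map the file and decode only the requested column's region."""
    offset, count = layout[name]
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The operating system pages in only the parts of the file we touch.
            return list(struct.unpack_from(f"<{count}d", mm, offset))


# Made-up sample values, not taken from any real dataset.
path = os.path.join(tempfile.mkdtemp(), "demo_columns.bin")
layout = write_columns(path, {
    "Sales": [100.0, 250.5, 80.0],
    "Profit": [12.5, -3.0, 40.25],
})
profit = read_column(path, layout, "Profit")
print(layout["Profit"], profit)
```

In this sketch the Sales region is never decoded when Profit is requested. The operating system decides which pages of the file are resident in RAM, which is the behavior the paragraph above describes.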
For example, suppose that in the Superstore dataset you want to calculate the sum of profit for each product category. In a traditional row-oriented data table, you have to go through each row to get the value of profit, and then sum the values for each product category. In a TDE file, only the Product Category and Profit columns are loaded into memory. You can read the profit values directly from one column.
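The difference between the two approaches can be sketched in a few lines of Python. The rows below are made-up sample data in the spirit of the Superstore dataset, not actual records, and the function names are hypothetical. The row-oriented version walks complete records, while the column-oriented version touches only the two columns the calculation needs.

```python
from collections import defaultdict

# Made-up sample rows; each record carries fields the query does not need.
rows = [
    {"Order ID": 1, "Customer": "A", "Product Category": "Furniture", "Profit": 20.0},
    {"Order ID": 2, "Customer": "B", "Product Category": "Technology", "Profit": 35.5},
    {"Order ID": 3, "Customer": "C", "Product Category": "Furniture", "Profit": -5.0},
    {"Order ID": 4, "Customer": "A", "Product Category": "Office Supplies", "Profit": 4.5},
]


def sum_profit_row_store(rows):
    """Row-oriented: every full record is visited to pick out two fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Product Category"]] += row["Profit"]
    return dict(totals)


# Column-oriented layout: one list per column, with values in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_profit_column_store(columns):
    """Column-oriented: only the two needed columns are read."""
    totals = defaultdict(float)
    for category, profit in zip(columns["Product Category"], columns["Profit"]):
        totals[category] += profit
    return dict(totals)


print(sum_profit_row_store(rows))
print(sum_profit_column_store(columns))
```

Both functions return the same totals. On a dataset this small there is no measurable difference, but on disk the row-oriented layout forces every field of every record to be read, whereas the columnar layout leaves the Order ID and Customer columns untouched.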