Explore one of the most important concepts in Prism — the data tables! Learn how to use Prism’s data tables to make it easy and intuitive to perform the right analysis and create the graphs you want to see, and ultimately save you lots of time.
You will learn how to:
This video is part of the Getting Started series, presented by Dr. Trajen Head, Product Manager for GraphPad Prism.
Hello, and welcome. My name is Dr. Trajen Head and I'm the Product Manager for Prism at GraphPad Software. In this video we're going to explore one of the most important aspects of Prism: the concept of data tables and how to effectively utilize them. One of the things that really makes Prism unique, and that sets it apart from other applications, is the concept of the data table. Generic spreadsheets like Excel don't offer any structure at all. They make no assumptions about your data, and you can enter your data however you'd like, but you'll have to manually define which data are your variables and how those variables are related to each other. Other programs provide a single form of structured data table that organizes your variables, but it's the same structure for every type of experiment, regardless of how those data are related to each other. Moreover, you'll still have to manually identify the relationships of those variables to each other each time you want to perform an analysis. In some cases, these programs require that you perform this whole process using a command line in a complex coding language.
Prism is different. Prism offers eight different types of data tables that are specifically tailored to house your data in a format that makes it easy to perform the analysis you want to run and create the graphs you want to see. Because Prism's data tables are unique, it's important to understand why they're important and how they work with your unique data. In this video, we'll cover what you need to know about Prism's data tables, including what each of the data tables is and the unique structure of each. We'll look at how each data table can be used for your data and the types of analysis that can be performed from each. I'll also be providing a few demonstrations of some of the data tables using some simple data sets to help you get more comfortable with the concept of data tables and how to effectively utilize them within Prism.
At this point, you may be wondering why there are different structures for each of the data tables in the first place. When we perform experiments and collect data, the data we collect possess an inherent organization based on the connections and relationships that each data point has with other points in the data set. Prism's data tables reflect these relationships and are structured in a way that allows you to organize your data based on those inherent relationships, so you can perform appropriate analyses. In fact, each table type has a default set of analyses and graphs appropriate for the type of data the table was designed for. This helps take some of the guesswork out of your statistical analysis and helps you generate the most appropriate graphs and visualizations for your data. Understanding the relationships in your data and how Prism's data tables utilize them will ultimately make Prism much more intuitive and save you lots of time.
If you've spent any time in Prism, this screen will look familiar. This is the Welcome screen and it is the first thing you see when you launch Prism. The first tip that I have for you is not to just mindlessly click through options to get to data entry. There are a lot of important options on the Welcome screen that you should consider before moving on to data entry. First, you'll want to choose the type of data table you want to use. You can see that the table types, such as grouped, column, or XY, are all listed on the left side of the Welcome screen. For each table type, an example of the appearance of the table and a representative graph are shown at the top of the Welcome screen.
Importantly, at the very top of the Welcome screen you'll also see a simple explanation of the relationship in the data that the table is targeting. However, I will point out that you shouldn't try to pick a data table based on the type of graph you want to create. There isn't always a one-to-one match between data tables and graph types, so this approach can lead to confusion. You can see here that five different types of data tables can be used to make bar graphs. Instead, you should focus on understanding the underlying relationships in your data and which data table targets that organization.
Going back to the Welcome screen, each data table type has its own set of options for more specific structuring to accommodate your data that you should consider before proceeding. These options in some cases allow you to specify the number of replicate values to enter, and also allow you to enter data as averages with error and sample size instead of raw data. However, if you have the raw data, it's generally best to enter it directly. Prism can and will easily generate these averages and error values for you.
Finally, there are two important help features on the Welcome screen to help you get comfortable with Prism. The first is the tutorial data sets that are available for every type of data table. Each tutorial data set will provide tips and instructions on how to use the data table, the goal of each tutorial experiment, and how to perform particular analyses related to that tutorial data set. The other option is the Learn More button. If at any time you need more help on how the data tables are structured, or information about the data tables, you can click the Learn More button to get extensive in-depth information from our online help material.
Let's start by looking at XY tables. XY tables are used to examine the relationship between continuous variables. Therefore, each data point you collect needs to be defined by an X and a Y value or coordinate. Often, data for this table type takes the form of X values representing some independent variable and Y values representing some dependent variable, allowing you to perform linear or non-linear regression to examine the relationship of the two variables. However, you may have also measured data where the terms dependent and independent don't really make much sense. Let's say you measured the height and GPA for a group of students and you wanted to investigate a potential relationship between these two variables. Neither of these values will be dependent on the other, but the underlying relationship with the data would be that each point is defined by a continuous X and Y value so you would still use an XY table.
Looking at the structure of an XY table, you can see that it's relatively straightforward. The first column is for row titles and is optional, but can allow you to enter labels to identify specific data points. The next column is for the X values. There is only one X column per XY table, but if you choose to enter X error values from the Welcome screen, this column will have two sub-columns: one for the X value and one for the error value. Each remaining column represents your Y values, with the number of sub-columns depending on the number of replicates for Y that you selected on the Welcome screen. Alternatively, if you choose to enter average data and error values instead of raw data for Y, the Y sub-columns will be formatted correctly to accept that form of data. It's also important to note that each XY table can have many separate data sets, with the Y values from each entered into their own column.
Let's look at an example of working with an XY data table using a simple data set. Let's imagine that a teacher is interested in knowing how the temperature of a classroom affects test scores. Here I have a data set showing various temperatures and resulting test scores from both an algebra class as well as an English class. You can see that within each class, there's only one test score for each temperature. What I'm going to do is create an XY data table where X is entered as numbers and Y will be entered as a single value for each point. Once we get the data table created, we can just manually enter this data. X will be temperature. The first group will represent the algebra test scores and the second group will represent the English test scores. We can fill the X values quickly by using a series that starts at 68 and increases by two each row. The rest of the data, we'll just type in quickly.
You may see that for some temperatures there are some values missing for Y. That's okay. In an XY table, Prism will only consider values for graphing and analysis that have both an X and a Y coordinate. Because of this, we can also enter our data slightly differently to accommodate the fact that our two sets don't have identical X values. I have a second Prism file in which I've done this already. In this case, I simply entered X values for each set independently, making sure to keep the Y values in their appropriate columns. The X and Y values for the algebra class are grouped together, and the X and Y values for the English class are grouped together as well. In either case, Prism sees the data the exact same way. You can verify this by looking at the graph of each Prism file and seeing that the graph of the first data set looks identical to the graph of the second.
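To make the point about the two entry layouts concrete, here is a minimal sketch in Python. It is not how Prism works internally; it only mimics the rule described above, that a row counts as a point when it has both an X and a Y value. The temperatures and scores are hypothetical, since the actual values from the demonstration are not listed here.

```python
# Hypothetical values for illustration only.
# Layout A: one shared X column; None marks a missing Y value.
shared_x = [68, 70, 72, 74]
algebra_y = [88, None, 81, 77]
english_y = [None, 90, 85, None]

# Layout B: each data set gets its own block of rows (x, algebra, english).
stacked_rows = [
    (68, 88, None),
    (72, 81, None),
    (74, 77, None),
    (70, None, 90),
    (72, None, 85),
]


def points_from_shared(xs, ys):
    """Keep only the rows that have both an X and a Y value."""
    return sorted((x, y) for x, y in zip(xs, ys) if x is not None and y is not None)


def points_from_stacked(rows, column):
    """Apply the same rule to one Y column of the stacked layout."""
    return sorted(
        (row[0], row[column])
        for row in rows
        if row[0] is not None and row[column] is not None
    )


algebra_a = points_from_shared(shared_x, algebra_y)
algebra_b = points_from_stacked(stacked_rows, 1)
english_a = points_from_shared(shared_x, english_y)
english_b = points_from_stacked(stacked_rows, 2)

# Both layouts describe the same set of points for each class.
print(algebra_a == algebra_b, english_a == english_b)
```

Either layout yields the same list of (X, Y) pairs per data set, which is why the two graphs look identical.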
Once your data is entered, Prism can analyze data from XY tables in a number of different ways. The types of analysis commonly performed with XY data include linear and non-linear regression, interpolation of XY values from a standard curve, determination of the area under a curve, and calculation of the correlation between X and Y variables, along with many others. Here, as you can see, the number of ways that you can customize XY graphs is very extensive. You can create a multitude of different types of graphs with different appearances, allowing you to focus in on what you consider to be most important in your data.
The next type of data table is the column data table. This type of table is a little bit different than the XY tables in that each column now represents one level or factor of a single categorical grouping variable. Such as in this example, where we have measurements from a group of men and a group of women. In this case, the categorical variable is sex with men and women each being a level of that variable. You can also examine more than two groups, such as here where we have control, treatment one and treatment two groups. But in any case, each column represents a single group within a grouping variable. With replicate measures from that group stacked within the column.
The actual structure of a column data table is relatively straightforward. If your measurements between groups are fully independent, in that they are not matched or paired in any way, you can enter them into their appropriate column in any order. However, often in an experiment we use matched or paired measurements. In this case, each row would represent an individual repeated or matched value, so you would want to ensure that the paired data for each group or column are entered into the same row, and you can use the row titles to identify the matched set. For example, in this case measurements from subject four were obtained under control, treatment one and treatment two conditions. So you would want to enter each of those three values on a single row in the data table.
Let's take another look at an example within the software. Going back to our teacher example, let's assume now that the teacher was no longer interested in the effects of temperature on test scores, but wanted to compare average final grades between different classes. Here you can see I have a set of final grades that are now only categorized by a single grouping variable of which class the scores came from. The algebra class, the English class or the history class. Thus, we can create a column data table with replicate values stacked in the columns.
The easiest method of getting this data into Prism is simply to copy and paste it in. You can see that Prism again automatically creates a graph and, in this case, shows the average and error for each class. I previously mentioned matched or paired data. If we go back to the data table, you can see we hadn't originally set up this table to accept paired data or repeated measures. However, if we realized later that each row represents grades from a single student, we can expand this first column to include the student name or identifier in the row titles. When creating a column data table, if we select paired or repeated measures on the Welcome screen, this column for row titles is shown by default. The default graph appearance will reflect this change as well. Let's copy in this same data with names into the table set up for repeated measures and look at both graphs.
You can see that Prism provides many choices by default based on the setup of your experiment, and that these two graphs, although they represent the same data, are categorized differently by your experiment and are represented differently within Prism. Analyses for column data tables are often aimed at comparing the measured value between different groups and include various forms of the T-test and analysis of variance, or ANOVA, as well as their non-parametric counterparts. From column data tables, you can also compute descriptive statistics for each group, generate a frequency distribution, perform receiver operator characteristic curve analysis and more. Like the graphs from XY data tables, there are a vast number of ways that you can present and customize data from column data tables in your visualizations. This allows you to illustrate the important aspects of your research in a nearly limitless number of ways.
The next type of data table that we're going to talk about is the grouped data table. Grouped data tables are similar in concept to column data tables in that the data are organized by categorical grouping variables. However, where data in column data tables are separated by a single grouping variable, grouped data tables are designed for data organized by two different grouping variables. Specifically in Prism, the levels of one grouping variable for a grouped data table are defined by columns, while the levels of the second grouping variable are defined by rows. In this structure, replicate values are entered in sub-columns defined by one grouping variable, across a single row defined by the other grouping variable.
Here you can see an example where we consider the fuel efficiency for a group of vehicles based on both the vehicle transmission type and the type of fuel that they use. This data is made up, but shows how the grouped data table will organize your data across rows and columns based on the two grouping variables. In this case, transmission type and fuel type. As you can see, replicates for each combination run along rows. So the three values for fuel efficiency of automatic diesel vehicles are 24, 23 and 25 miles per gallon.
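As a rough illustration of that layout, the sketch below models a grouped table as rows and columns with replicate sub-columns in each cell. Which variable goes in rows and which in columns is an assumption here, and only the automatic diesel values (24, 23, 25) come from the example; every other number is hypothetical.

```python
from statistics import mean

# Assumed layout: rows = transmission type, columns = fuel type.
# Each cell holds the replicate values that would sit in side-by-side sub-columns.
grouped = {
    "automatic": {"diesel": [24, 23, 25], "gasoline": [28, 27, 29]},  # gasoline values hypothetical
    "manual": {"diesel": [26, 27, 25], "gasoline": [31, 30, 32]},  # hypothetical
}

# One mean per row/column combination, similar to what a default bar graph summarizes.
cell_means = {
    row: {col: mean(values) for col, values in cols.items()}
    for row, cols in grouped.items()
}

print(cell_means["automatic"]["diesel"])
```

Each measurement is addressed by two labels at once, one from the row variable and one from the column variable, which is the defining feature of this table type.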
For the next example, we're going to look at a relatively simple data set that examined groups of men and groups of women receiving one of three different treatments. As you can see, the first grouping variable of sex, represented by the levels of men or women, is spread across columns, while the levels of the second grouping variable, treatment option, are spread across rows, with two measurements recorded for each group. When we look at the graph, by default, the column grouping variable would be organized by color while each set of bars represents a level of the row grouping variable. However, Prism provides ways to reorganize these bars.
Let's create a new grouped data table with the same structure as the original: two replicate values in side-by-side sub-columns. Then we can copy and paste in the same data. When we go to the graph, let's select separated instead of interleaved. After changing the axis labels from column titles to row titles, we can see that the groups of bars now represent levels of the column grouping variable. Note here that the decision as to which grouping variable to put into columns and which to put into rows will affect the ultimate appearance of your graph, as it will tell Prism how to color and group your bars. In a future video, we'll look at various means of handling your data, including how to swap row and column grouping variables so that you can create the graph you want to see.
Here we can see another set of two graphs illustrating the point we were just discussing. In this case, each grouping variable has three levels. Group one, two or three for the first variable and condition one, two or three for the second variable. Although the data in both graphs are identical, the organization is slightly different with the assignment of rows and columns swapped between these two graphs. This is just one more way to emphasize what's important about the data from your experiment on top of the vast number of ways that graphs can be customized as we've seen from graphs made from other data tables.
With grouped data tables, you're able to investigate how data vary between groups based on their grouping variables using two or three way ANOVA. Prism allows you to perform regular or repeated measures ANOVA, and in Prism eight you can even perform repeated measures one, two and three way ANOVA when data are missing, through the use of a mixed effects model. This table type also allows you to perform multiple T-tests, or to calculate statistics for each condition. As we were just discussing, the choice of which variables to assign as columns and which to assign as rows will affect the default appearance of the graph that Prism creates. However, it's important to realize that these analyses can still be performed with either variable assigned to either location.
The next three types of data tables are each a bit more specialized. The first of these, the contingency data table, is similar to the grouped data table in that it organizes its data by two grouping variables. However, whereas data in the grouped data table are continuous, data in contingency data tables are discrete. They contain exact counts of the number of observations for each condition defined by the grouping variables. Sometimes, these are looked at as exposure versus outcome tables, and they're best understood by example.
Let's imagine you investigate a population and determine the number of people who did or did not get the flu in one season after having received or not received the vaccine. By observing the number of individuals in each condition, either receiving the vaccine and getting the flu, receiving the vaccine and not getting the flu, and so on, you can create a contingency table from which Prism can calculate the odds that someone would contract the flu after having received the vaccine or not. Other types of analysis that Prism can perform from this type of data table are the chi-square test and Fisher's exact test. Prism can calculate odds ratios and relative risk, and can also calculate and report the fraction of the column, row or grand total that each condition represents.
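The arithmetic behind the odds ratio and relative risk is simple enough to sketch by hand. The counts below are hypothetical and the code uses the standard textbook definitions; it is an illustration of the idea, not a reproduction of Prism's output, and it leaves out confidence intervals and the chi-square and Fisher's exact tests.

```python
# Hypothetical counts. Rows: exposure (vaccine or not); columns: outcome (flu or not).
table = {
    "vaccine": {"flu": 20, "no_flu": 180},
    "no_vaccine": {"flu": 50, "no_flu": 150},
}


def risk(row):
    """Fraction of the row that experienced the outcome."""
    return row["flu"] / (row["flu"] + row["no_flu"])


def odds(row):
    """Outcome count divided by non-outcome count."""
    return row["flu"] / row["no_flu"]


relative_risk = risk(table["vaccine"]) / risk(table["no_vaccine"])
odds_ratio = odds(table["vaccine"]) / odds(table["no_vaccine"])

# Fraction of the grand total represented by each cell.
grand_total = sum(count for row in table.values() for count in row.values())
fraction_of_total = {
    exposure: {outcome: count / grand_total for outcome, count in row.items()}
    for exposure, row in table.items()
}

print(round(relative_risk, 3), round(odds_ratio, 3))
```

With these made-up counts, both ratios fall below one, meaning the flu was less common in the vaccinated row than in the unvaccinated row.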
Next up, we'll look at the survival table. This data table is structured for the analysis of so-called time-to-event data and can compare two or more groups. Although, as the name of this data table implies, these time-to-event studies are frequently investigating groups of control and treated subjects with respect to a specific illness, the event in question doesn't need to be related to death or clinical studies at all. In fact, this type of data table can be used to study any one-time event within a group. You could, for example, compare differences between different vehicles and how long each could be driven before the transmission had to be replaced. In survival tables, the X values are entered as elapsed time, usually as days or weeks or months, and the Y value is coded as either a one or a zero. Entering a one means that a subject of that group experienced the event of interest after the amount of time given by X. A zero indicates that data were censored at that time, which typically occurs when a subject either leaves the study, or the study ends before the subject ever reached the event of interest.
In Prism, survival data tables are unique in that when you create a survival data table, the survival curve analysis is performed automatically. Prism automatically generates a Kaplan-Meier survival curve for each group and compares multiple curves using the log rank test and Gehan-Breslow-Wilcoxon test.
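To show how the one and zero coding feeds a survival curve, here is a minimal sketch of the standard Kaplan-Meier product-limit estimate for a single group. The observations are hypothetical, the function name is made up for this example, and the sketch does not attempt the log rank or Gehan-Breslow-Wilcoxon comparisons or any of the other details Prism reports.

```python
def kaplan_meier(observations):
    """Return [(time, survival fraction)] at each time where an event occurred.

    observations: list of (elapsed_time, code), where code 1 means the event
    occurred at that time and code 0 means the subject was censored then.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, code in observations if time == t and code == 1)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            # Multiply by the fraction of at-risk subjects that got past this time.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at time t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical group: days until a transmission was replaced (1) or follow-up ended (0).
group = [(30, 1), (45, 0), (60, 1), (60, 1), (90, 0)]
curve = kaplan_meier(group)
print(curve)
```

Censored subjects never pull the curve down themselves, but they do shrink the number at risk, so later events produce larger steps.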
The next data table type is the parts of a whole data table and is used when you would like to examine data that represents fractions or percentages of a whole, broken down by a single grouping variable. Another way to think about this is that parts of a whole data tables are used when it makes sense to ask the question, what fraction of the total does each value of my data set represent? If that question makes sense for your data, you may consider using the parts of a whole data table. This table is structured in such a way that each component of the whole is entered into a single column and Prism assumes that the values entered represent all parts of the whole with none missing. The graph most readily associated with this type of data table is the pie chart. As an example, you may want to investigate the number of students receiving a given grade on a test.
For parts of a whole data, the various component values are entered into individual rows of a single column. In this example, you would enter the number of students receiving each letter grade on a separate row of the first column in the data table. From this data, Prism can generate the corresponding fractions of the total or perform a chi-square goodness of fit test comparing the distribution of data entered in the data table with a defined theoretical distribution. Although the pie chart is the most commonly recognized graph from the parts of a whole data table, there are other types of graphs such as donut graphs and dot plots, which also illustrate this concept of parts of a whole.
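As a small worked example, the sketch below computes the fraction of the total for each row and the chi-square goodness of fit statistic by hand. The grade counts are hypothetical, and the equal-share expected distribution is just an assumption chosen for the illustration. It stops at the statistic and does not compute a P value.

```python
# Hypothetical counts of students receiving each letter grade (one value per row).
observed = {"A": 8, "B": 12, "C": 10, "D": 6, "F": 4}

total = sum(observed.values())
fractions = {grade: count / total for grade, count in observed.items()}

# Assumed theoretical distribution: every grade equally likely.
expected = {grade: total / len(observed) for grade in observed}

# Chi-square goodness of fit statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[grade] - expected[grade]) ** 2 / expected[grade] for grade in observed
)
degrees_of_freedom = len(observed) - 1

print(fractions, chi_square, degrees_of_freedom)
```

The fractions are what a pie chart displays, and they sum to one only because the table is assumed to hold every part of the whole.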
The last two data table types that Prism offers are both brand new to Prism eight, so I'll spend a bit more time on each of these. The first of these is the multiple variables data table. Like all of Prism's data tables, this table adopts a specific structure. The structure of this data table is similar to the structure found in many other statistical software packages, in which each row represents one observation or individual and each column represents one variable. In the example shown here, you can see that each row represents a different food, drink or snack item and each column represents a different measurement of its nutritional value. Specifically, the variables examined here include the number of calories and the amounts of fat, sodium, carbohydrates and protein found in each item.
From this data table and structure, Prism allows you to quickly perform correlation analysis on any or all of the variables reporting back a matrix of correlation coefficients from which a heat map can be made to visually represent the value of these correlation coefficients. Perhaps most importantly though, the multiple variables table allows you to perform multiple linear regression, examining how a set of multiple independent variables affects the value of a single dependent variable. Although I won't be going through the process of multiple linear regression in detail in this video, we will have another video available that will walk you through both generating a heat map from a correlation matrix, as well as performing multiple linear regression on this data set.
Briefly though, when you perform multiple linear regression, Prism allows you to identify which variable is your dependent variable, which main effects and interactions you would like to examine, and if you would like to perform least squares regression or Poisson regression. Prism will also allow you to define two potential models and will report back which model fits the data best, as well as graphically reporting the residuals from the selected model in four different ways. Additionally, Prism also provides a way to extract data from the multiple variables data table structure and rearrange it to fit other data table types. You can choose which type of data table you'd like to extract to, and the variables that will define how the data are arranged, and when applicable, how to separate groups by a grouping variable. Let's look at a really quick example.
In this data table, I have information on the blood pressure, age, weight, height and sex for a group of individuals, with the variable of sex coded as either a zero for men or a one for women. If we go to analyze, you can see that we can perform a correlation analysis or multiple linear regression, but we're going to select extract and rearrange. We have the option of extracting this data to XY, column, grouped or contingency data tables. I'm going to stick with XY. When we go to data arrangement, I can select height as my X variable, weight as my Y variable and sex as my grouping variable. Clicking OK takes me to the results sheet, in which weight has been extracted as Y as a function of height, or X. There are separate Y columns for men, again coded as zero, and women, coded as one. Clicking on the graph of this data shows the expected trend where, for both men and women, as height increases, weight generally also increases. Similar techniques can be used to extract data from multiple variables tables to other data table structures such as column, grouped, and contingency tables, as you saw.
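The rearrangement itself can be pictured with a few lines of Python. This is only a sketch of the idea under stated assumptions: the function name `extract_xy` and all of the measurements are hypothetical, and it is not Prism's implementation of extract and rearrange.

```python
# Multiple variables layout: one row per individual, one key per variable.
# Values are hypothetical; sex is coded 0 for men and 1 for women, as in the example.
rows = [
    {"height": 170, "weight": 68, "sex": 0},
    {"height": 160, "weight": 55, "sex": 1},
    {"height": 182, "weight": 80, "sex": 0},
    {"height": 168, "weight": 61, "sex": 1},
]


def extract_xy(rows, x_var, y_var, group_var):
    """Build an XY-style table with one Y column per level of the grouping variable."""
    groups = sorted({row[group_var] for row in rows})
    table = []
    for row in rows:
        # The Y value goes in the column for this row's group; other columns stay empty.
        y_cells = [row[y_var] if row[group_var] == g else None for g in groups]
        table.append((row[x_var], *y_cells))
    return groups, table


groups, xy_table = extract_xy(rows, "height", "weight", "sex")
print(groups)
for line in xy_table:
    print(line)
```

Each output row keeps its X value and places the Y value under the column for its group, so the two groups can be graphed and analyzed as separate data sets.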
Finally, the eighth data table, also new to Prism eight, is the nested data table. The structure of the nested data table is unique and was specifically developed for the analysis of experiments with a nested or hierarchical design. This sort of experiment is one in which you have two grouping variables defining your measurements, but in which the levels of one variable are found exclusively within specific levels of the other. The structure of the nested data table reflects this with the levels of one variable defined by columns, and the levels of the second defined by sub-columns with specific levels or sub-columns of the second variable found only within individual levels or columns of the first.
In this example, measurements from any given mouse can be found only under control or treatment conditions. The main advantage of the nested data table is its ability to analyze data from nested experimental designs while avoiding the issue of pseudo replication. The analysis is referred to in Prism as a nested T-test when comparing two groups, such as here, where we have control and treated. If we had three or more groups, for example control, treatment one and treatment two, it is called a nested ANOVA. Though the specifics are beyond the scope of this video, both of these analyses actually utilize a mixed effects model to perform the necessary calculations. Let's compare the nested data table to two other data tables in Prism to better understand the structure of the nested data table and this problem of pseudo replication.
As I mentioned, both the nested data table and the grouped data table use two grouping variables to describe each measurement. Here, I have a grouped data table with measurements of wild type and mutant cell lines under both control and treatment conditions. I also show a nested data table with measurements from six mice under either control or treatment conditions. Thus, for the grouped data table, the variables are cell strain and treatment condition, while for the nested data table the variables are mouse ID and treatment condition. However, there's one important difference. In the grouped data table, you can see that there are measurements in each level of one variable and every level of the second. For example, for mutant strain number two you have measurements in both control and treatment conditions. The same is true for each cell line. This is often referred to as a crossed design. The levels of one variable are fully crossed with the levels of the other. In comparison, in a nested experimental design, the levels of one variable occur only within specific levels of the other variable.
In the example shown, measurements for mouse number three can be found only under control conditions. There are no measurements for mouse number three under treatment conditions. As a result, the way to properly analyze this data is different. Let's look at a basic example. Here you can see that again we have treatment and control conditions and we have two subjects within each condition. This is a nested design because measurements from subject one and subject two can be found only under control, while measurements from subject three and subject four can be found only under treatment conditions, and three measurements were taken from each subject. In this setup, each subject represents what is called an actual replicate, while each measurement of that subject is called a technical replicate. We can analyze this data using a nested T-test to compare these two groups and, were we to do so with the data shown, we would see that the difference between the means of the treatment and control groups is not statistically significant.
However, without nested data tables or understanding how nested experiments are structured, you may have attempted to analyze the two experimental conditions using a regular parametric T-test with two groups, with each group having six measurements. After all, the values of the measurements belonging to either control or treatment are the same. However, and I can't stress this enough, this is absolutely incorrect. In this situation, you will have treated technical replicates as if they were actual replicates. The mistake that I referred to earlier as pseudo replication. When you treat technical replicates as actual replicates, it appears that you have a much more precise measurement of the mean for each group than you actually do. The result is that if we run a T-test with this data, although the nested T-test already told us that the difference in means was not statistically significant, the analysis reports that the difference is highly significant. However, and again, I cannot stress this enough, this is absolutely incorrect. Nested data tables provide the means of properly analyzing data from experiments with these nested or hierarchical experimental designs and avoiding the issue of pseudo replication.
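To make the danger of pseudo replication concrete, here is a rough sketch with hypothetical measurements for the four subjects. It compares an ordinary pooled two-sample T statistic computed on all six technical replicates per condition with the same statistic computed on one mean per subject. Averaging within each subject is a common simplification used here for illustration; it is not the mixed effects model that Prism's nested T-test uses.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Ordinary two-sample T statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical data: two subjects per condition, three technical replicates each.
control = {"subject 1": [10.1, 10.3, 9.9], "subject 2": [12.0, 12.2, 11.8]}
treatment = {"subject 3": [12.4, 12.6, 12.2], "subject 4": [14.1, 14.3, 13.9]}

# Incorrect: treat all six technical replicates per condition as independent values.
naive_t, naive_df = pooled_t(
    [v for values in control.values() for v in values],
    [v for values in treatment.values() for v in values],
)

# Simplified nested view: one mean per subject, so n = 2 actual replicates per condition.
nested_t, nested_df = pooled_t(
    [mean(values) for values in control.values()],
    [mean(values) for values in treatment.values()],
)

print(round(naive_t, 2), naive_df)
print(round(nested_t, 2), nested_df)
```

With these made-up numbers the naive statistic is about 3.8 in magnitude on 10 degrees of freedom, beyond the usual two-tailed 5% critical value of roughly 2.23, while the subject-level statistic is about 1.7 on 2 degrees of freedom, well short of the critical value of roughly 4.30. The difference in means is identical in both cases; only the apparent precision changes.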
So at this point hopefully you can start to see how each of Prism's data tables is uniquely designed to accommodate different types of data. Each data table is structured to reflect the inherent organization of the data being analyzed from the different types of experiments that you're performing. These data tables allow you to quickly evaluate your data using appropriate analyses that are meaningful for your data. They reduce the amount of complexity required to describe your data and make it faster and easier for you to get all of your results and generate all of your visualizations. As a final note, be sure to take advantage of the tutorial data sets to explore how each of the data tables can work with your own data, and remember that the Prism user guide is available with a vast wealth of even more information.
Thanks for watching, and have a great day.