
EXERCISE 7

1. What parameters are represented by the "class" and "id" headers in the first two
rows?
Answer: Class (isolation source): N = normal healthy colon tissue, T = tumour colon tissue.

2. Why do different rows sometimes display identical abundance values? NB this
question is about genomics and taxonomy/systematics, and NOT about mathematics;
check and compare the taxonomic indications of the rows.
Answer: Different rows display identical abundance values because the microorganisms with
that abundance value in the colon tissue samples belong to the same taxonomic group: the
rows name the same lineage at successive taxonomic ranks.
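
A minimal sketch of why this happens, assuming the abundance of each higher rank is the sum of the abundances of the taxa below it. The lineage names and values are hypothetical and are not taken from the exercise file; `clade_totals` is an illustrative helper, not part of LEfSe.

```python
from collections import defaultdict

# Hypothetical leaf-level relative abundances (percent) for one sample.
leaf_abundance = {
    "Bacteria|Fusobacteria|Fusobacteriia": 2.5,
    "Bacteria|Firmicutes|Clostridia": 60.0,
    "Bacteria|Firmicutes|Bacilli": 37.5,
}


def clade_totals(leaves, sep="|"):
    """Sum each leaf's abundance into every ancestor named in its lineage."""
    totals = defaultdict(float)
    for lineage, value in leaves.items():
        ranks = lineage.split(sep)
        for depth in range(1, len(ranks) + 1):
            totals[sep.join(ranks[:depth])] += value
    return dict(totals)


totals = clade_totals(leaf_abundance)
for clade in sorted(totals):
    print(f"{clade}\t{totals[clade]}")
```

In this sketch the phylum Fusobacteria has a single class below it, so both rows carry the same value, whereas Firmicutes has two classes and its row differs from each of them.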

3. What do all columns add up to? WHY was normalization applied here?
Answer: 100%, because each column sums the abundance percentages of all bacteria in one sample.
Normalization was applied because the total number of bacteria differs between samples.
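
The normalization step can be sketched as follows, assuming a table of raw counts at a single taxonomic level with one column per sample. The counts are hypothetical, and `normalize_to_percent` is an illustrative helper rather than the procedure used to build the exercise file.

```python
# Hypothetical raw counts: taxon -> counts in sample 1 and sample 2.
raw_counts = {
    "Firmicutes": [30, 400],
    "Fusobacteria": [10, 100],
    "Proteobacteria": [10, 500],
}


def normalize_to_percent(counts):
    """Convert each sample column to relative abundances that sum to 100."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        taxon: [100.0 * row[j] / totals[j] for j in range(n_samples)]
        for taxon, row in counts.items()
    }


relative = normalize_to_percent(raw_counts)
column_sums = [sum(row[j] for row in relative.values()) for j in range(2)]
print(relative)
print(column_sums)
```

The two hypothetical samples contain 50 and 1000 counts in total; only after normalization can the share of Fusobacteria (20% versus 10%) be compared between them.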

4. Which statistical test(s) will be performed by LEfSe for your input file before
performing the LDA Effect Size analysis?
Answer: The Kruskal-Wallis (KW) sum-rank test, to detect features with significant differential
abundance with respect to the class of interest, and the Wilcoxon rank-sum test, to investigate
biological significance using pairwise tests.
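
To make the first test concrete, here is a minimal sketch of the Kruskal-Wallis H statistic for one feature and two classes, with the p-value taken from the chi-square approximation with one degree of freedom. It is a textbook version written for illustration, not LEfSe's own implementation, and the abundance values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two groups, using the chi-square approximation (1 df)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(a)])
    sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (sum_a ** 2 / len(a) + sum_b ** 2 / len(b)) - 3 * (n + 1)
    # Standard correction for tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    p = math.erfc(math.sqrt(h / 2))  # chi-square survival function for 1 df
    return h, p


# Hypothetical relative abundances of one taxon in N and T samples.
normal = [0.1, 0.2, 0.3, 0.4]
tumour = [1.5, 2.0, 2.5, 3.0]
h_stat, p_value = kruskal_wallis_two_groups(normal, tumour)
print(h_stat, p_value)
```

A feature would pass this first filter if its p-value falls below the chosen alpha; the test is repeated independently for every row of the input table.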

5. At the A) Format Data for LEfSe page, you have to specify, for your input file,
which row contains the "class" parameter and which row contains the "subject" values.
Which rows are these in your input file?
Answer: The first row (class) and the second row (id) of the input file.

6. Which of the offered input parameters at this webpage is in fact redundant for your
input file?
Answer: The input parameter that is redundant for this input file is the subclass.

7. What is the main hypothesis that is investigated with this input file and the LEfSe
LDA?
Answer: The main hypothesis is that a certain bacterial taxon is significantly associated with
class 1 (or class 2), using a threshold of 2.0 on the logarithmic LDA score for discriminative
features.

8. To which statistical test outcome does the first info line refer to? (NB the actual
number "130" is not relevant at this moment, only the test is requested to name here).
Answer: The non-parametric factorial Kruskal-Wallis (KW) sum-rank test.

9. What do the "discriminative features" in the second info line refer to, in other words,
what does LDA analysis add to the outcome of the test of Question 8? it is helpful to
study the illustration of the LDA test that is at the page Huttenhower Lab Modules: B)
LDA Effect Size (LEfSe)
Answer: An LDA score is calculated for each feature and compared with the threshold value of 2.
An LDA score higher than 2 suggests that the abundance of that bacterial taxon differs between
tumour tissue and normal tissue. Thus, the number of discriminative features is the number of
bacterial taxa that are significantly associated with one of the two classes.

10. Give headers to the output spreadsheet using the "Output format" information that
is shown to the bottom of the LDA Effect Size page; the order of parameters described
here *nearly* matches the columns in the spreadsheet. You will notice that there is one
extra column in the spreadsheet that is not described on the corresponding web page;
what do the numbers in that column refer to, you think?
Answer: The numbers refer to the p-value.

11. Sort the data in Excel using the "classes" column. How many bacterial taxa/classes
combinations have "highest means" and "LDA scores" associated with them? How
many bacterial taxa (OTUs) are associated with the "T" samples and how many with
the "N" samples?
Answer: There are 121 bacterial taxa/class combinations that have highest means and LDA
scores associated with them. There are 22 bacterial taxa (OTUs) associated with the T
samples and 99 bacterial taxa (OTUs) associated with the N samples.
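
The same count can be made without Excel. The sketch below assumes a tab-separated output with five columns (feature, log of the highest class mean, class, LDA score, p-value) in which non-discriminative features have empty class and LDA fields; that layout, the column names in `HEADERS`, and the rows themselves are assumptions made for illustration, not an excerpt of the real output.

```python
import csv
import io
from collections import Counter

# Hypothetical column names and rows; values are made up.
HEADERS = ["feature", "log_highest_class_mean", "class", "lda_score", "p_value"]
res_text = (
    "Bacteria.Fusobacteria\t4.2\tT\t3.9\t0.0004\n"
    "Bacteria.Firmicutes\t5.6\tN\t4.1\t0.002\n"
    "Bacteria.Bacteroidetes\t5.3\t\t\t-\n"
    "Bacteria.Proteobacteria.Epsilonproteobacteria\t3.1\tT\t3.0\t0.01\n"
)


def count_discriminative(text, threshold=2.0):
    """Count features per class whose LDA score reaches the threshold."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=HEADERS, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["class"] and row["lda_score"] and float(row["lda_score"]) >= threshold:
            counts[row["class"]] += 1
    return counts


counts = count_discriminative(res_text)
print(counts["T"], counts["N"], sum(counts.values()))
```

Applied to the real output file, the per-class counts and their total should reproduce the numbers found by sorting on the "classes" column.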

12. Compare the no. of bacterial OTUs for the T and the N class. What is a plausible
biological interpretation for this difference?
Answer: The number of bacterial OTUs associated with tumour tissue is smaller than the number
associated with normal tissue.

13. Have one last look at the LEfSe output table and inspect the names of the bacterial
taxa that show significant correlations with Tumor tissues. Which groups stand out/are
most represented in the analysis? it helps here to take a good look at those OTUs that
are shared by descent, that means: OTU(s) that are enriched in shared-by-descent
taxonomic levels.
Answer: The Fusobacteria and Proteobacteria groups are most represented in the analysis.
Both are associated with colorectal cancer.

14. What do the green and red bars indicate/correspond to? Count the green bars
(probably the smallest category) from the graphic; does the number ring a bell?
Answer: There are 22 green bars; they correspond to the tumour-associated bacteria. The red
bars correspond to the bacteria associated with normal healthy tissue.

15. From this graphic, which bacterial taxa associated with Tumor tissues have the
highest median values according to LDA analysis?
Answer: The bacterial taxa associated with tumour tissue that have the highest median values
according to LDA analysis are Fusobacteria, Fusobacteriales, Fusobacteriaceae and
Fusobacterium.

16. Which relationship can be seen between the features (bacterial taxa) in the
spreadsheet that is not visible in the graphical output?
Answer: The hierarchical (phylogenetic) relationship between the bacteria, which is visible in
their full taxonomic names.

17. Considering the Input Parameters in the table above and the text on the LEfSe "Plot
Cladogram" page, graphical options descriptions: which change in what parameter
would yield a more complex, more extensively branched, cladogram?
Answer: Two parameters are relevant.
Expand terminal non-leaf levels: whether to expand non-leaf taxa without children up to
the level of the leaves, naming the new levels with the expanding taxon name.
Maximum number of taxonomic levels: limits the levels of the cladogram to a desired level.
Enabling the first or raising the second would yield a more extensively branched cladogram.

18. What do the small deeper-greenish and red dots (circles) in the cladogram
correspond to? And what do the pale/lighter-green dots or circles correspond to?
Answer:
The deeper-green dots correspond to taxa associated with tumour tissue.
The red dots correspond to taxa associated with normal tissue.
The light-green dots correspond to taxa with a non-significant p-value.
Together, the dots cover the whole dataset of bacteria.
Some have a white background; this means that they do not have a known taxonomy (e.g. fast
developing bacteria; new kinds of bacteria are constantly being discovered).

19. What does the size of a dot correspond with?
Answer: It refers to the LDA score of the highest mean. The size of the dots has a maximum
limit, to keep the graphic readable.

20. Given that in this graphic, tree distances correlate with 16S similarity, would you say
that Fusobacteriales and Campylobacteriales are closely related or not? Does that
surprise you, given that these families are both significantly associated with Tumour
samples?
Answer: Normally you would expect that related species could both be associated with tumours,
because they share similar characteristics (the same components, etc.). In this case they are
not closely related, which suggests convergent evolution: they could, for instance, produce the
same compound that causes tumours.

21. From this graphical output, which bacterial taxon would be the most relevant to
investigate in terms of correlations with tumor tissues? Why?
Answer: The Fusobacteria would be the most relevant, because they have the highest LDA score.

22. What do the red vertical lines represent?
Answer: The red vertical lines represent the relative abundance of the selected bacterial taxon
in each individual sample.

23. What do the horizontal black straight and dotted lines represent?
Answer: The solid horizontal black lines represent the class mean and the dotted lines
represent the class median.
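
As a small illustration of why the two lines can sit far apart, the sketch below computes the class mean and class median for one taxon. The per-sample abundances are hypothetical; a single high-abundance sample in class N pulls the mean well above the median.

```python
from statistics import mean, median

# Hypothetical relative abundances (percent) of one taxon per sample.
abundance = {
    "N": [0.0, 0.1, 0.2, 0.1, 3.6],
    "T": [0.5, 2.0, 4.0, 9.5, 1.0],
}

# class -> (mean, median), i.e. the solid and the dotted line for that class
summary = {cls: (mean(values), median(values)) for cls, values in abundance.items()}

for cls, (cls_mean, cls_median) in summary.items():
    print(f"{cls}: mean={cls_mean:.2f} median={cls_median:.2f}")
```

Samples that lie far from their class median, such as the one N sample with 3.6% in this made-up data, are natural candidates for a closer look.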

24. Suppose that you would want to investigate specific samples further; which samples
would you choose after inspection of the graphs? Why?
Answer: The individuals in class N with a high relative abundance of Fusobacteria, to study
whether other bacteria suppress the function of Fusobacteria and so keep the tissue normal and
healthy; and the individuals in class T with a low relative abundance of Fusobacteria.

25. When you inspect the LEfSe results plot, what observation can you make regarding
the gender distribution of Fusobacterium, the taxon for which LEfSe analysis earlier
found a strong association with colon cancer? What might this suggest?
Answer: Fusobacterium is more strongly associated with female subjects, which might suggest
that females have a higher risk of developing colon cancer.
