MapQTL ® 7
Software for mapping quantitative trait loci in experimental populations of diploid species
Analyse your QTL experiments with powerful statistical methods. MapQTL is easy to use, is very fast and presents the analysis results in tables and adjustable charts. The results can be exported to MS-Windows ® text processing, presentation and spreadsheet software.
What is MapQTL?
MapQTL is a computer program for genetic analysis of segregating quantitative traits in experimental populations of diploid species. The analysis tries to detect regions in the genome that are responsible for phenotypic variation in the investigated quantitative trait. These regions are called quantitative trait loci (QTL). At the same time the genotypic effects associated with the segregating QTL alleles are estimated. The software can deal with all common types of experimental population, including the full-sib family (F1) of a cross between two individuals of an outbreeding species. The available methods are interval mapping, the powerful MQM mapping (equivalent to composite interval mapping) and the nonparametric Kruskal-Wallis test. The permutation test and automatic cofactor selection are supporting methods that can be applied. The so-called traits scan method performs a condensed analysis of large numbers of quantitative traits based on interval mapping. Finally, it is possible to investigate the presence of epistasis between two loci. MapQTL is an easy-to-use MS-Windows program with an intuitive user interface.
The present version 7 (v7) builds on its predecessors. While preserving most of the user interface, the general workflow and the methods of version 6 (v6), v7 has many enhancements over v6, several of which are quite significant. Most are intended as improvements for working with larger datasets of genetic markers and traits. Major efforts have gone into increasing the computational speed of the analyses, and these have been successful. In brief, the main technical improvements are: (a) the change towards 64-bit software, (b) the redesign of various probability computations, (c) the use of an embedded database system for storing all data, and (d) the parallelization of various calculations. The change to 64-bit allows access to more system memory, which is useful for very large datasets. It also allows for more efficient probability computations, where intermediate results are stored in memory and afterwards repeatedly reaccessed, thereby avoiding time-consuming recalculations. Using an embedded database system greatly improves the responsiveness of the user interface with large datasets. Because the database system is embedded, it requires no special installation or usage instructions; everything is taken care of by the program. Finally, parallelization increases the computational speed on systems with a multi-core processor, of which normally all available cores can be utilized. In addition, so-called data parallelism is applied in several methods by employing special AVX2 (Advanced Vector Extensions 2) instructions, which are available in most modern processors.
MapQTL 7 is 64-bit software for the 64-bit MS-Windows 10 platform. Other MS-Windows platforms are not supported. Previous versions are no longer available for ordering.
Enhancements introduced with version 7:
- Traits scan
- User interface
The v7 executable program is a 64-bit MS-Windows application. Being 64-bit means that it can access more of the computer's memory (i.e. RAM) than the 32-bit limit of 4 GB. The program therefore requires a 64-bit version of the MS-Windows operating system and a computer with more than 4 GB of memory. An amount of 16 GB of memory is recommended for common analyses. In practice, access to more memory means that computations can take place entirely in memory without having to store intermediate results on the hard drive, resulting in higher speeds. Also, alternative methods that require more memory may be utilized.
For interval and MQM mapping the probabilities for all alternative genotypes of a putative QTL must be calculated at many positions on the genetic map based on the genotype observations of nearby loci (markers). If for a certain individual these observations are unknown or partially unknown, then use is made of the observations at neighbouring loci and the relevant map distances to obtain the best possible probability estimates. In previous versions of MapQTL these probability computations using neighbouring loci were done entirely from scratch at each position. In v7 the major part of these computations is done once, at the start of the interval and MQM mapping, and the results are stored in memory. This is made possible by having more memory available through v7 being 64-bit. At all positions where the QTL probabilities are needed in interval and MQM mapping, these intermediate results are looked up and used rather than recalculated. This approach greatly increases the speed of the analyses. It also enables the use of denser maps.
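The idea of computing intermediate results once and looking them up at every map position can be illustrated with a minimal sketch. It is not MapQTL's actual algorithm: it assumes a BC1-type population with two genotypes ('a' and 'h'), the Haldane mapping function, and missing marker observations coded as None. All function and variable names are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, an assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def nearest_observed(genotypes):
    """Done once per individual: for every marker index, the index of the
    nearest observed marker at or to the left, and at or to the right."""
    n = len(genotypes)
    left, right = [None] * n, [None] * n
    last = None
    for i in range(n):
        if genotypes[i] is not None:
            last = i
        left[i] = last
    last = None
    for i in range(n - 1, -1, -1):
        if genotypes[i] is not None:
            last = i
        right[i] = last
    return left, right

def qtl_prob_a(pos, interval, marker_pos, genotypes, lookup):
    """P(QTL genotype is 'a') at position pos (cM) between markers
    interval and interval + 1, using the precomputed lookup tables."""
    left, right = lookup
    p = {"a": 0.5, "h": 0.5}  # BC1 prior probabilities
    for idx in (left[interval], right[interval + 1]):
        if idx is None:
            continue  # no observed marker on this side
        r = haldane_r(abs(pos - marker_pos[idx]))
        for q in p:
            p[q] *= (1.0 - r) if q == genotypes[idx] else r
    return p["a"] / (p["a"] + p["h"])

marker_pos = [0.0, 10.0, 20.0]
genotypes = ["a", None, "h"]          # the middle marker is unobserved
lookup = nearest_observed(genotypes)  # computed once, reused at all positions
probs = [qtl_prob_a(x, 0, marker_pos, genotypes, lookup) for x in (2.0, 5.0, 8.0)]
```

In this sketch the search for informative neighbouring markers is the part that is done once; each position then only needs a few multiplications.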
When displaying data in tables, previous versions of MapQTL always retrieved the entire dataset from file. With larger datasets this could make the program rather unresponsive. The various data within MapQTL v7 projects are now stored in databases. Displaying data in tables or as plain text from within the program is implemented in such a way that only the part currently visible on screen is retrieved from the database file. This approach is called database driven. Using a database system this way greatly improves the responsiveness of the user interface with large datasets. With small datasets the program can be slightly less responsive than previous MapQTL versions due to the database system overhead. The database system used is the embedded database engine SQLite. It does not require any database server installation or maintenance. As a final remark, it is good to realize that the speed of the hard drive can become a limiting factor in dealing with very large datasets.
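A database-driven table view can be sketched with Python's built-in sqlite3 module. The table name, the column names and the function below are hypothetical and do not reflect the actual layout of a MapQTL project database; the sketch only shows how a query can be limited to the rows that are visible on screen.

```python
import sqlite3

def fetch_visible_rows(conn, first_row, n_rows):
    """Retrieve only the rows that the table view currently shows."""
    cur = conn.execute(
        "SELECT individual, value FROM trait_data "
        "ORDER BY individual LIMIT ? OFFSET ?",
        (n_rows, first_row),
    )
    return cur.fetchall()

# Hypothetical in-memory database with 10000 trait observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trait_data (individual INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO trait_data VALUES (?, ?)",
    ((i, i * 0.5) for i in range(1, 10001)),
)

# The view is scrolled to row 200 and shows 25 rows at a time.
page = fetch_visible_rows(conn, first_row=200, n_rows=25)
```

Only 25 rows are transferred to the user interface, regardless of the size of the table.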
Modern processors usually have multiple computation cores that can execute separate calculations simultaneously. For MQM mapping, automatic cofactor selection and the epistasis test, new algorithms were developed that can be distributed over multiple cores. This technique is called multithreading or symmetric multiprocessing (SMP). MapQTL v7 tries to make use of all available cores of the processor. Thus, the computations will be faster if the processor has more cores. Some recent processor models have two types of cores: efficiency cores and performance cores. Currently, it is not clear how MapQTL's parallel tasks will be distributed over the cores in such processors, although it is assumed that they will be assigned to the performance cores only.
Ideally, the speed of an SMP algorithm scales linearly with the number of cores, except for a small amount of overhead. It turns out, however, that the size of the cache memory in combination with the size of the dataset can be a limiting factor. Another, rather technical, adverse phenomenon with cache memory is called false sharing. MapQTL's SMP algorithms try to avoid this false sharing as much as possible. It turns out that it is difficult to predict whether MapQTL's multithreaded algorithms speed up the analyses on your computer's processor model. Therefore, MapQTL v7 offers both the new multithreaded and the original serial versions of the analyses.
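Offering a serial and a multithreaded version of the same analysis can be sketched as follows. The function score_position is a hypothetical stand-in for the work done at one map position, and the sketch only shows the structure: in standard Python, threads usually do not speed up CPU-bound calculations of this kind, unlike the native threads of a compiled program.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def score_position(pos):
    """Hypothetical stand-in for the independent work at one map position."""
    return sum((pos * k) % 7 for k in range(1000))

def run_serial(positions):
    """Original approach: one position after the other."""
    return [score_position(p) for p in positions]

def run_multithreaded(positions, workers=None):
    """Distribute the independent positions over a pool of worker threads."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the order of the input positions
        return list(pool.map(score_position, positions))

positions = list(range(200))
serial_result = run_serial(positions)
threaded_result = run_multithreaded(positions)
```

Both versions must give identical results, so that the user can simply choose whichever is faster on the computer at hand.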
Most PC processors released since 2013 support special instructions for so-called data parallelism. Technically, this is called single instruction, multiple data (SIMD). For instance, a single instruction will result in four pairs of numbers being multiplied at the same time, where originally each pair would be executed one after the other. Intel® and AMD® have developed several versions of SIMD instructions, among which advanced vector extensions (AVX) and fused multiply-add (FMA) instructions. For MapQTL several algorithms were developed that make explicit use of SIMD instructions of versions AVX2 and FMA3 (in practice FMA3 is considered to be part of AVX2). Because there is some overhead involved, not all analyses benefit in speed from the SIMD algorithms. Especially the mixture model based analyses are faster, whereas regression based analyses may even be slower. MapQTL v7 offers both the new SIMD and the original versions of the analyses.
There are QTL studies that involve hundreds, or even many more, quantitative traits, for instance, expression QTL studies. In MapQTL v7 a new method, called traits scan, is added that performs a condensed analysis of a large number of quantitative traits based on interval mapping. The traits scan will run interval mapping on all traits selected for analysis, while only storing and reporting the highest LOD obtained for each trait. The adjusted user interface of v7 makes it easy to select a range of traits for analysis. The traits scan applies the interval mapping method only at the map positions of loci of the marked linkage groups, not between loci. The condensed results give insight into the important traits and their associations. After the traits scan, all traits with a maximum LOD below a given level in the traits scan can be removed from the population with a Data menu function.
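The principle of the traits scan can be sketched as follows. This is a simplified illustration, not MapQTL's implementation: it assumes complete marker data and normally distributed residuals, in which case the test at a locus position reduces to a comparison of genotype class means with LOD = (n/2) log10(RSS0/RSS1). All names, the example data and the LOD threshold of 3.0 are hypothetical.

```python
import math

def marker_lod(values, genotypes):
    """LOD of a genotype-means model against a single overall mean.
    Assumes the residual sum of squares is not zero."""
    n = len(values)
    mean = sum(values) / n
    rss0 = sum((y - mean) ** 2 for y in values)
    groups = {}
    for y, g in zip(values, genotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return 0.5 * n * math.log10(rss0 / rss1)

def traits_scan(traits, markers):
    """For every trait, keep only the highest LOD and the locus where it occurs."""
    result = {}
    for trait_name, values in traits.items():
        result[trait_name] = max(
            (marker_lod(values, genos), locus) for locus, genos in markers.items()
        )
    return result

markers = {
    "M1": ["a", "a", "a", "h", "h", "h"],
    "M2": ["a", "h", "a", "h", "a", "h"],
}
traits = {
    "t1": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9],
    "t2": [1.0, 2.0, 1.1, 2.1, 0.9, 1.9],
}
scan = traits_scan(traits, markers)

# Traits below a chosen maximum LOD could subsequently be dropped.
kept = [name for name, (lod, locus) in scan.items() if lod >= 3.0]
```

Storing a single (LOD, locus) pair per trait keeps the output small, even with thousands of traits.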
The interval mapping method analyses map positions for statistical association with the trait of interest. In situations where multiple positions show association, genetic markers at these positions can be used as cofactors in the MQM mapping analysis. In this analysis, the cofactors are modelled as fixed effects that are additive to each other and to the modelled QTL. There are situations where epistatic interactions are suspected. The new epistasis test of MapQTL v7 allows for the investigation of two-way, or QTL-by-QTL, interaction based on two map positions. This method estimates genotypic means for each two-locus genotype and compares the likelihood of this model with that of the model in which the two loci are additive to each other. There is no biological reason why three-way or higher interactions should not be possible; however, testing for these requires many more degrees of freedom than are available in a typical QTL analysis experiment. Therefore, the epistasis test is limited to two-way interaction.
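The comparison of the two models can be sketched for the simplest case. The sketch assumes normally distributed residuals, two loci with fully observed genotypes coded 0 and 1 (as in a BC1-type population), and variation at both loci; under these assumptions the likelihood ratio reduces to LOD = (n/2) log10(RSS_additive / RSS_full). It is an illustration of the principle only, with hypothetical names, and not the algorithm used in MapQTL.

```python
import math

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss_additive(y, g1, g2):
    """Residual sum of squares of the additive model y = mu + a*g1 + b*g2."""
    rows = [(1.0, float(u), float(v)) for u, v in zip(g1, g2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def rss_full(y, g1, g2):
    """Residual sum of squares with a separate mean for each two-locus genotype."""
    cells = {}
    for yi, u, v in zip(y, g1, g2):
        cells.setdefault((u, v), []).append(yi)
    return sum(sum((yi - sum(ys) / len(ys)) ** 2 for yi in ys)
               for ys in cells.values())

def epistasis_lod(y, g1, g2):
    return 0.5 * len(y) * math.log10(rss_additive(y, g1, g2) / rss_full(y, g1, g2))

g1 = [0, 0, 0, 0, 1, 1, 1, 1]
g2 = [0, 0, 1, 1, 0, 0, 1, 1]
y_interacting = [0.9, 1.1, 0.9, 1.1, 1.1, 0.9, 2.9, 3.1]  # effect only in cell (1, 1)
y_additive = [0.9, 1.1, 1.9, 2.1, 2.1, 1.9, 2.9, 3.1]     # effects simply add up
lod_interacting = epistasis_lod(y_interacting, g1, g2)
lod_additive = epistasis_lod(y_additive, g1, g2)
```

With two genotypes per locus the full model has only one parameter more than the additive model; with more genotypes per locus, or with three-way interaction, the number of extra parameters grows quickly.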
Integrated two-way pseudo-testcross
Two functions were added that facilitate the analyses of CP type populations with the so-called integrated two-way pseudo-testcross approach. Basically, this approach performs separate parental QTL analyses, where, in addition, the marker loci with association in one parent can be used as cofactors in the MQM mapping of the other parent. The great advantage is that the probabilistic model is much simpler, which results in considerably faster computations than with the original CP model. A CP model analysis can become quite slow in situations with many loci with segregation types <lmxll> or <nnxnp>, which miss information on one of the parents and thus require extra probability calculations. Based on a CP population node, the first of the two new functions creates a new population node with the marker genotypes properly translated, while the traits are simply copied. The second new function creates a map node with a linkage map that corresponds to the newly created two-way pseudo-testcross population node.
The most important improvements of the user interface concern the handling of the phenotypic traits. Individual traits are no longer shown in the Populations tree of the navigation panel. Instead, they are shown on the Traits tabsheet of the contents-and-results panel. The individual observations of the traits are shown on the new Traits data tabsheet. As soon as a population has one or more traits, the population node in the Populations tree will have a traits node as child node. On the Traits tabsheet each trait has its own row with several columns: one showing whether or not the trait is numerical, followed by three columns each with a checkbox. The first checkbox must be used to select the trait for analysis, the second must be checked if the trait is to be used as a covariate in the analysis, and the third is for using the trait as experimental design cofactor. The three checkboxes are mutually exclusive. It is easy to set or clear the checkboxes of a range of rows in the table. First, mark a range of rows using the F8 key, the space bar or right-clicking the mouse; the range will become highlighted. Next, setting or clearing any one checkbox in the range will set or clear that checkbox in all rows of the range.
The Population tabsheet is removed in v7. Instead, hovering the mouse pointer over the nodes in the Populations tabsheet of the navigation panel will show a hint pop-up with the relevant information similar to that of the removed tabsheet. Also, right-clicking on the traits and genotypes nodes will show the information in a message box.
In MapQTL v7 traits can be removed from a population separately. For this you need to mark a range of rows in the Traits tabsheet and apply the Data menu function Remove marked traits. Adding traits can be done as before by loading a .qua file with observations, but it can also be done by pasting data that are copied from a spreadsheet with the Data menu function Paste traits. Whereas in v6 newly loaded traits would always replace the existing set of traits, in v7 you also have the option to add the new traits to the existing set.
In MapQTL v7 the placement of the various options over the first two chart options tabsheets is reorganized, and two interesting new options are added. It is now possible to choose a chart type where multiple groups are shown together with a single X- and Y-axis. In this case, the chart will take up the entire available page width (i.e. one chart per row). The parameter Groups side by side determines the number of groups shown next to each other in such a chart. If you choose not to combine multiple groups in one chart, then this parameter determines the number of charts (each with a single group) on one row. For multi-group charts, the linkage group names will be shown as labels on the X-axis instead of locus names or cM distances. The option Group label orientation determines whether the group labels should be drawn horizontally or vertically.
In the checklists with the groups to plot and the data to plot, it is now possible in v7 to select multiple rows in the regular MS-Windows fashion (click while holding the control or shift key). If one of the corresponding checkboxes is then set or cleared, the checkboxes of all selected rows will be set or cleared.
Besides the other described enhancements, there are many smaller but quite useful improvements:
- A major effort was made for MapQTL to be fault tolerant with project files. On the hopefully very rare occasion that a project cannot be properly opened or appears corrupted, you may have MapQTL attempt to reconstruct the project database as well as possible.
- MapQTL v7 will recognize a project of version 6, when trying to open it. Although it cannot use a v6 project directly, the program can convert it to a new v7 project.
- In v7 each session has its own Session notes tabsheet, useful for your administration.
- The program maintains a history of the last 50 messages that were shown on the status bar (as long as the program is active); this message history can be accessed by right-clicking on the status bar.
- The progress bar is improved to give a better representation of the progress of the executing procedure. Some database actions cannot be predicted for their duration, so that the standard progress bar growing to 100% cannot be used. To give feedback that the program really is busy in such cases, the progress bar area will show sequences of '>' symbols.
- Loci in the population and loci on the map may be excluded from the analysis.
- The message box has a copy button that places the text of the message box on the MS-Windows clipboard.
Overview of MapQTL's main features
- intuitive MS-Windows user interface;
- many experimental population types:
- BC1 - first generation backcross;
- RIx - recombinant inbred lines family;
- DH1, DH - family of F1-derived doubled haploids;
- DH2 - family of F2-derived doubled haploids;
- HAP1, HAP - family of haploids;
- BCpxFy - advanced backcross inbred lines family;
- IMxFy - advanced intermated inbred lines family;
- CP - outbreeder full-sib family;
- input in plain text files with a flexible layout of the quantitative trait data, the molecular marker genotypes and the (precalculated) linkage map; map and molecular marker data files are compatible with JoinMap; quantitative traits can also be copied from a spreadsheet;
- interval mapping;
- MQM mapping, in which markers are used as cofactors to absorb the effects of nearby QTLs, thereby increasing the power for mapping other segregating QTLs; it may even enable the separation and mapping of linked QTLs;
- automatic selection of cofactors using a backwards elimination procedure to easily get the set of cofactors for MQM mapping;
- permutation test to determine the significance level of interval mapping without the usual assumption of normality of the data residuals;
- nonparametric mapping with the Kruskal-Wallis rank sum test per marker to assess the segregation of QTLs for non-normally distributed data;
- traits scan, quickly analyse large numbers of traits with interval mapping;
- epistasis test;
- the Haley & Knott regression approximation to maximum likelihood interval and MQM mapping is available;
- simple experimental design (e.g. blocking) and covariates can be analysed jointly with interval and MQM mapping;
- multiple populations and maps in a single project;
- traits observed in multiple populations can be analysed combined over the populations with interval and MQM mapping based on a common (integrated) linkage map;
- analysis results stored in sessions;
- clearly arranged results, mostly in adjustable and sortable tables;
- QTL charts with many adjustable features;
- results and charts exportable to most MS-Windows text processing, presentation and spreadsheet software;
- print preview;
- manual in Adobe ® PDF format;
- easy-to-use installer.
Get an impression of the software with the slide show of MapQTL 7:
MapQTL 7 slide show (size: 0.9 MB).
If needed, support will be given to help you get the software running and solve problems not described in the manual. This support is limited to advice by e-mail to <support(at)kyazma.nl>. A list of frequently asked questions is presented at this web site.
- The first version of MapQTL available to the public was version 3.0. It was presented at the Plant Genome IV Conference, January 1996, San Diego, California, USA (Van Ooijen & Maliepaard, 1996, Plant Genome IV Abstracts).
- MapQTL 4.0 was presented at the Plant & Animal Genome VIII Conference, January 2000.
- MapQTL 5 was presented at the Plant & Animal Genome XII Conference, January 2004.
- MapQTL 6 was presented at the Plant & Animal Genome XVII Conference, January 2009.