Wednesday, 26 April 2017

Biodiverse now works on Macs



Thanks to some sterling work by Jason Mumbulla we now have a native version of Biodiverse for Macintosh computers.

Installation is by dmg file, so it follows the same approach as most Mac installations.

The interface is somewhat "old school" (think 1998), but all the functionality is in place and works (see screenshots below).

Download links are at https://purl.org/biodiverse/wiki/Downloads (at the time of writing there is only a development release), with installation instructions at https://purl.org/biodiverse/wiki/Installation.








Shawn Laffan
27-Apr-2017


For more details about Biodiverse, see http://purl.org/biodiverse 

For the full list of changes in the 1.99 series (leading to version 2) see https://purl.org/biodiverse/wiki/ReleaseNotes (for all issues addressed or being targeted to fix for version 2, see https://github.com/shawnlaffan/biodiverse/milestone/4 ).

To see what else Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList


You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users

Tuesday, 25 April 2017

Biodiverse now exports tree branch colours

Biodiverse version 2 will allow you to export the tree branch colours.  This means you can now get your tree colours to the point that you are happy with them, and export them to another program such as FigTree to generate figures for a publication.

Colours can be exported with the Nexus and tabular Tree formats.  If you use the tabular format then there will be a column in the output table called Colour.  For the Nexus format, the colours are stored in the comments block for each branch.
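For anyone wanting to read the exported colours back in their own scripts, here is a minimal Python sketch.  It assumes the branch comments follow the FigTree-style annotation `[&!color=#RRGGBB]` placed directly after the branch name; check a file you have exported before relying on that.  The tree string and the function name `branch_colours` are made up for illustration and are not actual Biodiverse output.

```python
import re

# Assumption: colours appear as FigTree-style comments, [&!color=#RRGGBB],
# immediately after a branch name.  Verify against a real exported file.
COLOUR_COMMENT = re.compile(r"([A-Za-z0-9_.]+)\[&!color=(#[0-9A-Fa-f]{6})\]")


def branch_colours(newick: str) -> dict[str, str]:
    """Return {branch name: hex colour} for every annotated branch."""
    return {name: colour.lower() for name, colour in COLOUR_COMMENT.findall(newick)}


# Hypothetical tree string for illustration only.
tree = (
    "((sp_a[&!color=#FF0000]:1.0,sp_b[&!color=#FF0000]:1.0)"
    "node1[&!color=#FF0000]:0.5,sp_c:1.5)root;"
)
colours = branch_colours(tree)
```

Branches without a colour comment (sp_c and root in this example) are simply absent from the result.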

Click on the "Export colours" checkbox to export the colours in a Nexus file. 

A few examples are below.

View labels tab with a set of labels highlighted on the tree.


The selection colours displayed in FigTree.

A cluster analysis with six clusters coloured.

The tree in FigTree with the six clusters coloured.  

A user defined selection.

The same user defined selection in FigTree.

The colouring works for anything that is applied to the tree, so you can also export continuous palettes when you have calculations per node (branch).

PD calculated for each cluster node (branch).

And the same PD colouring now in FigTree.

Kudos to Luke Fitzpatrick for getting it working.

Shawn Laffan
26-Apr-2017



Matching spatial, tree, matrix and property data is now easier

One of the biggest bugbears when using Biodiverse has been matching the names between the spatial data and any tree, matrices, group properties and label properties.

Biodiverse uses an exact matching scheme to link tree branch names with Basedata labels (e.g. species).  If there is even a single character different then the two sets will not match, and analyses either do not run or are incomplete.

The standard approach used in Biodiverse to ensure the data match is for users to provide a remap table that "maps" labels from one data set to another (e.g. make the tree names match the basedata).  However, experience suggests that users are often unsure what columns should be used for what (and sometimes how many), and that generation of the remap file is more difficult than desired.
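To make the idea of a remap table concrete, here is a small Python sketch of a two-column table being applied to a list of tree labels.  The column names (`tree_name`, `basedata_name`), the labels and the function names are all hypothetical; this is not the Biodiverse implementation, just an illustration of the input and remapped columns.

```python
import csv
import io


def load_remap(handle, input_col, remapped_col):
    """Read a remap table into a dict of {input label: remapped label}."""
    reader = csv.DictReader(handle)
    return {row[input_col]: row[remapped_col] for row in reader}


def apply_remap(labels, remap):
    """Rename labels found in the remap; leave any others untouched."""
    return [remap.get(label, label) for label in labels]


# Hypothetical remap table: tree names on the left, basedata names on the right.
remap_csv = io.StringIO(
    "tree_name,basedata_name\n"
    "Genus_sp1,Genus:sp1\n"
    "Genus_sp2,Genus:sp2\n"
)
remap = load_remap(remap_csv, "tree_name", "basedata_name")
tree_labels = ["Genus_sp1", "Genus_sp2", "Genus_sp3"]
remapped = apply_remap(tree_labels, remap)
```

Note that Genus_sp3 has no entry in the table and so keeps its original name, which is exactly the kind of label that would then fail to match.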

Biodiverse version 2 has a greatly simplified remap process that can generate possible matches automatically.  Any object can have its element names remapped at import, as before, but now there are also options under the Basedata, Tree and Matrix menus.  Better yet, there is also a centralised interface that can be accessed under the file menu.


In the centralised remap interface, one can match any object to any other object in the project, as well as to names loaded from a file.  If an object is chosen then Biodiverse will search the sets of names in each object to find possible matches.  The User defined from file is the conventional process, where the file needs to define the set of input and remapped columns.  The Auto from file allows users to load a list of names which will be searched to find possible matches, in the same way as it searches other objects.

The search process uses a fuzzy text matching scheme, as well as searching for differences purely in punctuation or in quoting characters.  The maximum acceptable distance option allows control over how different possible matches can be.  If a very large value is used then anything can match anything else, which is unlikely to be useful.
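The general idea can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: it uses Levenshtein edit distance as the fuzzy measure and strips punctuation and spaces to detect punctuation-only differences, which may not be what Biodiverse does internally.  The function names and the `max_distance` parameter are hypothetical.

```python
import string


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard dynamic programming rows."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


# Remove punctuation, quotes and spaces so only letters and digits remain.
_STRIP = str.maketrans("", "", string.punctuation + " ")


def strip_punctuation(name: str) -> str:
    return name.translate(_STRIP)


def classify(name, candidates, max_distance=2):
    """Return (category, matched name or None) for one name."""
    if not candidates:
        return ("no match", None)
    if name in candidates:
        return ("exact", name)
    stripped = strip_punctuation(name)
    for candidate in candidates:
        if strip_punctuation(candidate) == stripped:
            return ("punctuation", candidate)
    best = min(candidates, key=lambda c: edit_distance(name, c))
    if edit_distance(name, best) <= max_distance:
        return ("possible typo", best)
    return ("no match", None)


candidates = ["Genus_sp1", "Genus_sp2", "Other_taxon"]
results = [classify(n, candidates) for n in ("Genus_sp1", "'Genus sp1'", "Genis_sp2", "Zzz")]
```

Setting `max_distance` to a very large number makes every name a "possible typo" of something, which is the unhelpful case described above.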

The interface for the automatic remaps is as below.  There are panes listing exact matches, non-matches (i.e. differences were too great), punctuation matches and possible typos.  Users can choose to ignore any of these sets, and also select subsets within the sets if, for example, there are false matches.

The menus for Basedata, Matrices and Trees all have remap options, but there is also a centralised system under the File menu.


Any object can be matched with any other object, as well as loading remaps from files.

The Maximum acceptable distance controls how different names can be before they are no longer considered as possible matches.

Users can control the level of remapping used within the match categories.  



Users can also choose to export the remap to a file to re-use later on (or inspect for possible problems) using the Export remap to file button.  A copy of the remap can also be sent to the clipboard for direct use in spreadsheets, text editors and the like.



Remapping of BaseData objects is not permitted if they have existing outputs, as that would potentially wreak havoc with the matches between the BaseData and the outputs.




The system will also warn if you try to map an object to itself (but it will let you try if you are determined to do so).




Kudos to Luke Fitzpatrick who got it all working.


Shawn Laffan
26-Apr-2017




Constrain the extent of your exported results

This one is a short post.

Following a suggestion by Chris Barratt, Biodiverse version 2 will allow users to use a definition query when exporting spatial outputs.  This means you can run your analysis for a large data set, but limit the set of exported groups to a subset.

If the analysis itself used a definition query then the export windows will set that by default.  Just delete it to export all records.

Users can specify a definition query when exporting their data.  In this case the analysis also used a definition query so it is specified by default. 

The exported data only contain those groups (cells) that passed the definition query.  In this screenshot, no-data cells are in light grey to indicate the full extent of the exported data set.  

Shawn Laffan
26-Apr-2017



Tuesday, 13 September 2016

New selection tool in Cluster analysis tab

A new feature just added to the Biodiverse cluster analysis tab is the ability to control the colour of branches on the tree and the cells that contain them.  This is perhaps most useful if you want to colour your Biodiverse plot to match some pre-existing map (and is the reason some users requested it).

In a nutshell, users can now switch to the Multiselect mode using the combo box where the lists are selected (and the default is still Cluster).  Once there they can choose a colour or accept the system generated default, click on a branch and watch the branch and all of its descendants and the associated groups (cells) plot in that colour.

The multiselect mode is turned on by selecting it in the lower left combo box.  


Users can assign colours to any branch in the tree to colour its descendants and the associated groups. In this example the red clade has also had a sub-clade cleared of colour (note the black branches and the highlighted cells that are not coloured).

Once the branch is selected the default colour changes to the next colour in the palette (unless you turn it off using the button to the left of the brush).  Repeatedly clicking on the branch will cycle through the palette, so if you missed the colour then just keep clicking until it goes around.  The palette in use at the moment has nine colours (it is the 9-colour paired palette from http://colorbrewer2.org).

You can also uncolour branches by selecting the brush icon to change to clear mode.  When in this mode, the mouse icon will change to a brush when a branch is hovered over to remind users what will happen when they click.

There is also little need to fear mis-clicks, as users can undo and redo selections.  Simply press the "u" key on the keyboard to undo one click, and repeat to keep going back.  If you over-do it then you can press "r" to redo and reinstate a branch colour.  Note that the redo list is reset as soon as you colour a branch.
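For those curious about how this sort of undo and redo behaviour can work, here is a minimal Python sketch using two stacks.  It is an illustration under my own assumptions and not the Biodiverse code: the class name, the three-colour palette and the branch names are hypothetical, and it tracks only the clicked branch rather than its descendants and groups.

```python
class BranchColourHistory:
    """Track branch colours with undo and redo stacks."""

    def __init__(self, palette):
        self.palette = list(palette)
        self.next_index = 0      # position of the next default colour
        self.colours = {}        # branch name -> current colour
        self.undo_stack = []     # (branch, colour before the change)
        self.redo_stack = []     # (branch, colour to restore on redo)

    def _set(self, branch, colour):
        if colour is None:
            self.colours.pop(branch, None)
        else:
            self.colours[branch] = colour

    def colour(self, branch):
        """Colour a branch with the next palette colour, cycling at the end."""
        new_colour = self.palette[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.palette)
        self.undo_stack.append((branch, self.colours.get(branch)))
        self.colours[branch] = new_colour
        self.redo_stack.clear()  # a new colouring resets the redo list
        return new_colour

    def undo(self):
        if self.undo_stack:
            branch, previous = self.undo_stack.pop()
            self.redo_stack.append((branch, self.colours.get(branch)))
            self._set(branch, previous)

    def redo(self):
        if self.redo_stack:
            branch, colour = self.redo_stack.pop()
            self.undo_stack.append((branch, self.colours.get(branch)))
            self._set(branch, colour)


history = BranchColourHistory(["red", "green", "blue"])
history.colour("branch_a")   # red
history.colour("branch_b")   # green
history.undo()               # branch_b is uncoloured again
history.colour("branch_c")   # blue, and the redo list is now empty
history.redo()               # nothing to redo
final_colours = dict(history.colours)
```

Clearing the redo stack inside `colour` is what produces the "redo list is reset" behaviour noted above.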


The colour selection uses the same colour selector window as for the shapefile overlays and cell outline colours.

The colour selector can be used to specify your own colours.


Unfortunately the eyedropper selector does not work well on Windows, as it can only select colours from open Biodiverse windows.  This is a limitation of the system.  The workaround is to use a colour selector tool to copy the colour specification to the clipboard and then paste it into the Color name box in the selector window.   A list of possible tools is in this superuser.com question (with the caveat that I have not tested any).

You can also type colour names into the Color name box, and the small sample I tested of the colours at these URLs worked (mostly).  DarkGoldenRod or LemonChiffon anyone?
http://www.w3schools.com/colors/colors_names.asp
https://en.wikipedia.org/wiki/X11_color_names


Shawn Laffan
12-Sep-2016





Monday, 29 August 2016

Easier to use randomisation results

This is an updated version of a previous post.  The key difference is that the significance classification described previously was too confusing, as values could be positive or negative and became more significant as they approached zero.  Instead, Biodiverse now provides relative ranks which can easily be converted to significant/not significant for any alpha cutoff.  This change should not affect many users, as the 1.99_003 release containing it was never announced...

One of the issues users face with the randomisations in Biodiverse is what to do with them once they have been run.

A key point is that the results are stored on the other analysis objects themselves, as extra indices and lists.  The index names themselves are a bit cryptic, but are consistent, and there is a description of what they mean here:  https://purl.org/biodiverse/wiki/AnalysisTypes#where-do-the-randomisation-results-go-and-what-do-they-mean

Even then, it can be difficult working out which of your groups have index scores that are significantly different from the set of randomised results.  This is because the data are plotted as a continuum (which in turn is because it uses the same plotting process as the original index scores).  The first image below is an example of this plotting.

One can easily export the data and work with them in a GIS or stats package, but any tied values need to be factored in for lower tail tests, for example as used in the CANAPE process.

With the next version of Biodiverse this categorisation will become a little easier.  Biodiverse will automatically calculate rank-relative positions that can be easily converted into significance levels.  These are stored in new lists that can be displayed and exported in the same way as any other data.

As an example, imagine you have run a randomisation analysis for a BaseData containing a spatial analysis in which you calculated phylogenetic endemism.  Assume that the randomisation's name is rr (not a good name, but it's convenient to type here), so the spatial analysis will now have three lists you can plot.  The first two are the same as ever:  SPATIAL_RESULTS contains the observed results for each group (cell), and rr>>SPATIAL_RESULTS contains indices to track the randomisations for each index in SPATIAL_RESULTS.  For example, for PE_WE there will be C_PE_WE, Q_PE_WE, T_PE_WE and P_PE_WE collating, respectively, the number of times observed PE_WE was higher than that generated using the randomised data, the number of times observed PE_WE was compared against the scores from the randomised data, the number of times the observed and random scores were tied, and the proportion of iterations in which the observed score was higher than the random scores (P_PE_WE = C_PE_WE / Q_PE_WE).
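The four counters are simple to reproduce.  Here is a short Python sketch for a single group and a single index; the function name and the scores are made up for illustration, and it uses exact equality to detect ties, which is an assumption (real code may need a tolerance for floating point values).

```python
def randomisation_counts(observed, random_scores):
    """Compare one observed score against its randomised scores.

    C: times the observed score was higher than the random score
    Q: number of comparisons made
    T: times the observed and random scores were tied
    P: C / Q
    """
    c = sum(1 for r in random_scores if observed > r)
    t = sum(1 for r in random_scores if observed == r)  # assumes exact ties
    q = len(random_scores)
    p = c / q if q else None
    return {"C": c, "Q": q, "T": t, "P": p}


# Hypothetical scores for one cell.
counts = randomisation_counts(0.42, [0.10, 0.42, 0.50, 0.30, 0.42])
```

In this example the observed score beats two of the five random scores and ties with two, so P is 2/5.  The ties are what make lower tail tests awkward, because P alone does not say whether the remaining comparisons were lower or equal.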

The new list is rr>>p_rank>>SPATIAL_RESULTS.  This contains a set of results using the same names as the original indices in SPATIAL_RESULTS, but converted to their rank-relative positions.  Importantly, the lower tail ranks take into account any ties in the comparisons, thus simplifying any code that uses these results.  Also, any value that would be considered not significant at alpha=0.05 (one tailed, high or low) is converted to undef (null).  This makes any plots of the results clearer within Biodiverse so one can more easily see which groups would pass a one-tailed high or low test.

An example plot is in the second image below.

All the ranks have been combined into the same list to reduce the number of indices and lists generated, and thus use less memory and disk space (an index cannot be simultaneously significantly high and low so there is no overlap).  If you are interested in a one tailed test for high values then ignore the low values, and vice-versa.  The values can be easily separated after exporting the results.
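As a rough guide to separating the exported values, here is a Python sketch that labels each rank as high, low or not significant.  It rests on several assumptions of mine: ranks are proportions between 0 and 1, undef values arrive as `None`, and the cutoffs are inclusive (`>= 1 - alpha` for high and `<= alpha` for low, with alpha halved for a two-tailed test).  Check these against your own exported data before using anything like it.  The cell names and values are invented.

```python
def classify_rank(p_rank, alpha=0.05, two_tailed=False):
    """Label one rank-relative value as 'high', 'low' or 'not significant'."""
    if p_rank is None:  # undef (null) values were already filtered out
        return "not significant"
    tail = alpha / 2 if two_tailed else alpha
    if p_rank >= 1 - tail:  # assumed inclusive cutoff
        return "high"
    if p_rank <= tail:      # assumed inclusive cutoff
        return "low"
    return "not significant"


# Hypothetical exported ranks for one index.
ranks = {"cell_1": 0.99, "cell_2": None, "cell_3": 0.01, "cell_4": 0.96}

high = sorted(g for g, r in ranks.items() if classify_rank(r) == "high")
low = sorted(g for g, r in ranks.items() if classify_rank(r) == "low")
high_two_tailed = sorted(
    g for g, r in ranks.items() if classify_rank(r, two_tailed=True) == "high"
)
```

For a one-tailed high test you would keep only the `high` list and ignore `low`, and vice versa.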

Currently the results are plotted in the same manner as any data, but there are plans to allow users to overlay the randomisation significance results over the top of the observed results, for example by masking out any non-significant scores for a given threshold.

The current system plots all scores, regardless of whether they pass a threshold or not.  This is useful, but is difficult to interpret when looking for significance against the randomisation.

The new randomisation list contains indices for the rank-relative positions of the observed values against the randomly generated values. These can then be used for one and two tailed significance tests.  The plotting could be improved, e.g. in this case it appears there are only two values, but this is simply due to the colour scaling.  However, it works well enough now for exploration - proper maps can also be made using a GIS or stats package.


To try it out you will need the 1.99_004 release (or later).  It can be accessed from https://github.com/shawnlaffan/biodiverse/wiki/Downloads

This is a new implementation, so any feedback about usability would be very useful.  

Shawn Laffan
29-Aug-2016




Sunday, 28 August 2016

New, more efficient file format

Users of Biodiverse will perhaps be familiar with what is called the "native" format for basedata, trees, matrices and projects.  These are the .bds, .bts, .bms and .bps files that are created when you save these objects.

The reality is that the "native" format is just a serialisation format in which all the various parts of the Perl data structures that make up an object (e.g. a tree) are converted to a format that can be written to disk and then re-read at a later date, possibly on another computer.
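If serialisation is an unfamiliar idea, the round trip looks like this in Python.  This sketch uses Python's built-in pickle module purely as an analogue; it is neither Storable nor Sereal, and the nested dictionary is a made-up stand-in for a tree object.

```python
import os
import pickle
import tempfile

# Hypothetical nested structure standing in for a tree object.
tree = {
    "name": "root",
    "length": 0.0,
    "children": [
        {"name": "sp_a", "length": 1.0, "children": []},
        {"name": "sp_b", "length": 1.5, "children": []},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "example_tree.pkl")

# Serialise: convert the in-memory structure to bytes and write to disk.
with open(path, "wb") as fh:
    pickle.dump(tree, fh)

# Deserialise: read the bytes back and rebuild an equivalent structure.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

file_size = os.path.getsize(path)
```

The restored object is a new copy with the same contents as the original, which is all a "native" file needs to provide.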

While the format we have been using (called Storable) is stable and has done a good job over the years, a newer, more efficient format called Sereal is now available.  Version 2 of Biodiverse will use this new format by default.

The main reason for shifting to the Sereal format is efficiency: saving files is faster, and the file sizes are smaller.  See details here: http://blog.booking.com/sereal-a-binary-data-serialization-format.html 

These size and speed improvements will not be very noticeable for small files, but it can all add up when one is working with tens of thousands of groups (e.g. cells) and thousands of labels (e.g. species) across hundreds of spatial and cluster analyses.  A quick experiment with such a data set resulted in a greatly reduced file size (~1.6GB to ~750MB), with the time taken to save to file reducing from 30s to 12s.  The file load times were about the same at ~20s.  (Admittedly this was not a very scientific experiment, but the results were consistent across multiple runs.)

What do users need to be aware of?  The main thing is that files created in Biodiverse version 2 will not be backwards compatible.  This means that Biodiverse version 1.1 or earlier will not be able to open files created using version 2 by default.  However, the "save as" dialogues have the option to save to the old format so you can maintain compatibility with older versions if you are in a mixed environment.

Also, any file in the old format that is loaded into Biodiverse version 2 will still be saved using the old format unless the user explicitly saves it to the new format.

If you want to test the new file format then it will be available in the 1.99_004 development release which should be coming out within the next week.


Shawn Laffan
29-Aug-2016

