Tuesday 23 June 2015

Import your species data from spreadsheets

Biodiverse has always imported data from delimited text files, for example using the Comma Separated Values (CSV) format.  Support for rasters and shapefiles (point data only) was added in version 1.  However, data are commonly collated and sent around using spreadsheets.

If one has data in a spreadsheet then, in version 1 of Biodiverse and earlier, one has to export the data from the spreadsheet to CSV format.  This rapidly becomes annoying when one is updating the spreadsheet, as one repeatedly needs to export the data.  One thing we try to avoid in developing Biodiverse is annoyance.

As of version 1.0_001, you can import spreadsheet data.  This uses the same process as for text files, except for an additional selection of the data sheet to use within the workbook.  A couple of screenshots are below to illustrate the process.  Anyone familiar with importing data into Biodiverse will note how similar it is to existing processes.

The formats supported are Microsoft Excel (.xls and .xlsx formats) and LibreOffice (.ods).


As with text files, one can import multiple spreadsheets, and the same columns will be used from each.  However, there is the limitation that the same worksheet selection will be used for all selected files.  If your data are spread across multiple spreadsheets whose structure is not consistent, then you need to repeat the import process for each group of files that shares a structure.




You can choose to import spreadsheet files on the first page of the data import process.   

The selection options are the same as for text imports, except the rename and property options are not available.  These can be added later using the BaseData menu in the GUI.  

Select which sheet is to be used from the spreadsheet.  If you have selected multiple spreadsheets then this selection will be applied to all of them.

The rest of the import options are the same as for text imports.  

This is the same as the last step in the text import process.  

This functionality is in the 1.0_001 development release, which is now available.  Please give it a try and report any successes or issues.  You can do this by commenting below, or by using the mailing list or the issue tracker.  

One known problem is that the process takes a long time for large spreadsheets (e.g. 300,000 rows).  In such cases it is faster to save your data to CSV format and import it using the Delimited Text format.
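
If re-exporting a large sheet by hand becomes tedious, the CSV step can be scripted outside Biodiverse.  What follows is only a minimal sketch in Python, not part of Biodiverse itself.  The function name write_rows_as_csv, the sample rows and the file and sheet names in the comments are all hypothetical.  The comments also assume the third-party openpyxl package as one possible way to read rows from an .xlsx file.

```python
import csv
import io


def write_rows_as_csv(rows, out):
    """Write an iterable of row tuples to an open text stream as CSV.

    Empty cells (None) are written as empty fields.
    Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
        count += 1
    return count


# Hypothetical sample rows standing in for one worksheet.
# Assumption: with the third-party openpyxl package installed, the rows
# could instead be read from a workbook, for example:
#   wb = openpyxl.load_workbook("species.xlsx", read_only=True, data_only=True)
#   rows = wb["Sheet1"].iter_rows(values_only=True)
sample_rows = [
    ("species", "x", "y"),
    ("Genus:sp1", 150.5, -33.25),
    ("Genus:sp2", None, -34.0),
]

# Write to an in-memory buffer here; a real script would pass an open file,
# e.g. open("species.csv", "w", newline="").
buffer = io.StringIO()
n_rows = write_rows_as_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
```

The resulting CSV file can then be imported using the Delimited Text format as usual.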

Shawn Laffan, 22-Jun-2015


For more details about Biodiverse, see http://purl.org/biodiverse

For the full list of changes in the 1.0 series see https://purl.org/biodiverse/wiki/ReleaseNotes#version-101

To see what else Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users 

