Skip to content

Conversation

@MingyuanChen94
Copy link
Collaborator

Fixes #2 Create on-demand downloader and bulk downloader with test files

Features created:

  1. On-demand download SoilGrids data by inputting lat-lon codes or British National Grid codes
  2. Bulk downloaders for downloading SoilGrids data and saving to a .nc file by inputting xmin, xmax, ymin, ymax (lat-lon codes)

Issues:

  1. Bulk downloaders only accpets lat-lon codes, does not signal input error
  2. Pylint: lines 49, 100, 116 (does not know how to resolve the issues but the codes still work)
  3. Bulk downloader does not ask for a directory to save the data, it automatically saves the data to the working directory
  4. Bulk downloader takes significant time to download data (4 hours for downloading the first two soil variables in testing)
  5. tmp file remains in the folder

Future feature proposal:

  1. Better error messages
  2. Speed up downloading speed (I am not sure how this can be achieved. I do not know if parallel would help, it does not help if the bottleneck is internet connection)

Fixes #2 Add soilgrids library in environment and ignore unrelated files
Fixes #2 Create on-demand downloader and bulk downloader with test files
@mcmancini
Copy link
Owner

That's brilliant, thank you, this is a very good starting point!

I have a couple of suggestions to improve the code:

  • there are a series of functions for the conversion of the data, which are declared and mapped to each soil variable whithin each class method.
    These can be taken out of the methods, and redefined as @staticmethod:
    class SoilApp:
        @staticmethod
        def mult_hundred(x):
            return x * 100
    
        @staticmethod
        def mult_ten(x):
            return x * 10
    
        @staticmethod
        def div_ten(x):
            return x / 10
    
        @staticmethod
        def div_hundred(x):
            return x / 100
    
        def __init__(self):
            self._obs_conversions = {
                "bdod": self.mult_hundred,
                ...
            }
    
        def on_demand_download(self, input_param):
            soil_data = ...
    
            # Perform the conversion for each variable
            for var, func in self._obs_conversions.items():
                soil_data[var] = func(soil_data[var])
  • The download on demand method still builds some bounding box and retrieves the data falling within that bounding box. At the moment the bounding box seems to be the lon-lat +/- 500km, which is very large. What we need to ideally do (maybe there is a way in the SoilGrids package, or we can think of reducing significantly the bounding box and then selecting the closest cell to the lon-lat passed as argument to the method).
  • If we remove the bounding box approach in the download on demand method, instead of returning an xarray (which would really be a single cell raster), maybe return a dictionary?
  • You will have noticed that in my code I do a weighted average of the soil properties across all depths up to a max value. This was done for the specific purpoises of our modelling but maybe it is better for this package to keep the data at different depths, since it could be useful to others. If we go for this, we need to figure out whether we can create a single netcdf file that has more than 3 dimensions (At the moment it is a raster with a number of layers equal to the number of soil variables, we would need to have for each current layer, an additional nuber of dimensions equal to the number of depth categories). Or maybe we create individual netcdf files for each depth? Let's have a think about it.
  • There are a few additional minor improvements that can be made: for example, pass all constants (i.e. the soil variable and the soil depth lists) as class attributes instead of local variables within each method.
  • One additional improvement that we can think of is to allow to pass **kwargs as arguments to the methods, with procedures within the class that would allow to identyfy the type of arguments passed, and process them accordingly (i.e.: if we pass an OSgrid code, that would have a key such as 'os_code', which, if present, would call a method (i.e. a staticmethod) which would convert automatically the code into a lon-lat pair). We could also implement error checking (i.e., we need at least a lon-lat pair or an osgrid code, otherwise return an error...).

Happy to have a chat about this whenever you have time! :)

Fixes #2 Refine the static functions, soilvars&depth as class attributes
Fixes #2 create a feature to draw a bounding box given central lat-lon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create data downloader

3 participants