1
Gfdnavi: A tool to archive, share, distribute, analyze, and visualize geophysical fluid data and knowledge
  • Takeshi Horinouchi (Kyoto Univ → Hokkaido Univ (soon)),
  • Seiya Nishizawa (Kyoto Univ → Kobe Univ),
  • Chiemi Watanabe (Ochanomizu Univ),
  • T. Koshiro, A. Tomobayashi, S. Otsuka,
  • Y. Morikawa, Y.-Y. Hayashi, M. Shiotani, and
  • GFD Dennou Club (Davis project)
2
What is Gfdnavi
  • = Geophysical fluid data navigator
  • A suite of software to construct Web-based databases of geophysical fluid data
  • Functionality:
    • Search
    • Data analysis and visualization
    • Documentation of analysis results
  • Available at http://www.gfd-dennou.org/arch/davis/gfdnavi/
3
Background
4
Problems of Web-based database and analysis tools
  • Limited analysis capability
  •   → We often end up downloading the data
  • Not very suitable for desktop use
  •   → Services are not available for local data
5
More on the analysis capability
  • Impossible to predefine sufficient functionality (since we are scientists)
  •   → Programmability is the key
  • Programmability in two ways:
    • Programmable on the web browser
    • Web-service API (program locally)
  •   → Both are desirable
6
Visualization is not the goal
  • To others (scientists / society): reports
  • While working: memos / internal documents
  • To collaborators: reports / know-how / discussion
7
Foundation of Gfdnavi
8
Two fundamental libraries used to build Gfdnavi (open-source)
  • GPhys – a Ruby library to analyze and visualize geophysical fluid data (by Horinouchi et al., since 2003)
    • For consolidated access to data in files (NetCDF, GRIB, GrADS, NuSDAS, HDF5-EOS) or in runtime memory; part of a community infrastructure for data analysis [http://ruby.gfd-dennou.org/] (since 1999)
  • Ruby on Rails – Development framework for Web application (since 2005)
    • Makes it drastically easier to develop Web applications backed by an RDB
    • Written in/for Ruby → We can use GPhys directly
9
GPhys (Gridded Physical quantity)
10
 
11
Why do we use Ruby?
  • Because we wanted a language for daily data analysis
    • Easy (fast) to write
    • Interactive use → like GrADS



    • Python is also fine (but we love Ruby)
12
Introducing Gfdnavi
13
Overview
14
Metadata DB
  • Metadata
    • name-value attributes, with a few standard field names
    • geospatial- and time-coordinate info
    • size, user info, etc.
  • Directory structure (metadata are inherited from parent directories)
  • Generated by an automatic scan (run with a command)
    • variables: reading attributes through GPhys
    • directories: directory name and “Readme”-type texts
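The inheritance of metadata along the directory structure can be sketched as follows. This is a minimal pure-Ruby illustration, not Gfdnavi's code: the METADATA table, its paths, and its attribute values are hypothetical, and the sketch assumes that a directory's own attributes override those inherited from its ancestors.

```ruby
# Hypothetical metadata table: directory path => its own attributes.
METADATA = {
  "/"                 => { "institution" => "example lab" },
  "/reanalysis"       => { "source" => "reanalysis", "license" => "research use" },
  "/reanalysis/daily" => { "time_resolution" => "daily" }
}

# Effective metadata of +path+: attributes of "/" first, then of each
# deeper ancestor, and finally of +path+ itself. Later merges win, so a
# child's own values override inherited ones (an assumption of this sketch).
def effective_metadata(path, table = METADATA)
  parts = path.split("/").reject(&:empty?)
  chain = ["/"] + parts.each_index.map { |i| "/" + parts[0..i].join("/") }
  chain.reduce({}) { |acc, dir| acc.merge(table.fetch(dir, {})) }
end

puts effective_metadata("/reanalysis/daily").inspect
```

Storing attributes only where they are defined keeps the scan output small; the cost is that each lookup walks the ancestor chain.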
15
User Interface
16
 
17
Functionality (slides 17-35)
36
Web service
  •   → Tomorrow's talk by Seiya Nishizawa
37
Network of Gfdnavi
  • Under development by C. Watanabe (Ochanomizu Univ)
  • Aims to create a peer-to-peer network for cross-search and cross-use among Gfdnavi servers
  • Then one can access local data and remote data together
38
Summary
  • Novel features of Gfdnavi
    • Seamless coverage from desktop use to public data service (by running its own web server)
    • Programmability (on the browser & via web service)
    • Documentation of analysis results (dynamically reproducible/extensible) (→ memos / reports / PR / blogs for scientific collaboration)
  • Good implementation
    • Extensibility (by using GPhys)
    • Swift development (by using Ruby on Rails)
39
Future Outlook
  • Support networking → Create a Web of scientific data & knowledge
  • Increase analysis & visualization functionality (much more is needed)
  • Improve remote API accesses (tomorrow’s topic)


40
fin
41
 
42
GPhys (Gridded Physical quantity)
43
Example of GPhys’s associated coordinates
44
What is Ruby on Rails
http://www.rubyonrails.org/
  • Web development framework in Ruby
  • Works with RDBMSs (MySQL, PostgreSQL, SQL Server, SQLite, etc.)
  • Strong prototyping support (e.g., Model-View-Controller (MVC) structure)
  • Comprehensive library (covering Ajax and Web services)
  • Ruby-embedded HTML
    • → suitable for using our Ruby library
  • Has its own web server (WEBrick); also runs on Apache, lighttpd, etc.
    • → One can personally run a web server anywhere, on an arbitrary port
45
From “Understanding Rails MVC”:
http://wiki.rubyonrails.org/rails/pages/UnderstandingRailsMVC
46
Sister-server method
47
P2P with directory server
48
Overlay network by P2P
49
Copied from old slides
50
GPhys
A class of gridded physical quantities
  • Takeshi Horinouchi (RISH, Kyoto Univ.)
51
VArray
  • Virtual Array: a Ruby class (written in pure Ruby) that represents array data in GPhys
  • A VArray object behaves as an array, but its contents can reside on various media: (case 1) a multi-dimensional array in memory (NArray), data in a NetCDF file (in this case, a file pointer is stored), or GrADS data; (case 2) it can also represent a subset of another VArray, or multiple VArrays tiled together.
  • Can have attributes, like variables in NetCDF datasets
  • In practice, NetCDF data are handled by a subclass, VArrayNetCDF (and likewise by other subclasses for other formats)
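The "virtual" part of VArray can be illustrated with two interchangeable backends. This is a hypothetical pure-Ruby sketch, not the VArray API: InMemoryArray and FileBackedArray are invented names, a flat Ruby Array stands in for an NArray, and a block stands in for a file pointer.

```ruby
# Hypothetical sketch of the "virtual array" idea (not the VArray API):
# two backends answer the same messages, so callers need not know
# whether the values are in memory or still in a file.
class InMemoryArray
  attr_reader :shape, :attributes

  def initialize(values, attributes = {})
    @values = values                # a flat Ruby Array stands in for an NArray
    @shape = [values.size]
    @attributes = attributes
  end

  def val
    @values.dup
  end
end

class FileBackedArray
  attr_reader :shape, :attributes, :reads

  # The block stands in for reading the variable from a file; only this
  # "file pointer" is stored until #val is called.
  def initialize(size, attributes = {}, &reader)
    @shape = [size]
    @attributes = attributes
    @reader = reader
    @reads = 0
  end

  def val
    @reads += 1                     # count actual reads from the "file"
    @reader.call
  end
end

mem  = InMemoryArray.new([280.0, 281.5], { "units" => "K" })
disk = FileBackedArray.new(2, { "units" => "K" }) { [280.0, 281.5] }
```

Code that only asks for shape, attributes, or val works with either object; the file-backed one touches its data source only when val is called.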
52
subset mapping of VArray
  • Always kept direct by composing mappings, to prevent long chains (see the figure below).
  • Subset slicing (e.g., va[0..10,3]) is done by subset mapping, not by actual data extraction, unless explicitly specified otherwise. Therefore, it is
    • Computationally efficient
    • Suitable for writing into subsets of data in files.
  • In other words, actual data cutting is deferred until needed; deferring operations until needed is a design policy of GPhys
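Composing subset mappings can be sketched in one dimension. This is a hypothetical illustration, not GPhys code: the View class is invented, and only 1-D ranges are handled (GPhys itself supports multi-dimensional slicing such as va[0..10,3]). A slice of a slice points directly at the original data, and nothing is copied until val is called.

```ruby
# Hypothetical 1-D sketch of composed subset mappings (not the GPhys API).
# A view stores the root array plus an index map into it; slicing a view
# composes the maps, so every view points directly at the root and no
# chain of views builds up.
class View
  attr_reader :root, :index_map

  def initialize(root, index_map = (0...root.size).to_a)
    @root = root
    @index_map = index_map
  end

  # Slicing returns a new view whose map is a sub-range of this view's map.
  def [](range)
    View.new(@root, @index_map[range])
  end

  # Actual data extraction happens only here.
  def val
    @index_map.map { |i| @root[i] }
  end
end

data = (100..119).to_a
v1 = View.new(data)[5..15]   # refers to indices 5..15 of data
v2 = v1[2..4]                # refers to indices 7..9 of data, directly
```

Because v2 holds the index map [7, 8, 9] into data rather than a reference to v1, its cost does not grow with the number of times a slice is sliced again.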
53
Structure of GPhys
  • Consists of a grid (coordinates) and multi dimensional array data
  • Can conduct mathematical operations (a GPhys behaves like a numeric array)
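The pairing of a grid with array data can be sketched as follows. This is a hypothetical, one-dimensional illustration (the class GriddedQuantity is invented, not GPhys): arithmetic acts on the data, while the grid is checked and carried along unchanged.

```ruby
# Hypothetical 1-D sketch (not the GPhys API): a gridded quantity pairs
# coordinate values with data of matching size.
class GriddedQuantity
  attr_reader :name, :coords, :data

  def initialize(name, coords, data)
    raise ArgumentError, "grid and data sizes differ" unless coords.size == data.size
    @name = name
    @coords = coords
    @data = data
  end

  def +(other)
    combine(other, "+") { |a, b| a + b }
  end

  def *(other)
    combine(other, "*") { |a, b| a * b }
  end

  private

  # Numeric operands apply to every element; gridded operands must share
  # the same grid. Either way, the result keeps this object's grid.
  def combine(other, op)
    if other.is_a?(GriddedQuantity)
      raise ArgumentError, "grids differ" unless other.coords == coords
      values = data.zip(other.data).map { |a, b| yield(a, b) }
      GriddedQuantity.new("(#{name}#{op}#{other.name})", coords, values)
    else
      values = data.map { |a| yield(a, other) }
      GriddedQuantity.new("(#{name}#{op}#{other})", coords, values)
    end
  end
end
```

For example, with t = GriddedQuantity.new("t", [-30.0, 0.0, 30.0], [1.0, 2.0, 3.0]), the expression t * 2 + t keeps the latitude grid and yields data [3.0, 6.0, 9.0].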
54
For your reference: Coordinates in NetCDF dataset
  • Variables that have the same names as dimensions hold the coordinate values (locations)
  • Weak point: this is only a convention, so it can be violated
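The NetCDF convention can be checked mechanically. The sketch below works on a hypothetical in-memory description of a dataset (variable name => dimension names) rather than a real file, and it treats a variable as a coordinate variable only when it is one-dimensional along a dimension of the same name. The second function flags variables that share a dimension's name without following the rule.

```ruby
# A variable is a coordinate variable when it is 1-D and its only
# dimension has the same name as the variable.
def coordinate_variables(variables)
  variables.select { |name, dims| dims == [name] }.keys
end

# Variables named after a dimension but not 1-D along it break the
# convention; flag them so they are not mistaken for coordinates.
def convention_violations(variables)
  dim_names = variables.values.flatten.uniq
  variables.select { |name, dims| dim_names.include?(name) && dims != [name] }.keys
end

# Hypothetical dataset description: variable name => dimension names.
dataset = {
  "lon"  => ["lon"],
  "lat"  => ["lat"],
  "time" => ["time"],
  "u"    => ["time", "lat", "lon"]
}
```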
55
More on coordinates
  • Three cases are supported:
    • point sampling
    • cell type
    • simple sequence (though it’s not physical)
56
Tiling
  • Data divided into “tiles” can be treated as one consolidated GPhys object. This is convenient for handling long time series divided into periods (e.g., by year) or outputs from parallel simulations on distributed-memory machines. Tiling is done by VArrayComposite.
  • Subsets can be handled (see the figure below)
  • May be applicable to parallel simulations in the future?
  • So far, automatic configuration is available only for NetCDF, using an Array or a Regexp (e.g., /data_x(\d)y(\d).nc/ for data_x0y0.nc, data_x0y1.nc, data_x1y0.nc, data_x1y1.nc)
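How a Regexp with capture groups can place files on a tile grid is sketched below. Only the file-name matching is shown; building the composite VArray is not, and the helper names tile_layout and tile_shape are hypothetical. The pattern here escapes the dot in ".nc" so that it matches a literal period.

```ruby
# Map each matching file name to its tile index, e.g. [x, y].
def tile_layout(filenames, pattern)
  tiles = {}
  filenames.each do |f|
    m = pattern.match(f) or next        # skip files that do not match
    tiles[m.captures.map(&:to_i)] = f
  end
  tiles
end

# Number of tiles along each dimension, inferred from the largest index.
def tile_shape(layout)
  layout.keys.transpose.map { |idx| idx.max + 1 }
end

files  = %w[data_x0y0.nc data_x0y1.nc data_x1y0.nc data_x1y1.nc readme.txt]
layout = tile_layout(files, /data_x(\d)y(\d)\.nc/)
```

With these four files, tile_shape(layout) gives [2, 2], and layout[[1, 0]] is "data_x1y0.nc".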
57
Big data handling
  • Iterator to handle data too big to read into memory at once:
    • GPhys::IO.each_along_dims_write – the results are also written to a file (since the result of an operation is often big, too). Another type of iterator is planned but not yet implemented.
  • Example (assuming require "numru/gphys" and include NumRu):
    • Without the iterator:
          src = GPhys::IO.open(infile, varname)
          ofile = NetCDF.create(ofilename)
          out = src.mean(0)             # the entire result is now in memory
          GPhys::IO.write(ofile, out)
          ofile.close
    • With the iterator, looping over the last dimension:
          src = GPhys::IO.open(infile, varname)
          ofile = NetCDF.create(ofilename)
          out = GPhys::IO.each_along_dims_write(src, ofile, -1) { |sub|
            [ sub.mean(0) ]             # each partial result is written to ofile
          }
          ofile.close
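The idea behind the iterator can be illustrated without GPhys. This pure-Ruby sketch is not the implementation of each_along_dims_write: it computes the mean along dimension 0 of a 2-D array one slice of the last dimension at a time. The read_column and write callables are hypothetical stand-ins for reading one slice from a file and writing one result to the output file.

```ruby
# Mean along dimension 0, computed one slice of the last dimension at a
# time. +read_column+ returns column j (all values along dimension 0 for
# index j); +write+ stores the result for column j. Only one slice is
# held at a time.
def mean0_along_last_dim(ncols, read_column, write)
  ncols.times do |j|
    col = read_column.call(j)
    write.call(j, col.sum / col.size.to_f)   # result written at each step
  end
end

data = [[1.0, 2.0, 3.0],    # data[i][j]: i = dimension 0, j = last dimension
        [3.0, 4.0, 5.0]]
out = []
mean0_along_last_dim(3, ->(j) { data.map { |row| row[j] } },
                     ->(j, v) { out[j] = v })
```

Here out becomes [2.0, 3.0, 4.0], the same as averaging the whole array at once, while memory use is bounded by one slice plus one result.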
58
Units of physical quantities
  • Handled by NumRu::Units (by E Toyoda)
  • Multiplication, division, etc.: units are combined as they should be
  • Addition, subtraction: the result inherits the units of the first term
    • e.g., [m] + [km] is computed after multiplying the second term by 1000. A warning is issued if the units are incompatible (in that case, no conversion is made).
  • Introduced UNumeric, a scalar numeric class with units
    • GPhys, VArray, and UNumeric recognize one another (stronger to weaker in this order)
    • Example: to multiply a GPhys object u representing winds [m/s] by the Coriolis parameter:
          f = UNumeric[1e-4, "s-1"]
          coriolis_frc = f * u      # the units will then be m.s-2
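The addition rule can be sketched for a few length units. This is a hypothetical illustration, not NumRu::Units: the Quantity struct and the LENGTH_IN_METERS table are invented. The result takes the units of the first term, the second term is converted, and incompatible units produce a warning with no conversion.

```ruby
# Hypothetical conversion factors to meters (length units only).
LENGTH_IN_METERS = { "m" => 1.0, "km" => 1000.0, "cm" => 0.01 }

Quantity = Struct.new(:value, :units) do
  # The result inherits the units of the first term (self).
  def +(other)
    return Quantity.new(value + other.value, units) if units == other.units

    from = LENGTH_IN_METERS[other.units]
    to   = LENGTH_IN_METERS[units]
    if from && to
      Quantity.new(value + other.value * from / to, units)
    else
      warn "incompatible units: #{units} and #{other.units}; no conversion made"
      Quantity.new(value + other.value, units)
    end
  end
end
```

For instance, Quantity.new(1.0, "m") + Quantity.new(1.0, "km") gives 1001.0 m, while Quantity.new(1.0, "km") + Quantity.new(500.0, "m") gives 1.5 km.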
59
Distributed objects using dRuby
  • Data service to remote clients
    • gphys-remote: a simple directory service, like anonymous FTP: directories and data (in which GPhys objects can be defined) under a top directory are made accessible to remote hosts.
    • gave (GUI): can connect to a gphys-remote server
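A minimal dRuby exchange in the spirit of gphys-remote is sketched below. It is not gphys-remote's code: DirectoryService and the directory tree are hypothetical, and server and client run in one process for brevity. In real use the client would run on another host and connect with the server's druby:// URI.

```ruby
require "drb/drb"

# Hypothetical front object: lists entries under a top directory,
# here an in-memory tree instead of the real file system.
class DirectoryService
  def initialize(tree)
    @tree = tree                    # path => list of entries
  end

  def list(path)
    @tree.fetch(path, [])
  end
end

tree = { "/" => ["data"], "/data" => ["T.jan.nc", "UV.jan.nc"] }
DRb.start_service("druby://localhost:0", DirectoryService.new(tree))  # port 0: any free port
client  = DRbObject.new_with_uri(DRb.uri)   # a remote client would use the server's URI
entries = client.list("/data")              # called as if it were a local object
puts entries.inspect
DRb.stop_service
```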