Kiba: A Lightweight ETL Tool for Ruby

xf3f, 9 years ago

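Writing data-processing code that is reliable, concise, well tested, and maintainable is tricky. Kiba lets you define and run high-quality ETL (Extract-Transform-Load) jobs easily using Ruby.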

Kiba provides you with a DSL to define ETL jobs:

# standard-library requires used by the snippets below (Date, YAML)
require 'date'
require 'yaml'

# declare a ruby method here, for quick reusable logic
def parse_french_date(date)
  Date.strptime(date, '%d/%m/%Y')
end

# or better, include a ruby file which loads reusable assets
# eg: commonly used sources / destinations / transforms, under unit-test
require_relative 'common'

# declare a pre-processor: a block called before the first row is read
pre_process do
  # do something
end

# declare a source where to take data from (you implement it - see notes below)
source MyCsvSource, 'input.csv'

# declare a row transform to process a given field
transform do |row|
  row[:birth_date] = parse_french_date(row[:birth_date])
  # return to keep in the pipeline
  row
end

# declare another row transform, dismissing rows conditionally by returning nil
transform do |row|
  row[:birth_date].year < 2000 ? row : nil
end

# declare a row transform as a class, which can be tested properly
transform ComplianceCheckTransform, eula: 2015

# before declaring a definition, maybe you'll want to retrieve credentials
config = YAML.load(IO.read('config.yml'))

# declare a destination - like source, you implement it (see below)
destination MyDatabaseDestination, config['my_database']

# declare a post-processor: a block called after all rows are successfully processed
post_process do
  # do something
end
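The listing above says that sources, destinations, and class-based transforms are things you implement yourself. Below is a minimal sketch of what such classes might look like. It assumes Kiba's convention that a source responds to `each` and yields rows, a transform class responds to `process(row)`, and a destination responds to `write(row)` and `close`. The names `ArraySource`, `ParseFrenchDate`, `BornBefore`, `ArrayDestination`, and `run_pipeline` are hypothetical and do not come from the original article. `run_pipeline` is only a stand-in so the classes can be exercised without the gem; it is not Kiba's actual runner.

```ruby
require 'date'

# Hypothetical in-memory source: yields one row (a Hash) at a time from #each.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield row.dup }
  end
end

# Hypothetical class-based transform: returns the row to keep it in the pipeline.
class ParseFrenchDate
  def initialize(field)
    @field = field
  end

  def process(row)
    row[@field] = Date.strptime(row[@field], '%d/%m/%Y')
    row
  end
end

# Hypothetical filtering transform: returning nil dismisses the row.
class BornBefore
  def initialize(year)
    @year = year
  end

  def process(row)
    row[:birth_date].year < @year ? row : nil
  end
end

# Hypothetical in-memory destination: collects rows, records that it was closed.
class ArrayDestination
  attr_reader :rows, :closed

  def initialize
    @rows = []
    @closed = false
  end

  def write(row)
    @rows << row
  end

  def close
    @closed = true
  end
end

# Stand-in runner (an assumption, not Kiba's code): pushes each row through
# the transforms in order and stops early when a transform returns nil.
def run_pipeline(source, transforms, destination)
  source.each do |row|
    transforms.each do |transform|
      row = transform.process(row)
      break if row.nil?
    end
    destination.write(row) if row
  end
  destination.close
end

source = ArraySource.new([
  { name: 'Alice', birth_date: '14/07/1989' },
  { name: 'Bob', birth_date: '02/01/2004' }
])
destination = ArrayDestination.new
run_pipeline(source, [ParseFrenchDate.new(:birth_date), BornBefore.new(2000)], destination)
```

Because each component is a plain Ruby class with a small interface, it can be unit-tested on its own, which is the point the listing makes about declaring a transform as a class.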

Project homepage: http://www.open-open.com/lib/view/home/1429931698244