Introduction to Facebook's Thrift

jopen, 12 years ago
       The following was compiled from various places on the web. I have been studying Thrift recently and am sharing these notes with fellow developers who may need them.
<h4>Section 1: Overview of RPC Techniques and Implementations</h4>
<p>Consider first the problem of RPC (Remote Procedure Call) in a distributed system. A complete RPC module can be divided into three layers:</p>
<p>· Service layer: RPC interface definition and implementation</p>
<p>· Protocol layer: RPC message format and data encoding format</p>
<p>· Transport layer: the underlying communication (e.g. sockets) and system-related facilities (e.g. event loops, multithreading)</p>
<p>In real large-scale distributed systems, different services are often implemented in different languages, so an RPC system generally offers cross-language procedure calls. For example, client code written in C++ can remotely call a service implemented in Java. There are two ways to implement cross-language RPC:</p>
<p>· Static code generation: the developer defines the RPC interfaces and data types in an intermediate language (IDL, interface definition language), and a compiler then generates code for different languages (such as C++, Java, Python). The generated code is responsible for implementing the RPC protocol and transport layers. For example, if the service is implemented in C++, the server side generates C++ code that implements the RPC protocol and transport layers, and the service layer uses that generated code to communicate with clients. If the client is written in Python, the client side generates Python code.</p>
<p>· A dynamic type system based on introspection: the protocol and transport layers can be implemented once, as a library in a single language, but that language must be paired with a dynamic type system that offers introspection or reflection, and the library exposes bindings for other languages. Clients and servers use RPC through those bindings. For example, one could implement an RPC library in C with GObject and then provide bindings for other languages through GObject.</p>
<p>The advantage of the first approach is that the protocol and transport layers need not be tied to a particular dynamic type system (such as GObject), and it avoids dynamic type checks and conversions, so programs run efficiently. Its drawback is that a separate protocol and transport layer implementation has to be provided for each language. The main difficulty of the second approach lies in implementing the language bindings and a general object serialization mechanism, and efficiency also has to be considered.</p>
<p>Thrift is a cross-language RPC stack based on static code generation. It can generate code for mainstream languages including C++, Java, Python, Ruby, and PHP. The generated code implements the RPC protocol and transport layers, so users can concentrate on calling and implementing services. Cassandra's service access protocol is built on Thrift.</p>
<h4>Section 2: Introduction to Thrift</h4>
<p><a style="font-weight:bold;" href="/misc/goto?guid=4958185587929544096" target="_blank">Thrift</a> originated at Facebook, which released it as open source in 2007 and later contributed it to the Apache Software Foundation. Facebook created Thrift to handle high-volume data transfer between its internal systems and to bridge systems written in different languages, which called for cross-platform support. Thrift therefore supports many programming languages, for example C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk. For communication between different languages, Thrift can serve as high-performance binary communication middleware that supports data (object) serialization and several kinds of RPC services. Thrift suits static program-to-program data exchange: the data structures must be fixed in advance. It is entirely static, so when a data structure changes you must edit the IDL file again, regenerate the code, and then recompile and reload it. Compared with other IDL tools this can be regarded as a weakness of Thrift. Thrift is well suited to building general-purpose tools for large-scale data exchange and storage, and for internal data transfer in large systems it has a clear advantage over JSON and XML in both performance and payload size.</p>
<p>Thrift consists of five main parts:</p>
<p>· Type system and IDL compiler: generates interface code in the target language from a user-supplied IDL file</p>
<p>· TProtocol: implements the RPC protocol layer; several object serialization formats are available, such as JSON and Binary.</p>
<p>· TTransport: implements the RPC transport layer; likewise several implementations are available, such as socket, non-blocking socket, and MemoryBuffer.</p>
<p>· TProcessor: the link between the protocol layer and the user-supplied service implementation; it invokes the interface of the service implementation.</p>
<p>· TServer: aggregates the TProtocol, TTransport, and TProcessor objects.</p>
<p>These five components are implemented in the Thrift source code as libraries for the different languages, and the library code lives under the lib directory of the Thrift source tree. Before using Thrift, familiarize yourself with the interface offered by the library for your language.</p>
<h4>Section 3: Projects That Use Thrift</h4>
<p>(1) Thrift is used for back-end data communication in Quora's system; the server side is implemented in C++ and the client side in Python.</p>
<p>Quora background: Quora is an online question-and-answer company, something like a cross between Sina Weibo and Baidu Zhidao. According to well-informed sources, Quora raised US$14 million the previous year and had only nine employees at the time of writing.</p>
<p>(2) Thrift is used for communication and data transfer between the Evernote servers and clients developed on the various Evernote API platforms. The Evernote API defines its own Evernote Data Access and Management (EDAM) protocol specification, which lets clients upload and download files and use online instant search with less network bandwidth.</p>
<p>Evernote background: Evernote is a very well-known free application. Its standout feature is multi-platform support, with data synchronized across devices over the network. For example, you can add a note in Evernote on your phone at any time and see it on your computer when you get home.</p>
<p>(3) Thrift in HBase: HBase uses Thrift to provide a cross-platform service interface. In HBase you can start an HBase server with Thrift support using the command [hbase-root]/bin/hbase thrift start. Clients generate client code for different languages with the thrift command and, following the defined data formats, operate on the remote HBase server. This is an alternative to REST for remote access.</p>
<p>(4) Other systems: for example Facebook's Scribe, Taobao's TimeTunnel, Hive, and so on.<br /> <br /> Source: <a href="/misc/goto?guid=4959500664825195027" target="_blank">http://blog.csdn.net/wanweiaiaqiang/article/details/7195145</a></p>
<p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1326329023999" target="_blank">http://www.open-open.com/lib/view/home/1326329023999</a></p>
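<p>To make the layering described in Sections 1 and 2 more concrete, here is a minimal pure-Python sketch of how a transport, a protocol, a processor, and a server might fit together. It is a toy under stated assumptions and does not use the real Thrift library: the class names (MemoryTransport, BinaryProtocol, CalculatorHandler, CalculatorProcessor, SimpleServer, CalculatorClient), the "add" method, and the wire format are all hypothetical and only loosely mirror the roles of TTransport, TProtocol, TProcessor, and TServer. In real Thrift, the client stub and the processor would be generated from an IDL file rather than written by hand.</p>

```python
import io
import struct


class MemoryTransport:
    """Transport layer: moves raw bytes (here, through an in-memory buffer)."""

    def __init__(self, data=b""):
        self._buf = io.BytesIO(data)

    def write(self, data):
        self._buf.write(data)

    def read(self, n):
        data = self._buf.read(n)
        if len(data) != n:
            raise EOFError("transport ran out of bytes")
        return data

    def getvalue(self):
        return self._buf.getvalue()


class BinaryProtocol:
    """Protocol layer: decides how values are encoded on the wire.

    Hypothetical format: 32-bit big-endian ints, length-prefixed UTF-8 strings.
    """

    def __init__(self, transport):
        self.trans = transport

    def write_i32(self, value):
        self.trans.write(struct.pack("!i", value))

    def read_i32(self):
        return struct.unpack("!i", self.trans.read(4))[0]

    def write_string(self, text):
        raw = text.encode("utf-8")
        self.write_i32(len(raw))
        self.trans.write(raw)

    def read_string(self):
        return self.trans.read(self.read_i32()).decode("utf-8")


class CalculatorHandler:
    """Service layer: the user-written implementation of the interface."""

    def add(self, a, b):
        return a + b


class CalculatorProcessor:
    """Processor: decodes a call from the protocol and invokes the handler."""

    def __init__(self, handler):
        self.handler = handler

    def process(self, iprot, oprot):
        name = iprot.read_string()
        if name != "add":
            raise ValueError("unknown method: " + name)
        a = iprot.read_i32()
        b = iprot.read_i32()
        oprot.write_i32(self.handler.add(a, b))


class SimpleServer:
    """Server: wires transport, protocol, and processor together per request."""

    def __init__(self, processor):
        self.processor = processor

    def serve_once(self, request_bytes):
        itrans = MemoryTransport(request_bytes)
        otrans = MemoryTransport()
        self.processor.process(BinaryProtocol(itrans), BinaryProtocol(otrans))
        return otrans.getvalue()


class CalculatorClient:
    """Client stub: encodes the call, hands bytes to the server, decodes the reply."""

    def __init__(self, server):
        self._server = server

    def add(self, a, b):
        request = MemoryTransport()
        prot = BinaryProtocol(request)
        prot.write_string("add")
        prot.write_i32(a)
        prot.write_i32(b)
        reply = self._server.serve_once(request.getvalue())
        return BinaryProtocol(MemoryTransport(reply)).read_i32()


if __name__ == "__main__":
    server = SimpleServer(CalculatorProcessor(CalculatorHandler()))
    client = CalculatorClient(server)
    print(client.add(2, 3))  # 5
```

<p>Because each layer only talks to the one next to it, the in-memory transport in this sketch could be swapped for a socket, or the binary encoding for JSON, without touching the handler. That separation is the design point the five-part breakdown above is describing.</p>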