᳡ࡵ⛁㒓˖˄010˅88258888DŽ dbqq@phei.com.cnDŽ 䋼䞣ᡩ䆝䇋থ䚂ӊ㟇 zlts@phei.com.cn ˈⲫ⠜։ᴗВ᡹䇋থ䚂ӊ㟇 䇋Ϣᴀ⼒থ㸠䚼㘨㋏ˈ㘨㋏ঞ䚂䌁⬉䆱˖˄010˅88254888DŽ ޵᠔䌁ф⬉ᄤᎹϮߎ⠜⼒೒к᳝㔎ᤳ䯂乬ˈ䇋৥䌁фкᑫ䇗ᤶDŽ㢹кᑫଂ㔎ˈ ܗ ॄ ᭄˖3500 ݠ ᅮӋ˖39.00 ॄ ⃵˖2014 ᑈ 4 ᳜㄀ 1 ⃵ॄࠋ ᓔ ᴀ˖720h1000 1/16 ॄᓴ˖13 ᄫ᭄˖162 गᄫ ࣫ҀᏖ⍋⎔ऎϛᇓ䏃 173 ֵㆅ 䚂㓪˖100036 ߎ⠜থ㸠˖⬉ᄤᎹϮߎ⠜⼒ 㺙 䅶˖ϝ⊇Ꮦⱛᑘ䏃䗮㺙䅶ॖ ॄ ࠋ˖࣫Ҁ໽ᅛ᯳ॄࠋॖ 䋷ӏ㓪䕥˖߬ 㟿 Ё೑⠜ᴀ೒к佚 CIP ᭄᥂Ḍᄫ˄ 2014˅㄀ 043145 ো ĉ. ķ໻Ă Ċ. ķIĂ ċ. ķӕϮㅵ⧚ˉֵᙃㅵ⧚ Č. ķF272.7 ISBN 978-7-121-22605-2 2014.4 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵ˋ IT ᶊᵘ䆒䅵ⷨお㒘㓪㨫 . ü࣫Ҁ˖⬉ᄤᎹϮߎ⠜⼒ˈ ೒к೼⠜㓪Ⳃ˄CIP˅᭄᥂ ⠜ᴗ᠔᳝ˈ։ᴗᖙおDŽ ᳾㒣䆌ৃˈϡᕫҹӏԩᮍᓣ໡ࠊ៪ᡘ㺁ᴀкП䚼ߚ៪ܼ䚼ݙᆍDŽ ᴀк䗖ড়݋໛ϔᅮᶊᵘ෎⸔੠ᶊᵘ㒣偠ⱘҎ䯙䇏DŽ кЁᡒࠄⳌ݇ⱘᶊᵘ㒣偠ˈᇍᙼ೼Ҟৢⱘᶊᵘ䆒䅵Ꮉ԰Ё䛑㛑䍋ࠄᕜདⱘᐂࡽ԰⫼DŽ 䖯ⱘᶊᵘDŽ᮴䆎Դህ㘠Ѣાϔ㸠Ϯ䛑ৃҹҢᴀܜⱘḜ՟ᵕ݊ᅲ⫼ˈҷ㸼њ䆹乚ඳ䕗 Ḝ՟ሩᓔϢᶊᵘⳌ݇ⱘ䅼䆎DŽᴀк԰㗙ᴹ㞾Ѧ㘨㔥ǃᬭ㚆ǃӴ㒳㸠Ϯㄝ乚ඳˈߚѿ ᴀкҹ໻᭄᥂ᯊҷЎ㚠᱃ˈ䙔䇋㨫ৡӕϮЁⱘϔ㒓ᶊᵘᏜˈ㒧ড়Ꮉ԰Ёⱘᅲ䰙 ݙ ᆍ ㅔ ҟ ေፇ౧ࠀ˧ၹਗ਼ᄊᛡ˞˸ਅὊ˞ႃηᤂᖹ׸ࠄဘድጺӑܫေὊཀྵՑಪ૶ ܫ᜶රၹਗ਼ᄊʽᎪெঃᝮैॹᮌᜂδߛὊᏫ˅ᤇᭊ᜶ᤉᛡ஝૶Ѭౢ૎ଈ ൥ὊᤈࡃڂৌᒰТ᧘᜶ὊᏫԡ௧͜ፒᄊ CDR ភӭѬౢਫ਼ˀᑟଢΙᄊǍ ၹਗ਼ᄊʽᎪᛡ˞˗ᘔեᅌܸ᧚ᄊࠇਗ਼ྲढ़֗ࠇਗ਼ᭊරηৌὊᤈ̏η ዝὊ̰Ꮻ࠮ᒱࠫࠇਗ਼ᄊေᝍదᣗܸϠࣀὊᖹᩙ஍౧ˀΈǍ Ѭౢ˗ᜂឨॆ˞ʷښ˔௧ҫိᭊ᜶ᄊࢺͻႃភ὇߹ЛˀՏᄊː˔̡ड़ड़ ᤭ႃភ὇Ὂࠄᬅᤰភᄬᄊԣၷำவर὆Х˗ʷ˔௧௹ʽˁనԤᐊܹὊԳʷ ႀ̆௄ขᅼ௴ᤰភЯࠔὊː˔ᤰភᛡ˞വरዝͫ὆Γݠܴᫎ᫂௑ᫎᄊ᫂ ԪएॹཀྵॢࣀὊ๎ᠠܸ᧚ᠫູὊࣳ˅௄ขԩ४ᓢݞᄊ஍౧ǍᤈመѬౢὊ ᄊࠇਗ਼Ὂݠ౧࠲˨Տ኎ࠫॠὊࠇਗ਼ᄊଌی˞ᄱͫᄊ̡Իᑟ௧߹ЛˀՏዝ ࠇਗ਼ᛡ˞ѬౢԻᑟᎥܸܿ̀᧚ᄊࠇਗ਼ᛡ˞ద஍ηৌǍΓݠὊː˔ᤰភᛡ ᜂˏफǍᄬҒὊႃηᤂᖹ׸Զᑟ۳̆ CDR὆ហጺᤰភᝮै὇˞˟ᄊڂԔ ߛϲቇᫎ኎˞ڂʽᎪெঃႀ̆஝૶᧚ࢽܸὊ̗ၷՑԶᑟᜂδ႑ 3 ܹὊࡃ ረү̉ᐏᎪݠ൥௿ԣᄊ̭ܹὊඈܹ̗͘ၷܸ᧚ᄊʽᎪெঃὊᤈ̏ښ ေᄊဘ࿄ܫʷnjႃηᤂᖹ׸ʽᎪெঃ ᮍᓎ೑ ေ˗ᄊऄၹ౶౞ܫʽᎪெঃ ႃηᤂᖹ׸ښBEPPQ ੿ష( N N Y ಩ॷጸ͈ HBase HDFS ܱᦊ஝૶ູ ѵ ᫳ ৌ ๗ Ꭺᮆः segment ࣃѬዝ ေ ࣃઅԩܫᮕ આᇾ ѬዝѬឈᎪᮆ URL ः ఞழ ళઅԩ ࿄গ segment ࣃઅԩ URL ः ଢԩ ళઅԩ URL ྔԩ Ꭺᮆ ಖᝮ ጼൣ ေܫൣ ͊Ҭጼ Ꭺᮆः ࣃѬዝ Y URL ः ళઅԩ උࠫ ԩ URL ళઅ ଢԩ URL ௑஡͈ URL ˚ උࠫ URL Ѭዝ ۳ю ᣿໚ URL ெঃ ၹਗ਼ ਫ਼ᇨǍڏЦʹืሮݠʾ ˔੤఻Ղᆊᄊ˔̡؞ݞὊ˞ድюᖹᩙଢΙΚ૶Ǎ ፒᝠѣඈیᄊᎪᮆѬዝὊѾၹവڧࠫऄᎪ֗ڧ4Ὄಪ૶ඈ˔̡᝻᫈Ꭺ юᆸভǍ ὊଢᰴᎪᮆѬዝᄊیὊᤉᛡᎪᮆѬዝὊˀல͖ӑവی૶ોིᎪᮆѬዝവ ὊᯫЏྔԩᎪᮆ஝૶ὊཀྵՑࠫྔԩᄊᎪᮆ஝ڧڡ 3Ὄࠫళᅼᄊ URL 2Ὄࠫࣃᅼᄊ URL ஝૶Ὂોི۳ю URL ѬዝюѷᤉᛡѬዝǍ ᤉᛡଢԩǍڧڡ 1ὌࠫʽᎪெঃ஝૶ᄊ URL ေவขᄊืሮݠʾ὘ܫʽᎪெঃ஝૶ ေவขᄊืሮܫ̄njʽᎪெঃ஝૶ ေҪᑟǍܫ௑ଢΙ஝૶૎ଈ ᬤᅌ Hadoop ੿షࣱԼጇፒᄊѣဘὊԻ̿ࠄဘʽᎪெঃᄊߛϲὊՏ ᤂᖹଢΙ᧘᜶ᄊᖹᩙΚ૶Ǎ 3 Hadoop ᡔᴃ೼⬉ֵ䖤㧹ଚϞ㔥᮹ᖫ໘⧚Ёⱘᑨ⫼ᶊᵘ ᄊ஝૶ὊԻ̿ᤰ᣿ FTP ኎வیӊ઺᧔ᬷ֗ҫᣒ᣿ሮὊՏ௑˷ஃે஡͈ዝ ஝૶ःὊݠ OraclenjDB2 ኎ᄊ஝૶̔૱Ὂی஝૶ଌ԰ԻࠄဘࠫТጇ ጇፒଢΙ᝻᫈ଌ԰Ǎڊ஝૶ᄊ᧔ᬷὊ̉ᐏᎪᎪᮆЯࠔᄊྔԩܱ֗ࠫ ଌ԰ࡏ᠇᠊ˁܱᦊጇፒᄊ஝૶ᤉᛡ̔૱Ὂӊ઺ၹਗ਼஝૶njʽᎪெঃ ᥹ষሖ ᣥК҂ᬷᏆ HDFS ஡͈ጇፒ֗ HBase ஝૶ः˗Ǎ ̰ႃηᤂᖹ׸ጇፒ఩Ҭ٨ࠀ௑ᖍԩၹਗ਼۳వηৌ֗ʽᎪெঃηৌὊ ᭄᥂⑤ ᧫ࠫඈʷᦊѬᄊЦʹҪᑟ̮ፁݠʾǍ ਫ਼ᇨǍڏေጇፒᄊ᤿ᣤ౶౞வವݠʾܫေืሮὊʽᎪெঃܫ۳̆ʽᤘ ʼnjʽᎪெঃጇፒᄊ੿ష౶౞வವ 
4 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ӑ᧘ጸὊѳѬᒰࠫऄᄊࡏ൓ፇ౞Ὂݠ὆ᇫ̔-ᇫӝ὇ڱ࠲Х˗ᄊዝѿᤉᛡവ ฌв URL ԣХዝѿᄊྔԩὊࠫ஝૶ᤉᛡፒʷኮေὊࣳᎶ̆ેˤӑߛϲ˗Ǎ ᠇᠊ऄၹҪᑟᄊЦʹካขࠄဘǍࠄဘ̀ᎪᮆѬዝጊळὊᤰ᣿̉ᐏᎪ ᑨ⫼ሖ ូएǍ ౎͈ᤉᛡˀՏᄊืሮืᣁǍ᧔ၹप൦ូएኖ႕Ὂᑟܵஃેܸࣳԧ᧚ᄊ Ѭஃᄊৱцʾᤰ᣿ܳښ᡹ႀѼலᑟҧὊᑟܵܬᫎnj᧘ត൓஝ὊՏ௑Ц ܵᤉᛡ͊ਓ˙ࣳᐏᄊืሮូएὊࣳ˅ᑟܵ଍҄ᓬགᄊ͖Џጟnjᡔ௑௑ ေᑟܫ҄Ὂࣳ᠇᠊ Hadoop ᬷᏆᄊᤂᛡኮေ֗ጇፒઑ᝝ெঃኮေǍ஝૶ ေὊ ଢΙᬷᏆᄊ᝻᫈଍ܫӑڱᄊവڱေืሮവܫҪᑟࡏࠄဘ̀஝૶ ࡳ㛑ሖ Տᝈᓤᤉᛡ༧ำᦊᎸǍ ൥ᠫູࡏࠄဘ̀ࠫྭေᠫູᄊᒭүᦊᎸ֗үগੱ࡙ὊࠫѬ࣋रᬷᏆ˗ˀ ڂፒࣱԼࡏଢΙᤉʷ൦ᄊચ៶Ὂ̿ଢΙᒭүӑᦊᎸ֗ुভᄊᤂ፥ᑟҧὊ एὊᭊ᜶ࠫྭေᠫູࡏ֗ጇాܭႀ̆Ѭ࣋र౶౞ࣜ౏ᄊᆶ͈ᦊᎸᄊ 䌘⑤ሖ ኎᣿ሮǍ ϲ˗ԝǍᤰ᣿Ѭ࣋रᝠካ಴౶Ի̿ࠄဘ஝૶ᄊຍฤnjᣁ૱njಣᰎ֗ᜉᣒ Ὂ࠲஝૶ҫᣒ҂Ѭ࣋रߛی᣿஝૶ຍฤὊతጼોིᮕЏࠀ˧ݞᄊ஝૶വ ေࣱԼὊ̰஝૶ູચԩѣਫ਼ᭊᄊ஝૶Ὂፃܫ஝૶ࡏ௧Ѭ࣋रܸ஝૶ ᭄᥂ሖ ଍ኮေ֗߷Лভ኎ྲढ़Ǎ रᤉᛡ᧔ᬷǍጇፒܱࠫଢΙፒʷ᝻᫈ଌ԰ὊЦదनஊভnjᰴভᑟnjԻᄣ 5 Hadoop ᡔᴃ೼⬉ֵ䖤㧹ଚϞ㔥᮹ᖫ໘⧚Ёⱘᑨ⫼ᶊᵘ ਫ਼ᇨǍڏ੨ݠʾ ေጇፒᄊྭေᎪፏᦊᎸ઩ܫᤉᛡ᝻᫈଍҄ǍʽᎪெঃܗ˨ᫎᭊ᜶ၹ᫹༢ ေ౶౞᝺ᝠʽὊᭊ᜶᝺ᝠː˔߹டᄊЯᦊᬷᏆᎪፏὊᬷᏆᎪፏྭښ ൥Ὂ߲̓˨ᫎᄊ᝻᫈ᭊ᜶ˑಫ଍҄Ὂ̿δ᝽ᎪፏᦊᎸ߷ЛǍڂؓὊ ေଐஷˀܵ߹ܫ֗ႃηᤂᖹ׸ЯᦊᎪፏ̉ᐏὊᏫ˅ Hadoop ᬷᏆᄊ߷Л ေ Hadoop ᬷᏆ௧ܫႀ̆Ꭺፏྔԩืሮᭊ᜶̉ᐏᎪᠫູஃેὊ஝૶ ᄊЯࠔߛϲ҂ HDFS ஡͈ጇፒ˗Ǎڧ ڡ Ὂతጼંઅԩ҂ᄊ URLڧڡ MapReduce ᄊவरࣳԧઅԩ̗ၷᄊ URL ὊཀྵՑᤰ᣿ڧڡ ˗Ὂྔᙂ͘ಪ૶ʷࠀᄊ᜻ѷ̗ၷ᜶અԩᄊ URL ःڧڡ᤟Кྔᙂᄊڧڡ ԝ଀ǍҜʾᄊ URLڧڡ URLnjࣃፃѬዝᄊ URL ᄊ URLnjࣃፃઅԩᄊܭὙԝ᧘͘ં᧘ڧڡ ᜽ᮠnjᣄ͈኎Яࠔᄊ URL njྟڏὊࣳᤉᛡ᣿໚njԝ᧘୲ͻǍХ˗᣿໚୲ͻԝᬔڧڡ ஡͈˗ଢԩ URL መોིʷࠀᄊ᜻ѷὊᒭүઅԩʺ፥ᎪηৌᄊሮऀੋᏨᑮవǍሮऀ̰ெঃ ေืሮ௧ʷܫʹ᠇᠊̰̉ᐏᎪጇፒ˗ྔԩᎪᮆᄊЦʹЯࠔηৌǍЦ 㔥㒰⠀㰿 ေፇ౧ᤉᛡүগ࡙ᇨǍܫေҪᑟὊࣳࠫܫྀጷΎၹՊመऄၹ ေፇ౧ᤰ᣿ Web ᮆ᭧࡙ᇨὊࣳ˅ଢἹ̉ᮆ᭧Ὂܫ᠇᠊࠲ऄၹҪᑟ ሩ⼎ሖ ፒʷኮေǍڧڡ ̿Χஃ୞ᖹᩙำүὊࠄဘ̀ URL ХϠݞྲढ़Ὂಪ૶ЯࠔϠݞྲढ़ᤉᛡࠇਗ਼ጺѬὊࣳஃેᄬಖࠇਗ਼ᏆଢԩὊ ழὊˀல߹ؓǍࠄဘ̀ၹਗ਼ᛡ˞ፒʷѬౢὊ۳̆ࠇਗ਼ᄊ᝻᫈ᛡ˞Ὂគѿ བྷ᫃ឈලԣ࣢ၹឈලᄊྔԩὊಪ૶ਫ਼࡛ዝѿ౞थѬឈឈःǍឈःࠀరఞ ੋ὆ᇫ̔ -ॲӰ὇ὊࠫዝѿᤉᛡጊळǍࠄဘ̀ឈःѬዝኮေὊᤰ᣿ࠫᎪፏ 6 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ेᬷᏆ˗ᄊ఩Ҭ٨ѣဘᩲឨ௑Ὂட˔ᝠካ᣿ሮࣳˀ͘ጼൣὊՏ௑Ѭ࣋र ေǍܫ˗ᤂᛡᄊܳ˔ᓬགښ஝૶ᄊ௑ϋὊᤰ᣿Ѭ࣋रᝠካ࠲Х͊ҬѬᝍࣳ ေܸܫښေܸ஝૶ᬷᄊᣄ͈಴౶Ǎܫ2ὌHadoop ੿షவವ᧔ၹࣳᛡ License ᄊੇవὊద஍ᎁᝍ̀ጇፒੱࠔࣜ౏ᄊᰴੇవԍҧǍ ౞ᄊ௿ᤰ PC ఩Ҭ٨ʽὊܸܸᬌͰ̀఩Ҭ٨֗ߛϲᄊੇవὊ̿ԣ஝૶ः ᧔ၹ X86 ౶ښѬ࣋रጇፒదᅌᰴࠔᩲভᄊྲགὊࣳ˅᝺ᝠၹ౏ᤂᛡ Ѭ࣋रߛϲ֗Ѭ࣋रᝠካጇፒǍ ಩ॷ੿షúúѬ࣋र஡͈ጇፒˁѬ࣋रᝠካ಴౶Ὂ ౞थ̀ʷடݓ߹டᄊ 1ὌHadoop ੿షவವ௧۳̆Ѭ࣋र۳ᆩ౶౞ὊЍѬѾၹѬ࣋रːܸ ʽᎪெঃጇፒ᧔ၹ Hadoop ੿షᝍхவವᄊ͖ҹదݠʾїགǍ njʽᎪெঃጇፒவವᄊ͖ҹپ ⠀㰿㔥㒰㋏㒳 Ϟ㔥᮹ᖫ໘⧚㋏㒳 ষᴎ Gnষ᮹ᖫ᥹ ষᴎ Ϟ㔥᮹ᖫ᥹ ষᴎ Ϟ㔥᮹ᖫ᥹ Clients Internal ` Clients Internal ` ... 
Web Server Web Server Hmaster Hmaster Node Node Name Name Node Data Node Data Node Data Node Data Node Data Traker Traker Job Job Switch Aggregation Internal Ѧ㘨㔥 ` 7 Hadoop ᡔᴃ೼⬉ֵ䖤㧹ଚϞ㔥᮹ᖫ໘⧚Ёⱘᑨ⫼ᶊᵘ ࠓࢅֺ̣dܤཙࡎēဍୣಾݮဟ Windows ༔થ Đပಇదԅٲc༔થᅽੋޯۦcࠓࢅֺ̣ď׻๠୶ჼٲޯܤ༔થރྻ ēճ׻๠୶Ӊ҅ܙปރࡂԅϦ೧ܬಓСޝᅖఉdտұϵူӖ໸ಬމغ ޏēฑనసࠝ MVPē੶ᄉ੠ּԙС٤ഓᆇ༿ࢳڳ԰㗙ҟ㒡˖ֺߙ Ὶूదҧᄊஃ୞Ǎ ေጇፒὊ˞សНՃᄊၹਗ਼ድюᖹᩙଢܫᖹ׸ᤂᛡࣱԼʽᦊᎸ̀ʽᎪெঃ ౽ᄵጟᤂښڡ᫳ὊࣃፃੇҪڄᄊ߹டᝍхவವ὇֗НՃूܸᄊካขஃે ܸ̈஝૶НՃὊ̵̓Κ੬ᒭ˟ᆑԧᄊ BDP ࣱԼᣄ͈὆ӊե Hadoop ࣱԼ ေጇፒોིʽᤘ౶౞Ὂଢѣ̀߹டᄊᝍхவವǍ࠿Х௧ܹܫࠫʽᎪெঃ ЯὊϸܹܸ̈஝૶njӨ˞nj̎η኎ܳࠒᅼՐᄊܸ͍ˊᦐ᧫ڎښᄬҒ ঌᤴnjᰴ஍ᤂᛡǍی̿ΎᎪፏྔᙂnjᎪᮆѬዝ֗ʽᎪᛡ˞኎വ ட˔ᬷᏆ˗ԧၷ஌ᬪᩲឨ௑ᄊ஝૶з͸Ǎᤈመ᝺ᝠவವԻښጇፒԻδᬪ 8 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ቤ̂֗ঌᤴԧ࡙Ǎ Л᧚ˊҬ஝૶Ὂ௄ข൤ᆸંଧᩐᛡˊԧ࡙வՔὊ̰ᏫˀѾ̆ᩐᛡ͍ˊ ਓएὊ࠮ᒱࠇਗ਼ืܿὙ 3Ὄႀܸ̆᧚஝૶ሏጳߛϲὊᩐᛡ͍ˊ௄ขѬౢ 2Ὄܸ᧚஝૶ሏጳߛϲὊ࠮ᒱࠇਗ਼௄ขঌᤴᖍ४̔௜ηৌὊᬌͰࠇਗ਼໘ ҫ఻٨ভᑟ֗ߛϲቇᫎὊᄰଌҫܸ̀಩ॷጇፒᤂᖹ፥ઐੇవὙܙ1Ὄ ေᑟҧ Ǎͮ௧᧔ԩ̿ʽᄊኖ႕͘࠮ᒱ̿ʾˀᡜ὘ܫ̰ᏫଢᰴˊҬ஝૶ ԋԾ஝૶Ὂѓ࠶಩ॷጇፒᄊ஝૶ߛϲ᧚Ὂѓᣐ಩ॷጇፒᄊԍҧὊ͋ܬ2Ὄ ေᑟҧὙܫҫ಩ॷጇፒᄊ఻٨ভᑟ֗ߛϲቇᫎὊଢᰴˊҬ஝૶ܙ௧὘1Ὄ ൥ὊᄬҒ᧛ᚸᩐᛡˊ᧔ԩᄊ௿᥆ऄࠫኖ႕ڂ૶ःവरࣜ౏ࢽܸᄊԍҧǍ ஝ی஝૶ःЏܹভˀᡜὊࢽܸᄊ஝૶᧚ࠫ͘͜ፒᄊТጇیႀ̆Тጇ ᄱТˊҬ஝૶᧚ফқʽӤὊ᧛ᚸᩐᛡˊԁ࠲ᤉКܸ஝૶௑̽Ǎ ᫂Ὂܙঌᤴښ˷᝵ܳழУᄊஃ̷வरˀல๙ဘὊ᧛ᚸᩐᛡˊᄊ஝૶ηৌ᧚ ᩐᛡˊηৌӑࣃፃᤪຒ௿ԣὊͮ௧ᬤᅌ̉ᐏᎪ੿ష֗ऄၹᄊᮻᤴԧ࡙Ὂ ᧛ᚸᩐᛡˊᄊԧ࡙֗Ꭺፏᤰη۳ᆩ᝺ஷඵࣱᄊଢᰴὊ᧛ᚸڎᬤᅌੈ ʷnj᧛ᚸᩐᛡˊဘ࿄ 㭯ᔎᔺ ऄၹ౶౞ ᧛ᚸᩐᛡˊᄊښBEPPQ ࣱԼ( ጇፒǍڊ҂ܱڀေፇ౧ᤅܫᄊڀ௑ં̰಩ॷጇፒᤅ ေὊՏܫ̔௜஝૶ὊཀྵՑಪ૶̔௜ᆊᄊˀՏὊᣁ᤟ˀՏᄊ಩ॷጇፒᤉᛡ ጇፒᄊڊҒᎶˊҬጇፒ὘ˊҬ̔௜஝૶ᄊ᡺ᣁὊ߲᠇᠊ଌஆ౏ᒭܱ ጇፒǍڊܱ˞Ի̿ॆکˊҬᄱТᄊጇፒὊ ጇፒ὘᠇᠊ᄰଌˁࠇਗ਼ᤉᛡ̔̉ὊଢΙˊҬ఩ҬὊਫ਼దˁᩐᛡڊܱ ਫ਼ᇨǍڏ಩ॷˊҬጇፒጸੇὊݠʾ ጇፒnjҒᎶˊҬጇፒ֗ڊᄬҒὊᩐᛡ͍ˊᄊˊҬ۳వ᤿ᣤ౶౞ႀܱ Ѿၹ͉ϙǍ नԧᄱऄᄊካขࠫᤈ̏஝૶ᤉᛡ૎ଈѬౢὊଢᰴᩐᛡ͍ˊࠫԋԾ஝૶ᄊ ۳̆ Hadoop ੿షᄊྲགὊԻ̿ၹ߲౏ߛϲᩐᛡˊᄊሏጳ஝૶Ὂࣳ ᧛ᚸᩐᛡˊᄊऄၹ౶౞ښʼnj)BEPPQ੿ష х̵̓᭧˚ᄊܸ஝૶᫈ᮥǍ ऄܸࠫ஝૶௑̽ᄊ͖ҹӡѬ௚௭Ὂᡕ౏ᡕܳᄊ͍ˊ͘᧔ၹᤈመ੿షᝍښ ஝૶үগѬౢὊࣃፃ˞̵̓ࣜ౏̀ࢽܸᄊѾ๧Ǎ᧔ၹ۳̆ Hadoop ੿ష ᧔ၹ Hadoop ੿షࠄဘ̀๼ࠃ׸ֶ஝૶ߛϲ֗̔௜ڄᄬҒὊᬁ᧗ࣅࣅᬷ ऄၹὊ߲Ի̿ࠄဘ๒᧚஝૶ᄊͰੇవߛϲnj஝૶ᄊᰴ஍ᝠካ֗஝૶ѬౢǍ ᐏᎪᛡˊ֗ႃߕ׸Ҭᛡˊ४҂ࣹ̀ฅᄊ̉ښHadoop ੿షᄬҒࣃፃ ܸ஝૶ᄊ૎ଈऄၹ଎҂̀ʷ˔ழᄊᰴएǍ Ὂ࠲ی๒᧚஝૶ߛϲὊ߹ЛஃેѬ࣋रᝠካὊஃેᰴጟ஝૶૎ଈካขവ Hadoop ࣱԼ౶౞௧ࠫ͜ፒ౶౞ᄊᮬ᜸֗᭩ழὊ߲Ի̿ࠄဘͰੇవᄊ ̄nj)BEPPQ੿షᄊԧ࡙ဘ࿄ 10 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ঴˗ॷ ҒᎶˊҬጇፒ ̡ᛡ Ꭺᩐnjஃ̷njေ᠉኎ጇፒ Х̵ҒᎶጇፒ ҒᎶጇፒ ҒᎶ఻ ႃភᩐᛡܬᒭҰ᝺ Ꭺག఩Ҭ٨ ˊҬጇፒ ኄʼவ ಩ॷˊҬጇፒ Ꭺᩐnjஃ̷njေ᠉኎ጇፒ Х̵ҒᎶጇፒ ҒᎶጇፒ ҒᎶ఻ ႃភᩐᛡܬᒭҰ᝺ Ꭺག఩Ҭ٨ ˊҬጇፒ ঴˗ॷ ҒᎶˊҬጇፒ ኄʼவ ̡ᛡ ࣱԼጇፒ಩ॷˊҬጇፒ Hadoop ਫ਼ᇨǍڏҫ Hadoop ࣱԼጇፒՑὊᩐᛡˊҬ۳వ᤿ᣤ౶౞ݠʾܙ ေҪᑟǍܫ஝૶૎ଈ ὙܱࠫଢΙ஝૶ಊវҪᑟὙᤇԻ̿ಪ૶஝૶ߛϲྲགὊଢΙ͋ܬ૶ߛϲ ҫ Hadoop ࣱԼጇፒὊࠄဘ಩ॷጇፒᄊԋԾ஝ܙ಩ॷጇፒࡏὊښˀԫὙ ᩐᛡ͍ˊΎၹ Hadoop ࣱԼ੿షᄊ۳వন᡹௧὘δેԔ౏ጇፒ౶౞ ေਫ਼ద̔௜ˊҬᄊЦʹࠄဘǍܫ᠊᠇಩ॷˊҬጇፒ὘ 11 Hadoop ᑇৄ೼䞥㵡䫊㸠Ϯⱘᑨ⫼ᶊᵘ ਫ਼ᇨǍڏፇ౞ݠʾڱᤰ࣢᧔ၹᄊҪᑟവ 
ःὊᤉᛡ᭤ፇ౞ӑᄊ஝૶ߛϲǍ अࡏὊѾၹ Hadoop ࣱԼጇፒὊᤉᛡ஝૶ܸ᜻വߛϲὊଢΙ HBase ஝૶ ေ֗஝૶ካข૎ଈὊ̿Χၷੇ໘ᡜᭊරᄊՊመ஝૶Ǎጇፒܫ૶ᤉᛡҫࢺ Ὂࠫ஝ڱေҪᑟവܫေࡏὊ͘᧫ࠫˀՏᄊˊҬᭊරଢΙˀՏᄊˊҬܫᫎ ᤉᛡ࡙ᇨǍጇፒ˗ڱေՑᄊፇ౧ᤰ᣿࡙ᇨവܫᦊጇፒᖍԩᠫູὊཀྵՑ࠲ Ὂ̰ܱڱ࡙ᇨവ֗ڱMVC ᄊവरᤉᛡ᝺ᝠǍᯫЏጇፒʽࡏὊᤰ᣿ଌ԰വ Hadoop ࣱԼጇፒ˞̀໘ᡜ᧛ᚸᮗ۫఩ҬᭊරὊጇፒЯᦊ౶౞᧔ၹ ᑟ࡙ᇨᭊ᜶Ǎ ҂಩ॷጇፒᄊ஝૶̲ःὊ̿Ι౽̏ઑ᛫Ҫڀေፇ౧ᤅܫጇፒὊ˷Ի̿࠲ ڊ҂ܱڀေፇ౧Ὂᤰ᣿ҒᎶˊҬጇፒὊᤅܫေὊࣳ࠲ܫࠫˊҬ̔௜ᤉᛡ Hadoop ࣱԼጇፒ὘ಪ૶ˊҬᭊරὊѾၹ̰಩ॷጇፒ࠮КᄊԋԾ஝૶Ὂ ̿ࠄဘ౽̏ಊវˊҬᭊරǍ ᭊ᜶ᄊ಩ॷ஝૶҂ Hadoop ࣱԼጇፒ˗Ὂ͋ܬ಩ॷˊҬጇፒ὘ࠀ௑ ጇፒǍڊ҂ܱڀေፇ౧ᤅܫေὊཀྵՑ࠲ܫᣁ᤟҂ Hadoop ࣱԼጇፒ˗ᤉᛡ ጇፒᄊ౽̏ಊវˊҬڊҒᎶˊҬጇፒ὘ಪ૶ˀՏᄊˊҬ̽ᆊὊ࠲ܱ ጇፒ὘ˀԧၷˀӑǍڊܱ 12 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵  ਫ਼ᇨǍ ڏᬷᏆᄊവरᤉᛡᦊᎸὊᤰ࣢᧔ၹᄊྭေᦊᎸፇ౞ݠʾܬ˟ጇፒ᧔ၹ ˀՏᄊ఻੝ᬷᏆ˗ᤉᛡᦊᎸὊښߛϲǍ ऄ࠲ Hadoop ࣱԼጇፒᣄ͈͋ܬ༫ ࠔڡ൥ጇፒॹᮌ᝺ᝠपڂ᧛ᚸᩐᛡˊࠫ஝૶ߛϲ߷Л᜶ර᭤࣢ᰴὊ ေவरǍܫරὊଢΙˀՏᄊ஝૶࡙ᇨ ေՑᄊፇ౧ᤉᛡ Web ᮆ᭧࡙ᇨὊՏ௑ᤇ᜶ಪ૶Ԕదጇፒᄊᭊܫࠫ ሩ⼎῵ഫ ߛϲኮေǍ͋ܬଢΙ HDFS ஡͈ጇፒὊଢΙ஝૶ܳҞవ ˟᜶Ҫᑟ௧ଢΙ HBase ஝૶ःὊࠫ᭤ፇ౞ӑ஝૶ᤉᛡፒʷߛஊኮေὊ ᭄᥂῵ഫ ေืሮ኎Ǎܫӊե஝૶૎ଈካขnjˊҬ˗ڱὊҪᑟऄၹവڱവ ေܫေᭊ᜶֗ጇፒᤂᛡᭊ᜶ଢΙࠫऄᄊҪᑟܫ˟᜶Ҫᑟ௧ಪ૶ˊҬ ࡳ㛑ᑨ⫼῵ഫ ေவขǍ ܫ˟᜶Ҫᑟ௧᧫ࠫˀՏᄊ஝૶ູ֗஝૶ಫरὊଢΙࠫऄᄊ஝૶࠮К ᥹ষ῵ഫ ູ஝૶௧̵̓ᄊ಩ॷˊҬ஝૶Ǎ ᧛ᚸᩐᛡˊ˗Ὂᤈ̏ښေᄊູ஝૶Ǎܫ˟᜶Ҫᑟ௧˞ጇፒଢΙҫࢺ ⑤᭄᥂῵ഫ ᄊЦʹឭ௚ݠʾǍڱඈ˔Ҫᑟവ 13 Hadoop ᑇৄ೼䞥㵡䫊㸠Ϯⱘᑨ⫼ᶊᵘ ˊေࠄ௑̔௜ܫڡ᠍ՂԋԾ஝૶੪ӿಊវ Ҫᑟᄊ̔௜὇ Ὂᝨ಩ॷጇፒఞݞ 4ὌѾၹ Hadoop ࣱԼጇፒὊ੾ઞ̀಩ॷጇፒ౽̏๗Ᏺভ̔௜὆Γݠ὘ ᝮैὊ˞ᄣኮࣟঅὊଢῚదҧᄊ੿షஃ୞Ǎ රὊᎄи஝૶૎ଈካขὊѾၹ̔௜஝૶Ὂঌᤴࠀ͍ͯˊ᭤ขฤᨑᄊ̔௜ 3ὌЍѬѾၹ̀ Hadoop ࣱԼ੿షᄊ஝૶૎ଈҪᑟǍԻ̿ಪ૶ˊҬᭊ ଢᰴ̀ࠇਗ਼ᄊ໘ਓएὊࠄဘ̀̿ࠇਗ਼˞˗ॷὊଢᰴ̀ᩐᛡᄊቤ̂ҧǍ ैὊඓሚጟଽጊፇ౧ὊԻ̿˞ၹਗ਼ࠄ௑ଢΙ͊͵̔௜௑ᫎᄊ̔௜஝૶Ὂ 2ὌЍѬѾၹ Hadoop ࣱԼ੿ష๒᧚஝૶ঌᤴଽጊҪᑟǍᄈʺ̣౎ᝮ Hadoop ࣱԼጇፒ˗Ὂࠄဘ๒᧚஝૶ߛϲǍ PB ጟᄊ஝૶ߛϲὊԻ̿ંᩐᛡˊҬ̗ၷᄊਫ਼దˊҬ஝૶ᦐߛϲ҂ 1ὌЍѬѾၹ Hadoop ࣱԼ੿షᄊߛϲ͖ҹǍHadoop ࣱԼԻ̿ଢΙ ࠲ʽᤘ౶౞வವळᤉ᧛ᚸᩐᛡˊ˗Ὂ࠲ЍѬѾၹ̿ʾ͖ҹǍ nj)BEPPQ੿షᄊ౶౞͖ҹپ 14 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ಝdٲޯރົં࠼स स࠼ིdં஍ၽၩݮԙúúඟၩӖ೴࡬Ӥఉڕંົރಝ࠼ིٲֺ̣ޯ ೬ԅࢗ֟ٝᆴēပտّԨ໰ၮ࿯౥cࠡక໻ྜԅӖ೴࡬ࠓࢅޏڑ࡬ອ സݯᅥྜഋಶēЩ୙ҶಹӉ҅నߑcӖ೴ޙ԰㗙ҟ㒡˖ ༯ஜ཮ē ௑̽ᄊ҂౏ଢΙूదҧᄊ੿షδ᝽Ǎ ൥Ὂ Hadoop ࣱԼ੿షॹ࠲ࠫ᧛ᚸᩐᛡˊऄܸࠫ஝૶ڂऄಊវፇ౧Ǎ־ጟ ᛡᄊԋԾ஝૶ಊវጇፒ˗Ὂࠄဘ̀សᩐᛡਫ਼ద᠍Ղᄊ̔௜ԋԾᝮैඓሚ ᄬҒὊܹܸ̈஝૶НՃὊࣃፃ࠲ʽᤘ౶౞ᝍхவವੇҪऄၹ҂౽ᩐ ηৌጇፒᄊે፞Ϥकԧ࡙Ǎ ҬὊЍѬԧ૙͜ፒ஝૶ःᄊ͖ҹὊϢ҂͖ҹ̉ᛪὊ̰Ꮻδ᝽᧛ᚸᩐᛡˊ IT 15 Hadoop ᑇৄ೼䞥㵡䫊㸠Ϯⱘᑨ⫼ᶊᵘ ࣱԼᄊүͻ὆ӊեڮՏᄊ஝૶Ǎᤈਓ֊ᅌࣱԼၹਗ਼ᬤ௑Իᑟᤉᛡʷ̏ᆡ HDFS ʽߛదՊመ஝૶ὊదНၹᄊὊద఻ࠛᄊὊˀՏᄊၹਗ਼Ի̿᝻᫈ˀ ᫳ǍनஊԁС̚ Hadoop ᬷᏆ὘ڄ᫳ˁ᭤੿షڄ᫳ὊᤈХ˗ӊ઺੿ష ڄਫ਼់नஊὊࡃ௧ᑟܵनஊፌНՃ͊͵దᭊ᜶஝૶ࣱԼᄊХ̵ᐌᑟ ୄथनஊࣱԼ ̄˞ʷὊ౞थʷ˔नஊ˅߷Лᄊ Hadoop ፒʷࣱԼǍ ᫳ܸ஝૶ߛϲˁ੿షᄊஃેὊੈ̓ᭊ᜶࠲ː˔ Hadoop ᬷᏆՌڄࠫХ̵ ஃ୞ˊҬՌࣳὊ̿ԣڡՊᒭવదʷ˔උᣗܸᄊ Hadoop ᬷᏆǍ˞̀ఞݞ டՌ˨ҒὊː᣸ښ̓ࠫː˔᜽ᮠᎪባᄊ஝૶ᤉᛡடՌnjˊҬᤉᛡடՌǍ ࠈ࣋ՌࣳὊᤈ˔ழ᫕ᡜ̿ᝨˊႍෳᒑǍᬤ˨Ꮻ౏ᄊᭊ᜶ੈ៰ژெὊ͖ᦺ ൦ळК Hadoop ੿షǍ2012 ࣲ 3 థ 12ᤪښ˷᫳ڄHadoopὊНՃᄊХ̵ ΎၹښὊᡕ౏ᡕܳᄊᮗ۫ᦐ־ᅌ Hadoop ੿షᄊ଎ࣹ̿ԣܸ஝૶ᄊॖ Ύၹ 
HadoopǍːࣲᫎᬤښ᫳ڄࣲ˨ҒὊԶద͖ᦺᎪ஝૶ 2011 ښொ ᑀఀ ٙᵄ ˨᡹ BEPPQ ࣱԼनஊ( ៰ژᦺ͖ ˔൥ܱὊᬤᅌनஊᄊၹਗ਼ᡕ౏ᡕܳὊ˞̀வΧՑ፞ᄊੱ࡙Ὂੈ̓࠲Ѭᦡܳ ᤂᖹኮေʽੈ̓Զᭊ᜶Тฌї˔Т᪄ᄊᓬགǍښ˗ᄊ͊͵ʷ˔ᓬགὊᤈನ ᄊ୲ͻԶᑟᤰ᣿ࣱԼଢΙᄊࠇਗ਼ቫ఻٨౏ੰᛡὊၹਗ਼ˀᑟᄰଌ᝻᫈ᬷᏆ ఩ҬǍၹਗ਼ࠫᬷᏆܭJobTracker ੋᏨ NameNode ѣဘ஌ᬪ௑Ի̿ঌᤴূ ၹ఻ᄊᝈᓤὊेܬNameNode ᦊᎸὊ൥ܱὊSecondNameNode ᤇЍे̀ NameNode ˁ JobTracker ѬनᦊᎸὊSecondNameNode ˷࿘ቡ̆ Ǎڏʷ˔උᣗ߷Лᄊ Hadoop ᬷᏆᄊ઩੨ፇ౞˞ڏᭊ᜶ୄथ HadoopǍʾ ڡѺଌᝏ Hadoop ௑Ὂˀኮ௧ߦ˸ᤇ௧ၷ̗ऄၹὊˀԻᥘВښܸࠒ ઩੨౶౞ ভǍܥ৏ਓԣ᭤৏ਓ὇Ǎਫ਼̿ੈ̓ᯫЏ᜶δ᝽ࣱԼᄊ߷ЛভԣϤ 17 Ӭ䝋ೳ䈚 Hadoop ᑇৄᓔᬒП䏃 ʷᓊੈ̓͘नԧʷ̏ऄၹᤌଌ Hadoop ᄊ HDFS ఩ҬὊඋݠெঃ᧔ 䴲⊩ᑨ⫼ⱘ䖲᥹ ឭ௧ౝएˀ߷ЛᄊǍ ၹਗ਼ࡃԻ̿ᤰ᣿ᤈ˔ጼቫ୲ͻ HDFS ᄊਫ਼దᠫູὊᤈࠫनஊ஝૶ࣱԼ౏ ጼቫᤌଌ҂ੈ̓ᄊᬷᏆὊࣳ˅សၹਗ਼વదᤈ˔ጼቫᄊ root ᠍ਗ਼Ὂᥧ˦ស ࡃԻ̿᝻᫈ᠫູǍϜ᝺ၹਗ਼Aᤰ᣿ԳʷԼళᅼᄊLinuxڧڡԶ᜶ᅼ᥋఩Ҭ Hadoop ᬷᏆᳮᝣࣳ෥ࠫᤌଌХ఩Ҭᄊ Linux ጼቫϢᢶ͋ᝣ᝽Ὂਫ਼̿ Linux 㒜ッⱘ䱣ᛣ䖲᥹ ͋ᝣ᝽᫈ᮥnjၹਗ਼ిᬍ᫈ᮥnjWeb ႍ᭧᝻᫈଍҄᫈ᮥ኎Ǎ ʷ̏߷Л᫈ᮥǍඋݠᢶښˀ͘Ꮶᘽܺܳ߷Л᫈ᮥǍͮ௧ Hadoop ᆸࠄߛ ொరࣳښ߲ᄊߛϲᑟҧ̿ԣѬ࣋रᝠካᑟҧὊᒰ࠶˞ڂК Hadoop ᦐ௧ ᫳ळڄ͊͵ጇፒ᜶ࠄဘनஊᦐ᜶Џᝍх߷Л᫈ᮥǍੈ̓ᅼ᥋ܸᦊѬ ߷Л౶౞ ࠶ॢܳՑర፥ઐੇవǍ ࠇਗ਼఻ʽଢ̔᭤ᄱТᄊሮऀǍҒర᜻ᔵݞԻ̿ѓښᤉᛡᄣ଍Ὂ᫹ൣၹਗ਼ ᫳ࠄᛡᦡՌኮေǍेཀྵᤇᭊ᜶ࠫ఻٨ᠫູڄΙ฾តΎၹ̿ԣࠫඈ˔ၹਗ਼ ᄬै̏דᄬैΙၷ̗Ύၹnj̏דࠇਗ਼ቫᄊѺరࡃᭊ᜶҄ࠀˑಫᄊ᜻ᔵὊ ౞थښਅˀՏὊॢԻᑟѣဘ౽˔ၹਗ਼ᄊ୲ͻΎ४ட˔ࠇਗ਼ቫ߼఻Ǎਫ਼̿ ᫳СՏΎၹᄊৱцǍႀ̆ඈ˔̡ᄊ˸ڄ˔఻Ὂᤈ͘ѣဘ౽˔ࠇਗ਼఻దܳ ᫳ΎၹᄊৱцѬᦡᄱऄᄊࠇਗ਼ڄ˔˞̀ЍѬѾၹᠫູὊੈ̓ોིՊ ᄊͻၹǍ ၹܬ᫳Ѭᦡ҂ˀՏᄊࠇਗ਼ቫ఻٨ʽὊՏ௑˷ᡑ҂ڄࠇਗ਼ቫ఻٨Ὂ࠲ˀՏᄊ 18 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 Իᑟ࠮ᒱ task ᄊܳ൓ܿ᠌తጼ࠲࠮ᒱͻˊᄊܿ᠌Ǎ ᦡᎶ̿ԣੱ࡙ӊᦡᎶˁᬷᏆ˗ᄊХ̵ᓬགˀʷᒱὊॢܒ̏ slave ᓬགဗ ᝺Ꮆ̀౽˔ task ࠽ត 4 ൓ݠ౧ˀੇҪѷ࠲ᜂ᜽˞ܿ᠌Ὂݠ౧ழ຋ҫᄊᤈ MapReduce ಴౶˗ੈ̓ ښ௑ʾ౶Ὂᥧ˦࠲࠮ᒱᤈ˔஡͈ᄊˏܿǍ൥ܱὊ ᤈழ຋ҫᄊ slave ᓬགʽὊ˨Ցݠ౧᭤ขၹਗ਼৏ਓ࠲ᤈ̏ᓬགՏښߛϲ ᦐ͋ܬ˔ၹਗ਼຋ҫї˔ళᅼᄊ slave ᓬག҂ᬷᏆ˗Ὂ্ݞ౽˔஡͈ᄊʼ ߛϲὊᬤՑᜂХ̵͋ܬ˔̿ԣͻˊᄊܿ᠌ǍϜ᝺ੈ̓ඈʷ͋஡͈ᦐ௧ʼ ҫ҂ master ˗ᄊὊᤈನʷ̏ˀԻ᭥ᄊ slave ຋ҫదԻᑟ࠮ᒱ஝૶ᄊˏܿ ҒᄊᬷᏆ௧Ի̿ᬤਓ຋˨ښmaster ᓬགᆸࠀ̿ՑὊslave ᓬག ښᄊὊਫ਼̿ Hadoop ᄊː˔˟᜶ᦊѬ HDFS ֗ MapReduce ᦐ௧ master/slave ፇ౞ slave 㡖⚍䱣ᛣ⏏ࡴ ᤈನ࠮ᒱͥካᄊፇ౧ˀюᆸǍ ࣱԼϢੇవͥካઑ᛫௑͘࠲ A ๗Ᏺᄊᝠካᠫູᦐካ҂ B ၹਗ਼ᢶʽὊښ̄὘ ၹਗ਼дЍ B ၹਗ਼ଢ̔ͻˊὊ᝻᫈వ౏ A ၹਗ਼ࣳ෥దిᬍ᝻᫈ᄊ஝૶ὊХ రమ͢ᤵᄊᢶ͋ὊࡃԻ̿дЍសᢶ͋ᤉᛡͻˊଢ̔Ǎᤈ࠲࠮ᒱХʷ὘ A ଢ̔ MapReduce ͻˊ௑ὊԶ᜶࠲ user.name ᄊ࡛ভ᝺ᎶੇͿ̓ੈښ ܙ⫼᠋䑿ӑⱘݦ ᄊˊҬઑ᛫Ǎ ൤࣢־ᄊᝠካᠫູὊ࠮ᒱெ࣢ᄊˊҬᝠካ४ˀ҂ᡜܵᄊᝠካᠫູ̰Ꮻॖ ˀ߷ЛᄊǍՏ௑ᤇԻ̿नԧʷ̏ሓదᄊऄၹሮऀၹ౏᣿ए๗Ᏺ஝૶ࣱԼ ࡃԻ̿ड़ HDFS ʽߛϲ஝૶njξஈ஝૶Ὂᤈನࠫဘదᄊ஝૶௧ౝХڧڡ ᄊ஝૶ࣱԼ˗ࣳ෥ࠫኄʼவऄၹϢᢶ͋ᝣ᝽Ὂ͊͵ APP Զ᜶ᅼ᥋Х఩Ҭ Ғ˨ښᬷጇፒ࠲ܱᦊᄊˊҬጇፒ᧔ᬷ᣿౏ᄊெঃᄰଌʽ͜҂ HDFS ʽǍ 19 Ӭ䝋ೳ䈚 Hadoop ᑇৄᓔᬒП䏃 ˔ᄊ᝺ᝠ˷ᥖ̰ Linux ஡͈Ǎඈ˔஡͈વద឴njиnjੰᛡʼመ୲ͻὊඈ HDFS ஡͈ጇፒ᝺ᝠവલ̀ Linux ஡͈ጇፒὊਫ਼̿ HDFS ஡͈࡛ভ ⫼᠋㒘ֵᙃ᥻ࠊ Kerberos ᄊԻ᭥ভǍ ̀ूܙᝣ᝽ηৌὊᤈನܸܸ ᄊ᫈ᮥǍੈ̓ᦡᎶ̀˟̰ KDC ఩ҬὊՏ௑Ѿၹᑮవࠄ௑Տ൦˟̰ःᄊ ˔ӭག஌ᬪ᫈ᮥǍळКʷ˔ழᄊጇፒὊܳ̀ʷ˔ဗᓬॹཀྵࣜ͘౏ʷ̏ழ 
ʷښၹਗ਼ឰර఩Ҭᄊ௑ϋᭊ᜶Џ̰ KDC Ѭԧ TicketὊᤈನ KDC ߛ ᑮవԻ̿ʷ᪄ᝍхǍ Kerberos ʽѬᦡᄱऄᄊᢶ͋ηৌǍੈ̓ᤰ᣿ᒭүӑ ښᓬགᭊ᜶ܙਗ਼njழ ၹܙӊեᢶ͋ᝣ᝽ηৌὊKDC ˞ࠛᨅѬԧ఩Ҭ٨ǍळК Kerberos ՑὊழ Ὂӊ઺ Identity Store ֗ KDC ːᦊѬǍХ˗ Identity Store ˟᜶ڏ˟ʹፇ౞ ਫ਼ᇨ˞ Kerberos ᄊڏᝣ᝽नթὊʽᤘଡᤘᄊᄱТ᫈ᮥᦐᑟ४҂ᝍхǍʾ Ցᄊྠవ˗ஃે̀ KerberosὊੈ̓࠲ Kerberos ߷Л̿ 1.0 ښ Hadoop ᓩܹ Kerberos 20 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ᧝̆ʽ᭧Ѭౢᄊፇ౧Ὂੈ̓࠲ Hadoop ឴ԩၹਗ਼ጸηৌᄊଌ԰ᤉᛡ 4ὌኮေౝХˀவΧǍ ᄊిᬍᠫູᦡᎶᭊර௄ข४҂໘ᡜǍాܭ3Ὄ ᬲǍڈ2Ὄࠇਗ਼ቫ఻٨͕ܳὊ௢࠱Тጇ஝૶ᄊʷᒱভඋᣗ ᬷᏆᄊ߷ЛǍڮ1Ὄ͊͵ࠇਗ਼ቫᄊˀ߷Лᦐ͘ᆡ ὘ܫ˨˔ˀᡜ ʾї̿ښ፬ʽѬౢὊੈ̓ԧဘ Hadoop ᳮᝣ Linux ࠇਗ਼ቫᄊጸηৌߛ ৌఞஈඋᣗᮠጓᏫ˅ԡඋᣗᤕѭὊᤈನࡃ͘˞ᬲኮေրǍ ୲ͻኮေʽౝ˞ˀவΧὊద௑ϋʷ̏ၹਗ਼ᄊిᬍηښৌЛᦊఞஈǍᤈನ ᤈ̏఻٨ʽ࠲ࠇਗ਼ቫᄊጸηښવదᤈ˔ၹਗ਼ᄊᄆैηৌὊཀྵՑѬѿᭊ᜶ їԼࠇਗ਼ቫ఻٨דᔪਇࠫ౽˔ၹਗ਼ᄊጸηৌᤉᛡఞஈὊᭊ᜶Џᅼ᥋ ˀʷᒱǍ ࠇਗ਼఻ʽ፥ઐʷ̏ᄱՏᄊၹਗ਼ጸηৌὊᥧ˦͘࠮ᒱၹਗ਼ጸηৌᄊ஝૶˔ ܳښᜂᤵϜǍ൥ܱὊᤌଌ҂஝૶ࣱԼᄊࠇਗ਼ቫ఻٨஝᧚ᣗܳὊݠ౧ᭊ᜶ ൥ܱὊݠ౧ጸηৌΚᠻ̆ࠇਗ਼ቫ఻٨ᄊភ˷ॢࠔ௜Ύၹਗ਼ᄊጸηৌ ˦ሮऀࡃ͘ઑᩲǍ ˷ࡃ௧ឭὊݠ౧ࠇਗ਼ቫၹਗ਼ᄊጸηৌˁ HDFS ʽᄊిᬍηৌˀӜᦡὊᥧ ిᬍѼலᄊ௑ϋᤰ᣿ូၹʷ˔ଌ԰౏ᖍԩ Linux ࠇਗ਼ቫᄊၹਗ਼ጸηৌὊ ᤉᛡၹਗ਼ښᄊిᬍǍᏫ Hadoop వᢶࣳ෥దߛϲၹਗ਼ᄊిᬍηৌὊᏫ௧ ੈ̓ԧဘὊHadoop ᔪ᜶ࠫ஡͈ࠄဘ༧ำᄊిᬍ଍҄ᭊ᜶᝺Ꮆጸၹਗ਼ ᦡԂˀܵ༧ำǍ ԁʷСવద 9 መˀՏᄊᦡᎶኖ႕Ǎᤈನᄊ᝺ᝠඋᣗእӭὊͮ௧ХిᬍѬ ֗ጸၹਗ਼ᄊిᬍǍᤈನʼመˀՏᄊᢶ͋ࠫʼ˔ˀՏᄊ୲ͻᤉᛡଆѵጸՌὊ ஡͈Զᑟॆ࡛̆ʷ˔ਫ਼దᏨὊॆ࡛̆ʷ˔ጸὊඈ˔஡͈ᦐࠀ˧̀ਫ਼దᏨ 21 Ӭ䝋ೳ䈚 Hadoop ᑇৄᓔᬒП䏃 read write HadoopDPM MySQL ਗ਼Զᑟ᝻᫈Хదిᬍᄊ஝૶ὊᤈನࡃԻ̿ࠄဘࠫ Web UI ᄊ᝻᫈଍҄Ǎ ൓᝻᫈௑௄ᮌᄆैǍ൥ܱὊၹਗ਼ᄊ token ˁၹਗ਼ᄊᢶ͋˷௧ፅࠀᄊὊၹ ಩Ǎၹਗ਼ᄆैᝣ᝽ᤰ᣿ѷ͘ၷੇʷ˔᝻᫈ token ࣳߛϲ҂ cookie ˗Ὂʾ ฌв௑ႀၹਗ਼ᒭᛡ᝺ࠀὊኮေրᤉᛡࠆښਗ਼Ր֗ࠛᆊὊសၹਗ਼Ր֗ࠛᆊ ኄʷ൓᝻᫈ Web UI ႍ᭧͘ᒭү᡺ᣁ҂ʷ˔ᄆैႍ᭧Ὂ᜶රၹਗ਼ᣥКၹ ᧫ࠫᤈ˔᫈ᮥὊੈ̓ξஈ̀ Hadoop ᳮᝣᄊ Serverlet ᣿໚٨Ὂၹਗ਼ ࣜ͘౏ʷ̏Х̵߷Л᫈ᮥǍ ૶ःᦡᎶᄊ᠍Ղࠛᆊҫᣒ҂ᦡᎶ஡͈˗Ὂᤈನᤰ᣿ Web ႍ᭧ఒ᭛ѣ౏Ὂ ਗ਼Ի̿ᄺ҂ਫ਼దͻˊᤂᛡηৌ̿ԣᄱТᦡᎶηৌὊద௑ੈ̓͘࠲ʷ̏஝ HDFS ʽᄊ஝૶Ὂᤈࠫʷ̏ሓద஝૶ᄊ߷Лভ௧෥దδ᝽ᄊǍ൥ܱὊၹ ηৌǍᳮᝣᤈː˔ႍ᭧ࣳళࠫၹਗ਼ᢶ͋ᤉᛡᝣ᝽Ὂ͊͵ၹਗ਼ᦐԻ̿᝻᫈ Hadoop ᳮᝣଢΙː˔ Web ႍ᭧὆ 50030 ֗ 50070὇ၹ౏Ιၹਗ਼ಊវ Web UI 䆓䯂᥻ࠊ Զᭊ᜶ᤰ᣿ DPM ࢺЦࡃԻ̿ᣐ౛૿ిǍ MySQL ˗ᄊၹਗ਼ጸηৌὊᤈನݠ౧ၹਗ਼ᭊ᜶᝻᫈ʷ˔ழᄊᠫູὊኮေր வᤉᛡኮေǍԳܱὊੈ̓नԧ̀ʷ˔ DPM ࢺЦὊၹ̆ξஈڡ˔ʷښ˗ ௧ᤰ᣿ MySQL ஝૶ःᖍԩၹਗ਼ᄊጸηৌὊᤈನਫ਼దၹਗ਼ᄊጸηৌᦐᬷ hadoop.security.group.mapping ᦡᎶ˞ᒭࣂࠄဘᄊʷ˔ዝὊΎ४ Hadoop ਫ਼ᇨὊ࠲ڏ᧘иὊ࠲ηৌેˤӑ҂ʷ˔࿘ቡᄊТጇ஝૶ः˗Ǎݠʾ 22 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ඈ˔ၹਗ਼ᄊΎၹ˸ਅˀʷನὊඈ˔ၹਗ਼ᄊ୲ͻඵࣱˀʷನὊݠ౧෥ 㾘㣗 ॢ࠶ᄊʷᦊѬὊᤇᭊ᜶̰ॢܳவ᭧ᤉᛡ᜻ᔵ֗ᄣ଍Ǎ ᑟनஊᏫ௧᜶नஊݞǍݠ͵੦ᑟᤂᖹݞʷ˔ Hadoop ࣱԼὊ੿షड़ड़Ӵ ʷ˔Իၹᄊ࿄গǍेཀྵੈ̓రమᄊፇ౧ˀ̩̩௧̆ܫ௧ᡑ൦Ὂ̽᛫ࣱԼ ᝍх̀߷Л᫈ᮥԶᑟឭ Hadoop ࣱԼनஊѣԝ̀Ὂ߹ੇᤈ˔᫽඀Զ ࣱԼᤂᖹ ኮေր ၹਗ਼ ᢶ͋ᝣ᝽፥ઐᢶ͋ηৌ ࠇਗ਼ቫ఻٨ ᄆै ᧝ి ηৌ ፥ઐၹਗ਼ ឴ԩၹਗ਼ጸ Hadoop ၹਗ਼ηৌ Kerberos Web UI DPM Kerberos ˗ǍDPM ᤇଢΙၹਗ਼ฌвnjኮေրࠆ಩njઑ᝝኎ҪᑟǍ ᛡిᬍѼலὊኮေրξஈၹਗ਼ηৌ 5 Ѭ᧿ՑࡃԻ̿ၷ஍҂ 
Hadoop ጇፒ ૶ः˗ǍHadoop ᤰ᣿ MySQL ஝૶ः˗ᄊၹਗ਼ጸηৌࠫၹਗ਼ᄊឰරᤉ MySQL ஝ ښKerberosnjၹਗ਼ηৌnj Linux ࠇਗ਼ቫὊХ˗ၹਗ਼ηৌߛϲ ὊХ˗ኮေրᤰ᣿ DPM ࢺЦԻ̿୲ͻڏਫ਼ᇨ˞டʹ߷Л౶౞ڏʾ ᅝܼᶊᵘ೒ 23 Ӭ䝋ೳ䈚 Hadoop ᑇৄᓔᬒП䏃 ፇ౧ૉ࠮ᬷᏆԠ஝ូ͖Ǎ Ꮸ࠲૯ܿత࠵ӑǍᄣ଍ࠄᬅʽ˷௧ࠫဘద఩ҬᄊѬౢὊԻ̿ᤰ᣿ᄣ଍ᄊ ஌ԧၷҒԧဘҒЎὊ᧔ԩଐஷᥘВᤵੇ૯ܿὊੋ̃ښਫ਼់ᄣ଍ࡃ௧ ⲥ᥻ ᤂᖹৱцǍ ౽መሮएʽԻ̿Ԧ௢ࣱԼᄊښ፞ழืሮᄊᮋѾੰᛡǍืሮࠄஷᄊৱц Ὂᤈನ੦ᑟδ᝽ՑેڲંଧͱТ᪄ᄊགࡃݞǍืሮʷே଎ᛡὊʷࠀ᜶ ˁၹਗ਼ᄊˀॹ᜶෤ᤰǍืሮᄊ҄ࠀᭊ᜶ંଧՌᤠᄊएὊˀᑟܳ͸དဵὊ ᝻᫈ᄊ߷ЛǍࣱԼԧ࣋ழྲভᭊ᜶ᡌӤጟืሮǍืሮѓ࠶̀ኮေ̡ր ʷᤉКࣱԼࡃЩੇᓢݞᄊ˸ਅǍၹਗ਼ႂឰిᬍᭊ᜶ืሮὊδ᝽஝૶ښ ၹਗ਼ฌвழ᠍ਗ਼ᭊ᜶ืሮὊᤈನ͘ΨΎၹਗ਼Џྀ৙ࣱԼ᜻ᔵὊՏ௑ ⌕⿟ ࠄ᡻˗ˀல߹ؓǍښ҄ࠀˀරʷ൓҂ͯὊԻ̿ Ր኎Ǎ᜻ᔵδ᝽̀ᬤᅌ௑ᫎᄊ଎ረὊࣱԼΚைϸѸᦊᎸ௑ʷನǍ᜻ᔵᄊ ᄊˊҬᏫࠀǍेཀྵੈ̓ᤇ᝺Ꮆ֑̀Ր᜻ᔵὊਫ਼దᄊᄬैᦐၹ࠵иߚඇ֑ ᄬैὊݠ὘ /tmpnj/commonnj/tmpnj/warehouse ኎ὊЦʹ᜶ಪ૶ඈ˔НՃ user ʾǍेཀྵᤇదХ̵ ښ᫳ˊҬηৌὊጳʽᄊˊҬηৌˀЊ᝵ஊᎶڄϲ ᫳ᄬैὊᦡᮩඋᣗܸὊၹ̆ߛڄ˞ᮩὊΙၹਗ਼ߛϲ˔̡஝૶Ὑ work ᄬै थ̀ʷ˔ user ֗ work ᄬैὊХ˗ user ˞ၹਗ਼ᄊሓ̡ቇᫎὊ᝺Ꮆʷࠀᦡ ಪᄬैѹښඈʷጟᄊᄬै̿ԣᄱऄᄊˊҬե˧Ὂ̰ၹਗ਼ᝈए౏ᄺὊੈ̓ ҄ࠀߛϲ᜻ᔵ֗ᝠካ᜻ᔵǍHDFS ௧ʷ˔஡͈ጇፒὊ̰ಪᄬैनݽ҄ࠀ ေրᄊ፥ઐੇవǍHadoop ˟᜶ଢῚː˔఩Ҭ὘ߛϲˁᝠካὊੈ̓ᭊ᜶ ҫၹਗ਼˨ᫎᄊ෤ᤰੇవὊ͘ଢᰴኮܙదፒʷᄊ᜻ᔵ͘ΎࣱԼˀݞၹὊ͘ 24 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ࣱԼनஊᒰ̭ܸܸ࠵࠵ᄊ᫈ᮥ˷᥅҂ˀ࠶Ὂඈʷ൓᫈ᮥᄊѣဘᦐ͘ ঴ፇ ͻˊࠫᬷᏆԠ஝ᤉᛡូடǍ ᫳๗ᏲᄊᝠካᠫູὊಪ૶ԋԾڄ˔ѬౢԋԾͻˊᤂᛡৱцὊ಩ካඈ ग़৆԰Ϯߚᵤ ReduceSlot ஝Ǎ ReduceSlot ᄊ๗ᏲඋὊ̿൥౏ូட᫳ѵᄊԠ஝Ὂ̿ԣ MapSlot ˁ ᄣ଍᫳ѵᄊࠄ௑ូएηৌὊඈ˔ͻˊᄊ኎ॠηৌὊ̿ԣ Mapslot ֗ 䯳߫ⲥ᥻ ࠫၹਗ਼ᄊ HDFS ᦡᮩᤉᛡᄣ଍Ὂଢᧈၹਗ਼ԣ௑ຍေ᣿ర஝૶Ǎ 䜡乱ⲥ᥻ټ⫼᠋ᄬ Ցї˫෥దˀྀ৙ᄊप࣢ηৌǍ ࠫ NameNodenjDataNode ᄊெঃ˗ᄊप࣢ࠀరᤉᛡѬౢὊѬౢ҂త ᓖᐌߚᵤ ᮩᬍ҄Ὂԣ௑ઑ᝝Ǎ ᭊ᜶ᄣ଍ඈ˔ᓬགᄊඈ˔ᇓᄨᄊЦʹΎၹৱцὊࣳࠫᇓᄨΎၹϢᦡ ⺕ⲬՓ⫼ⲥ᥻ ᄊೝಊǍ ᄊᓬག߼఻ళԣ௑ԧဘὊᤵੇ૯ܿǍྲѿ௧ SecondNameNode ߛำ࿄ц ᎄиᑮవᄣ଍Պᓬག௧աߛำὊ̿ԣᄱऄᄊઑ᝝ଐஷὊ̿Вܸ੻᧚ ᒋⲥ᥻ع⚍㡖 25 Ӭ䝋ೳ䈚 Hadoop ᑇৄᓔᬒП䏃 ஝૶ࣱԼ౶౞࣎Ὂழ๎ॲӰ὘cloudfireǍ៰ژ԰㗙ҟ㒡˖ϬౕὊ͖ᦺ 1 ˔̡ெᤉᛡெ࣢፥ઐԁԻǍ ʷСᤂᛡᄊͻˊ஝ᡔ᣿ 200 ʺǍᄬҒࣱԼᤂᖹඋᣗሷࠀὊඈևԶᭊઆК ၹὊᡔ᣿ 100 ܳ˔ฌвၹਗ਼Ὂඈܹଢ̔ͻˊ஝ᡔ᣿ 7000ὊࣱԼᤂᖹᒰ̭ Ύښ᫳ڄ˔ܳ ࣃፃద 20ڄΎၹࣱԼὊᄬҒЛᬷښ᫳ڄ˔ གǍ̰తѺᄊ 1 ᄣ଍ܙΨΎੈ̓নᏦ௧աᭊ᜶ξஈืሮnj௧աᭊ᜶຋ҫ᜻ᔵnjᭊ᜶᜶ழ 26 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 heartbeat …… heartbeat HAagent SlaveNameServer NameServer sdb sdc dsp1 dsp2 dsp3 DataServer sda dsp2 dsp3 sdb sdc dsp1 DataServer sda blockid,fileid heartbeatmessage Data controlmessage DataServer id blockid/ Application/Client ਫ਼ᇨǍڏጸੇὊݠʾ TFS ᬷᏆႀՐߚ఩Ҭ٨὆NameServer὇֗஝૶఩Ҭ٨὆DataServer὇ ౶౞እ̮ http://code.taobao.org/p/tfs/src/὇ὊଢΙፌܱᦊၹਗ਼ΎၹǍ TaoCode ʽनູ὆ᮊᄬ˟ᮆ὘ ښతܸᬷᏆߛϲ஡͈஝ࣃᤃӢ̣ǍTFS ࣃ ๼ࠃᄊՊᮊˊҬ˗ὊᄬҒࣃᦊᎸᄊښऄၹڡ஝૶ߛϲ఩ҬǍTFS ᜂࣹฅ ࣋र஡͈ጇፒὊ۳̆௿ᤰᄊ Linux ఩Ҭ٨౞थὊ˟᜶ଢΙ๒᧚᭤ፇ౞ӑ TFS὆Taobao File System὇௧ʷ˔ᰴԻၹnjᰴভᑟnjᰴԻੱ࡙ᄊѬ ᓴটϰ ๼ࠃ๒᧚஡͈ߛϲࠄ᡻ 
Яߛ˗ὊࣳˀᤉᛡેˤӑߛښNameServer ʽᄊਫ਼దЋηৌᦐδߛ ߛϲ఻҄ ὊΎ४ block Ҟవ஝ˀͰ̆ᬷᏆᦡᎶϙὊδ᝽ጇፒߛϲ஝૶ᄊԻ᭥ভǍ҄ ܭѷᝣ˞ DataServer ࣃፃ߼఻Ὂ͘࠲ស DataServer ʽߛϲᄊ block ᤉᛡ DataServerὊे NameServer ᡔ᣿ʷࠀ௑ᫎ෥దஆ҂ DataServer ᄊηৌὊ NameServer ԧ᤟ॷ᡺ηৌὊ NameServer ѷಪ૶ॷ᡺౏ኮေਫ਼దᄊ ፌڡթүՑὊ͘Ք NameServer ලઑХߛϲᄊਫ਼ద block ηৌὊࣳևరভ ᇓᄨὊ̿ΧЍѬѾၹᇓᄨ IO ᠫູǍDataServerڱᤉሮὊඈ˔ᤉሮኮေʷ ʷԼ఻٨ʽᦊᎸܳ˔ DataServerښDataServer ఩Ҭ٨ᦊᎸ௑ᤰ࣢͘ ҬᄊᰴԻၹভǍ ၹ఩Ҭ٨ࡃѭ૱˞˟఩Ҭ٨౏ଌኮ఩ҬὊ̿δ᝽఩ܬၹ఩Ҭ٨ʽὊܬ҂ NameServer ᄊ࿄গὊेХೝ฾҂˟఩Ҭ٨߼఻௑Ὂ HA agent ࠲ vip ѭ૱ ܬ˟ၹ఩Ҭ٨Ὂ HA agent ᠇᠊ᄣ଍ܬࣳ࠲ block ξஈηৌՏ൦ᒰ ఩Ҭ٨С̚ʷ˔ vipǍ൤࣢ৱцʾὊ˟ NameServer ેద vip ଢΙᄊ఩ҬὊ NameServer ఩ҬᦊᎸ௑᧔ၹ HA ౏ᥘВӭག஌ᬪὊːԼ NameServer ᄊ஡͈Ǎ ߛϲ஡͈௑ѬᦡὊblock id ֗ file id ጸੇᄊ̄Ћጸ׭ʷಖគʷ˔ᬷᏆ᧗ښ ˗ᄊ஡͈વదʷ˔ block Я׭ʷᄊ஡͈ᎄՂ὆file id὇Ὂfile id ႀ DataServer ᎄՂ὆block id὇Ὂblock id ႀ NameServer ѹथ௑ѬᦡὊblockڱ׭ʷᄊ஝૶ ᬷᏆ˗વదЛࡍښ ˀՏᄊ఻౶ʽὊ̿δ᝽஝૶ᄊᰴԻ᭥ভǍඈ˔ block block ˗ࠀͯ஡͈Ǎඈ˔ block ͘ߛϲܳ˔Ҟవ҂ ښथቡጊळὊ̿Χঌᤴ Տʷ˔ block ˗Ὂࣳ˞ blockښ64MB὆ԻᦡᎶ὇ὊTFS ͘࠲ܳ˔࠵஡͈ߛϲ ὆block὇˞ӭͯߛϲ֗ጸጻ஝૶Ὂblock ܸ࠵ᤰ࣢˞ڱTFS ̿஝૶ 137 ᅲ䏉ټᅱ⍋䞣᭛ӊᄬ⎬ ӱҿѕ߆ԇ֘݇ރ ݐଟޞstep4:գ ўކstep1:બࡌӊ ॆद,ଏ֛ ӊҵ݇ރ ۨ؏ӊݱҁ ࣍ߎेઍ blockޏ޾ step6: ݕсӊબࡌ ӊ؏ࡂ݇ރ:step5 ঀӊৈߧۯؚ step7:ଏ֛ ॆद,ଏ֛ ӊҵ݇ރ ӱҿѕ߆ԇ֘݇ރ ݐଟޞstep4:գ DataServer(Replica2)DataServer(Replica3) step3:էmasterՇଟӊબࡌ ێङblockͫଏ֛blockѹ৥Ғ step2:ଣӟ▲ЗՕӊ Client DataServer(Master) NameServer block ЯᦊᄊϠረͯᎶ̿ԣ஡͈ᄊܸ࠵Ǎ ښळ˗ᝮै̀ block ˗ඈ˔஡͈ DataServer ʽὊඈ˔ block ࠫऄʷ˔ጊळ஡͈Ὂጊ ښ᫈҂ߛϲᄊ஡͈Ǎ idnjfile id ᎄᆊጸੇᄊ஡͈ՐὊ̿Ց Client ᤰ᣿ស஡͈ՐԁԻ̰ TFS ᝻ ʷ˔ႀᬷᏆՂnjblockڀ஡͈ੇҪи҂ܳ˔ DataServer ՑὊ͘Ք Client ᤅ ᛦ஍౧ࣳˀݞǍेکࠄ௑ᄊὊಪ૶᣿௑ᄊηৌ౏Ѭᦡ blockὊࠄ᡻᝽௚Х ὊՏ௑ႀ̆᠇ᣒηৌˀ௧ాܭDataServer ᠇ᣒ౏Ѭᦡᄊኖ႕Ὂࠄဘʽᣗ ᧔ၹ round robin ᄊኖ႕ѬᦡὊᤈመኖ႕እӭద஍ǍХ̵ಪ૶ڡӭ ѬᦡԻи block ௑Ὂእښ ਫ਼ᇨὊNameServerڏTFS и஡͈ืሮݠʾ ᛦǍکᛦ௑Ὂ᣻ረ஝૶౏δ᝽఩Ҭ٨ᫎᄊ᠇ᣒکԧဘ DataServer ᠇ᣒˀښ block὆ᤰ࣢௧ႀ DataServer ߼఻࠮ᒱᄊ὇Ὑ ҄ܭԧဘ block Ꭵ࠶Ҟవ௑ ښৌǍNameServer ద˄᫃ᄊՑԼጳሮᣃវՊ˔ block ֗ Server ᄊ࿄গὊ ௢࠱Тጇ᛫Ὂ˞иឰරѬᦡԻиᄊ blockὊ˞឴ឰරಊវ block ᄊͯᎶη NameServer ಪ૶ᤈ̏ηৌ౏౞थ block ҂ Server ᄊ௢࠱Тጇ᛫Ὂಪ૶ស ϲǍDataServer թүՑὊ͘࠲Хવదᄊ block ηৌලઑፌ NameServerὊ 138 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 NameServerǍ tair ᎁߛᄊ֑˗ဋ᭤࣢ᰴὊΎ४ፐܸᦊѬᄊ឴ឰරᦐˀᭊ᜶ፃ᣿ Ѭ࣋र key/value ߛϲጇፒὊhttp://code.taobao.org/p/tair/wiki/intro/὇Ὂऄၹ tair ˗὆tair ௧๼ࠃनູᄊ ښߛὊ࠲ block ҂ DataServer ᄊ௢࠱Тጇᎁߛ ᎁߛᄊ֑˗ဋࣳˀᰴὊ˞൥ TFS ᤇஃેᤊቫᎁڡద஝ӢʺὊ̰Ꮻ࠮ᒱవ ᎁߛᄊЯߛࣳˀܸὊᏫᬷᏆ˗ block 
ᄊ஝᧚ڡູඋᣗదᬍὊᑟܵၹ̆వ ࠄᬅऄၹ˗Ὂᤰ࣢ࠇਗ਼ቫᑟΎၹᄊጇፒᠫښ̰తழᄊͯᎶʽ឴ԩ஡͈Ǎ DataServer὇Ὂࠇਗ਼ቫతጼ̰͘ NameServer ᖍԩ block తழᄊͯᎶηৌὊ DataServer ʽ᝻᫈ᄊ஡͈Ǎݠ౧ cache ࣃፃܿ஍὆ block ᜂ᣻ረ҂Х̵ cache ֑˗ὊܸᦊѬৱцʾᦐᑟ̰ cache ᧗ᖍԩ҂᜶̰ ڡਫ਼̿ʷேవ ԧၷ஝૶᣻ረᄊ௑ϋ੦͘ԧၷԫӑὊښDataServer ᄊ௢࠱ТጇʷᓊԶ͘ Ὂႀ̆ block ҂ڡClient ͘࠲ block ҂ DataServer ᄊ௢࠱Тጇᎁߛ҂వ failoverǍ˞̀ଢᰴ Client ឴ԩ஡͈ᄊ஍ဋࣳᬌͰ NameServer ᄊ᠇ᣒὊ ௑˟үᤉᛡ᠌ܿښClient ᠇᠊߹ੇ឴иѻ TFS ஡͈ᄊ۳వ᤿ᣤὊࣳ Ǎ־࣢ᄊ఩Ҭᤵੇॖ ᝻᫈Ͱ࢏రᤉᛡὊ̿ᥘВХࠫ൤ښѻᬔ஡͈ӴၹᄊቇᫎὊѻᬔ͊Ҭᤰ࣢ ஆڀᄊ஡͈஝᧚ᡔ᣿ʷࠀඋΓ௑Ὂࠫ͘ block ᤉᛡடေ὆compact὇Ὂ̿ block ᧗ѻᬔ଀ὊԶ௧˞஡͈᝺Ꮆʷ˔ѻᬔಖᝮǍेʷ˔ block Яᜂѻᬔ े Client ѻᬔ TFS ᧗ᄊ஡͈௑Ὂ఩Ҭ٨ቫࣳˀ͘ቡԁ࠲஡͈஝૶̰ ፌ ClientǍڀ҂஡͈ᄊͯᎶὊ̰ block ᄊᄱऄͯᎶ឴ԩ஝૶ࣳᤅ DataServer ଌஆ҂ Client ᄊ឴ឰර௑Ὂᤰ᣿ಊ੽ block ᄊጊळࡃᑟঌᤴ४ ԧ᤟឴ឰරǍݠ౧̰౽˔Ҟవ឴ԩܿ᠌Ὂ Client ͘᧘តХ̵ᄊҞవὙ ᄊ DataServer ηৌὊࣳՔ DataServerښৌὊ̰ NameServer ʽಊវ block ਫ਼ ᄊ block ηښे Client ឴ԩ஡͈௑ὊᯫЏಪ૶஡͈Րᝍౢѣ஡͈ਫ਼ 139 ᅲ䏉ټᅱ⍋䞣᭛ӊᄬ⎬ ௑ᭊ᜶ஃેʷ᫃ழឦᝓ᝻᫈ TFS ᄊੇవ˷ԫ४᭤࣢ͰὊԶᭊ᜶ોིөᝬ ʽጳΎၹՑὊTFS ᄊਫ਼దጸ͈ӤጟᦐᑟϢ҂ࠫၹਗ਼ᤩ௚ὊՏڱNginx വ ေਫ਼దᄊ TFS ឴иឰරὊՔၹਗ਼ଢΙ RESTful ᄊ᝻᫈ଌ԰Ǎ̽ڱᮥὊសവ github नູὊhttps://github.com/alibaba/nginx-tfs὇౏ᝍхស᫈ ښ὆ࣃڱവ ˔ऄၹ౏Ӥጟࠇਗ਼ቫὊӤጟੇవ᭤࣢˨ᰴǍTFS ᤰ᣿नԧ Nginx ࠇਗ਼ቫ Ύၹᄊ஝ᄈښࠇਗ਼ቫԧ࣋ፌၹਗ਼ΎၹՑὊʷேԧဘ bugὊᭊ᜶ᤰᅼࣃፃ ࠄဘࠇਗ਼ቫ᝻᫈Ցቫ఩Ҭ٨ᄊ᤿ᣤǍेڡܭ᧘ښ᫃ழឦᝓᄊஃેὊᦐ௧ ҫʷܙ൥ TFS ˷ଢῚ Java ࠇਗ਼ቫὊඈڂҬ˟᜶Ύၹ Java ᤉᛡनԧὊ TFS ଢῚಖюᄊ C++ࠇਗ਼ቫΙनԧᏨΎၹὊՏ௑ႀ̆๼ࠃЯᦊˊ ੇܸ஡͈Ǎ ྟࠫऄᄊ TFS ஡͈ՐηৌὊг̰ TFS ᧗឴ѣՊ˔Ѭྟᄊ஝૶Ὂ᧘ழጸՌ ᄊ௧ܸ஡͈ᄊѬྟηৌ὇Ὂेၹਗ਼᝻ܸ᫈஡͈௑ὊClient ͘Џ឴ѣՊ˔Ѭ ᄊ஡͈Ր὆ស஡͈Րˁ൤࣢ᄊ TFS ஡͈దᅌˀՏᄊҒ፰Ὂ̿ӝѬХߛϲ ஡͈ՐὊཀྵՑ࠲ܳ˔஡͈Րͻ˞ழᄊ஡͈஝૶ߛϲ҂ TFSὊ४҂ʷ˔ழ ࠵஡͈὆ᤰ࣢ඈ˔ 2MB὇ѬྟὊࣳ࠲ඈ˔Ѭྟᦐߛϲ҂ TFSὊ४҂ܳ˔ ̰ TFS ᧗឴ԩ஡͈Ǎܸࠫ̆஡͈ᄊߛϲὊClient ͘࠲ܸ஡͈ѭѬ˞ܳ˔ ᄊ஡͈௑Ὂѷ͘Џ̰ metaserver ಊវស஡͈Րࠫऄᄊ TFS ஡͈ՐὊཀྵՑ ஡͈Րˁ TFS ஡͈Րᄊ௢࠱Тጇߛϲ҂ metaserverὊे឴ԩᒭࠀ஡͈Ր ࠲Хߛϲ҂ TFS ˗Ὂ४҂ʷ˔ႀ TFS Ѭᦡᄊ஡͈ՐὊཀྵՑ࠲ၹਗ਼ૉࠀᄊ ஡͈Րᄊ௢࠱ТጇǍेၹਗ਼ߛϲʷ˔ૉࠀ஡͈Րᄊ஡͈௑Ὂ Client ᯫЏ TFS ଢΙӭ࿘ᄊЋ஝૶఩Ҭ٨὆ metaserver὇౏ኮေᒭࠀ˧஡͈Ր҂ TFS ᣿ଢΙழᄊ఩Ҭnj࠰ᜉ Client ౏ࠄဘǍࠫ̆ᒭࠀ˧஡͈Րᄊߛϲ఩ҬὊ ఀࣳ෥దஈԫ TFS ఩Ҭ٨ቫᄊߛϲ఻҄ὊᏫ௧ᤰڤેὊஃેᤈːመˊҬ ಪ૶ˊҬᄊᭊරὊTFS ᤇࠄဘ̀ࠫᒭࠀ˧஡͈Րܸ֗஡͈ߛϲᄊஃ 140 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 ᣤᬷᏆǍ ᄊ஝૶ᤰ᣿ᬷᏆᫎᄊՏ൦఻҄౏δ᝽஝૶̉˞᪫ϸὊ౞ੇʷ˔ܸᄊ᤿ ఻੝ՊᦊᎸʷ˔ TFS ྭေᬷᏆὊܳ˔ྭေᬷᏆ˔ܳښ༫ὊЦʹϢข௧ TFS ᬷᏆᤰ᣿ܳҞవ఻҄౏δ᝽஝૶ᄊԻ᭥ভὊՏ௑ஃેܳ఻੝ࠔ ఻੝ࠔ༫ ᝨட˔ᬷᏆᄊ఩Ҭ஍ဋతܸӑǍ ʷ˔ࡊ᧚ଌᤃᄊඵͯጳʽὊښᏆ᧗ਫ਼ద DataServer ᄊࠔ᧚Ύၹৱцδે ૶̰ࠔ᧚ᣗᰴᄊ DataServer ᣻ረ҂ழੱࠔᄊ DataServer ᧗ὊతጼΎ४ᬷ ᛦὊ࠲ᦊѬ஝کܸǍ᧫ࠫᤈመৱцὊ NameServer ࠫ͘ட˔ᬷᏆᤉᛡ᠇ᣒ ൥᠇ᣒʽࣀᡰ˷ॢڂࠔ᧚ΎၹʽˁᬷᏆ᧗Х̵ᄊ DataServer ࣀᡰॢܸὊ ښХߛϲᄊ঴஝૶᧚۳వն൤උТጇǍेழ DataServer ҫКᬷᏆ௑ὊХ 
஡͈བྷགဘ៶Ὂਫ਼̿ DataServer ᄊ᠇ᣒˁښ᫈ឰර᭤࣢ᬤ఻Ὂ۳వˀߛ ҂ TFS ʽᄊ஡͈᝻ູڀႀ̆ TFS Ғቫద๼ࠃ CDN ᎁߛ஝૶Ὂతጼ DataServer ࡃԻ̿नݽଢΙ឴и఩Ҭ̀Ǎ ழᄊ DataServer ʽѹथʷ੻ block ၹ̆ଢΙи୲ͻὊ൥௑ழੱࠔᄊښ͘ DataServer ఩ҬԁԻǍे NameServer ਖᅼ҂ழᄊ DataServer ҫКᬷᏆ௑Ὂ Ὂթүܒݞੱࠔᄊழ఻٨ὊᦊᎸ DataServer ᄊᤂᛡဗܬ̡րԶᭊ᜶ю ᒰТ᧘᜶ǍTFS ࠫᬷᏆᄊੱࠔஃે᭤࣢ԤݞὊेᬷᏆᭊ᜶ੱࠔ௑Ὂᤂ፥ ࠫ̆ߛϲጇፒᏫᝓὊᬔ̀δ᝽஝૶ᄊԻ᭥ߛϲܱὊஃેࠔ᧚ੱ࡙˷ ࣱ໏ੱࠔ Ք Nginx ̽ေԧ᤟ HTTP ឰරԁԻǍ 141 ᅲ䏉ټᅱ⍋䞣᭛ӊᄬ⎬ ᬷᏆὊស஡͈ᤇ෥ద̰˟ᬷᏆՏ൦᣿౏὇Ὂܬˀ҂஡͈὆សྭေᬷᏆԻᑟ௧ ࠇਗ਼ቫ឴ԩ஡͈௑Ὂ͘ᤥહሏᒭࣂతᤃᄊྭေᬷᏆᤉᛡ឴ԩὊݠ౧឴ԩ ࠫ̆ஃેܳ఻੝ࠔ༫ᄊᬷᏆὊTFS ࠇਗ਼ቫଢῚ failover ᄊஃેὊ ᬷᏆՏ൦ὊᏫ᧫ࠫϦ஝Ղ block ᄊи୲ͻႀ 2 ՂᬷᏆՔ 1 ՂᬷᏆՏ൦Ǎ ԶѬᦡϦ஝Ղᄊ block idὙ᧫ࠫ݉஝Ղ block ᄊи୲ͻႀ 1 ՂᬷᏆՔ 2 Ղ и஡͈௑ښи஡͈௑ԶѬᦡ݉஝Ղᄊ block idὊᏫ 2 Ղ˟ᬷᏆښՂ˟ᬷᏆ ᬷᏆોིૉࠀᄊ᜻ѷѬᦡ block id ၹ̆и୲ͻǍ̿ː˔˟ᬷᏆ˞ΓὊ 1 ေᬷᏆǍ˞̀ᥘВܳ˔˟ᬷᏆՏ௑иʷ˔ block ᤵੇᄊифቊὊඈ˔˟ ေᬷᏆ௧ࠫ኎ᄊὊՏ௑ܱࠫଢΙ឴и఩ҬὊࣳ࠲и୲ͻՏ൦҂Х̵ᄊྭ Ꮖదи୲ͻὊѷᭊ᜶᧔ၹܳ˔˟ᬷᏆᄊᦊᎸவरὊԁ᤿ᣤᬷᏆ᧗ඈ˔ྭ ఻੝ᄊ TFS ᬷښ఻੝ᄊࠔ༫Ὂݠ౧ܳ˔఻੝ᄊऄၹᦐࠫਫ਼ڡࠫ̆प NameServer sdb sdc dsp1 dsp2 dsp3 DataServer sda dsp2 dsp3 sdb sdc dsp1 DataServer sda NameServer sdb sdc dsp1 dsp2 dsp3 DataServer sda dsp2 dsp3 sdb sdc dsp1 DataServer sda ેʷᒱᄊ࿄গǍ ᬷᏆʽߛϲᄊ஡͈஝૶δܬ˟ᬷᏆʽὊδ᝽ܬெঃ᧗ᄊ஡͈୲ͻऄၹ҂ ὆иnjѻᬔ኎὇ Ὂࣳႀ DataServer ᄊՑԼጳሮ᧘ஊெঃὊ࠲ی஡͈୲ͻዝ ऄᄊ DataServer ᝮैՏ൦ெঃὊெঃӊե஡͈ᄊ block id ֗ file id ̿ԣ ᬷᏆԶଢΙ឴఩ҬὊ˟ᬷᏆʽߛϲᄊਫ਼ద஡͈ᦐ͘ႀࠫܬΙ឴и఩ҬὊ ਫ਼ᇨὊ˟ᬷᏆՏ௑ଢڏὊݠʾܬܳ˟ᄊ᤿ᣤᬷᏆᦊᎸவर˞ʷیЧ 142 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵 үᤉᛡੱࠔǍ ఻٨஌ᬪ௑Ὂᒭү࠲ХʾጳὙेԧဘᬷᏆᄊࠔ᧚Ύၹᡔ᣿᝝੉ጳ௑Ὂ˟ ᄣ᜽఩Ҭ٨ᄊ఩Ҭ࿄গnjᄣ଍ᬷᏆᄊࠔ᧚Ύၹৱц኎Ὂेԧဘదᇓᄨੋ ਫ਼దᄊ TFS ఻٨ʽᦊᎸᄣ଍ሮऀὊښ˞̀ࡊொԧဘ᫈ᮥὊᤂ፥̡ր͘ Ꮖʽ᝻᫈Ǎ ѣဘ᧘ܸ᫈ᮥ௑ὊԻ̿ξஈ MySQL ᄊᦡᎶὊ࠲ऄၹѭ૱҂൤࣢ᄊᬷ ፌ ClientǍඋݠे౽˔ᬷᏆڀrcserver ࠲᧫ࠫសऄၹᄊతழᦡᎶηৌࣜ keepaliveὊClient ࠲ऄၹ឴и஡͈ᄊፒᝠηৌලઑፌ rcserverὊ ڡভ ηৌὊಪ૶ᦡᎶηৌ౏᝻᫈ TFS ᄊ఩ҬὙClient ˁ rcserver ᫎ͘ևర TFS ࠇਗ਼ቫթү௑Ὂ͘ಪ૶ appkey ̰ rcserver ʽᖍԩऄၹᄊਫ਼దᦡᎶ Ѭᦡʷ˔ appkeyὊՏ௑ಪ૶ऄၹᄊᭊර˞ХѬᦡᬷᏆߛϲᠫູǍे ᬷᏆᄊ᝻᫈ిᬍ኎ǍेЯᦊऄၹᭊ᜶Ύၹ TFS ௑ὊTFS ͘ፌඈ˔ऄၹ ေᬷᏆὊඈ˔ྭေྭ̏דMySQL ஝૶ः᧗Ὂඋݠʷ˔᤿ᣤᬷᏆ᧗ద ᬷᏆʽጳ௑ႀᤂ፥ኮေ̡ր຋ҫ҂ښ ඈ˔ᬷᏆᄊᦊᎸηৌ͘ ఻٨ʽጳ௑Ὂᄰଌಪ૶വ౜౏ၷੇᦡᎶ஡͈ǍښᦡᎶവ౜Ὂ MySQL ஝૶ः᧗ᦐదʷݓ ښҞవǍ˞̀ᥘВᦡᎶ஡͈ᩲឨὊඈ˔ᬷᏆ ߛϲ 2 ˔ҞవὊᏫదᄊᬷᏆѷ᜶රఞᰴᄊԻ᭥ভὊඈ˔ block ߛϲ 3 ˔ ᦡᎶʽᤰ࣢௧ˀՏᄊὊඋݠదᄊᬷᏆ᜶ර blockښ࿘ቡᄊܳ˔ᬷᏆ ὆rcserver὇ᤉᛡፒʷኮေǍ MySQL ஝૶ः᧗Ὂᤰ᣿ᠫູኮေ఩Ҭ٨ ښTFS ࠲ਫ਼దᄊᠫູηৌߛϲ ๼ࠃЯᦊᦊᎸదܳ˔ᬷᏆnjʽӢԼ఩Ҭ٨Ὂద஝ᄈ˔ऄၹ᝻᫈Ὂښ TFS  ᤂ፥ኮေ ࠇਗ਼ቫ͘࠽ត̰Х̵ᄊྭေᬷᏆ឴ԩ஡͈Ǎ 143 ᅲ䏉ټᅱ⍋䞣᭛ӊᄬ⎬ ຺dฉ੠ zyd_cuēఆ୷ϐᅖdדޥ೬ωူӖޏْᄵ ಬߌܫᅟಀ௕ԅྡྷ໔ࢗၗົંē๿ڑฉ౨ْᄵ໭ิc॓ܚᅟڑܫ൐๿ ཙࡎēં஍ၽඇͯۢ໯ຂහϦϵူ TFS ԅࢗ֟ٝᆴdڑϣ಴Ӊ҅ອד Ӗ༰dཙࡎಓࠈխᅖྑҶಹޏᄯࢳܟ԰㗙ҟ㒡˖ ჆ဖՊēΘྜဟ ᄊߛϲੇవᬌͰ 
25%~50%Ǎ ኖ႕Ὂសᮊᄬᄊʽጳᮕᝠ͘Ύ TFS͋ܬ҂ጇፒ˗Ὂၹ̆̽ణ͜ፒᄊҞవ ࠲ Erasure code ੿షऄၹܬဋnjᬌͰߛϲ̿ԣᤂ፥ੇవǍᄬҒ TFS ൤ю δ᝽஝૶Ի᭥ভᄊ۳ᆩʽଢᰴ఩Ҭ஍ښTFS ˟᜶ԧ࡙வՔʷᄰ௧ ళ౏ࢺͻ 144 ໻᭄᥂ᯊҷⱘ IT ᶊᵘ䆒䅵
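The surviving identifiers in the TFS chapter (block id, file id, NameServer, DataServer, 64MB blocks) suggest that a file is addressed by a (block id, file id) pair: the NameServer assigns the block id when a block is created, the DataServer assigns the file id when a file is stored, and a per-block index records each file's offset and size. The following is a minimal sketch of that addressing idea only. The names `Block` and `ToyCluster` are hypothetical, everything runs in one process, and replication, heartbeats, and the real TFS file-name encoding are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy stand-in for one block: a byte buffer plus an index of packed files."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # file_id -> (offset, size)
    next_file_id: int = 1

    def append(self, payload: bytes) -> int:
        # The file id is assigned at write time and is unique within this block.
        file_id = self.next_file_id
        self.next_file_id += 1
        self.index[file_id] = (len(self.data), len(payload))
        self.data.extend(payload)
        return file_id

    def read(self, file_id: int) -> bytes:
        # Look up offset and size in the block index, then slice the buffer.
        offset, size = self.index[file_id]
        return bytes(self.data[offset:offset + size])


class ToyCluster:
    """Hypothetical single-process model: block ids are handed out centrally."""

    def __init__(self) -> None:
        self.blocks = {}  # block_id -> Block
        self.next_block_id = 1

    def new_block(self) -> int:
        block_id = self.next_block_id
        self.next_block_id += 1
        self.blocks[block_id] = Block()
        return block_id

    def write(self, block_id: int, payload: bytes) -> tuple:
        # Returns the (block id, file id) pair that identifies the stored file.
        return block_id, self.blocks[block_id].append(payload)

    def read(self, block_id: int, file_id: int) -> bytes:
        return self.blocks[block_id].read(file_id)


if __name__ == "__main__":
    cluster = ToyCluster()
    blk = cluster.new_block()
    first = cluster.write(blk, b"small file A")
    second = cluster.write(blk, b"small file B")
    print(first, second, cluster.read(*second))
```

Packing many small files into one large block keeps the amount of per-file metadata held centrally small, since only blocks, not individual files, need to be tracked there.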