Linux内核源代码情景分析


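Chapter 1  Preliminaries

1.1 Introduction to the Linux Kernel

The appearance of the Unix operating system was an important milestone in the history of computing. Early Unix was supplied free of charge, with source code, to universities and research institutions in the United States and some other Western countries. On the one hand this helped those institutions adopt computers. On the other hand, and more importantly, it made teaching and experimentation possible in operating systems, the core technology of computer software. The source code of the Unix kernel, Version 6 in particular, served for a long time as teaching material for senior undergraduates and graduate students in computer science departments. One could even say that a whole generation of American computer professionals grew up reading the Unix source code. This in turn promoted the spread and development of Unix and gave rise to a Unix industry. Looking back at the rise of Silicon Valley, one can also see that Unix played an important part: BSD, one of the two main Unix branches, was developed at the University of California, Berkeley.

Later Unix became a commercial product. Its source code came under copyright protection and grew ever larger and more complex, while Version 6 slowly became dated, so the Unix kernel source gradually stopped being used as teaching material (although it is still used in some places today). To meet the needs of teaching, the well-known Dutch professor Andrew S. Tanenbaum wrote Minix, a small "Unix-like" operating system that runs on the PC. Its source code was widely adopted in the late 1980s and early 1990s.

Although Minix is called "Unix-like", it is in fact quite far from Unix. First, Minix is a so-called "microkernel", a different design from the Unix kernel, and its functionality is not comparable. Moreover, Unix is not just a kernel: it also includes its shell and many utility programs. If the support provided by the kernel is incomplete, the kernel cannot be combined with these components to form a Unix environment. So Minix, while a perfectly good teaching tool, lacks practical value. Seeing this shortcoming, Linus Torvalds, then a student in Finland, conceived the idea of organizing some people to take Minix as a starting point and, basically following the design of Unix and drawing on the strengths of its various versions, develop a truly usable Unix kernel on the PC. The public would then have a free (modern) Unix system together with its source code, without any copyright problem. Professor Tanenbaum, however, had his eyes entirely on teaching, did not think this a good idea, and did not take up the suggestion. With the fearlessness of youth, plus talent, diligence and public spirit, Linus Torvalds set to work himself. Since what he implemented was basically Unix, he called it Linux.

The Internet was not yet as widespread as it is now, but it was already heavily used in universities and companies. After largely completing the first version of the Linux kernel, Linus Torvalds put it on the Internet, both to make his code public and to invite interested people to take part. This quickly drew an enthusiastic response, and it happened to coincide with the position of the Free Software Foundation (FSF) in the United States. The FSF was already planning to develop a Unix-like operating system and application environment (Unix-like but not Unix, hence the name GNU, short for "GNU is Not Unix"), and Linux appeared at exactly the right time and place. Thus the development, improvement and maintenance of the Linux kernel, led by Linus Torvalds, became one of the FSF's main projects. Other FSF projects, such as the GNU C compiler gcc, the debugger gdb, various shells and utilities, and even the Web server Apache and the browser Mozilla (in effect Netscape), fit together with it to form a complete set. The development of free software is widely regarded as a marvel of the software field. So many volunteers, held together only by a loose organization over the Internet, manage to cooperate in an orderly way and produce system software that is both of high quality and rather difficult to write. This is truly admirable.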
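What, then, is the difference between Linux and its predecessor Minix? Put simply, Minix is a "microkernel" and Linux is a "macrokernel". Minix is a Unix-like model for teaching, whereas Linux basically is Unix: a continuation and development of Unix, and even a synthesis of the various Unix versions and variants.

As is well known, the kernel of an operating system in the traditional sense has components of several kinds. Some manage the "processes" that belong to the application layer, for example process management. Others provide services to those processes, for example interprocess communication, device drivers and file systems. The service-providing components in the kernel and the processes that use the services in effect form a typical "Client/Server" relationship. These service providers do not all have to stay in the kernel. They can themselves be designed and implemented as "server processes", and the only component that really must remain in the kernel is interprocess communication. If the service providers are moved out of the kernel to the process level, the kernel itself can be made much smaller and simpler. The server processes, once detached from the kernel, can be designed, implemented and debugged separately and, more importantly, can be configured and started according to actual need. Various "microkernels" (Micro-Kernel) arose from this idea. The idea is especially attractive for special-purpose systems, mainly real-time systems and embedded systems (Embedded System). The main reason is that such systems usually have no disk, the whole system has to fit in EPROM and is often constrained by storage space, and the services needed are few and simple. Almost all embedded and real-time systems therefore use a microkernel, for example PSOS and VxWorks. Microkernels have a drawback too: putting the services at the process level and delivering them through interprocess communication (usually message passing) inevitably adds run-time overhead and lowers efficiency.

In contrast to the microkernel, the traditional kernel structure is called a "macrokernel" (Macro-Kernel) or "monolithic kernel" (Monolithic Kernel). A general-purpose system needs a wide range and a large volume of services, so a monolithic kernel suits it better. As a general-purpose system, Linux quite naturally uses a monolithic kernel.

The traditional Unix kernel is "fully closed". To add a device (that is, a service) to the kernel, the usual early practice was to write the driver for the device, change certain data structures (the device tables) in the kernel source, recompile the whole kernel, and reboot the whole system. This has advantages, such as better assurance of system security, but the drawback is obvious: it is too rigid. When a company develops a new peripheral (a color scanner, say), it cannot ship a floppy or CD with the device so that the user only has to run "setup" to install it, as with DOS/Windows. After all, not many users are able to modify the kernel's device tables and recompile the kernel.

Linux solves this problem rather well. Linux allows a device driver to be linked statically into the kernel at compile time, like a traditional driver. It also allows a driver to be installed dynamically at run time, as a "module", and it allows the system to install a module automatically when it is needed. Such a module still runs inside the kernel rather than as a separate process as in a microkernel, so its efficiency is preserved. Modules, that is, dynamically installed device drivers (see the chapter on device drivers), are a major improvement. They greatly simplify the design, implementation, debugging and distribution of Linux device drivers, to the point of changing them fundamentally.

Linux was first implemented on the Intel 80386 "platform", but it has been ported to all the major CPU families, including Alpha, M68K, MIPS, SPARC and PowerPC (Pentium, Pentium II and so on all belong to the i386 family). The Linux kernel can be said to be the most widely ported monolithic kernel today. Within a single CPU family the Linux kernel also supports different system architectures: both the conventional single-CPU architecture and multi-CPU architectures. This book concentrates on the i386 CPU and mainly on the single-CPU architecture, but the last chapter is devoted to multi-CPU architectures.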
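In an installed Linux system the kernel source code is in /usr/src/linux. If the Linux kernel was downloaded as a tar file from the GNU web site, it unpacks into a subdirectory called linux. From here on, whenever this book gives the path of a source file, the path starts at the linux node. The composition of the Linux source code is roughly as shown below.

[Figure: layout of the Linux source tree]

It is worth mentioning that although the Linux source code looks huge, a particular kernel does not use all of the .c and .h files. They are selected at compile (and link) time according to the system configuration. For example, the source contains code for many different CPUs, but each compiled kernel targets one specific CPU. Likewise, the net subdirectory contains drivers for many network cards, but usually only one kind of card is used, and the drivers for the various cards are in fact much alike.

Before ending this section, some conventions about Linux kernel versions need to be introduced.

Usually "Linux" means the kernel plus the management programs and applications that run on top of it. Strictly speaking, the kernel is only one part of the operating system, its core. But people often call the Linux kernel "Linux" as well, so the word sometimes means the whole operating system and sometimes just the kernel, and the two must be told apart from context. In this book, unless stated otherwise, Linux means the kernel.

Linux kernel releases follow their own rules, which can be read from the version number. The format of the version number is "x.yy.zz", where x is between 0 and 9 and yy and zz are between 0 and 99. A higher number normally means a newer version. Some version numbers are followed by pNN, where NN is a number between 0 and 20 giving the number of patches or revisions applied to that version. For example, 0.99p15 is the 15th revision of kernel version 0.99.

Because the Linux source is open, anyone can download the latest version from the network at any time, including versions still under development that are not yet stable and therefore not ready for release. A numbering scheme is needed so that a user can tell from a version number whether it is a "release version" or a "development version". In the version number x.yy.zz, a change in x marks a major change in the design or implementation of the kernel. yy indicates both the evolution of the version and its kind. If yy is even, the version is relatively stable and has been released. If yy is odd, the version is still under development, is not yet very stable, or may run into serious problems in use. Once a development version has passed testing and trial operation and proved stable, a release version with an even yy may be issued, after which the developers start the next development version. Sometimes several development versions go by before a release version appears. zz tracks changes in which little is added to the kernel and little is modified, so that it still counts as the same version. For example, going from 2.0.34 to 2.0.35 only means that some small defects in 2.0.34 were fixed or that the code changed slightly. The zz numbers of release versions and development versions are assigned independently, so there is no fixed correspondence between them. For example, when development version 2.3 had reached 2.3.99, the corresponding release version was only at 2.2.18.

Version 0.0.2 of the Linux kernel was first released publicly in 1991, and version 2.2 was released in January 1999. The kernel changes quite frequently, almost every month. This book initially used version 2.3.28, and by the time it went to press it was based on the officially released version 2.4.0.

The Linux kernel has essentially a single source: the kernel versions developed and maintained under Linus. Many companies, however, publish their own distributions of the Linux operating system, such as Red Hat and Caldera. The kernels used in different distributions may differ in version, but their origin is basically the same. Distributions generally differ in the installer, the installation interface, the number of software packages, and the way packages are installed and managed. In special cases the kernel code is also slightly modified (for localization into Chinese, for example). Each distribution is serviced by its own vendor. Vendors position their distributions differently, and the after-sales service and technical support they can offer also differ. So in principle there is only one Linux in the world, and "such-and-such Linux" is merely a distribution or a revised version of it. Also, the version of the Linux kernel should not be confused with a vendor's own version number (such as "Red Hat 6.0"). For example, the kernel in Caldera 2.2 is version 2.2.5.

For most users the distributions supplied by vendors play a very important role. It would be quite hard for users to configure and build the whole system themselves: they would have to download the kernel source, compile and install it, download all sorts of free software from different FTP sites and add it to the system, add various useful tools, and so on, all of which takes much time and effort. The Linux vendors saw this and did the work for the users, integrating a large body of application software on top of the kernel. They also supply tools for installing software, to make installation and management easier for users. Since there is no unified standard for putting a distribution together, each vendor's distribution has its own strengths and weaknesses.

Although the Linux kernel has only one ultimate source, there is no limit on the number of volunteers who can contribute to its improvement and development. In addition, for special applications, some developers and organizations modify and extend the kernel to produce variants for special environments or requirements. For the needs of embedded systems, Embedded Linux was developed. For systems with "hard real-time" requirements, RT-Linux was developed. For handheld computers, Baby Linux was developed, and so on. Chinese Linux versions are of course one such category. Whenever a new Linux kernel version is released, these variants usually follow quickly with corresponding new versions. Under the FSF's copyright rules for free software (the GPL), the source code of the modifications and additions these variants make to the kernel must be made public.

Many people think that since Linux is free, open software, there is no question of copyright. That is not so. Linux and the Linux kernel source are protected by copyright, but the copyright belongs to the public (to all mankind, one might say) and is administered by the Free Software Foundation. The FSF has set up a public license system for all GNU software, called the GPL (General Public License), also known as Copyleft, which is quite different from copyright in the usual sense. Copyright protects the author's exclusive rights over a work and its derivatives, whereas Copyleft allows users to copy and modify the work but requires them to accept certain obligations set out in the GPL. Under the GPL anyone may use GNU software free of charge and may rebuild executable code from its source. Further, anyone may obtain GNU software and its source free of charge and may redistribute or even sell it, provided certain terms of the GPL are met. In short, these terms require that GNU software, and software obtained by modifying GNU software, must declare on release (or transfer or sale) that it comes from GNU (or derives from GNU), and must ensure that the recipient can share the source code and rebuild the executable from it. In other words, if a piece of software was obtained by modifying or extending GNU source code, its source must also be open to its users (note that selling a product and opening its source are not necessarily in conflict). In this way the body of free software grows like a rolling snowball. However, software that merely uses a piece of GNU software through its user interface (API) is not bound by the GPL. In sum, the main goal of the GPL is to keep free software and its derivatives open and so to promote the sharing and reuse of software as a whole. For the Linux kernel specifically: if you modify parts of the kernel source, or use passages of the kernel in your own program, you must say so and publish your source. But if you develop a user program that uses the kernel only through the system call interface, you hold the full intellectual property rights and are not restricted by the GPL.

It should be said that the FSF's scheme is ingenious and reasonable, and its purpose is a noble one.

On the subject of nobility, a few more words are in order. Two rather influential books were published in the United States, Undocumented DOS and Undocumented Windows, both of which became essential references for DOS/Windows system programmers. In them the authors (Andrew Schulman, David Maxey, Matt Pietrek and others) list, one by one, many useful (and important) functions that they had painstakingly worked out and that the DOS/Windows API (application programming interface) actually provides but Microsoft's technical documentation does not mention. The authors argue that these omissions cannot be explained by oversight and can only be deliberate concealment. Microsoft is both the supplier of the operating system and a developer of applications. By hiding key technical points of the operating system interface from other application developers, it prevents them from competing fairly and so can use its monopoly of key technology to monopolize the DOS/Windows application market. The authors charge that this raises not only moral but also legal problems. Leaving the legal question aside, the functions listed in the books do exist, can be confirmed by experiment, and indeed do not appear in the technical documentation Microsoft supplies to its customers.

Set side by side, the FSF and Microsoft form a sharp contrast. The difference is so great that readers can easily draw their own conclusions.

The text of the GPL is contained in a file called COPYING. In a Linux system installed from CD the file's path is /usr/src/linux/COPYING. In a downloaded kernel tar file it is in the top-level directory after unpacking. Readers who are interested or have a need can (and should) read it carefully.

1.2 Addressing in the Intel X86 CPU Family

Intel is arguably the most senior maker of microprocessor chips. The first microprocessor in history, the 4004, was made by Intel. The X86 family means the whole line of CPU chips beginning with Intel's 16-bit microprocessor, the 8086. Every model in the family remains compatible with the earlier models. The main members are the 8086, 8088, 80186, 80286, 80386, 80486 and the later Pentium chips. Ever since IBM chose the 8088 for the PC, the development of the X86 family has been bound up with that of the IBM PC and its compatibles. The 80186 is little known precisely because IBM decided not to use it in the PC. For reasons of space this book does not give a full account of the architecture of the family and only briefly explains its addressing in connection with memory management in the Linux kernel.

In the X86 family the 8086 and 8088 are 16-bit processors, and from the 80386 on the processors are 32-bit. The 80286 is an intermediate step in the transition from the 8088 to the 80386, that is, from 16 bits to 32 bits. Although still a 16-bit processor, the 80286 began the transition in addressing from "real address mode" to "protected mode".

When we say that a CPU is "16-bit" or "32-bit", we are referring to the width of the arithmetic logic unit (ALU) in the processor. The data lines of the system bus, called the "data bus", usually have the same width as the ALU (with exceptions). What about the width of the "address bus"? The most natural width for the address bus is the same as that of the data bus, because from the programmer's point of view an address, that is, a pointer, is best kept the same length as an integer. For an 8-bit CPU, however, this is not realistic: an 8-bit address can reach only 256 different locations, which is obviously too few. So 8-bit CPUs generally have a 16-bit address bus. This made the internal structure of some 8-bit CPUs somewhat irregular, and their instruction sets often contain operations that are really 16-bit. When CPU technology moved from 8 bits to 16 bits, the address bus could have matched the data bus in width, but by then the address space given by a 16-bit address (64K) was already felt to be too small. How much larger should it be? Considering the applications then foreseeable for microcomputers and the price of memory chips, Intel settled on 1M, 16 times 64K, which seemed ample at the time. Indeed, a 1M-byte memory space was enough to excite many programmers then, when a fully configured minicomputer, or even a mainframe, had only 4M bytes of memory. In the history of computing, almost every technical decision has soon been shown by later events to have underestimated what would be needed.

Once Intel had decided on a 1M-byte memory address space for its 16-bit CPU, the 8086, the width of the address bus was fixed accordingly at 20 bits. This left Intel's designers with a problem: the address bus is 20 bits wide, but the ALU is only 16 bits wide, so a pointer that can be operated on directly is 16 bits long. How should the gap be bridged? There were of course many possible solutions. One was to add some 20-bit instructions dedicated to address arithmetic, as some 8-bit CPUs did, but that would again make the internal structure of the CPU irregular.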
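As another example, the PDP-11 minicomputer of the time was also a 16-bit machine, but with its MMU (memory management unit) it could map 16-bit addresses into a 24-bit address space. In the end Intel devised a method that looked rather clever at the time: segmentation.

Intel put four "segment registers" into the 8086 CPU: CS, DS, SS and ES, used respectively for executable code (instructions), data, the stack, and other purposes. Each segment register is 16 bits wide and corresponds to the high 16 bits of the address bus. The "internal address" in every memory-access instruction is 16 bits, but before it is placed on the address bus the CPU automatically adds it to the contents of a segment register, forming a 20-bit real address. This converts, or "maps", the 16-bit internal address to a 20-bit real address. Note that the contents of the segment register correspond to the high 16 bits of the 20-bit address bus, so the addition actually adds the high 12 bits of the internal address to the 16 bits of the segment register, while the low 4 bits of the internal address are left unchanged. The method resembles the "segmented memory management" of operating system theory but is not quite the same, mainly because it has no mechanism for protecting the address space. From each "base address" determined by the contents of a segment register, a process can always reach the 64K bytes of contiguous address space that start there, and nothing can restrict it. Nor are the instructions that change a segment register "privileged instructions", so by changing the segment registers a process can reach any location in memory at will, without the slightest restriction. If a process's memory accesses cannot be restricted, there can be no protection for other processes or for the system itself. Correspondingly, a CPU that lacks restrictions on memory access, that is, protection, cannot be said to have memory management, and is not a central processor in the modern sense. Because this 8086 addressing scheme lacks protection of the memory space, it is called "real address mode" to distinguish it from the later "protected mode".

Clearly an "operating system" in the modern sense cannot be built on real address mode.

To remedy this defect of the 8086, Intel implemented its "protected mode" (Protected Mode) starting with the 80286 (although the early 80286 could only switch from real address mode into protected mode and not back). Not long afterwards the 32-bit 80386 CPU was developed as well. The step from the 8088/8086 to the 80386 was thus a leap from a fairly primitive 16-bit CPU to a modern 32-bit CPU, with the 80286 as an intermediate step. Since the 80386, Intel CPUs have gone through the 80486, Pentium, Pentium II and other models. Speed has risen by several orders of magnitude and functionality has improved considerably, but these are improvements and enhancements within the same architecture, with no major qualitative change, so the CPUs are collectively called the i386 architecture, or i386 CPUs.

Below we introduce the protected mode of the i386 family with the 80386 as background.

The 80386 is a 32-bit CPU: its ALU and data bus are 32 bits wide. As said earlier, the most natural width for the address bus is the same as that of the data bus. With a 32-bit address bus the addressing capacity reaches 4G (4 giga), which seems enough for memory. So if one were designing a new 32-bit CPU from scratch, its structure could be very clean and natural. The 80386 could not be. As a member of a product family it had to keep the segment registers and support real address mode, while also supporting protected mode. Should protected mode be an entirely separate scheme, or should it be built on the segment registers to keep a consistent style and save internal CPU resources? This was another challenge for Intel's designers.

Intel chose to build protected mode on the segment registers and kept the segment registers 16 bits wide (so that the four existing ones could be used), while adding two more, FS and GS. To implement protected mode, using a segment register merely to fix a base address is not enough. At the least the length of the segment is needed too, plus other information such as access rights. What is needed is therefore a data structure, not just a base address. The designers' basic idea was this: in protected mode the function of the segment register changes from a plain base address (a base address in disguise) to a pointer to such a data structure.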
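The real-address-mode translation described above amounts to a shift and an add. The following is a minimal sketch in C, not code from the kernel. The function names are invented for illustration, and the wrap to 20 bits assumes a CPU with exactly 20 address lines, as on the 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Real address mode: the 16-bit segment value supplies the high 16 bits of a
 * 20-bit address, so it is shifted left by 4 bits before the 16-bit internal
 * address (offset) is added. The low 4 bits of the offset pass through
 * unchanged. The mask keeps only the 20 bits that reach the address bus. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Two different segment:offset pairs can name the same memory cell, which is
 * one reason real mode cannot protect memory. */
static void real_mode_alias_check(void)
{
    assert(real_mode_address(0x1234, 0x0010) == real_mode_address(0x1235, 0x0000));
}
```

Because any program may load any value into a segment register, every one of the 2^20 bytes is reachable from user code, which is exactly the lack of protection the text describes.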
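With this design, when a memory-access instruction issues a memory address, the CPU can work out the address that should actually be placed on the address bus as follows:

(1) Determine from the nature of the instruction which segment register to use. For example, the address in a jump instruction is in the code segment, and the address in a data-fetch instruction is in the data segment. This is the same as in real address mode.
(2) From the contents of the segment register, find the corresponding "segment descriptor structure".
(3) Get the base address from the segment descriptor structure.
(4) Treat the address issued by the instruction as a displacement and compare it with the segment length given in the descriptor structure to see whether it is out of bounds.
(5) From the nature of the instruction and the access rights in the segment descriptor, determine whether the access exceeds the program's privileges.
(6) Treat the address issued by the instruction as a displacement and add it to the base address to obtain the actual "physical address".

Although the segment descriptor structures are stored in memory, in actual use they are loaded into a set of "shadow" structures inside the CPU, and at run time the CPU uses these shadows. From the standpoint of protection, somewhere in the conversion from the internal address given by the instruction (the "logical address") to the physical address, the access rights must be checked, so that an unprivileged user program cannot, by some trick (such as modifying the contents of a segment register or of a segment descriptor structure), gain illegal access to the space of another process or of the system.

With this idea in mind, the segmented memory management of the 80386 is fairly easy to understand (though still complicated). Its actual implementation is as follows.

First, two registers were added to the 80386 CPU: the global descriptor table register GDTR and the local descriptor table register LDTR. Each can point to an array of segment descriptor structures stored in memory, called a segment descriptor table. Since these two registers are new, there is no question of compatibility with existing instructions, and the special instructions that access them were designed as "privileged instructions".

On this basis the high 13 bits of a segment register (the low 3 bits have another use) serve as the index of a particular descriptor structure in the segment descriptor table, as shown in Figure 1.1.

[Figure 1.1: Definition of the segment register]

The descriptor table pointer in GDTR or LDTR and the index in the segment register together determine where in memory the particular descriptor table entry lies. Equivalently, the contents of the segment register with the low 3 bits masked off are added to the base address in GDTR or LDTR to give the starting address of the entry. A user program therefore cannot play tricks by modifying the contents of a descriptor table entry, and this is what provides the protection. Each descriptor table entry is 8 bytes long and contains the base address and size of the segment plus some other information. Its structure is shown in Figure 1.2.

[Figure 1.2: Definition of the 8-byte segment descriptor table entry]

In the structure, B31-B24 and B23-B16 are bits 24-31 and bits 16-23 of the base address, and L19-L16 and L15-L0 are bits 16-19 and bits 0-15 of the segment length (Limit). DPL is a 2-bit field and type is a 4-bit field. The byte in which they sit is broken down in Figure 1.3.

[Figure 1.3: Definition of the TYPE byte of a segment descriptor table entry]

The whole segment descriptor structure can also be described by a piece of "pseudocode", with the fields listed from the most significant bit down:

    typedef struct {
        unsigned int base24_31:8;       /* highest 8 bits of the base address */
        unsigned int g:1;               /* granularity, unit of the segment length: 0 means bytes, 1 means 4KB */
        unsigned int d_b:1;             /* default operation size: 0 = 16 bits, 1 = 32 bits */
        unsigned int unused:1;          /* always set to 0 */
        unsigned int avl:1;             /* available, for use by system software */
        unsigned int seg_limit_16_19:4; /* highest 4 bits of the segment length */
        unsigned int p:1;               /* segment present, 0 means the segment contents are not in memory */
        unsigned int dpl:2;             /* descriptor privilege level needed to access this segment */
        unsigned int s:1;               /* descriptor type: 0 means system, 1 means code or data */
        unsigned int type:4;            /* segment type, used together with the s flag above */
        unsigned int base_0_23:24;      /* low 24 bits of the base address */
        unsigned int seg_limit_0_15:16; /* low 16 bits of the segment length */
    } segment_descriptor;

Taking the bit field type as an example, ":4" means that it is 4 bits wide. The whole data structure is 64 bits, or 8 bytes.

Readers will surely ask why the segment descriptor is defined in such an odd way. Why, for example, are the high 8 bits and the low 24 bits of the base address not adjacent? The most natural and reasonable explanation is that Intel at first intended a 24-bit address space and later changed it to 32 bits. The segment length, which is also split in two, points the same way: its low 16 bits are the length field of the 80286, which allows a segment of at most 64K bytes, while the upper 4 bits and the g flag were added afterwards. With the full 20-bit length and g set to 1 (units of 4KB), a segment can be as long as 1M×4K=4G, the whole 32-bit address space. So Intel first meant to use a 24-bit address space and soon realized that 32 bits were needed, but the 80286 had already been sold, so all it could do was patch things up. The Intel of those days really did give the impression of a woman with bound feet trying to walk.

Whenever the contents of a segment register change (through MOV, POP and similar instructions, or through events such as interrupts), the CPU loads the segment descriptor selected by the new contents into a "shadow" descriptor inside the CPU. The CPU thus has as many shadow descriptors as it has segment registers, and they can be regarded as an extension of the segment registers. An extended segment register has two parts. One part is visible (to programs) and is the same as the original segment register. The other part is invisible: it is the space that holds the shadow descriptor and is for the CPU's internal use only.

On top of the 80386's segmented memory management, if every segment register is made to point to the same descriptor, and in that descriptor the base address is set to 0 and the segment length to its maximum, the result is one whole segment that starts at 0 and covers the entire 32-bit address space. Since the base address is 0, the physical address equals the logical address, and the address the CPU puts on the address bus is the address given in the instruction. Such an address differs from the "hierarchical" address made up of segment register and displacement, so Intel calls it a "flat" address. The Linux kernel source (more precisely, gcc) uses flat addresses. It must be stressed that using flat addresses does not bypass the machinery of segment descriptor tables and segment registers. It is simply a special case of segmented memory management.

That is all for now about the segmented memory management of the 80386. More will be added as needed while the code is analyzed. Readers who want the full details can consult Intel's technical documentation.

The hardware support for segmented memory management in the 80386 can be used to implement segmented virtual memory. As described above, when the contents of a segment register change, the CPU uses the new contents together with GDTR or LDTR to find the corresponding segment descriptor and load it into the CPU. In doing so the CPU checks the p flag ("present") of the descriptor. If p is 0, the segment the descriptor refers to is not in memory (it is somewhere on disk), and the CPU raises an exception (similar to an interrupt). The corresponding service routine can then read the segment from the disk swap area into some place in memory, set the base address in the descriptor accordingly, and set p to 1. Conversely, a segment in memory that is not needed for the moment can be written to disk and the p flag in its descriptor changed to 0.

Support for segmented memory management is only one component of i386 protected mode. Without a separation of system state and user state, and without privileged instructions (usable only in system state), segmented memory management alone could not provide protection. Privileged instructions have already been mentioned: LGDT/LLDT and SGDT/SLDT, the instructions that load and store GDTR and LDTR, are all privileged. Because these instructions can be used only in system state (that is, in the operating system kernel), a user program cannot change GDTR or LDTR. Moreover, it can neither find out where its segment descriptor tables are in memory nor access the space they occupy (which is accessible only in system state), so it cannot defeat the protection mechanism by modifying segment descriptors.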
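The descriptor lookup and the split base and length fields can be sketched in C as follows. This is an illustration under stated assumptions, not kernel code: the type and function names are invented here, a descriptor is treated as one 64-bit value whose bit 0 is the lowest bit of the length, and the constant used in the example is a flat code-segment descriptor assembled by hand from the field layout above (base 0, 20-bit length all ones, g = 1).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of interest pulled out of an 8-byte segment descriptor. */
typedef struct {
    uint32_t base;     /* 32-bit base address, reassembled from two pieces */
    uint32_t limit;    /* 20-bit segment length field, from two pieces     */
    unsigned g;        /* granularity: 0 = bytes, 1 = 4KB units            */
    unsigned dpl;      /* descriptor privilege level, 0..3                 */
    unsigned present;  /* p flag                                           */
} seg_fields;

/* Address of a descriptor: table base plus the segment register contents
 * with the low 3 bits masked off (index * 8, since entries are 8 bytes). */
static uint32_t descriptor_address(uint32_t table_base, uint16_t seg_reg)
{
    return table_base + (uint32_t)(seg_reg & 0xFFF8u);
}

static seg_fields decode_descriptor(uint64_t d)
{
    seg_fields f;
    f.limit   = (uint32_t)(d & 0xFFFFu)                  /* length bits 0..15  */
              | ((uint32_t)((d >> 48) & 0xFu) << 16);    /* length bits 16..19 */
    f.base    = (uint32_t)((d >> 16) & 0xFFFFFFu)        /* base bits 0..23    */
              | ((uint32_t)((d >> 56) & 0xFFu) << 24);   /* base bits 24..31   */
    f.dpl     = (unsigned)((d >> 45) & 0x3u);
    f.present = (unsigned)((d >> 47) & 0x1u);
    f.g       = (unsigned)((d >> 55) & 0x1u);
    return f;
}

/* Largest valid displacement inside the segment. */
static uint32_t last_offset(seg_fields f)
{
    return f.g ? ((f.limit << 12) | 0xFFFu) : f.limit;
}

/* A hand-built flat segment: base 0, covering all 4G of the address space. */
static void flat_segment_check(void)
{
    seg_fields f = decode_descriptor(0x00CF9A000000FFFFull);
    assert(f.base == 0 && last_offset(f) == 0xFFFFFFFFu);
}
```

With a flat descriptor like this one, adding the base to the displacement changes nothing, which is why the logical and physical addresses coincide in the flat model.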
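How, then, does the 80386 separate system state from user state, and what mechanism does it provide for switching between the two?

The 80386 does not simply distinguish a system state and a user state as most CPUs do. It has four privilege levels, of which level 0 is the highest and level 3 the lowest. Every instruction has the levels at which it may be used. The LGDT instruction mentioned above, for example, can be used only at level 0, and the ordinary input/output instructions (IN, OUT) are restricted to level 0 or level 1. User applications normally run at level 3. The current privilege level of a program is determined by the dpl field ("descriptor privilege level") in the local descriptor of its code segment (that is, the local segment descriptor pointed to by the segment register CS). The dpl field of every descriptor is of course set by the kernel while running at level 0. The dpl field of a global segment descriptor is somewhat different: it states the level required.

It was said earlier that the high 13 bits of the 16-bit segment register are used as an index into the segment descriptor table. What are the low 3 bits for? Again a piece of pseudocode explains it:

    typedef struct {
        unsigned short seg_idx:13; /* 13-bit index of the segment descriptor */
        unsigned short ti:1;       /* table indicator: 0 means GDT, 1 means LDT */
        unsigned short rpl:2;      /* Requested Privilege Level */
    } segment_register;

When the ti bit in the segment register CS is 0, the global descriptor table is used, and when it is 1, the local descriptor table is used. rpl gives the requested privilege. When the contents of a segment register are changed, the CPU checks that neither the current privilege of the program nor the privilege requested in the segment register is lower than the privilege dpl of the memory segment to be accessed.

How the CPU switches between privilege levels will be discussed in the chapters on process scheduling, system calls and interrupt handling. Besides the two registers GDTR and LDTR, the i386 CPU also has an interrupt descriptor table register IDTR, a register TR related to processes (called "tasks" in Intel's terminology), the "task state segment" TSS that describes the state of a task, and so on. These will be introduced in other chapters when needed. In implementing i386 protected mode, Intel divided the CPU's execution state into four levels in order to meet the needs of more complex operating systems and run-time environments. Some operating systems, such as OS/2, do use them. But many people doubt that such complexity is really necessary. In fact almost no widely used CPU is this complicated, and the Unix versions implemented on the 80386, Linux included, use only two levels, 0 and 3, as system state and user state. In later discussion this book follows Unix tradition and speaks of system state and user state.

1.3 The Paged Memory Management Mechanism of the i386

Readers who have studied operating system principles know that there are two kinds of memory management, segmented and paged, and that paged management is the more advanced. From the mid-1980s paged memory management found its way into the kernels of many operating systems (mainly Unix) and for a time was a hot topic in the operating system field.

Intel implemented its "protected mode", that is, segmented memory management, starting with the 80286. It soon found that segmentation without paging was not enough and would make the X86 family gradually lose its competitiveness and its position as a mainstream CPU product. The 80386, which followed shortly, therefore supported paged memory management. In other words, the 80386 completed and refined the segmented memory management begun in the 80286 and at the same time implemented paged memory management.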
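As explained earlier, the segmented memory management of the 80386 maps (converts) the 32-bit logical address that an instruction uses together with a segment register into a physical address, which is also 32 bits. It is called a "physical address" because it is what is actually put on the address bus to reach a memory cell that physically exists. But segmented memory management is poor in both flexibility and efficiency. For one thing, a "segment" has variable length, which is inconvenient for swapping to disk. For another, if the space of a process is divided into many small segments to gain flexibility, the program has to change the segment registers frequently. And if segments are made small, even the 8192 descriptors a descriptor table can hold (because the index has 13 bits) may not be enough.

The better approach is therefore paged memory management. In principle paging does not have to be built on top of segmentation: they are two different mechanisms. In the 80386, however, the implementation of protected mode is inseparable from segmentation. For example, the current privilege level of the CPU is specified in the relevant code segment descriptor. Readers who have read early versions of Unix may wish to compare this with the PDP-11, where the current privilege level is kept in an independent register, the PSW, unrelated to any other data structure. Given that the 80386 was to reuse some existing resources rather than start from scratch, paging could not be implemented by going around segmentation. In other words, the architecture of the 80386 dictates that paging must be built on top of segmentation. This also means that paging adds a further layer of address mapping on top of the address produced by segmentation. Since the address produced by segmentation is then no longer a "physical address", Intel calls it a "linear address". So segmentation first maps the logical address to a linear address, and paging then maps the linear address to a physical address. When paging is not in use, the linear address is used directly as the physical address.

The 80386 divides the linear address space into pages of 4K bytes. Each page can be mapped to any 4K-byte region of physical memory (aligned on a 4K-byte boundary). Under segmentation, contiguous logical addresses remain contiguous in the linear address space after mapping. Under paging, contiguous linear addresses need not be contiguous in physical space after mapping (and that is where the flexibility lies). It should be pointed out that although paging is built on top of segmentation, once paging is enabled every linear address goes through the page mapping, including the starting addresses of the segment descriptor tables given in GDTR and LDTR.

With the introduction of paging, the 32-bit linear address (which used to be the physical address) gets a new interpretation:

    typedef struct {
        unsigned int dir:10;    /* index into the page directory; the directory entry points to a page table */
        unsigned int page:10;   /* index into that page table; the table entry points to a physical page */
        unsigned int offset:12; /* offset within the 4K-byte physical page */
    } linear_address;

The structure is pictured in Figure 1.4.

    bits 31-22              bits 21-12           bits 11-0
    page directory (dir)    page table (page)    offset within page (offset)

    Figure 1.4  Format of the linear address

As can be seen, the page directory has 2^10 = 1024 directory entries, each pointing to a page table, and each page table has 1024 page descriptors. By analogy with GDTR and LDTR, a new register, CR3, was added as the pointer to the current page directory. The mapping from linear address to physical address then proceeds as follows:

(1) Get the base address of the page directory from CR3.
(2) Use the dir field of the linear address as an index into the directory to get the base address of the corresponding page table.
(3) Use the page field of the linear address as an index into that page table to get the corresponding page descriptor.
(4) Add the page base address given in the page descriptor to the offset field of the linear address to get the physical address.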
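The four steps can be traced with a small simulation in C. This is only a sketch under explicit assumptions: there is no real CR3 or physical memory here, the names page_table, translate and paging_example are invented for illustration, and the upper 20 bits of a directory entry are treated as an index into a toy array of page tables rather than as a physical frame number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u   /* lowest bit of an entry: present */

typedef struct { uint32_t e[1024]; } page_table;   /* 1024 entries * 4 bytes = 4K */

/* Walk the two levels. 'dir' plays the role of the directory CR3 points to;
 * 'tables' stands in for physical memory holding the page tables.
 * Returns 0 and stores the physical address, or -1 if an entry is absent. */
static int translate(const page_table *dir, const page_table *tables,
                     size_t ntables, uint32_t linear, uint32_t *phys)
{
    uint32_t pde = dir->e[linear >> 22];                  /* step (2): dir field  */
    if (!(pde & ENTRY_PRESENT) || (pde >> 12) >= ntables)
        return -1;
    uint32_t pte = tables[pde >> 12].e[(linear >> 12) & 0x3FFu]; /* step (3)      */
    if (!(pte & ENTRY_PRESENT))
        return -1;
    *phys = (pte & 0xFFFFF000u) | (linear & 0xFFFu);      /* step (4): add offset */
    return 0;
}

/* Map linear 0x00402345 (dir 1, page 2, offset 0x345) to frame 0x00ABC000. */
static uint32_t paging_example(void)
{
    static page_table dir, tables[2];
    uint32_t phys = 0;
    dir.e[1]       = (1u << 12) | ENTRY_PRESENT;   /* directory entry -> tables[1] */
    tables[1].e[2] = 0x00ABC000u | ENTRY_PRESENT;  /* page entry -> physical page  */
    int rc = translate(&dir, tables, 2, 0x00402345u, &phys);
    assert(rc == 0);
    return phys;
}

/* A linear address whose directory entry is empty does not translate. */
static int paging_example_unmapped(void)
{
    static page_table dir, tables[1];
    uint32_t phys = 0;
    return translate(&dir, tables, 1, 0x00000000u, &phys);
}
```

Only the directory and the one page table actually referenced need to exist, which is the space saving that the two-level arrangement buys.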
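The mapping process is pictured in Figure 1.5.

[Figure 1.5: Page mapping]

Why use two levels, first finding the directory entry and then the page descriptor, instead of going there in one step as with segment registers? The reason is space efficiency. If the dir and page fields of the linear address were merged they would make 20 bits, and the page table would then have 1K×1K=1M entries. With each page 4K bytes in size, the total space would still be 4K×1M=4G, exactly the size of the 32-bit address space. But it is hard to imagine a process needing all 4G, so most entries would be empty. In an array, however, even unused entries take up space, which would be wasteful. With two levels, page tables can be set up as needed: if an entry in the directory is empty, no page table needs to be created for it, and storage is saved. In the worst case, where a process really does use the whole 4G, nothing is saved and the space for the directory is spent in addition, but the probability of that is practically zero. Furthermore, a page is 4K bytes and each page table entry or directory entry is 4 bytes, so 1024 entries come to exactly 4K bytes and fit in one page. With more than 1024 entries a directory or page table would have to span pages. For the same reason the page size on the 64-bit Alpha CPU is 8K bytes, because directory entries and page table entries there are 8 bytes each.

As said above, a directory entry contains a pointer to a page table, and a page table entry contains a pointer to the start of a page. Since page tables and pages always start on 4K-byte boundaries, the low 12 bits of these pointers are always 0. So 20 bits suffice for the pointer in a directory entry or page table entry, and the remaining 12 bits can be used for control or other purposes. The structure of a directory entry is therefore:

    typedef struct {
        unsigned int ptba:20;    /* high 20 bits of the page table base address */
        unsigned int avail:3;    /* for use by system programmers */
        unsigned int g:1;        /* global page */
        unsigned int ps:1;       /* page size, 0 means 4K bytes */
        unsigned int reserved:1; /* reserved, always 0 */
        unsigned int a:1;        /* accessed */
        unsigned int pcd:1;      /* cache disabled (cache not used) */
        unsigned int pwt:1;      /* write through, for the cache */
        unsigned int u_s:1;      /* 0 means system (supervisor) privilege, 1 means user privilege */
        unsigned int r_w:1;      /* read-only or writable */
        unsigned int p:1;        /* 0 means the corresponding page is not in memory */
    } directory_entry;

The directory entry is pictured in Figure 1.6.

[Figure 1.6: Page directory entry]

A page table entry has basically the same structure but no "page size" bit ps, so that position (bit 7, counting from bit 0) is reserved. Bit 6, which is reserved in the directory entry, is the D (Dirty) flag, indicating that the page has been written to and is therefore "dirty". When the lowest bit p of a page table entry or directory entry is 0, the corresponding page or page table is not in memory, and, depending on the settings of certain other registers, the CPU can raise a "page fault" (Page Fault) exception (also called a page fault interrupt, although exceptions and interrupts are in fact different). The exception service routine in the kernel can then read the page in from the page swap area on disk, set the base address in the entry accordingly, and set the p bit to 1. Conversely, a page in memory that is not in use for the moment can be written to the swap area on disk and the p bit of its page table entry set to 0. Paged virtual memory can be implemented in this way. When the p bit is 0 the other bits of the entry have no meaning to the CPU, so they can be used to hold other information temporarily, such as the location on disk of a page that has been swapped out.

When the ps (page size) bit of a directory entry is 0, all pages in the page table that the entry points to are 4K bytes in size, which is also the page size currently used in the Linux kernel. Starting with the Pentium processor, however, Intel introduced the PSE page size extension. When the ps bit is 1, the page size becomes 4M bytes and the page table is no longer used. The low 22 bits of the linear address then serve entirely as the displacement within the 4M-byte page. The total addressing capacity is unchanged, 1024×4M=4G, but the mapping has one level fewer. With memory and disk capacities growing steadily, disk access speed rising markedly, and demands for image processing increasing, a 4M-byte page size may well become mainstream. On this point Intel did show some foresight.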
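The control bits and the 4M-byte case can be checked with a few masks in C. This is a sketch for illustration only: the macro and function names are invented here, and the bit positions are taken directly from the directory entry layout given above (p at bit 0 up to g at bit 8).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the control flags, counted from bit 0. */
#define PG_P    (1u << 0)   /* present                                 */
#define PG_RW   (1u << 1)   /* writable                                */
#define PG_US   (1u << 2)   /* user privilege                          */
#define PG_PWT  (1u << 3)   /* write through                           */
#define PG_PCD  (1u << 4)   /* cache disabled                          */
#define PG_A    (1u << 5)   /* accessed                                */
#define PG_D    (1u << 6)   /* dirty (page table entries only)         */
#define PG_PS   (1u << 7)   /* page size (directory entries only)      */
#define PG_G    (1u << 8)   /* global                                  */

/* With ps = 1 the directory entry maps a 4M-byte page directly: the top
 * 10 bits come from the entry and the low 22 bits from the linear address.
 * Returns 0 on success, -1 if the entry is absent or not a 4M-byte entry. */
static int translate_4m(uint32_t pde, uint32_t linear, uint32_t *phys)
{
    if (!(pde & PG_P) || !(pde & PG_PS))
        return -1;
    *phys = (pde & 0xFFC00000u) | (linear & 0x003FFFFFu);
    return 0;
}

/* A page needs writing back to the swap area only if it is present and dirty. */
static int is_dirty(uint32_t pte)
{
    return (pte & PG_P) && (pte & PG_D);
}

static uint32_t large_page_example(void)
{
    uint32_t phys = 0;
    int rc = translate_4m(0x00C00000u | PG_PS | PG_RW | PG_P, 0x12345678u, &phys);
    assert(rc == 0);
    return phys;
}
```

The single lookup in translate_4m, compared with the two lookups of the 4K-byte case, is the "one level fewer" mentioned in the text.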
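Finally, the i386 CPU has a register CR0 whose highest bit, PG, is the master switch for the page mapping mechanism. When PG is set to 1 the CPU turns on the mapping of paged memory management.

Starting with the Pentium Pro, Intel made a further extension, this time to the width of the physical address. A bit PAE (Physical Address Extension) was added to another control register, CR4. When PAE is set to 1 the address bus becomes 36 bits wide (4 more bits), and the page mapping mechanism changes accordingly. Most users do not yet need a 36-bit (64G) physical address space, so this is not covered here. Interested readers can consult Intel's technical documentation or specialized books. In addition, Intel has introduced the 64-bit IA-64 architecture, which the Linux kernel already supports. In fact Linux had already supported 64-bit addresses on the Alpha CPU. Besides memory management, the 80386 also has strong caching and pipelining features, but to software, including the operating system kernel, these are largely transparent, so this book explains them briefly only where necessary.

1.4 C Code in the Linux Kernel Source

The bulk of the Linux kernel is written in GNU C, for which GNU provides the compiler gcc. GNU has made quite a few extensions to the C language (on the basis of ANSI C) that readers may not have met before. Moreover, kernel code often uses language features and programming techniques that are rare in application programming and may seem strange. This book is neither a treatise nor a manual on GNU C, so these extensions and techniques are not listed and discussed one by one here. Besides, a long list of rules detached from any concrete setting and context would not help readers much. So only those extensions and techniques that may hinder readers in reading the kernel source, or puzzle them, are introduced briefly here. More will be added later as needed, as concrete scenarios and code are presented.

First, gcc has taken "inline" and "const" from C++. GNU C and C++ are in fact one: gcc is both a C compiler and a C++ compiler, so it is natural for C to absorb things from C++. In function an inline function resembles a #define macro, but it is more self-contained and safer. Inline functions also help debugging. If the code is compiled without optimization, inline functions are ordinary, separate functions, which are easier to debug. Once debugged, the code is recompiled with optimization, and the inline functions are merged into the calling code like macros, which improves run-time efficiency. Because inline functions are used so heavily, a good deal of code has moved from .c files to .h files.

Also, to support 64-bit CPU architectures (the Alpha is 64-bit), gcc added a new basic data type, "long long int", which is used frequently in kernel code.

Many C compilers support "attributes" such as "aligned" and "packed", and gcc supports quite a few of them. Using these attributes amounts to adding new reserved words to C. In the original C language (ANSI C, for example) these words are not reserved, so conflicts can arise. For example, gcc supports the reserved word inline, but since "inline" was not originally reserved (it is in C++), old code may already contain a variable named inline, and this produces a conflict. To solve the problem gcc allows "__" to be placed before and after "inline" when it is used as a reserved word, so "__inline__" is equivalent to the reserved word "inline". In the same way "__asm__" is equivalent to "asm". This is why the code sometimes shows "asm" and sometimes "__asm__".

gcc also supports a reserved word "__attribute__" for describing attributes. For example:

    struct foo {
        char a;
        int x[2] __attribute__ ((packed));
    };

Here the attribute "packed" says that no hole should be left between the character a and the integer array x for the sake of alignment on a 32-bit boundary. Written this way, "packed" cannot conflict with a variable name.

Because the Linux kernel uses gcc's extensions to C, it can naturally be compiled only with gcc. Moreover, since gcc and the Linux kernel develop in parallel, once the kernel uses an extension added in a newer version of gcc, it can no longer be compiled with older versions. In other words, each version of the Linux kernel depends on particular versions of gcc. Readers will naturally ask: "Doesn't this harm the portability of the Linux kernel?" The answer is: "Yes, but this was a decision made after weighing the gains and losses." First, between portability and quality, GNU chose to put quality first. Besides, porting (in fact extending) gcc to a new CPU should not be hard. Consider the history of Unix. The earliest Unix was written in assembly language and B, and it was the needs of Unix that gave rise to C, so C can be called the twin of Unix. As Unix develops, C has to develop too. For Unix, C is only a tool, and a tool must serve the purpose at hand. Second, the portability problem looks serious but really is not. As mentioned, the current Linux kernel source already supports almost all important and commonly used CPUs, and gcc supports even more. gcc also supports cross-compilation for the various CPUs.

As said above, the Linux kernel code uses a large number of inline functions. This has not done away with macros, and many macro definitions remain in the kernel. People are often puzzled by the way some macros in the kernel are defined, so some explanation is needed here. Consider first an example from fs/proc/kcore.c.

    ==================== fs/proc/kcore.c 163 163 ====================
    #define DUMP_WRITE(addr,nr) do { memcpy(bufp,addr,nr); bufp += nr; } while(0)

As readers know, a do-while loop executes the body first and tests the condition afterwards. So this definition means that each use of the macro executes the loop body once and only once. But why define it through a do-while loop? This seems odd. Let us look at some alternatives. First, could it be defined as follows?

    #define DUMP_WRITE(addr,nr) memcpy(bufp,addr,nr); bufp += nr;

No. If a program uses this macro inside an if statement, there is trouble. Take a hypothetical example:

    if (addr)
        DUMP_WRITE(addr,nr);
    else
        do_something_else();

After preprocessing this becomes:

    if (addr)
        memcpy(bufp,addr,nr); bufp += nr;
    else
        do_something_else();

gcc fails on this code and reports a syntax error, because it considers the if statement finished after memcpy() and then runs into an else. If DUMP_WRITE() and do_something_else() swap places, the code compiles, but the problem is worse, because bufp += nr is executed whether or not the condition holds.

Readers will immediately think of adding braces to the definition:

    #define DUMP_WRITE(addr,nr) {memcpy(bufp,addr,nr); bufp += nr;}

But the program above still does not compile, because after preprocessing it becomes:

    if (addr)
        {memcpy(bufp,addr,nr); bufp += nr;};
    else
        do_something_else();

Again, when gcc meets the ";" before the else it considers the if statement finished, so the else that follows is not part of it. By contrast, the do-while definition causes no problem in any situation.

With this understood, consider the definition of an "empty operation". Because the Linux kernel code has to cater for different CPUs and different system configurations, some macros often need to be defined as empty operations under certain conditions. An example is prepare_to_switch() in include/asm-i386/system.h:

    ==================== include/asm-i386/system.h 14 14 ====================
    #define prepare_to_switch() do { } while(0)

When the kernel schedules a process to run and is about to switch to it, some CPUs need a call to prepare_to_switch() to make preparations and others do not, so on the latter it is defined as an empty operation.

Readers will have studied queue operations (meaning doubly linked queues) when learning data structures. The kernel makes heavy use of queues and queue operations, and this topic does not belong to any one area (such as process management, file systems or memory management), so it is introduced here.

If we have a data structure foo and need to maintain a doubly linked queue of such structures, the simplest and most common way is to add two pointers to the type definition of the structure, for example:

    typedef struct foo
    {
        struct foo *prev;
        struct foo *next;
        ......
    } foo;

A set of routines for the various queue operations is then written for this structure. Since the type of the two pointers that maintain the queue is fixed (both point to foo structures), these routines cannot be used for queues of other structures. In other words, there must be as many sets of queue routines as there are kinds of structures to be queued. For an application with few queues this may not matter much, but for a kernel that uses queues heavily it is a problem. The Linux kernel therefore uses one general set of queue operations that applies to all kinds of data structures. To this end the authors abstracted the pointers prev and next out of the concrete "host" data structure into a data structure of their own, list_head. This structure can "lodge" inside a concrete host data structure as a "connector" of that structure, or it can stand alone as the head of a queue. It is defined in include/linux/list.h. (Strictly, this is a declaration of a data structure type. For ease of writing, this book takes a less pedantic, less strict attitude and often does not distinguish rigorously between "definition" and "declaration", or among "data structure type", "data structure" and "structure". Readers are of course not encouraged to do the same.)

    ==================== include/linux/list.h 16 18 ====================
    struct list_head {
        struct list_head *next, *prev;
    };

(In the printed book the structure name is set in bold purely for emphasis, with no special meaning.) If a queue of some kind of data structure is needed, a list_head structure is placed inside that structure. Take the page data structure used for memory page management, defined in include/linux/mm.h:

    ==================== include/linux/mm.h 134 148 ====================
    typedef struct page {
        struct list_head list;
        ......
        struct page *next_hash;
        ......
        struct list_head lru;
        ......
    } mem_map_t;

As can be seen, two list_head structures lodge in the page structure. In other words there are two connectors for queue operations, so a page structure can be on two doubly linked queues at once. The structure also has a single-link pointer next_hash for maintaining a singly linked hash queue, but that is not of concern here.

Every list_head inside a host data structure must be initialized, which can be done with the macro INIT_LIST_HEAD:

    ==================== include/linux/list.h 25 27 ====================
    #define INIT_LIST_HEAD(ptr) do { \
        (ptr)->next = (ptr); (ptr)->prev = (ptr); \
    } while (0)

The parameter ptr points to the list_head structure to be initialized. After initialization both pointers point to the list_head structure itself.

To link a page structure into a queue through its "queue head" list, list_add() can be used. It is an inline function whose code is in include/linux/list.h:

    ==================== include/linux/list.h 53 56 ====================
    static __inline__ void list_add(struct list_head *new, struct list_head *head)
    {
        __list_add(new, head, head->next);
    }

The parameter new points to the list_head inside the host data structure that is to be linked into the queue. The parameter head points to the insertion point, also a list_head structure, which may be a standalone, true queue head or may be inside another host data structure (even one of a different type). This inline function calls another inline function, __list_add(), to do the work:

    ==================== include/linux/list.h 29 43 ====================
    [list_add()>__list_add()]
    /*
     * Insert a new entry between two known consecutive entries.
     *
     * This is only for internal list manipulation where we know
     * the prev/next entries already!
     */
    static __inline__ void __list_add(struct list_head * new,
        struct list_head * prev,
        struct list_head * next)
    {
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
    }

For functions reached through a chain of calls, this book usually lists the call path in square brackets with greater-than signs in front of the function's code, to help readers keep track of how the function was reached. The path usually starts at a fairly important or commonly used function, here list_add(). Readers should note that there are often many different call paths to the same function, and the one listed is only the path in the scenario or discussion at hand. For example, some functions may skip list_add() and call __list_add() directly, forming a different path. The code of __list_add() itself is left for the reader to study.

Now consider list_del(), the operation that unlinks an item from a queue:

    ==================== include/linux/list.h 90 93 ====================
    static __inline__ void list_del(struct list_head *entry)
    {
        __list_del(entry->prev, entry->next);
    }

This too calls another inline function, __list_del(), to do the work:

    ==================== include/linux/list.h 78 83 ====================
    [list_del()>__list_del()]
    static __inline__ void __list_del(struct list_head * prev,
        struct list_head * next)
    {
        next->prev = prev;
        prev->next = next;
    }

Note that __list_del() operates on the two list_head structures before and after entry in the queue. If entry is the last item remaining in the queue, the two are the same: the queue head, which is also a list_head structure but is not inside any host structure.

Readers may already be impatient to ask: queue operations all go through list_head, but that is only a connector. If we have a host structure in hand, we of course know where a given list_head inside it is and can pass it to list_add() or list_del(). But conversely, when we follow a queue and get the list_head of one of its items, how do we find the host structure? The list_head structure has no pointer to its host. After all, it is the host structure we care about, not the connector.

Indeed this is a problem. Let us see how it is solved through a real example, a line of code from mm/page_alloc.c:

    ==================== mm/page_alloc.c 188 188 ====================
    [rmqueue()]
    188     page = memlist_entry(curr, struct page, list);

Here memlist_entry() converts a list_head pointer curr into the starting address of its host structure, that is, it obtains a pointer to the host page structure. Readers may be puzzled by the implementation and invocation of memlist_entry(), because its argument struct page is a type, not a piece of data. A look at the whole of the function rmqueue() also shows that list is not even defined there.

In fact memlist_entry is defined as list_entry in the same file, so what is actually used is list_entry():

    ==================== mm/page_alloc.c 48 48 ====================
    #define memlist_entry list_entry

and list_entry is defined in include/linux/list.h:

    ==================== include/linux/list.h 135 142 ====================
    /**
     * list_entry - get the struct for this entry
     * @ptr: the &struct list_head pointer.
     * @type: the type of the struct this is embedded in.
     * @member: the name of the list_struct within the struct.
     */
    #define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

Comparing line 188 above with this definition reveals the secret. After textual substitution by the C preprocessor, the line becomes:

    page=((struct page*)((char*)(curr)-(unsigned long)(&((struct page*)0)->list)));

Here curr is the address of the member list inside a page structure, whereas what is wanted is the address of the page structure itself. So a displacement, namely the displacement of the member list within page, has to be subtracted from the address curr. What is this displacement? &((struct page*)0)->list is the address of the member list when the page structure sits exactly at address 0, and that is the displacement.

By the same reasoning, if the item is on the lru queue of the page structure, the member passed down is lru, and the address of the host structure is computed in just the same way.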
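The connector idea and the address arithmetic of list_entry can be tried out in ordinary user-space C. The following is a self-contained sketch, not the kernel's code: it reimplements the list operations in simplified form, uses the standard offsetof macro in place of the null-pointer expression, and the host structure struct task and the function list_example are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Insert 'item' right after 'head' (same effect as the kernel's list_add). */
static void list_add(struct list_head *item, struct list_head *head)
{
    struct list_head *next = head->next;
    next->prev = item;
    item->next = next;
    item->prev = head;
    head->next = item;
}

/* Unlink 'entry' by joining its two neighbours. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* From a pointer to the embedded connector, step back by the connector's
 * displacement inside the host structure to reach the host itself. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical host structure with two connectors, like struct page. */
struct task {
    int id;
    struct list_head run;   /* connector for a run queue       */
    struct list_head all;   /* connector for a list of all tasks */
};

static int list_example(void)
{
    struct list_head runq;
    struct task a = { .id = 1 }, b = { .id = 2 };

    init_list_head(&runq);
    list_add(&a.run, &runq);
    list_add(&b.run, &runq);                 /* queue is now: b, a */

    struct task *first = list_entry(runq.next, struct task, run);
    assert(first == &b);                     /* host recovered from connector */

    list_del(&b.run);                        /* queue is now: a */
    first = list_entry(runq.next, struct task, run);
    return first->id;
}
```

The same three functions would serve a queue threaded through the all connector, or a queue of any other host type, which is the generality the text describes.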
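As can be seen, this set of operations is generally applicable and yet remains quite efficient. For someone reading the code, though, it has one drawback: it is hard to tell from the code alone what the host structure of a list_head is, whereas previously a glance at the type of the pointer next was enough.

1.5 Assembly Language Code in the Linux Kernel Source

In any operating system written in a high-level language, a small part of the kernel source is written in assembly language. Readers who have read the Unix Sys V source know that of its roughly 30,000 lines of core code, about 2,000 lines are in assembly language, in fewer than 20 files with the extensions .s and .m. Most of these are low-level routines for interrupt and exception handling. The rest are routines related to initialization and some common subroutines called from the core code.

Part of the core code is written in assembly language for roughly the following reasons:

· Low-level routines in the kernel deal with the hardware directly and need special instructions that have no counterpart in C. In the 386 architecture, for example, the input/output instructions for peripherals, such as inb and outb, have no corresponding C statements. These low-level operations therefore have to be written in assembly language. The same is true of some operations on CPU registers. Setting a segment register, for example, can only be done in assembly language.
· Some special CPU instructions, such as disabling and enabling interrupts, have no C counterpart either. Moreover, different CPU chips of the same architecture, especially newly developed ones, often add new instructions. The Pentium, Pentium II and Pentium MMX, for example, all extended the original instruction set, and these instructions too have to be used from assembly language.
· Some procedures, code fragments or functions in the kernel are called very frequently at run time, so their (time) efficiency matters a great deal. A program written in assembly language is usually more efficient than one written in a high-level language with the same algorithm and data structures. In such code the use of every single assembly instruction often has to be weighed. Entry to and return from a system call is a typical example. This path is used extremely often, possibly thousands or tens of thousands of times a second, so its time efficiency is crucial. Furthermore, entering and leaving a system call involves switching back and forth between user space and system space, and some of the instructions used for this have no counterpart in C. So system call entry and return clearly must be written in assembly language.
· In some special situations the space efficiency of a piece of code is also very important. The boot loader of an operating system is an example. The boot code must normally fit in the first sector of the disk. Even one byte too many is unacceptable, so it can only be written in assembly language.

Programs and code fragments written in assembly language appear in the Linux kernel source in several forms.

The first is pure assembly code, in files with the suffix .s. In fact, even for "pure" assembly code, modern assemblers have adopted the strengths of the C preprocessor and add a preprocessing pass before assembly. Files that have yet to be preprocessed carry the suffix .S. Like C programs, such (.S) files can use #include, #ifdef and the like, and data structures can likewise be defined in .h files.

The second is assembly language fragments embedded in C programs. Although the ANSI C standard says nothing about assembly fragments, all C compilers in practical use have extensions for them, and GNU's C compiler gcc has made very powerful extensions in this area.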
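In addition, the kernel code contains a few assembly language programs in Intel format, used for booting the system.

Since this book concentrates on the Linux kernel for the Intel i386 architecture, only GNU's support for i386 assembly language is described below.

Readers new to the Linux kernel source, even those quite familiar with i386 assembly language, find both kinds of assembly code hard to understand, and some are put off altogether. The reason is that in the kernel's "pure" assembly code GNU uses a syntax different from the usual 386 assembly language, and in fragments embedded in C programs there are additional language elements that tell the tools how to allocate registers and how to tie in with variables defined in the C program. These elements in effect turn the embedded fragments into an intermediate language somewhere between 386 assembly and C.

So we first give a concentrated introduction to the 386 assembly language used in these two situations in the kernel. Further explanation will be given later when particular assembly code comes up in particular scenarios.

1.5.1 GNU 386 Assembly Language

In the DOS/Windows world, 386 assembly language uses the statement (instruction) format defined by Intel, which is also the format used in almost all textbooks and reference books on 386 assembly programming. In the Unix world, however, the format used is the one defined by AT&T. When AT&T ported Unix to the 80386 processor, it defined this format according to the habits and needs of people in the Unix community. Unix was first developed on the PDP-11 and was then ported to the VAX and to processors of the 68000 family. The assembly languages of those machines differ from Intel's in style and hence in format, and the 386 assembly language defined by AT&T is closer to them. The format was later retained in Unixware. GNU is mainly active in the Unix field (even though GNU stands for "GNU is Not Unix"). For the best possible compatibility with earlier Unix versions and tools, the system tools developed by GNU naturally inherited AT&T's 386 assembly format rather than Intel's.

How large is the gap between the two assembly languages? They are in fact largely the same with small differences. But sometimes small differences matter a great deal and cause trouble if ignored. Specifically, the main differences are these:

(1) Intel format mostly uses upper-case letters, whereas AT&T format uses lower-case letters throughout.
(2) In AT&T format register names carry the prefix "%". In Intel format they have no prefix.
(3) In AT&T 386 assembly language the order of the source and destination operands of an instruction is the reverse of that in Intel 386 assembly language. In Intel format the destination comes first and the source second. In AT&T format the source comes first and the destination second. For example, moving the contents of register eax into ebx is "MOV EBX, EAX" in Intel format and "movl %eax, %ebx" in AT&T format. Apparently the designers of the Intel format had "EBX = EAX" in mind, and the designers of the AT&T format had "%eax -> %ebx".
(4) In AT&T format the size (width) of the operand of a memory-access instruction is determined by the last letter of the opcode name (that is, the opcode suffix). The suffix letters are b (8 bits), w (16 bits) and l (32 bits). In Intel format it is indicated by putting "BYTE PTR", "WORD PTR" or "DWORD PTR" in front of the memory operand. For example, fetching the byte in the memory cell designated by FOO into the 8-bit register AL is written in the two formats as follows:

    MOV AL, BYTE PTR FOO    (Intel format)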
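movb FOO, %al (AT&T format)

(5) In AT&T format immediate operands carry the prefix "$"; in Intel format they have no prefix. Intel's "PUSH 4" therefore becomes "pushl $4" in AT&T format.
(6) In AT&T format the operand of an absolute jmp or call instruction (that is, the target address of the jump or call) carries the prefix "*" (which will probably remind readers of pointers in C); in Intel format it does not.
(7) The opcode names of the far jump and far call instructions are "ljmp" and "lcall" in AT&T format and "JMP FAR" and "CALL FAR" in Intel format. When the target is given as immediate operands, the two notations are:

CALL FAR SECTION:OFFSET (Intel format)
JMP FAR SECTION:OFFSET (Intel format)
lcall $section, $offset (AT&T format)
ljmp $section, $offset (AT&T format)

The corresponding far return instruction is:

RET FAR STACK_ADJUST (Intel format)
lret $stack_adjust (AT&T format)

(8) The general form of indirect addressing differs as follows:

SECTION:[BASE+INDEX*SCALE+DISP] (Intel format)
section:disp(base, index, scale) (AT&T format)

This addressing mode is often used to access a field inside a particular element of an array of structures: base is the start address of the array, scale is the size of each element, and index is the subscript. If the elements are structures, disp is the offset of the field within the structure.

Note that the AT&T form leaves the computation implicit. For example, when SECTION, INDEX and SCALE are all omitted, BASE is EBP and DISP (the displacement) is -4, the notation is:

[ebp-4] (Intel format)
-4(%ebp) (AT&T format)

Inside the parentheses of the AT&T form the commas may be omitted only when base is the sole item, so (%ebp) is equivalent to (%ebp, , ) and further to (%ebp, 0, 0). As another example, when INDEX is EAX, SCALE is 4 (32 bits), DISP is foo, and everything else is omitted, the notation is:

[foo+EAX*4] (Intel format)
foo(, %eax, 4) (AT&T format)

1.5.2 386 Assembly Fragments Embedded in C Code

When a fragment of assembly has to be embedded in a C program, the "asm" statement provided by gcc can be used.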
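For example, include/asm-i386/io.h contains this line:

==================== include/asm-i386/io.h 40 40 ====================
#define __SLOW_DOWN_IO __asm__ __volatile__ ("outb %al, $0x80")

For now, ignore the two "__" characters before and after asm and volatile; they are another gcc extension to C and will be discussed later. Look first at the quoted assembly instruction inside the parentheses. It is an 8-bit output instruction: as described above, the suffix "b" on the opcode marks it as 8-bit, the constant 0x80 is an immediate operand and therefore takes the prefix "$", and the register name al takes the prefix "%". Once the differences between AT&T and Intel format are known, this is a perfectly ordinary instruction and easy to understand.

Several lines of assembly can also be placed in a single asm statement. In the same file, under a different condition, __SLOW_DOWN_IO has a different definition:

==================== include/asm-i386/io.h 38 38 ====================
#define __SLOW_DOWN_IO __asm__ __volatile__ ("jmp 1f \n1:\tjmp 1f \n1:")

This is less obvious. Three lines of assembly are inserted here; "\n" is the newline character and "\t" the TAB character. gcc therefore turns the string into the following form and hands it to gas to assemble:

	jmp 1f
1:	jmp 1f
1:

The jump target 1f means: search forward (f for forward) for the first line labelled 1. Correspondingly, 1b means search backward. This notation is inherited from early Unix assembly code; readers who have read Unix Version 6 will probably remember it. The purpose of this little fragment is to make the CPU execute two do-nothing jumps and thereby use up some time. Since the aim is to waste time rather than save it, why use assembly instead of C? Because fairly precise control is wanted. If C statements were used to burn time, one often could not know exactly what assembly code the compiler, especially an optimizing one, would finally produce.

If readers find that still easy enough, the following fragment (from include/asm-i386/atomic.h) is considerably harder:

==================== include/asm-i386/atomic.h 29 35 ====================
static __inline__ void atomic_add(int i, atomic_t *v)
{
	__asm__ __volatile__(
		LOCK "addl %1,%0"
		:"=m" (v->counter)
		:"ir" (i), "m" (v->counter));
}

In general, assembly fragments inserted into C code are much more complicated than "pure" assembly code, because there is the problem of how registers are allocated and how operands are bound to variables in the C code. For this purpose the assembly language has to be extended further, with more elements that guide the tools. As a result, its syntax becomes a kind of intermediate language that is neither plain assembly nor C.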
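Below, the general format of assembly inserted into C code is introduced and explained first. Hints will be given again when concrete code is met in later scenarios.

An assembly fragment inserted into C code can be divided into four parts separated by ":" signs. Its general form is:

instruction part : output part : input part : clobber part

Take care not to confuse these ":" signs with those used in program labels (such as the 1: above).

The first part is the assembly statements themselves. Their format is basically the same as in an assembly program, with some differences described below. This part can be called the "instruction part". It is mandatory; the other parts may be omitted depending on the situation, so in the simplest case the fragment is essentially the same as ordinary assembly statements, as in the two examples above.

When assembly code is embedded in C code, how operands are bound to variables in the C code is obviously a problem. In the two examples at the start of this section the instructions did not involve any C variables, so things were simple. When the operands of an instruction have to be bound to C variables, things become much more complicated. The programmer writing the embedded assembly knows perfectly well, from the program logic, which instructions to use, but cannot know exactly which register gcc will have allocated to which variable around the insertion point, or which register or registers are free. Nor is it enough merely to learn gcc's register allocation passively: there must also be a way to tell gcc what registers the fragment needs, so as to influence its allocation. Of course, if gcc were powerful enough it could analyze the embedded assembly, deduce these requirements, and reach the goal through optimization. Even then the uncertainty introduced would still be a problem, and achieving this is not easy in any case. gcc therefore adopts a compromise: the programmer supplies the concrete instructions but, for register use, generally supplies only a "template" and some constraints, leaving the actual binding to variables to gcc and gas.

In the instruction part, a number with the prefix %, such as %0 or %1, denotes a template operand that needs a register. The total number of such operands that can be used depends on the number of general-purpose registers in the CPU. The number of distinct template operands used in the instruction part thus tells how many variables need to be bound to registers; gcc and gas work this out during compilation and assembly according to the constraints that follow. Because template operands also use the "%" prefix, a concrete register name must be written with two "%" characters to avoid confusion.

How, then, are the binding constraints expressed? That is the job of the remaining parts. Immediately after the instruction part comes the "output part", which specifies the constraints on how output variables, that is, destination operands, are bound. Each such condition is called a "constraint". The output part may contain several constraints separated by commas. Each output constraint begins with "=", followed by a letter describing the operand type, followed by the variable to be bound. In the example above the output part is

:"=m" (v->counter)

There is only one constraint here: "=m" says that the corresponding destination operand (%0 in the instruction part) is the memory location v->counter. Any register or operand bound to an operand declared in the output part does not retain its previous contents after the embedded assembly has executed, which gives gcc a basis for scheduling the use of these registers.

After the output part comes the "input part". Input constraints have the same format as output constraints but without the "=" sign. The input part of the example has two constraints. The first, "ir" (i), says that %1 in the instruction may be either an immediate operand (i for immediate) or a register, and that the operand comes from the C variable (here the call parameter) i.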
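The second constraint, "m" (v->counter), has the same meaning as in the output part. If an input constraint requires a register, gcc allocates one and automatically inserts the instructions needed to load the operand, that is, the value of the variable, into it. Registers or operands bound to operands declared in the input part likewise do not retain their previous contents after the embedded assembly has executed. For example, if %1 here is given a register, gcc allocates one and automatically inserts a movl instruction that loads the value of parameter i into it, so the register's previous contents are gone. If the register was free anyway this does not matter, but if all registers are in use and one has to be borrowed temporarily, its original contents must be restored afterwards. In that case gcc automatically inserts a pushl instruction at the start to save the register on the stack and a popl instruction at the end to restore it.

In some operations, besides the registers used for input and output operands, several more registers are needed for intermediate results of the computation. Their original contents are then destroyed, so this side effect must be declared in the clobber part so that gcc can take appropriate measures. Sometimes these declarations are simply placed in the output part instead, which is also acceptable.

Operands are numbered starting from the first constraint of the output part (number 0) and counting onwards, one per constraint. To refer to such an operand, or to the register allocated to it, in the instruction part, the number is prefixed with "%". An operand referenced in the instruction part is always treated as a 32-bit "long word", but the operation applied to it may, as required, be a byte operation or a word (16-bit) operation. A byte operation applies by default to the lowest byte of the operand, and likewise for word operations. In some special operations, however, the byte to be operated on may be named explicitly: a "b" inserted between % and the number selects the lowest byte, an "h" the second-lowest byte.

There are many constraint letters. The main ones are:

"m", "V" and "o": a memory location;
"r": any register;
"q": one of the registers eax, ebx, ecx, edx;
"i" and "h": an immediate operand;
"E" and "F": a floating-point number;
"g": "any";
"a", "b", "c", "d": require register eax, ebx, ecx or edx respectively;
"S", "D": require register esi or edi respectively;
"I": a constant (0 to 31).

Furthermore, if an operand must use the same register as the one required by an earlier constraint, the number of the operand belonging to that earlier constraint is written as the constraint. The clobber part often contains "memory", meaning that memory has been modified by the operation, so that a register whose contents came from memory (possibly one not even used in this operation) may now be out of date.

Note also that when the output part is empty, that is, when there are no output constraints, the separating ":" must still be kept if input constraints are present.

Returning to the example above, readers should now see that the code adds the value of parameter i to v->counter. The keyword LOCK says that the system bus is to be locked while the addl instruction executes, so that other CPUs (if the system has more than one) cannot interfere. Readers may ask: adding two numbers is a simple operation for which C clearly has a construct, for example "v->counter += i;", so why use assembly? The reason is that the whole operation is required to be completed by a single instruction with the bus locked, to guarantee that it is atomic. By contrast, there is no guarantee how many instructions the C statement above will compile to, and there is no way to ask for the bus to be locked during the computation.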
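For readers who want to experiment outside the kernel, the same requirement (an indivisible add) can be expressed in portable C11 with <stdatomic.h>. This is only an analogue under stated assumptions, not how the 2.4 kernel does it: the type my_atomic_t and the functions my_atomic_add and my_atomic_read are hypothetical names, and the instruction sequence that results is chosen by the compiler rather than written by hand.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct {
    atomic_int counter;
} my_atomic_t;

/* Adds i to v->counter as one indivisible read-modify-write operation.
 * The C11 standard guarantees atomicity; which machine instructions
 * are emitted is left to the compiler. */
static void my_atomic_add(int i, my_atomic_t *v)
{
    atomic_fetch_add(&v->counter, i);
}

/* Reads the current value atomically. */
static int my_atomic_read(my_atomic_t *v)
{
    return atomic_load(&v->counter);
}
```

The contrast with a plain "v->counter += i;" is the same one the text draws: the plain statement promises nothing about atomicity, while the atomic call does.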
Here is another piece of embedded assembly, this time from include/asm-i386/bitops.h.

==================== include/asm-i386/bitops.h 18 32 ====================
#ifdef CONFIG_SMP
#define LOCK_PREFIX "lock ; "
#else
#define LOCK_PREFIX ""
#endif

#define ADDR (*(volatile long *) addr)

static __inline__ void set_bit(int nr, volatile void * addr)
{
	__asm__ __volatile__( LOCK_PREFIX
		"btsl %1,%0"
		:"=m" (ADDR)
		:"Ir" (nr));
}

The instruction btsl sets one bit of a 32-bit operand to 1; the parameter nr gives the position of that bit. By now readers should have no difficulty with this and should also see why assembly is used.

Now a somewhat more complex example, from include/asm-i386/string.h:

==================== include/asm-i386/string.h 199 215 ====================
199 static inline void * __memcpy(void * to, const void * from, size_t n)
200 {
201 int d0, d1, d2;
202 __asm__ __volatile__(
203 	"rep ; movsl\n\t"
204 	"testb $2,%b4\n\t"
205 	"je 1f\n\t"
206 	"movsw\n"
207 	"1:\ttestb $1,%b4\n\t"
208 	"je 2f\n\t"
209 	"movsb\n"
210 	"2:"
211 	: "=&c" (d0), "=&D" (d1), "=&S" (d2)
212 	:"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
213 	: "memory");
214 return (to);
215 }

Readers probably know memcpy(). The __memcpy() shown here is the kernel's low-level implementation of memcpy(): it copies the contents of a block of memory without regard to its data structure. The function is used extremely often, so its run-time efficiency is very important.

Look first at the constraints and the binding of variables to registers. The output part has three constraints, corresponding to operands %0 to %2. The variable d0 is operand %0 and must be placed in register ecx, for a reason that will become clear shortly. Likewise d1, that is %1, must be in register edi, and d2, that is %2, must be in register esi. The input part has four constraints, corresponding to operands %3 to %6.
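Of these, operand %3 uses the same register as operand %0 and so must also be ecx; gcc is required to insert the necessary instructions to set it to n/4 beforehand, which converts the copy length from a byte count n into a long-word count n/4. For n itself, gcc is asked to allocate any one of the registers eax, ebx, ecx or edx. Operands %5 and %6, the parameters to and from, use the same registers as %1 and %2 respectively and so must be edi and esi.

Turning to the instruction part, readers will notice at once that only %4 seems to be used. Why do all the other operands appear unused? Reading through the instructions answers this.

The first instruction is "rep", meaning that the next instruction, movsl, is to be repeated; each repetition decrements register ecx by 1 until it reaches 0. In this code it therefore executes n/4 times. And what does movsl do? It copies one long word from the location pointed to by esi to the location pointed to by edi and adds 4 to both esi and edi. So when line 203 has finished and line 204 is reached, all long words have been copied and at most three bytes remain. In the process the three registers ecx, esi and edi are actually used, that is, the operands %0 (also %3), %2 (also %6) and %1 (also %5); all this is implicit in the instructions and not visible in the text. It also explains why these operands must be held in the designated registers.

Next the remaining bytes, at most three, are handled. First testb tests bit 1 of the lowest byte of operand %4, the copy length n. If the bit is 1 at least two bytes remain, so movsw copies one short word (and adds 2 to esi and edi); otherwise the instruction is skipped. Then another testb (note the \t before it, which inserts a TAB character into the assembly text handed to the assembler) tests bit 0 of operand %4. If the bit is 1 one byte still remains, so movsb copies one more byte; otherwise it is skipped. When label 2 is reached, execution is finished. Readers may wish to write this function in C themselves, compile it, inspect the result with objdump and compare it with the code here; that should make clear why assembly is used.

As review material for the reader, here is the code of strncmp(). Readers unfamiliar with i386 instructions can read it side by side with an Intel instruction manual.

==================== include/asm-i386/string.h 127 147 ====================
static inline int strncmp(const char * cs,const char * ct,size_t count)
{
register int __res;
int d0, d1, d2;
__asm__ __volatile__(
	"1:\tdecl %3\n\t"
	"js 2f\n\t"
	"lodsb\n\t"
	"scasb\n\t"
	"jne 3f\n\t"
	"testb %%al,%%al\n\t"
	"jne 1b\n"
	"2:\txorl %%eax,%%eax\n\t"
	"jmp 4f\n"
	"3:\tsbbl %%eax,%%eax\n\t"
	"orb $1,%%al\n"
	"4:"
	:"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
	:"1" (cs),"2" (ct),"3" (count));
return __res;
}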
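As an aid for this review exercise, the control flow of the strncmp() listing can be restated in portable C. This is a reading aid rather than kernel code: the name strncmp_sketch is hypothetical, the comments map each statement to the assembly labels, and the sketch assumes count is small enough that the sign test behaves as in the 32-bit original.

```c
#include <stddef.h>

/* Portable restatement of the i386 strncmp() control flow.
 * Returns 0 if equal within count bytes, -1 if cs sorts first, 1 otherwise. */
static int strncmp_sketch(const char *cs, const char *ct, size_t count)
{
    long remaining = (long)count;            /* plays the role of ecx */

    for (;;) {
        unsigned char a, b;

        if (--remaining < 0)                 /* "decl %3" then "js 2f" */
            return 0;                        /* label 2: xorl %eax,%eax */
        a = (unsigned char)*cs++;            /* lodsb: al = *esi++ */
        b = (unsigned char)*ct++;            /* scasb: compare al with *edi++ */
        if (a != b)                          /* "jne 3f" */
            return a < b ? -1 : 1;           /* label 3: sbbl gives -1 or 0, orb $1 makes 0 into 1 */
        if (a == 0)                          /* "testb %al,%al"; "jne 1b" not taken */
            return 0;                        /* falls into label 2 */
    }
}
```

Note that, unlike some C library implementations, the assembly version returns only -1, 0 or 1: sbbl turns the carry flag from the byte comparison into -1 or 0, and orb $1 then maps 0 to 1.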
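Chapter 2 Memory Management

2.1 The Basic Framework of Linux Memory Management

The previous chapter described the hardware-level support for memory management provided by the i386 CPU, including the Pentium. The final implementation of memory management is of course the job of software.

As mentioned earlier, the basic idea of paging on the i386 CPU is to map linear addresses to physical addresses through two levels, the page directory and the page table. In most cases this scheme saves space that page tables would otherwise occupy, because most processes do not use the whole virtual address space and usually leave large "holes" in it. With two levels, whenever the region covered by a directory entry is a hole, the entry can simply be set to "empty", which saves the corresponding page table (1024 page table entries). With 32-bit addresses two-level mapping is reasonably effective and sensible; with addresses wider than 32 bits it is no longer so.

The design of the Linux kernel has to allow for implementation on many different CPUs, including 64-bit CPUs such as the Alpha, so its mapping mechanism cannot be designed for the i386 alone. Instead, a general model is designed on the basis of an imaginary, virtual CPU and MMU (memory management unit) and then realized on each concrete CPU. The Linux mapping mechanism is therefore designed with three levels, adding a "middle directory" between the page directory and the page table. In the code the page directory is called PGD, the middle directory PMD, and the page table PT. An entry in a PT is called a PTE, short for "Page Table Entry". PGD, PMD and PT are all arrays. Correspondingly, the linear address is logically divided, from high bits to low, into four bit fields of some number of bits each, used respectively as the index into the PGD, the index into the PMD, the index into the page table, and the offset within the physical page. The mapping of a linear address thus proceeds in the four steps shown in figure 2.1.

More specifically, for a linear address issued by the CPU, the virtual Linux memory management unit performs the mapping from linear to physical address in the following four steps:

(1) Use the highest bit field of the linear address as an index into the PGD to find the corresponding entry, which points to the corresponding middle directory PMD.
(2) Use the second bit field of the linear address as an index into this PMD to find the corresponding entry, which points to the corresponding page table.
(3) Use the third bit field of the linear address as an index into the page table to find the corresponding entry PTE, which holds the pointer to the physical page.
(4) The last bit field of the linear address is the offset within the physical page; adding it to the start address of the target physical page gives the physical address.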
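The four steps can be sketched in C on a toy model. Everything below is illustrative: the field widths (4 bits each for a 16-bit address), the types toy_pte, toy_pmd_entry and toy_pgd_entry, and the function toy_translate are assumptions made for the example, not definitions from the kernel.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical field widths for a toy 16-bit linear address. */
enum { PGD_BITS = 4, PMD_BITS = 4, PT_BITS = 4, OFF_BITS = 4 };

typedef struct { uint32_t frame_base; int present; } toy_pte; /* target page */
typedef struct { toy_pte *pt; } toy_pmd_entry;                /* points to a page table */
typedef struct { toy_pmd_entry *pmd; } toy_pgd_entry;         /* points to a middle directory */

/* Walks PGD -> PMD -> PT. Returns 0 and stores the physical address in
 * *phys on success, or -1 if any level of the mapping is missing. */
static int toy_translate(const toy_pgd_entry *pgd, uint32_t linear, uint32_t *phys)
{
    uint32_t off = linear & ((1u << OFF_BITS) - 1);
    uint32_t pti = (linear >> OFF_BITS) & ((1u << PT_BITS) - 1);
    uint32_t pmi = (linear >> (OFF_BITS + PT_BITS)) & ((1u << PMD_BITS) - 1);
    uint32_t pgi = (linear >> (OFF_BITS + PT_BITS + PMD_BITS)) & ((1u << PGD_BITS) - 1);

    const toy_pmd_entry *pmd = pgd[pgi].pmd;   /* step 1: index the PGD */
    if (pmd == NULL)
        return -1;
    const toy_pte *pt = pmd[pmi].pt;           /* step 2: index the PMD */
    if (pt == NULL)
        return -1;
    if (!pt[pti].present)                      /* step 3: index the page table */
        return -1;
    *phys = pt[pti].frame_base + off;          /* step 4: add the offset */
    return 0;
}
```

A hole in the address space shows up here as a NULL pointer at the PGD or PMD level, which is exactly what lets the multi-level scheme avoid allocating tables for unused regions.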
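[Figure 2.1: Three-level address mapping]

This virtual mapping model, however, has to be realized on the physical mapping mechanism of a concrete CPU and MMU. On the i386 the CPU actually maps addresses with a two-level model, not a three-level one, so the virtual three-level mapping has to be folded onto the concrete two-level mapping by skipping the PMD level in the middle. On the other hand, starting with the Pentium Pro, Intel introduced the physical address extension (PAE), which allows the address width to be raised from 32 to 36 bits and supports a three-level mapping model in hardware. On a Pentium Pro or later CPU, virtual memory mapping therefore becomes three-level as soon as the CPU's memory management is put into PAE mode.

So how does the Linux kernel implement this mapping mechanism on i386 CPUs? First look at a definition in include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 106 110 ====================
#if CONFIG_X86_PAE
# include <asm/pgtable-3level.h>
#else
# include <asm/pgtable-2level.h>
#endif

Depending on the choices made during system configuration (config) before the kernel is compiled, the directory include/asm is symbolically linked at build time to the directory for the specific CPU; for i386 CPUs it is linked to include/asm-i386. There is also a configuration option for PAE: if the CPU is a Pentium Pro or later and 36-bit addresses are to be used, the option CONFIG_X86_PAE is 1 at compile time, otherwise 0. Based on this option, either pgtable-3level.h or pgtable-2level.h is chosen: the former for three-level mapping of 36-bit addresses, the latter for two-level mapping of 32-bit addresses. Here we concentrate on two-level mapping of 32-bit addresses. Once that is understood, readers can study the code for three-level mapping of 36-bit addresses on their own.

The file include/asm-i386/pgtable-2level.h defines the basic structure of PGD and PMD for two-level mapping:

==================== include/asm-i386/pgtable-2level.h 4 18 ====================
/*
 * traditional i386 two-level paging structure:
 */

#define PGDIR_SHIFT 22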
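#define PTRS_PER_PGD 1024

/*
 * the i386 is two-level, so we don't really have any
 * PMD directory physically.
 */
#define PMD_SHIFT 22
#define PTRS_PER_PMD 1

#define PTRS_PER_PTE 1024

PGDIR_SHIFT is the starting position of the PGD index field in the linear address. The file defines it as 22, that is, bit 22 (the 23rd bit). Since the PGD field is the highest field of the linear address, it runs from bit 22 to bit 31, 10 bits in all. The file include/asm-i386/pgtable.h defines another constant, PGDIR_SIZE:

==================== include/asm-i386/pgtable.h 117 117 ====================
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)

That is, the space represented by each PGD entry (not the space occupied by the PGD itself) is 1×2^22 bytes. pgtable-2level.h also defines PTRS_PER_PGD, the number of pointers in each PGD table, as 1024. This obviously matches the length of the PGD field in the linear address (10 bits), since 2^10 = 1024. Both constants are defined entirely for the i386 CPU and its MMU: in non-PAE mode the i386 MMU uses the top 10 bits of the linear address as the index into the directory, and the directory has 1024 entries. On a 32-bit system each pointer occupies 4 bytes, so a PGD table is 4KB in size.

The definition of the PMD is more interesting. PMD_SHIFT is also defined as 22, the same as PGDIR_SHIFT, which means that the PMD field has length 0 and that a PMD entry represents the same amount of space as a PGD entry. The number of pointers in a PMD table, PTRS_PER_PMD, is defined as 1, meaning that each PMD table has only one entry. This too is specific to the i386 CPU and its MMU: to fold Linux's logical three-level model onto the i386's physical two-level mapping, the PMD field is removed from the four logical bit fields of the linear address by giving it length 0, so the size of the logical PMD table becomes 1 (2^0 = 1).

For the kernel (software) and the i386 MMU, the four mapping steps above thus become:

(1) The kernel sets up the page directory PGD for the MMU. The MMU uses the highest bit field of the linear address (10 bits) as an index into the PGD to find the entry. Logically this entry points to a middle directory PMD, but physically it points directly to the page table; the MMU knows nothing of the PMD.
(2) The PMD exists only logically, that is, only as a concept for the kernel software. The table has a single entry, and the "mapping" consists of leaving the value unchanged; the same value, handed on, now points to the page table.
(3) The kernel has set up all page tables for the MMU. The MMU uses the PT field of the linear address as an index into the page table to find the entry PTE, which holds the pointer to the physical page.
(4) The last bit field of the linear address is the offset within the physical page; the MMU adds it to the start address of the target physical page to obtain the physical address.

In this way the logical three-level mapping becomes a two-level mapping for the i386 CPU and MMU, with the PMD level skipped, while the software keeps the three-level framework.

The details of the mapping depend on the nature of the address space, but as readers will see later, the base address of the segment mapping is always 0 (except in VM86 mode, which is used to emulate the 8086), so linear and virtual addresses always coincide. In later discussion we often do not distinguish between the two.

A 32-bit address means a 4GB virtual address space, which the Linux kernel divides into two parts. The highest 1GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel itself and is called "system space". The lower 3GB (virtual addresses 0x0 to 0xBFFFFFFF) serves as the "user space" of the individual processes. In theory, then, every process can use 3GB of user space, although the actual size is limited by the amount of physical storage (memory plus disk swap partitions or swap files). While each process has its own 3GB of user space, system space is shared by all processes. Whenever a process enters the kernel through a system call, it runs in the shared system space and no longer has a separate space of its own. From the point of view of a particular process, each process has 4GB of virtual address space: the lower 3GB is its own user space and the top 1GB is the system space shared with all other processes and the kernel, as shown in figure 2.2.

[Figure 2.2: Virtual address space of a process]

Although system space occupies the top 1GB of every virtual address space, in physical memory it always starts from the lowest address (0). For the kernel, address mapping is therefore a very simple linear mapping, with 0xC0000000 as the offset between the two. In the code this offset is called PAGE_OFFSET and is defined in include/asm-i386/page.h:

==================== include/asm-i386/page.h 68 82 ====================
/*
 * This handles the memory map.. We could make this a config
 * option, but too many people screw it up, and too few need
 * it.
 *
 * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
 * a virtual address space of one gigabyte, which limits the
 * amount of physical memory you can use to about 950MB.
 *
 * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
 * and CONFIG_HIGHMEM64G options in the kernel configuration.
 */

#define __PAGE_OFFSET (0xC0000000)

......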
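The arithmetic behind these constants can be checked with a few lines of C. The macro values are copied from the listings above; the helper functions pgd_slot, is_system_space and pmd_field_bits are hypothetical names introduced only for this check.

```c
/* Values copied from the listings above (i386, non-PAE). */
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024
#define PMD_SHIFT     22
#define PTRS_PER_PMD  1
#define PAGE_OFFSET   0xC0000000UL

/* Bytes of virtual address space covered by one PGD entry (4MB). */
#define PGDIR_SIZE    (1UL << PGDIR_SHIFT)

/* Index of the PGD entry that covers a given virtual address. */
static unsigned long pgd_slot(unsigned long vaddr)
{
    return (vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Nonzero if the address lies in the shared top 1GB (system space). */
static int is_system_space(unsigned long vaddr)
{
    return vaddr >= PAGE_OFFSET;
}

/* Width of the PMD index field: zero on two-level i386. */
static unsigned long pmd_field_bits(void)
{
    return PGDIR_SHIFT - PMD_SHIFT;
}
```

With these values, system space begins at PGD entry 768, so the top 256 directory entries cover the kernel's 1GB and the lower 768 cover the 3GB of user space.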
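==================== include/asm-i386/page.h 113 116 ====================

#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))

That is, in system space the physical address of a virtual address x is obtained by subtracting PAGE_OFFSET from x; conversely, the virtual address of a physical address x is x+PAGE_OFFSET.

PAGE_OFFSET also marks the upper limit of user space, so the constant TASK_SIZE is defined in terms of it (include/asm-i386/processor.h):

==================== include/asm-i386/processor.h 258 261 ====================
/*
 * User space process size: 3GB (default).
 */
#define TASK_SIZE (PAGE_OFFSET)

This is because the size of a user process, when spoken of, does not include the resources the process shares in system space.

Of course the CPU does not map addresses by the calculation described here; __pa() is merely a convenience for kernel code that needs to know the physical address corresponding to a virtual address. For example, on a process switch the register CR3 has to be set to point to the page directory PGD of the new process. In kernel code the start address of that directory is a virtual address, but CR3 requires a physical address, and this is where __pa() is used. The statement is in include/asm-i386/mmu_context.h:

==================== include/asm-i386/mmu_context.h 43 44 ====================
/* Re-load page tables */
asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));

This line of assembly converts next->pgd, the start address of the next process's page directory, into a physical address with __pa() (held in some register) and then writes it into register CR3 with a mov instruction. After this instruction CR3 points to the page directory PGD of the new process next.

As mentioned earlier, the local descriptor table LDT of each process exists as a separate segment, and the global descriptor table GDT must contain an entry that points to the start of this segment and gives its length and other parameters. Each process also has a TSS structure (task state segment), for which the same holds. (The TSS will be discussed later.) Every process therefore occupies two entries in the GDT. How large is the GDT? The bit field of a segment register used as the GDT index is 13 bits wide, so the GDT can hold 8192 descriptors. Apart from some system overhead (for example, entries 2 and 3 of the GDT are used for the kernel's code and data segments, entries 4 and 5 are always used for the code and data segments of the current process, entry 1 is always 0, and so on), 8180 entries remain available, so the theoretical maximum number of processes in the system is 4090.

2.2 The Whole Process of Address Mapping

The Linux kernel uses paged memory management. The virtual address space is divided into fixed-size "pages", and at run time the MMU "maps" (or translates) a virtual address into an address within some physical page of memory.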
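Compared with segmented memory management, paging has many advantages. First, pages all have a fixed size and are easy to manage. More importantly, when part of the contents of physical memory has to be swapped out to disk, segmented management must swap out a whole segment (usually very large), whereas paged management works page by page, which is obviously far more efficient. Paging and segmentation require different hardware support, and a CPU that supports paging has no need to support segmentation as well. As said before, however, the i386 is a special case. Because of the way the i386 family evolved, its support for paging was developed only after its segmentation had existed for a considerable time. So no matter how a program is written, the i386 CPU always applies segment mapping to the addresses the program uses first, and only then page mapping. Since the hardware is built this way, the Linux kernel has to go along with Intel's choice. This double mapping is really quite unnecessary and makes the mapping process harder to understand, to the point that some people have drawn the specious conclusion that Linux uses "segmented paging". As readers will see below, the approach taken by the Linux kernel is to make the segment mapping effectively do nothing (except in the special VM86 mode, which is used to emulate the 8086). In other words: you have your policy, I have my countermeasure; what cannot be confronted is sidestepped. This section follows one scenario through the whole process of address mapping when the Linux kernel runs on an i386 CPU. It should be pointed out that this process applies only to i386 processors. On other processors, such as the M68K or the Power PC, the segment mapping layer does not exist at all. Conversely, any operating system (UNIX, for example) implemented on the i386 must go through segment mapping at least formally before it can realize its own design.

Suppose we have written the following program:

#include <stdio.h>

greeting()
{
	printf("Hello, world!\n");
}

main()
{
	greeting();
}

Readers will find it very familiar. It differs from most people's first C program in only one respect: we deliberately have main() call greeting() to display or print "Hello, world!".

After compilation we obtain the executable hello. Let us first look at the result of running gcc and ld (compiling and linking). Linux has a very useful utility, objdump, which can disassemble a piece of binary code. The command

$objdump -d hello

yields the part of the result we are interested in. The fragment of output (the disassembly) is:

08048368 <greeting>:
 8048368:	55                   	push   %ebp
 8048369:	89 e5                	mov    %esp,%ebp
 804836b:	83 ec 08             	sub    $0x8,%esp
 804836e:	83 ec 0c             	sub    $0xc,%esp
 8048371:	68 84 84 04 08       	push   $0x8048484
 8048376:	e8 35 ff ff ff       	call   80482b0
 804837b:	83 c4 10             	add    $0x10,%esp
 804837e:	c9                   	leave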
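 804837f:	c3                   	ret

08048380 <main>:
 8048380:	55                   	push   %ebp
 8048381:	89 e5                	mov    %esp,%ebp
 8048383:	83 ec 08             	sub    $0x8,%esp
 8048386:	83 e4 f0             	and    $0xfffffff0,%esp
 8048389:	b8 00 00 00 00       	mov    $0x0,%eax
 804838e:	83 c0 0f             	add    $0xf,%eax
 8048391:	83 c0 0f             	add    $0xf,%eax
 8048394:	c1 e8 04             	shr    $0x4,%eax
 8048397:	c1 e0 04             	shl    $0x4,%eax
 804839a:	29 c4                	sub    %eax,%esp
 804839c:	e8 c7 ff ff ff       	call   8048368 <greeting>
 80483a1:	c9                   	leave
 80483a2:	c3                   	ret
 80483a3:	90                   	nop

The output shows that ld assigned greeting() the address 0x08048368. In ELF executables ld always lays out the program's "code segment" starting in the region just above 0x08000000 (0x08048000 by default), and this is the same for every program. Where the program actually resides in physical memory at run time is arranged by the kernel when it sets up the memory mapping, and the concrete addresses depend on the physical pages allocated at that moment.

Suppose the program is already running, the whole mapping mechanism has been set up, and the CPU is executing the instruction "call 08048368" in main(), about to transfer to the virtual address 0x08048368. The reader is now asked to follow patiently, step by step, through the mapping of this address.

First comes the segment mapping stage. The address 0x08048368 is the entry point of a routine and, more importantly, is pointed to during execution by the CPU's instruction pointer EIP, so it lies in the code segment. The i386 CPU therefore uses the current value of the code segment register CS as the "selector" for segment mapping, that is, as an index into a segment descriptor table. Which table: the global descriptor table GDT or the local descriptor table LDT? That depends on the contents of CS. Recall first the format of a segment register in protected mode, shown in figure 2.3.

[Figure 2.3: Segment register format (bits 15 to 3: index; bit 2: TI; bits 1 to 0: RPL)]

That is, bit 2 equal to 0 selects the GDT and bit 2 equal to 1 selects the LDT. Intel's design intent was for the kernel to use the GDT and each process its own LDT. The lowest two bits, RPL, give the requested privilege level; there are four levels, with 0 the highest.

Now the contents of CS can be examined. Whenever the kernel creates a process it sets up the process's segment registers (this is covered in the chapter on process management). The relevant code is in include/asm-i386/processor.h:

==================== include/asm-i386/processor.h 408 417 ====================
#define start_thread(regs, new_eip, new_esp) do { \
	__asm__("movl %0,%%fs ; movl %0,%%gs": :"r" (0)); \
	set_fs(USER_DS); \
	regs->xds = __USER_DS; \
	regs->xes = __USER_DS; \
	regs->xss = __USER_DS; \
	regs->xcs = __USER_CS; \
	regs->eip = new_eip; \
	regs->esp = new_esp; \
} while (0)

Here regs->xds is the image of segment register DS, and so on for the others. Something interesting can already be seen: apart from CS, which is set to __USER_CS, all other segment registers are set to __USER_DS. Particularly noteworthy is the stack segment register SS, which is also set to __USER_DS. So although Intel's intent was to divide a process image into a code segment, a data segment and a stack segment, the Linux kernel does not play along: in the Linux kernel the stack segment and the data segment are not distinguished.

Next, what exactly are __USER_CS and __USER_DS? They are defined in include/asm-i386/segment.h:

==================== include/asm-i386/segment.h 4 8 ====================
#define __KERNEL_CS 0x10
#define __KERNEL_DS 0x18

#define __USER_CS 0x23
#define __USER_DS 0x2B

So the Linux kernel uses only four different segment register values, two for the kernel itself and two for all processes. Expanding these four values in binary and lining them up with the segment register format gives:

                   index          TI  RPL
__KERNEL_CS 0x10   0000000000010  0   00
__KERNEL_DS 0x18   0000000000011  0   00
__USER_CS   0x23   0000000000100  0   11
__USER_DS   0x2B   0000000000101  0   11

The comparison makes things clear. First, TI is 0 in every case, so all four use the GDT. This already departs from Intel's design intent.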
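The same decomposition can be done mechanically. The sketch below follows the field layout of figure 2.3; the struct selector_fields and the function decode_selector are hypothetical names used only for illustration.

```c
/* Fields of an i386 segment selector:
 * bits 15..3 index, bit 2 TI (0 = GDT, 1 = LDT), bits 1..0 RPL. */
struct selector_fields {
    unsigned index;
    unsigned ti;
    unsigned rpl;
};

/* Splits a 16-bit selector value into its three fields. */
static struct selector_fields decode_selector(unsigned short sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}
```

Applied to 0x10, 0x18, 0x23 and 0x2B, it yields GDT indices 2, 3, 4 and 5, with RPL 0 for the two kernel selectors and RPL 3 for the two user selectors.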
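In fact the Linux kernel makes essentially no use of the local descriptor table LDT. The LDT is used only by programs that run in VM86 mode, such as wine and other programs that emulate Windows or DOS software on Linux.

Next, RPL: only the two levels 0 and 3 are used, level 0 for the kernel and level 3 for users (processes).

Back to our program. It obviously does not belong to the kernel, so it runs in the process's user space, and when the kernel schedules the process to run it sets CS to __USER_CS, that is, 0x23. The CPU therefore uses 4 as the index and looks up the corresponding segment descriptor in the global descriptor table GDT.

The initial contents of the GDT are defined in arch/i386/kernel/head.S, and its main contents do not change at run time:

==================== arch/i386/kernel/head.S 444 458 ====================
/*
 * This contains typically 140 quadwords, depending on NR_CPUS.
 *
 * NOTE! Make sure the gdt descriptor in head.S matches this if you
 * change anything.
 */
ENTRY(gdt_table)
	.quad 0x0000000000000000	/* NULL descriptor */
	.quad 0x0000000000000000	/* not used */
	.quad 0x00cf9a000000ffff	/* 0x10 kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* 0x18 kernel 4GB data at 0x00000000 */
	.quad 0x00cffa000000ffff	/* 0x23 user 4GB code at 0x00000000 */
	.quad 0x00cff2000000ffff	/* 0x2b user 4GB data at 0x00000000 */
	.quad 0x0000000000000000	/* not used */
	.quad 0x0000000000000000	/* not used */

The first entry of the GDT (index 0) is not used. This guards against using the GDT in protected mode with segment registers that were never initialized after power-up, and it is also required by Intel. The second entry is not used either. The four entries with indices 2 to 5 correspond to the four segment register values above. For comparison, the contents of the four descriptors are expanded in binary below (bits 63 to 32, then bits 31 to 0):

K_CS: 0000 0000 1100 1111 1001 1010 0000 0000   0000 0000 0000 0000 1111 1111 1111 1111
K_DS: 0000 0000 1100 1111 1001 0010 0000 0000   0000 0000 0000 0000 1111 1111 1111 1111
U_CS: 0000 0000 1100 1111 1111 1010 0000 0000   0000 0000 0000 0000 1111 1111 1111 1111
U_DS: 0000 0000 1100 1111 1111 0010 0000 0000   0000 0000 0000 0000 1111 1111 1111 1111

Comparing these carefully with the descriptor layout in figure 2.4 on the next page, readers can draw the following conclusions:

(1) The following items are the same in all four descriptors.
· B0-B15 and B16-B31 are all 0: the base address is 0 throughout;
· L0-L15 and L16-L19 are all 1: the segment limit is 0xfffff throughout;
· The G bit is 1: the limit is counted in units of 4KB;
· The D bit is 1: accesses to all four segments use 32-bit instructions;
· The P bit is 1: all four segments are present in memory.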
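Conclusion: every segment is the whole 4GB virtual address space starting at address 0, and the mapping from virtual to linear address leaves the value unchanged. When discussing or studying page mapping in the Linux kernel, the linear address can therefore be treated directly as the virtual address; the two are identical.

(2) The only differences lie in bits 40 to 46, which correspond to the type field, the S flag and the DPL field of the descriptor.
· For KERNEL_CS: DPL=0, meaning level 0; the S bit is 1, meaning a code or data segment; type is 1010, meaning a code segment, readable, executable, not yet accessed.
· For KERNEL_DS: DPL=0, meaning level 0; the S bit is 1, meaning a code or data segment; type is 0010, meaning a data segment, readable, writable, not yet accessed.
· For USER_CS: DPL=3, meaning level 3; the S bit is 1, meaning a code or data segment; type is 1010, meaning a code segment, readable, executable, not yet accessed.
· For USER_DS (index 5): DPL=3, meaning level 3; the S bit is 1, meaning a code or data segment; type is 0010, meaning a data segment, readable, writable, not yet accessed.

There are really only two differences. One is the DPL: the kernel has the highest level, 0, and users the lowest, 3. The other is the segment type, either code or data. Both are checked by the CPU during mapping. If the DPL of the descriptor is 0 while the privilege level in segment register CS is 3, access is not allowed, because the CPU's current privilege level is lower than that of the segment it wants to access. Likewise, if the descriptor says data segment but the program accesses it through CS, that is not allowed either. In fact, the checks made here are made again during page mapping, so with page mapping in use the checks here are redundant. Were it not that the MMU in the i386 CPU performs these checks, a single segment descriptor would suffice. Going further, were it not that the MMU in the i386 CPU insists on segment mapping before page mapping, segment descriptors and segment registers would not be needed at all. So the Linux kernel is merely going through the motions to humor the i386 CPU and satisfy its checks.

Readers may ask: does this not mean that a malicious programmer could break the i386 segment protection mechanism by setting the registers CS and DS, or even without doing so? Yes, but remember that the Linux kernel arranges things this way because it uses paged memory management; all it does here is deal with a formality that is unnecessary yet unavoidable. What really matters is the protection mechanism of the page mapping stage.

[Figure 2.4: Segment descriptor layout]

So the segment mapping mechanism set up by the Linux kernel maps the address 0x08048368 to itself, and it now appears as a linear address. Only now does the page mapping process begin.

Unlike the segment mapping stage, where all processes share one GDT, things are now done in earnest: every process has its own page directory PGD, and the pointer to this directory is kept in the process's mm_struct data structure. Whenever a process is scheduled to run, the kernel sets the control register CR3 for the process about to run, and the MMU hardware always takes the pointer to the current page directory from CR3. However, the CPU uses virtual addresses when executing programs, while the MMU hardware uses physical addresses when mapping, so the directory's address has to be converted before it is loaded into CR3. This is done in the inline function switch_mm().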
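As a cross-check on the descriptor analysis above, the fields of the four GDT entries can be extracted programmatically. The bit positions follow the standard i386 descriptor layout; the struct seg_desc and the function decode_descriptor are hypothetical names for this sketch, not kernel code.

```c
#include <stdint.h>

/* Selected fields of an 8-byte i386 segment descriptor. */
struct seg_desc {
    uint32_t base;     /* B0-B31 */
    uint32_t limit;    /* L0-L19 */
    unsigned type;     /* bits 40-43 */
    unsigned s;        /* bit 44: 1 = code or data segment */
    unsigned dpl;      /* bits 45-46 */
    unsigned present;  /* bit 47 */
    unsigned db;       /* bit 54: 1 = 32-bit default operand size */
    unsigned gran;     /* bit 55: 1 = limit counted in 4KB units */
};

/* Extracts the fields from a raw 64-bit descriptor value. */
static struct seg_desc decode_descriptor(uint64_t d)
{
    struct seg_desc r;

    r.base    = (uint32_t)(((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24));
    r.limit   = (uint32_t)((d & 0xFFFF) | (((d >> 48) & 0xF) << 16));
    r.type    = (unsigned)((d >> 40) & 0xF);
    r.s       = (unsigned)((d >> 44) & 1);
    r.dpl     = (unsigned)((d >> 45) & 3);
    r.present = (unsigned)((d >> 47) & 1);
    r.db      = (unsigned)((d >> 54) & 1);
    r.gran    = (unsigned)((d >> 55) & 1);
    return r;
}
```

For all four entries the base comes out as 0 and the limit as 0xFFFFF with 4KB granularity, that is, (0xFFFFF + 1) × 4KB = 4GB; only type and dpl differ between them.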
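The code of switch_mm() is in include/asm-i386/mmu_context.h, but here we care only about its single most crucial line:

==================== include/asm-i386/mmu_context.h 28 29 ====================
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk, unsigned cpu)
{
......
==================== include/asm-i386/mmu_context.h 44 44 ====================
	asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
......
==================== include/asm-i386/mmu_context.h 59 59 ====================
}

We used this line earlier to illustrate the purpose of __pa(). Here the physical address of the next process's page directory PGD is loaded into the register %%cr3, that is, CR3. Attentive readers may ask: since CR3 has different values before and after this line, so that different page directories are in use, will program execution not be disrupted? The answer is that this happens inside the kernel. Any process that has entered the kernel is in system space, where all processes have the same page mapping, so there is no problem.

When our program is about to transfer to address 0x08048368, the process is already running and CR3 has long been set to point to our process's page directory. First expand the linear address 0x08048368 in binary:

0000 1000 0000 0100 1000 0011 0110 1000

Comparing with the format of a linear address, the top 10 bits are binary 0000100000, decimal 32, so the i386 CPU (strictly speaking the MMU inside the CPU, here and below) uses 32 as the index to find the entry in the page directory. The upper 20 bits of this directory entry point to a page table: appending twelve 0 bits to these 20 bits gives the pointer to the page table. As explained before, every page table occupies one page and is therefore naturally aligned on a 4KB boundary, so the lower 12 bits of its start address are always 0. That is precisely why the lower 12 bits of a 32-bit directory entry can be put to other uses. The lowest of them is the P flag; when it is 1 the page table is present in memory.

Having found the page table, the CPU looks at the middle 10 bits of the linear address. The second group of 10 bits of the linear address 0x08048368 is 0001001000, decimal 72. The CPU uses this as the index to find the entry in the page table just located. As with directory entries, a P flag of 1 in a page table entry means the mapped page is present in memory. The upper 20 bits of the 32-bit page table entry point to a physical page of memory; appending twelve 0 bits gives the start address of that physical page. The difference is that this time the pointer leads not to an intermediate structure but to the target page of the mapping. Adding the lowest 12 bits of the linear address to this start address gives the final physical address. The lowest 12 bits of this linear address are 0x368, so if the start address of the target page is 0x740000 (which depends on dynamic allocation in the kernel), the physical address of the entry of greeting() is 0x740368, and that is where the executable code of greeting() is stored.

Readers may have noticed that during page mapping the i386 CPU has to access memory three times: first the page directory, then the page table, and only the third time the real target. An efficient implementation of virtual memory therefore depends on caching. With a cache, the page directory and page table have to be read from memory the first time they are used, but once loaded into the cache they can generally be found there and need not be fetched from memory again. Moreover, the whole process is implemented in hardware, so it is very fast.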
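The walk just described is easy to reproduce in a few lines of C. The split is the non-PAE i386 one (10 + 10 + 12 bits); the function names pgd_index, pt_index, page_offset, entry_base and entry_present are hypothetical, and the entry value 0x00740027 used in the check is an invented example whose upper 20 bits give the frame at 0x740000 assumed in the text.

```c
#include <stdint.h>

/* Non-PAE i386 split of a 32-bit linear address: 10 + 10 + 12 bits. */
static unsigned pgd_index(uint32_t lin)   { return lin >> 22; }
static unsigned pt_index(uint32_t lin)    { return (lin >> 12) & 0x3FF; }
static unsigned page_offset(uint32_t lin) { return lin & 0xFFF; }

/* A directory or page table entry: the upper 20 bits give a 4KB-aligned
 * physical base address, the lower 12 bits are flags (bit 0 is P). */
static uint32_t entry_base(uint32_t entry)   { return entry & 0xFFFFF000u; }
static int      entry_present(uint32_t entry) { return (int)(entry & 1u); }
```

For 0x08048368 this gives directory index 32, table index 72 and offset 0x368; with a page table entry whose base is 0x740000, base plus offset is 0x740368, matching the figures worked out above.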
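Besides ordinary page mapping, in order to emulate on Linux the Windows or DOS software that relies on segmented memory management, the kernel also provides two special system calls related to segmentation.

2.2.1 modify_ldt(int func, void *ptr, unsigned bytecount)

This system call can be used to change the local descriptor table of the current process. In the free software world many projects besides Linux are under way. One of them is "WINE", whose name derives from "Windows Emulator"; its aim is to run Windows software on Linux. Over the years some Windows software (MS Word, for example) has become widely accepted and familiar, and the lack of equivalent software on Linux has often been a reason why people are unwilling to switch. Creating an environment on Linux in which users can run Windows software is therefore a way of opening up the market, and the system call modify_ldt() was introduced for the needs of WINE development. When the func argument is 0, the call returns the actual size of the process's local descriptor table, with the table contents placed in the buffer supplied by the user through ptr. When func is 1, ptr should point to a structure modify_ldt_ldt_s and bytecount should be sizeof(struct modify_ldt_ldt_s). The structure is defined in include/asm-i386/ldt.h:

==================== include/asm-i386/ldt.h 15 25 ====================
struct modify_ldt_ldt_s {
	unsigned int entry_number;
	unsigned long base_addr;
	unsigned int limit;
	unsigned int seg_32bit:1;
	unsigned int contents:2;
	unsigned int read_exec_only:1;
	unsigned int limit_in_pages:1;
	unsigned int seg_not_present:1;
	unsigned int useable:1;
};

Here entry_number is the number, that is the index, of the entry to be changed. The remaining members give the values to be set in the various bit fields.

Readers may ask: does this not punch a hole in the memory management mechanism? If a process can change its local descriptor table, can it not find a way to intrude into the space of other processes or of the kernel? There are two sides to this. On the one hand it does open a small gap in the memory management mechanism; on the other hand, behind it still stands the paged memory management of the Linux kernel, and as long as user processes are given no means of modifying page directories and page tables, the system remains safe.

2.2.2 vm86(struct vm86_struct *info)

Similar to modify_ldt(), there is another system call, vm86(), used for emulating DOS software on Linux. The i386 CPU provides a special addressing mode, VM86, for running real-mode software under protected mode. Its purpose is to give protected-mode systems (such as Windows and OS/2) compatibility with real-mode software (usually DOS software). By now, DOS software that needs to be emulated has become rare, or has disappeared altogether.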
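For that reason the section of this book on 80386 addressing modes omitted VM86 mode. Interested readers can consult Intel's technical documentation and read the relevant kernel source, mainly include/asm-i386/vm86.h and arch/i386/kernel/vm86.c.

Clearly, these two system calls and the functions built on them do not really belong to the memory management framework of the Linux kernel itself; they are expedients adopted for compatibility with Windows and DOS software.

2.3 Some Important Data Structures and Functions

From the hardware point of view, once the Linux kernel has prepared the page directory PGD, the page tables PT, the global descriptor table GDT and the local descriptor table LDT, and has set the relevant registers correctly, the preparation for the address mapping part of memory management is complete. Although address mapping is the ultimate goal, the management work the kernel actually has to do is far more complex. In the kernel code related to memory management a few data structures are particularly important; these structures and their use form the basic framework of memory management in the code.

The page directory PGD, the middle directory PMD and the page table PT are arrays of entries of type pgd_t, pmd_t and pte_t respectively. These entries are themselves data structures, defined in include/asm-i386/page.h:

==================== include/asm-i386/page.h 36 50 ====================
/*
 * These are used to make use of C type-checking..
 */
#if CONFIG_X86_PAE
typedef struct { unsigned long pte_low, pte_high; } pte_t;
typedef struct { unsigned long long pmd; } pmd_t;
typedef struct { unsigned long long pgd; } pgd_t;
#define pte_val(x) ((x).pte_low | ((unsigned long long)(x).pte_high << 32))
#else
typedef struct { unsigned long pte_low; } pte_t;
typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pgd; } pgd_t;
#define pte_val(x) ((x).pte_low)
#endif
#define PTE_MASK PAGE_MASK

So with 32-bit addresses pgd_t, pmd_t and pte_t are in effect long integers, and with 36-bit addresses they are long long integers. They are not defined directly as integers because wrapping them in structures lets gcc apply stricter type checking at compile time. The code also defines a few simple accessors for the members of these structures, such as pte_val() and pgd_val() (no wonder some people say the Linux kernel code has absorbed object-oriented programming techniques). As mentioned before, however, a PTE used as a pointer needs only its upper 20 bits. All physical pages are aligned on 4KB boundaries, so the upper 20 bits of a physical page's start address can also be regarded as the number of the physical page. The lower 12 bits of pte_t are therefore used for the page's status information and access permissions. The kernel code does not define these bit fields inside pte_t and the other structures; instead include/asm-i386/page.h separately defines a structure pgprot_t that describes page protection:

==================== include/asm-i386/page.h 52 52 ====================
typedef struct { unsigned long pgprot; } pgprot_t;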
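How the two halves of an entry fit together can be sketched as follows. This is an illustration under stated assumptions, not kernel code: uint32_t stands in for the 32-bit unsigned long of the i386 so the sketch behaves the same on 64-bit hosts, PAGE_SHIFT is taken as 12 for 4KB pages, and compose_pte, pte_frame and pte_flags are hypothetical helper names.

```c
#include <stdint.h>

/* 32-bit analogues of the kernel's wrapper types. */
typedef struct { uint32_t pte_low; } pte_t;
typedef struct { uint32_t pgprot; } pgprot_t;

#define PAGE_SHIFT    12               /* 4KB pages */
#define pte_val(x)    ((x).pte_low)
#define pgprot_val(x) ((x).pgprot)

/* Combines a physical page number (upper 20 bits) with the protection
 * and status bits (lower 12 bits) into one entry. */
static pte_t compose_pte(uint32_t frame_nr, pgprot_t prot)
{
    pte_t pte = { (frame_nr << PAGE_SHIFT) | pgprot_val(prot) };
    return pte;
}

/* Physical page number stored in the entry. */
static uint32_t pte_frame(pte_t pte)
{
    return pte_val(pte) >> PAGE_SHIFT;
}

/* Status and permission bits stored in the entry. */
static uint32_t pte_flags(pte_t pte)
{
    return pte_val(pte) & ((1u << PAGE_SHIFT) - 1);
}
```

The benefit of the struct wrappers shows up at compile time: passing a bare integer where compose_pte() expects a pgprot_t is rejected by the compiler, whereas with plain integer typedefs the mistake would go unnoticed.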
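The value of pgprot corresponds to the lower 12 bits of an i386 MMU page table entry, of which 9 bits are flags describing the current status and access permissions of the mapped page (see Chapter 1 for details). The kernel code defines them accordingly:

==================== include/asm-i386/pgtable.h 162 172 ====================
#define _PAGE_PRESENT	0x001
#define _PAGE_RW	0x002
#define _PAGE_USER	0x004
#define _PAGE_PWT	0x008
#define _PAGE_PCD	0x010
#define _PAGE_ACCESSED	0x020
#define _PAGE_DIRTY	0x040
#define _PAGE_PSE	0x080	/* 4 MB (or 2MB) page, Pentium+, if present.. */
#define _PAGE_GLOBAL	0x100	/* Global TLB entry PPro+ */

#define _PAGE_PROTNONE	0x080	/* If not present */

Note that _PAGE_PROTNONE corresponds to bit 7 of the page table entry. Intel's manual describes this bit as reserved, so it has no effect on the MMU.

In actual use the value of pgprot is always less than 0x1000, while the pointer part of a pte is always a multiple of 0x1000; combining the two gives the entry actually stored in the page table. The calculation is performed by the macro __mk_pte defined in include/asm-i386/pgtable-2level.h:

==================== include/asm-i386/pgtable-2level.h 61 61 ====================
#define __mk_pte(page_nr,pgprot) __pte(((page_nr) << PAGE_SHIFT) | pgprot_val(pgprot))

The page number is shifted left by 12 bits and ORed with the page's control/status bits, giving the value of the entry. The two macros used here are both defined in include/asm-i386/page.h:

==================== include/asm-i386/page.h 56 58 ====================
#define pgprot_val(x)	((x).pgprot)
#define __pte(x) ((pte_t) { (x) } )

The kernel has a global variable mem_map, a pointer to an array of page data structures (the page structure is discussed below). Each page structure represents one physical page, and the whole array represents all physical pages in the system. The upper 20 bits of a page table entry thus have different meanings for software and for the MMU hardware. For software they are the number of a physical page, which can be used as an index into mem_map to find the page structure representing that physical page. For the hardware they are (after twelve 0 bits are appended at the low end) the start address of the physical page.

Another frequently used macro is set_pte(), which stores a value into a page table entry. It is defined in include/asm-i386/pgtable-2level.h:

==================== include/asm-i386/pgtable-2level.h 42 42 ====================
#define set_pte(pteptr, pteval) (*(pteptr) = pteval)

During mapping the MMU first checks the P flag, the _PAGE_PRESENT above, which indicates whether the mapped page is in memory. Only if the P flag is 1 does the MMU complete the mapping; otherwise the mapping cannot be completed and a page fault exception is raised, in which case the rest of the entry has no meaning whatever to the MMU. Besides the MMU hardware mapping pages according to the contents of page table entries, software can also set or test those contents; set_pte() above is used to set an entry. The kernel also defines a number of utility functions and macros for testing the contents of page table entries, the most important of which are:

==================== include/asm-i386/pgtable-2level.h 60 60 ====================
#define pte_none(x) (!(x).pte_low)
==================== include/asm-i386/pgtable.h 248 248 ====================
#define pte_present(x) ((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))

For software, a page table entry equal to 0 means that no mapping has yet been established for this entry (for the virtual page it represents), so it is still blank. An entry that is nonzero but has a P flag of 0 means that the mapping has been established but the mapped physical page is not in memory (it has been swapped out to a swap device; see the later discussion of page swapping).

==================== include/asm-i386/pgtable.h 269 271 ====================
static inline int pte_dirty(pte_t pte) { return (pte).pte_low & _PAGE_DIRTY; }
static inline int pte_young(pte_t pte) { return (pte).pte_low & _PAGE_ACCESSED; }
static inline int pte_write(pte_t pte) { return (pte).pte_low & _PAGE_RW; }

Of course, these flags are meaningful only when the P flag is 1.

As stated above, when the P flag of a page table entry is 1, its upper 20 bits are the upper 20 bits of the start address of the corresponding physical page; since the start address of a physical page is necessarily page-aligned, its lower 12 bits are always 0. If the whole of physical memory is viewed as an "array" of physical pages, these upper 20 bits (after a right shift by 12) are the array index, that is, the number of the physical page. With this index the structure representing the target physical page can be found in the array of page structures mentioned above. The code defines a macro for this purpose (include/asm-i386/pgtable-2level.h):

==================== include/asm-i386/pgtable-2level.h 59 59 ====================
#define pte_page(x) (mem_map+((unsigned long)(((x).pte_low >> PAGE_SHIFT))))

Since mem_map is a pointer to page structures, the result of the operation is also a pointer to a page structure; mem_map+x is the same as &mem_map[x]. Kernel code also frequently needs to find the page structure of the physical page corresponding to a virtual address, so a macro is defined for that as well (include/asm-i386/page.h):

==================== include/asm-i386/page.h 117 117 ====================
#define virt_to_page(kaddr) (mem_map + (__pa(kaddr) >> PAGE_SHIFT))

The page data structure representing a physical page is defined in include/linux/mm.h:

==================== include/linux/mm.h 126 148 ====================
/*
 * Try to keep the most commonly accessed fields in single cache lines
 * here (16 bytes or greater). This ordering should be particularly
 * beneficial on 32-bit processors.
 *
 * The first line is data used in page cache lookup, the second line
 * is used for linear searches (eg. clock algorithm scans).
 */
typedef struct page {
	struct list_head list;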
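	struct address_space *mapping;
	unsigned long index;
	struct page *next_hash;
	atomic_t count;
	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
	struct list_head lru;
	unsigned long age;
	wait_queue_head_t wait;
	struct page **pprev_hash;
	struct buffer_head * buffers;
	void *virtual; /* non-NULL if kmapped */
	struct zone_struct *zone;
} mem_map_t;

Variables of this type in the kernel are usually named page or map.

When the contents of a page come from a file, index is the number of the page within the file. When the contents have been swapped out to a swap device but are still kept in memory as a buffer, index indicates where the page went. The order of the members of the structure is deliberate: the aim is that closely related members are, as far as possible, loaded into the same cache line (16 bytes) at run time.

Every physical page in the system has a page structure (or mem_map_t). At initialization the system builds an array of page structures, mem_map, according to the size of physical memory. It serves as the "warehouse" of physical pages; each page structure in it represents one physical page, and the index of a page's structure in this array is the number of that physical page. The physical pages in the warehouse are divided into two management zones, ZONE_DMA and ZONE_NORMAL (depending on the system configuration there may be a third zone, ZONE_HIGHMEM, for storage whose physical addresses lie above 1GB).

The pages in ZONE_DMA are reserved for DMA. Why must pages for DMA be managed separately? First, DMA pages are indispensable for disk I/O: if all physical pages in the warehouse had been handed out, pages could no longer be exchanged with the disk. There are also some special reasons. In the i386 CPU the hardware support for paging is implemented inside the CPU rather than, as on some other CPUs, in a separate MMU, so DMA does not pass through the address mapping provided by the MMU. Peripheral devices therefore have to supply the addresses of physical pages directly, yet some devices (in particular interface cards on the ISA bus) have limitations here and require that the physical addresses used for DMA not be too high. Moreover, precisely because DMA bypasses MMU address mapping, when a DMA buffer is larger than one physical page the pages must be physically contiguous, since the DMA controller cannot rely on the MMU inside the CPU to map contiguous virtual pages onto physically discontiguous ones. For these reasons the physical pages used for DMA are managed separately.

Each zone has a data structure of type zone_struct. The zone_struct contains a set of "free area" (free_area_t) queues. Why a set of queues rather than one? Because it is often necessary to allocate several pages that are contiguous in physical space as a block, so free pages are managed separately by block size. The zone structure thus has one queue for single pages (contiguous length 1), another for blocks of contiguous length 2, and further queues for blocks of length 4, 8, 16, and so on. The constant MAX_ORDER is defined as 10, and since the array free_area[MAX_ORDER] holds ten queues (orders 0 to 9), the largest contiguous block can be 2^9 = 512 pages, or 2MB. These two data structures and several constants are defined in include/linux/mmzone.h:

==================== include/linux/mmzone.h 11 58 ====================
/*
 * Free memory management - zoned buddy allocator.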
13 */ 14 15 #define MAX_ORDER 10 16 17 typedef struct free_area_struct { 18 struct list_head free_list; 19 unsigned int *map; 20 } free_area_t; 21 22 struct pglist_data; 23 24 typedef struct zone_struct { 25 /* 26 * Commonly accessed fields: 27 */ 28 spinlock_t lock; 29 unsigned long offset; 30 unsigned long free_pages; 31 unsigned long inactive_clean_pages; 32 unsigned long inactive_dirty_pages; 33 unsigned long pages_min, pages_low, pages_high; 34 35 /* 36 * free areas of different sizes 37 */ 38 struct list_head inactive_clean_list; 39 free_area_t free_area[MAX_ORDER]; 40 41 /* 42 * rarely used fields: 43 */ 44 char *name; 45 unsigned long size; 46 /* 47 * Discontig memory support fields. 48 */ 49 struct pglist_data *zone_pgdat; 50 unsigned long zone_start_paddr; 51 unsigned long zone_start_mapnr; 52 struct page *zone_mem_map; 53 } zone_t; 54 55 #define ZONE_DMA 0 56 #define ZONE_NORMAL 1 57 #define ZONE_HIGHMEM 2 58 #define MAX_NR_ZONES 3 ㅵ⧚ऎ㒧ᵘЁⱘ offset 㸼⼎䆹ߚऎ೼ mem_map Ёⱘ䍋ྟ义䴶োDŽϔᮺᓎゟ䍋ㅵ⧚ऎˈ↣Ͼ⠽⧚义 Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
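The size classes kept in free_area[] can be illustrated with a few lines of user-space C. This is only a sketch under stated assumptions: pages_for_order(), order_for_pages() and bytes_for_order() are hypothetical helper names, not kernel functions, and the 4 KB page size is the i386 value. It shows which queue a request for a given number of contiguous pages would have to be served from.

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* 4 KB pages, as on i386 */
#define MAX_ORDER  10   /* value from include/linux/mmzone.h above */

/* Number of pages in a block of the given order: 1, 2, 4, ... */
static unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/*
 * Smallest order whose block holds at least nr_pages pages, or -1 if
 * no order in 0..MAX_ORDER-1 is large enough (hypothetical helper).
 */
static int order_for_pages(unsigned long nr_pages)
{
    unsigned int order;

    if (nr_pages == 0)
        return -1;
    for (order = 0; order < MAX_ORDER; order++)
        if (pages_for_order(order) >= nr_pages)
            return (int)order;
    return -1;
}

/* Size in bytes of a block of the given order. */
static unsigned long bytes_for_order(unsigned int order)
{
    return pages_for_order(order) << PAGE_SHIFT;
}
```

A request for 3 pages, for instance, has to be rounded up to an order-2 block of 4 pages, which is why the allocator keeps one queue per power of two rather than a single queue.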
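The list_head structure used in free_area_struct to maintain a doubly linked queue is a general-purpose data structure; wherever the Linux kernel needs a doubly linked queue it uses this structure. It is very simple, consisting of just two pointers, prev and next. Looking back at the page structure above, its first member is a list_head, and it is through this member that the page structure of a physical page is placed on the doubly linked queue in a free_area_struct. In the section "Allocation of Physical Pages" we will describe how the kernel allocates a block of physical space, that is, a number of contiguous physical pages, from its warehouse.

In traditional computer architectures the whole physical address space is uniform: the time the CPU needs to access any address in it is the same. This is called "Uniform Memory Architecture", or UMA. In some newer architectures, however, especially in multi-CPU systems, this uniformity of the physical memory space becomes a problem. Consider the following system structure:

· The center of the system is a bus, for example a PCI bus.
· Several CPU modules are attached to the system bus. Each CPU module has its own local physical memory, but can also access the memory on other CPU modules through the system bus.
· A shared memory module is also attached to the system bus, and every CPU module can access it through the bus.
· The addresses of all this physical memory adjoin one another and form one contiguous physical address space.

Clearly, for any particular CPU, access to its local memory is fastest, while access across the system bus to the shared memory module or to memory on another CPU module is slower, and is also subject to uncertainty caused by possible contention. In other words, in such a system the physical memory space is contiguous in address but not uniform in "quality", hence the name "Non-Uniform Memory Architecture", or NUMA. In a NUMA system, when several contiguous physical pages are allocated, it is generally required that they be allocated within a region of uniform quality (called a node). For example, suppose CPU module 1 requests 4 physical pages, but because space on that module is running short, the first 3 pages are allocated on the local module while the last page ends up on CPU module 2. That is obviously unsuitable. In such a situation it would clearly be far better to allocate all 4 pages on the shared module.

In fact, UMA in the strict sense hardly exists. Even in the simplest single-CPU PC, the physical memory space includes RAM, ROM (for the BIOS), and the static RAM on the graphics card. In a UMA system, however, the memories other than "main memory" RAM are all small, so it is enough to place them at special addresses as small "islands" and to take particular care when programming. In a typical NUMA system, by contrast, support from the kernel's memory management is needed. As multiprocessor systems have become increasingly widespread, version 2.4.0 of the Linux kernel provides support for NUMA (as a compile-time option).

With the introduction of NUMA, the physical page management described above was revised accordingly. Zones are no longer the top-level entity; instead every memory node has at least two zones. The array of page structures mentioned earlier is likewise no longer global but belongs to a particular node. Thus, above the zone_struct structure (and the array of page structures) there is now another layer, the pglist_data structure that represents a memory node, defined in include/linux/mmzone.h:

==================== include/linux/mmzone.h 79 90 ====================
79  typedef struct pglist_data {
80      zone_t node_zones[MAX_NR_ZONES];
81      zonelist_t node_zonelists[NR_GFPINDEX];
82      struct page *node_mem_map;
83      unsigned long *valid_addr_bitmap;
84      struct bootmem_data *bdata;
85      unsigned long node_start_paddr;
86      unsigned long node_start_mapnr;
87      unsigned long node_size;
88      int node_id;
89      struct pglist_data *node_next;
90  } pg_data_t;

Evidently, the pglist_data structures of several memory nodes can be linked into a singly linked queue through the pointer node_next. In each structure the pointer node_mem_map points to the array of page structures of that node, and the array node_zones[] holds the node's zones, at most three. Conversely, zone_struct contains a pointer zone_pgdat that points to the pglist_data structure of the node it belongs to.

In addition, pglist_data contains an array node_zonelists[], whose element type is defined in the same file:

==================== include/linux/mmzone.h 71 74 ====================
71  typedef struct zonelist_struct {
72      zone_t * zones [MAX_NR_ZONES+1]; // NULL delimited
73      int gfp_mask;
74  } zonelist_t;

Here zones[] is an array of pointers whose elements point to specific zones in a particular order: when allocating pages, the zone pointed to by zones[0] is tried first; if it cannot satisfy the request, the zone pointed to by zones[1] is tried, and so on. These zones may belong to different memory nodes. For the example given above, one can thus specify: first try the ZONE_DMA zone of the local node, CPU module 1; if it cannot supply 4 pages, allocate all of them from the ZONE_DMA zone of the shared module. In other words, each zonelist_t specifies one allocation policy. A memory node should not be limited to a single allocation policy, however, so pglist_data provides an array of zonelist_t of size NR_GFPINDEX, which is defined as:

==================== include/linux/mmzone.h 76 76 ====================
76  #define NR_GFPINDEX  0x100

That is, up to 256 different policies can be specified. A request for pages must state which allocation policy is to be used.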
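The fallback order expressed by a zonelist_t, described above, can be sketched as a walk over a NULL-terminated pointer array. The following is a minimal illustration, not the kernel's allocator: struct zone_sketch, struct zonelist_sketch and alloc_from_zonelist() are hypothetical names, only a free-page counter is modeled, and the real code's watermarks and buddy queues are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_ZONES 3

/* Simplified stand-in for zone_t: only what the sketch needs. */
struct zone_sketch {
    const char *name;
    unsigned long free_pages;
};

/* Simplified stand-in for zonelist_t. */
struct zonelist_sketch {
    struct zone_sketch *zones[MAX_NR_ZONES + 1];   /* NULL delimited */
};

/*
 * Walk the list in order and take all the pages from the first zone
 * that can supply them. Returns the chosen zone, or NULL if none can.
 */
static struct zone_sketch *alloc_from_zonelist(struct zonelist_sketch *zl,
                                               unsigned long nr_pages)
{
    struct zone_sketch **zp;

    for (zp = zl->zones; *zp != NULL; zp++) {
        if ((*zp)->free_pages >= nr_pages) {
            (*zp)->free_pages -= nr_pages;
            return *zp;
        }
    }
    return NULL;
}
```

With a list of {local ZONE_DMA, shared ZONE_DMA, NULL} and only 3 free pages in the local zone, a request for 4 pages is served entirely from the shared zone, which matches the policy in the CPU module example.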
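The data structures above are all used for managing physical space. Now let us look at the management of virtual space, that is, of virtual memory pages. Unlike physical space, virtual space is not managed through one overall warehouse of pages; it is managed per process, and every process has its own virtual (user) space. As noted earlier, though, the "system space" is shared uniformly by all processes. From here on we will often use the terms "virtual space" and "user space" of a process interchangeably.

If physical space is managed from the point of view of supply ("what is still in the warehouse?"), then virtual space is managed from the point of view of demand ("which parts of the virtual space do we need?"). Take the user-space portion of the virtual space: probably no process really needs the full 3 GB. Moreover, the parts of the virtual space that a process needs are not necessarily contiguous; they usually form a number of separate virtual memory "areas". Quite naturally, the abstraction of such an area is an important data structure. In the Linux kernel it is vm_area_struct, defined in include/linux/mm.h:

==================== include/linux/mm.h 35 69 ====================
35  /*
36   * This struct defines a memory VMM memory area. There is one of these
37   * per VM-area/task.  A VM area is any part of the process virtual memory
38   * space that has a special rule for the page-fault handlers (ie a shared
39   * library, the executable area etc).
40   */
41  struct vm_area_struct {
42      struct mm_struct * vm_mm;   /* VM area parameters */
43      unsigned long vm_start;
44      unsigned long vm_end;
45
46      /* linked list of VM areas per task, sorted by address */
47      struct vm_area_struct *vm_next;
48
49      pgprot_t vm_page_prot;
50      unsigned long vm_flags;
51
52      /* AVL tree of VM areas per task, sorted by address */
53      short vm_avl_height;
54      struct vm_area_struct * vm_avl_left;
55      struct vm_area_struct * vm_avl_right;
56
57      /* For areas with an address space and backing store,
58       * one of the address_space->i_mmap{,shared} lists,
59       * for shm areas, the list of attaches, otherwise unused.
60       */
61      struct vm_area_struct *vm_next_share;
62      struct vm_area_struct **vm_pprev_share;
63
64      struct vm_operations_struct * vm_ops;
65      unsigned long vm_pgoff;     /* offset in PAGE_SIZE units, *not* PAGE_CACHE_SIZE */
66      struct file * vm_file;
67      unsigned long vm_raend;
68      void * vm_private_data;     /* was vm_pte (shared mem) */
69  };

In kernel code, variables referring to this data structure are usually named vma.

The members vm_start and vm_end delimit a virtual memory area. vm_start is included in the area, whereas vm_end is not. The division into areas depends not only on the contiguity of addresses but also on other attributes of the area, chiefly the access permissions of its pages. If the first half and the second half of the pages in an address range have different access permissions or other attributes, the range must be split into two areas. All pages within one area therefore have the same access permissions (or protection attributes) and certain other attributes in common; this is what the members vm_page_prot and vm_flags are for. All areas belonging to the same process are linked together in order of ascending virtual address, which is the purpose of the vm_next pointer. Because the division into areas does not depend on address contiguity alone, the virtual (user) space of a process may well be divided into a large number of areas. Finding the area to which a given virtual address belongs is a very frequent operation in the kernel, and doing a linear search along the vm_next chain every time would noticeably hurt the kernel's efficiency.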
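Therefore, in addition to stringing all areas into a linear queue through the vm_next pointer, an AVL (Adelson-Velskii and Landis) tree can be built for them when the number of areas is large. An AVL tree is a balanced tree structure; as readers can learn from the literature on data structures, searching in an AVL tree is fast, with a cost of O(lg n), that is, proportional to the logarithm of the size of the tree rather than to its size. The three members vm_avl_height, vm_avl_left and vm_avl_right of vm_area_struct are used for the AVL tree and record the position of this area in the tree.

There are two situations in which virtual pages (or areas) become associated with disk files. One is swapping: when there are not enough memory pages to go round, pages that have not been used for a long time can be swapped out to disk, freeing physical pages for processes with more urgent needs. This is the familiar demand paging form of virtual memory management. The other is mapping a disk file into the user space of a process. Linux provides the mmap() system call (which actually dates from Unix System V R4.2), with which a process can map an already opened file into its user space and thereafter access the contents of the file as if it were an array of characters in memory, without going through file operations such as lseek(), read() or write().

Because of this association between virtual memory areas (ultimately pages) and disk files, vm_area_struct contains a number of members, such as vm_next_share, vm_pprev_share and vm_file, for recording and managing it. We will describe the use of these members later, in the context of specific scenarios.

Another important member of the area structure is vm_ops, a pointer to a vm_operations_struct. This data structure is also defined in include/linux/mm.h:

==================== include/linux/mm.h 115 124 ====================
115  /*
116   * These are the virtual MM functions - opening of an area, closing and
117   * unmapping it (needed to keep files on disk up-to-date etc), pointer
118   * to the functions called when a no-page or a wp-page exception occurs.
119   */
120  struct vm_operations_struct {
121      void (*open)(struct vm_area_struct * area);
122      void (*close)(struct vm_area_struct * area);
123      struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int write_access);
124  };

The structure consists entirely of function pointers. Of these, open, close and nopage are used for opening an area, closing it, and establishing a mapping, respectively. Why are these functions needed? Because different virtual memory areas may require different additional operations. The function pointer nopage designates the function to be called when a "page fault" exception (see Chapter 3) occurs because a (virtual) page is not in memory.

Finally, vm_area_struct also contains a pointer vm_mm, which points to an mm_struct data structure, defined in include/linux/sched.h:

==================== include/linux/sched.h 203 227 ====================
203  struct mm_struct {
204      struct vm_area_struct * mmap;        /* list of VMAs */
205      struct vm_area_struct * mmap_avl;    /* tree of VMAs */
206      struct vm_area_struct * mmap_cache;  /* last find_vma result */
207      pgd_t * pgd;
208      atomic_t mm_users;                   /* How many users with user space? */
209      atomic_t mm_count;                   /* How many references to "struct mm_struct" (users count as 1) */
210      int map_count;                       /* number of VMAs */
211      struct semaphore mmap_sem;
212      spinlock_t page_table_lock;
213
214      struct list_head mmlist;             /* List of all active mm's */
215
216      unsigned long start_code, end_code, start_data, end_data;
217      unsigned long start_brk, brk, start_stack;
218      unsigned long arg_start, arg_end, env_start, env_end;
219      unsigned long rss, total_vm, locked_vm;
220      unsigned long def_flags;
221      unsigned long cpu_vm_mask;
222      unsigned long swap_cnt;              /* number of pages to swap on next pass */
223      unsigned long swap_address;
224
225      /* Architecture-specific MM context */
226      mm_context_t context;
227  };

In kernel code, variables (pointers) referring to this data structure are usually named mm.

Clearly this data structure is used at a higher level than vm_area_struct. In fact each process has only one mm_struct, and the "process control block" of each process, the task_struct, contains a pointer to the process's mm_struct. One may say that mm_struct is the abstraction of the entire user space of a process and also its overall control structure. The first three pointers in the structure all concern virtual memory areas. The first, mmap, heads a singly linked linear queue of area structures. The second, mmap_avl, is the root of an AVL tree of area structures, as discussed above. The third, mmap_cache, points to the area structure used most recently; because the addresses used by a program tend to exhibit locality, the area used last is very likely the one that will be needed next, and this improves efficiency. Another member, map_count, tells how many area structures are in the queue (or AVL tree), that is, how many virtual memory areas the process has. The pointer pgd obviously points to the page directory of the process; when the kernel schedules a process to run, it converts this pointer into a physical address and writes it into control register CR3, as we have already seen. Furthermore, since mm_struct and the vm_area_struct structures under it may be accessed in different contexts, and these accesses must be mutually exclusive, the structure contains a semaphore for P and V operations, mmap_sem. The spinlock page_table_lock exists for a similar purpose.

Although a process uses only one mm_struct, an mm_struct may conversely be shared by several processes. The simplest example is that when a process creates a child (with vfork() or clone(), see Chapter 4), the child may share one mm_struct with its parent. For this reason mm_struct also contains the counters mm_users and mm_count. The type atomic_t is really just an integer, but operations on integers of this type must be "atomic", that is, they must not be disturbed by interrupts or anything else.

The pointer segment points to the local descriptor table (LDT) of the process. Ordinary processes do not use a local descriptor table, however; only in VM86 mode is there an LDT.

The purpose of the other members is fairly obvious: start_code, end_code, start_data, end_data and so on are the start and end points of the code segment, data segment, heap and stack segment in the process image, and need no further comment here. Take care not to confuse these "segments" of the process image with the "segments" of segmented memory management.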
As stated above, the mm_struct and the vm_area_struct structures under it only express the demand for virtual memory space. The existence of an area covering a virtual address does not guarantee that the page containing that address has been mapped to some physical page (in memory or on disk), much less that the page is in memory. When a page that has not been mapped is accessed, a "Page Fault" exception is raised, and the service routine for that exception then deals with the problem. In this sense, mm_struct and vm_area_struct describe the demand for pages; the page, zone_struct and related structures described earlier describe the supply of pages; and the page directory, middle directory and page tables are the bridge between the two.

Figure 2.5 is a schematic showing the relationships among the data structures used for the virtual memory management of a process.

Figure 2.5 Relationships among the virtual memory management data structures

As mentioned, the kernel frequently needs the following operation: given a virtual address belonging to some process, find the area it belongs to and the corresponding vm_area_struct. This is done by find_vma(), whose code is in mm/mmap.c:

==================== mm/mmap.c 404 440 ====================
404  /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
405  struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr)
406  {
407      struct vm_area_struct *vma = NULL;
408
409      if (mm) {
410          /* Check the cache first. */
411          /* (Cache hit rate is typically around 35%.) */
412          vma = mm->mmap_cache;
413          if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
414              if (!mm->mmap_avl) {
415                  /* Go through the linear list. */
416                  vma = mm->mmap;
417                  while (vma && vma->vm_end <= addr)
418                      vma = vma->vm_next;
419              } else {
420                  /* Then go through the AVL tree quickly. */
421                  struct vm_area_struct * tree = mm->mmap_avl;
422                  vma = NULL;
423                  for (;;) {
424                      if (tree == vm_avl_empty)
425                          break;
426                      if (tree->vm_end > addr) {
427                          vma = tree;
428                          if (tree->vm_start <= addr)
429                              break;
430                          tree = tree->vm_avl_left;
431                      } else
432                          tree = tree->vm_avl_right;
433                  }
434              }
435              if (vma)
436                  mm->mmap_cache = vma;
437          }
438      }
439      return vma;
440  }

When we speak of a particular user-space virtual address we must say which process's virtual space it belongs to, so the function takes two parameters: the address, and a pointer to the mm_struct of the process. It first checks whether the address happens to lie in the same area that was accessed last time. According to the comment added by the author of the code, the hit rate is typically around 35%, which is exactly why mm_struct has the mmap_cache pointer. On a miss a search is needed. If an AVL structure has been built (mmap_avl is nonzero) the search is made in the AVL tree, otherwise in the linear queue. Finally, if an area is found, mmap_cache is set to point to the vm_area_struct that was found. A return value of zero (NULL) means that no area has yet been set up for the address. In that case a new area structure usually has to be created and then inserted into the linear queue or AVL tree of the mm_struct by calling insert_vm_struct(). The source of insert_vm_struct() is in the same file:

==================== mm/mmap.c 961 968 ====================
961  void insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
962  {
963      lock_vma_mappings(vmp);
964      spin_lock(&current->mm->page_table_lock);
965      __insert_vm_struct(mm, vmp);
966      spin_unlock(&current->mm->page_table_lock);
967      unlock_vma_mappings(vmp);
968  }

The actual insertion of a vm_area_struct into the queue is done by __insert_vm_struct(), but this operation must on no account be disturbed, so it is performed under lock. Two locks are taken here. The first is on the vm_area_struct representing the new area; the second is on the mm_struct representing the whole virtual space. Together they prevent any other process from cutting in halfway and performing queue operations on the same two data structures. Below is the main body of __insert_vm_struct(); we have omitted the part of the code that deals with file mappings. Since it closely resembles find_vma(), we give no explanation and leave it for the reader. Readers unfamiliar with AVL trees may read only the part that does not use the AVL tree, that is, the part where mm->mmap_avl is 0.

==================== mm/mmap.c 913 939 ====================
913  /* Insert vm structure into process list sorted by address
914   * and into the inode's i_mmap ring.  If vm_file is non-NULL
915   * then the i_shared_lock must be held here.
916   */
917  void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
918  {
919      struct vm_area_struct **pprev;
920      struct file * file;
921
922      if (!mm->mmap_avl) {
923          pprev = &mm->mmap;
924          while (*pprev && (*pprev)->vm_start <= vmp->vm_start)
925              pprev = &(*pprev)->vm_next;
926      } else {
927          struct vm_area_struct *prev, *next;
928          avl_insert_neighbours(vmp, &mm->mmap_avl, &prev, &next);
929          pprev = (prev ? &prev->vm_next : &mm->mmap);
930          if (*pprev != next)
931              printk("insert_vm_struct: tree inconsistent with list\n");
932      }
933      vmp->vm_next = *pprev;
934      *pprev = vmp;
935
936      mm->map_count++;
937      if (mm->map_count >= AVL_MIN_MAP_COUNT && !mm->mmap_avl)
938          build_mmap_avl(mm);
939
     ......
==================== mm/mmap.c 959 959 ====================
959  }

When the number of areas in a virtual space is small, the efficiency of searching the linear queue is not a problem, so no AVL tree needs to be built. Once the number of areas grows to AVL_MIN_MAP_COUNT, that is, 32, an AVL tree is built by build_mmap_avl() to speed up the search.
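For readers who want to experiment with the list-only path outside the kernel, the lookup and insertion logic above can be reproduced in a small user-space sketch. The names struct vma_sketch, struct mm_sketch, find_vma_sketch() and insert_vma_sketch() are hypothetical; locking, the AVL tree and file mappings are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct vma_sketch {
    unsigned long vm_start;        /* inclusive */
    unsigned long vm_end;          /* exclusive */
    struct vma_sketch *vm_next;    /* next area, sorted by address */
};

struct mm_sketch {
    struct vma_sketch *mmap;        /* head of the sorted list */
    struct vma_sketch *mmap_cache;  /* result of the last lookup */
    int map_count;                  /* number of areas */
};

/* First area with addr < vm_end, or NULL; mirrors the list branch of find_vma(). */
static struct vma_sketch *find_vma_sketch(struct mm_sketch *mm, unsigned long addr)
{
    struct vma_sketch *vma = mm->mmap_cache;

    if (vma && vma->vm_end > addr && vma->vm_start <= addr)
        return vma;                 /* cache hit */
    vma = mm->mmap;
    while (vma && vma->vm_end <= addr)
        vma = vma->vm_next;
    if (vma)
        mm->mmap_cache = vma;
    return vma;
}

/* Insert while keeping the list sorted by vm_start, like the list branch of __insert_vm_struct(). */
static void insert_vma_sketch(struct mm_sketch *mm, struct vma_sketch *vmp)
{
    struct vma_sketch **pprev = &mm->mmap;

    while (*pprev && (*pprev)->vm_start <= vmp->vm_start)
        pprev = &(*pprev)->vm_next;
    vmp->vm_next = *pprev;
    *pprev = vmp;
    mm->map_count++;
}
```

Note that for an address lying in a hole between two areas, the lookup returns the area above the hole rather than NULL. The caller must therefore still compare vm_start with the address, which is exactly what do_page_fault() does in the next section.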
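2.4 Out-of-Bounds Access

The paging mechanism translates every linear address (which may also be thought of as a virtual address) into a physical address by way of the page directory and page tables. If some obstacle is met in the process, so that the CPU cannot finally reach the corresponding physical memory cell, the mapping fails and the current instruction cannot be completed. The CPU then raises a Page Fault exception and executes the designated page fault handler, so that the application can either resume execution at the instruction that was suspended by the failed mapping or have some cleanup done on its behalf. The obstacle may be one of the following:

· The corresponding page directory entry or page table entry is empty, that is, the mapping from this linear address to a physical address has not yet been established, or has been removed.
· The corresponding physical page is not in memory.
· The kind of access specified by the instruction conflicts with the permissions of the page, for example an attempt to write to a "read-only" page.

In this scenario we assume that a user program once mapped an open file into memory with the mmap() system call and later removed the mapping (with the munmap() system call). Removing a mapped area often leaves an isolated hole in the virtual address space, and the corresponding addresses should no longer be used. User programs often contain errors, however, and somewhere in the program the removed region is accessed once more (programmers will surely agree that this is hardly surprising). At that point the out-of-bounds access to an invalid address causes the mapping to fail, and a page fault exception results. The response mechanism for interrupt requests and exceptions is covered in the chapter "Interrupts and Exceptions", where the reader can find the whole path from the occurrence of an exception to entry into the corresponding kernel service routine. Here we assume that the CPU has already arrived at the entry of do_page_fault(), the main body of the page fault service routine.

The code of do_page_fault() is in arch/i386/mm/fault.c. The function is rather long, and we will show the relevant fragments as the scenario unfolds. Let us first look at its opening lines:

==================== arch/i386/mm/fault.c 106 152 ====================
106  asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
107  {
108      struct task_struct *tsk;
109      struct mm_struct *mm;
110      struct vm_area_struct * vma;
111      unsigned long address;
112      unsigned long page;
113      unsigned long fixup;
114      int write;
115      siginfo_t info;
116
117      /* get the address */
118      __asm__("movl %%cr2,%0":"=r" (address));
119
120      tsk = current;
121
122      /*
123       * We fault-in kernel-space virtual memory on-demand. The
124       * 'reference' page table is init_mm.pgd.
125       *
126       * NOTE! We MUST NOT take any locks for this case. We may
127       * be in an interrupt or a critical region, and should
128       * only copy the information from the master page table,
129       * nothing more.
130       */
131      if (address >= TASK_SIZE)
132          goto vmalloc_fault;
133
134      mm = tsk->mm;
135      info.si_code = SEGV_MAPERR;
136
137      /*
138       * If we're in an interrupt or have no user
139       * context, we must not take the fault..
140       */
141      if (in_interrupt() || !mm)
142          goto no_context;
143
144      down(&mm->mmap_sem);
145
146      vma = find_vma(mm, address);
147      if (!vma)
148          goto bad_area;
149      if (vma->vm_start <= address)
150          goto good_area;
151      if (!(vma->vm_flags & VM_GROWSDOWN))
152          goto bad_area;

First comes a line of assembly. Why assembly? When the i386 CPU raises a page fault exception, it places the linear address that caused the mapping to fail in control register CR2, and this is obviously information the service routine must have. The C language has no construct for reading CR2, so assembly is the only option. This line of assembly has an output part but no input part; it binds %0 to the variable address and states that the variable should be placed in a register.

The kernel's interrupt/exception response mechanism also passes in two parameters. One is regs, a pointer to a pt_regs structure holding a copy of the CPU registers as they were just before the exception occurred; this is the "scene" saved by the kernel's interrupt response mechanism. The other, error_code, further specifies the reason for the failed mapping.

Next the task_struct of the current process is obtained. In the kernel the macro current yields the address of the task_struct of the current (currently running) process. Each task_struct contains a pointer to the process's mm_struct, and all the information concerning virtual memory management and mapping is in that structure. It should be pointed out that the mapping actually performed by the CPU does not involve mm_struct; it goes through the page directory and page tables as described earlier. The mm_struct, however, reflects, or rather describes, that mapping.

Two special cases must then be checked. One is that in_interrupt() returns nonzero, meaning that the mapping failure occurred inside an interrupt service routine and therefore has nothing to do with the current process. The other is that the mm pointer of the current process is null, meaning that no mapping has been set up for the process, so the failure again cannot be related to the current process. But if it is unrelated to the current process and in_interrupt() nevertheless returns 0, where did the exception occur? It still occurred in some interrupt or exception service routine, only outside the range that in_interrupt() can detect. In these special cases control is transferred by a goto statement to the label no_context; since that is irrelevant to our scenario, we omit the discussion of that code.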
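The operations that follow require mutual exclusion, that is, they must not be disturbed by other processes, so they are protected by P/V operations on a semaphore, namely down() and up(). For this purpose mm_struct contains the semaphore mmap_sem. After down() returns, no other process can interfere.

As one might expect, once the faulting address and the process it belongs to are known, the next thing to establish is whether the address falls within an area for which a mapping has been set up, and if so which one. That is exactly the job of find_vma(). As explained earlier, find_vma() tries to find, in a virtual space, the first area whose end address is greater than the given address. If none is found, the page fault must have been caused by an out-of-bounds access. Under what circumstances is none found? Recall how the kernel lays out the user virtual space: the stack is at the top of the user region and grows downward, while the code and data of the process are allocated from the bottom up. If no area has an end address higher than the given address, the address lies above the stack, that is, at 3 GB or higher. An access from user space to space belonging to the system is of course out of bounds, and control goes to bad_area; but that is not the case in our scenario.

If such an area is found, and its start address is not higher than the given address (see line 149), then the given address falls squarely within the area. The mapping has therefore certainly been established, and control goes to good_area for further examination of why the access failed. That is not our scenario either.

Apart from these two cases, what remains is that the given address falls in a hole between two areas, that is, the mapping of the page containing the address has not been established or has been removed. There can be two different kinds of hole in the user virtual space. There is only one hole of the first kind: the large hole below the stack area, which represents the space available for dynamic allocation (through the brk() system call) and not yet allocated. When the faulting address falls in this hole there is a further special case to consider, which we discuss in the next scenario. But how do we know whether the address falls in this hole? Look at line 151. We know that the stack area grows downward; if the area found by find_vma() is the stack area, its vm_flags should have the VM_GROWSDOWN flag set. If that flag is 0, the area above the hole is not the stack area, and the hole was left by the removal of a mapped area, or a block of addresses was skipped when mappings were set up. This is the second kind, and it is the case in our scenario. So we follow the goto statement to bad_area, which is at line 224:

==================== arch/i386/mm/fault.c 220 239 ====================
[do_page_fault()]
220  /*
221   * Something tried to access memory that isn't in our memory map..
222   * Fix it, but check if it's kernel or user first..
223   */
224  bad_area:
225      up(&mm->mmap_sem);
226
227  bad_area_nosemaphore:
228      /* User mode accesses just cause a SIGSEGV */
229      if (error_code & 4) {
230          tsk->thread.cr2 = address;
231          tsk->thread.error_code = error_code;
232          tsk->thread.trap_no = 14;
233          info.si_signo = SIGSEGV;
234          info.si_errno = 0;
235          /* info.si_code has been set above */
236          info.si_addr = (void *)address;
237          force_sig_info(SIGSEGV, &info, tsk);
238          return;
239      }

First, by the time control reaches this point mutual exclusion is no longer needed (no further operations are performed on the mm_struct), so the critical section is left through up(). Next, error_code is examined more closely to see the specific reason for the failure. The author of the code has supplied a comment for this:

==================== arch/i386/mm/fault.c 96 105 ====================
96   /*
97    * This routine handles page faults.  It determines the address,
98    * and the problem, and then passes it off to one of the appropriate
99    * routines.
100   *
101   * error_code:
102   *  bit 0 == 0 means no page found, 1 means protection fault
103   *  bit 1 == 0 means read, 1 means write
104   *  bit 2 == 0 means kernel, 1 means user-mode
105   */

When bit 2 of error_code is 1, the failure occurred while the CPU was in user mode. That matches our scenario, so control enters line 229. There, after some members of the current process's task_struct have been set, a forced "signal" (also called a "software interrupt"), SIGSEGV, is sent to the process. With that, the service for this exception is finished.

The reader will probably ask: "Is that all?" Yes, that is all. What happens next will become clear after reading the chapters on interrupt handling and signals. Before every return from an interrupt or exception, the kernel checks whether the current process has pending signals to be handled. In our scenario it certainly has, and at least one of them is SIGSEGV. The kernel then decides what to do according to the nature of the pending signals and the choices made by the process itself. The handling of some signals is "voluntary", of others mandatory. The response to SIGSEGV is mandatory: the process displays the "Segmentation fault" message that programmers dread and yet see so often, and is then aborted. As for the address at which user space would be resumed after the exception handler returns, it is meaningless in this case, because the process will never go back.

We have skipped over much of the code in do_page_fault() here, because it is irrelevant to the particular scenario at hand. We will return to that code in other scenarios.

2.5 Extending the User Stack

In the previous scenario we took a "guided tour" of how an out-of-bounds access leads to a failed mapping and thence to the abortion of the process. The reader may be surprised to learn, however, that an out-of-bounds access is sometimes normal. This happens in one situation only. We now look at the scenario in which the user stack is too small but, through an out-of-bounds access, has the good fortune to be extended. Before reading this scenario the reader should review the previous one.

Suppose that in the course of its execution a process has used up the stack area allocated to it; that is, starting from the "top" of the stack (remember that the stack grows downward) it has reached the lower edge of the mapped stack area. In other words, the stack pointer %esp in the CPU already points to the start address of the stack area; see Figure 2.6.

Figure 2.6 Schematic of the process address space

Suppose a subroutine must now be called, so the CPU has to push the return address on the stack, that is, write the return address to address (%esp - 4) in the virtual space. In our scenario, however, address (%esp - 4) falls in the hole; it is an address that has not been mapped, so a page fault exception is inevitable. Let us follow the route already taken in the previous scenario to line 151 of arch/i386/mm/fault.c.

==================== arch/i386/mm/fault.c 151 164 ====================
[do_page_fault()]
151      if (!(vma->vm_flags & VM_GROWSDOWN))
152          goto bad_area;
153      if (error_code & 4) {
154          /*
155           * accessing the stack below %esp is always a bug.
156           * The "+ 32" is there due to some instructions (like
157           * pusha) doing post-decrement on the stack and that
158           * doesn't show up until later..
159           */
160          if (address + 32 < regs->esp)
161              goto bad_area;
162      }
163      if (expand_stack(vma, address))
164          goto bad_area;

This time the area above the hole is the stack area and its VM_GROWSDOWN flag is 1, so the CPU continues. When a mapping failure occurs in user space (bit 2 is 1), an out-of-bounds access caused by a stack operation is treated as a special case, so it must further be checked whether the faulting address is immediately adjacent to the location the stack pointer points to. In our scenario it is %esp-4, which is of course adjacent. But what if it were %esp-40? That could not have been caused by a normal stack operation; it would be a genuine illegal out-of-bounds access. How is "normal" distinguished from abnormal? Usually 4 bytes are pushed at a time, so the address should be %esp-4. But the i386 CPU has a pusha instruction that pushes 32 bytes (the contents of eight 32-bit registers) at once. The criterion is therefore %esp-32. Anything beyond that range is certainly an error and, as in the previous scenario, goes to bad_area. In our present scenario this test is passed.

Since this is a normal request for stack extension, a number of pages should be allocated starting from the top of the hole, mapped, and merged into the stack area so that it is extended. So expand_stack() is called. It is an inline function defined in include/linux/mm.h:

==================== include/linux/mm.h 487 504 ====================
[do_page_fault()>expand_stack()]
487  /* vma is the first one with  address < vma->vm_end,
488   * and even  address < vma->vm_start. Have to extend vma. */
489  static inline int expand_stack(struct vm_area_struct * vma, unsigned long address)
490  {
491      unsigned long grow;
492
493      address &= PAGE_MASK;
494      grow = (vma->vm_start - address) >> PAGE_SHIFT;
495      if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
496          ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur)
497          return -ENOMEM;
498      vma->vm_start = address;
499      vma->vm_pgoff -= grow;
500      vma->vm_mm->total_vm += grow;
501      if (vma->vm_flags & VM_LOCKED)
502          vma->vm_mm->locked_vm += grow;
503      return 0;
504  }

The parameter vma points to a vm_area_struct representing an area, here the area containing the user-space stack. First the address is aligned to a page boundary, and the number of pages by which the area must grow to include the given address (usually one) is computed. A question arises here: is this extension of the stack unlimited, continuing until the entire hole in the address space is used up? It is not. The task_struct of every process contains an array of rlim structures that specify the limits on the use of each kind of resource, and RLIMIT_STACK is the limit on the size of the user-space stack. That check is made here. If the size of the area after extension would exceed the resources available for the stack, or would make the total of dynamically allocated pages exceed the resource limit for the process, the stack cannot be extended, and a negative error code, -ENOMEM, is returned to indicate that no memory space can be allocated; otherwise 0 is returned. When expand_stack() returns a nonzero value, that is, -ENOMEM, do_page_fault() again goes to bad_area and the result is the same as in the previous scenario. In general, though, resources are not exhausted, so expand_stack() normally returns successfully. As we have seen, however, expand_stack() only changes the vm_area_struct of the stack area; it does not map the newly added page onto physical memory. That task is performed by good_area, which follows:

==================== arch/i386/mm/fault.c 165 207 ====================
[do_page_fault()]
165  /*
166   * Ok, we have a good vm_area for this memory access, so
167   * we can handle it..
168   */
169  good_area:
170      info.si_code = SEGV_ACCERR;
171      write = 0;
172      switch (error_code & 3) {
173          default:    /* 3: write, present */
174  #ifdef TEST_VERIFY_AREA
175              if (regs->cs == KERNEL_CS)
176                  printk("WP fault at %08lx\n", regs->eip);
177  #endif
178              /* fall through */
179          case 2:     /* write, not present */
180              if (!(vma->vm_flags & VM_WRITE))
181                  goto bad_area;
182              write++;
183              break;
184          case 1:     /* read, present */
185              goto bad_area;
186          case 0:     /* read, not present */
187              if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
188                  goto bad_area;
189      }
190
191      /*
192       * If for any reason at all we couldn't handle the fault,
193       * make sure we exit gracefully rather than endlessly redo
194       * the fault.
195       */
196      switch (handle_mm_fault(mm, vma, address, write)) {
197      case 1:
198          tsk->min_flt++;
199          break;
200      case 2:
201          tsk->maj_flt++;
202          break;
203      case 0:
204          goto do_sigbus;
205      default:
206          goto out_of_memory;
207      }

In the first switch statement the kernel uses the error_code passed in by the interrupt response mechanism to determine more precisely why the mapping failed and to take the appropriate action (the meaning of the lowest three bits of error_code was listed in the previous section). In the present scenario bit 0 is 0, meaning that there is no physical page, and bit 1 is 1, meaning a write. The value of the lowest two bits is therefore 2. Since this is a write, it must of course be checked whether the area permits writing, and the stack segment does. So we reach line 196 and call the virtual memory management function handle_mm_fault().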
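The decisions traced in the last two scenarios (bad_area, good_area, or stack growth) depend on only a handful of values, so they can be condensed into a small user-space sketch. The following is an illustration, not kernel code: classify_fault() and stack_grow_pages() are hypothetical names, the ERR_* macros merely name the error_code bits documented in fault.c, and resource limits are not modeled.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~((1UL << PAGE_SHIFT) - 1))

#define ERR_PROTECTION 0x1   /* bit 0: 0 = page not present, 1 = protection fault */
#define ERR_WRITE      0x2   /* bit 1: 0 = read, 1 = write */
#define ERR_USER       0x4   /* bit 2: 0 = kernel mode, 1 = user mode */

enum fault_path { BAD_AREA, GOOD_AREA, GROW_STACK };

/*
 * Decide which branch a fault takes once find_vma() has returned.
 * have_vma:   nonzero if find_vma() found an area ending above address
 * vm_start:   start of that area
 * grows_down: nonzero if the area has VM_GROWSDOWN set
 */
static enum fault_path classify_fault(int have_vma, unsigned long vm_start,
                                      int grows_down, unsigned long address,
                                      unsigned long esp, unsigned long error_code)
{
    if (!have_vma)
        return BAD_AREA;
    if (vm_start <= address)
        return GOOD_AREA;
    if (!grows_down)
        return BAD_AREA;
    if ((error_code & ERR_USER) && address + 32 < esp)
        return BAD_AREA;        /* too far below %esp to be a push */
    return GROW_STACK;
}

/* Pages by which the stack area must grow to cover address. */
static unsigned long stack_grow_pages(unsigned long vm_start, unsigned long address)
{
    address &= PAGE_MASK;
    return (vm_start - address) >> PAGE_SHIFT;
}
```

With %esp at the lower edge of the stack area, a push to %esp-4 is classified as stack growth by one page, whereas an access at %esp-40 is rejected, mirroring the "+ 32" test in the listing.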
The function handle_mm_fault() is defined in mm/memory.c:

==================== mm/memory.c 1189 1208 ====================
[do_page_fault()>handle_mm_fault()]
1189  /*
1190   * By the time we get here, we already hold the mm semaphore
1191   */
1192  int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
1193      unsigned long address, int write_access)
1194  {
1195      int ret = -1;
1196      pgd_t *pgd;
1197      pmd_t *pmd;
1198
1199      pgd = pgd_offset(mm, address);
1200      pmd = pmd_alloc(pgd, address);
1201
1202      if (pmd) {
1203          pte_t * pte = pte_alloc(pmd, address);
1204          if (pte)
1205              ret = handle_pte_fault(mm, vma, address, write_access, pte);
1206      }
1207      return ret;
1208  }

From the given address and the mm_struct representing the particular virtual space, the macro pgd_offset() computes a pointer to the page directory entry to which the address belongs. It is defined in include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 311 312 ====================
311  /* to find an entry in a page-table-directory. */
312  #define pgd_index(address) ((address >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
      ......
==================== include/asm-i386/pgtable.h 316 316 ====================
316  #define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))

As for pmd_alloc(), which follows, it is supposed to allocate (or find) a middle directory entry. Since the i386 uses only two levels of mapping, it is defined in include/asm-i386/pgtable-2level.h as "return (pmd_t *)pgd;". That is, on the i386 CPU the page directory entry itself is treated as a middle directory containing a single entry (a table of size 1). For the i386 CPU, then, pmd_alloc() can never fail, and pmd here cannot be 0. The reader may wish to follow the mapping of a linear address and consider what needs to be done next. The page directory is always there. The relevant directory entry may already point to a page table, in which case the page table entry corresponding to the given address must be found in that table. Or the directory entry may still be empty, in which case a page table must first be allocated and the entry then found in it. Only then is everything ready for allocating a physical memory page and establishing the mapping. This is accomplished by pte_alloc(), whose code is in include/asm-i386/pgalloc.h:

==================== include/asm-i386/pgalloc.h 120 141 ====================
[do_page_fault()>handle_mm_fault()>pte_alloc()]
120  extern inline pte_t * pte_alloc(pmd_t * pmd, unsigned long address)
121  {
122      address = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
123
124      if (pmd_none(*pmd))
125          goto getnew;
126      if (pmd_bad(*pmd))
127          goto fix;
128      return (pte_t *)pmd_page(*pmd) + address;
129  getnew:
130  {
131      unsigned long page = (unsigned long) get_pte_fast();
132
133      if (!page)
134          return get_pte_slow(pmd, address);
135      set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(page)));
136      return (pte_t *)page + address;
137  }
138  fix:
139      __handle_bad_pmd(pmd);
140      return NULL;
141  }

First the given address is converted into an index within the page table it belongs to. In our scenario we assume that the directory entry pointed to by pmd is empty, so control goes to the label getnew to allocate a page table. A page table occupies exactly one physical page. The kernel optimizes the allocation of page tables somewhat. When a page table is released, the kernel first keeps it in a buffer pool rather than releasing its physical page at once. Only when the pool is full is the physical page occupied by the page table really released. Thus, when a page table is to be allocated, the pool can be consulted first; that is get_pte_fast(). If the pool is empty, the page table has to be allocated through get_pte_slow(). The reader may wonder whether allocating one physical page for use as a page table is really so much trouble, and why it is called "slow". The answer is that it can sometimes be very slow indeed: just consider that physical memory pages may be exhausted, so that pages in use must first be swapped out to disk. Once a page table has been obtained, set_pmd() writes its start address, together with some attribute flags, into the middle directory entry pmd, which on the i386 actually means into the page directory entry pgd. The "infrastructure" needed for the mapping is now complete, but the page table entry pte is still empty. What remains is the physical memory page itself, and that is taken care of by handle_pte_fault(), defined in mm/memory.c:

==================== mm/memory.c 1135 1187 ====================
[do_page_fault()>handle_mm_fault()>handle_pte_fault()]
1135  /*
1136   * These routines also need to handle stuff like marking pages dirty
1137   * and/or accessed for architectures that don't do it in hardware (most
1138   * RISC architectures).  The early dirtying is also good on the i386.
1139   *
1140   * There is also a hook called "update_mmu_cache()" that architectures
1141   * with external mmu caches can use to update those (ie the Sparc or
1142   * PowerPC hashed page tables that act as extended TLBs).
1143   *
1144   * Note the "page_table_lock". It is to protect against kswapd removing
1145   * pages from under us. Note that kswapd only ever _removes_ pages, never
1146   * adds them. As such, once we have noticed that the page is not present,
1147   * we can drop the lock early.
1148   *
1149   * The adding of pages is protected by the MM semaphore (which we hold),
1150   * so we don't need to worry about a page being suddenly been added into
1151   * our VM.
1152   */
1153  static inline int handle_pte_fault(struct mm_struct *mm,
1154      struct vm_area_struct * vma, unsigned long address,
1155      int write_access, pte_t * pte)
1156  {
1157      pte_t entry;
1158
1159      /*
1160       * We need the page table lock to synchronize with kswapd
1161       * and the SMP-safe atomic PTE updates.
1162       */
1163      spin_lock(&mm->page_table_lock);
1164      entry = *pte;
1165      if (!pte_present(entry)) {
1166          /*
1167           * If it truly wasn't present, we know that kswapd
1168           * and the PTE updates will not touch it later. So
1169           * drop the lock.
1170           */
1171          spin_unlock(&mm->page_table_lock);
1172          if (pte_none(entry))
1173              return do_no_page(mm, vma, address, write_access, pte);
1174          return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
1175      }
1176
1177      if (write_access) {
1178          if (!pte_write(entry))
1179              return do_wp_page(mm, vma, address, pte, entry);
1180
1181          entry = pte_mkdirty(entry);
1182      }
1183      entry = pte_mkyoung(entry);
1184      establish_pte(vma, address, pte, entry);
1185      spin_unlock(&mm->page_table_lock);
1186      return 1;
1187  }

In our scenario, whether the page table was newly allocated or already existed, the corresponding page table entry is certainly empty. The condition of the if statement at the start of the function is therefore certainly satisfied, because pte_present() tests whether the page mapped by an entry is in memory, and our physical memory page has not yet been allocated. Further, the condition tested by pte_none() is certainly satisfied too, because it tests whether an entry is empty. So do_no_page() is necessarily entered (otherwise it would be do_swap_page()). Incidentally, if pte_present() finds that the page mapped by the entry really is in memory, then the problem must lie with the access permissions, or there is no problem at all.

The function do_no_page() is also defined in mm/memory.c. We first give a brief introduction and then look at the code.

We mentioned earlier that the area structure vm_area_struct has a pointer vm_ops that points to a vm_operations_struct. This data structure is in effect a jump table of functions, usually functions related to file operations. One of these function pointers is used for the allocation of physical memory pages. Why should the allocation of physical memory pages have anything to do with file operations? Because it matters a great deal for possible file sharing. When several processes map the same file into their respective virtual spaces, one copy of each physical page in memory is usually enough. Only when a process needs to write to the file is it necessary to make a separate private copy, which is known as "copy on write", or COW. We will describe COW in more detail in the chapter on processes, when we come to fork(). Thus, once an area has been mapped to an open file (including a device) through mmap(), calls to these functions can turn operations on memory into operations on the file, or carry out whatever additional file operations are needed. Swapping of physical pages to disk is obviously related to file operations as well. It is therefore often necessary to designate particular operations in advance for a particular virtual space. If an operation for allocating physical memory pages has been designated in advance for an area vma, it is vma->vm_ops->nopage(). However, both vma->vm_ops and vma->vm_ops->nopage may be null, which means that no specific nopage() operation has been designated, or that no vm_operations_struct has been supplied at all. When there is no designated nopage() operation, the kernel calls the function do_anonymous_page() to allocate a physical memory page.

Now let us look at the opening lines of do_no_page():

==================== mm/memory.c 1080 1098 ====================
[do_page_fault()>handle_mm_fault()>handle_pte_fault()>do_no_page()]
1080  /*
1081   * do_no_page() tries to create a new page mapping. It aggressively
1082   * tries to share with existing pages, but makes a separate copy if
1083   * the "write_access" parameter is true in order to avoid the next
1084   * page fault.
1085   *
1086   * As this is called only for pages that do not currently exist, we
1087   * do not need to flush old virtual caches or the TLB.
1088   *
1089   * This is called with the MM semaphore held.
1090   */
1091  static int do_no_page(struct mm_struct * mm, struct vm_area_struct * vma,
1092      unsigned long address, int write_access, pte_t *page_table)
1093  {
1094      struct page * new_page;
1095      pte_t entry;
1096
1097      if (!vma->vm_ops || !vma->vm_ops->nopage)
1098          return do_anonymous_page(mm, vma, page_table, write_access, address);
      ......
==================== mm/memory.c 1133 1133 ====================
1133  }

In our scenario the area concerned is used for the stack and has nothing to do with the file system or with page sharing, so there is no designated nopage() operation and do_anonymous_page() is entered.
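The index arithmetic used by pgd_index() and by the first statement of pte_alloc() can be tried out in user space. The sketch below assumes the i386 two-level layout (PGDIR_SHIFT of 22, and 1024 entries in both the page directory and each page table); the function names with the _sketch suffix are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22     /* i386 two-level paging */
#define PTRS_PER_PGD  1024
#define PTRS_PER_PTE  1024

/* Index of the page directory entry, as pgd_index() computes it. */
static unsigned long pgd_index_sketch(unsigned long address)
{
    return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Index within the page table, as the first statement of pte_alloc() computes it. */
static unsigned long pte_index_sketch(unsigned long address)
{
    return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Byte offset inside the 4 KB page. */
static unsigned long page_offset_sketch(unsigned long address)
{
    return address & ((1UL << PAGE_SHIFT) - 1);
}
```

For a stack address such as 0xBFFFDFFC, the three functions yield directory index 0x2FF, table index 0x3FD and page offset 0xFFC, which together cover all 32 bits of the linear address (10 + 10 + 12).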
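==================== mm/memory.c 1058 1078 ====================
[do_page_fault()>handle_mm_fault()>handle_pte_fault()>do_no_page()>do_anonymous_page()]
1058  /*
1059   * This only needs the MM semaphore
1060   */
1061  static int do_anonymous_page(struct mm_struct * mm, struct vm_area_struct * vma, pte_t *page_table, int write_access, unsigned long addr)
1062  {
1063      struct page *page = NULL;
1064      pte_t entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot));
1065      if (write_access) {
1066          page = alloc_page(GFP_HIGHUSER);
1067          if (!page)
1068              return -1;
1069          clear_user_highpage(page, addr);
1070          entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
1071          mm->rss++;
1072          flush_page_to_ram(page);
1073      }
1074      set_pte(page_table, entry);
1075      /* No need to invalidate - it was non-present before */
1076      update_mmu_cache(vma, addr, entry);
1077      return 1; /* Minor fault */
1078  }

We notice first that if the page fault was caused by a read, the mapping entry built by mk_pte() is adjusted by pte_wrprotect(), whereas for a write (parameter write_access non-zero) it is adjusted by pte_mkwrite(). How do the two differ? See include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 277 277 ====================
277  static inline pte_t pte_wrprotect(pte_t pte) { (pte).pte_low &= ~_PAGE_RW; return pte; }
==================== include/asm-i386/pgtable.h 271 271 ====================
271  static inline int pte_write(pte_t pte) { return (pte).pte_low & _PAGE_RW; }

A comparison shows that pte_wrprotect() clears the _PAGE_RW flag, which means that the physical page may only be read. On the write path, pte_mkwrite() sets the same flag to 1; pte_write(), shown above, merely tests it. For a read, moreover, the physical page that is mapped is always ZERO_PAGE, which is defined in include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 91 96 ====================
91  /*
92   * ZERO_PAGE is a global shared page that is always zero: used
93   * for zero-mapped memory areas etc..
94   */
95  extern unsigned long empty_zero_page[1024];
96  #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))

In other words, every "read-only" (that is, write-protected) page is initially mapped to one and the same physical memory page, empty_zero_page, whatever its virtual address. The content of this page is all zeros, so a read from the page just after it has been mapped returns 0. Only a writable page is given its own physical memory through alloc_page(). In our scenario the page needed is in the stack area and the fault was caused by a write, so alloc_page() is used to allocate a physical memory page, and the allocated page, together with all of its status and flag bits (see line 1070 of the listing), is stored by set_pte() into the page table entry that page_table points to. With that, the mapping from the virtual page to a physical memory page has at last been established. The update_mmu_cache() called here is an empty function on the i386 CPU (see include/asm-i386/pgtable.h), because the MMU (memory management unit) of the i386 is implemented inside the CPU and there is no separate MMU.

Now that the mapping exists, what follows is a return through each level in turn. Because the mapping succeeded, the return value at every level is 1, all the way back to do_page_fault(). In do_page_fault() there is one more special case to handle, related to VM86 mode and the VGA image memory, but it has nothing to do with our scenario:

==================== arch/i386/mm/fault.c 209 218 ====================
[do_page_fault()]
209  /*
210   * Did it hit the DOS screen memory VA from vm86 mode?
211   */
212      if (regs->eflags & VM_MASK) {
213          unsigned long bit = (address - 0xA0000) >> PAGE_SHIFT;
214          if (bit < 32)
215              tsk->thread.screen_bitmap |= 1 << bit;
216      }
217      up(&mm->mmap_sem);
218      return;

Finally, it must be pointed out that when the CPU returns to user space from handling a page fault, it first re-executes the instruction that was aborted because the mapping failed, and only then continues. This is a special property of exception handling. Readers who have taken the relevant courses know that when an interrupt or a trap (a trap instruction) occurs, the CPU pushes onto the stack, as the return address, the address of the next instruction, that is, the instruction that would have been executed next. Exceptions are different. When an exception occurs, the CPU pushes the address of the instruction that could not complete (because of a division by zero, a failed mapping and so on), not the address of the next instruction. The unfinished work can thus be completed on return from the exception handler. This property is implemented in the CPU's internal circuitry and needs no software intervention. In this sense the term "page fault interrupt" is wrong; it should be "page fault exception". In our scenario the exception arose because an instruction tried to push onto the stack and went beyond the space already allocated to the stack area. That instruction was aborted at the time and had no effect (for example, the stack pointer %esp still points to its original position). Now, after the return from the exception handler, the stack area has been extended, the aborted push instruction is executed once more, and execution then continues. To the user program the whole process is "transparent": it is as if nothing had happened and the stack area had been given enough space from the very beginning.

2.6 Use and Turnover of Physical Pages

For a modern operating system such as Linux, physical storage pages are, apart from the CPU, the most basic and most important resource. The use and turnover of physical storage pages in a system is as important as the use and turnover of capital in an enterprise. Readers would therefore do well to understand it in some depth.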
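A few terms used in this book need to be clarified first. A "virtual page" is a fixed-size interval in the virtual address space, aligned on a page-size (4KB) boundary, together with its content. A virtual page must ultimately be realized on, or mapped to, some physical storage medium; that is a "physical page". Depending on the medium, a physical page may be in memory or on disk. To distinguish the two cases, this book calls them "(physical) memory pages" and "(physical) pages on disk". In addition, the part of the medium on some peripheral device, a network interface card for example, that stores the content of one page is also called a physical page. So when we speak of allocating and releasing physical memory pages we mean only the physical medium, whereas when we speak of swapping pages in and out we mean their content. Readers, especially those without a computing background, must be clear about this and keep it in mind.

As stated earlier, the virtual address space of each process is very large (3GB of user space). The space a process actually uses is much smaller, however, and generally does not exceed a few MB. Traditional Linux (and Unix) executables in particular are usually rather small, a few dozen KB or one or two hundred KB. But when hundreds or thousands of processes exist in the system at the same time, the total demand for storage becomes large, and it is hard to equip the system with enough memory. Very early in the history of computing, therefore, a technique appeared for "swapping" the content of memory with a dedicated disk space: information not needed for the moment is stored on disk to make room for information that is needed urgently, and is read back from disk when required. Early swapping was built on segmented memory management. When a process was not going to run for a while it (its code segment, data segment and so on) could be swapped out and another process swapped in (hence "swap"), and it was swapped back when it was scheduled to run. Swapping of this kind is obviously crude and affects system performance considerably, so "demand paging" based on paged memory management was developed later.

In computing, time and space are in conflict and a compromise between them is often needed; sometimes space is traded for time, sometimes time for space. Page swapping is a typical case of trading time for space. It must be stressed that this is done only for want of a better alternative. In systems with real-time requirements in particular, page swapping is unsuitable because it makes the timing of program execution much less predictable. Linux therefore provides system calls for turning the page swapping mechanism on and off, although the discussion in this chapter assumes that it is on.

Before describing the page turnover policy, we first explain briefly how physical pages, disk pages in particular, are represented abstractly.

As outlined earlier, to make (physical) memory pages easier to manage, every memory page has a corresponding page data structure. A physical memory page has its page structure (and a process its task_struct) much as a person has a household registration or a dossier. A person who exists physically but has no registration does not exist from an administrative point of view. In the same way, a memory page that exists physically but has no page structure is simply never "seen" by the system. During system initialization the kernel creates a page structure for every page according to the amount of physical memory detected, forming an array of page structures, and makes a global variable mem_map point to this array. At the same time the pages are combined as needed into many "blocks" of memory pages with contiguous physical addresses, a number of "zones" are set up according to block size, and each zone is given queues of free blocks from which physical memory pages are allocated. Readers have seen all of this earlier.

Similarly, each physical page on a swap device (usually a disk, though it may be an ordinary file) needs a corresponding data structure (or "registration") in memory. It is far simpler, however: it is in fact just a counter that shows whether the page has been allocated and how many users share it. Pages on disk are managed per file or per disk device. The kernel defines a swap_info_struct data structure to describe and manage a file or device used for page swapping. Its definition is in include/linux/swap.h:

==================== include/linux/swap.h 49 64 ====================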
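49  struct swap_info_struct {
50      unsigned int flags;
51      kdev_t swap_device;
52      spinlock_t sdev_lock;
53      struct dentry * swap_file;
54      struct vfsmount *swap_vfsmnt;
55      unsigned short * swap_map;
56      unsigned int lowest_bit;
57      unsigned int highest_bit;
58      unsigned int cluster_next;
59      unsigned int cluster_nr;
60      int prio; /* swap priority */
61      int pages;
62      unsigned long max;
63      int next; /* next entry on swap list */
64  };

The pointer swap_map points to an array in which each unsigned short integer represents one physical page on disk (or in an ordinary file), and the array index determines the position of that page on the disk or in the file. The size of the array depends on pages, which gives the size of the swap device or file. The first page on the device (or in the file; a device is also a kind of file, and the same applies below), that is, the page represented by swap_map[0], is not used for page swapping. It holds some information about the device or file itself and a bitmap showing which pages are available. This information is first set up when the device is formatted as a swap area. Depending on the swap area format (and version), some other pages are also unavailable for swapping. These pages are concentrated at the beginning and at the end, so lowest_bit and highest_bit in swap_info_struct indicate where in the file the region usable for swapping begins and ends. Another field, max, gives the largest page number in the device or file, that is, its physical size.

Because the storage medium is a rotating disk, storing pages with consecutive addresses in consecutive disk sectors is not necessarily the most efficient arrangement. Page space on disk is therefore allocated in clusters as far as possible, and the fields cluster_next and cluster_nr exist for that purpose.

The Linux kernel allows several swap devices (or files) to be used, so the kernel keeps an array of swap_info_struct structures, swap_info, defined in mm/swapfile.c:

==================== mm/swapfile.c 25 25 ====================
25  struct swap_info_struct swap_info[MAX_SWAPFILES];

A queue, swap_list, is also set up to link together, in order of priority, the swap_info_struct structures of the disk devices or files from which physical pages can be allocated:

==================== mm/swapfile.c 23 23 ====================
23  struct swap_list_t swap_list = {-1, -1};

The swap_list_t data structure used here is defined in include/linux/swap.h:

==================== include/linux/swap.h 153 156 ====================
153  struct swap_list_t {
154      int head; /* head of priority-ordered swapfile list */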
155      int next; /* swapfile to be used next */
156  };

Initially the queue is empty, so head and next are both -1. When the swapon() system call designates a file for page swapping, the swap_info_struct of that file is linked into the queue.

Just as the pte_t data structure (the page table entry) ties a physical memory page to a virtual page, a page on disk has a similar data structure, swp_entry_t, defined in include/linux/shmem_fs.h:

==================== include/linux/shmem_fs.h 8 18 ====================
8   /*
9    * A swap entry has to fit into a "unsigned long", as
10   * the entry is hidden in the "index" field of the
11   * swapper address space.
12   *
13   * We have to move it here, since not every user of fs.h is including
14   * mm.h, but m.h is including fs.h via sched .h :-/
15   */
16  typedef struct {
17      unsigned long val;
18  } swp_entry_t;

A swp_entry_t structure is thus really just a 32-bit unsigned integer. This 32-bit integer is, however, divided into three parts; see Figure 2.7.

Figure 2.7 Layout of a page swap entry (schematic)

The file include/asm-i386/pgtable.h also defines several macros for accessing the type and offset bit fields and for the relation to the pte_t structure:

==================== include/asm-i386/pgtable.h 336 341 ====================
336  /* Encode and de-code a swap entry */
337  #define SWP_TYPE(x) (((x).val >> 1) & 0x3f)
338  #define SWP_OFFSET(x) ((x).val >> 8)
339  #define SWP_ENTRY(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
340  #define pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low })
341  #define swp_entry_to_pte(x) ((pte_t) { (x).val })

Here offset is the position of the page within a disk device or file, that is, the logical page number within the file, while type indicates which file the page is in; it is a sequence number. The name of this bit field easily misleads readers. It plainly denotes the sequence number of the swap device or file (the field could in principle number up to 127 such files, although the 0x3f mask in SWP_TYPE() above keeps only six bits, and the number actually configured in a system is far smaller), so why is it called type? The name was probably carried over from the pte_t structure. Readers may recall that pte_t is in fact also a 32-bit unsigned integer.
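In a pte_t, the upper 20 bits are the upper 20 bits of the starting address of the physical page (the lower 12 bits of that address are always 0, because pages are aligned on 4KB boundaries), and the bits corresponding to these 7 bits are all flags describing various properties of the page, such as R/W and U/S; hence the name type. The two data structures swp_entry_t and pte_t are the same size and closely related. When a page is in memory, the lowest bit of its page table entry pte_t, the P flag, is 1, meaning that the page is in memory, and the remaining bits give the address of the physical memory page and the attributes of the page. When a page is on disk, the page table entry no longer points to a physical memory page but becomes a swp_entry_t "entry" that records where the page has gone. Since its lowest bit is then 0, meaning that the page is not in memory, the MMU in the CPU ignores all the remaining bits and leaves them for system software to interpret. The Linux kernel uses them to determine uniquely where the page is on disk: which file or device it is in, and its relative position within that file.

So when a page is in memory, its page table entry determines the address mapping; when the page is not in memory, the entry records where the physical page has gone and where it now is. When reading the kernel source, readers may find it helpful to think of SWP_TYPE(entry) as SWP_FILE(entry).

We now turn to what the title of this section refers to, the turnover of physical pages. As before, we use the code of some functions to help readers understand. We start with __swap_free(), the function that releases a disk page. Reading it should deepen the reader's understanding of the explanation above. Its code is in the file mm/swapfile.c. The function that allocates a disk page, __get_swap_page(), is in the same file, and readers may wish to read it for themselves.

First the opening lines of __swap_free():

==================== mm/swapfile.c 141 158 ====================
141  /*
142   * Caller has made sure that the swapdevice corresponding to entry
143   * is still around or has not been recycled.
144   */
145  void __swap_free(swp_entry_t entry, unsigned short count)
146  {
147      struct swap_info_struct * p;
148      unsigned long offset, type;
149
150      if (!entry.val)
151          goto out;
152
153      type = SWP_TYPE(entry);
154      if (type >= nr_swapfiles)
155          goto bad_nofile;
156      p = & swap_info[type];
157      if (!(p->flags & SWP_USED))
158          goto bad_device;

If entry.val is 0 there is obviously nothing to do, because page 0 of any swap device or file is never used for page swapping. Next, as explained above, what SWP_TYPE returns is in fact the sequence number of the swap device, that is, the index of its swap_info_struct in the swap_info[] array. Line 156 therefore uses it as an index to obtain the swap_info_struct of the file concerned from swap_info[].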
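The bit layout that SWP_ENTRY(), SWP_TYPE() and SWP_OFFSET() agree on can be exercised outside the kernel. The following is a minimal sketch and not kernel code: `toy_swp_entry` and the `toy_swp_*` functions are hypothetical names, and `uint32_t` stands in for the 32-bit unsigned long of i386. The shifts and the 0x3f mask are the ones in the macros quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for swp_entry_t. */
typedef struct { uint32_t val; } toy_swp_entry;

/*
 * Pack a swap entry: bit 0 is left at 0 so that the MMU sees a
 * "not present" entry, the type field starts at bit 1 and the
 * offset (logical page number in the swap file) starts at bit 8.
 */
toy_swp_entry toy_swp_make(uint32_t type, uint32_t offset)
{
    toy_swp_entry e = { (type << 1) | (offset << 8) };
    return e;
}

/* Which swap device or file: the index into swap_info[]. */
uint32_t toy_swp_type(toy_swp_entry e)
{
    return (e.val >> 1) & 0x3f;
}

/* Position of the page within that device or file. */
uint32_t toy_swp_offset(toy_swp_entry e)
{
    return e.val >> 8;
}

/* The bit the MMU tests; it is 0 for every entry built above. */
int toy_swp_present_bit(toy_swp_entry e)
{
    return (int)(e.val & 1u);
}
```

Packing type 3 and offset 0x1234 and unpacking the result returns the same two values, with the lowest bit clear, which is what allows the same 32 bits to sit in a page table entry without being mistaken for a mapping.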
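Once the file has been found, the function turns to the specific page:

==================== mm/swapfile.c 159 182 ====================
159      offset = SWP_OFFSET(entry);
160      if (offset >= p->max)
161          goto bad_offset;
162      if (!p->swap_map[offset])
163          goto bad_free;
164      swap_list_lock();
165      if (p->prio > swap_info[swap_list.next].prio)
166          swap_list.next = type;
167      swap_device_lock(p);
168      if (p->swap_map[offset] < SWAP_MAP_MAX) {
169          if (p->swap_map[offset] < count)
170              goto bad_count;
171          if (!(p->swap_map[offset] -= count)) {
172              if (offset < p->lowest_bit)
173                  p->lowest_bit = offset;
174              if (offset > p->highest_bit)
175                  p->highest_bit = offset;
176              nr_swap_pages++;
177          }
178      }
179      swap_device_unlock(p);
180      swap_list_unlock();
181  out:
182      return;

As explained above, offset is the position of the page within the file and of course must not exceed the maximum the file provides. p->swap_map[offset] is the allocation (and use) count of the page; a value of 0 means that it has not been allocated. The allocation count should also not exceed SWAP_MAP_MAX. The parameter count says how many users are releasing the page, so it is subtracted from the counter. When the counter reaches 0, the page really becomes free. If the page then lies outside the range currently available for allocation, the boundary of that range, lowest_bit or highest_bit, is adjusted accordingly, and nr_swap_pages, the number of disk pages available for allocation, is incremented. Note that releasing a disk page involves no disk operation at all; it is purely a bookkeeping operation in memory, recording that the content of that page on disk is no longer valid. The cost is therefore very small.

Knowing how the kernel manages memory pages and pages on disk, we can now look at the turnover of memory pages. When a memory page is free, that is, when it sits in a free queue of some zone, the count field of its page structure is 0, and it is set to 1 when the page is allocated. This is done in rmqueue() through set_page_count(), as we have already seen.

The turnover of memory pages has two aspects. The first is the allocation, use and reclamation of pages, which does not necessarily involve swapping to disk. The second is swapping, and the ultimate purpose of swapping is again the reclamation of pages. Not every memory page can be swapped out. In fact only pages mapped into user space are swapped out; pages of the kernel, that is, of system space, are not. It should be explained that the kernel can access every physical page; in other words, every physical page has a mapping in system space. A "user-space page" is a page that is mapped in the user space of at least one process; any other page is one used (and usable only) by the kernel.

By content and nature, user-space pages fall into the following kinds:

· Ordinary user-space pages, including the code segment, data segment and stack segment of a process and the dynamically allocated heap.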
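  Some of these pages are static from the point of view of the user program, that is, the process (the code segment, for example), but from the point of view of the system they are still dynamically allocated.
· The content of open files mapped into user space by the mmap() system call.
· Shared memory regions between processes.

These pages involve allocation, use and reclamation, and also swapping out and in.

Pages mapped into system space are never swapped out, but they can still be divided roughly into several classes according to how they are used and turned over. First, the memory pages occupied by kernel code and kernel global variables are neither allocated nor released; this part of the space is static. (By contrast, the code segment and global variables of a process are in user space, and the memory pages they occupy are dynamic: they are allocated before use, are eventually released, and in between may be swapped out, reclaimed and allocated to something else.)

Beyond these, memory pages used inside the kernel are also allocated dynamically, but they always stay in memory and are never swapped out. Such resident pages can be divided into two classes according to the nature of their content.

The first class consists of pages that are worth nothing once they have been used and can therefore be released and reclaimed at once. The turnover of such pages is simple: free -> (allocate) -> in use -> (release) -> free. Kernel pages used in this way are roughly the following:

· Data structures allocated in the kernel with kmalloc() or vmalloc() for temporary use or for management purposes, such as vm_area_struct. Once they have been used they are not worth keeping and can be released at once. But because one page often holds several data structures of the same kind, the page itself can only be released when the whole page has become free.
· Memory pages allocated in the kernel with alloc_page() for temporary use or for management purposes, such as the two pages that hold the system stack of each process, and the pages used when copying parameters from system space. These too are not worth keeping once used and can be released at once.

The second class consists of pages whose content is still worth keeping after use. If conditions allow, keeping these pages "in reserve" may make later operations more efficient. After being "released", a page (or data structure) of this kind is put on an LRU queue and left to "age" for a while. If its content is suddenly needed again during that time, the page is handed to the "user" directly, content and all; otherwise it continues to age until conditions no longer allow it to be kept, and only then is it reclaimed. Kernel pages used in this way are roughly the following:

· Space used in file system operations to cache directory entry structures (dentry).
· Space used in file system operations to cache inode structures.
· Buffers used for file system read and write operations.

The content of these pages was read directly from the file system or derived from it. Nothing prevents them from being reclaimed for other use immediately after release, but then a price has to be paid again when the content is next needed.

Page swapping is by comparison the most complicated case, so we devote considerable space to it.

The simplest page swapping policy is obviously this: whenever a page fault occurs, allocate a memory page and read the page on disk into it; if no free page is available, swap one or more memory pages out to disk to make room. This entirely passive policy has a drawback, however: swapping always happens at the last moment, when the system is busy, and leaves no room for scheduling. A more active approach is to pick some pages periodically, preferably when the system is relatively idle, and swap them out in advance to free some memory pages, so that a certain supply of free pages is maintained and a free memory page is always available when a page fault occurs. The criterion for choosing is generally LRU, that is, choosing the "least recently used" pages. But this active policy has a problem of its own, because no method exists that predicts accesses to pages accurately. It is therefore entirely possible for a page that has not been accessed for a long time to be accessed again just after it has been swapped out to disk, so that it has to be swapped back in at once.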
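In the worst case the whole processing capacity of the system may be saturated by swapping in and out of this kind, leaving it unable to do any useful computation or work. This phenomenon is sometimes called (page) "thrashing".

To prevent this, swapping a page out and releasing the memory page can be done as two separate steps. When the system has chosen a number of memory pages to swap out, it writes their content to the corresponding disk pages and changes the corresponding page table entries to point to the pages on disk (P flag 0, meaning that the page is not in memory). The memory pages occupied are not released immediately, however. Their page structures are kept in a "cache" queue (also called a buffer queue), and the pages merely pass from the "active" state to the "inactive" state, like a soldier passing from active service to the reserve. The "retirement" of the memory page, its final release, is postponed and carried out later when conditions call for it. In this way, if a page is accessed again right after it has been swapped out and a page fault occurs, the page can be found again in the cache queue and its mapping re-established. Because the page has not been released and still holds its original content, nothing needs to be read from disk. Conversely, if after some time an inactive memory page, that is, a page still in the cache queue but no longer mapped in any user space, has still not been accessed, the time has come for its final retirement. If a page in the cache queue is accessed again, more precisely if a page fault occurs with this page as its target, all that is needed is to restore its mapping and take it off the cache queue, and the page is back in the active state.

This policy clearly makes thrashing less likely and reduces what the system spends on swapping. On closer examination, however, it can be improved further. First, preparing to swap a page out does not necessarily require writing its content to disk. If the page has never been written since it was last swapped in, it is "clean", that is, identical to the page on disk, and need not be written out. Second, even a "dirty" page need not be written out immediately. It can first be disconnected from the page mapping tables and written out only after a period of "cooling" or "aging", at which point it becomes a "clean" page. A "clean" page can in turn be kept cached until it really has to be reclaimed, because reclaiming a "clean" page costs very little.

To sum up, the main points in the turnover of physical memory pages through swapping in and out are as follows:

(1) Free. The page data structure of the page is linked through its queue head structure list into the free area queue free_area of some zone. The use count of the page is 0.
(2) Allocation. A memory page is allocated from some free queue by __alloc_pages() or __get_free_page(), and the use count of the allocated page is set to 1. The queue head list in its page structure becomes idle.
(3) Active. The page data structure is linked through its queue head structure lru into the active page queue active_list, and the page table entry of at least one process (in user space) points to the page. Every time a mapping to the page is established or restored, the use count of the page is incremented by 1.
(4) Inactive (dirty). The page data structure is linked through its queue head structure lru into the inactive "dirty" page queue inactive_dirty_list, but in principle no page table entry of any process points to the page any longer. Every time a mapping to the page is disconnected, the use count of the page is decremented by 1.
(5) The content of an inactive "dirty" page is written to the swap device, and the page data structure is moved from inactive_dirty_list to an inactive "clean" page queue.
(6) Inactive (clean). The page data structure is linked through its queue head structure lru into an inactive "clean" page queue; every zone has one such queue, inactive_clean_list.
(7) If the page is accessed within some time after becoming inactive, it returns to the active state and its mapping is restored.
(8) When the need arises, pages are reclaimed from the "clean" page queues and either returned to a free queue or allocated directly for another purpose.

The actual implementation is of course somewhat more complicated.

To implement this policy, the page data structure contains the necessary fields, the kernel keeps two global LRU queues, active_list and inactive_dirty_list, and each zone has an inactive_clean_list. The position of a page structure in these LRU queues shows how long the page has been inactive, which helps in deciding what to reclaim. In addition, a global address_space data structure, swapper_space, manages all swappable memory pages; the page structure of every swappable memory page is linked through its queue head list into one of its queues. Furthermore, to speed up searches in the cache queues, a hash table page_hash_table is provided.

Let us see how the kernel links a memory page into these queues. After allocating a free memory page for a page that has to be swapped in, the kernel calls add_to_swap_cache() to link its page structure into the appropriate queues. The code of this function is in mm/swap_state.c:

==================== mm/swap_state.c 54 70 ====================
54  void add_to_swap_cache(struct page *page, swp_entry_t entry)
55  {
56      unsigned long flags;
57
58  #ifdef SWAP_CACHE_INFO
59      swap_cache_add_total++;
60  #endif
61      if (!PageLocked(page))
62          BUG();
63      if (PageTestandSetSwapCache(page))
64          BUG();
65      if (page->mapping)
66          BUG();
67      flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
68      page->flags = flags | (1 << PG_uptodate);
69      add_to_page_cache_locked(page, &swapper_space, entry.val);
70  }

The page must be locked before this function is called, to guard against interference. Because it is a freshly allocated free page, its PG_swap_cache flag must be 0 and its mapping pointer must be 0 as well. The content of the page has just been read from the swap device and of course matches the page on disk, so the PG_uptodate flag is set to 1. The function add_to_page_cache_locked() is defined in mm/filemap.c:

==================== mm/filemap.c 476 494 ====================
476  /*
477   * Add a page to the inode page cache.
478   *
479   * The caller must have locked the page and
480   * set all the page flags correctly..
481   */
482  void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index)
483  {
484      if (!PageLocked(page))
485          BUG();
486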
487      page_cache_get(page);
488      spin_lock(&pagecache_lock);
489      page->index = index;
490      add_page_to_inode_queue(mapping, page);
491      add_page_to_hash_queue(page, page_hash(mapping, index));
492      lru_cache_add(page);
493      spin_unlock(&pagecache_lock);
494  }

The parameter mapping is a pointer to an address_space structure; here it is &swapper_space. This data structure is defined in include/linux/fs.h:

==================== include/linux/fs.h 365 375 ====================
365  struct address_space {
366      struct list_head clean_pages; /* list of clean pages */
367      struct list_head dirty_pages; /* list of dirty pages */
368      struct list_head locked_pages; /* list of locked pages */
369      unsigned long nrpages; /* number of total pages */
370      struct address_space_operations *a_ops; /* methods */
371      struct inode *host; /* owner: inode, block_device */
372      struct vm_area_struct *i_mmap; /* list of private mappings */
373      struct vm_area_struct *i_mmap_shared; /* list of shared mappings */
374      spinlock_t i_shared_lock; /* and spinlock protecting it */
375  };

The structure has three queue heads. The first two are for "clean" pages and for "dirty" pages (which need to be written out) respectively; the third, locked_pages, is for pages that must temporarily be locked in memory and not swapped out. The data structure swapper_space is defined in mm/swap_state.c:

==================== mm/swap_state.c 31 37 ====================
31  struct address_space swapper_space = {
32      LIST_HEAD_INIT(swapper_space.clean_pages),
33      LIST_HEAD_INIT(swapper_space.dirty_pages),
34      LIST_HEAD_INIT(swapper_space.locked_pages),
35      0, /* nrpages */
36      &swap_aops,
37  };

The last member initialized here points to another data structure, swap_aops, which contains the function pointers for the various swap operations.

The function add_to_page_cache_locked() shows that the page is added to three queues. As readers will see below, the page structure is linked through its queue head list into the cache queue of swapper_space, through the pointer next_hash and the double pointer pprev_hash into a hash queue, and through its queue head lru into the LRU queue active_list.

The page_cache_get() in the code is defined in include/linux/pagemap.h as get_page(page), which simply adds 1 to the use count page->count of the page. It is defined in include/linux/mm.h:

==================== include/linux/mm.h 150 150 ====================
150  #define get_page(p) atomic_inc(&(p)->count)
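The linkage through next_hash and the double pointer pprev_hash mentioned above is a common idiom for singly linked hash chains. The sketch below illustrates it in user space under stated assumptions: `struct toy_page`, `toy_hash_add()` and `toy_hash_del()` are hypothetical names, and the structure keeps only the two link fields. The insertion follows the scheme the text describes; the removal routine is added here only to show what the back pointer is for, and is not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a page with only its hash links. */
struct toy_page {
    struct toy_page *next_hash;    /* next page on the same chain */
    struct toy_page **pprev_hash;  /* address of the pointer that points to us */
};

/* Insert the page at the head of the chain whose head pointer is *bucket. */
void toy_hash_add(struct toy_page *page, struct toy_page **bucket)
{
    struct toy_page *next = *bucket;

    *bucket = page;
    page->next_hash = next;
    page->pprev_hash = bucket;
    if (next)
        next->pprev_hash = &page->next_hash;
}

/*
 * Remove the page from whatever chain it is on. Thanks to pprev_hash
 * neither the bucket nor a walk along the chain is needed.
 */
void toy_hash_del(struct toy_page *page)
{
    if (page->next_hash)
        page->next_hash->pprev_hash = page->pprev_hash;
    *page->pprev_hash = page->next_hash;
    page->next_hash = NULL;
    page->pprev_hash = NULL;
}
```

Because pprev_hash holds the address of whichever pointer currently refers to the page, the bucket head or the next_hash field of the preceding page, removal needs no special case for the first element of a chain.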
==================== include/linux/pagemap.h 31 31 ====================
31  #define page_cache_get(x) get_page(x)

The given page structure is first added by add_page_to_inode_queue() to the clean_pages queue of swapper_space. The code is in mm/filemap.c:

==================== mm/filemap.c 72 79 ====================
72  static inline void add_page_to_inode_queue(struct address_space *mapping, struct page * page)
73  {
74      struct list_head *head = &mapping->clean_pages;
75
76      mapping->nrpages++;
77      list_add(&page->list, head);
78      page->mapping = mapping;
79  }

As can be seen, the page is linked into the clean_pages queue of swapper_space; a page that has just been read from the swap device is of course a "clean" page. Why is this function called add_page_to_inode_queue? Because page caching is not provided for page swapping alone; reading and writing files use the same caching mechanism. Pages that come from the same file are normally managed through one address_space data structure, and the inode data structure that represents a file has a member i_data which is just such an address_space structure. In this sense swapper_space, the address_space structure used to manage swappable pages, is merely a special case.

The page is then linked into a hash queue by add_page_to_hash_queue(), whose code is also in mm/filemap.c:

==================== mm/filemap.c 58 70 ====================
58  static void add_page_to_hash_queue(struct page * page, struct page **p)
59  {
60      struct page *next = *p;
61
62      *p = page;
63      page->next_hash = next;
64      page->pprev_hash = p;
65      if (next)
66          next->pprev_hash = &page->next_hash;
67      if (page->buffers)
68          PAGE_BUG(page);
69      atomic_inc(&page_cache_size);
70  }

Which queue the page is linked into depends on the hash value:

==================== include/linux/pagemap.h 68 68 ====================
68  #define page_hash(mapping,index) (page_hash_table+_page_hashfn(mapping,index))

Finally the page data structure is linked by lru_cache_add() into the kernel's LRU queue active_list. The code is in mm/swap.c:

==================== mm/swap.c 226 241 ====================
226  /**
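227   * lru_cache_add: add a page to the page lists
228   * @page: the page to add
229   */
230  void lru_cache_add(struct page * page)
231  {
232      spin_lock(&pagemap_lru_lock);
233      if (!PageLocked(page))
234          BUG();
235      DEBUG_ADD_PAGE
236      add_page_to_active_list(page);
237      /* This should be relatively rare */
238      if (!page->age)
239          deactivate_page_nolock(page);
240      spin_unlock(&pagemap_lru_lock);
241  }

The add_page_to_active_list() used here is a macro defined in include/linux/swap.h:

==================== include/linux/swap.h 209 215 ====================
209  #define add_page_to_active_list(page) { \
210      DEBUG_ADD_PAGE \
211      ZERO_PAGE_BUG \
212      SetPageActive(page); \
213      list_add(&(page)->lru, &active_list); \
214      nr_active_pages++; \
215  }

Because a page data structure can be linked into different LRU queues through one and the same queue head structure lru, flag bits such as PG_active, PG_inactive_dirty and PG_inactive_clean are needed to show which queue it is in at present. Readers will see later how pages move between these queues.

Memory management is not purely a matter for the kernel. User processes can take part in it to a considerable degree and can, within limits, make requests to the kernel about the management of their own memory, for example by using the mmap() system call to map a file into their user space. Privileged user processes in particular also hold global control over the swap-in and swap-out mechanism, through the system calls swapon() and swapoff(). Their calling interface is:

swapon(const char *path, int swapflags)
swapoff(const char *path)

These two system calls are provided for privileged user processes, to start or stop using a particular disk area or file for swapping pages in and out. When no disk area or file is used for page swapping any longer, memory management is reduced to plain address mapping and protection. In practice this is sometimes necessary. Some "embedded" systems use Flash memory in place of disk media. Writing to Flash memory is cumbersome and slow: the content of the memory must be erased before it can be written, and erasing is slow (compared with disk reads and writes). Flash memory is clearly unsuitable for page swapping, so swapping should be turned off in such systems. In fact, when the Linux kernel has just been booted, all page swapping is turned off.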
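From a user program, the two calls are reached through the C library wrappers declared in `<sys/swap.h>` on Linux. The following is a minimal sketch, assuming a Linux system with glibc or a compatible C library; the helper name `toy_toggle_swap()` and the path used with it are hypothetical. It needs the privileges described above to succeed, and on any other invocation it simply reports the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/swap.h>

/*
 * Try to start swapping on `path` and then stop it again.
 * Returns 0 if both calls succeed, otherwise the errno value
 * of the first call that failed.
 */
int toy_toggle_swap(const char *path)
{
    if (swapon(path, 0) != 0) {
        int err = errno;
        fprintf(stderr, "swapon(%s): %s\n", path, strerror(err));
        return err;
    }
    if (swapoff(path) != 0) {
        int err = errno;
        fprintf(stderr, "swapoff(%s): %s\n", path, strerror(err));
        return err;
    }
    return 0;
}
```

Called by an unprivileged process, or with a path that does not name a prepared swap area, the function returns a non-zero error code and changes nothing.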
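During system initialization the command file /etc/rc.d/rc.S is executed, and one of the command lines in that file invokes the utility swapon, the counterpart of the swapon() system call. Removing that line from the file is enough to leave the system without page swapping.

There are also several system calls for shared memory, which are likewise related to memory management. Since shared memory is conventionally classed as interprocess communication, those system calls are described separately in the chapter on interprocess communication.

2.7 Allocation of Physical Pages

The previous section mentioned that when several memory pages have to be allocated, pages used for DMA must be contiguous. In fact, for ease of management, and in particular out of consideration for the uniform "texture" of the physical storage space, memory pages are allocated contiguously even when they are not intended for DMA.

When a process needs a number of contiguous physical pages, it can obtain them through alloc_pages(). The code of Linux kernel 2.4.0 contains two versions of alloc_pages(): one in mm/numa.c and the other an inline function in include/linux/mm.h that calls __alloc_pages() in mm/page_alloc.c. Which one is compiled depends on the conditional compilation option CONFIG_DISCONTIGMEM. Why? The reason is again the uniformity of "texture" of the physical storage space discussed in the previous section.

We first look at the alloc_pages() used for the NUMA structure, whose code is in mm/numa.c:

==================== mm/numa.c 43 43 ====================
43  #ifdef CONFIG_DISCONTIGMEM
==================== mm/numa.c 91 128 ====================
91  /*
92   * This can be refined. Currently, tries to do round robin, instead
93   * should do concentratic circle search, starting from current node.
94   */
95  struct page * alloc_pages(int gfp_mask, unsigned long order)
96  {
97      struct page *ret = 0;
98      pg_data_t *start, *temp;
99  #ifndef CONFIG_NUMA
100     unsigned long flags;
101     static pg_data_t *next = 0;
102 #endif
103
104     if (order >= MAX_ORDER)
105         return NULL;
106 #ifdef CONFIG_NUMA
107     temp = NODE_DATA(numa_node_id());
108 #else
109     spin_lock_irqsave(&node_lock, flags);
110     if (!next) next = pgdat_list;
111     temp = next;
112     next = next->node_next;
113     spin_unlock_irqrestore(&node_lock, flags);
114 #endif
115     start = temp;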
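116     while (temp) {
117         if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
118             return(ret);
119         temp = temp->node_next;
120     }
121     temp = pgdat_list;
122     while (temp != start) {
123         if ((ret = alloc_pages_pgdat(temp, gfp_mask, order)))
124             return(ret);
125         temp = temp->node_next;
126     }
127     return(0);
128 }

First, support for NUMA is offered as an option through conditional compilation, so this code is compiled only when the option CONFIG_DISCONTIGMEM is defined. Note, however, that the condition used here is "discontiguous memory", not CONFIG_NUMA. A discontiguous physical storage space is in fact NUMA in a broad sense, because it means that there are holes between the lowest and the highest physical address, and a space with holes is of course not homogeneous. A physical space with discontiguous addresses must therefore be divided into a number of contiguous (and uniform) "nodes", just like a physical space of non-uniform texture. So a system whose storage space is discontiguous has several nodes, and therefore a queue of pg_data_t data structures.

The function is called with two parameters. The first, gfp_mask, is an integer that indicates which allocation policy to use. The second, order, gives the size of the physical block required, which may be 1, 2, 4, ... pages, that is, 2^order pages; since the function rejects any order not below MAX_ORDER, the largest block is 2^(MAX_ORDER-1) pages.

In a system with the NUMA structure, the macro NODE_DATA together with numa_node_id() finds the pg_data_t data structure of the node where the CPU is located. In the discontiguous memory structure there is likewise a queue of pg_data_t data structures, pgdat_list, and successive allocations start from each node in turn so as to balance the load among the nodes.

The main work of the function lies in its two while loops, which scan all the nodes in the queue in two stages (first from temp to the end of the queue, then from the first node back to the starting point) until the allocation succeeds in some node, or fails completely and 0 is returned. For each node, alloc_pages_pgdat() is called to try to allocate the pages required. The code of this function is in mm/numa.c:

==================== mm/numa.c 85 89 ====================
85  static struct page * alloc_pages_pgdat(pg_data_t *pgdat, int gfp_mask,
86      unsigned long order)
87  {
88      return __alloc_pages(pgdat->node_zonelists + gfp_mask, order);
89  }

As can be seen, the parameter gfp_mask is used here as an index into the node's node_zonelists[] array and so determines the specific allocation policy. Comparing this code with the alloc_pages() for the contiguous UMA structure below shows the difference: in a contiguous UMA structure there is only one node, contig_page_data, whereas in a NUMA structure or a discontiguous UMA structure there are several. The alloc_pages() for the contiguous UMA structure is defined in the file include/linux/mm.h:

==================== include/linux/mm.h 343 352 ====================
343  #ifndef CONFIG_DISCONTIGMEM
344  static inline struct page * alloc_pages(int gfp_mask, unsigned long order)
345  {
346      /*
347       * Gets optimized away by the compiler.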
348       */
349      if (order >= MAX_ORDER)
350          return NULL;
351      return __alloc_pages(contig_page_data.node_zonelists+(gfp_mask), order);
352  }

In contrast to the alloc_pages() for the NUMA structure, this function is compiled only when CONFIG_DISCONTIGMEM is not defined. Only one of the two functions of the same name is therefore ever compiled.

The actual page allocation is done by the function __alloc_pages(), whose code is in mm/page_alloc.c. We read it in sections:

==================== mm/page_alloc.c 270 315 ====================
[alloc_pages()>__alloc_pages()]
270  /*
271   * This is the 'heart' of the zoned buddy allocator:
272   */
273  struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
274  {
275      zone_t **zone;
276      int direct_reclaim = 0;
277      unsigned int gfp_mask = zonelist->gfp_mask;
278      struct page * page;
279
280      /*
281       * Allocations put pressure on the VM subsystem.
282       */
283      memory_pressure++;
284
285      /*
286       * (If anyone calls gfp from interrupts nonatomically then it
287       * will sooner or later tripped up by a schedule().)
288       *
289       * We are falling back to lower-level zones if allocation
290       * in a higher zone fails.
291       */
292
293      /*
294       * Can we take pages directly from the inactive_clean
295       * list?
296       */
297      if (order == 0 && (gfp_mask & __GFP_WAIT) &&
298              !(current->flags & PF_MEMALLOC))
299          direct_reclaim = 1;
300
301      /*
302       * If we are about to get low on free pages and we also have
303       * an inactive page shortage, wake up kswapd.
304       */
305      if (inactive_shortage() > inactive_target / 2 && free_shortage())
306          wakeup_kswapd(0);
307      /*
308       * If we are about to get low on free pages and cleaning
309       * the inactive_dirty pages would fix the situation,
310       * wake up bdflush.
311       */
312      else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
313              && nr_inactive_dirty_pages >= freepages.high)
314          wakeup_bdflush(0);
315

The function is called with two parameters. The first, zonelist, points to a zonelist_t data structure that represents a specific allocation policy. The other, order, is the same as in alloc_pages() above. The global variable memory_pressure expresses the pressure on memory page management; it is incremented when memory pages are allocated and decremented when they are returned. The local variable gfp_mask comes from the data structure that represents the allocation policy and consists of flag bits used for control. If only a single page is requested, the caller is prepared to wait for the allocation to complete, and the allocation is not for management purposes, the local variable direct_reclaim is set to 1, meaning that a page may be reclaimed from the cache queue of "inactive clean" pages of the zone concerned. The content of these pages has already been written to the swap device or file; the pages are merely kept, content and all, so that the content need not be read again from the device or file if it is wanted. When free pages run short, however, such considerations have to give way. Since these pages generally do not form contiguous blocks as truly free pages do, pages can be reclaimed from them only when a single page is requested. In addition, when a shortage of allocatable pages is detected, the two kernel threads kswapd and bdflush are woken up to free some memory pages (see "Periodic Swapping Out of Pages"). We continue:

==================== mm/page_alloc.c 316 340 ====================
[alloc_pages()>__alloc_pages()]
316  try_again:
317      /*
318       * First, see if we have any zones with lots of free memory.
319       *
320       * We allocate free memory first because it doesn't contain
321       * any data ... DUH!
322       */
323      zone = zonelist->zones;
324      for (;;) {
325          zone_t *z = *(zone++);
326          if (!z)
327              break;
328          if (!z->size)
329              BUG();
330
331          if (z->free_pages >= z->pages_low) {
332              page = rmqueue(z, order);
333              if (page)
334                  return page;
335          } else if (z->free_pages < z->pages_min &&
336                  waitqueue_active(&kreclaimd_wait)) {
337              wake_up_interruptible(&kreclaimd_wait);
338          }
339      }
340

This is a loop over all the zones specified by an allocation policy. The loop examines the total number of free pages in each zone in turn. If the total is still above the "low water mark", it tries to allocate from that zone through rmqueue(). If it finds that the total number of free pages in the zone has fallen below the minimum and that a process (in practice this can only be the kernel thread kreclaimd) is sleeping on the wait queue kreclaimd_wait, it wakes that process up to help reclaim some pages for later use. The function rmqueue() tries to allocate a number of contiguous memory pages from one zone. Its code is in mm/page_alloc.c:

==================== mm/page_alloc.c 172 211 ====================
[alloc_pages()>__alloc_pages()>rmqueue()]
172  static struct page * rmqueue(zone_t *zone, unsigned long order)
173  {
174      free_area_t * area = zone->free_area + order;
175      unsigned long curr_order = order;
176      struct list_head *head, *curr;
177      unsigned long flags;
178      struct page *page;
179
180      spin_lock_irqsave(&zone->lock, flags);
181      do {
182          head = &area->free_list;
183          curr = memlist_next(head);
184
185          if (curr != head) {
186              unsigned int index;
187
188              page = memlist_entry(curr, struct page, list);
189              if (BAD_RANGE(zone,page))
190                  BUG();
191              memlist_del(curr);
192              index = (page - mem_map) - zone->offset;
193              MARK_USED(index, curr_order, area);
194              zone->free_pages -= 1 << order;
195
196              page = expand(zone, page, index, order, curr_order, area);
197              spin_unlock_irqrestore(&zone->lock, flags);
198
199              set_page_count(page, 1);
200              if (BAD_RANGE(zone,page))
201                  BUG();
202              DEBUG_ADD_PAGE
203              return page;
204          }
205          curr_order++;
206          area++;
207      } while (curr_order < MAX_ORDER);
208      spin_unlock_irqrestore(&zone->lock, flags);
209
210      return NULL;
211  }

As explained before, the page data structures that represent physical pages are linked as doubly linked lists in the free queues of a zone. To allocate a page it must of course be unlinked from its queue, and the unlinking must not be disturbed by other processes or by other processors (if there are any). spin_lock_irqsave() is therefore used to lock the zone concerned. The free area zone->free_area in the zone structure is an array of structures, so zone->free_area + order points to the head of the queue that links physical memory blocks of the required size. The main work is done in a do-while loop. It first tries to allocate from the queue whose blocks are exactly the required size; if that fails, it tries the queues of larger blocks. On success, the part of the larger block that is left over is broken into smaller blocks, which are linked into the appropriate queues (by expand() at line 196).

The memlist_entry() at line 188 takes the first page structure from a non-empty queue, and memlist_del() then removes it from the queue. This was explained in Chapter 1.

The function expand() is defined in the same file (mm/page_alloc.c):

==================== mm/page_alloc.c 150 169 ====================
[alloc_pages()>__alloc_pages()>rmqueue()>expand()]
150  static inline struct page * expand (zone_t *zone, struct page *page,
151      unsigned long index, int low, int high, free_area_t * area)
152  {
153      unsigned long size = 1 << high;
154
155      while (high > low) {
156          if (BAD_RANGE(zone,page))
157              BUG();
158          area--;
159          high--;
160          size >>= 1;
161          memlist_add_head(&(page)->list, &(area)->free_list);
162          MARK_USED(index, high, area);
163          index += size;
164          page += size;
165      }
166      if (BAD_RANGE(zone,page))
167          BUG();
168      return page;
169  }

In the parameter list, low corresponds to order, which gives the size of the block required, and high corresponds to curr_order, which identifies the free queue in use at the time (the queue from which a block able to satisfy the request was obtained). When the two are equal, the while loop starting at line 155 is skipped. If the block obtained is larger than required (it cannot be smaller), the block is linked into the free queue one level down, where blocks are half the size, and the bitmap of that free area is updated accordingly; this is done in lines 158 to 162. Half of the block is then cut away and the second half is taken as a new block (lines 163 and 164), after which the next iteration begins, dealing with the free queue one level lower still. Eventually high must equal low, that is, the block that remains is exactly the size requested, and the loop ends.

In this way rmqueue() keeps scanning upwards until it succeeds or finally fails. If rmqueue() fails, __alloc_pages() settles for less through its for loop and tries the next zone specified by the allocation policy, until it succeeds or reaches the null pointer and fails for good (see line 327). If the allocation succeeds, __alloc_pages() returns a pointer to a page structure, the page structure of the first page of the block, and the use count in that page structure is 1. If single pages are allocated every time (order 0), the use count of every page is naturally 1.

If every zone in the given allocation policy has failed, the only course is to "try harder": first by lowering the requirement that a "water level" be maintained in each zone, and second by taking the "inactive clean" pages cached in the zones into account as well. We read on in the code of __alloc_pages() (mm/page_alloc.c).

==================== mm/page_alloc.c 341 364 ====================
[alloc_pages()>__alloc_pages()]
341      /*
342       * Try to allocate a page from a zone with a HIGH
343       * amount of free + inactive_clean pages.
344       *
345       * If there is a lot of activity, inactive_target
346       * will be high and we'll have a good chance of
347       * finding a page using the HIGH limit.
348       */
349      page = __alloc_pages_limit(zonelist, order, PAGES_HIGH, direct_reclaim);
350      if (page)
351          return page;
352
353      /*
354       * Then try to allocate a page from a zone with more
355       * than zone->pages_low free + inactive_clean pages.
356       *
357       * When the working set is very large and VM activity
358       * is low, we're most likely to have our allocation
359       * succeed here.
360       */
361      page = __alloc_pages_limit(zonelist, order, PAGES_LOW, direct_reclaim);
362      if (page)
363          return page;
364

Here __alloc_pages_limit() is first called with the parameter PAGES_HIGH; if that still fails, the effort is increased and it is called once more with PAGES_LOW. The code of __alloc_pages_limit() is also in mm/page_alloc.c:

==================== mm/page_alloc.c 213 267 ====================
[alloc_pages()>__alloc_pages()>__alloc_pages_limit()]
213  #define PAGES_MIN 0
214  #define PAGES_LOW 1
215  #define PAGES_HIGH 2
216
217  /*
218   * This function does the dirty work for __alloc_pages
219   * and is separated out to keep the code size smaller.
220   * (suggested by Davem at 1:30 AM, typed by Rik at 6 AM)
221   */
222  static struct page * __alloc_pages_limit(zonelist_t *zonelist,
223      unsigned long order, int limit, int direct_reclaim)
224  {
225      zone_t **zone = zonelist->zones;
226
227      for (;;) {
228          zone_t *z = *(zone++);
229          unsigned long water_mark;
230
231          if (!z)
232              break;
233          if (!z->size)
234              BUG();
235
236          /*
237           * We allocate if the number of free + inactive_clean
238           * pages is above the watermark.
239          */
240         switch (limit) {
241             default:
242             case PAGES_MIN:
243                 water_mark = z->pages_min;
244                 break;
245             case PAGES_LOW:
246                 water_mark = z->pages_low;
247                 break;
248             case PAGES_HIGH:
249                 water_mark = z->pages_high;
250         }
251
252         if (z->free_pages + z->inactive_clean_pages > water_mark) {
253             struct page *page = NULL;
254             /* If possible, reclaim a page directly. */
255             if (direct_reclaim && z->free_pages < z->pages_min + 8)
256                 page = reclaim_page(z);
257             /* If that fails, fall back to rmqueue. */
258             if (!page)
259                 page = rmqueue(z, order);
260             if (page)
261                 return page;
262         }
263     }
264
265     /* Found nothing. */
266     return NULL;
267 }

The logic of this function differs only slightly from the for loop in __alloc_pages() seen earlier, so we leave it to the reader. The function reclaim_page() reclaims a page from the inactive_clean_list queue of a zone. Its code is in mm/vmscan.c and is listed at the end of the section "Periodic Page Swap-Out", where the reader can study it after learning about page swap-in and swap-out. Note that it is called only when the parameter direct_reclaim is non-zero, so the request must be for a single page.
If this still fails, the pages in these zones are seriously short. Let us see how __alloc_pages() deals with that:

==================== mm/page_alloc.c 365 399 ====================
[alloc_pages()>__alloc_pages()]
365     /*
366      * OK, none of the zones on our zonelist has lots
367      * of pages free.
368      *
369      * We wake up kswapd, in the hope that kswapd will
370      * resolve this situation before memory gets tight.
371      *
372      * We also yield the CPU, because that:
373      * - gives kswapd a chance to do something
374      * - slows down allocations, in particular the
375      *   allocations from the fast allocator that's
376      *   causing the problems ...
377      * - ... which minimises the impact the "bad guys"
378      *   have on the rest of the system
379      * - if we don't have __GFP_IO set, kswapd may be
380      *   able to free some memory we can't free ourselves
381      */
382     wakeup_kswapd(0);
383     if (gfp_mask & __GFP_WAIT) {
384         __set_current_state(TASK_RUNNING);
385         current->policy |= SCHED_YIELD;
386         schedule();
387     }
388
389     /*
390      * After waking up kswapd, we try to allocate a page
391      * from any zone which isn't critical yet.
392      *
393      * Kswapd should, in most situations, bring the situation
394      * back to normal in no time.
395      */
396     page = __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim);
397     if (page)
398         return page;
399

The first step is to wake the kernel thread kswapd so that it can try to swap some pages out. If the allocation policy says the pages must be obtained, and that waiting is preferable to failing, the system is made to reschedule and the current process yields to other processes. This gives kswapd a chance to run at once, gives other processes a chance to release some pages, and also slows the rate of allocation requests, which eases the pressure. When the requesting process is scheduled to run again, or when the policy does not allow waiting, __alloc_pages_limit() is called once more with PAGES_MIN.
What if it fails again? Then it matters who is asking. If the requester is kswapd or kreclaimd, it is itself a "memory allocation worker": it wants pages in order to do official business, namely to make page allocation work better, and that is more important than an ordinary process. Such processes have the PF_MEMALLOC bit set in the flags field of their task_struct. We first look at the treatment of ordinary processes, those with PF_MEMALLOC clear.

==================== mm/page_alloc.c 400 477 ====================
[alloc_pages()>__alloc_pages()]
400     /*
401      * Damn, we didn't succeed.
402      *
403      * This can be due to 2 reasons:
404      * - we're doing a higher-order allocation
405      *  --> move pages to the free list until we succeed
406      * - we're /really/ tight on memory
407      *  --> wait on the kswapd waitqueue until memory is freed
408      */
409     if (!(current->flags & PF_MEMALLOC)) {
410         /*
411          * Are we dealing with a higher order allocation?
412          *
413          * Move pages from the inactive_clean to the free list
414          * in the hope of creating a large, physically contiguous
415          * piece of free memory.
416          */
417         if (order > 0 && (gfp_mask & __GFP_WAIT)) {
418             zone = zonelist->zones;
419             /* First, clean some dirty pages. */
420             current->flags |= PF_MEMALLOC;
421             page_launder(gfp_mask, 1);
422             current->flags &= ~PF_MEMALLOC;
423             for (;;) {
424                 zone_t *z = *(zone++);
425                 if (!z)
426                     break;
427                 if (!z->size)
428                     continue;
429                 while (z->inactive_clean_pages) {
430                     struct page * page;
431                     /* Move one page to the free list. */
432                     page = reclaim_page(z);
433                     if (!page)
434                         break;
435                     __free_page(page);
436                     /* Try if the allocation succeeds. */
437                     page = rmqueue(z, order);
438                     if (page)
439                         return page;
440                 }
441             }
442         }
443         /*
444          * When we arrive here, we are really tight on memory.
445          *
446          * We wake up kswapd and sleep until kswapd wakes us
447          * up again. After that we loop back to the start.
448          *
449          * We have to do this because something else might eat
450          * the memory kswapd frees for us and we need to be
451          * reliable. Note that we don't loop back for higher
452          * order allocations since it is possible that kswapd
453          * simply cannot free a large enough contiguous area
454          * of memory *ever*.
455          */
456         if ((gfp_mask & (__GFP_WAIT|__GFP_IO)) == (__GFP_WAIT|__GFP_IO)) {
457             wakeup_kswapd(1);
458             memory_pressure++;
459             if (!order)
460                 goto try_again;
461         /*
462          * If __GFP_IO isn't set, we can't wait on kswapd because
463          * kswapd just might need some IO locks /we/ are holding ...
464          *
465          * SUBTLE: The scheduling point above makes sure that
466          * kswapd does get the chance to free memory we can't
467          * free ourselves...
468          */
469         } else if (gfp_mask & __GFP_WAIT) {
470             try_to_free_pages(gfp_mask);
471             memory_pressure++;
472             if (!order)
473                 goto try_again;
474         }
475
476     }
477

A page allocation can fail for two reasons. One is that the total number of allocatable pages really is too small. The other is that the total is adequate but no block of the requested size is available. In the second case there are often many single pages on the zone's inactive_clean_pages queue, and reclaiming them may make it possible to assemble a larger block. There may also be "dirty" pages on the global inactive_dirty_pages queue. Writing their contents out to the swap device or to a file turns them into "clean" pages that can then be reclaimed.
For the second case, therefore, the code first calls page_launder() to "wash" the dirty pages (see "Periodic Page Swap-Out") and then uses a for loop to reclaim and release clean pages in each zone. The actual reclaiming and releasing is done in a while loop. When __free_page() releases a page it merges free pages into the largest possible blocks, so rmqueue() is tried after every reclaimed page to see whether the request can now be met. Note that the PF_MEMALLOC bit of the current process is set for the duration of the page_launder() call, giving it the privilege of a process "on official business". The reason is that page_launder() may itself need to allocate some temporary working pages, and without PF_MEMALLOC set it could recursively enter lines 409-476.
If reclaiming those pages still does not help, the total number of allocatable pages is insufficient. One remedy is to wake kswapd while the requesting process sleeps, and to let kswapd wake the requester after it has completed a pass. If the request is for a single page, a goto then jumps back to the label try_again near the start of __alloc_pages(). The other remedy is to call try_to_free_pages() directly, a function that is normally called by kswapd.
What if the process is "on official business"? Or what if it is not, but every measure has been tried and the only reason it did not jump back to try_again is that it wants a multi-page block?
As we have seen, each stepped-up call to __alloc_pages_limit() still holds something in reserve. In the last call, with PAGES_MIN, allocation is allowed only if the zone's "water level" of allocatable pages is above z->pages_min. That reserve is kept for emergencies, and the emergency has now arrived. We continue with the code of __alloc_pages().

==================== mm/page_alloc.c 478 521 ====================
[alloc_pages()>__alloc_pages()]
478     /*
479      * Final phase: allocate anything we can!
480      *
481      * Higher order allocations, GFP_ATOMIC allocations and
482      * recursive allocations (PF_MEMALLOC) end up here.
483      *
484      * Only recursive allocations can use the very last pages
485      * in the system, otherwise it would be just too easy to
486      * deadlock the system...
487      */
488     zone = zonelist->zones;
489     for (;;) {
490         zone_t *z = *(zone++);
491         struct page * page = NULL;
492         if (!z)
493             break;
494         if (!z->size)
495             BUG();
496
497         /*
498          * SUBTLE: direct_reclaim is only possible if the task
499          * becomes PF_MEMALLOC while looping above. This will
500          * happen when the OOM killer selects this task for
501          * instant execution...
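502          */
503         if (direct_reclaim) {
504             page = reclaim_page(z);
505             if (page)
506                 return page;
507         }
508
509         /* XXX: is pages_min/4 a good amount to reserve for this? */
510         if (z->free_pages < z->pages_min / 4 &&
511                 !(current->flags & PF_MEMALLOC))
512             continue;
513         page = rmqueue(z, order);
514         if (page)
515             return page;
516     }
517
518     /* No luck.. */
519     printk(KERN_ERR "__alloc_pages: %lu-order allocation failed.\n", order);
520     return NULL;
521 }

If even this fails, something must be wrong with the system.
The reader may object: if allocating one page (or a few) is this much trouble, how much CPU time is left for real computation? Keep in mind that we have assumed the allocation fails again and again and is retried again and again, which is why so much effort appears. In practice the great majority of page allocations succeed in the first zone named by the allocation policy. Still, the code shows how thorough the design of a system has to be.

2.8 Periodic Page Swap-Out

This scenario is fairly long, so the reader will need some patience.
To avoid having to search for swappable pages and write them out at the moment a page fault occurs, which is exactly when the CPU is busy, the Linux kernel periodically checks memory and swaps out some pages in advance. This frees space and lightens the load at page-fault time. Because page use cannot be predicted exactly, this does not rule out a page fault finding no free page and having to look for a victim on the spot, but it does reduce the probability. With suitable parameters, such as how often to swap out and how many pages each time, the case of having to find a page to swap out during a page fault can be made rare in practice. For this purpose the Linux kernel has a "daemon" dedicated to periodic swap-out, kswapd.
In principle kswapd is a process. It has its own process control block, a task_struct, and is scheduled by the kernel like any other process. Because the kernel schedules it as a process, it can be made to run when the system is relatively idle. Compared with an ordinary process, however, kswapd is special. First, it has no address space of its own, so in modern operating system theory it is called a "thread" to mark the difference. Whose address space does it use? It uses the kernel's, and in this respect it resembles an interrupt service routine. Second, its code is statically linked into the kernel, so it can call the kernel's subroutines directly, unlike an ordinary process, which can only use a predefined set of functions through system calls.
This section follows kswapd as it is scheduled by the kernel, runs, and completes one routine pass.
The source code of kswapd is almost entirely in mm/vmscan.c. We start with its creation:

==================== mm/vmscan.c 1146 1153 ====================
1146 static int __init kswapd_init(void)
1147 {
1148     printk("Starting kswapd v1.8\n");
1149     swap_setup();
1150     kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
1151     kernel_thread(kreclaimd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
1152     return 0;
1153 }

The function kswapd_init() is called during system initialization and does two main things. The first, in swap_setup(), is to set the global variable page_cluster according to the size of physical memory:

==================== mm/swap.c 293 305 ====================
[kswapd_init()>swap_setup()]
293 /*
294  * Perform any setup for the swap system
295  */
296 void __init swap_setup(void)
297 {
298     /* Use a smaller cluster for memory <16MB or <32MB */
299     if (num_physpages < ((16 * 1024 * 1024) >> PAGE_SHIFT))
300         page_cluster = 2;
301     else if (num_physpages < ((32 * 1024 * 1024) >> PAGE_SHIFT))
302         page_cluster = 3;
303     else
304         page_cluster = 4;
305 }

This parameter is related to the disk device driver. A disk read must first seek, and seeking is slow, so reading one page at a time is uneconomical. It is better to read several pages at once, which is called "read-ahead". Read-ahead, however, means more pages have to be held in memory each time, so a suitable number has to be chosen, and choosing it from the size of physical memory is clearly reasonable.
The second thing is to create the thread kswapd, which is done by kernel_thread(). Another thread, kreclaimd, is created here as well. It is also part of memory management but is less complex and less important than kswapd, so we set it aside for now. For details on thread creation see the chapter on process management. Here we simply assume that kswapd has been created and starts executing at the function kswapd(), whose code is in mm/vmscan.c:

==================== mm/vmscan.c 947 1046 ====================
947 /*
948  * The background pageout daemon, started as a kernel thread
949  * from the init process.
950  *
951  * This basically trickles out pages so that we have _some_
952  * free memory available even if there is no other activity
953  * that frees anything up. This is needed for things like routing
954  * etc, where we otherwise might have all activity going on in
955  * asynchronous contexts that cannot page things out.
956  *
957  * If there are applications that are active memory-allocators
958  * (most normal use), this basically shouldn't matter.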
959  */
960 int kswapd(void *unused)
961 {
962     struct task_struct *tsk = current;
963
964     tsk->session = 1;
965     tsk->pgrp = 1;
966     strcpy(tsk->comm, "kswapd");
967     sigfillset(&tsk->blocked);
968     kswapd_task = tsk;
969
970     /*
971      * Tell the memory management that we're a "memory allocator",
972      * and that if we need more memory we should get access to it
973      * regardless (see "__alloc_pages()"). "kswapd" should
974      * never get caught in the normal page freeing logic.
975      *
976      * (Kswapd normally doesn't need memory anyway, but sometimes
977      * you need a small amount of memory in order to be able to
978      * page out something else, and this flag essentially protects
979      * us from recursively trying to free more memory as we're
980      * trying to free the first piece of memory in the first place).
981      */
982     tsk->flags |= PF_MEMALLOC;
983
984     /*
985      * Kswapd main loop.
986      */
987     for (;;) {
988         static int recalc = 0;
989
990         /* If needed, try to free some memory. */
991         if (inactive_shortage() || free_shortage()) {
992             int wait = 0;
993             /* Do we need to do some synchronous flushing? */
994             if (waitqueue_active(&kswapd_done))
995                 wait = 1;
996             do_try_to_free_pages(GFP_KSWAPD, wait);
997         }
998
999         /*
1000          * Do some (very minimal) background scanning. This
1001          * will scan all pages on the active list once
1002          * every minute. This clears old referenced bits
1003          * and moves unused pages to the inactive list.
1004          */
1005         refill_inactive_scan(6, 0);
1006
1007         /* Once a second, recalculate some VM stats. */
1008         if (time_after(jiffies, recalc + HZ)) {
1009             recalc = jiffies;
1010             recalculate_vm_stats();
1011         }
1012
1013         /*
1014          * Wake up everybody waiting for free memory
1015          * and unplug the disk queue.
1016          */
1017         wake_up_all(&kswapd_done);
1018         run_task_queue(&tq_disk);
1019
1020         /*
1021          * We go to sleep if either the free page shortage
1022          * or the inactive page shortage is gone. We do this
1023          * because:
1024          * 1) we need no more free pages or
1025          * 2) the inactive pages need to be flushed to disk,
1026          *    it wouldn't help to eat CPU time now ...
1027          *
1028          * We go to sleep for one second, but if it's needed
1029          * we'll be woken up earlier...
1030          */
1031         if (!free_shortage() || !inactive_shortage()) {
1032             interruptible_sleep_on_timeout(&kswapd_wait, HZ);
1033         /*
1034          * If we couldn't free enough memory, we see if it was
1035          * due to the system just not having enough memory.
1036          * If that is the case, the only solution is to kill
1037          * a process (the alternative is enternal deadlock).
1038          *
1039          * If there still is enough memory around, we just loop
1040          * and try free some more memory...
1041          */
1042         } else if (out_of_memory()) {
1043             oom_kill();
1044         }
1045     }
1046 }

After some simple initialization the program enters an endless loop. At the end of each iteration it normally calls interruptible_sleep_on_timeout() to go to sleep, leaving the kernel free to schedule other processes. After a certain time the kernel wakes kswapd and schedules it again, and kswapd returns to the top of the loop. How long is "a certain time"? It is the constant HZ. HZ determines how many clock interrupts occur per second. It can be changed during kernel configuration before compilation, but it is fixed once the kernel is compiled. Passing HZ to interruptible_sleep_on_timeout() therefore means that kswapd is to be scheduled again after one second. In other words, a call to interruptible_sleep_on_timeout() normally returns one second later. In some situations the kernel wakes kswapd earlier than that, in which case it returns early and starts a new iteration. So the loop runs at least once a second, and this is the routine pass of kswapd.
What does kswapd do in this pass? The work falls into two parts. The first part is done only when physical pages are found to be short. Its purpose is to pick out a number of pages in advance and break their mappings, so that they move from the active state to the inactive state in preparation for swap-out. The second part is done on every pass. Its purpose is to write "dirty" pages that are already inactive to the swap device, so that they become inactive "clean" pages that stay buffered, or to go further and reclaim some of those pages as free pages.
We begin with the first part, which starts by checking whether the physical pages available for allocation or turnover are short:

==================== mm/vmscan.c 805 822 ====================
[kswapd()>inactive_shortage()]
805 /*
806  * How many inactive pages are we short?
807  */
808 int inactive_shortage(void)
809 {
810     int shortage = 0;
811
812     shortage += freepages.high;
813     shortage += inactive_target;
814     shortage -= nr_free_pages();
815     shortage -= nr_inactive_clean_pages();
816     shortage -= nr_inactive_dirty_pages;
817
818     if (shortage > 0)
819         return shortage;
820
821     return 0;
822 }

The supply of physical pages that the system should maintain is set by two global quantities, freepages.high and inactive_target: the desired number of free pages and the desired number of inactive pages. Their sum is the potential supply under normal conditions. These pages come from three sources.
The first source is the free pages currently on hand, which can be allocated immediately. They are spread over the zones and merged into physically contiguous blocks of 2, 4, 8, ..., 2^N pages. Their number is reported by nr_free_pages().
The second source is the existing inactive "clean" pages. These too can be allocated at once, but their contents may still be needed, so keeping more of them reduces reads from the swap device. They are also spread over the zones but are not merged into blocks. Their number is reported by nr_inactive_clean_pages().
The third source is the existing inactive "dirty" pages, which must first be "cleaned", that is, written to the swap device, before they can be allocated. All of these pages are on a single queue, and the global variable nr_inactive_dirty_pages records how many there are.
The code of the two functions above is in mm/page_alloc.c and is simple enough for the reader to study alone.
Maintaining the total potential supply is not enough, though. free_shortage() must also check whether some particular zone is seriously short, that is, whether the number of directly allocatable pages (everything except the inactive dirty pages) is below a minimum. The code of this function is in mm/vmscan.c and is likewise left to the reader.
If allocatable pages are found to be short, some pages have to be released and swapped out, which is done by do_try_to_free_pages(). Before that, waitqueue_active() is called to see whether anything is waiting on the kswapd_done queue, and the result is passed as an argument to do_try_to_free_pages(). Chapter 3 describes several special queues in the kernel onto which parts of the kernel (mainly device drivers) hang work to be carried out when some event occurs. kswapd_done is a queue in that spirit: whatever is waiting on it gets its turn each time kswapd completes a routine pass. The inline function waitqueue_active() checks whether the queue has any waiters. It is defined in include/linux/wait.h:

==================== include/linux/wait.h 152 161 ====================
[kswapd()>waitqueue_active()]
152 static inline int waitqueue_active(wait_queue_head_t *q)
153 {
154 #if WAITQUEUE_DEBUG
155     if (!q)
156         WQ_BUG();
157     CHECK_MAGIC_WQHEAD(q);
158 #endif
159
160     return !list_empty(&q->task_list);
161 }

Next comes the call to do_try_to_free_pages(), which tries to free some memory pages. Its code is in mm/vmscan.c:

==================== mm/vmscan.c 907 941 ====================
[kswapd()>do_try_to_free_pages()]
907 static int do_try_to_free_pages(unsigned int gfp_mask, int user)
908 {
909     int ret = 0;
910
911     /*
912      * If we're low on free pages, move pages from the
913      * inactive_dirty list to the inactive_clean list.
914      *
915      * Usually bdflush will have pre-cleaned the pages
916      * before we get around to moving them to the other
917      * list, so this is a relatively cheap operation.
918      */
919     if (free_shortage() || nr_inactive_dirty_pages > nr_free_pages() +
920             nr_inactive_clean_pages())
921         ret += page_launder(gfp_mask, user);
922
923     /*
924      * If needed, we move pages from the active list
925      * to the inactive list. We also "eat" pages from
926      * the inode and dentry cache whenever we do this.
927      */
928     if (free_shortage() || inactive_shortage()) {
929         shrink_dcache_memory(6, gfp_mask);
930         shrink_icache_memory(6, gfp_mask);
931         ret += refill_inactive(gfp_mask, user);
932     } else {
933         /*
934          * Reclaim unused slab cache memory.
935          */
936         kmem_cache_reap(gfp_mask);
937         ret = 1;
938     }
939
940     return ret;
941 }

Breaking the mapping of an active page so that it becomes inactive, and possibly going on to swap it out, is a last resort, because nobody can predict exactly which pages are the right ones to swap out. "Least recently used" is an effective rule in general, but it does not hold everywhere. Leaving pages that are "in active service" untouched is therefore the ideal. With this in mind, the code starts with the easy measures and steps up the effort gradually.
It first calls page_launder(), which tries to "wash" dirty pages that are already inactive so that they become immediately allocatable. The "launder" in the name refers to laundering clothes. This function is called (more or less) periodically by kswapd(), and it is also called on demand whenever a page has to be allocated and none is available. Its code is in mm/vmscan.c:

==================== mm/vmscan.c 465 670 ====================
[kswapd()>do_try_to_free_pages()>page_launder()]
465 /**
466  * page_launder - clean dirty inactive pages, move to inactive_clean list
467  * @gfp_mask: what operations we are allowed to do
468  * @sync: should we wait synchronously for the cleaning of pages
469  *
470  * When this function is called, we are most likely low on free +
471  * inactive_clean pages. Since we want to refill those pages as
472  * soon as possible, we'll make two loops over the inactive list,
473  * one to move the already cleaned pages to the inactive_clean lists
474  * and one to (often asynchronously) clean the dirty inactive pages.
475  *
476  * In situations where kswapd cannot keep up, user processes will
477  * end up calling this function. Since the user process needs to
478  * have a page before it can continue with its allocation, we'll
479  * do synchronous page flushing in that case.
480  *
481  * This code is heavily inspired by the FreeBSD source code. Thanks
482  * go out to Matthew Dillon.
483  */
484 #define MAX_LAUNDER (4 * (1 << page_cluster))
485 int page_launder(int gfp_mask, int sync)
486 {
487     int launder_loop, maxscan, cleaned_pages, maxlaunder;
488     int can_get_io_locks;
489     struct list_head * page_lru;
490     struct page * page;
491
492     /*
493      * We can only grab the IO locks (eg. for flushing dirty
494      * buffers to disk) if __GFP_IO is set.
495      */
496     can_get_io_locks = gfp_mask & __GFP_IO;
497
498     launder_loop = 0;
499     maxlaunder = 0;
500     cleaned_pages = 0;
501
502 dirty_page_rescan:
503     spin_lock(&pagemap_lru_lock);
504     maxscan = nr_inactive_dirty_pages;
505     while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
506                 maxscan-- > 0) {
507         page = list_entry(page_lru, struct page, lru);
508
509         /* Wrong page on list?! (list corruption, should not happen) */
510         if (!PageInactiveDirty(page)) {
511             printk("VM: page_launder, wrong page on list.\n");
512             list_del(page_lru);
513             nr_inactive_dirty_pages--;
514             page->zone->inactive_dirty_pages--;
515             continue;
516         }
517
518         /* Page is or was in use? Move it to the active list. */
519         if (PageTestandClearReferenced(page) || page->age > 0 ||
520                 (!page->buffers && page_count(page) > 1) ||
521                 page_ramdisk(page)) {
522             del_page_from_inactive_dirty_list(page);
523             add_page_to_active_list(page);
524             continue;
525         }
526
527         /*
528          * The page is locked. IO in progress?
529          * Move it to the back of the list.
530          */
531         if (TryLockPage(page)) {
532             list_del(page_lru);
533             list_add(page_lru, &inactive_dirty_list);
534             continue;
535         }
536
537         /*
538          * Dirty swap-cache page? Write it out if
539          * last copy..
540          */
541         if (PageDirty(page)) {
542             int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
543             int result;
544
545             if (!writepage)
546                 goto page_active;
547
548             /* First time through? Move it to the back of the list */
549             if (!launder_loop) {
550                 list_del(page_lru);
551                 list_add(page_lru, &inactive_dirty_list);
552                 UnlockPage(page);
553                 continue;
554             }
555
556             /* OK, do a physical asynchronous write to swap. */
557             ClearPageDirty(page);
558             page_cache_get(page);
559             spin_unlock(&pagemap_lru_lock);
560
561             result = writepage(page);
562             page_cache_release(page);
563
564             /* And re-start the thing.. */
565             spin_lock(&pagemap_lru_lock);
566             if (result != 1)
567                 continue;
568             /* writepage refused to do anything */
569             set_page_dirty(page);
570             goto page_active;
571         }
572
573         /*
574          * If the page has buffers, try to free the buffer mappings
575          * associated with this page. If we succeed we either free
576          * the page (in case it was a buffercache only page) or we
577          * move the page to the inactive_clean list.
578          *
579          * On the first round, we should free all previously cleaned
580          * buffer pages
581          */
582         if (page->buffers) {
583             int wait, clearedbuf;
584             int freed_page = 0;
585             /*
586              * Since we might be doing disk IO, we have to
587              * drop the spinlock and take an extra reference
588              * on the page so it doesn't go away from under us.
589              */
590             del_page_from_inactive_dirty_list(page);
591             page_cache_get(page);
592             spin_unlock(&pagemap_lru_lock);
593
594             /* Will we do (asynchronous) IO? */
595             if (launder_loop && maxlaunder == 0 && sync)
596                 wait = 2; /* Synchrounous IO */
597             else if (launder_loop && maxlaunder-- > 0)
598                 wait = 1; /* Async IO */
599             else
600                 wait = 0; /* No IO */
601
602             /* Try to free the page buffers. */
603             clearedbuf = try_to_free_buffers(page, wait);
604
605             /*
606              * Re-take the spinlock. Note that we cannot
607              * unlock the page yet since we're still
608              * accessing the page_struct here...
609              */
610             spin_lock(&pagemap_lru_lock);
611
612             /* The buffers were not freed. */
613             if (!clearedbuf) {
614                 add_page_to_inactive_dirty_list(page);
615
616             /* The page was only in the buffer cache. */
617             } else if (!page->mapping) {
618                 atomic_dec(&buffermem_pages);
619                 freed_page = 1;
620                 cleaned_pages++;
621
622             /* The page has more users besides the cache and us. */
623             } else if (page_count(page) > 2) {
624                 add_page_to_active_list(page);
625
626             /* OK, we "created" a freeable page. */
627             } else /* page->mapping && page_count(page) == 2 */ {
628                 add_page_to_inactive_clean_list(page);
629                 cleaned_pages++;
630             }
631
632             /*
633              * Unlock the page and drop the extra reference.
634              * We can only do it here because we ar accessing
635              * the page struct above.
636              */
637             UnlockPage(page);
638             page_cache_release(page);
639
640             /*
641              * If we're freeing buffer cache pages, stop when
642              * we've got enough free memory.
643              */
644             if (freed_page && !free_shortage())
645                 break;
646             continue;
647         } else if (page->mapping && !PageDirty(page)) {
648             /*
649              * If a page had an extra reference in
650              * deactivate_page(), we will find it here.
651              * Now the page is really freeable, so we
652              * move it to the inactive_clean list.
653              */
654             del_page_from_inactive_dirty_list(page);
655             add_page_to_inactive_clean_list(page);
656             UnlockPage(page);
657             cleaned_pages++;
658         } else {
659 page_active:
660             /*
661              * OK, we don't know what to do with the page.
662              * It's no use keeping it here, so we move it to
663              * the active list.
664              */
665             del_page_from_inactive_dirty_list(page);
666             add_page_to_active_list(page);
667             UnlockPage(page);
668         }
669     }
670     spin_unlock(&pagemap_lru_lock);

The local variable cleaned_pages counts the pages that have been "washed". Another local variable, launder_loop, controls how many times the inactive dirty queue is scanned. During the first pass launder_loop is 0. If a second pass is needed, it is set to 1 and control jumps back to the label dirty_page_rescan (line 502) to start another scan.
The inactive dirty queue is scanned by a while loop (line 505). Because the loop moves some pages from their current position to the tail of the queue, following the link pointers is not enough. The number of pages visited must also be limited, or the same page could be processed repeatedly and the loop might never end. That is the purpose of the variable maxscan.
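The role of maxscan can be seen in isolation with a small user-space sketch. It is not kernel code: struct node, the busy flag, bounded_scan() and scan_demo() are hypothetical names invented for this illustration, and the list helpers only imitate the behaviour of the kernel's list_del() and list_add(). The scan always takes the entry at the tail. An entry that cannot be processed is rotated to the other end of the queue, so without a visit budget a queue made up only of such entries would be scanned forever.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct node {
    struct node *prev, *next;
    int busy;               /* hypothetical flag: 1 = cannot be handled now */
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Insert n directly after head, the end farthest from the scan position. */
static void list_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Scan from the tail with a budget of maxscan visits.
 * Busy nodes are rotated to the other end of the queue and the rest are
 * removed. Returns the number of nodes removed.
 */
static int bounded_scan(struct node *head, int maxscan)
{
    int removed = 0;
    struct node *n;

    while ((n = head->prev) != head && maxscan-- > 0) {
        list_del(n);
        if (n->busy)
            list_add(n, head);  /* rotated, so it would come round again */
        else
            removed++;
    }
    return removed;
}

#define MAX_NODES 16

/* Build a queue of n_busy busy nodes and n_free free nodes, then scan it
 * once with a budget equal to the queue length. */
static int scan_demo(int n_busy, int n_free)
{
    struct node head, nodes[MAX_NODES];
    int total = n_busy + n_free;
    int i;

    if (total > MAX_NODES)
        return -1;
    list_init(&head);
    for (i = 0; i < total; i++) {
        nodes[i].busy = (i < n_busy);
        list_add(&nodes[i], &head);
    }
    return bounded_scan(&head, total);
}
```

With a budget equal to the queue length every node is visited exactly once, so scan_demo(3, 2) removes the two free nodes. scan_demo(4, 0) terminates after four visits and removes nothing, whereas the same loop without the maxscan test would spin on the four busy nodes indefinitely.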
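For each page on the queue, the first check is that its PG_inactive_dirty flag is 1. If it is not, the page should not be on this queue at all. Something has gone wrong, so the page is deleted from the queue (see line 512). Otherwise, a normal inactive dirty page goes through the following checks in order and is handled accordingly.
(1) Some pages are on the inactive dirty queue but need to go back to the active queue (lines 519-525), either because circumstances have changed or because they were put here wrongly in the first place. These pages are:
 A page that was accessed after entering the inactive dirty queue. A page fault targeting the page occurred, and the page's mapping was restored.
 A page whose "life" has not yet run out. The page structure has a field age whose value reflects how frequently the page is accessed. We return to this topic later.
 A page that is not used as a file read/write buffer but whose use count is greater than 1. This means the page is mapped in the page table of at least one process. As noted earlier, the use count is set to 1 at allocation and incremented by every subsequent use of the page, including use as a file read/write buffer. If a page is not used as such a buffer, a count greater than 1 means some process is still using it.
 A page that is mapped into a process's user space and also used for a ramdisk, that is, memory simulating a disk. Such a page should of course not be swapped out to disk.
(2) The page is already locked (line 531), so TryLockPage() returns 1. This means an operation on the page, such as input or output, is in progress. The page should stay on the inactive dirty queue, but it is moved to the tail. Note that a page that was not locked has now been locked by this call.
(3) If the page is still "dirty" (line 541), that is, the PG_dirty flag in its page structure is 1, it should in principle be written out to the swap device, but some special cases have to be considered (lines 541-571). First, the address_space structure the page belongs to must provide a function for writing a page out. If it does not, control goes to page_active and the page is returned to the active queue. For ordinary page swapping the address_space is swapper_space, whose address_space_operations structure is swap_aops and whose write-out operation is swap_writepage(), so this check passes. On the first pass the page is only moved to the tail of the same queue and is not written out (lines 549-554). On a second pass the page really is written out. Before the write, ClearPageDirty() clears the PG_dirty flag, and the page is then written using the function supplied by its address_space. The actual operation depends on what the page is used for: an ordinary user-space page, a file mapping set up through mmap(), or a file system read/write buffer. The write may be synchronous (the current process sleeps until it completes) or asynchronous, but it always takes time, and during that time the kernel may enter page_launder() again. The same page must not be written out a second time, and that is why PG_dirty is cleared first (see line 541). Write failure must also be allowed for. The write function should return 1 on failure, so that page_launder() can restore PG_dirty and return the page to the active queue (lines 569-570). Incidentally, page_cache_get() increments the page's use count before the writepage function is called and page_cache_release() decrements it afterwards, marking the extra "user" that exists while the page is being written. Note that the written page is not moved to the inactive clean queue right away. Only its PG_dirty flag is cleared. Note also that if the CPU reaches line 582, the PG_dirty flag must be 0, and the page must have become clean by being written out in an earlier scan.
(4) If the page is no longer dirty and is used as a file read/write buffer (lines 582-647), it is first removed from the inactive dirty queue, and try_to_free_buffers() then tries to release it. If it cannot be released, the return value decides whether it goes back to the inactive dirty queue, onto the active queue, or onto the inactive clean queue. If the release succeeds, try_to_free_buffers() has already decremented the use count, and the page_cache_release() at line 638 decrements it again to 0, which finally returns the page to the free queue. If a page has been released and free pages are no longer short, the scan can stop (see lines 644 and 645). Otherwise it continues. The code of try_to_free_buffers() is in fs/buffer.c, and the reader can study it after the "File System" chapter.
(5) If the page is no longer dirty and is on the queue of some address_space structure, it has been "washed", so it is moved to the inactive clean queue of its zone.
(6) Finally, a page that fits none of the cases above (line 658) cannot be handled here, so it is returned to the active queue.
After one pass, whether to make a second pass depends on whether free pages are still short and on whether the __GFP_IO bit is set in the gfp_mask argument.

==================== mm/vmscan.c 671 697 ====================
[kswapd()>do_try_to_free_pages()>page_launder()]
671
672     /*
673      * If we don't have enough free pages, we loop back once
674      * to queue the dirty pages for writeout. When we were called
675      * by a user process (that /needs/ a free page) and we didn't
676      * free anything yet, we wait synchronously on the writeout of
677      * MAX_SYNC_LAUNDER pages.
678      *
679      * We also wake up bdflush, since bdflush should, under most
680      * loads, flush out the dirty pages before we have to wait on
681      * IO.
682      */
683     if (can_get_io_locks && !launder_loop && free_shortage()) {
684         launder_loop = 1;
685         /* If we cleaned pages, never do synchronous IO. */
686         if (cleaned_pages)
687             sync = 0;
688         /* We only do a few "out of order" flushes. */
689         maxlaunder = MAX_LAUNDER;
690         /* Kflushd takes care of the rest. */
691         wakeup_bdflush(0);
692         goto dirty_page_rescan;
693     }
694
695     /* Return the number of pages moved to the inactive_clean list. */
696     return cleaned_pages;
697 }

If a second pass is chosen, control jumps back to the label dirty_page_rescan at line 502. Because launder_loop is set to 1 here, the code cannot loop back for a third scan, so each call to page_launder() makes at most two passes.
Back in do_try_to_free_pages(), if allocatable physical pages are still insufficient after page_launder(), more pages have to be reclaimed. They are not taken only from the physical pages mapped into the user space of processes. They are reclaimed from four sources, which is the purpose of the three functions called here (shrink_dcache_memory(), shrink_icache_memory() and refill_inactive()) and of kmem_cache_reap(), which we will see shortly.
The "File System" chapter shows that opening a file involves allocating and using dentry structures, which represent directory entries, and inode structures, which represent file index nodes. These structures are not released as soon as the file is closed. They are kept in an LRU queue in reserve, in case a file operation soon needs them again. Over time a large number of dentry and inode structures can accumulate and occupy a considerable number of physical pages. shrink_dcache_memory() and shrink_icache_memory() reclaim a suitable share of them to keep these structures and the supply of physical pages in balance.
Apart from these, the running kernel also allocates many other data structures dynamically, and for that it uses a management mechanism called "slab". As the reader will see later, the mechanism acquires physical pages "wholesale" from memory management and then cuts them into small pieces for "retail". As the system runs, the real demand for such pages changes dynamically. The slab mechanism, however, tends to acquire and hold on to free physical pages rather than return them, so from time to time they have to be "harvested" with kmem_cache_reap().
The reader can come back to the code of the first two functions after the "File System" chapter. Here we concentrate on refill_inactive(), whose code is in mm/vmscan.c:

==================== mm/vmscan.c 824 905 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()]
824 /*
825  * We need to make the locks finer granularity, but right
826  * now we need this so that we can do page allocations
827  * without holding the kernel lock etc.
828  *
829  * We want to try to free "count" pages, and we want to
830  * cluster them so that we get good swap-out behaviour.
831  *
832  * OTOH, if we're a user process (and not kswapd), we
833  * really care about latency. In that case we don't try
834  * to free too many pages.
835  */
836 static int refill_inactive(unsigned int gfp_mask, int user)
837 {
838     int priority, count, start_count, made_progress;
839
840     count = inactive_shortage() + free_shortage();
841     if (user)
842         count = (1 << page_cluster);
843     start_count = count;
844
845     /* Always trim SLAB caches when memory gets low. */
846     kmem_cache_reap(gfp_mask);
847
848     priority = 6;
849     do {
850         made_progress = 0;
851
852         if (current->need_resched) {
853             __set_current_state(TASK_RUNNING);
854             schedule();
855         }
856
857         while (refill_inactive_scan(priority, 1)) {
858             made_progress = 1;
859             if (--count <= 0)
860                 goto done;
861         }
862
863         /*
864          * don't be too light against the d/i cache since
865          * refill_inactive() almost never fail when there's
866          * really plenty of memory free.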
867          */
868         shrink_dcache_memory(priority, gfp_mask);
869         shrink_icache_memory(priority, gfp_mask);
870
871         /*
872          * Then, try to page stuff out..
873          */
874         while (swap_out(priority, gfp_mask)) {
875             made_progress = 1;
876             if (--count <= 0)
877                 goto done;
878         }
879
880         /*
881          * If we either have enough free memory, or if
882          * page_launder() will be able to make enough
883          * free memory, then stop.
884          */
885         if (!inactive_shortage() || !free_shortage())
886             goto done;
887
888         /*
889          * Only switch to a lower "priority" if we
890          * didn't make any useful progress in the
891          * last loop.
892          */
893         if (!made_progress)
894             priority--;
895     } while (priority >= 0);
896
897     /* Always end on a refill_inactive.., may sleep... */
898     while (refill_inactive_scan(0, 1)) {
899         if (--count <= 0)
900             goto done;
901     }
902
903 done:
904     return (count < start_count);
905 }

The parameter user is passed down from kswapd and says whether anything is waiting on the kswapd_done queue. This decides whether page reclaim can take its time, so it affects the number of pages to be reclaimed in this call.
First, kmem_cache_reap() "harvests" the free physical pages managed by the slab mechanism. This is the least disruptive measure. The reader can study the code of this function after the section "Management of Kernel Working Buffers".
Then comes a do-while loop. It starts at priority 6, the lowest, and steps up the effort until it reaches priority 0. Either the target is met because enough pages have been reclaimed, or the target is still not met at the highest priority and the function gives up (the situation may have changed by the time a page fault really happens).
Each iteration begins by checking whether need_resched in the current process's task_struct is 1. If it is, some interrupt service routine has requested a reschedule, so schedule() is called to let the kernel reschedule. Before that the state of the current process is set to TASK_RUNNING to show that it wants to continue running. Chapter 4 explains that need_resched exists to force scheduling: the flag is checked whenever the CPU finishes a system call or interrupt service and returns from system space to user space. kswapd, however, is a kernel thread and never "returns to user space", so it could bypass this mechanism and hold the CPU indefinitely. It has to discipline itself by checking the flag and calling schedule() before operations that may take a long time.
What is done inside the loop? There are two main tasks. One is to scan the active page queue with refill_inactive_scan(), looking for pages that can be moved to the inactive state. The other is to pick a process with swap_out() and scan its page tables for pages that can be moved to the inactive state. In addition, the pages used for dentry and inode structures are tried again. We look at refill_inactive_scan() first. It is in mm/vmscan.c:

==================== mm/vmscan.c 699 769 ====================
699 /**
700  * refill_inactive_scan - scan the active list and find pages to deactivate
701  * @priority: the priority at which to scan
702  * @oneshot: exit after deactivating one page
703  *
704  * This function will scan a portion of the active list to find
705  * unused pages, those pages will then be moved to the inactive list.
706  */
707 int refill_inactive_scan(unsigned int priority, int oneshot)
708 {
709     struct list_head * page_lru;
710     struct page * page;
711     int maxscan, page_active = 0;
712     int ret = 0;
713
714     /* Take the lock while messing with the list... */
715     spin_lock(&pagemap_lru_lock);
716     maxscan = nr_active_pages >> priority;
717     while (maxscan-- > 0 && (page_lru = active_list.prev) != &active_list) {
718         page = list_entry(page_lru, struct page, lru);
719
720         /* Wrong page on list?! (list corruption, should not happen) */
721         if (!PageActive(page)) {
722             printk("VM: refill_inactive, wrong page on list.\n");
723             list_del(page_lru);
724             nr_active_pages--;
725             continue;
726         }
727
728         /* Do aging on the pages.
*/ 729 if (PageTestandClearReferenced(page)) { 730 age_page_up_nolock(page); 731 page_active = 1; 732 } else { 733 age_page_down_ageonly(page); 734 /* 735 * Since we don't hold a reference on the page 736 * ourselves, we have to do our test a bit more 737 * strict then deactivate_page(). This is needed 738 * since otherwise the system could hang shuffling 739 * unfreeable pages from the active list to the 740 * inactive_dirty list and back again... 741 * 742 * SUBTLE: we can have buffer pages with count 1. 743 */ 744 if (page•>age == 0 && page_count(page) <= 745 (page•>buffers ? 2 : 1)) { 746 deactivate_page_nolock(page); 747 page_active = 0; 748 } else { 749 page_active = 1; 750 } 751 } 752 /* 753 * If the page is still on the active list, move it 754 * to the other end of the list. Otherwise it was 755 * deactivated by age_page_down and we exit successfully. 756 */ 757 if (page_active || PageActive(page)) { 758 list_del(page_lru); 759 list_add(page_lru, &active_list); 760 } else { 761 ret = 1; Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 315 * Pass 2: re•assign rss swap_cnt values, then select as above. 314 * not yet been swapped out. 313 * Pass 1: select the swappable task with maximal RSS that has 312 * assign = {0, 1}: 311 * We make one or two passes through the task list, indexed by 310 /* 309 308 int __ret = 0; 307 int counter; 306 { 305 static int swap_out(unsigned int priority, int gfp_mask) 304 303 #define SWAP_MIN 8 302 #define SWAP_SHIFT 5 301 */ 300 * the lower level routines result in continued processing. 299 * N.B. This function returns only 0 or 1. Return values != 1 from 298 * Select the task with maximal swap_cnt and try to swap out a page. 
297 /* [kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()] ==================== mm/vmscan.c 297 378 ==================== ݡⳟ swap_out()ˈ䙷ᰃ೼ mm/vmscan.c Ё೼Нⱘ˖ ೼Ā义䴶ⱘᤶܹāЁህৃҹⳟࠄDŽˈމ⅞ⱘᚙ ᇚ义䴶䕀ܹϡ⌏䎗义䴶䯳߫ˈ಴㗠ϡ೼䖭Ͼ䯳߫ЁњDŽৃᰃˈህབҷⷕЁⱘ⊼䞞᠔㿔ˈ⹂ᅲᄬ೼ⴔ⡍ ߫Ёⱘ义䴶Փ⫼䅵᭄䛑໻Ѣ 1DŽ㗠ᔧ swap_out()ᮁᓔϔϾ义䴶ⱘ᯴ᇘ㗠Փ݊䕀ܹϡ⌏䎗⢊ᗕᯊˈ߭Ꮖ㒣 ᅮᰃ৺㒻㓁ᠿᦣDŽϔ㠀ᴹ䇈ˈ೼⌏䎗义䴶䯳އᇚϔϾ义䴶䕀ܹњϡ⌏䎗⢊ᗕˈ߭ḍ᥂খ᭄ oneshot ⱘؐ ᇍѢ䖬ϡ㛑䕀ܹϡ⌏䎗⢊ᗕⱘ义䴶ˈ㽕ᇚ݊Ң䯳߫Ёⱘᔧࠡԡ㕂⿏ࠄ䯳߫ⱘሒ䚼DŽডПˈབᵰ៤ࡳഄ ϡ⌏䎗⢊ᗕ˄㾕 744 㸠˅ˈ䖭ḋⱘ义䴶೼䗮䖛 swap_out()ᠿᦣⳌᑨ䖯⿟ⱘ᯴ᇘ㸼ᯊᠡ㛑䕀ܹϡ⌏䎗⢊ᗕDŽ 䙷Мা㽕义䴶ⱘՓ⫼䅵᭄໻Ѣ 1 ህ䇈ᯢ䖬᳝⫼᠋ぎ䯈᯴ᇘˈ䖬ϡ㛑䕀ܹˈކ⫼԰᭛ӊ㋏㒳ⱘ䇏ˋݭ㓧 ሑњᇓੑ䖬ϡ䎇ҹᡞ义䴶Ң⌏䎗⢊ᗕ䕀ܹϡ⌏䎗⢊ᗕˈ䖬ᕫⳟᰃ৺䖬᳝⫼᠋ぎ䯈᯴ᇘDŽབᵰ义䴶ᑊϡ ᰃ㗫ܝˈੑҹৢࠄ䖒њ 0ˈ䙷ህ䇈ᯢ䖭Ͼ义䴶Ꮖ㒣ᕜ䭓ᯊ䯈≵᳝ফࠄ䆓䯂ˈ಴㗠Ꮖ㒣㗫ሑњᇓੑDŽϡ䖛 ᇥ义䴶ᇓޣᇥ义䴶ⱘᇓੑDŽབᵰޣᅮ๲ࡴ៪އˈ˅721 㸠˅DŽ✊ৢˈḍ᥂义䴶ᰃ৺ফࠄњ䆓䯂˄㾕 729 㸠 г㽕偠䆕⹂ᅲሲѢ⌏䎗义䴶˄㾕ܜpriority Ў 0 ᯊᠡᠿᦣᭈϾ䯳߫˄㾕 716 㸠˅DŽᇍѢ᠔ᠿᦣⱘ义䴶ˈ佪 䖛䖭䞠ᠿᦣⱘϡϔᅮᰃᭈϾ⌏䎗义䴶䯳߫ˈ㗠ᰃḍ᥂䇗⫼খ᭄ priority ⱘؐᠿᦣ݊Ёϔ䚼ߚˈা᳝೼ ᇍĀ㛣ā义䴶䯳߫ⱘᠿᦣϔḋˈ䖭䞠г䗮䖛ϔϾሔ䚼䞣 maxscan ᴹ᥻ࠊᠿᦣⱘ义䴶᭄䞣DŽϡڣህ 769 } 768 return ret; 767 766 spin_unlock(&pagemap_lru_lock); 765 } 764 } 763 break; if (oneshot) 762 110 111 316 * 317 * With this approach, there's no need to remember the last task 318 * swapped out. If the swap•out fails, we clear swap_cnt so the 319 * task won't be selected again until all others have been tried. 320 * 321 * Think of swap_cnt as a "shadow rss" • it tells us which process 322 * we want to page out (always try largest first). 323 */ 324 counter = (nr_threads << SWAP_SHIFT) >> priority; 325 if (counter < 1) 326 counter = 1; 327 328 for (; counter >= 0; counter••) { 329 struct list_head *p; 330 unsigned long max_cnt = 0; 331 struct mm_struct *best = NULL; 332 int assign = 0; 333 int found_task = 0; 334 select: 335 spin_lock(&mmlist_lock); 336 p = init_mm.mmlist.next; 337 for (; p != &init_mm.mmlist; p = p•>next) { 338 struct mm_struct *mm = list_entry(p, struct mm_struct, mmlist); 339 if (mm•>rss <= 0) 340 continue; 341 found_task++; 342 /* Refresh swap_cnt? 
*/ 343 if (assign == 1) { 344 mm•>swap_cnt = (mm•>rss >> SWAP_SHIFT); 345 if (mm•>swap_cnt < SWAP_MIN) 346 mm•>swap_cnt = SWAP_MIN; 347 } 348 if (mm•>swap_cnt > max_cnt) { 349 max_cnt = mm•>swap_cnt; 350 best = mm; 351 } 352 } 353 354 /* Make sure it doesn't disappear */ 355 if (best) 356 atomic_inc(&best•>mm_users); 357 spin_unlock(&mmlist_lock); 358 359 /* 360 * We have dropped the tasklist_lock, but we 361 * know that "mm" still exists: we are running 362 * with the big kernel lock, and exit_mm() 363 * cannot race with us. 364 */ Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ᡒࠄϔϾĀ᳔Շᇍ䈵āDŽϔⳈࠄ᠔᳝䖯⿟ⱘ mm•>swap_cnt 䛑ব៤њ 0ˈҢ㗠ᠿᦣϟᴹコᡒϡࠄϔϾ“best” ࡾ࡯Ёᇮ᳾ফࠄ㗗ᆳⱘ义䴶᭄䞣DŽা㽕೼䖭ϔ䕂Ё㟇ᇥ䖬᳝ϔϾ䖯⿟ⱘ义䴶ᇮ᳾ফࠄ㗗ᆳˈህϔᅮ㛑 mm•>rss ড᯴њϔϾ䖯⿟ऴ⫼ⱘݙᄬ义䴶᭄䞣ˈ㗠 mm•>wap_cnt ড᯴њ䆹䖯⿟೼ϔ䕂ᤶߎݙ㸠义䴶ⱘ Ⳉ㟇᳔ৢব៤ 0DŽ᠔ҹˈˈ1 ޣ 䱣ৢˈ↣⃵㗗ᆳ੠໘⧚њ䖭Ͼ䖯⿟ⱘϔϾ义䴶ˈህᇚ݊ mm•>swap_cnt ϔ⃵ĀҎষ᱂ᶹāDŽڣ0 ⱘᯊ׭䆒㕂དњⱘˈড᯴њᔧᯊ䆹䖯⿟ऴ⫼ݙᄬ义䴶ⱘ᭄䞣 mm•>rssDŽ䖭ህད Ͼ᭄ؐˈᰃ೼ᡞ᠔᳝䖯⿟ⱘ义䴶䌘⑤ᯊ䛑໘⧚њϔ䘡ˈҢ㗠↣Ͼ mm_struct 㒧ᵘЁⱘ䖭Ͼ᭄ؐ䛑ব៤њ 䖯⿟໪ⱘ᠔᳝䖯⿟DŽᠿᦣⱘⳂⱘᰃҢЁᡒߎ mm•>swap_cnt Ў᳔໻ⱘ䖯⿟DŽ↣Ͼ mm_struct 㒧ᵘЁⱘ䖭 䖬೼䖤㸠ˈ䖭Ͼ䖯⿟ህĀ∌䖰ϡ㨑āDŽ᠔ҹˈҢ init_task.next_task ྟ㟇 init_task ℶˈህᰃᠿᦣ䰸㄀ϔϾ ঠ৥䫒䖲᥹៤ϔϾ䯳߫DŽ㗠䖯⿟ init_task ᰃݙḌЁⱘ㄀ϔϾ䖯⿟ˈᰃ᠔᳝݊ᅗ䖯⿟ⱘ⼪ᅫDŽা㽕ݙḌ ҷⷕЁⱘݙሖ for ᕾ⦃㸼⼎Ң㄀ѠϾ䖯⿟ᓔྟᠿᦣ᠔᳝ⱘ䖯⿟DŽݙḌЁ᠔᳝ⱘ task_struct 㒧ᵘ䛑ҹ ⱘᯊ׭њDŽ rssDŽҹࠡ៥Ӏ೼䆆ࠄ䖭Ͼ㒧ᵘᯊᡞ rss 䏇䖛њˈ಴Ў䇈ᴹ䆱䭓DŽ㗠⦄೼ࠄњ㒧ড়ᚙ᱃੠⑤ҷⷕࡴҹ䇈ᯢ ㅵ⧚㒧ᵘ mm_struct Ё᳝ϔϾ៤ߚህᰃټ䲚⿄ЎĀ偏ݙ义䴶䲚ড়ā˄resident set˅ˈ݊໻ᇣ⿄Ў rssDŽ೼ᄬ 䆹䲚ড়Ёⱘ↣ϔϾ义䴶᠔ᇍᑨⱘ⠽⧚义䴶ϡϔᅮ䛑೼ݙᄬЁˈ೼ݙᄬЁⱘᕔᕔাᰃϔϾᄤ䲚DŽ䖭Ͼᄤ ݊㞾䑿ⱘ㰮ᄬぎ䯈ˈぎ䯈ЁᏆ㒣ߚ䜡ᑊᓎゟњ᯴ᇘⱘ义䴶ᵘ៤ϔϾ䲚ড়DŽԚᰃ೼ӏԩϔϾ㒭ᅮⱘᯊࠏˈ ߭ᴹᡒĀ᳔ড়䗖āⱘ䖯⿟ਸ਼˛ৃҹ䇈ᰃĀࡿᆠ⌢䋿āϢĀ䕂⌕തᑘāⳌ㒧ড়DŽ↣Ͼ䖯⿟䛑᳝ޚ᥂ҔМ ໛ˈ㗠ᑊϡϔᅮᰃ⠽⧚ᛣНϞⱘ义䴶ᤶߎˈ᠔ҹ೼ϟ䴶ⱘভ䗄Ё᠔䇧ĀᤶߎāᰃᑓНⱘDŽ䙷Мˈḍޚ 䖭䞠䖬ᑨᣛߎˈ䖭Ͼߑ᭄㱑✊ি“swap_outāˈԚᅲ䰙ϞাᰃЎᡞϔѯ义䴶ᤶߎࠄѸᤶ䆒໛Ϟ԰ད ໛DŽޚ䖭ѯ义䴶ᤶߎࠄѸᤶ䆒໛Ϟ԰ད 义䴶᯴ᇘ㸼ˈᇚヺড়ᴵӊⱘ义䴶᱖ᯊᮁᓔᇍݙᄬ义䴶ⱘ᯴ᇘˈ៪䖯ϔℹᇚ义䴶䕀ܹϡ⌏䎗⢊ᗕˈЎᡞ ೼↣⃵ᕾ⦃Ёˈ⿟ᑣ䆩೒Ң᠔᳝ⱘ䖯⿟ЁᡒࠄϔϾ᳔ড়䗖ⱘ䖯⿟ bestDŽᡒࠄњህᠿᦣ䖭Ͼ䖯⿟ⱘ Ёᰃϔѯ᥻ࠊֵᙃDŽ ᖗā᳝໮໻ˈेҷⷕЁ໪ሖᕾ⦃ⱘ⃵᭄DŽখ᭄ gfp_maskއᅮњᡞ义䴶ᤶߎএⱘĀއ䖯⿟ⱘ᭄䞣DŽ䖭Ͼ᭄ؐ 0 ᯊˈcounter ህㄝѢ˄nr_threads << SWAP_SHIFT˅ˈे 32 × nr_threadsˈ䖭䞠 nr_threads Ўᔧࠡ㋏㒳Ё 㑻Ўܜ㑻˄᳔߱Ў 6 㑻ˈ䗤⃵Ϟछ㟇 0 㑻˅䅵ㅫ㗠ᕫⱘDŽᔧӬܜᣀ㒓⿟˅ⱘϾ᭄੠䇗⫼ swap_out()ᯊӬ Ѣ counterˈ㗠 counter জᰃḍ᥂ݙḌЁ䖯⿟˄ࣙއ䖭Ͼߑ᭄ⱘЏԧᰃϔϾ for ᕾ⦃ˈᕾ⦃ⱘ⃵᭄প 378 } 377 return __ret; 376 } 375 
} 374 break; 373 mmput(best); 372 __ret = swap_out_mm(best, gfp_mask); 371 } else { 370 break; 369 } 368 goto select; 367 assign = 1; 366 if (!assign && found_task > 0) { if (!best) { 365 112 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 287 } 286 } 285 address = vma•>vm_start; 284 break; 283 if (!vma) 282 vma = vma•>vm_next; 281 goto out_unlock; 280 if (result) 279 result = swap_out_vma(mm, vma, address, gfp_mask); 278 for (;;) { 277 276 address = vma•>vm_start; 275 if (address < vma•>vm_start) 274 if (vma) { 273 vma = find_vma(mm, address); 272 address = mm•>swap_address; 271 spin_lock(&mm•>page_table_lock); 270 */ 269 * and ptes. 268 * Find the proper vm•area after freezing the vma chain 267 /* 266 265 */ 264 * Go through process' page directory. 263 /* 262 261 struct vm_area_struct* vma; 260 unsigned long address; 259 int result = 0; 258 { 257 static int swap_out_mm(struct mm_struct * mm, int gfp_mask) [kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()] ==================== mm/vmscan.c 257 295 ==================== ߑ᭄ swap_out_mm()ⱘҷⷕг೼ mm/vmscan.c Ё˖ Ͼ⫼᠋ˈҢ㗠ϡӮ೼Ё䗨㹿䞞ᬒDŽ ⫼䅵᭄ mm_usersˈᕙᅠ៤ҹৢݡ⬅ 373 㸠ⱘ mmput()ᇚ݊䖬ॳˈՓ䖭Ͼ᭄᥂㒧ᵘ೼᪡԰ⱘᳳ䯈໮њϔ 䗮䖛 356 㸠ⱘ atomic_inc()䗦๲ mm_struct 㒧ᵘЁⱘՓܜ৺߭䖨ಲ 0ˈ䖨ಲ䋳᭄߭ЎᓖᐌDŽ೼᪡԰Пࠡ 义䴶ⱘᤶߎ݋ԧᰃ⬅ swap_out_mm()ᴹᅠ៤ⱘDŽᔧ swap_out_mm()៤ࡳഄᤶߎϔϾ义䴶ᯊ䖨ಲ 1ˈ ᡒࠄϔϾĀ᳔Շᇍ䈵”best ҹৢˈህ㽕ձ⃵㗗ᆳ䆹䖯⿟ⱘ᯴ᇘ㸼ˈᇚヺড়ᴵӊⱘ义䴶ᤶߎএDŽ ᅮњϔϾ䖯⿟೼⡍ᅮᯊ䯈ݙᇍݙᄬ义䴶ⱘऴ⫼DŽއⱘ㗗ᆳ㗠㹿ߛᮁ㢹ᑆ义䴶ⱘ᯴ᇘDŽ䖭ϸϾ䖤ࡼⱘ㒧ড় ϔϾᮍ৥ᰃ಴义䴶ᓖᐌ㗠᳝᳈໮ⱘ义䴶ᓎゟ䍋៪ᘶ໡䍋᯴ᇘ˗঺ϔϾᮍ䴶߭ᰃ਼ᳳᗻഄফࠄ swap_out() ষ᱂ᶹāҹৢᠡӮড᯴ߎᴹDŽህ↣Ͼ䖯⿟ⱘ㾦ᑺ㗠㿔ˈᇍݙᄬ义䴶ⱘऴ⫼ᄬ೼ⴔϸϾᮍ৥Ϟⱘ䖤ࡼ˖ ᳔䖥ϔ⃵ĀҎষ᱂ᶹāҹৢ಴义䴶ᓖᐌ㗠ᤶܹ˄៪ᘶ໡᯴ᇘ˅ⱘ义䴶ˈ䖭ѯ义䴶ⱘ᭄䞣㽕ࠄϟϔ⃵ĀҎ ᣋ䋱ࠄ mm•>swap_cnt Ёˈ✊ৢݡҢ᳔ᆠ᳝ⱘ䖯⿟ᓔྟDŽԚᰃˈ᠔䇧ᇮ᳾ফࠄ㗗ᆳⱘ义䴶᭄䞣ᑊϡࣙᣀ ᯊ˄439̚444 㸠˅ˈݡᡞ䖭䞠ⱘሔ䚼䞣 assign 㕂៤ 1ˈݡᠿᦣϔ䘡DŽ䖭ϔ⃵ᇚ↣Ͼ䖯⿟ᔧࠡⱘ mm•>rss 113 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
288     /* Reset to 0 when we reach the end of address space */
289     mm->swap_address = 0;
290     mm->swap_cnt = 0;
291
292 out_unlock:
293     spin_unlock(&mm->page_table_lock);
294     return result;
295 }

First, mm->swap_address is the address of the page to be examined next as execution proceeds. Initially it is 0, and it is cleared to 0 again once all pages have been examined (see line 289). In a for loop the program uses this current address to find the virtual memory area vma that contains it and then calls swap_out_vma() to try to swap out one page. If that succeeds (returns 1), the task is finished for this time; otherwise the next virtual memory area is tried. The calls go down layer by layer in this way, through swap_out_vma(), swap_out_pgd() and swap_out_pmd(), down to try_to_swap_out(), which tries to swap out the memory page pointed to by one page table entry pte. The intermediate functions are all in the same file and readers can read them on their own. Here we go straight to try_to_swap_out(), because that is where the crux lies. Below we look at its pieces step by step:

==================== mm/vmscan.c 27 56 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()]
27 /*
28  * The swap-out functions return 1 if they successfully
29  * threw something out, and we got a free page. It returns
30  * zero if it couldn't do anything, and any other value
31  * indicates it decreased rss, but the page was shared.
32  *
33  * NOTE! If it sleeps, it *must* return 1 to make sure we
34  * don't continue with the swap-out. Otherwise we may be
35  * using a process that no longer actually exists (it might
36  * have died while we slept).
37  */
38 static int try_to_swap_out(struct mm_struct * mm, struct vm_area_struct* vma, unsigned long address, pte_t * page_table, int gfp_mask)
39 {
40     pte_t pte;
41     swp_entry_t entry;
42     struct page * page;
43     int onlist;
44
45     pte = *page_table;
46     if (!pte_present(pte))
47         goto out_failed;
48     page = pte_page(pte);
49     if ((!VALID_PAGE(page)) || PageReserved(page))
50         goto out_failed;
51
52     if (!mm->swap_cnt)
53         return 1;
54
55     mm->swap_cnt--;
56

A note first: the parameter page_table actually points to a page table entry, not to a page table, so the parameter name is somewhat misleading. After the content of this entry has been assigned to the variable pte, pte_present() tests whether the physical page the entry points to is in memory; if it is not, control goes to out_failed and this operation fails:

==================== mm/vmscan.c 106 107 ====================
106 out_failed:
107     return 0;

When try_to_swap_out() returns 0, the code one level up skips this page and tries to swap out the next page mapped by the same page table. If a page table is exhausted, it backs up one more level and tries the next page table.

Conversely, if the physical page is indeed in memory, pte_page() converts the content of the page table entry into a pointer to the page structure of that physical page. Since all page structures are in the mem_map array, (page - mem_map) is the index of the page (its subscript in the array). If this index is not less than max_mapnr, the largest physical memory page index, it is not a valid physical page; this usually happens because the physical page is on an external device (for example, a network interface card), so this entry is skipped as well.

==================== include/asm-i386/page.h 118 118 ====================
118 #define VALID_PAGE(page) ((page - mem_map) < max_mapnr)

In addition, physical pages that are reserved in memory and not allowed to be swapped out are skipped too.

Once these two special cases have been passed, a page is about to be examined in earnest, so mm->swap_cnt is decremented by 1. Continuing with the code of try_to_swap_out():

==================== mm/vmscan.c 57 74 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()]
57     onlist = PageActive(page);
58     /* Don't look at this pte if it's been accessed recently. */
59     if (ptep_test_and_clear_young(page_table)) {
60         age_page_up(page);
61         goto out_failed;
62     }
63     if (!onlist)
64         /* The page is still mapped, so it can't be freeable... */
65         age_page_down_ageonly(page);
66
67     /*
68      * If the page is in active use by us, or if the page
69      * is in active use by others, don't unmap it or
70      * (worse) start unneeded IO.
71      */
72     if (page->age > 0)
73         goto out_failed;
74

In the page structure of a memory page, the flag bits in the field flags reflect the current state of the page. Among them, the PG_active flag indicates whether the page is currently "active", that is, whether it is still in the active_list queue:

==================== include/linux/mm.h 230 230 ====================
230 #define PageActive(page) test_bit(PG_active, &(page)->flags)

A swappable physical page must be in some LRU queue; if it is not in active_list it must be in inactive_dirty_list or in some inactive_clean_list. The result of this test is used a little later.

Whether the physical page of a mapping should be swapped out depends on whether the page has been accessed recently. This is tested (and cleared to 0) by the inline function ptep_test_and_clear_young(), which is defined in include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 285 285 ====================
285 static inline int ptep_test_and_clear_young(pte_t *ptep) { return test_and_clear_bit(_PAGE_BIT_ACCESSED, ptep); }

As mentioned before, a page table entry has a _PAGE_ACCESSED flag bit. When the memory mapping mechanism of the i386 CPU maps a virtual address to a physical address through a page table entry and then accesses that physical address, it automatically sets the _PAGE_ACCESSED bit of the entry to 1. So if pte_young() returns 1, the page has been accessed at least once since try_to_swap_out() was last called on the same page table entry, and the page is said to be still "young". Generally speaking, a recent access suggests further accesses in the near future, so the page should not be swapped out. Once this information has been obtained, the _PAGE_ACCESSED bit in the page table entry is cleared to 0 and written back, ready for the next test of this bit.

If the page is still "young", it is certainly not a candidate for swapping out, so control again goes to out_failed. Before that, however, one more thing has to be done: if the page is still active, SetPageReferenced() sets the PG_referenced flag bit in the page structure to 1, which transfers the information that the page has been accessed from the page table entry to the page's data structure. If the page is not in the active page queue, age_page_up() lengthens the time the page may stay on "probation", because after all it has been accessed recently.

==================== mm/swap.c 125 138 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()>age_page_up()]
125 void age_page_up(struct page * page)
126 {
127     /*
128      * We're dealing with an inactive page, move the page
129      * to the active list.
130      */
131     if (!page->age)
132         activate_page(page);
133
134     /* The actual page aging bit */
135     page->age += PAGE_AGE_ADV;
136     if (page->age > PAGE_AGE_MAX)
137         page->age = PAGE_AGE_MAX;
138 }

After going to out_failed, 0 is returned there so that the higher-level code skips this page. Then, when the next round comes back to this process and this page, if the _PAGE_ACCESSED bit in the same page table entry pte is still 0, the page is no longer "young". Readers may ask: since this page is mapped (otherwise it would not appear in the mapping table of the target process and be in memory), how can it fail to be in the active page queue? As readers will see later in do_swap_page(), when the mapping of an inactive page is restored because of a page fault, the page is not moved to the active page queue immediately; that job is left to page_launder(), seen earlier, to be done when the system is relatively idle. So such a page may well not be in the active queue.

If the page is no longer "young", it has to be examined further. Of course, a page cannot be swapped out at once just because it was not accessed during the past period; it is given a chance to stay "on probation". For how long? That is the value of page->age, the life of the page. If the page is not in the active queue, its life is first reduced by age_page_down_ageonly() (mm/swap.c):

==================== mm/swap.c 103 110 ====================
103 /*
104  * We use this (minimal) function in the case where we
105  * know we can't deactivate the page (yet).
106  */
107 void age_page_down_ageonly(struct page * page)
108 {
109     page->age /= 2;
110 }

As long as page->age has not reached 0, the page cannot be swapped out yet, so control again goes to out_failed.

After the screening above, this page is in principle a candidate for swapping out. We continue with the code:

==================== mm/vmscan.c 75 108 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()]
75     if (TryLockPage(page))
76         goto out_failed;
77
78     /* From this point on, the odds are that we're going to
79      * nuke this pte, so read and clear the pte.  This hook
80      * is needed on CPUs which update the accessed and dirty
81      * bits in hardware.
82      */
83     pte = ptep_get_and_clear(page_table);
84     flush_tlb_page(vma, address);
85
86     /*
87      * Is the page already in the swap cache? If so, then
88      * we can just drop our reference to it without doing
89      * any IO - it's already up-to-date on disk.
90      *
91      * Return 0, as we didn't actually free any real
92      * memory, and we should just continue our scan.
93      */
94     if (PageSwapCache(page)) {
95         entry.val = page->index;
96         if (pte_dirty(pte))
97             set_page_dirty(page);
98 set_swap_pte:
99         swap_duplicate(entry);
100         set_pte(page_table, swp_entry_to_pte(entry));
101 drop_pte:
102         UnlockPage(page);
103         mm->rss--;
104         deactivate_page(page);
105         page_cache_release(page);
106 out_failed:
107         return 0;
108     }

The operations on the page structure that follow need mutual exclusion, that is, they must be carried out with exclusive access, so the page structure is locked here through TryLockPage() (include/linux/mm.h):

==================== include/linux/mm.h 183 183 ====================
183 #define TryLockPage(page) test_and_set_bit(PG_locked, &(page)->flags)

A return value of 1 means that the PG_locked flag bit was already 1, that is, the page had already been locked by another process; in that case this page structure cannot be processed any further and the function again has to return with failure.

Once the lock has been taken, preparation for swapping out can proceed according to the situation of the page.

First, ptep_get_and_clear() reads the content of the page table entry once more and clears the entry to 0, temporarily revoking the mapping of the page. The entry was already read on line 45, so why read it again rather than simply clear it? On a multiprocessor system the target process may be running on another CPU, so the content of its mapping entry may have changed.

If the page structure is already in the queue set up for swapping pages in and out, that is, a queue within the data structure swapper_space, then the content of the page is already on the swap device, and it suffices to break the mapping temporarily, which indicates that the target process has agreed to release the page. However, the queues set up for swapping are also divided into "clean" and "dirty", so if the page has been written it must be moved to the "dirty" page queue through set_page_dirty(). The macro PageSwapCache() is defined as (include/linux/mm.h):

==================== include/linux/mm.h 217 217 ====================
217 #define PageSwapCache(page) test_bit(PG_swap_cache, &(page)->flags)

The PG_swap_cache flag bit being 1 means that the page structure is in a swapper_space queue and also that the page is an ordinary swappable page. In this case the index field of the page structure is a 32-bit index entry swp_entry_t, which in effect points to the image of the page on the swap device. The function swap_duplicate() does two things: it checks the content of the index entry, and it increments the share count of the corresponding on-disk page. Its code is in mm/swapfile.c:

==================== mm/swapfile.c 820 871 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()>swap_duplicate()]
820 /*
821  * Verify that a swap entry is valid and increment its swap map count.
822  * Kernel_lock is held, which guarantees existance of swap device.
823  *
824  * Note: if swap_map[] reaches SWAP_MAP_MAX the entries are treated as
825  * "permanent", but will be reclaimed by the next swapoff.
826  */
827 int swap_duplicate(swp_entry_t entry)
828 {
829     struct swap_info_struct * p;
830     unsigned long offset, type;
831     int result = 0;
832
833     /* Swap entry 0 is illegal */
834     if (!entry.val)
835         goto out;
836     type = SWP_TYPE(entry);
837     if (type >= nr_swapfiles)
838         goto bad_file;
839     p = type + swap_info;
840     offset = SWP_OFFSET(entry);
841     if (offset >= p->max)
842         goto bad_offset;
843     if (!p->swap_map[offset])
844         goto bad_unused;
845     /*
846      * Entry is valid, so increment the map count.
847      */
848     swap_device_lock(p);
849     if (p->swap_map[offset] < SWAP_MAP_MAX)
850         p->swap_map[offset]++;
851     else {
852         static int overflow = 0;
853         if (overflow++ < 5)
854             printk("VM: swap entry overflow\n");
855         p->swap_map[offset] = SWAP_MAP_MAX;
856     }
857     swap_device_unlock(p);
858     result = 1;
859 out:
860     return result;
861
862 bad_file:
863     printk("Bad swap file entry %08lx\n", entry.val);
864     goto out;
865 bad_offset:
866     printk("Bad swap offset entry %08lx\n", entry.val);
867     goto out;
868 bad_unused:
869     printk("Unused swap offset entry in swap_dup %08lx\n", entry.val);
870     goto out;
871 }

As explained earlier, the data structure swp_entry_t is really a 32-bit unsigned integer. Its content cannot be all 0, yet its lowest bit is always 0; the highest bit field (24 bits), offset, is the page number on the device, and the remaining bit field (7 bits), type, is really the number of the swap device itself. As also explained earlier, the bit field type has nothing whatever to do with "type"; it stands for the number of the swap device. Using it as a subscript, the swap_info_struct of the corresponding swap device can be found in the kernel array swap_info. The array swap_map[] in that data structure records the share count of every page on the swap device. Since the page being processed is already on the swap device, its count obviously should not be 0, otherwise something is wrong; on the other hand, after the increment it should not reach SWAP_MAP_MAX either. Incrementing the share count of the on-disk page indicates that the page now has one more user.

Back in the code of try_to_swap_out(), line 100 calls set_pte() to place this index entry, which points to the on-disk page, into the corresponding page table entry, so that what used to be a mapping to a memory page becomes a mapping to an on-disk page. When execution then reaches the label drop_pte, the resident set rss of the target process has one page fewer. Since this physical page has had a mapping broken, it may well already satisfy the conditions for becoming an inactive page, so deactivate_page() is called to set it, conditionally, to the inactive state and to move its page structure from the active page queue to an inactive page queue (mm/swap.c):

==================== mm/swap.c 189 194 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()>deactivate_page()]
189 void deactivate_page(struct page * page)
190 {
191     spin_lock(&pagemap_lru_lock);
192     deactivate_page_nolock(page);
193     spin_unlock(&pagemap_lru_lock);
194 }

[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()>deactivate_page()>deactivate_page_nolock()]
154 /**
155  * (de)activate_page - move pages from/to active and inactive lists
156  * @page: the page we want to move
157  * @nolock - are we already holding the pagemap_lru_lock?
158  *
159  * Deactivate_page will move an active page to the right
160  * inactive list, while activate_page will move a page back
161  * from one of the inactive lists to the active list. If
162  * called on a page which is not on any of the lists, the
163  * page is left alone.
164  */
165 void deactivate_page_nolock(struct page * page)
166 {
167     /*
168      * One for the cache, one for the extra reference the
169      * caller has and (maybe) one for the buffers.
170      *
171      * This isn't perfect, but works for just about everything.
172      * Besides, as long as we don't move unfreeable pages to the
173      * inactive_clean list it doesn't need to be perfect...
174      */
175     int maxcount = (page->buffers ? 3 : 2);
176     page->age = 0;
177     ClearPageReferenced(page);
178
179     /*
180      * Don't touch it if it's not on the active list.
181      * (some pages aren't on any list at all)
182      */
183     if (PageActive(page) && page_count(page) <= maxcount && !page_ramdisk(page)) {
184         del_page_from_active_list(page);
185         add_page_to_inactive_dirty_list(page);
186     }
187 }

The page structure of a physical page has a counter, count. For a free page this count is 0; it is set to 1 when the page is allocated (see the code of __alloc_pages() and rmqueue()), and from then on count is incremented whenever the page gains a "user", for example when a mapping is established or restored. So if the counter is 2, the mapping that has just been broken was the last mapping of this physical page. With the last mapping gone, the page is of course inactive. That is why "less than or equal to 2" serves as one criterion, which is the maxcount here. One special case still has to be considered, however: the page may have been mapped onto an ordinary file through mmap() while the file has also been opened and is accessed through regular file operations, so that the page doubles as a read/write buffer for the file. The page is then divided into several buffers, the pointer buffers in its page structure points to a queue of buffer_head structures, and this queue becomes another "user" of the page. So when page->buffers is non-zero, a maxcount of 3 means that the mapping just broken was the last mapping of the memory page. Furthermore, a memory page may be used as a ramdisk, that is, a part of physical memory used to emulate a hard disk; such pages never become inactive. There are thus three criteria altogether, and only when all three are satisfied can the page really be moved to an inactive queue. Most memory pages with user-space mappings have only one mapping and become inactive at this point. The code also shows that calling deactivate_page_nolock() once more on a page that is not in the active queue does no harm.

When an active page becomes inactive, its page structure has to be moved from active_list, the LRU queue of active pages, to an inactive queue. But the system has two kinds of inactive page queues. One is "dirty": pages that may have been written recently and therefore differ from their copies on the swap device. Such pages cannot be taken for allocation right away, because they first have to be written out before they are "washed clean". The other is "clean": pages that are certain to agree with their copies on the swap device. Such pages could in principle already be allocated as free pages, and are kept for a while only because their content may still be useful. There is only one inactive "dirty" page queue, inactive_dirty_list, whereas there are many inactive "clean" page queues, one inactive_clean_list in every page management zone. So when a formerly active page becomes inactive, which queue should it be moved to? The first step is always to move it to the "dirty" page queue. Unlinking a page structure from the active queue is done by the macro del_page_from_active_list(), defined in include/linux/swap.h:

==================== include/linux/swap.h 234 240 ====================
234 #define del_page_from_active_list(page) { \
235     list_del(&(page)->lru); \
236     ClearPageActive(page); \
237     nr_active_pages--; \
238     DEBUG_ADD_PAGE \
239     ZERO_PAGE_BUG \
240 }

Linking a page structure into the inactive queue is done by add_page_to_inactive_dirty_list():

==================== include/linux/swap.h 217 224 ====================
217 #define add_page_to_inactive_dirty_list(page) { \
218     DEBUG_ADD_PAGE \
219     ZERO_PAGE_BUG \
220     SetPageInactiveDirty(page); \
221     list_add(&(page)->lru, &inactive_dirty_list); \
222     nr_inactive_dirty_pages++; \
223     page->zone->inactive_dirty_pages++; \
224 }

ClearPageActive() and SetPageInactiveDirty() here clear the PG_active flag bit in the page structure to 0 and set the PG_inactive_dirty flag bit to 1, respectively. Note that the use count in the page structure does not change during this process.

Back once more in the code of try_to_swap_out(): since a mapping to a memory page has been broken, the use count of the page has to be decremented. This is done by the macro page_cache_release(), in effect by __free_pages().

==================== include/linux/pagemap.h 34 34 ====================
34 #define page_cache_release(x) __free_page(x)

==================== include/linux/mm.h 379 379 ====================
379 #define __free_page(page) __free_pages((page), 0)

==================== mm/page_alloc.c 549 553 ====================
549 void __free_pages(struct page *page, unsigned long order)
550 {
551     if (!PageReserved(page) && put_page_testzero(page))
552         __free_pages_ok(page, order);
553 }

==================== include/linux/mm.h 152 152 ====================
152 #define put_page_testzero(p) atomic_dec_and_test(&(p)->count)

Through put_page_testzero() this function decrements count in the page structure by 1 and then tests whether it has reached 0; if it has, the page is released through __free_pages_ok(). Here, since the page is still in an inactive page queue and has not been released, at least that one reference remains, so the count does not reach 0.

With that, the processing of this page is complete, and execution arrives at the label out_failed and returns 0. Why out_failed again? In fact try_to_swap_out() returns 1 in one case only, namely when mm->swap_cnt has reached 0 (see line 52). It is precisely this that lets swap_out_mm() examine and process all pages of a process in turn.

What if the page structure is not in a swapper_space queue? That means that no image of the page has yet been set up on the swap device, or that the page comes from a file. Readers may recall that when a page fault occurs because a page has no mapping, the handling depends on whether the area containing the page provides a vm_operations_struct data structure and, through the function pointer nopage in that structure, a specific operation. If a nopage operation is provided, the pages of that area come from a file (rather than the swap device), and the position of the page within the file can be computed from the virtual address. Otherwise it is an ordinary page for which no on-disk page has been set up yet (because the page table entry is 0); it is then first mapped to the blank page, and a separate page is allocated for it only later when it needs to be written. We continue with the code of try_to_swap_out(); the following piece handles pages of this kind:

==================== mm/vmscan.c 110 157 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()]
110     /*
111      * Is it a clean page? Then it must be recoverable
112      * by just paging it in again, and we can just drop
113      * it..
114      *
115      * However, this won't actually free any real
116      * memory, as the page will just be in the page cache
117      * somewhere, and as such we should just continue
118      * our scan.
119      *
120      * Basically, this just makes it possible for us to do
121      * some real work in the future in "refill_inactive()".
122      */
123     flush_cache_page(vma, address);
124     if (!pte_dirty(pte))
125         goto drop_pte;
126
127     /*
128      * Ok, it's really dirty. That means that
129      * we should either create a new swap cache
130      * entry for it, or we should write it back
131      * to its own backing store.
132      */
133     if (page->mapping) {
134         set_page_dirty(page);
135         goto drop_pte;
136     }
137
138     /*
139      * This is a dirty, swappable page.  First of all,
140      * get a suitable swap entry for it, and make sure
141      * we have the swap cache set up to associate the
142      * page with that swap entry.
143      */
144     entry = get_swap_page();
145     if (!entry.val)
146         goto out_unlock_restore; /* No swap space left */
147
148     /* Add it to the swap cache and mark it dirty */
149     add_to_swap_cache(page, entry);
150     set_page_dirty(page);
151     goto set_swap_pte;
152
153 out_unlock_restore:
154     set_pte(page_table, pte);
155     UnlockPage(page);
156     return 0;
157 }

pte_dirty() here is an inline function defined in include/asm-i386/pgtable.h:

==================== include/asm-i386/pgtable.h 269 269 ====================
269 static inline int pte_dirty(pte_t pte) { return (pte).pte_low & _PAGE_DIRTY; }

A page table entry has a "D" flag bit (_PAGE_DIRTY). When the CPU writes to the memory page the entry points to, it automatically sets this bit to 1, indicating that the memory page is now "dirty". If the bit is 0, the corresponding memory page has not been written yet. For such a page, if it has gone a long time without a write access, the mapping can be removed (rather than temporarily broken). The reason is this: if the content of the page is blank, the mapping can be set up again later when needed; or, if the page comes from a file mapping established through mmap(), the position of the page in the file can be computed from the virtual address when needed (by contrast, the position of a page on the swap device cannot be obtained by computation, so the whereabouts of the page must be stored in the page table entry). So control goes to the label drop_pte seen earlier. Note that in this case the deactivate_page() there actually has no effect, in particular because the page table entry was already cleared to 0 on line 83, and page_cache_release() merely decrements the reference count of the blank page.

If the page under examination comes from a file mapping established through mmap(), the pointer mapping in its page structure points to the corresponding address_space data structure. For such a page, if it is decided to remove the mapping and the _PAGE_DIRTY flag bit in the page table entry is 1, then before going to drop_pte the PG_dirty flag bit in the page structure must first be set to 1 and the page moved to the "dirty" page queue of that file mapping. The operation concerned, set_page_dirty(), is defined in include/linux/mm.h and mm/filemap.c:

==================== include/linux/mm.h 187 191 ====================
[kswapd()>do_try_to_free_pages()>refill_inactive()>swap_out()>swap_out_mm()>swap_out_vma()>swap_out_pgd()>swap_out_pmd()>try_to_swap_out()>set_page_dirty()]
187 static inline void set_page_dirty(struct page * page)
188 {
189     if (!test_and_set_bit(PG_dirty, &page->flags))
190         __set_page_dirty(page);
191 }

==================== mm/filemap.c 134 147 ====================
134 /*
135  * Add a page to the dirty page list.
136  */
137 void __set_page_dirty(struct page *page)
138 {
139     struct address_space *mapping = page->mapping;
140
141     spin_lock(&pagecache_lock);
142     list_del(&page->list);
143     list_add(&page->list, &mapping->dirty_pages);
144     spin_unlock(&pagecache_lock);
145
146     mark_inode_dirty_pages(mapping->host);
147 }

We continue with the code of try_to_swap_out(). When execution gets this far, the page under examination must be one that has not been accessed for a long time, is not in the swapper_space swap queue, and does not belong to a file mapping, yet is a "dirty" page that has been written. Such a page must be given an on-disk page, and its content must be written to that on-disk page. First an on-disk page is allocated through get_swap_page(), which is a macro:

==================== include/linux/swap.h 150 150 ====================
150 #define get_swap_page() __get_swap_page(1)

That is, a page is allocated on the swap device through __get_swap_page(). Its code is in mm/swapfile.c; since it is fairly simple, we leave it to the reader. The use count of an on-disk page is set to 1 when it is allocated; afterwards it is incremented through swap_duplicate() whenever a process joins in sharing the same memory page, and also when a process breaks its mapping to this page (see line 99); conversely, it is decremented through swap_free(). If the allocation of an on-disk page fails, control goes to out_unlock_restore to restore the original mapping.

Once the on-disk page has been allocated, add_to_swap_cache() links the page into the swapper_space queue and into the active page queue; the code of this function has been seen before. Then set_page_dirty() moves the page to the inactive "dirty" page queue. As for the actual writing out, that is the business of page_launder(), as seen earlier.

With that, the scan of the user-space pages of one process is complete. swap_out() calls swap_out_mm() in a for loop, so every call of swap_out() swaps out some pages of some processes, and refill_inactive() in turn calls swap_out() in nested while loops, until the pages available for allocation in the system, including those potentially available, are no longer in short supply. At that point do_try_to_free_pages() ends. Back in the code of kswapd(), the state of the active page queue may by now have changed considerably, so refill_inactive_scan() is called once more. This essentially completes one routine pass of kswapd(). As mentioned before, besides running periodically kswapd() may also have been woken up by another process, so there may be processes sleeping while they wait for it to finish; these are woken up through wake_up_all().

Readers may be thinking that scanning the page table of each process through swap_out_mm() does not guarantee that any page becomes inactive, so would refill_inactive() not loop forever? In fact, for one thing the program limits the number of iterations, and for another the scanning of page tables is a self-adapting process. If too few pages become inactive after one round of scanning all processes, refill_inactive() goes back and starts a second round. The increase in the number of scans speeds up the aging of pages, because the life of a page is in effect measured in scans. So pages that did not qualify in the first round may qualify in the second. Finally, in very special circumstances the requirement may still not be met in the end; oom_kill() is then called to kill a process in the system, sacrificing a part to safeguard the whole.

Finally, let us look at the code of the thread kreclaimd, which is in mm/vmscan.c:

==================== mm/vmscan.c 1095 1143 ====================
1095 DECLARE_WAIT_QUEUE_HEAD(kreclaimd_wait);
1096 /*
1097  * Kreclaimd will move pages from the inactive_clean list to the
1098  * free list, in order to keep atomic allocations possible under
1099  * all circumstances. Even when kswapd is blocked on IO.
1100  */
1101 int kreclaimd(void *unused)
1102 {
1103     struct task_struct *tsk = current;
1104     pg_data_t *pgdat;
1105
1106     tsk->session = 1;
1107     tsk->pgrp = 1;
1108     strcpy(tsk->comm, "kreclaimd");
1109     sigfillset(&tsk->blocked);
1110     current->flags |= PF_MEMALLOC;
1111
1112     while (1) {
1113
1114         /*
1115          * We sleep until someone wakes us up from
1116          * page_alloc.c::__alloc_pages().
1117          */
1118         interruptible_sleep_on(&kreclaimd_wait);
1119
1120         /*
1121          * Move some pages from the inactive_clean lists to
1122          * the free lists, if it is needed.
1123          */
1124         pgdat = pgdat_list;
1125         do {
1126             int i;
1127             for(i = 0; i < MAX_NR_ZONES; i++) {
1128                 zone_t *zone = pgdat->node_zones + i;
1129                 if (!zone->size)
1130                     continue;
1131
1132                 while (zone->free_pages < zone->pages_low) {
1133                     struct page * page;
1134                     page = reclaim_page(zone);
1135                     if (!page)
1136                         break;
1137                     __free_page(page);
1138                 }
1139             }
1140             pgdat = pgdat->node_next;
1141         } while (pgdat);
1142     }
1143 }

A comparison with the code of kswapd() shows that the initialization parts of the two are the same and that the structure of the programs is similar. Note that both set the PF_MEMALLOC flag bit in the flags field of their task_struct to 1, indicating that both kernel threads are maintainers of the page management mechanism. In fact, earlier versions had only the one thread kswapd; only in version 2.4 was part of it split off as a separate thread. This time, however, reclaim_page() scans the inactive "clean" page queue of each page management zone and reclaims pages from it to be released. The code of this function is in mm/vmscan.c, and we leave it to the reader; after the code read so far it should no longer present any difficulty.

==================== mm/vmscan.c 381 463 ====================
[kreclaimd()>reclaim_page()]
381 /**
382  * reclaim_page - reclaims one page from the inactive_clean list
383  * @zone: reclaim a page from this zone
384  *
385  * The pages on the inactive_clean can be instantly reclaimed.
386  * The tests look impressive, but most of the time we'll grab
387  * the first page of the list and exit successfully.
388  */
389 struct page * reclaim_page(zone_t * zone)
390 {
391     struct page * page = NULL;
392     struct list_head * page_lru;
393     int maxscan;
394
395     /*
396      * We only need the pagemap_lru_lock if we don't reclaim the page,
397      * but we have to grab the pagecache_lock before the pagemap_lru_lock
398      * to avoid deadlocks and most of the time we'll succeed anyway.
399      */
400     spin_lock(&pagecache_lock);
401     spin_lock(&pagemap_lru_lock);
402     maxscan = zone->inactive_clean_pages;
403     while ((page_lru = zone->inactive_clean_list.prev) !=
404             &zone->inactive_clean_list && maxscan--) {
405         page = list_entry(page_lru, struct page, lru);
406
407         /* Wrong page on list?! (list corruption, should not happen) */
408         if (!PageInactiveClean(page)) {
409             printk("VM: reclaim_page, wrong page on list.\n");
410             list_del(page_lru);
411             page->zone->inactive_clean_pages--;
412             continue;
413         }
414
415         /* Page is or was in use?  Move it to the active list. */
416         if (PageTestandClearReferenced(page) || page->age > 0 ||
417                 (!page->buffers && page_count(page) > 1)) {
418             del_page_from_inactive_clean_list(page);
419             add_page_to_active_list(page);
420             continue;
421         }
422
423         /* The page is dirty, or locked, move to inactive_dirty list. */
424         if (page->buffers || PageDirty(page) || TryLockPage(page)) {
425             del_page_from_inactive_clean_list(page);
426             add_page_to_inactive_dirty_list(page);
427             continue;
428         }
429
430         /* OK, remove the page from the caches. */
431         if (PageSwapCache(page)) {
432             __delete_from_swap_cache(page);
433             goto found_page;
434         }
435
436         if (page->mapping) {
437             __remove_inode_page(page);
438             goto found_page;
439         }
440
441         /* We should never ever get here. */
442         printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
443         list_del(page_lru);
444         zone->inactive_clean_pages--;
445         UnlockPage(page);
446     }
447     /* Reset page pointer, maybe we encountered an unfreeable page. */
448     page = NULL;
449     goto out;
450
451 found_page:
452     del_page_from_inactive_clean_list(page);
453     UnlockPage(page);
454     page->age = PAGE_AGE_START;
455     if (page_count(page) != 1)
456         printk("VM: reclaim_page, found page with count %d!\n",
457                 page_count(page));
458 out:
459     spin_unlock(&pagemap_lru_lock);
460     spin_unlock(&pagecache_lock);
461     memory_pressure++;
462     return page;
463 }

2.9 Swapping Pages In

When the i386 CPU maps a linear address to a physical address, if the mapping for that address has already been established but the P (Present) flag bit in the corresponding page table entry or directory entry turns out to be 0, the corresponding physical page is not in memory and the memory access cannot be completed. In theory one might call this situation "blocked" rather than "failed", since the mapping relationship has after all been established and ought to be distinguished from the case where no mapping exists yet; that is why we call it "broken". The MMU hardware of the CPU, however, does not distinguish between these two cases: whenever the P bit is 0 it regards the page mapping as failed, and the CPU raises a "page fault". In fact, the first thing the CPU looks at during mapping is the P flag bit in the page table entry or directory entry. Once the P bit is 0, the values of all the other bit fields have no meaning to it. Using the page table entry to point to an on-disk page when the page is not in memory is purely a software matter. So telling whether the failure is due to the page not being in memory or to the mapping not having been established is the job of software, that is, of the page fault handler. In the "out-of-bounds access" scenario we saw the opening lines of the function handle_pte_fault():

==================== mm/memory.c 1153 1175 ====================
[do_page_fault()>handle_mm_fault()>handle_pte_fault()]
1153 static inline int handle_pte_fault(struct mm_struct *mm,
1154     struct vm_area_struct * vma, unsigned long address,
1155     int write_access, pte_t * pte)
1156 {
1157     pte_t entry;
1158
1159     /*
1160      * We need the page table lock to synchronize with kswapd
1161      * and the SMP-safe atomic PTE updates.
1162      */
1163     spin_lock(&mm->page_table_lock);
1164     entry = *pte;
1165     if (!pte_present(entry)) {
1166         /*
1167          * If it truly wasn't present, we know that kswapd
1168          * and the PTE updates will not touch it later. So
1169          * drop the lock.
1170          */
1171         spin_unlock(&mm->page_table_lock);
1172         if (pte_none(entry))
1173             return do_no_page(mm, vma, address, write_access, pte);
1174         return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
1175     }

Here the first distinction is made by pte_present(), which checks the P flag bit of the entry to see whether the physical page is in memory. If it is not, pte_none() then checks whether the entry is empty, that is, all 0. If it is empty, the mapping has not been established yet, so do_no_page() is called; this was seen in an earlier scenario. Conversely, if it is not empty, the mapping has been established and only the physical page is not in memory, so the page has to be swapped in from the swap device through do_swap_page(). In this scenario the processing and the execution path before handle_pte_fault() are the same as in the out-of-bounds access scenario, so we go straight into do_swap_page(). The code of this function is in mm/memory.c:

==================== mm/memory.c 1018 1056 ====================
[do_page_fault()>handle_mm_fault()>handle_pte_fault()>do_swap_page()]
1042      * Freeze the "shared"ness of the page, ie page_count + swap_count.
1043      * Must lock page before transferring our swap count to already
1044      * obtained page count.
1045      */
1046     lock_page(page);
1047     swap_free(entry);
1048     if (write_access && !is_page_shared(page))
1049         pte = pte_mkwrite(pte_mkdirty(pte));
1050     UnlockPage(page);
1051
1052     set_pte(page_table, pte);
1053     /* No need to invalidate - it was non-present before */
1054     update_mmu_cache(vma, address, pte);
1055     return 1;   /* Minor fault */
1056 }

First let us see what parameters are passed in by the caller. Readers are advised to go back to the earlier scenario of extending the stack through an out-of-bounds access and follow the execution path of the CPU once, to make clear where these parameters come from. The parameters mm, vma and address are self-explanatory: they are, respectively, a pointer to the mm_struct of the current process, a pointer to the vm_area_struct of the virtual memory area concerned, and the linear address whose mapping failed.

The parameter page_table points to the page table entry whose mapping failed, and entry is the content of that entry. As we said before, when the physical page is in memory the page table entry is a pte_t structure pointing to a memory page, and when the physical page is not in memory it is a swp_entry_t structure pointing to an on-disk page. Both are really 32-bit unsigned integers. It should be pointed out here that the so-called "not
1041 /* 1040 1039 pte = mk_pte(page, vma•>vm_page_prot); 1038 1037 mm•>rss++; 1036 1035 } 1034 flush_icache_page(vma, page); 1033 flush_page_to_ram(page); 1032 1031 return •1; 1030 if (!page) 1029 unlock_kernel(); 1028 page = read_swap_cache(entry); 1027 swapin_readahead(entry); 1026 lock_kernel(); 1025 if (!page) { 1024 1023 pte_t pte; 1022 struct page *page = lookup_swap_cache(entry); 1021 { 1020 pte_t * page_table, swp_entry_t entry, int write_access) 1019 struct vm_area_struct * vma, unsigned long address, static int do_swap_page(struct mm_struct * mm, 1018 130 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ૸䝦њ kswapdˈᔧᴀ䖯⿟㹿䇗ᑺᘶ໡䖤㸠ᯊˈгህᰃҢ schedule()䖨ಲᯊˈܜ䖤㸠DŽ⬅Ѣ೼ℸПࠡܜ⿟ Ёⱘ__GFP_WAIT ᷛᖫԡ㕂៤ 1ˈ᠔ҹᔧߚ䜡ϡࠄݙᄬ义䴶ᯊ䛑Ӯ㞾ᜓ᱖ᯊ⼐䅽ˈ䅽ݙḌ䇗ᑺ݊ᅗ䖯 ᮴䆎ᰃ swapin_readahead()䖬ᰃ read_swap_cache()ˈ೼⬇䇋ߚ䜡ݙᄬ义䴶ᯊ䛑ᡞ䇗⫼খ᭄ gfp_mask 387 } 386 schedule(); 385 current•>policy |= SCHED_YIELD; 384 __set_current_state(TASK_RUNNING); 383 if (gfp_mask & __GFP_WAIT) { 382 wakeup_kswapd(0); ==================== mm/page_alloc.c 382 387 ==================== ϟߑ᭄__alloc_pages()ЁⱘϔϾ⠛↉˖ ㄀ϔ⃵ߚ䜡ݙᄬ义䴶༅䋹ᑊϡϔᅮ䇈ᯢ㋻᥹ⴔⱘ㄀Ѡ⃵гӮ༅䋹DŽ㽕ᯢⱑ䖭ϔ⚍ˈ៥Ӏৃҹݡᴹⳟϔ 㗠㹿䇗ᑺ䖤㸠ⱘ䖯⿟ৃ㛑Ӯ䞞ᬒߎϔѯݙᄬ义䴶ˈ⫮㟇㹿䇗ᑺ䖤㸠ⱘ䖯⿟ৃ㛑ᙄདህᰃ kswapdDŽ಴ℸˈ 䖤㸠ˈܜⴔⱘϟϔ㸠ህ᳝ৃ㛑៤ࡳਸ਼˛䖭ᰃ಴Ўˈ೼ߚ䜡ݙᄬ义䴶༅䋹ᯊˈݙḌৃ㛑Ӯ䇗ᑺ݊ᅗ䖯⿟ 㛑Ӯ䯂ˈ䖭ϸ㸠⿟ᑣᰃ㋻᣼ⴔⱘˈЎҔМ೼ࠡϔ㸠䇁হЁ಴ߚ䜡ϡࠄ䎇໳ⱘݙᄬ义䴶㗠༅䋹ˈࠄ㋻᥹ ໳ⱘݙᄬ义䴶㗠༅䋹ˈ䙷ḋህⳳⱘ㽕ݡᴹ䇏ϔ⃵ˈ㗠䖭ϔ⃵ैⳳᰃা䇏ܹϔϾ义䴶њDŽ㒚ᖗⱘ䇏㗙ৃ ᠔䳔ⱘ义䴶Ꮖ㒣೼⌏䎗义䴶䯳߫Ё㗠া䳔㽕ᡞᅗᡒࠄህ㸠њDŽԚᰃˈг᳝ৃ㛑乘䇏ᯊ಴Ўߚ䜡ϡࠄ䎇 ህӮ⬅䖯⿟ kswapd ੠ kreclaimd ೼ϔ↉ᯊ䯈ҹৢࡴҹಲᬊDŽ䖭ḋˈᔧ䇗⫼ read_swap_cache()ᯊˈ䗮ᐌ 䖯ᴹⱘ义䴶䛑᱖ᯊ䫒ܹ⌏䎗义䴶䯳߫ҹঞ swapper_space ⱘᤶܹˋᤶߎ䯳߫Ёˈབᵰᅲ䰙Ϟ⹂ᅲϡ䳔㽕 䴶䲚㕸˄cluster˅DŽ⬅Ѣℸᯊᑊ䴲↣Ͼ䇏ܹⱘ义䴶䛑ᰃゟे䳔㽕ⱘˈ᠔ҹᰃĀ乘䇏ā˄read ahead˅DŽ乘䇏 䭓ᕫ໮DŽ᠔ҹˈ↨䕗㒣⌢ⱘࡲ⊩ᰃ˖᮶✊ᖙ乏㒣䖛ᇏ䘧ˈህᑆ㛚ϔ⃵໮䇏޴Ͼ义䴶䖯ᴹˈ⿄ЎϔϾ义 㽕㒣䖛೼⺕ⲬϞᇏ䘧Փ⺕༈ᅮԡˈ㗠ᇏ䘧᠔䳔ⱘᯊ䯈ᅲ䰙Ϟ↨⺕༈ࠄԡҹৢ䇏ϔϾ义䴶᠔䳔ⱘᯊ䯈㽕 䇗⫼ swapin_readahead()ਸ਼˛ᔧҢ⺕ⲬϞ䇏ⱘᯊ׭ˈ↣⃵ҙҙ䇏ϔϾ义䴶ᰃϡ㒣⌢ⱘˈ಴Ў↣⃵䇏Ⲭ䛑 ܜ䙷ህ㽕䗮䖛 read_swap_cache()ߚ䜡ϔϾݙᄬ义䴶ˈᑊϨҢⲬϞᇚ݊ݙᆍ䇏䖯ᴹDŽЎҔМ೼ℸПࠡ㽕 བᵰ≵᳝ᡒࠄˈህᰃ䇈ҹࠡ⫼Ѣ䖭Ͼ㰮ᄬ义䴶ⱘݙᄬ义䴶Ꮖ㒣䞞ᬒˈ⦄೼݊ݙᆍҙᄬ೼ѢⲬϞњˈ ߑ᭄ᰃ೼ swap_state.c ЁᅮНⱘˈ៥Ӏᡞᅗ⬭㒭䇏㗙㞾Ꮕ䯙䇏DŽ 䇗⫼ lookup_swap_cache()DŽ䖭Ͼܜܹˋᤶߎ䯳߫Ёᇮ᳾᳔ৢ䞞ᬒDŽབᵰᰃⱘ䆱䙷ህⳕџњDŽ᠔ҹˈ㽕 㽕ⳟⳟⳌᑨⱘݙᄬ义䴶ᰃ৺䖬⬭೼ swapper_space ⱘᤶܜ໘⧚ϔ⃵಴㔎义䴶ᓩ䍋ⱘ义䴶ᓖᐌᯊˈ佪 ḋᠡ㛑Ϣ᯴ᇘᇮ᳾ᓎゟᯊⱘ义䴶㸼乍Ⳍऎ߿DŽ 义䴶DŽկ义䴶Ѹᤶⱘ䆒໛Ϟ㄀ϔϾ义䴶˄ᑣোЎ 0˅ᰃֱ⬭ϡ⫼ⱘˈ᠔ҹ entry ⱘؐϡৃ㛑Ўܼ 0DŽ䖭 
䆒໛Ϟ˄៪᭛ӊЁˈϟৠ˅ⱘԡ⿏ˈ݊ᅲгህᰃ义䴶ᑣোDŽϸ䚼ߚড়೼ϔ䍋ህᚳϔഄ⹂ᅮњϔϾⲬϞ ԡ˅DŽ䆹ᣛ䩜䘏䕥Ϟߚ៤ϸ䚼ߚ˖㄀ϔ䚼ߚᰃ义䴶Ѹᤶ䆒໛˄៪᭛ӊ˅ⱘᑣো˗㄀Ѡ䚼ߚᰃ义䴶೼䖭Ͼ ⬅Ѣ⠽⧚义䴶ϡ೼ݙᄬˈ᠔ҹ entry ᰃᣛ৥ϔϾⲬϞ义䴶ⱘ㉏ԐѢᣛ䩜ⱘ㋶ᓩ乍˄ࡴϞ㢹ᑆᷛᖫ 䙷Ͼ switch 䇁হЁˈ“default:āϢ“case 2:āП䯈≵᳝ break 䇁হ˅DŽℸৢ֓䗤ሖӴњϟᴹDŽ ᅮⱘ˄⊼ᛣˈ೼އ ⱘ switch 䇁হЁ˄㾕 arch/i386/fault.c˅ḍ᥂ CPU ѻ⫳ⱘߎ䫭ҷⷕ error_code ⱘ bit1 䖬᳝ϔϾখ᭄ write_accessˈ㸼⼎ᔧ᯴ᇘ༅䋹ᯊ᠔䖯㸠ⱘ䆓䯂⾡㉏˄䇏ˋݭ˅ˈ䖭ᰃ೼ do_page_fault() ߫Ёˈ⫮㟇೼⌏䎗义䴶䯳߫ЁDŽ ೼ݙᄬЁāᰃ䘏䕥ᛣНϞⱘˈᰃᇍ CPU ⱘ义䴶᯴ᇘ⹀ӊ㗠㿔ˈᅲ䰙Ϟ䖭Ͼ义䴶ᕜৃ㛑೼ϡ⌏䎗义䴶䯳 131 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 238 found_page = lookup_swap_cache(entry); 237 */ 236 * Check the swap cache again, in case we stalled above. 235 /* 234 233 new_page = virt_to_page(new_page_addr); 232 goto out_free_swap; /* Out of memory */ 231 if (!new_page_addr) 230 new_page_addr = __get_free_page(GFP_USER); 229 228 goto out_free_swap; 227 if (found_page) 226 found_page = lookup_swap_cache(entry); 225 */ 224 * Look for the page in the swap cache. 223 /* 222 goto out; 221 if (!swap_duplicate(entry)) /* Account for the swap cache */ 220 */ 219 * Make sure the swap entry is still in use. 218 /* 217 216 unsigned long new_page_addr; 215 struct page *found_page = 0, *new_page; 214 { 213 struct page * read_swap_cache_async(swp_entry_t entry, int wait) 212 211 */ 210 * the swap entry is no longer in use. 209 * A failure return means that either the page allocation failed or that 208 * 207 * only doing readahead, so don't worry if the page is already locked. 206 * and reading the disk if it is not already cached. 
If wait==0, we are 205 * Locate a page of swap in physical memory, reserving swap cache space 204 /* [do_page_fault()>handle_mm_fault()>handle_pte_fault()>do_swap_page()read_swap_cache_async()] ==================== mm/swap_state.c 204 255 ==================== ߑ᭄ read_swap_cache_async()ⱘҷⷕ೼ mm/swap_state.c Ё˖ 125 #define read_swap_cache(entry) read_swap_cache_async(entry, 1); ==================== include/linux/swap.h 125 125 ==================== ᕙ䇏ܹᅠ៤˄᠔ҹᅲ䰙Ϟᰃৠℹⱘ䇏ܹ˅DŽ 䯙䇏DŽ㗠 read_swap_cache()ᅲ䰙Ϟᰃ read_swap_cache_async()ˈাᰃᡞ䇗⫼খ᭄ wait 䆒៤ 1ˈ㸼⼎㽕ㄝ гህ༅䋹њˈ᠔ҹ೼ 1031 㸠䖨ಲ•1DŽ䖭䞠ˈ៥Ӏህϡ⏅ܹࠄ swapin_readahead()Ёএњˈ䇏㗙ৃҹ㞾㸠 ᴹϔ⃵ˈг䖬ᰃ᳝ৃ㛑˄㗠Ϩ໮ञ㛑໳˅៤ࡳDŽᔧ✊ˈг᳝ৃ㛑Ѡ㗙䛑༅䋹њˈ䙷ḋ do_swap_page() ݡ⃵䆩೒ߚ䜡义䴶Ꮖ᳝ৃ㛑៤ࡳњDŽेՓ೼ swapin_readahead()Ёজ༅䋹њˈ೼ read_swap_cache()Ёݡ 132 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 䙷ህᰃ䗮䖛ˈމऎⳌ㘨㋏ˈজ໮ϔϾĀ⫼᠋āˈ᠔ҹՓ⫼䅵᭄Ў 3DŽԚᰃˈ䖬᳝ϔ⾡⡍⅞ᚙކ䇏ˋݭ㓧 Ͼ䖯⿟᯴ᇘࠄ䖭Ͼᤶܹˋᤶߎ义䴶ᯊˈ݊Փ⫼䅵᭄Ў 2DŽབᵰ义䴶ᴹ㞾᭛ӊ᯴ᇘˈ߭⬅ѢৠᯊজϢ᭛ӊ ᯊˈজ೼ add_to_page_cache_locked()Ё䗮䖛 page_cache_get()䗦๲њ䖭Ͼ䅵᭄ˈ᠔ҹᔧ᳝ǃᑊϨা᳝ϔ 1DŽ✊ৢˈ೼䗮䖛 add_to_swap_cache()ᇚ݊䫒ܹᤶܹˋᤶߎ䯳߫˄៪᭛ӊ᯴ᇘ䯳߫˅੠ LRU 䯳߫ active_list ೼ߚ䜡ϔϾݙᄬ义䴶ᯊᡞ䖭Ͼ䅵᭄䆒៤ˈܜ䖬㽕⊼ᛣᇍݙᄬ义䴶ˈे݊ page 㒧ᵘⱘՓ⫼䅵᭄DŽ佪 Ѡ㗙ᰃѦⳌᇍᑨⱘDŽˈ1 ޣⲬϞ义䴶ⱘ݅ѿ䅵᭄DŽ㗠⦄೼ᘶ໡᯴ᇘ߭Փ݅ѿ䅵᭄ ⳟϔϟ try_to_swap_out()Ёⱘ 99 㸠DŽ೼䙷䞠ˈᔧᮁᓔϔϾ义䴶ⱘ᯴ᇘᯊˈ䗮䖛 swap_duplicate()䗦๲њ 1DŽᇍℸˈ䇏㗙ϡོಲ䖛এ ޣ䯳߫Ёᡒࠄњ᠔䳔ⱘ义䴶ˈ߭݅ѿ䅵᭄ކ݅ѿ䅵᭄ֱᣕϡব˗㗠བᵰ೼㓧 ህব៤њ䖭ḋ˖བᵰҢѸᤶ䆒໛䇏ܹ义䴶ˈ߭ⲬϞ义䴶ⱘމ1DŽ䖭Мϔᴹˈᚙ ޣՓⲬϞ义䴶ⱘ݅ѿ䅵᭄ ᠔ҹⲬϞ义䴶ⱘ݅ѿ䅵᭄ࡴњ 1DŽৃ ᰃ ˈಲ ࠄ do_swap_page()ҹৢˈ೼ 1047 㸠জ䇗⫼њϔ⃵ swap_free()ˈ 䗮䖛 swap_free()ᢉ⍜ᇍ݅ѿ䅵᭄ⱘ䗦๲DŽডПˈབᵰ䳔㽕ҢѸᤶ䆒໛䇏ܹ义䴶ˈ߭ϡ䇗⫼ swap_free()ˈ 䯳߫Ёᡒࠄњ᠔䳔ⱘ义䴶㗠᮴䳔ҢѸᤶ䆒໛䇏ܹˈ߭೼ 252 㸠ކ๲њⲬϞ义䴶ⱘ݅ѿ䅵᭄DŽབᵰ೼㓧 ϔᓔྟᯊ೼ 221 㸠ህ䗮䖛 swap_duplicate()䗦ˈܜ䖭䞠㽕ⴔ䞡⊼ᛣϔϟᇍⲬϞ义䴶ⱘ݅ѿ䅵᭄DŽ佪 ЁњˈᑊϨ偀Ϟህ㽕ᘶ໡᯴ᇘDŽ 䇏DŽ䇗⫼ read_swap_cache()៤ࡳҹৢˈ᠔㽕ⱘ义䴶㚃ᅮᏆ㒣೼ swapper_space 䯳߫ҹঞ active_list 䯳߫ ᭄ⱘҷⷕ䇏㗙Ꮖ㒣ⳟࠄ䖛њDŽ㟇Ѣ rw_swap_page()ˈ䇏㗙ৃҹ೼ᄺдњഫ䆒໛偅ࡼϔゴҹৢಲ䖛ᴹ䯙 ⱘ⠽⧚义䴶˄⹂ߛഄ䇈ᰃᅗⱘ page ᭄᥂㒧ᵘ˅ᣖܹ swapper_space 䯳߫ҹঞ active_list 䯳߫Ёˈ䖭Ͼߑ 䇏䖯ᴹњˈ᠔ҹ㽕ݡẔᶹϔ⃵DŽབᵰ⹂ᅲ䳔㽕ҢѸᤶ䆒໛䇏ܹˈ߭䗮䖛 add_to_swap_cache()ᇚᮄߚ䜡 ᡞ䖭Ͼ义䴶ܜ㹿䇗ᑺ䖤㸠ˈᑊ៤ࡳഄߚ䜡ࠄ⠽⧚义䴶Ң__get_free_page()䖨ಲᯊˈг䆌঺ϔϾ䖯⿟Ꮖ㒣 䖤㸠DŽ㗠ᔧ䖭Ͼ䖯⿟ݡ⃵ܜৃ㛑ফ䰏ˈབᵰϔᯊߚ䜡ϡࠄ义䴶ˈᔧࠡ䖯⿟ህӮⴵ⳴ㄝᕙˈ䅽߿ⱘ䖯⿟ ҔМ೼ᡒϡࠄǃ಴㗠Ўℸߚ䜡њϔϾݙᄬ义䴶ҹৢজᴹᇏᡒϔ⃵ਸ਼˛䖭ᰃ಴Ўߚ䜡ݙᄬ义䴶ⱘ䖛⿟᳝ 䴶ᰃЎњ㡖ⳕϔ⃵Ң䆒໛䇏ܹ˗঺ϔᮍ䴶ˈ᳈䞡㽕ⱘᰃ䰆ℶৠϔϾ义䴶೼ݙᄬЁ᳝ϸϾࡃᴀDŽৃᰃЎ Ң swapper_space 
䯳߫Ёᇏᡒϔ⃵DŽ䖭ϔᮍܜswapin_readahead()г䆌Ꮖ㒣ᡞⳂᷛ义䴶䇏䖯ᴹњˈ᠔ҹ㽕 䇏㗙г䆌⊼ᛣࠄњˈ䖭䞠ϸ⃵䇗⫼њ lookup_swap_cache()DŽ㄀ϔ⃵ᰃᕜད⧚㾷ⱘˈ಴Ў 255 } 254 return found_page; 253 out: 252 swap_free(entry); 251 out_free_swap: 250 page_cache_release(new_page); 249 out_free_page: 248 247 return new_page; 246 rw_swap_page(READ, new_page, wait); 245 add_to_swap_cache(new_page, entry); 244 lock_page(new_page); 243 */ 242 * Add it to the swap cache and read its contents. 241 /* 240 goto out_free_page; if (found_page) 239 133 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ∛៤ϔԧⱘDŽ 㟇Ѣ㋻᥹ⴔⱘ update_mmu_cache()ˈᇍѢ i386 CPUাᰃϾぎ᪡԰ˈ಴Ў i386 ⱘ MMU ᰃϢ CPU ㅵ⧚Ꮖ㒣↨䕗❳ᙝˈᑨ䆹ϡӮ᳝໾໻ⱘೄ䲒њDŽټЎ㆛ᐙⱘ݇㋏ˈѠᴹ䇏㗙⦄೼ᇍᄬ ᵰ䳔㽕 cowˈे copy_on_write ⱘ䆱ˈгᰃ೼䙷䞠໘⧚ⱘ˅DŽ៥Ӏᇚ do_wp_page()⬭㒭䇏㗙ˈϔᴹᰃ಴ ϔ⃵义䴶ᓖᐌDŽ䙷ᯊˈህӮ೼ handle_pte_fault()Ё䇗⫼ do_wp_page()ˈᇚ义䴶ⱘ䆓䯂ᴗ䰤԰ߎᬍব˄བ 䆌ݭњ৫˛ϡ㽕㋻ˈথ⫳ݭ䆓䯂ᯊӮ಴䆓䯂ᴗ䰤ϡヺ㗠ᓩ䍋঺ܕᙄདᰃ䇏䆓䯂ˈ䖭Ͼ义䴶ϡህ∌䖰ϡ vma•>vm_flags 㗠ϡᰃ vma•>vm_page_prot Ёⱘϔԡ˅DŽ䇏㗙䖬ৃ㛑Ӯ䯂ˈ䙷ḋϔᴹˈ㽕ᰃᔧࠡⱘ䆓䯂 vma•>vm_page_prot ᵘㄥϔϾ义䴶㸼乍ᯊˈ㸼乍ⱘ_PAGE_RW ᷛᖫԡЎ 0˄⊼ᛣ VM_WRITE ᰃ Ў 1 ⱘࠡᦤϟˈ_PAGE_RW ᠡ᳝ৃ㛑Ў 1ˈԚैᑊϡϔᅮЎ 1DŽ᠔ҹˈ೼ 1039 㸠Ёˈḍ᥂ 䆌ݭ䆓䯂DŽা᳝೼ VM_WRITEܕԡ ˗㗠 _PAGE_RW ߭᳈Ўࡼᗕˈা㸼⼎ᔧࠡ䖭ϔϾ⠽⧚ݙᄬ义䴶ᰃ৺ 䯈ⱘৃݭᷛᖫ VM_WRITE Ϣ义䴶ⱘৃݭᷛᖫ_PAGE_RW ᰃϡৠⱘDŽVM_WRITE ᰃϾⳌᇍ䴭ᗕⱘᷛᖫ 䆌ݭⱘ䆱˄VM_WRITE ᷛᖫԡЎ 0˅ˈህ䕀ࠄ bad_area এњDŽ䖬㽕⊼ᛣˈऎܕབᵰ义䴶᠔ሲⱘऎ䯈ϡ 䆱ህḍᴀࠄ䖒ϡњ䖭ϾഄᮍDŽ䇏㗙ϡོಲ䖛༈এⳟⳟ do_page_fault()Ё switch 䇁হⱘ case 2DŽ೼䙷⫼ˈ 䆌ݭ˛ϛϔᴀᴹህᑨ䆹᳝ݭֱᡸⱘਸ਼˛ㄨḜᰃˈབᵰ䙷ḋⱘܕ䆓䯂ᰃϔ⃵ݭ䆓䯂ህᡞ义䴶㸼乍䆒㕂៤ 䗮䖛 pte_mkwrite()ᇚ义䴶㸼乍Ёⱘ_PAGE_RW ᷛᖫԡг㕂៤ 1DŽ䇏㗙г䆌Ӯ䯂˖ᗢМৃҹ߁ⴔᔧࠡⱘ Ўぎ᪡԰DŽҷⷕЁ䗮䖛 pte_mkdirty()ᇚ义䴶㸼乍Ёⱘ D ᷛᖫԡ㕂៤ 1ˈ㸼⼎䆹义䴶Ꮖ㒣Ā㛣āњˈᑊϨ ಲࠄ do_swap_page()ҷⷕЁˈ䖭䞠ⱘ flush_page_to_ram()੠ flush_icache_page()ᇍѢ i386 ໘⧚఼ഛ ˄㾕 mm/vmscan.c ⱘ 744 㸠˅ˈ䙷ᠡᰃՓ⫼䅵᭄Ў 1 ⱘ义䴶ᑨ䆹ਚⱘഄᮍDŽ 㗠Փ⫼䅵᭄ব៤ 2˗៪㗙ᰃ೼ϔ↉ᯊ䯈ҹৢҡ᮴䖯⿟䅸乚ˈ᳔ৢ㹿 refill_inactive_scan()⿏ܹϡ⌏䎗䯳߫ ⅞ⱘ义䴶ˈᅗӀ೼ active_list Ёˈ㗠Փ⫼䅵᭄ैᰃ 1DŽҹৢˈ䖭ѯ义䴶៪㗙ᰃ㹿ᶤϾ䖯⿟Ā䅸乚āˈҢ 䖭Ͼ䅵᭄ˈಮЎ乘䇏䖯ᴹⱘ义䴶ᑊ≵᳝䖯⿟೼Փ⫼DŽѢᰃˈ䖭ѯ义䴶ህ៤њ⡍ޣpage_cache_release()䗦 read_swap_cache_async() 䖨ಲᯊˈ↣Ͼ义䴶ⱘՓ⫼䅵᭄䛑ᰃ 2DŽԚᰃˈ೼ᕾ⦃Ё偀Ϟজ䗮䖛 ೼ swapin_readahead()Ёˈᕾ⦃ഄ䇗⫼ read_swap_cache_async()ߚ䜡੠䇏ܹ㢹ᑆ义䴶ˈ಴㗠೼Ң 1016 } 1015 return; 1014 } 1013 swap_free(SWP_ENTRY(SWP_TYPE(entry), offset)); 1012 page_cache_release(new_page); 1011 if (new_page != NULL) 1010 new_page = 
read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0); 1009 /* Ok, do the async read•ahead now */ ==================== mm/memory.c 1009 1016 ==================== ...... 1001 for (i = 0; i < num; offset++, i++) { ==================== mm/memory.c 1001 1001 ==================== ...... 991 { 990 void swapin_readahead(swp_entry_t entry) [do_page_fault()>handle_mm_fault()>handle_pte_fault()>do_swap_page()>swapin_readahead()] ==================== mm/memory.c 990 991 ==================== swapin_readahead()乘䇏䖯ᴹⱘ义䴶DŽ 134 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ԭⱘ䯳߫ࡴҹ㊒ㅔDŽ៥Ӏ೼ kswapd ⱘ do_try_to_free_pages()Ё᳒㒣ⳟࠄˈ䇗⫼ߑ᭄ kmem_cache_reap()ˈ њॳ⢊DŽ↣Ͼ䯳߫ЁĀᇍ䈵āⱘϾ᭄ᰃࡼᗕব࣪ⱘˈϡ໳ᯊৃҹ๲⏏DŽৠᯊˈজᅮᳳഄẔᶹˈᇚ᳝ᆠ খⳟ page ᭄᥂㒧ᵘⱘᅮНˈ㒧ᵘЁ᳝ϸϾ struct list_head ៤ߚ˅ˈᔧᇚ݊Ң䯳߫Ёᨬ䰸ᯊ㞾✊ህᘶ໡៤ ᠔ҹϔ㒣ߚ䜡ゟेህ㛑Փ⫼ˈ㗠೼䞞ᬒᯊ߭ᘶ໡៤ॳ⢊DŽ՟བˈᇍѢ݊Ёⱘ䯳߫༈៤ߚᴹ䇈˄䇏㗙ৃ ऎ䯳߫Ёⱘ৘Ͼᇍ䈵೼ᓎゟᯊ⫼݊Āᵘ䗴āߑ᭄䖯㸠߱ྟ࣪ˈކ⿄Ā㒧ᵘā㗠⿄ЎĀᇍ䈵ā˄object˅DŽ㓧 䴶৥ᇍ䈵⿟ᑣ䆒䅵ᡔᴃЁⱘৡ䆡ˈϡݡ⫼׳䗴 ā˄ constructor˅੠Āᢚ䰸ā˄destructor˅ߑ᭄DŽৠᯊˈ䖬 ऎ䯳߫ˈ↣⾡᭄᥂㒧ᵘ䛑᳝ⳌᑨⱘĀᵘކ೼ slab ᮍ⊩Ёˈ↣⾡䞡㽕ⱘ᭄᥂㒧ᵘ䛑᳝㞾Ꮕϧ⫼ⱘ㓧 ໻ᇣߦߚㅵ⧚ऎ˄zone˅ⱘᮍ⊩ⳌԐˈԚᰃг᳝䞡㽕ⱘϡৠDŽ ⳌԐ˗гϢҹࠡ៥Ӏⳟࠄ䖛ⱘᣝ∴ކ఼ߚ䜡ⱘ㾦ᑺ䆆ˈslab ϢЎ৘⾡᭄᥂㒧ᵘߚ߿ᓎゟ㓧ټҢᄬ њᬍ䖯DŽ ᳡њϞ䗄ⱘ㔎⚍DŽ㗠 Linuxˈг೼݊ݙḌЁ䞛⫼њ䖭⾡ᮍ⊩ˈᑊ԰ܟೳ˅ˈ೼Ⳍᔧ⿟ᑺϞޱ⏋ᰃ໻ഫⱘ ऎߚ䜡੠ㅵ⧚ᮍ⊩˄slab ⱘॳᛣކ2.4 ᪡԰㋏㒳˄Unix ⱘϔϾব⾡˅Ёˈ䞛⫼њϔ⾡⿄Ў“slabāⱘ㓧 ऎぎ䯈ˈᕜЙҹᴹህᰃϔϾ⛁䮼ⱘⷨお䇒乬DŽ90 ᑈҷࠡᳳˈ೼ Solarisކᅲ䰙Ϟˈབԩ᳝ᬜഄㅵ⧚㓧 DŽމ· ϡ䗖ড়໮໘⧚఼݅⫼ݙᄬⱘᚙ ᰒ✊ህ㽕ᠧᡬᠷњDŽ ϡৠⱘ义䴶Ёˈ಴㗠↣⃵䛑ϡ㛑ੑЁˈ㗠㽕Ңݙᄬ㺙ܹࠄ催䗳㓧ᄬˈ䙷Мৃᛇ㗠ⶹˈ݊ᬜ⥛ Ͼ⠛↉ϡ㛑⒵䎇㽕∖㗠乎ⴔᣛ䩜ᕔϟⳟϟϔϾ⠛↉ⱘ᭄᥂㒧ᵘᯊˈབᵰ䆹᭄᥂㒧ᵘ↣⃵䛑೼ ऎDŽ೼䖭ḋⱘ䖛⿟Ёˈᔧϔކぎ䯈⠛↉ᵘ៤ⱘ䯳߫Ёߚ䜡㓧ټ˄first fit˅ⱘᮍ⊩ˈҢϔϾ⬅ᄬ ヺড়ܜᅮ៥Ӏ䖤⫼᳔؛ˈⳈ᥹ᕅડࠄ催䗳㓧ᄬЁⱘੑЁ⥛ˈ䖯㗠ᕅડࠄ䖤㸠ᯊⱘᬜ⥛DŽ䆩ᛇ ऎⱘ㒘㒛੠ㅵ⧚ᮍᓣކϟˈ䖭ѯ㓧މऎⱘ㒘㒛੠ㅵ⧚ᰃᆚߛⳌ݇ⱘDŽ೼᳝催䗳㓧ᄬⱘᚙކ· 㓧 ᬜ⥛DŽ ៤ܼ 0DŽབᵰ䞞ᬒⱘ᭄᥂㒧ᵘৃҹ೼ϟ⃵ߚ䜡ᯊĀ䞡⫼ā㗠᮴䳔߱ྟ࣪ˈ䙷ህৃҹᦤ催ݙḌⱘ 䖭ѯ᭄᥂㒧ᵘЁⳌᔧϔ䚼ߚ៤ߚ䳔㽕ᶤѯ⡍⅞ⱘ߱ྟ࣪˄՟བ䯳߫༈䚼ㄝ˅㗠ᑊ䴲ㅔऩഄ⏙ ऎҹৢˈ䛑㽕䖯㸠߱ྟ࣪DŽݙḌЁ乥㐕ഄՓ⫼ϔѯ᭄᥂㒧ᵘˈކ· ↣⃵ߚ䜡ᕫࠄ᠔䳔໻ᇣⱘ㓧 䳔໻ᇣⱘ䖲㓁ぎ䯈DŽЎℸˈϔ㠀䛑䞛⫼ᣝ 2n ⱘ໻ᇣᴹߚ䜡ぎ䯈ˈҹ㓧㾷⹢⠛࣪DŽ ේЁぎ䯆ぎ䯈ⱘᘏ䞣䎇໳໻ǃै᮴⊩ߚ䜡᠔ټේĀ⹢⠛࣪āˈҹ㟇㱑✊ᄬټ· Й㗠ЙПˈӮՓᄬ ぎ䯈Āේā˄heap˅Ёˈ䳔㽕⫼໮ᇥህߛϟ໮໻ϔഫˈϡ⫼њህᔦ䖬ˈ᳝߭޴Ͼ㔎⚍䳔㽕㗗㰥ᬍ䖯˖ټᄬ ぎ䯈Ёⱘ malloc()䙷ḋⱘࡼᗕߚ䜡ࡲ⊩ˈҢϔϾ㒳ϔⱘ᠋⫼ڣ⫼䙷Мˈ⫼ҔМḋⱘᮍ⊩ਸ਼˛བᵰ䞛 ऎぎ䯆ⱘሔ䴶DŽ಴ℸˈা㛑䞛⫼᳈݋ܼሔᗻⱘᮍ⊩DŽކЁै᳝໻䞣㓧∴ކ㓧 Ꮖ㒣⫼ሑ㗠᳝ѯ∴ކāˈ಴Ў䙷ḋⱘ䆱ᕜৃ㛑Ӯߎ⦄᳝ѯ㓧∴ކϔ⾡ৃ㛑⫼ࠄⱘ᭄᥂㒧ᵘᓎゟϔϾĀ㓧 ऎⱘ䳔∖ˈϡ䗖ড়Ў↣ކḍᴀ᮴⊩乘⌟䖤㸠Ё৘⾡ϡৠ᭄᥂㒧ᵘᇍ㓧ܜᵘ៤ϔϾ䴭ᗕⱘ䰉߫DŽ⬅Ѣџ Ѣݙᄬ义䴶ㅵ⧚ⱘ page 㒧ᵘ䙷ḋˈ᳝໮໻ⱘݙᄬህ᳝໮ᇥϾ page 㒧ᵘˈ⫼ڣぎ䯈জᰃࡼᗕব࣪ⱘˈϡ 
ټ䰤ѢᶤϔϾᄤ⿟ᑣˈ৺߭ህৃҹ԰Ў䖭Ͼᄤ⿟ᑣⱘሔ䚼ব䞣㗠Փ⫼ේᷜぎ䯈њDŽ঺໪ˈ䖭ѯᇣഫᄬ ぎ䯈ⱘՓ⫼ᑊϡሔټϔϾ task_struct 㒧ᵘˈ㗠ᔧ䖯⿟᩸䫔ᯊህ㽕䞞ᬒᴀ䖯⿟ⱘ task 㒧ᵘDŽ䖭ѯᇣഫᄬ ऎDŽ՟བˈᔧ㽕ᓎゟϔϾᮄⱘ䖯⿟ᯊህ㽕๲ࡴކৃᛇ㗠ⶹˈݙḌ೼䖤㸠ЁᐌᐌӮ䳔㽕Փ⫼ϔѯ㓧 ऎⱘㅵ⧚ކݙḌ㓧 2.10 135 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! · ↣Ͼ slab Ϟ䖬᳝Ͼᇍ䈵䫒᥹᭄㒘ˈ⫼ᴹᅲ⦄ϔϾぎ䯆ᇍ䈵䫒DŽ ᇍ䈵ⱘ䍋ྟഄഔDŽ · ↣Ͼ slab 䛑᳝ϔϾᇍ䈵ऎˈ䖭ᰃϾᇍ䈵᭄᥂㒧ᵘⱘ᭄㒘ˈҹᇍ䈵ⱘᑣোЎϟᷛህৃᕫࠄ݋ԧ ᰃ৘Ͼ slab Ϟⱘܼ䚼ᇍ䈵䛑໘Ѣぎ䯆⢊ᗕDŽ Ϟ᠔᳝ⱘᇍ䈵䛑Ꮖߚ䜡Փ⫼ⱘ˗㄀Ѡ៾ᰃ৘Ͼ slab Ϟⱘᇍ䈵Ꮖ㒣䚼ߚഄߚ䜡Փ⫼˗᳔ৢϔ៾ ⱘ䯳߫༈ᔶ៤ϔᴵঠ৥䫒䯳߫DŽ↣Ͼ slab ঠ৥䫒䯳߫೼䘏䕥Ϟߚ៤ϝ៾ˈ㄀ϔ៾ᰃ৘Ͼ slab · ೼↣Ͼ slab ⱘࠡッᰃ䆹 slab ⱘᦣ䗄㒧ᵘ slab_tDŽ⫼Ѣৠϔ⾡ᇍ䈵ⱘ໮Ͼ slab 䗮䖛ᦣ䗄㒧ᵘЁ 䈵ⱘ໻ᇣ㗠ᓖˈ߱ྟ࣪ᯊ䗮䖛䅵ㅫᕫߎ᳔ড়䗖ⱘ໻ᇣDŽ · ϔϾ slab ৃ㛑⬅ 1 Ͼǃ2 Ͼǃ4 ϾǃĂ᳔໮ 32 Ͼ䖲㓁ⱘ⠽⧚义䴶ᵘ៤DŽslab ⱘ݋ԧ໻ᇣ಴ᇍ ߭䱣ⴔҷⷕⱘ䯙䇏ݡ䗤ℹ⏅ܹ˗މᇍϞ䗄⼎ᛣ೒԰޴⚍䇈ᯢˈ䆺㒚ᚙܜℸ໘ ೒ 2.8 ᇣᇍ䈵 slab 㒧ᵘ⼎ᛣ೒ ᛇᇣᇍ䈵ⱘϔϾ slab ഫⱘ㒧ᵘ⼎ᛣ೒˄೒ 2.8˅˖؛⾡ⳟ⫼Ѣᶤܜⱘ slab 㒧ᵘDŽ ᴹⳟᇍᇣᇍ䈵ⱘ㒘㒛੠ㅵ⧚ҹঞⳌᑨܜݙḌЁՓ⫼ⱘ໻໮᭄᭄᥂㒧ᵘ䛑ᰃ䖭ḋⱘᇣᇍ䈵ˈ᠔ҹˈ៥Ӏ ϔϾ inode ⱘ໻ᇣ㑺 300 ໮Ͼᄫ㡖ˈ಴ℸϔϾ义䴶Ёৃҹᆍ㒇 8 ϾҹϞⱘ inodeˈ᠔ҹ inode ᰃᇣᇍ䈵DŽ ᰃ໻ᇍ䈵ˈϔ⾡ᰃᇣᇍ䈵DŽ᠔䇧ᇣᇍ䈵ˈᰃᣛ೼ϔϾ义䴶Ёৃҹᆍ㒇ϟད޴Ͼᇍ䈵ⱘ䙷ϔ⾡DŽ՟བˈ 䖲ІⱘĀ໻ഫā˄slab˅ᵘ៤ˈ㗠↣Ͼ໻ഫЁ߭ࣙ৿њ㢹ᑆৠ⾡ⱘᇍ䈵DŽϔ㠀㗠㿔ˈᇍ䈵ߚϸ⾡ˈϔ⾡ ऎ䯳߫ᑊ䴲⬅৘Ͼᇍ䈵Ⳉ᥹ᵘ៤ˈ㗠ᰃ⬅ϔކℸ໪ˈslab ㅵ⧚ᮍ⊩䖬᳝ϔϾ⡍⚍ˈ↣⾡ᇍ䈵ⱘ㓧 䯳߫ˈгᰃ kswapd ⱘϔ乍ࡳ㛑DŽ ऎކЎⱘህᰃҢᆠԭⱘ䯳߫ಲᬊ⠽⧚义䴶ˈাᰃᔧᯊ៥Ӏ≵᳝㒚䆆DŽ݊ᅲˈᅮᳳഄẔᶹ੠໘⧚䖭ѯ㓧 136 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ᣛ䩜 s_mem ᣛ৥ᇍ䈵ऎⱘ䍋⚍ˈinuse ᰃᏆߚ䜡ᇍ䈵ⱘ䅵఼᭄DŽ᳔ৢˈfree ⱘؐᣛᯢњぎ䯆ᇍ䈵䫒Ёⱘ ऎ䯳߫ˈcolouroff Ўᴀ slab Ϟⴔ㡆ऎⱘ໻ᇣˈކ䖭䞠ⱘ䯳߫༈ list ⫼ᴹᇚϔഫ slab 䫒ܹϔϾϧ⫼㓧 152 } slab_t; 151 kmem_bufctl_t free; 150 unsigned int inuse; /* num of objs active in slab */ 149 void *s_mem; /* including colour offset */ 148 unsigned long colouroff; 147 struct list_head list; 146 typedef struct slab_s { 145 */ 144 * free slabs. 143 * Slabs are chained into one ordered list: fully used, partial, then fully 142 * for a slab, or allocated from an general cache. 141 * Manages the objs in a slab. 
Placed either at the beginning of mem allocated 140 * 139 * slab_t 138 /* ==================== mm/slab.c 138 152 ==================== ϟ䴶ህᰃ slab ᦣ䗄㒧ᵘ slab_t ⱘᅮНˈ೼ mm/slab.c Ё˖ 㸠ᇍ唤ⱘDŽކᣝ催䗳㓧ᄬЁⱘ㓧 㸠ᇍ唤ᯊˈᠡ๲ࡴ㢹ᑆᄫ㡖Փ݊ᇍ唤DŽ᠔ҹˈϔϾ slab Ϟⱘ᠔᳝ᇍ䈵ⱘ䍋ྟഄഔ䛑ᖙ✊ᰃކ · ↣Ͼᇍ䈵ⱘ໻ᇣ෎ᴀϞᰃ᠔䳔᭄᥂㒧ᵘⱘ໻ᇣDŽা᳝ᔧ᭄᥂㒧ᵘⱘ໻ᇣϡϢ催䗳㓧ᄬЁⱘ㓧 Ѣৠϔ⾡ᇍ䈵ⱘ৘Ͼ slab ᰃϾᐌ᭄DŽ Ѣⴔ㡆ऎⱘ໻ᇣҹঞ slab Ϣ݊↣Ͼᇍ䈵ⱘⳌᇍ໻ᇣDŽԚ䆹ऎඳϢⴔ㡆ऎⱘᘏ੠ᇍއ݊໻ᇣপ · ↣Ͼ slab Ϟ᳔ৢϔϾᇍ䈵ҹৢг᳝ϔϾᇣᇣⱘᑳ᭭ऎᰃϡ⫼ⱘˈ䖭ᰃᇍⴔ㡆ऎ໻ᇣⱘ㸹ٓˈ ೼催䗳㓧ᄬЁѦⳌ䫭ᓔˈ䖭ḋৃҹᬍ୘催䗳㓧ᄬⱘᬜ⥛DŽ ⴔ㡆ऎⱘ໻ᇣሑৃ㛑ഄᅝᥦ៤ϡৠⱘ໻ᇣˈՓᕫϡৠ slab ϞৠϔⳌᇍԡ㕂ⱘᇍ䈵ⱘ䍋ྟഄഔ 䯳߫Ёⱘ৘Ͼ slab ⱘކ㸠ᇍ唤ⱘ䖍⬠DŽৠϔϾᇍ䈵ⱘ㓧ކ䈵ⱘ䍋ྟഄഔᕔৢ᥼ࠄ঺ϔϾϢ㓧 㸠ᇍ唤ˈ㗠ⴔ㡆ऎⱘ䆒㕂াᰃᇚ㄀ϔϾᇍކ䴶䖍⬠ᓔྟⱘˈ᠔ҹᴀᴹህ㞾✊ᣝ催䗳㓧ᄬⱘ㓧 㑻催䗳㓧ᄬЁ㓧ᄬ㸠໻ᇣЎ 16 Ͼᄫ㡖ˈPentium Ў 32 Ͼᄫ㡖˅ᇍ唤DŽ↣Ͼ slab 䛑ᰃҢϔϾ义 㸠ā˄cache line˅໻ᇣ˄80386 ⱘϔކՓ slab Ёⱘ↣Ͼᇍ䈵ⱘ䍋ྟഄഔ䛑ᣝ催䗳㓧ᄬЁⱘĀ㓧 · ↣Ͼ slab ⱘ༈䚼᳝ϔᇣᇣⱘऎඳᰃϡՓ⫼ⱘˈ⿄ЎĀⴔ㡆ऎā˄coloring area˅DŽⴔ㡆ऎⱘ໻ᇣ 䜡Փ⫼ˈህ㽕ᇚ䆹 slab Ң㄀Ѡ៾䕀⿏ࠄ㄀ϔ៾এ˅DŽ 㗠䇗ᭈ݊೼ slab 䯳߫Ёⱘԡ㕂˄՟བˈབᵰ slab Ϟ᠔᳝ⱘᇍ䈵䛑Ꮖߚމḍ᥂䆹 slab ⱘՓ⫼ᚙ ㋴ҹঞ slab ᦣ䗄㒧ᵘЁⱘ䅵఼᭄ˈᑊϨܗ· ᔧ䞞ᬒϔϾᇍ䈵ᯊˈা䳔㽕䇗ᭈ䫒᥹᭄㒘ЁⱘⳌᑨ ህᇚ slab ᥻ࠊ㒧ᵘЁⱘ䅵఼᭄ࡴ 1ˈᑊᇚ䆹ᇍ䈵Ңぎ䯆䯳߫Ё㜅䫒DŽ · ೼ slab ᦣ䗄㒧ᵘЁ䖬᳝ϔϾᏆ㒣ߚ䜡Փ⫼ⱘᇍ䈵ⱘ䅵఼᭄ˈᔧᇚϔϾぎ䯆ⱘᇍ䈵ߚ䜡Փ⫼ᯊˈ ᇍ䈵䫒᥹᭄㒘㒧ড়೼ϔ䍋ᔶ៤њϔᴵぎ䯆ᇍ䈵䫒DŽ ৠᯊˈ↣Ͼ slab ⱘᦣ䗄㒧ᵘЁ䛑᳝ϔϾᄫ↉ˈ㸼ᯢ䆹 slab Ϟⱘ㄀ϔϾぎ䯆ᇍ䈵DŽ䖭Ͼᄫ↉Ϣ · 137 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
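The free field of slab_t is in fact just an integer, of type kmem_bufctl_t:

==================== mm/slab.c 110 131 ====================
110 /*
111  * kmem_bufctl_t:
112  *
113  * Bufctl's are used for linking objs within a slab
114  * linked offsets.
115  *
116  * This implementaion relies on "struct page" for locating the cache &
117  * slab an object belongs to.
118  * This allows the bufctl structure to be small (one int), but limits
119  * the number of objects a slab (not a cache) can contain when off-slab
120  * bufctls are used. The limit is the size of the largest general cache
121  * that does not use off-slab slabs.
122  * For 32bit archs with 4 kB pages, is this 56.
123  * This is not serious, as it is only for large objects, when it is unwise
124  * to have too many per slab.
125  * Note: This limit can be raised by introducing a general cache whose size
126  * is less than 512 (PAGE_SIZE<<3), but greater than 256.
127  */
128
129 #define BUFCTL_END 0xffffFFFF
130 #define SLAB_LIMIT 0xffffFFFE
131 typedef unsigned int kmem_bufctl_t;

In the free-object link array, the element corresponding to each object on the chain holds the sequence number of the next object, and the element corresponding to the last object holds BUFCTL_END.

The slab queue set up for each kind of object has a queue head whose control structure is kmem_cache_t. Besides the pointers used to maintain the slab queue, this data structure records the parameters that apply to every slab in the queue, together with two function pointers: one to the constructor of the object and one to its destructor. Interestingly, like other data structures, the slab queue head of each kind of object itself lives on a slab. The system has one master slab queue whose objects are the slab queue heads of all the other objects; its own queue head is also a kmem_cache_t structure, called cache_cache.

This results in a hierarchical, tree-like structure:

· The root, cache_cache, is a kmem_cache_t structure that maintains the first-level slab queue. The objects on these slabs are all kmem_cache_t data structures.
· Each object on a first-level slab, that is, each kmem_cache_t data structure, is a queue head that maintains one second-level slab queue.
· Each second-level slab queue is, as a rule, dedicated to one kind of object, that is, one data structure.
· Each second-level slab maintains a queue of free objects.

The overall organization is shown in Figure 2.9 on the next page.

Figure 2.9 shows that the highest level is the slab queue cache_cache, each slab of which carries a number of kmem_cache_t data structures. Each of these data structures is in turn the head of the slab queue for the buffers of some data structure (for example inode, vm_area_struct, mm_struct, or even IP network packets).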
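The free-object link array described above is an index-linked free list, and it can be modelled in a few lines of ordinary C. The following is a minimal sketch rather than the kernel's code: `struct mini_slab`, `NUM_OBJS` and the three `mini_slab_*` functions are hypothetical, and the link array is placed inside the structure for simplicity, whereas in the kernel it sits next to slab_t in memory.

```c
#include <assert.h>

#define BUFCTL_END 0xffffFFFFu  /* same end-of-chain sentinel as in the listing */
#define NUM_OBJS   4u           /* hypothetical number of objects per slab      */

typedef unsigned int kmem_bufctl_t;

/* Hypothetical, flattened model of one slab's bookkeeping. */
struct mini_slab {
    unsigned int  inuse;             /* objects currently allocated       */
    kmem_bufctl_t free;              /* index of the first free object    */
    kmem_bufctl_t bufctl[NUM_OBJS];  /* bufctl[i]: next free object after i */
};

/* Chain all objects together: 0 -> 1 -> ... -> last -> BUFCTL_END. */
static void mini_slab_init(struct mini_slab *s)
{
    unsigned int i;

    for (i = 0; i < NUM_OBJS; i++)
        s->bufctl[i] = i + 1;
    s->bufctl[NUM_OBJS - 1] = BUFCTL_END;
    s->free = 0;
    s->inuse = 0;
}

/* Take the first free object; returns its index, or BUFCTL_END if none is left. */
static kmem_bufctl_t mini_slab_alloc(struct mini_slab *s)
{
    kmem_bufctl_t idx = s->free;

    if (idx == BUFCTL_END)
        return BUFCTL_END;
    s->free = s->bufctl[idx];   /* advancing `free` is what marks idx as taken */
    s->inuse++;
    return idx;
}

/* Put object idx back at the front of the free chain. */
static void mini_slab_free(struct mini_slab *s, kmem_bufctl_t idx)
{
    assert(idx < NUM_OBJS && s->inuse > 0);
    s->bufctl[idx] = s->free;
    s->free = idx;
    s->inuse--;
}
```

Because a freed object is pushed onto the front of the chain, the most recently released object is the next one handed out, which tends to favour memory that is still warm in the cache.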
Ў“slab_cacheāDŽslab_cache⿄ˈ∴ކѢ⠽⧚义䴶ߚ䜡Ё䞛⫼ⱘᣝ໻ᇣߚऎˈজ䞛⫼ slab ᮍᓣㅵ⧚ⱘ䗮⫼㓧 ऎߚ䜡ᴎࠊDŽ᠔ҹˈLinux ݙḌЁ䖬᳝ϔ⾡᮶㉏Ԑކᓔ䫔гϡ໻ⱘ᭄᥂㒧ᵘ䖬ᰃৃҹড়⫼ϔϾ䗮⫼ⱘ㓧 ऎ䯳߫ˈϔѯϡ໾ᐌ⫼ǃ߱ྟ࣪ކϡ䖛ˈᑊ䴲ݙḌЁՓ⫼ⱘ᠔᭄᳝᥂㒧ᵘ䛑᳝ᖙ㽕ᢹ᳝ϧ⫼ⱘ㓧 slab ⱘぎ䯈Փ⫼⥛DŽ ϟˈᇚ䫒᥹ᣛ䩜гҢ slab Ϟ␌⾏ߎᴹ䲚Ёᄬᬒˈҹᦤ催މslab ぎ䯈Ϟⱘ䞡໻⌾䌍ˈ᠔ҹ೼䖭ѯ⡍⅞ᚙ 䈵ⱘ໻ᇣᙄདᰃ⠽⧚义䴶ⱘ 1/2ǃ1/4 ៪ 1/8 ᯊˈᇚձ䰘Ѣ↣Ͼᇍ䈵ⱘ䫒᥹ᣛ䩜㋻᣼ⴔᬒ೼ϔ䍋Ӯ䗴៤ ᰃᇚ᠋ষᴀ䲚Ёᄬᬒ೼⌒ߎ᠔䞠៪㗙ᰃᶤϾҷ⧚ᴎᵘ䞠DŽℸ໪ˈᔧᇍڣ㗠䕑᳝Ā໻ᇍ䈵āⱘ slab ህད ᰃ䱣䑿ᨎᏺⴔⱘ᠋ষᴀˈڣ㒧ᵘϢ᥻ࠊᇍ䈵Ⳍߚ⾏ⱘϔ㠀῵ᓣDŽᠧϾ↨ᮍˈ䕑᳝Āᇣᇍ䈵āⱘ slab ህད kmem_slab_t Ё᳝ϔϾᣛ䩜ᣛ৥Ⳍᑨ slab Ϟⱘ㄀ϔϾᇍ䈵ˈ᠔ҹ䘏䕥ϞᰃϔḋⱘDŽ݊ᅲˈ䖭ህᰃᇚ᥻ࠊ ࠊ㒧ᵘᬒ೼ᅗ᠔ҷ㸼ⱘ slab Ϟˈ㗠ᰃᇚ݊␌⾏ߎᴹˈ䲚Ёᬒ೼঺໪ⱘ slab ϞDŽ⬅Ѣ೼ slab ⱘ᥻ࠊ㒧ᵘ ᔧ᭄᥂㒧ᵘ↨䕗໻ˈ಴㗠ϡሲѢĀᇣᇍ䈵āᯊˈslab ⱘ㒧ᵘ⬹᳝ϡৠDŽϡৠП໘ᰃϡᇚ slab ⱘ᥻ ऎ㒧ᵘ⼎ᛣ೒ކ೒ 2.9 ᇣᇍ䈵㓧 ⫼ⱘ slab 䯳߫ˈ㗠ᑨ䗮䖛 kmem_cache_alloc()ߚ䜡DŽ ៥Ӏ೼ᴀゴЁⳟࠄ䖛ⱘ mm_structǃvm_area_structǃfileǃdentryǃinode ㄝᐌ⫼ⱘ᭄᥂㒧ᵘˈህ䛑᳝ϧ ᠔ҹˈᔧ䳔㽕ߚ䜡ϔϾ݋᳝ϧ⫼ slab 䯳߫ⱘ᭄᥂㒧ᵘᯊˈᑨ䆹䗮䖛 kmem_cache_alloc()ߚ䜡DŽ՟བˈ void *kmem_cache_free(kmem cache_t *cachep, void *objp); void *kmem_cache_alloc(kmem_cache_t *cachep, int flags); ݋ԧⱘߑ᭄ᰃ˖ ऎⱘ໻ᇣˈᑊϨϡ䳔㽕߱ྟ࣪њDŽކऎᯊˈህা㽕ᣛᯢᰃҢાϔϾ䯳߫Ёߚ䜡ˈ㗠ϡ䳔㽕䇈ᯢ㓧ކⱘ㓧 139 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 479 0, 478 sizeof(struct sk_buff), 477 skbuff_head_cache = kmem_cache_create("skbuff_head_cache", 476 475 int i; 474 { 473 void __init skb_init(void) ==================== net/core/skbuff.c 473 487 ==================== net/core/skbuff.c ЁᅮНⱘ˖ ᮄⱘᡔᴃ˅ˈԐТϡ໳݌ൟDŽ᠔ҹˈ៥ӀҢݙḌⱘ㔥㒰偅ࡼᄤ㋏㒳Ё䗝ᢽњϔϾ՟њˈ䖭ᰃ೼ ߚ߽⫼ slab ㅵ⧚ᴎࠊ᠔ᦤկⱘད໘˄Ⳍᇍᴹ䇈ˈslab ᰃ↨䕗ܙ˄constructor˅ⱘᣛ䩜ˈгህᰃ䇈ᑊ≵᳝ ऎⱘᓎゟ䛑⫼ NULL ԰Ўᵘ䗴ߑ᭄ކ᭄᥂㒧ᵘⱘՓ⫼DŽԚᰃˈࠄ⦄೼ЎℶˈLinux ݙḌЁ໮᭄ϧ⫼㓧 ऎ䯳߫ᰃϔϾᕜདⱘᅲ՟ˈ䇏㗙䛑Ꮖ㒣❳ᙝњ䖭Ͼކᴀᴹˈ㰮ᄬऎ䯈㒧ᵘ vm_area_struct ⱘϧ⫼㓧 ऎ䯳߫ⱘᓎゟކ2.10.1ϧ⫼㓧 ऎ䯳߫ⱘᓎゟDŽކҟ㒡㓧ܜऎⱘߚ䜡Пࠡˈ៥Ӏކ೼䆆㾷ݙḌ㓧 ህϡ䆆њDŽ Ӯᇚᇮ೼Փ⫼Ёⱘ slab ᠔ऴ᥂ⱘ义䴶ᤶߎDŽ⬅Ѣ vmalloc()Ϣ៥Ӏৢ䴶㽕䆆ⱘ ioremap()䴲ᐌⳌԐˈ䖭䞠 ᥂㒧ᵘˈ߭ kswapd াᰃҢ৘Ͼ slab 䯳߫Ёᇏᡒ੠ᬊ䲚ぎ䯆ϡ⫼ⱘ slabˈᑊ䞞ᬒ᠔ऴ⫼ⱘ义䴶ˈԚᰃϡ ᦣ৘Ͼ䖯⿟ⱘ⫼᠋ぎ䯈ˈ㗠ḍᴀህⳟϡࠄ䗮䖛 vmalloc()ߚ䜡ⱘ义䴶㸼乍DŽ㟇Ѣ䗮䖛 kmalloc()ߚ䜡ⱘ᭄ ᰃݙḌЁਃࡼˈҢݙḌぎ䯈Ёߚ䜡ⱘDŽ⬅ vmalloc()ߚ䜡ⱘぎ䯈ϡӮ㹿 kswapd ᤶߎˈ಴Ў kswapd াᠿ brk()DŽϡ䖛 brk()ᰃ⬅䖯⿟೼⫼᠋ぎ䯈ਃࡼᑊҢ⫼᠋ぎ䯈Ёߚ䜡ⱘˈ㗠 vmalloc()߭ᰃ೼㋏㒳ぎ䯈ˈгህ ߑ᭄ vmalloc()ҢݙḌⱘ㰮ᄬぎ䯈˄3G ҹϞ˅ߚ䜡ϔഫ㰮ᄬҹঞⳌᑨⱘ⠽⧚ݙᄬˈ㉏ԐѢ㋏㒳䇗⫼ void vfree(void* addr); void *vmalloc(unsigned long size); 乎֓ᦤϔϟˈݙḌЁ䖬᳝ϔ㒘Ϣݙᄬߚ䜡᳝݇ⱘߑ᭄ vmalloc()੠ vfree()˖ ЎПߚ䜡ϔϾ义䴶DŽ vfsmount ᭄᥂㒧ᵘህᰃ䖭ḋDŽབᵰ᭄᥂㒧ᵘⱘ໻ᇣ᥹䖥ѢϔϾ义䴶ˈ߭гৃҹᑆ㛚ህ䗮䖛 alloc_pages() 䖛 
kmalloc()ߚ䜡DŽ䖭ϔ㠀䛑ᰃѯ㒚ᇣ㗠জϡᐌ⫼ⱘ᭄᥂㒧ᵘˈ՟བ㄀ 5 ゴЁᅝ㺙᭛ӊ㋏㒳ᯊՓ⫼ⱘ ᠔ҹˈᔧ䳔㽕ߚ䜡ϔϾϡ݋᳝ϧ⫼ slab 䯳߫ⱘ᭄᥂㒧ᵘ㗠জϡᖙЎПՓ⫼ᭈϾ义䴶ᯊˈህᑨ䆹䗮 void kfree(const void *objp); void *kmalloc(size_t size, int flags); ऎⱘߑ᭄Ў˖ކЁߚ䜡੠䞞ᬒ㓧∴ ކ᠔䕑ᇍ䈵ⱘ໻ᇣDŽ᳔ᇣⱘᰃ 32ˈ✊ৢձ⃵Ў 64ǃ128ǃĂⳈ㟇 128K˄гህᰃ 32 Ͼ义䴶˅DŽҢ䗮⫼㓧 ㋴ᣛ৥ϔϾϡৠⱘ slab 䯳߫DŽ䖭ѯ slab 䯳߫ⱘϡৠП໘ҙ೼ѢܗⳌᇍᴹ䇈↨䕗䴭ᗕ˅ˈ᭄㒘Ёⱘ↣ϔϾ ⱘ㒧ᵘϢ cache_cache ໻ৠᇣᓖˈাϡ䖛݊乊ሖϡᰃϔϾ䯳߫㗠ᰃϔϾ㒧ᵘ᭄㒘˄䖭ᰃ⬅Ѣ slab_cache 140 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 206 unsigned int dflags; /* dynamic flags */ 205 unsigned int growing; 204 kmem_cache_t *slabp_cache; 203 unsigned int colour_next; /* cache colouring */ 202 unsigned int colour_off; /* colour offset */ 201 size_t colour; /* cache colouring range */ 200 199 unsigned int gfpflags; 198 /* force GFP flags, e.g. GFP_DMA */ 197 196 unsigned int gfporder; 195 /* order of pgs per slab (2^n) */ 194 /* 2) slab additions /removals */ 193 192 #endif 191 unsigned int batchcount; 190 #ifdef CONFIG_SMP 189 spinlock_t spinlock; 188 unsigned int num; /* # of objs per slab */ 187 unsigned int flags; /* constant flags */ 186 unsigned int objsize; 185 struct list_head *firstnotfull; 184 struct list_head slabs; 183 /* full, partial first, then free */ 182 /* 1) each alloc & free */ 181 struct kmem_cache_s { ==================== mm/slab.c 181 237 ==================== ᵘDŽ᭄᥂㒧ᵘ㉏ൟ kmem_cache_t ᰃ೼ mm/slab.c ЁᅮНⱘ˖ 㽕Ң cache_cache Ёߚ䜡ϔϾ kmem_cache_t 㒧ᵘˈ԰Ў sk_buff ᭄᥂㒧ᵘ slab 䯳߫ⱘ᥻ࠊ㒧ˈܜ佪 ᡞᅗⱘݙᆍὖ㽕ҟ㒡བϟDŽ 䖭䞠ህϡ⏅ܹࠄ݊ҷⷕЁএњˈাᰃˈۏދⱘџᚙ䖛Ѣϧ䮼ˈ䖛Ѣخߑ᭄ kmem_cache_create()᠔ ऎ䖯㸠⡍⅞ⱘ໘⧚DŽކdestructor ߭Ў NULLˈгህᰃ䇈೼ᢚ䰸៪䞞ᬒϔϾ slab ᯊ᮴䳔ᇍ৘Ͼ㓧 㸠䖍⬠˄16 ᄫ㡖៪ 32 ᄫ㡖˅ᇍ唤DŽᇍ䈵ⱘᵘ䗴ߑ᭄Ў skb_headerinit()ˈ㗠ކ㽕∖Ϣ催䗳㓧ᄬЁⱘ㓧 ऎ೼ slab Ёⱘԡ⿏ᑊ᮴⡍⅞㽕∖DŽԚᰃখ᭄ flags Ў SLAB_HWCACHE_ALIGNˈ㸼⼎ކ⼎ᇍ㄀ϔϾ㓧 ऎˈ៪㗙䇈Āᇍ䈵āⱘ໻ᇣᰃ sizeof(struct sk_buff)DŽ䇗⫼খ᭄ offset Ў 0ˈ㸼ކDŽ↣Ͼ㓧މ߫ⱘՓ⫼ᚙ ऎ䯳߫ˈ݊ৡ⿄Ў“skbuff_head_cacheāDŽ䇏㗙ৃ⫼ੑҸ“cat /proc/slabinfoāᴹ㾖ᆳ䖭ѯ䯳ކⱘϧ⫼㓧 ⱘџᚙᅲ䰙ϞህᰃЎ㔥㒰偅ࡼᄤ㋏㒳ᓎゟϔϾ sk_buff ᭄᥂㒧ᵘخҢҷⷕЁৃҹⳟࠄˈskb_init ᠔ 487 } 486 skb_queue_head_init(&skb_head_pool[i].list); 485 for (i=0; ikmem_cache_alloc()] 1506 void * kmem_cache_alloc (kmem_cache_t *cachep, int flags) 1507 { 1508 return 
__kmem_cache_alloc(cachep, flags); 1509 } ==================== mm/slab.c 1291 1299 ==================== [alloc_skb()>kmem_cache_alloc()>__kmem_cache_alloc()] 1291 static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, int flags) 1292 { 1293 unsigned long save_flags; 1294 void* objp; 1295 1296 kmem_cache_alloc_head(cachep, flags); 1297 try_again: 1298 local_irq_save(save_flags); 1299 #ifdef CONFIG_SMP ...... ==================== mm/slab.c 1319 1325 ==================== 1319 #else 1320 objp = kmem_cache_alloc_one(cachep); 1321 #endif 1322 local_irq_restore(save_flags); 1323 return objp; 1324 alloc_new_slab: 1325 #ifdef CONFIG_SMP ...... ==================== mm/slab.c 1328 1336 ==================== 1328 #endif 1329 local_irq_restore(save_flags); 1330 if (kmem_cache_grow(cachep, flags)) 1331 /* Someone may have stolen our objs. Doesn't matter, we'll 1332 * just come back here again. 1333 */ 1334 goto try_again; 1335 return NULL; 1336 } 佪ܜˈalloc_skb()Ёⱘᣛ䩜 skbuff_head_cache ᰃϾܼሔ䞣ˈᣛ৥Ⳍᑨⱘ slab 䯳߫˄䖭䞠ᰃ sk_buff 㒧ᵘⱘ slab 䯳߫˅ⱘ䯳߫༈ˈ಴㗠䖭䞠ⱘখ᭄ cachep гᣛ৥䖭Ͼ䯳߫DŽ ⿟ᑣЁⱘ kmem_cache_alloc_head()ᰃЎ䇗䆩㗠䆒ⱘˈ೼ᅲ䰙䖤㸠ⱘ㋏㒳Ёᰃぎߑ᭄DŽ៥Ӏ೼䖭䞠 гϡ݇ᖗ໮໘⧚఼ SMP 㒧ᵘˈ᠔ҹ䖭䞠݇䬂ᗻⱘ᪡԰ህᰃ kmem_cache_alloc_one()ˈ䖭ᰃϔϾᅣ᪡԰ˈ ݊ᅮНЎ˖ ==================== mm/slab.c 1246 1263 ==================== Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com 145 1246 /* 1247 * Returns a ptr to an obj in the given cache. 1248 * caller must guarantee synchronization 1249 * #define for the goto optimization 8•) 1250 */ 1251 #define kmem_cache_alloc_one(cachep) \ 1252 ({ \ 1253 slab_t *slabp; \ 1254 \ 1255 /* Get slab alloc is to come from. 
*/ \ 1256 {\ 1257 struct list_head* p = cachep•>firstnotfull; \ 1258 if (p == &cachep•>slabs) \ 1259 goto alloc_new_slab; \ 1260 slabp = list_entry(p,slab_t, list); \ 1261 }\ 1262 kmem_cache_alloc_one_tail(cachep, slabp); \ 1263 }) Ϟ䴶__kmem_cache_alloc()ⱘҷⷕϔᅮ㽕੠䖭ϾᅣᅮН㒧ড়䍋ᴹⳟᠡ㛑ᯢⱑDŽҢᅮНЁৃҹⳟࠄˈ ㄀ϔℹᰃ䗮䖛 slab 䯳߫༈Ёⱘᣛ䩜 firstnotfullˈᡒࠄ䆹䯳߫Ё㄀ϔϾ৿᳝ぎ䯆ᇍ䈵ⱘ slabDŽབᵰ䖭Ͼ䩜 ᣛ৥ slab 䯳߫ⱘ䫒༈˄ᰃ䫒Ёⱘ㄀ϔϾ slab˅ˈ䙷ህ㸼⼎䯳߫ЁᏆ㒣≵᳝৿᳝ぎ䯆ᇍ䈵ⱘ slabˈ᠔ҹህ 䕀ࠄ__kmem_cache_alloc()Ёⱘᷛো alloc_new_slab ໘˄1324 㸠˅ˈ䖯ϔℹᠽܙ䆹 slab 䯳߫DŽ བᵰᡒࠄњ৿᳝ぎ䯆ᇍ䈵ⱘ slabˈህ䇗⫼ kmem_cache_alloc_tail()ߚ䜡ϔϾぎ䯆ᇍ䈵ᑊ䖨ಲ݊ᣛ䩜˖ ==================== mm/slab.c 1211 1228 ==================== [alloc_skb()>kmem_cache_alloc()>__kmem_cache_alloc()>kmem_cache_alloc_one_tail()] 1211 static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep, 1212 slab_t *slabp) 1213 { 1214 void *objp; 1215 1216 STATS_INC_ALLOCED(cachep); 1217 STATS_INC_ACTIVE(cachep); 1218 STATS_SET_HIGH(cachep); 1219 1220 /* get obj pointer */ 1221 slabp•>inuse++; 1222 objp = slabp•>s_mem + slabp•>free*cachep•>objsize; 1223 slabp•>free=slab_bufctl(slabp)[slabp•>free]; 1224 1225 if (slabp•>free == BUFCTL_END) 1226 /* slab now full: move to next slab for next alloc */ 1227 cachep•>firstnotfull = slabp•>list.next; 1228 #if DEBUG ...... ==================== mm/slab.c 1242 1244 ================== 1242 #endif Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 1090 * the more obvious place, simply to reduce the critical path length 1089 * The test for missing atomic flag is performed here, rather than 1088 /* 1087 1086 return 0; 1085 if (flags & SLAB_NO_GROW) 1084 BUG(); 1083 if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW)) 1082 */ 1081 * keeping it out of the critical path in kmem_cache_alloc(). 
1080 /* Be lazy and only check for valid flags here, 1079 1078 unsigned long save_flags; 1077 unsigned long ctor_flags; 1076 unsigned int i, local_flags; 1075 size_t offset; 1074 void *objp; 1073 struct page *page; 1072 slab_t *slabp; 1071 { 1070 static int kmem_cache_grow (kmem_cache_t * cachep, int flags) 1069 */ 1068 * kmem_cache_alloc() when there are no active objs left in a cache. 1067 * Grow (by 1) the number of slabs within a cache. This is called by 1066 /* [alloc_skb()>kmem_cache_alloc()>__kmem_cache_alloc()>kmem_cache_grow()] ==================== mm/slab.c 1066 1168 ==================== ᭄ kmem_cache_grow()ⱘҷⷕг೼ mm/slab.c Ё˖ ऎⱘ䯳߫Ā⫳䭓ā䍋ᴹDŽߑކalloc_new_slab ໘ˈ䗮䖛 kmem_cache_grow()ᴹߚ䜡ϔഫᮄⱘ slabˈՓ㓧 ᅮ slab 䯳߫ЁᏆ㒣ϡᄬ೼৿᳝ぎ䯆ᇍ䈵ⱘ slabˈ᠔ҹ㽕䕀ࠄࠡ䴶ҷⷕЁⱘᷛো؛ϡ䖛ˈ៥Ӏ ⱘϟϔϾ slabDŽ བᵰ䖒ࠄњ slab ⱘ᳿ሒ BUFCTL_ENDˈህ㽕䇗ᭈ䆹 slab 䯳߫ⱘᣛ䩜 firstnotfullˈՓᅗᣛ৥䯳߫Ё ᬍবњ slab_t Ё free ᄫ↉ⱘؐˈህ䱤৿ⴔᔧࠡᇍ䈵Ꮖ㹿ߚ䜡DŽ ㋴ⱘؐ߭㸼ᯢϟϔϾぎ䯆ᇍ䈵ⱘᑣোDŽܗ᣼ⴔ᭄᥂㒧ᵘ slab_tDŽ䆹᭄㒘ҹᔧࠡᇍ䈵ⱘᑣোЎϟᷛˈ㗠᭄㒘 䖭Ͼᅣ᪡԰䖨ಲϔϾ kmem_bufctl_t ᭄㒘ⱘഄഔˈ䖭Ͼ᭄㒘ህ೼ slab Ё᭄᥂㒧ᵘ slab_t ⱘϞᮍˈ㋻ 155 ((kmem_bufctl_t *)(((slab_t*)slabp)+1)) 154 #define slab_bufctl(slabp) \ ==================== mm/slab.c 154 155 ==================== ✊ৢˈህ䗮䖛ᅣ᪡԰ slab_bufctl()ᬍবᄫ↉ free ⱘؐˈՓᅗᣛᯢϟϔϾぎ䯆ᇍ䈵ⱘᑣোDŽ slab Ёⱘᇍ䈵ऎˈ᠔ҹḍ᥂䖭ѯ᭄᥂੠ᴀϧ⫼䯳߫ⱘᇍ䈵໻ᇣˈህৃҹ䅵ㅫߎ䆹ぎ䯆ᇍ䈵ⱘ䍋ྟഄഔDŽ བࠡ᠔䗄ˈ᭄᥂㒧ᵘ slab_t Ёⱘ free 䆄ᔩⴔϟϔ⃵ৃҹߚ䜡ⱘぎ䯆ᇍ䈵ⱘᑣোˈ㗠 s_mem ߭ᣛ৥ 1244 } return objp; 1243 146 147 1091 * in kmem_cache_alloc(). If a caller is seriously mis•behaving they 1092 * will eventually be caught here (where it matters). 1093 */ 1094 if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC) 1095 BUG(); 1096 1097 ctor_flags = SLAB_CTOR_CONSTRUCTOR; 1098 local_flags = (flags & SLAB_LEVEL_MASK); 1099 if (local_flags == SLAB_ATOMIC) 1100 /* 1101 * Not allowed to sleep. Need to tell a constructor about 1102 * this • it might need to know... 1103 */ 1104 ctor_flags |= SLAB_CTOR_ATOMIC; 1105 1106 /* About to mess with non•constant members • lock. */ 1107 spin_lock_irqsave(&cachep•>spinlock, save_flags); 1108 1109 /* Get colour for the slab, and cal the next value. 
*/ 1110 offset = cachep•>colour_next; 1111 cachep•>colour_next++; 1112 if (cachep•>colour_next >= cachep•>colour) 1113 cachep•>colour_next = 0; 1114 offset *= cachep•>colour_off; 1115 cachep•>dflags |= DFLGS_GROWN; 1116 1117 cachep•>growing++; 1118 spin_unlock_irqrestore(&cachep•>spinlock, save_flags); 1119 1120 /* A series of memory allocations for a new slab. 1121 * Neither the cache•chain semaphore, or cache•lock, are 1122 * held, but the incrementing c_growing prevents this 1123 * cache from being reaped or shrunk. 1124 * Note: The cache could be selected in for reaping in 1125 * kmem_cache_reap(), but when the final test is made the 1126 * growing value will be seen. 1127 */ 1128 1129 /* Get mem for the objs. */ 1130 if (!(objp = kmem_getpages(cachep, flags))) 1131 goto failed; 1132 1133 /* Get slab management. */ 1134 if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags))) 1135 goto opps1; 1136 1137 /* Nasty!!!!!! I hope this is OK. */ 1138 i = 1 << cachep•>gfporder; 1139 page = virt_to_page(objp); Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 1005 if (!slabp) 1004 slabp = kmem_cache_alloc(cachep•>slabp_cache, local_flags); 1003 /* Slab management obj is off•slab. */ 1002 if (OFF_SLAB(cachep)) { 1001 1000 slab_t *slabp; 999 { 998 void *objp, int colour_off, int local_flags) 997 static inline slab_t * kmem_cache_slabmgmt (kmem_cache_t *cachep, 996 /* Get the memory for a slab management obj. 
*/ [alloc_skb()>kmem_cache_alloc()>__kmem_cache_alloc()>kmem_cache_grow()>kmem_cache_slabmgmt()] ==================== mm/slab.c 996 1021 ==================== ⧚ֵᙃDŽ݊ҷⷕ೼ mm/slab.c Ё˖ ߚ䜡ぎ䯆义䴶DŽߚ䜡њ⫼Ѣᇍ䈵ᴀ䑿ⱘݙᄬ义䴶ৢˈ䖬㽕䗮䖛 kmem_cache_slabmgmt()ᓎゟ䍋 slab ⱘㅵ ऎⱘ义䴶ˈ䖭Ͼߑ᳔᭄㒜䇗⫼ alloc_pages()ކऎ໻ᇣDŽ✊ৢˈ䗮䖛 kmem_getpages()ߚ䜡⫼Ѣ݋ԧᇍ䈵㓧 䴶ᵘ䗴៤ slabˈ䫒ܹ㒭ᅮⱘ slab 䯳߫DŽᇍখ᭄䖯㸠њϔѯẔᶹҹৢˈህ䅵ㅫߎϟϔഫ slab ᑨ᳝ⱘⴔ㡆 ߑ᭄ kmem_cache_grow()ḍ᥂䯳߫༈Ёⱘখ᭄ gfporder ߚ䜡㢹ᑆ䖲㓁ⱘ⠽⧚ݙᄬ义䴶ˈᑊᇚ䖭ѯ义 1168 } 1167 return 0; 1166 spin_unlock_irqrestore(&cachep•>spinlock, save_flags); 1165 cachep•>growing••; 1164 spin_lock_irqsave(&cachep•>spinlock, save_flags); 1163 failed: 1162 kmem_freepages(cachep, objp); 1161 opps1: 1160 return 1; 1159 spin_unlock_irqrestore(&cachep•>spinlock, save_flags); 1158 1157 cachep•>failures = 0; 1156 STATS_INC_GROWN(cachep); 1155 cachep•>firstnotfull = &slabp•>list; 1154 if (cachep•>firstnotfull == &cachep•>slabs) 1153 list_add_tail(&slabp•>list,&cachep•>slabs); 1152 /* Make slab active. */ 1151 1150 cachep•>growing••; 1149 spin_lock_irqsave(&cachep•>spinlock, save_flags); 1148 1147 kmem_cache_init_objs(cachep, slabp, ctor_flags); 1146 1145 } while (••i); 1144 page++; 1143 PageSetSlab(page); 1142 SET_PAGE_SLAB(page, slabp); 1141 SET_PAGE_CACHE(page, cachep); do { 1140 148 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 1041 * the same cache which they are a constructor for. 1040 * Constructors are not allowed to allocate memory from 1039 /* 1038 1037 #endif ==================== mm/slab.c 1037 1046 ==================== ...... 
1006         return NULL;
1007     } else {
1008         /* FIXME: change to
1009             slabp = objp
1010          * if you enable OPTIMIZE
1011          */
1012         slabp = objp+colour_off;
1013         colour_off += L1_CACHE_ALIGN(cachep->num *
1014                 sizeof(kmem_bufctl_t) + sizeof(slab_t));
1015     }
1016     slabp->inuse = 0;
1017     slabp->colouroff = colour_off;
1018     slabp->s_mem = objp+colour_off;
1019
1020     return slabp;
1021 }

As described earlier, for small objects the slab control structure slab_t shares the slab with the objects themselves, whereas for large objects the control structure lives off the slab. The control structure of a large-object slab is still a slab_t, however; it is stored on a slab set aside for that data structure, which has its own dedicated slab queue. So for large objects a slab_t is allocated through kmem_cache_alloc(); otherwise part of the space at the low end of the small-object slab serves as its control structure, after a small colouring area has been skipped. Note that the colour_off referenced on lines 1012 and 1017 does not hold the same value: the variable is adjusted on line 1013, where the size of the object link array and the size of slab_t are added to it. As a result, slabp->s_mem always points to the start of the object area on the slab.

For the page data structure of every page allocated to the slab, the macros SET_PAGE_CACHE and SET_PAGE_SLAB set its link pointers prev and next so that they point to the owning slab and the owning slab queue respectively. The PG_slab flag in the page structure is also set to 1 to mark what the page is used for.

Finally, the slab is initialized by kmem_cache_init_objs():

==================== mm/slab.c 1023 1030 ====================
[alloc_skb()>kmem_cache_alloc()>__kmem_cache_alloc()>kmem_cache_grow()>kmem_cache_init_objs()]
1023 static inline void kmem_cache_init_objs (kmem_cache_t * cachep,
1024             slab_t * slabp, unsigned long ctor_flags)
1025 {
1026     int i;
1027
1028     for (i = 0; i < cachep->num; i++) {
1029         void* objp = slabp->s_mem+cachep->objsize*i;
1030 #if DEBUG
......
1042          * Otherwise, deadlock. They must also be threaded.
1043          */
1044         if (cachep->ctor)
1045             cachep->ctor(objp, cachep, ctor_flags);
1046 #if DEBUG
......
==================== mm/slab.c 1059 1064 ====================
1059 #endif
1060         slab_bufctl(slabp)[i] = i+1;
1061     }
1062     slab_bufctl(slabp)[i-1] = BUFCTL_END;
1063     slabp->free = 0;
1064 }

The initialization here includes a call to the constructor of the specific object type; for the sk_buff data structure that function is skb_headerinit(). In addition, line 1060 initializes each element of the link array.

Once the buffer queue has "grown" a little, a free buffer is bound to be available, so control returns to the label try_again for another attempt (see line 1334 in __kmem_cache_alloc()).

This gives a multi-level buffer allocation mechanism. At the top level is the allocation of a buffer, which in our scenario is alloc_skb(); it first tries skb_head_from_pool(), that is, allocation from the buffer pool made up of slab blocks that have already been allocated. If that fails, it drops down one level and allocates from the slab queue through kmem_cache_alloc(). If the slab queue no longer holds any slab with a free buffer, it drops down one more level and, through kmem_cache_grow(), allocates a number of pages and builds a new slab block.

Does a buffer queue then grow monotonically and never shrink? As mentioned before, kswapd calls kmem_cache_reap() at regular intervals to "reap". That is, it checks a number of dedicated-buffer slab queues in turn to see whether any completely free slabs exist, and if so releases the memory pages those slabs occupy.

Now for the release of a dedicated buffer, which is done by kmem_cache_free(). Its code is in mm/slab.c:

==================== mm/slab.c 1554 1557 ====================
1554 void kmem_cache_free (kmem_cache_t *cachep, void *objp)
1555 {
1556     unsigned long flags;
1557 #if DEBUG
......
==================== mm/slab.c 1561 1566 ====================
1561 #endif
1562
1563     local_irq_save(flags);
1564     __kmem_cache_free(cachep, objp);
1565     local_irq_restore(flags);
1566 }

Clearly the real work is done by __kmem_cache_free(); this wrapper merely disables interrupts for the duration of the operation.

==================== mm/slab.c 1466 1472 ====================
[kmem_cache_free()>__kmem_cache_free()]
1466 /*
1467  * __kmem_cache_free
1468  * called with disabled ints
1469  */
1470 static inline void __kmem_cache_free (kmem_cache_t *cachep, void* objp)
1471 {
1472 #ifdef CONFIG_SMP
......
==================== mm/slab.c 1493 1496 ====================
1493 #else
1494     kmem_cache_free_one(cachep, objp);
1495 #endif
1496 }

We are not concerned with multiprocessor (SMP) configurations here. The code of kmem_cache_free_one() is in the same file:

==================== mm/slab.c 1367 1380 ====================
[kmem_cache_free()>__kmem_cache_free()>kmem_cache_free_one()]
1367 static inline void kmem_cache_free_one(kmem_cache_t *cachep, void *objp)
1368 {
1369     slab_t* slabp;
1370
1371     CHECK_PAGE(virt_to_page(objp));
1372     /* reduces memory footprint
1373      *
1374     if (OPTIMIZE(cachep))
1375         slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1)));
1376     else
1377      */
1378     slabp = GET_PAGE_SLAB(virt_to_page(objp));
1379
1380 #if DEBUG
......
==================== mm/slab.c 1402 1448 ====================
1402 #endif
1403     {
1404         unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;
1405
1406         slab_bufctl(slabp)[objnr] = slabp->free;
1407         slabp->free = objnr;
1408     }
1409     STATS_DEC_ACTIVE(cachep);
1410
1411     /* fixup slab chain */
1412     if (slabp->inuse-- == cachep->num)
1413         goto moveslab_partial;
1414     if (!slabp->inuse)
1415         goto moveslab_free;
1416     return;
1417
1418 moveslab_partial:
1419     /* was full.
1420      * Even if the page is now empty, we can set c_firstnotfull to
1421      * slabp: there are no partial slabs in this case
1422      */
1423     {
1424         struct list_head *t = cachep->firstnotfull;
1425
1426         cachep->firstnotfull = &slabp->list;
1427         if (slabp->list.next == t)
1428             return;
1429         list_del(&slabp->list);
1430         list_add_tail(&slabp->list, t);
1431         return;
1432     }
1433 moveslab_free:
1434     /*
1435      * was partial, now empty.
1436      * c_firstnotfull might point to slabp
1437      * FIXME: optimize
1438      */
1439     {
1440         struct list_head *t = cachep->firstnotfull->prev;
1441
1442         list_del(&slabp->list);
1443         list_add_tail(&slabp->list, &cachep->slabs);
1444         if (cachep->firstnotfull == &slabp->list)
1445             cachep->firstnotfull = t->next;
1446         return;
1447     }
1448 }

CHECK_PAGE in this code is used only for debugging; in a production system it is an empty statement. From the address of the object being released, the page that holds it can be computed. Furthermore, as described earlier (see line 1142 in kmem_cache_grow()), inside the list head `list` of the page's page structure, the pointer prev, originally meant for queue linkage, points to the slab that owns the page, so the macro GET_PAGE_SLAB yields a pointer to that slab. Once the slab holding the object is found, the object can be released through the slab's link array (see lines 1404 to 1407). At the same time the count of non-free objects recorded in the controlling structure must be decremented (line 1412). After the decrement there are three possibilities:

· The slab had no free object before and now has one, so control goes to moveslab_partial, which moves the slab from its old position in the queue to the second section of the queue, the place pointed to by firstnotfull.
· The slab already had free objects and now all of its objects are free, so control goes to moveslab_free, which moves the slab from its old position to the third section of the queue, that is, the tail.
· The slab already had free objects and now simply has one more, but is still not entirely free, so nothing needs to change.

As can be seen, the cost of allocating and releasing a dedicated buffer is very small. Note also that releasing a buffer does not cause the slab to be released; free slabs are released when kernel threads such as kswapd periodically call kmem_cache_reap().

Having covered the allocation and release of dedicated buffers, we turn to the allocation of general-purpose buffers. As mentioned earlier, besides the various dedicated buffer queues the kernel also has a general-purpose buffer pool, cache_sizes, which is divided into several queues according to buffer size. The function used to allocate general-purpose buffers, kmalloc(), is defined in mm/slab.c:

==================== mm/slab.c 1511 1544 ====================
1511 /**
1512  * kmalloc - allocate memory
1513  * @size: how many bytes of memory are required.
1514  * @flags: the type of memory to allocate.
1515  *
1516  * kmalloc is the normal method of allocating memory
1517  * in the kernel. The @flags argument may be one of:
1518  *
1519  * %GFP_BUFFER - XXX
1520  *
1521  * %GFP_ATOMIC - allocation will not sleep. Use inside interrupt handlers.
1522  *
1523  * %GFP_USER - allocate memory on behalf of user. May sleep.
1524  *
1525  * %GFP_KERNEL - allocate normal kernel ram. May sleep.
1526  *
1527  * %GFP_NFS - has a slightly lower probability of sleeping than %GFP_KERNEL.
1528  * Don't use unless you're in the NFS code.
1529  *
1530  * %GFP_KSWAPD - Don't use unless you're modifying kswapd.
1531  */
1532 void * kmalloc (size_t size, int flags)
1533 {
1534     cache_sizes_t *csizep = cache_sizes;
1535
1536     for (; csizep->cs_size; csizep++) {
1537         if (size > csizep->cs_size)
1538             continue;
1539         return __kmem_cache_alloc(flags & GFP_DMA ?
1540             csizep->cs_dmacachep : csizep->cs_cachep, flags);
1541     }
1542     BUG(); // too big size
1543     return NULL;
1544 }

A for loop scans the cache_sizes structure array from small to large, finds the first queue whose buffer size satisfies the request, and then calls __kmem_cache_alloc() to allocate a buffer from that queue. The role of kmem_cache_alloc() has already been outlined above.

Finally, let us look at the "reaping" of free slabs, that is, the reclamation of the pages that make up free slabs. We saw earlier that the kernel thread kswapd, in its periodic runs, calls kmem_cache_reap() to reclaim these pages. The code of this function is in mm/slab.c:

==================== mm/slab.c 1701 1742 ====================
1701 /**
1702  * kmem_cache_reap - Reclaim memory from caches.
1703  * @gfp_mask: the type of memory required.
1704  *
1705  * Called from try_to_free_page().
1706  */
1707 void kmem_cache_reap (int gfp_mask)
1708 {
1709     slab_t *slabp;
1710     kmem_cache_t *searchp;
1711     kmem_cache_t *best_cachep;
1712     unsigned int best_pages;
1713     unsigned int best_len;
1714     unsigned int scan;
1715
1716     if (gfp_mask & __GFP_WAIT)
1717         down(&cache_chain_sem);
1718     else
1719         if (down_trylock(&cache_chain_sem))
1720             return;
1721
1722     scan = REAP_SCANLEN;
1723     best_len = 0;
1724     best_pages = 0;
1725     best_cachep = NULL;
1726     searchp = clock_searchp;
1727     do {
1728         unsigned int pages;
1729         struct list_head* p;
1730         unsigned int full_free;
1731
1732         /* It's safe to test this without holding the cache-lock. */
1733         if (searchp->flags & SLAB_NO_REAP)
1734             goto next;
1735         spin_lock_irq(&searchp->spinlock);
1736         if (searchp->growing)
1737             goto next_unlock;
1738         if (searchp->dflags & DFLGS_GROWN) {
1739             searchp->dflags &= ~DFLGS_GROWN;
1740             goto next_unlock;
1741         }
1742 #ifdef CONFIG_SMP
......
==================== mm/slab.c 1750 1825 ====================
1750 #endif
1751
1752         full_free = 0;
1753         p = searchp->slabs.prev;
1754         while (p != &searchp->slabs) {
1755             slabp = list_entry(p, slab_t, list);
1756             if (slabp->inuse)
1757                 break;
1758             full_free++;
1759             p = p->prev;
1760         }
1761
1762         /*
1763          * Try to avoid slabs with constructors and/or
1764          * more than one page per slab (as it can be difficult
1765          * to get high orders from gfp()).
1766          */
1767         pages = full_free * (1<<searchp->gfporder);
1768         if (searchp->ctor)
1769             pages = (pages*4+1)/5;
1770         if (searchp->gfporder)
1771             pages = (pages*4+1)/5;
1772         if (pages > best_pages) {
1773             best_cachep = searchp;
1774             best_len = full_free;
1775             best_pages = pages;
1776             if (full_free >= REAP_PERFECT) {
1777                 clock_searchp = list_entry(searchp->next.next,
1778                             kmem_cache_t,next);
1779                 goto perfect;
1780             }
1781         }
1782 next_unlock:
1783         spin_unlock_irq(&searchp->spinlock);
1784 next:
1785         searchp = list_entry(searchp->next.next,kmem_cache_t,next);
1786     } while (--scan && searchp != clock_searchp);
1787
1788     clock_searchp = searchp;
1789
1790     if (!best_cachep)
1791         /* couldn't find anything to reap */
1792         goto out;
1793
1794     spin_lock_irq(&best_cachep->spinlock);
1795 perfect:
1796     /* free only 80% of the free slabs */
1797     best_len = (best_len*4 + 1)/5;
1798     for (scan = 0; scan < best_len; scan++) {
1799         struct list_head *p;
1800
1801         if (best_cachep->growing)
1802             break;
1803         p = best_cachep->slabs.prev;
1804         if (p == &best_cachep->slabs)
1805             break;
1806         slabp = list_entry(p,slab_t,list);
1807         if (slabp->inuse)
1808             break;
1809         list_del(&slabp->list);
1810         if (best_cachep->firstnotfull == &slabp->list)
1811             best_cachep->firstnotfull = &best_cachep->slabs;
1812         STATS_INC_REAPED(best_cachep);
1813
1814         /* Safe to drop the lock. The slab is no longer linked to the
1815          * cache.
1816          */
1817         spin_unlock_irq(&best_cachep->spinlock);
1818         kmem_slab_destroy(best_cachep, slabp);
1819         spin_lock_irq(&best_cachep->spinlock);
1820     }
1821     spin_unlock_irq(&best_cachep->spinlock);
1822 out:
1823     up(&cache_chain_sem);
1824     return;
1825 }

This function scans cache_cache, the queue of slab queues, looking for slab queues that can be "reaped". It does not scan the whole of cache_cache every time, however, only some of the slab queues in it, so a global variable is needed to record where the next scan should start. That variable is clock_searchp:

==================== mm/slab.c 360 361 ====================
360 /* Place maintainer for reaping. */
361 static kmem_cache_t *clock_searchp = &cache_cache;

Even when a slab queue suitable for reaping is found, not all of its free slabs are reclaimed, only about 80% of them. For each slab to be reclaimed, kmem_slab_destroy() is called to release its pages; we leave that function for readers to study on their own.

==================== mm/slab.c 540 554 ====================
[kmem_cache_reap()>kmem_slab_destroy()]
540 /* Destroy all the objs in a slab, and release the mem back to the system.
541  * Before calling the slab must have been unlinked from the cache.
542  * The cache-lock is not held/needed.
543  */
544 static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
545 {
546     if (cachep->dtor
547 #if DEBUG
548         || cachep->flags & (SLAB_POISON | SLAB_RED_ZONE)
549 #endif
550     ) {
551         int i;
552         for (i = 0; i < cachep->num; i++) {
553             void* objp = slabp->s_mem+cachep->objsize*i;
554 #if DEBUG
......
==================== mm/slab.c 563 566 ====================
563 #endif
564             if (cachep->dtor)
565                 (cachep->dtor)(objp, cachep, 0);
566 #if DEBUG
......
==================== mm/slab.c 573 580 ====================
573 #endif
574         }
575     }
576
577     kmem_freepages(cachep, slabp->s_mem-slabp->colouroff);
578     if (OFF_SLAB(cachep))
579         kmem_cache_free(cachep->slabp_cache, slabp);
580 }

2.11 Address Mapping of External Device Memory Space

No system can do without input/output, so access to external devices is an important issue in CPU design. Broadly speaking, there are two different ways to access external devices: memory mapped and I/O mapped. On a CPU that uses the memory-mapped approach, the storage cells of an external device, such as control registers, status registers and data registers, appear in the system as part of memory. The CPU can access a device's storage cell exactly as it accesses a memory cell, so no special instructions for device I/O are needed. The PDP-11 of earlier days and later CPUs such as the M68K and the Power PC all use this approach. In a system that uses the I/O-mapped approach things are different: the storage cells of external devices and memory belong to two separate systems. Instructions that access memory cannot be used to access device storage cells, so the X86 CPU provides dedicated IN and OUT instructions, but the "address space" available to the I/O instructions is comparatively small. In fact, the I/O address space of the X86 is by now very crowded.

As computer technology developed, people found that the pure I/O-mapped approach cannot meet requirements. It suits only early computer technology, when a peripheral usually had just a few registers through which every operation on the device could be carried out. Today the situation is very different. For example, a PC can take a graphics card with 2MB of memory on it, and possibly a ROM holding executable code as well. The problem has become even more prominent since the PCI bus appeared. So whether the CPU is designed for I/O mapping or memory mapping, there must be a means of mapping the memory on a peripheral card into the memory space, which in practice means the virtual memory space. In the Linux kernel such a mapping is established by the function ioremap().

For the management of memory pages we normally first allocate a virtual memory interval in the virtual address space, then allocate the corresponding physical memory pages for that interval and establish the mapping. Moreover, such a mapping need not be established all at once; it can be built up gradually as accesses to these virtual pages raise page faults. ioremap() is different. First, we start out with a physical storage interval, whose address is the address at which the memory on the peripheral card appears on the bus. This address is not necessarily the local physical address of those storage cells on the card, but the address "seen" by the CPU on the bus; an address mapping may well have taken place in between, but that mapping is transparent to the CPU. This is why such an address is sometimes called a "bus address". As an example, suppose there is an "intelligent graphics card" with a microprocessor on it. To the microprocessor on the card, the memory on the card starts at address 0; that is the local physical address on the card. But when the card is plugged into a PCI slot of a PC, the address of this physical storage interval as seen by the PC's CPU might start at 0x0000 f000 0000 0000, so one mapping has already occurred in between. From the point of view of the system's (the PC's) CPU, however, all it knows is that this physical storage interval starts at 0x0000 f000 0000 0000; that is the physical address of the interval, or its "bus address". In a Linux system the CPU cannot access memory by physical address and must use virtual addresses, so it is necessary to work "backwards", starting from the physical address, finding a piece of virtual address space and establishing the mapping. Second, this need arises only for operations on external devices, which is the kernel's business, so the virtual memory interval lies in system space (above 3GB). In earlier versions of the Linux kernel this function was called vremap() and was later renamed ioremap(), which reflects the same point. Finally, such pages are of course not subject to the dynamic allocation of physical memory pages, nor to being swapped out by kswapd.

We look at ioremap() first. It is an inline function defined in include/asm-i386/io.h:

==================== include/asm-i386/io.h 140 143 ====================
140 extern inline void * ioremap (unsigned long offset, unsigned long size)
141 {
142     return __ioremap(offset, size, 0);
143 }

The actual work is done by __ioremap(), defined in arch/i386/mm/ioremap.c:

==================== arch/i386/mm/ioremap.c 92 152 ====================
[ioremap()>__ioremap()]
92 /*
93  * Remap an arbitrary physical address space into the kernel virtual
94  * address space. Needed when the kernel wants to access high addresses
95  * directly.
96  *
97  * NOTE! We need to allow non-page-aligned mappings too: we will obviously
98  * have to convert them into an offset in a page-aligned mapping, but the
99  * caller shouldn't need to know that small detail.
100  */
101 void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flags)
102 {
103     void * addr;
104     struct vm_struct * area;
105     unsigned long offset, last_addr;
106
107     /* Don't allow wraparound or zero size */
108     last_addr = phys_addr + size - 1;
109     if (!size || last_addr < phys_addr)
110         return NULL;
111
112     /*
113      * Don't remap the low PCI/ISA area, it's always mapped..
114      */
115     if (phys_addr >= 0xA0000 && last_addr < 0x100000)
116         return phys_to_virt(phys_addr);
117
118     /*
119      * Don't allow anybody to remap normal RAM that we're using..
120      */
121     if (phys_addr < virt_to_phys(high_memory)) {
122         char *t_addr, *t_end;
123         struct page *page;
124
125         t_addr = __va(phys_addr);
126         t_end = t_addr + (size - 1);
127
128         for(page = virt_to_page(t_addr); page <= virt_to_page(t_end); page++)
129             if(!PageReserved(page))
130                 return NULL;
131     }
132
133     /*
134      * Mappings have to be page-aligned
135      */
136     offset = phys_addr & ~PAGE_MASK;
137     phys_addr &= PAGE_MASK;
138     size = PAGE_ALIGN(last_addr) - phys_addr;
139
140     /*
141      * Ok, go for it..
142      */
143     area = get_vm_area(size, VM_IOREMAP);
144     if (!area)
145         return NULL;
146     addr = area->addr;
147     if (remap_area_pages(VMALLOC_VMADDR(addr), phys_addr, size, flags)) {
148         vfree(addr);
149         return NULL;
150     }
151     return (void *) (offset + (char *)addr);
152 }

First come some routine checks, often called "sanity checks". Line 109 checks that the size of the interval is not 0 and not so large that it runs past the limit of the 32-bit address space. Physical addresses 0xa0000 to 0x100000 are used by the VGA card and the BIOS; they are mapped during system initialization, and this interval must not be encroached upon. The high_memory on line 121 is the upper limit of physical memory addresses (expressed as the corresponding virtual address), set during system initialization according to the amount of physical memory detected. If the requested phys_addr is below this limit, it conflicts with the system's physical memory, unless the physical pages in question were reserved holes to begin with. After these checks have passed, the physical address must still be aligned to a page boundary (lines 136 to 138).

With these preparations complete, the real business begins. The first step is to find a virtual address interval. As explained earlier, this interval belongs to the kernel rather than to any particular process, so it is not looked up in the virtual memory interval queue of some process's mm_struct structure but in the virtual memory interval queue that belongs to the kernel. The function get_vm_area() is defined in mm/vmalloc.c:

==================== mm/vmalloc.c 168 201 ====================
[ioremap()>__ioremap()>get_vm_area()]
168 struct vm_struct * get_vm_area(unsigned long size, unsigned long flags)
169 {
170     unsigned long addr;
171     struct vm_struct **p, *tmp, *area;
172
173     area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
174     if (!area)
175         return NULL;
176     size += PAGE_SIZE;
177     addr = VMALLOC_START;
178     write_lock(&vmlist_lock);
179     for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
180         if ((size + addr) < addr) {
181             write_unlock(&vmlist_lock);
182             kfree(area);
183             return NULL;
184         }
185         if (size + addr < (unsigned long) tmp->addr)
186             break;
187         addr = tmp->size + (unsigned long) tmp->addr;
188         if (addr > VMALLOC_END-size) {
189             write_unlock(&vmlist_lock);
190             kfree(area);
191             return NULL;
192         }
193     }
194     area->flags = flags;
195     area->addr = (void *)addr;
196     area->size = size;
197     area->next = *p;
198     *p = area;
199     write_unlock(&vmlist_lock);
200     return area;
201 }

The kernel keeps a queue of virtual memory intervals for its own use, vmlist, a singly linked queue made up of a chain of vm_struct data structures. Both vm_struct and vmlist are used exclusively by the kernel. Conceptually vm_struct resembles the vm_area_struct used for processes, but it is much simpler. The definitions are in include/linux/vmalloc.h and mm/vmalloc.c:

==================== include/linux/vmalloc.h 14 19 ====================
14 struct vm_struct {
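15     unsigned long flags;
16     void * addr;
17     unsigned long size;
18     struct vm_struct * next;
19 };

==================== mm/vmalloc.c 18 18 ====================
18 struct vm_struct * vmlist;

As explained before, a simple mapping exists between the system-space virtual addresses used by the kernel and physical addresses: adding an offset of 3GB to a physical address gives the kernel virtual address. The variable high_memory marks the virtual address that corresponds to the upper limit of the physical memory actually present; it is set during system initialization. When the kernel needs a piece of virtual address space, it allocates it starting 8MB beyond this address. To that end, include/asm-i386/pgtable.h defines VMALLOC_START and related constants:

==================== include/asm-i386/pgtable.h 132 143 ====================
132 /* Just any arbitrary offset to the start of the vmalloc VM area: the
133  * current 8MB value just means that there will be a 8MB "hole" after the
134  * physical memory until the kernel virtual memory starts. That means that
135  * any out-of-bounds memory accesses will hopefully be caught.
136  * The vmalloc() routines leaves a hole of 4kB between each vmalloced
137  * area for the same reason. ;)
138  */
139 #define VMALLOC_OFFSET (8*1024*1024)
140 #define VMALLOC_START (((unsigned long) high_memory + 2*VMALLOC_OFFSET-1) & \
141             ~(VMALLOC_OFFSET-1))
142 #define VMALLOC_VMADDR(x) ((unsigned long)(x))
143 #define VMALLOC_END (FIXADDR_START)

The comment in the source (see line 132) explains clearly why an 8MB hole is left, and why a one-page hole is also left every time a virtual memory interval is allocated: it makes possible out-of-bounds accesses easier to catch.

Readers may have a question here. The if statement on line 185 checks that the current start address plus the interval size is less than the start address of the next interval, which is easy to understand. But line 176 has added one more page to the interval size to serve as a hole. Could that hole page not collide with the start address of the next interval? The subtlety is that the condition tested on line 185 is "<" rather than "<=", and both size and addr are aligned to page boundaries, so the condition on line 185 already implies that a one-page hole lies in between. A successful return from get_vm_area() means that the required piece of virtual address space has been allocated, and its start address can be obtained from the returned data structure. What remains is to establish the mapping.

We have already seen the macro VMALLOC_VMADDR above; it does nothing in practice beyond a type conversion. The code of remap_area_pages() is also in arch/i386/mm/ioremap.c:

==================== arch/i386/mm/ioremap.c 62 86 ====================
[ioremap()>__ioremap()>remap_area_pages()]
62 static int remap_area_pages(unsigned long address, unsigned long phys_addr,
63             unsigned long size, unsigned long flags)
64 {
65     pgd_t * dir;
66     unsigned long end = address + size;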
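67
68     phys_addr -= address;
69     dir = pgd_offset(&init_mm, address);
70     flush_cache_all();
71     if (address >= end)
72         BUG();
73     do {
74         pmd_t *pmd;
75         pmd = pmd_alloc_kernel(dir, address);
76         if (!pmd)
77             return -ENOMEM;
78         if (remap_area_pmd(pmd, address, end - address,
79                     phys_addr + address, flags))
80             return -ENOMEM;
81         address = (address + PGDIR_SIZE) & PGDIR_MASK;
82         dir++;
83     } while (address && (address < end));
84     flush_tlb_all();
85     return 0;
86 }

As we have said, the task_struct structure of every process contains a pointer to an mm_struct structure, from which the corresponding page directory can be found. Kernel space, however, does not belong to any particular process, so a separate mm_struct dedicated to the kernel has been set up, called init_mm. Naturally the kernel has no task_struct structure representing it either, so line 69 uses the start address to find the relevant directory entry in init_mm, and the code then walks through all directory entries involved according to the size of the interval. Line 68 looks odd at first. Subtracting the virtual address from the physical address yields a displacement, generally a negative one; on lines 78 and 79 this displacement is added to the virtual address again, which gives back the physical address. Since the virtual address `address` changes inside the loop (see line 81), the physical address changes with it. The pmd_alloc_kernel() on line 75 is, for the i386 CPU, simply pmd_alloc(), as defined in include/asm-i386/pgalloc.h:

==================== include/asm-i386/pgalloc.h 151 151 ====================
151 #define pmd_alloc_kernel pmd_alloc

The inline function pmd_alloc() has two definitions, one for two-level and one for three-level mapping. For two-level mapping the definition is (see include/asm-i386/pgalloc-2level.h):

==================== include/asm-i386/pgalloc-2level.h 16 21 ====================
[ioremap()>__ioremap()>remap_area_pages()>pmd_alloc()]
16 extern inline pmd_t * pmd_alloc(pgd_t *pgd, unsigned long address)
17 {
18     if (!pgd)
19         BUG();
20     return (pmd_t *) pgd;
21 }

As can be seen, for the two-level paging of the i386 the page directory entry is simply treated as the middle directory entry, and nothing is really "allocated" at all. Even on a Pentium CPU with Physical Address Extension (PAE), which does implement three-level mapping, the function merely "finds" the middle directory entry; one is actually allocated only when the middle directory entry is empty.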
In this way the do-while loop starting at line 73 of remap_area_pages() calls remap_area_pmd() for every page directory entry involved. remap_area_pmd() is almost identical: for every page table involved (with the two-level mapping of the i386, each middle directory entry is in effect the page directory entry itself, which can also be understood as the middle directory table having a size of 1) it calls remap_area_pte(), which is also defined in arch/i386/mm/ioremap.c:

==================== arch/i386/mm/ioremap.c 15 37 ====================
[ioremap()>__ioremap()>remap_area_pages()>remap_area_pmd()>remap_area_pte()]
15 static inline void remap_area_pte(pte_t * pte, unsigned long address, unsigned long size,
16     unsigned long phys_addr, unsigned long flags)
17 {
18     unsigned long end;
19
20     address &= ~PMD_MASK;
21     end = address + size;
22     if (end > PMD_SIZE)
23         end = PMD_SIZE;
24     if (address >= end)
25         BUG();
26     do {
27         if (!pte_none(*pte)) {
28             printk("remap_area_pte: page already exists\n");
29             BUG();
30         }
31         set_pte(pte, mk_pte_phys(phys_addr, __pgprot(_PAGE_PRESENT | _PAGE_RW |
32                     _PAGE_DIRTY | _PAGE_ACCESSED | flags)));
33         address += PAGE_SIZE;
34         phys_addr += PAGE_SIZE;
35         pte++;
36     } while (address && (address < end));
37 }

This simply sets, in a loop, every page table entry involved (line 31). Each entry is preset with _PAGE_PRESENT, _PAGE_RW, _PAGE_DIRTY and _PAGE_ACCESSED.

In the scenario where kswapd swaps pages out, we saw that kswapd periodically goes round the queue of task structures, picks the process that occupies the most memory pages, and calls swap_out_mm() on it to swap out some pages. The kernel's mm_struct structure init_mm, however, stands alone; it cannot be reached from the task structure of any process. kswapd therefore never sees the virtual memory intervals in init_mm, and the pages of those intervals are naturally never swapped out and stay resident in memory.

2.12 The brk() System Call

Although its "visibility" is low, brk() is perhaps the most frequently used system call of all; it is how a user process asks the kernel for space. People are often unaware that brk() is being called, because hardly anyone requests space from the system by calling brk() directly; it is almost always reached indirectly through C library functions such as malloc() (or language constructs such as new in C++). If malloc() is thought of as retail, then brk() is wholesale. The library function malloc() maintains a small warehouse for the user process (malloc itself is part of that process); when the process needs more memory it asks the warehouse, and when the warehouse runs short it buys wholesale from the kernel through brk().

As stated earlier, every process owns 3GB of user virtual address space. This does not mean that a user process may use any part of those 3GB at will, because virtual address space becomes usable only once it has been mapped to some physical storage (memory or disk space), and the establishment and management of such mappings is handled by the kernel. Requesting a block of space from the kernel means asking the kernel to allocate a virtual memory interval together with a number of physical pages, and to establish the mapping between them. Since the virtual address space of each process is very large (3GB) while what is actually needed is small, the kernel cannot allocate physical space and establish mappings for the whole virtual address space when the process is created; it can only "allocate" as much as is needed.

How, then, does the kernel manage the 3GB virtual address space of each process? Roughly speaking, the image file produced by compiling and linking a user program contains a code segment and a data segment (comprising the data and bss segments), with the code segment below and the data segment above. The data segment includes all statically allocated data space, that is, global variables and local variables declared static. This space is a basic requirement of the process, so the kernel allocates it, including both the virtual address intervals and the physical pages, and establishes the mapping between the two, when it builds the running image of the process. The space used by the stack is likewise a basic requirement and is also allocated when the process is set up (though it can be extended). The difference is that the stack is placed at the top of the virtual address space and grows downward at run time, while the code and data segments sit at the bottom (take care not to confuse them with the "code segment" and "data segment" established by segment registers in the X86 architecture) and do not extend upward at run time. The region between the top of the data segment, end_data, and the lower edge of the stack segment is one enormous hole, and this is the space that can be allocated dynamically at run time. Initially this dynamic allocation space starts at the process's end_data, an address known to both the kernel and the process. Afterwards, each time a block of "memory" is allocated dynamically, the boundary is pushed upward by some distance, and both the kernel and the process have to record where the current boundary is. On the process side this is managed by malloc() or similar library functions; in the kernel the current boundary is recorded in the process's mm_struct structure. Specifically, the mm_struct structure has a member brk that gives the current boundary of the dynamically allocated area. When a process needs to allocate memory, the requested size is added to the current boundary of its dynamically allocated area, and the sum is the requested new boundary, which is the parameter brk in the call to brk(). When the kernel can satisfy the request, the brk() system call returns 0, after which all virtual addresses between the old and new boundaries may be used. When the kernel finds it cannot satisfy the request (for example because physical space is exhausted), or finds that the new boundary comes too close to the stack at the top, it refuses the allocation and returns -1.

The brk() system call is implemented in the kernel as sys_brk(), whose code is in mm/mmap.c. The function can be used both to allocate space, by pushing the boundary of the dynamically allocated area upward, and to release it, that is, to give space back. Its code accordingly falls roughly into two parts. We read the first part first:

==================== mm/mmap.c 113 141 ====================
[sys_brk()]
113 /*
114  * sys_brk() for the most part doesn't need the global kernel
115  * lock, except when an application is doing something nasty
116  * like trying to un-brk an area that has already been mapped
117  * to a regular file. in this case, the unmapping will need
118  * to invoke file system routines that need the global lock.
119  */
120 asmlinkage unsigned long sys_brk(unsigned long brk)
121 {
122     unsigned long rlim, retval;
123     unsigned long newbrk, oldbrk;
124     struct mm_struct *mm = current->mm;
125
126     down(&mm->mmap_sem);
127
128     if (brk < mm->end_code)
129         goto out;
130     newbrk = PAGE_ALIGN(brk);
131     oldbrk = PAGE_ALIGN(mm->brk);
132     if (oldbrk == newbrk)
133         goto set_brk;
134
135     /* Always allow shrinking brk. */
136     if (brk <= mm->brk) {
137         if (!do_munmap(mm, newbrk, oldbrk-newbrk))
138             goto set_brk;
139         goto out;
140     }
141

The parameter brk is the requested new boundary. It must not be lower than the end of the code segment, and it must be aligned to the page size. If the new boundary is lower than the old one, the request is not for allocating space but for releasing it, so do_munmap() is called to remove the mapping of part of the interval. This is an important function; its code is in mm/mmap.c:

==================== mm/mmap.c 664 696 ====================
[sys_brk()>do_munmap()]
664 /* Munmap is split into 2 main parts -- this part which finds
665  * what needs doing, and the areas themselves, which do the
666  * work. This now handles partial unmappings.
667  * Jeremy Fitzhardine
668  */
669 int do_munmap(struct mm_struct *mm, unsigned long addr, size_t len)
670 {
671     struct vm_area_struct *mpnt, *prev, **npp, *free, *extra;
672
673     if ((addr & ~PAGE_MASK) || addr > TASK_SIZE || len > TASK_SIZE-addr)
674         return -EINVAL;
675
676     if ((len = PAGE_ALIGN(len)) == 0)
677         return -EINVAL;
678
679     /* Check if this memory area is ok - put it on the temporary
680      * list if so.. The checks here are pretty simple --
681      * every area affected in some way (by any overlap) is put
682      * on the list. If nothing is put on, nothing is affected.
683      */
684     mpnt = find_vma_prev(mm, addr, &prev);
685     if (!mpnt)
686         return 0;
687     /* we have addr < mpnt->vm_end */
688
689     if (mpnt->vm_start >= addr+len)
690         return 0;
691
692     /* If we'll make "hole", check the vm areas limit */
693     if ((mpnt->vm_start < addr && mpnt->vm_end > addr+len)
694         && mm->map_count >= MAX_MAP_COUNT)
695         return -ENOMEM;
696

The function find_vma_prev() does essentially the same job as find_vma(), which we read in the section "Some Important Data Structures and Functions": it scans the linked list or AVL tree of vm_area_struct structures of the current process's user space, trying to find the first interval whose end address is higher than addr, and if one is found it returns a pointer to that interval's vm_area_struct structure. The difference is that it also returns, through the parameter prev, a pointer to the structure of the preceding interval. We shall see shortly why this pointer is needed. If the returned pointer is 0, or if the start address of the interval is also higher than addr+len, the part of the space to be unmapped was never mapped in the first place, so the function returns 0 directly. If that part of the space falls in the middle of an interval, removing its mapping leaves a hole and splits the original interval in two. But the number of virtual memory intervals a process may own is limited, so if that number has reached the limit MAX_MAP_COUNT, the operation is no longer permitted.

We read on:

==================== mm/mmap.c 697 717 ====================
[sys_brk()>do_munmap()]
697     /*
698      * We may need one additional vma to fix up the mappings ...
699      * and this is the last chance for an easy error exit.
700      */
701     extra = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
702     if (!extra)
703         return -ENOMEM;
704
705     npp = (prev ? &prev->vm_next : &mm->mmap);
706     free = NULL;
707     spin_lock(&mm->page_table_lock);
708     for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
709         *npp = mpnt->vm_next;
710         mpnt->vm_next = free;
711         free = mpnt;
712         if (mm->mmap_avl)
713             avl_remove(mpnt, &mm->mmap_avl);
714     }
715     mm->mmap_cache = NULL; /* Kill the cache. */
716     spin_unlock(&mm->page_table_lock);
717

Because removing the mapping of part of the space may split an interval in two, a blank vm_area_struct structure, extra, is allocated in advance. On the other hand, the part of the space to be unmapped may span several intervals, so a for loop moves every interval involved onto a temporary queue, free; if an AVL tree has been built, the vm_area_struct structures of these intervals must also be removed from the AVL tree. As explained before, the pointer mmap_cache in the mm_struct structure points to the object of the last find_vma() operation, because operations on virtual memory intervals tend to exhibit continuity (see the code of find_vma()). Now that the structure of the user space has changed, that continuity has most likely been broken, so the pointer is cleared to 0. With that, all preparations are complete, and the actual unmapping follows.

==================== mm/mmap.c 718 762 ====================
[sys_brk()>do_munmap()]
718     /* Ok - we have the memory areas we should free on the 'free' list,
719      * so release them, and unmap the page range..
720      * If the one of the segments is only being partially unmapped,
721      * it will put new vm_area_struct(s) into the address space.
722      * In that case we have to be careful with VM_DENYWRITE.
723      */
724     while ((mpnt = free) != NULL) {
725         unsigned long st, end, size;
726         struct file *file = NULL;
727
728         free = free->vm_next;
729
730         st = addr < mpnt->vm_start ? mpnt->vm_start : addr;
731         end = addr+len;
732         end = end > mpnt->vm_end ? mpnt->vm_end : end;
733         size = end - st;
734
735         if (mpnt->vm_flags & VM_DENYWRITE &&
736             (st != mpnt->vm_start || end != mpnt->vm_end) &&
737             (file = mpnt->vm_file) != NULL) {
738             atomic_dec(&file->f_dentry->d_inode->i_writecount);
739         }
740         remove_shared_vm_struct(mpnt);
741         mm->map_count--;
742
743         flush_cache_range(mm, st, end);
744         zap_page_range(mm, st, size);
745         flush_tlb_range(mm, st, end);
746
747         /*
748          * Fix the mapping, and free the old area if it wasn't reused.
749          */
750         extra = unmap_fixup(mm, mpnt, st, size, extra);
751         if (file)
752             atomic_inc(&file->f_dentry->d_inode->i_writecount);
753     }
754
755     /* Release the extra vma struct if it wasn't used */
756     if (extra)
757         kmem_cache_free(vm_area_cachep, extra);
758
759     free_pgtables(mm, prev, addr, addr+len);
760
761     return 0;
762 }

A while loop processes the intervals involved one by one; their vm_area_struct structures are all linked on the temporary queue free. In the next section readers will see that a process can use the mmap() system call to map the contents of a file into an interval of its user space and then access the file just as it accesses memory. If, however, the same file is also opened by another process and accessed through ordinary file operations, the two different forms of write access to the file must be made mutually exclusive. If only part of such an interval is being unmapped (lines 735 to 737), that amounts to a write operation on the interval, so a counter in the file's inode structure, i_writecount, is decremented to guarantee mutual exclusion and restored once the operation is complete (lines 751 and 752). In addition, remove_shared_vm_struct() checks whether the interval being processed is of this kind, and if so unlinks its vm_area_struct structure from the i_mapping queue in the inode structure of the target file.

The zap_page_range() in this code removes the mapping of a number of contiguous pages and releases the memory pages that were mapped, or the references to physical pages on the swap device. This is what mainly concerns us here. Its code is in mm/memory.c:

==================== mm/memory.c 348 383 ====================
[sys_brk()>do_munmap()>zap_page_range()]
348 /*
349  * remove user pages in a given range.
350  */
351 void zap_page_range(struct mm_struct *mm, unsigned long address, unsigned long size)
352 {
353     pgd_t * dir;
354     unsigned long end = address + size;
355     int freed = 0;
356
357     dir = pgd_offset(mm, address);
358
359     /*
360      * This is a long-lived spinlock. That's fine.
361      * There's no contention, because the page table
362      * lock only protects against kswapd anyway, and
363      * even if kswapd happened to be looking at this
364      * process we _want_ it to get stuck.
365      */
366     if (address >= end)
367         BUG();
368     spin_lock(&mm->page_table_lock);
369     do {
370         freed += zap_pmd_range(mm, dir, address, end - address);
371         address = (address + PGDIR_SIZE) & PGDIR_MASK;
372         dir++;
373     } while (address && (address < end));
374     spin_unlock(&mm->page_table_lock);
375     /*
376      * Update rss for the mm_struct (not necessarily current->mm)
377      * Notice that rss is an unsigned long.
378      */
379     if (mm->rss > freed)
380         mm->rss -= freed;
381     else
382         mm->rss = 0;
383 }

This function removes the page mappings of a virtual memory interval. It first uses pgd_offset() to find, in the first-level page directory, the directory entry to which the start address belongs, and then uses a do-while loop to process all directory entries involved, beginning with that one.

==================== include/asm-i386/pgtable.h 312 312 ====================
312 #define pgd_index(address) ((address >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))

==================== include/asm-i386/pgtable.h 316 316 ====================
316 #define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))

For each directory entry involved, zap_pmd_range() processes the second-level middle directory table.

==================== mm/memory.c 321 346 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()]
321 static inline int zap_pmd_range(struct mm_struct *mm, pgd_t * dir, unsigned long address, unsigned long size)
322 {
323     pmd_t * pmd;
324     unsigned long end;
325     int freed;
326
327     if (pgd_none(*dir))
328         return 0;
329     if (pgd_bad(*dir)) {
330         pgd_ERROR(*dir);
331         pgd_clear(dir);
332         return 0;
333     }
334     pmd = pmd_offset(dir, address);
335     address &= ~PGDIR_MASK;
336     end = address + size;
337     if (end > PGDIR_SIZE)
338         end = PGDIR_SIZE;
339     freed = 0;
340     do {
341         freed += zap_pte_range(mm, pmd, address, end - address);
342         address = (address + PMD_SIZE) & PMD_MASK;
343         pmd++;
344     } while (address < end);
345     return freed;
346 }

Likewise, pmd_offset() is first used to find the starting entry in the second-level directory table. For the i386 architecture with two-level mapping, the middle directory level is empty. pmd_offset() is defined in include/asm-i386/pgtable-2level.h:

==================== include/asm-i386/pgtable-2level.h 53 56 ====================
53 extern inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
54 {
55     return (pmd_t *) dir;
56 }

As can be seen, pmd_offset() returns the pointer to the first-level directory entry unchanged as the pointer to the middle directory entry; in other words, the first-level directory is treated as the middle directory. So for two-level mapping, zap_pmd_range() in a sense merely repeats what zap_page_range() has done. This time, however, what is called repeatedly is zap_pte_range(), which deals with the bottom-level page tables.

==================== mm/memory.c 289 319 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()>zap_pte_range()]
289 static inline int zap_pte_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size)
290 {
291     pte_t * pte;
292     int freed;
293
294     if (pmd_none(*pmd))
295         return 0;
296     if (pmd_bad(*pmd)) {
297         pmd_ERROR(*pmd);
298         pmd_clear(pmd);
299         return 0;
300     }
301     pte = pte_offset(pmd, address);
302     address &= ~PMD_MASK;
303     if (address + size > PMD_SIZE)
304         size = PMD_SIZE - address;
305     size >>= PAGE_SHIFT;
306     freed = 0;
307     for (;;) {
308         pte_t page;
309         if (!size)
310             break;
311         page = ptep_get_and_clear(pte);
312         pte++;
313         size--;
314         if (pte_none(page))
315             continue;
316         freed += free_pte(page);
317     }
318     return freed;
319 }

Again the first step is to find the starting entry in the given page table. The definitions related to pte_offset() are in include/asm-i386/pgtable.h:
==================== include/asm-i386/pgtable.h 324 328 ====================
324 /* Find an entry in the third-level page table.. */
325 #define __pte_offset(address) \
326         ((address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
327 #define pte_offset(dir, address) ((pte_t *) pmd_page(*(dir)) + \
328         __pte_offset(address))

Then, in a for loop, ptep_get_and_clear() is called for each page to be unmapped, clearing its page table entry to 0:

==================== include/asm-i386/pgtable-2level.h 57 57 ====================
57 #define ptep_get_and_clear(xp) __pte(xchg(&(xp)->pte_low, 0))

Finally free_pte() gives up the use of the memory page and of the on-disk page. The code of this function is in mm/memory.c:

==================== mm/memory.c 259 279 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()>zap_pte_range()>free_pte()]
259 /*
260  * Return indicates whether a page was freed so caller can adjust rss
261  */
262 static inline int free_pte(pte_t pte)
263 {
264     if (pte_present(pte)) {
265         struct page *page = pte_page(pte);
266         if ((!VALID_PAGE(page)) || PageReserved(page))
267             return 0;
268         /*
269          * free_page() used to be able to clear swap cache
270          * entries. We may now have to do it manually.
271          */
272         if (pte_dirty(pte) && page->mapping)
273             set_page_dirty(page);
274         free_page_and_swap_cache(page);
275         return 1;
276     }
277     swap_free(pte_to_swp_entry(pte));
278     return 0;
279 }

If the page table entry shows that the page was already absent from memory before the unmapping, the current process's use of the memory page has already ended, so it is only necessary to call swap_free() to give up the use of the "on-disk page" on the swap device. Of course, swap_free() first decrements the use count of the on-disk page, and the on-disk page is really released only when that count reaches 0. If the current process is the last (or only) user of the on-disk page, the count becomes 0 after the decrement. Otherwise, when the page is in memory, free_page_and_swap_cache() is used to give up the use of both the on-disk page and the memory page. In addition, if the page has been written to since the most recent try_to_swap_out(), set_page_dirty() must also be called to set the PG_dirty flag in the page structure of the page and to move it onto the dirty_pages queue of the corresponding address_space structure. The code of free_page_and_swap_cache() is in mm/swap_state.c:

==================== mm/swap_state.c 133 150 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()>zap_pte_range()>free_pte()>free_page_and_swap_cache()]
133 /*
134  * Perform a free_page(), also freeing any swap cache associated with
135  * this page if it is the last user of the page. Can not do a lock_page,
136  * as we are holding the page_table_lock spinlock.
137  */
138 void free_page_and_swap_cache(struct page *page)
139 {
140 	/*
141 	 * If we are the only user, then try to free up the swap cache.
142 	 */
143 	if (PageSwapCache(page) && !TryLockPage(page)) {
144 		if (!is_page_shared(page)) {
145 			delete_from_swap_cache_nolock(page);
146 		}
147 		UnlockPage(page);
148 	}
149 	page_cache_release(page);
150 }

As explained earlier, a swappable memory page that is mapped into user space (more precisely, its page data structure) sits on three queues at once. First, through its queue head list it is linked into one of the swap-in/swap-out queues, that is, one of the three queues clean_pages, dirty_pages and locked_pages in the corresponding address_space structure. Second, through its queue head lru it is linked into one of the LRU queues: active_list, inactive_dirty_list or one of the inactive_clean_list queues. Third, through the pointer next_hash it is linked into a hash queue. While a page is on a swap-in/swap-out queue, the PG_swap_cache flag in its page structure is 1. If the current process is the last (or the only) user of the page, delete_from_swap_cache_nolock() is now called to take the page off these queues.

==================== mm/swap_state.c 103 120 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()>zap_pte_range()>free_pte()>free_page_and_swap_cache()>delete_from_swap_cache_nolock()]
103 /*
104  * This will never put the page into the free list, the caller has
105  * a reference on the page.
106  */
107 void delete_from_swap_cache_nolock(struct page *page)
108 {
109 	if (!PageLocked(page))
110 		BUG();
111 
112 	if (block_flushpage(page, 0))
113 		lru_cache_del(page);
114 
115 	spin_lock(&pagecache_lock);
116 	ClearPageDirty(page);
117 	__delete_from_swap_cache(page);
118 	spin_unlock(&pagecache_lock);
119 	page_cache_release(page);
120 }

First, block_flushpage() "flushes" the contents of the page to the block device. In practice this flush is performed only when the page comes from a file mapped into user space, because for a page on the swap device the contents no longer matter at this point. After the flush, lru_cache_del() takes the page off the LRU queue it is on. Then __delete_from_swap_cache() takes the page off the other two queues.

==================== mm/swap_state.c 86 101 ====================
[sys_brk()>do_munmap()>zap_page_range()>zap_pmd_range()>zap_pte_range()>free_pte()>free_page_and_swap_cache()>delete_from_swap_cache_nolock()>__delete_from_swap_cache()]
86 /*
87  * This must be called only on pages that have
88  * been verified to be in the swap cache.
89  */
90 void __delete_from_swap_cache(struct page *page)
91 {
92 	swp_entry_t entry;
93 
94 	entry.val = page->index;
95 
96 #ifdef SWAP_CACHE_INFO
97 	swap_cache_del_total++;
98 #endif
99 	remove_from_swap_cache(page);
100 	swap_free(entry);
101 }

Here remove_from_swap_cache() takes the page's page structure off the swap-in/swap-out queue and the hash queue. Then the on-disk page is released, again through swap_free(), and control returns to delete_from_swap_cache_nolock(). The last step there is page_cache_release(), which decrements the use count in the page structure. Because the current process is the last user of the page, and the page was in memory before the unmapping (see line 264 of free_pte() above), the use count of the page should be 2. The call to page_cache_release() here (line 119) brings it down to 1. After the return to free_page_and_swap_cache(), page_cache_release() is called once more (line 149), which brings the count to 0, so the page is finally freed and goes back to the free page queue.

By the time control returns to do_munmap(), the operation on one virtual memory region is complete. At this point the region's vm_area_struct data structure and the process's mm_struct data structure must be adjusted to reflect the change. If the whole region has been unmapped, the original vm_area_struct is freed. On the other hand, the original region may have to be split in two, in which case a new vm_area_struct has to be inserted. These operations are done by unmap_fixup(), whose code is in mm/mmap.c:

==================== mm/mmap.c 516 604 ====================
[sys_brk()>do_munmap()>unmap_fixup()]
516 /* Normal function to fix up a mapping
517  * This function is the default for when an area has no specific
518  * function. This may be used as part of a more specific routine.
519  * This function works out what part of an area is affected and
520  * adjusts the mapping information. Since the actual page
521  * manipulation is done in do_mmap(), none need be done here,
522  * though it would probably be more appropriate.
523  *
524  * By the time this function is called, the area struct has been
525  * removed from the process mapping list, so it needs to be
526  * reinserted if necessary.
527  *
528  * The 4 main cases are:
529  * Unmapping the whole area
530  * Unmapping from the start of the segment to a point in it
531  * Unmapping from an intermediate point to the end
532  * Unmapping between to intermediate points, making a hole.
533  *
534  * Case 4 involves the creation of 2 new areas, for each side of
535  * the hole. If possible, we reuse the existing area rather than
536  * allocate a new one, and the return indicates whether the old
537  * area was reused.
538  */
539 static struct vm_area_struct * unmap_fixup(struct mm_struct *mm,
540 	struct vm_area_struct *area, unsigned long addr, size_t len,
541 	struct vm_area_struct *extra)
542 {
543 	struct vm_area_struct *mpnt;
544 	unsigned long end = addr + len;
545 
546 	area->vm_mm->total_vm -= len >> PAGE_SHIFT;
547 	if (area->vm_flags & VM_LOCKED)
548 		area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
549 
550 	/* Unmapping the whole area. */
551 	if (addr == area->vm_start && end == area->vm_end) {
552 		if (area->vm_ops && area->vm_ops->close)
553 			area->vm_ops->close(area);
554 		if (area->vm_file)
555 			fput(area->vm_file);
556 		kmem_cache_free(vm_area_cachep, area);
557 		return extra;
558 	}
559 
560 	/* Work out to one of the ends. */
561 	if (end == area->vm_end) {
562 		area->vm_end = addr;
563 		lock_vma_mappings(area);
564 		spin_lock(&mm->page_table_lock);
565 	} else if (addr == area->vm_start) {
566 		area->vm_pgoff += (end - area->vm_start) >> PAGE_SHIFT;
567 		area->vm_start = end;
568 		lock_vma_mappings(area);
569 		spin_lock(&mm->page_table_lock);
570 	} else {
571 	/* Unmapping a hole: area->vm_start < addr <= end < area->vm_end */
572 		/* Add end mapping -- leave beginning for below */
573 		mpnt = extra;
574 		extra = NULL;
575 
576 		mpnt->vm_mm = area->vm_mm;
577 		mpnt->vm_start = end;
578 		mpnt->vm_end = area->vm_end;
579 		mpnt->vm_page_prot = area->vm_page_prot;
580 		mpnt->vm_flags = area->vm_flags;
581 		mpnt->vm_raend = 0;
582 		mpnt->vm_ops = area->vm_ops;
583 		mpnt->vm_pgoff = area->vm_pgoff + ((end - area->vm_start) >> PAGE_SHIFT);
584 		mpnt->vm_file = area->vm_file;
585 		mpnt->vm_private_data = area->vm_private_data;
586 		if (mpnt->vm_file)
587 			get_file(mpnt->vm_file);
588 		if (mpnt->vm_ops && mpnt->vm_ops->open)
589 			mpnt->vm_ops->open(mpnt);
590 		area->vm_end = addr;	/* Truncate area */
591 
592 		/* Because mpnt->vm_file == area->vm_file this locks
593 		 * things correctly.
594 		 */
595 		lock_vma_mappings(area);
596 		spin_lock(&mm->page_table_lock);
597 		__insert_vm_struct(mm, mpnt);
598 	}
599 
600 	__insert_vm_struct(mm, area);
601 	spin_unlock(&mm->page_table_lock);
602 	unlock_vma_mappings(area);
603 	return extra;
604 }

We leave this code to the reader. Finally, when the loop ends, some pages have been unmapped, and some page tables may now be entirely empty. Such page tables (that is, the pages they occupy) must be freed as well. This is done by free_pgtables(). We also leave its code to the reader (mm/mmap.c).

==================== mm/mmap.c 606 662 ====================
[sys_brk()>do_munmap()>free_pgtables()]
606 /*
607  * Try to free as many page directory entries as we can,
608  * without having to work very hard at actually scanning
609  * the page tables themselves.
610  *
611  * Right now we try to free page tables if we have a nice
612  * PGDIR-aligned area that got free'd up. We could be more
613  * granular if we want to, but this is fast and simple,
614  * and covers the bad cases.
615  *
616  * "prev", if it exists, points to a vma before the one
617  * we just free'd - but there's no telling how much before.
618  */
619 static void free_pgtables(struct mm_struct * mm, struct vm_area_struct *prev,
620 	unsigned long start, unsigned long end)
621 {
622 	unsigned long first = start & PGDIR_MASK;
623 	unsigned long last = end + PGDIR_SIZE - 1;
624 	unsigned long start_index, end_index;
625 
626 	if (!prev) {
627 		prev = mm->mmap;
628 		if (!prev)
629 			goto no_mmaps;
630 		if (prev->vm_end > start) {
631 			if (last > prev->vm_start)
632 				last = prev->vm_start;
633 			goto no_mmaps;
634 		}
635 	}
636 	for (;;) {
637 		struct vm_area_struct *next = prev->vm_next;
638 
639 		if (next) {
640 			if (next->vm_start < start) {
641 				prev = next;
642 				continue;
643 			}
644 			if (last > next->vm_start)
645 				last = next->vm_start;
646 		}
647 		if (prev->vm_end > first)
648 			first = prev->vm_end + PGDIR_SIZE - 1;
649 		break;
650 	}
651 no_mmaps:
652 	/*
653 	 * If the PGD bits are not consecutive in the virtual address, the
654 	 * old method of shifting the VA >> by PGDIR_SHIFT doesn't work.
655 	 */
656 	start_index = pgd_index(first);
657 	end_index = pgd_index(last);
658 	if (end_index > start_index) {
659 		clear_page_tables(mm, start_index, end_index - start_index);
660 		flush_tlb_pgtables(mm, first & PGDIR_MASK, last & PGDIR_MASK);
661 	}
662 }

Back in the code of sys_brk(), we have now finished the scenario of releasing space through sys_brk().
If the new boundary is higher than the old one, space is to be allocated. This is the second half of sys_brk(). We continue reading (mm/mmap.c):

==================== mm/mmap.c 142 164 ====================
[sys_brk()]
142 	/* Check against rlimit.. */
143 	rlim = current->rlim[RLIMIT_DATA].rlim_cur;
144 	if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)
145 		goto out;
146 
147 	/* Check against existing mmap mappings. */
148 	if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
149 		goto out;
150 
151 	/* Check if we have enough memory.. */
152 	if (!vm_enough_memory((newbrk-oldbrk) >> PAGE_SHIFT))
153 		goto out;
154 
155 	/* Ok, looks good - let it rip. */
156 	if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
157 		goto out;
158 set_brk:
159 	mm->brk = brk;
160 out:
161 	retval = mm->brk;
162 	up(&mm->mmap_sem);
163 	return retval;
164 }

First the resource limits of the process are checked. If the requested new boundary would make the data segment larger than the limit set for the current process, the request is refused. Next, find_vma_intersection() checks whether the requested space conflicts with an existing region. The code of this inline function is in include/linux/mm.h:

==================== include/linux/mm.h 511 520 ====================
[sys_brk()>find_vma_intersection()]
511 /* Look up the first VMA which intersects the interval start_addr..end_addr-1,
512    NULL if none.  Assume start_addr < end_addr. */
513 static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * mm, unsigned long start_addr, unsigned long end_addr)
514 {
515 	struct vm_area_struct * vma = find_vma(mm,start_addr);
516 
517 	if (vma && end_addr <= vma->vm_start)
518 		vma = NULL;
519 	return vma;
520 }

Here start_addr is the old boundary. If find_vma() returns a non-zero pointer, there is already a mapped region above it, so a conflict is possible. In that case the new boundary end_addr must lie below the start of that region, so that the space from start_addr to end_addr falls inside a hole; otherwise there is a conflict. Once it is established that no conflict exists, vm_enough_memory() checks whether the system has enough free memory pages.

==================== mm/mmap.c 41 67 ====================
[sys_brk()>vm_enough_memory()]
41 /* Check that a process has enough memory to allocate a
42  * new virtual mapping.
43  */
44 int vm_enough_memory(long pages)
45 {
46 	/* Stupid algorithm to decide if we have enough memory: while
47 	 * simple, it hopefully works in most obvious cases.. Easy to
48 	 * fool it, but this should catch most mistakes.
49 	 */
50 	/* 23/11/98 NJC: Somewhat less stupid version of algorithm,
51 	 * which tries to do "TheRightThing".  Instead of using half of
52 	 * (buffers+cache), use the minimum values.  Allow an extra 2%
53 	 * of num_physpages for safety margin.
54 	 */
55 
56 	long free;
57 
58 	/* Sometimes we want to use more memory than we have. */
59 	if (sysctl_overcommit_memory)
60 		return 1;
61 
62 	free = atomic_read(&buffermem_pages);
63 	free += atomic_read(&page_cache_size);
64 	free += nr_free_pages();
65 	free += nr_swap_pages;
66 	return free > pages;
67 }

After these checks comes the main body of the operation, do_brk(). The code of this function is in mm/mmap.c:

==================== mm/mmap.c 775 861 ====================
[sys_brk()>do_brk()]
775 /*
776  *  this is really a simplified "do_mmap".  it only handles
777  *  anonymous maps.  eventually we may be able to do some
778  *  brk-specific accounting here.
779  */
780 unsigned long do_brk(unsigned long addr, unsigned long len)
781 {
782 	struct mm_struct * mm = current->mm;
783 	struct vm_area_struct * vma;
784 	unsigned long flags, retval;
785 
786 	len = PAGE_ALIGN(len);
787 	if (!len)
788 		return addr;
789 
790 	/*
791 	 * mlock MCL_FUTURE?
792 	 */
793 	if (mm->def_flags & VM_LOCKED) {
794 		unsigned long locked = mm->locked_vm << PAGE_SHIFT;
795 		locked += len;
796 		if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
797 			return -EAGAIN;
798 	}
799 
800 	/*
801 	 * Clear old maps.  this also does some error checking for us
802 	 */
803 	retval = do_munmap(mm, addr, len);
804 	if (retval != 0)
805 		return retval;
806 
807 	/* Check against address space limits *after* clearing old maps... */
808 	if ((mm->total_vm << PAGE_SHIFT) + len
809 	    > current->rlim[RLIMIT_AS].rlim_cur)
810 		return -ENOMEM;
811 
812 	if (mm->map_count > MAX_MAP_COUNT)
813 		return -ENOMEM;
814 
815 	if (!vm_enough_memory(len >> PAGE_SHIFT))
816 		return -ENOMEM;
817 
818 	flags = vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC,
819 				MAP_FIXED|MAP_PRIVATE) | mm->def_flags;
820 
821 	flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
822 
823 
824 	/* Can we just expand an old anonymous mapping? */
825 	if (addr) {
826 		struct vm_area_struct * vma = find_vma(mm, addr-1);
827 		if (vma && vma->vm_end == addr && !vma->vm_file &&
828 		    vma->vm_flags == flags) {
829 			vma->vm_end = addr + len;
830 			goto out;
831 		}
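832 	}
833 
834 
835 	/*
836 	 * create a vma struct for an anonymous mapping
837 	 */
838 	vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
839 	if (!vma)
840 		return -ENOMEM;
841 
842 	vma->vm_mm = mm;
843 	vma->vm_start = addr;
844 	vma->vm_end = addr + len;
845 	vma->vm_flags = flags;
846 	vma->vm_page_prot = protection_map[flags & 0x0f];
847 	vma->vm_ops = NULL;
848 	vma->vm_pgoff = 0;
849 	vma->vm_file = NULL;
850 	vma->vm_private_data = NULL;
851 
852 	insert_vm_struct(mm, vma);
853 
854 out:
855 	mm->total_vm += len >> PAGE_SHIFT;
856 	if (flags & VM_LOCKED) {
857 		mm->locked_vm += len >> PAGE_SHIFT;
858 		make_pages_present(addr, addr + len);
859 	}
860 	return addr;
861 }

The parameter addr is the start of the new region to be mapped, and len is its length. We have already seen how find_vma_intersection() checks for conflicts, but the reader may not have noticed that only the high end of the new region is actually checked; a conflict at the low end is not checked at all. For example, is the old boundary exactly the end of a mapped region? If it is not, there is a conflict at the low end. A conflict at the low end, however, is allowed, and it is resolved in favor of the new mapping: the existing mapping is first removed through do_munmap() (see line 803), and then the new mapping is set up. The reader will probably ask why the high end and the low end of the new region are treated with such different tolerance. It is best to think about this for a moment before reading on.

As mentioned before, the top of user space holds the process's user-space stack. Whatever the process, a mapped region always exists there, so the find_vma() inside find_vma_intersection() never actually returns 0, because at least the region used for the stack is always present. Of course, there may also be regions below the stack that were set up through mmap() or ioremap(). So a conflict at the high end of the new region may be a conflict with the stack, whereas a conflict at the low end can only be a conflict with the data segment. At the low end, therefore, the process can be left responsible for its own possible mistakes, but for the stack it is not acceptable to remove the existing mapping and set up a new one.

When the new mapping is set up, the code first checks whether it can be merged with an existing region, that is, whether the new space can be covered by extending an existing region (lines 826 to 831). If not, a separate region has to be created (lines 838 to 852).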
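The high-end conflict check and the merge test on lines 826 to 831 can be tried out in user space with a small model. The sketch below is only an illustration under stated assumptions: struct vma and the model_* functions are hypothetical stand-ins, not kernel code; regions are kept in a plain sorted list; and the kernel's extra !vma->vm_file test is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's vm_area_struct. */
struct vma {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* first address past the region */
    unsigned long flags;   /* attribute bits; must match for a merge */
    struct vma *next;      /* regions are kept sorted by address */
};

/* First region whose end lies above addr, or NULL (same rule as find_vma()). */
static struct vma *model_find_vma(struct vma *head, unsigned long addr)
{
    struct vma *v;

    for (v = head; v != NULL; v = v->next)
        if (addr < v->end)
            return v;
    return NULL;
}

/* Region that overlaps [start, end), or NULL if the range lies in a hole
 * (same rule as find_vma_intersection()). */
static struct vma *model_find_intersection(struct vma *head,
                                           unsigned long start,
                                           unsigned long end)
{
    struct vma *v = model_find_vma(head, start);

    if (v != NULL && end <= v->start)
        v = NULL;
    return v;
}

/* Returns 1 if [addr, addr + len) was absorbed by growing the region that
 * ends exactly at addr and carries identical flags; returns 0 when a
 * separate region would have to be created instead. */
static int model_try_expand(struct vma *head, unsigned long addr,
                            unsigned long len, unsigned long flags)
{
    struct vma *v;

    if (addr == 0)
        return 0;
    v = model_find_vma(head, addr - 1);
    if (v != NULL && v->end == addr && v->flags == flags) {
        v->end = addr + len;
        return 1;
    }
    return 0;
}
```

With a data region directly below the old boundary and a stack region near the top, a small extension is merged into the data region, while a request that reaches into the stack region is reported as a conflict.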
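Finally, make_pages_present() sets up the mapping to memory pages for the newly added region. Its code is in mm/memory.c:

==================== mm/memory.c 1210 1229 ====================
[sys_brk()>do_brk()>make_pages_present()]
1210 /*
1211  * Simplistic page force-in..
1212  */
1213 int make_pages_present(unsigned long addr, unsigned long end)
1214 {
1215 	int write;
1216 	struct mm_struct *mm = current->mm;
1217 	struct vm_area_struct * vma;
1218 
1219 	vma = find_vma(mm, addr);
1220 	write = (vma->vm_flags & VM_WRITE) != 0;
1221 	if (addr >= end)
1222 		BUG();
1223 	do {
1224 		if (handle_mm_fault(mm, vma, addr, write) < 0)
1225 			return -1;
1226 		addr += PAGE_SIZE;
1227 	} while (addr < end);
1228 	return 0;
1229 }

The method used here is interesting: a page fault is simulated for every page in the new region. The reader may want to consider what the mapping in these page table entries looks like when control returns from do_brk() and then from sys_brk(). If the process reads from the newly allocated region, what should it read? And what happens if it writes there?

2.13 The mmap() System Call

A process can map the contents of an open file into its user space through the system call mmap(). Its user interface is:

mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset)

The parameter fd represents an open file, offset is the starting point within the file, start is the starting address of the mapping in user space, and length is the length. There are two more parameters, prot and flags. The former gives the access mode of the mapped region, such as writable or executable; the latter serves other control purposes. From the application programmer's point of view, mapping a file into user space and accessing it like memory is clearly much more convenient than regular file operations such as read(), write() and lseek() (consider, for example, access to a database file).

Before reading this section, the reader should first look at the code and explanation of sys_brk() in the previous section, and should compare this section with sys_brk() while reading. Some of the material may only become clear after the later chapters have been read, "File Systems" in particular, and this section is then read again.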
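Seen from an application, the interface above can be exercised with a few lines of C. The following is a minimal user-space sketch, not taken from the book: the helper read_via_mmap() and the scratch file name are made up for illustration, and the example assumes a POSIX system on which /tmp is writable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg to a temporary file, maps the file read-only, and copies the
 * mapped bytes into out, which must hold strlen(msg) + 1 bytes.
 * Returns 0 on success and -1 on any failure. */
static int read_via_mmap(const char *msg, char *out)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical scratch file */
    size_t len = strlen(msg);
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                 /* the file disappears once fd is closed */

    if (len > 0 && write(fd, msg, len) == (ssize_t)len) {
        /* start == NULL lets the kernel choose the address (no MAP_FIXED) */
        void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

        if (p != MAP_FAILED) {
            memcpy(out, p, len);  /* file contents read like ordinary memory */
            out[len] = '\0';
            munmap(p, len);
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

Once the mapping exists, no read() or lseek() call is needed; the first access to the mapped page is what brings the file contents into memory.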
In the 2.4.0 kernel the function that implements this call is sys_mmap2(), but older versions have another function, old_mmap(). The two functions correspond to different system call numbers. To stay compatible with older versions, 2.4.0 keeps the old system call number and old_mmap(), and the C library of each version decides which system call number to use. The code of both is in arch/i386/kernel/sys_i386.c:

==================== arch/i386/kernel/sys_i386.c 68 73 ====================
68 asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
69 	unsigned long prot, unsigned long flags,
70 	unsigned long fd, unsigned long pgoff)
71 {
72 	return do_mmap2(addr, len, prot, flags, fd, pgoff);
73 }

==================== arch/i386/kernel/sys_i386.c 91 106 ====================
91 asmlinkage int old_mmap(struct mmap_arg_struct *arg)
92 {
93 	struct mmap_arg_struct a;
94 	int err = -EFAULT;
95 
96 	if (copy_from_user(&a, arg, sizeof(a)))
97 		goto out;
98 
99 	err = -EINVAL;
100 	if (a.offset & ~PAGE_MASK)
101 		goto out;
102 
103 	err = do_mmap2(a.addr, a.len, a.prot, a.flags, a.fd, a.offset >> PAGE_SHIFT);
104 out:
105 	return err;
106 }

As can be seen, the two differ only in how the parameters are passed. The main body of both is do_mmap2(), whose code is in the same file:

==================== arch/i386/kernel/sys_i386.c 42 66 ====================
[sys_mmap2()>do_mmap2()]
42 /* common code for old and new mmaps */
43 static inline long do_mmap2(
44 	unsigned long addr, unsigned long len,
45 	unsigned long prot, unsigned long flags,
46 	unsigned long fd, unsigned long pgoff)
47 {
48 	int error = -EBADF;
49 	struct file * file = NULL;
50 
51 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
52 	if (!(flags & MAP_ANONYMOUS)) {
53 		file = fget(fd);
54 		if (!file)
55 			goto out;
56 	}
57 
58 	down(&current->mm->mmap_sem);
59 	error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
60 	up(&current->mm->mmap_sem);
61 
62 	if (file)
63 		fput(file);
64 out:
65 	return error;
66 }

In general, the system call mmap() maps an open file into user space. There is one exception: the flag MAP_ANONYMOUS can be set to 1 in the parameter flags to indicate that there is no file, in which case the call is really only used to "stake out ground", that is, to allocate space at the specified position. Apart from this, the main body of the operation is do_mmap_pgoff().

The kernel also has an inline function do_mmap() for its own use. It, too, maps an open file into the user space of the current process. Later, when reading the code of the system call sys_execve(), we will see in the function load_aout_binary() how do_mmap() maps an executable program (binary code) into the user space of the current process. In addition, do_mmap() is used to create "shared memory areas" as a means of interprocess communication. This inline function is defined in include/linux/mm.h:

==================== include/linux/mm.h 428 439 ====================
428 static inline unsigned long do_mmap(struct file *file, unsigned long addr,
429 	unsigned long len, unsigned long prot,
430 	unsigned long flag, unsigned long offset)
431 {
432 	unsigned long ret = -EINVAL;
433 	if ((offset + PAGE_ALIGN(len)) < offset)
434 		goto out;
435 	if (!(offset & ~PAGE_MASK))
436 		ret = do_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
437 out:
438 	return ret;
439 }

A comparison with do_mmap2() shows that the two are essentially the same; both complete the operation through do_mmap_pgoff(). The differences are only that do_mmap() does not support MAP_ANONYMOUS, and that, because the caller is already inside the critical section before entering do_mmap(), protection through the semaphore operations down() and up() is no longer needed.

The code of do_mmap_pgoff() is in mm/mmap.c:

==================== mm/mmap.c 188 249 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()]
188 unsigned long do_mmap_pgoff(struct file * file, unsigned long addr, unsigned long len,
189 	unsigned long prot, unsigned long flags, unsigned long pgoff)
190 {
191 	struct mm_struct * mm = current->mm;
192 	struct vm_area_struct * vma;
193 	int correct_wcount = 0;
194 	int error;
195 
196 	if (file && (!file->f_op || !file->f_op->mmap))
197 		return -ENODEV;
198 
199 	if ((len = PAGE_ALIGN(len)) == 0)
200 		return addr;
201 
202 	if (len > TASK_SIZE || addr > TASK_SIZE-len)
203 		return -EINVAL;
204 
205 	/* offset overflow? */
206 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
207 		return -EINVAL;
208 
209 	/* Too many mappings? */
210 	if (mm->map_count > MAX_MAP_COUNT)
211 		return -ENOMEM;
212 
213 	/* mlock MCL_FUTURE? */
214 	if (mm->def_flags & VM_LOCKED) {
215 		unsigned long locked = mm->locked_vm << PAGE_SHIFT;
216 		locked += len;
217 		if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
218 			return -EAGAIN;
219 	}
220 
221 	/* Do simple checking here so the lower-level routines won't have
222 	 * to. we assume access permissions have been handled by the open
223 	 * of the memory object, so we don't do any here.
224 	 */
225 	if (file != NULL) {
226 		switch (flags & MAP_TYPE) {
227 		case MAP_SHARED:
228 			if ((prot & PROT_WRITE) && !(file->f_mode & FMODE_WRITE))
229 				return -EACCES;
230 
231 			/* Make sure we don't allow writing to an append-only file.. */
232 			if (IS_APPEND(file->f_dentry->d_inode) && (file->f_mode & FMODE_WRITE))
233 				return -EACCES;
234 
235 			/* make sure there are no mandatory locks on the file. */
236 			if (locks_verify_locked(file->f_dentry->d_inode))
237 				return -EAGAIN;
238 
239 			/* fall through */
240 		case MAP_PRIVATE:
241 			if (!(file->f_mode & FMODE_READ))
242 				return -EACCES;
243 			break;
244 
245 		default:
246 			return -EINVAL;
247 		}
248 	}
249 

First, some checks are made on both the file and the region, including the starting address and length, the number of existing mappings, and so on. A non-zero pointer file means that a concrete file is being mapped (rather than MAP_ANONYMOUS), so the pointer f_op in the corresponding file structure must point to a file_operations data structure, and the function pointer mmap in that structure must in turn point to the mmap operation provided by the specific file system (see Chapter 5, "File Systems"). In a sense, do_mmap() and do_mmap2() provide only a high-level framework; the low-level file operations are provided by the specific file system.

In addition, the access permissions of the file and of the region are checked; the two must be consistent. The reader can come back and study this code closely after reading Chapter 5. Here we continue reading:

==================== mm/mmap.c 250 261 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()]
250 	/* Obtain the address to map to. we verify (or select) it and ensure
251 	 * that it represents a valid section of the address space.
252 	 */
253 	if (flags & MAP_FIXED) {
254 		if (addr & ~PAGE_MASK)
255 			return -EINVAL;
256 	} else {
257 		addr = get_unmapped_area(addr, len);
258 		if (!addr)
259 			return -ENOMEM;
260 	}
261 

The parameters of do_mmap_pgoff() are essentially those of the system call mmap(). If the flag MAP_FIXED in the parameter flags is 0, the specified mapping address is only a hint, and the kernel may allocate a different one if the hint cannot be honored. So get_unmapped_area() is called to allocate a starting address in the user space of the current process. Its code is in mm/mmap.c:

==================== mm/mmap.c 374 398 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()>get_unmapped_area()]
374 /* Get an address range which is currently unmapped.
375  * For mmap() without MAP_FIXED and shmat() with addr=0.
376  * Return value 0 means ENOMEM.
377  */
378 #ifndef HAVE_ARCH_UNMAPPED_AREA
379 unsigned long get_unmapped_area(unsigned long addr, unsigned long len)
380 {
381 	struct vm_area_struct * vmm;
382 
383 	if (len > TASK_SIZE)
384 		return 0;
385 	if (!addr)
386 		addr = TASK_UNMAPPED_BASE;
387 	addr = PAGE_ALIGN(addr);
388 
389 	for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
390 		/* At this point:  (!vmm || addr < vmm->vm_end). */
391 		if (TASK_SIZE - len < addr)
392 			return 0;
393 		if (!vmm || addr + len <= vmm->vm_start)
394 			return addr;
395 		addr = vmm->vm_end;
396 	}
397 }
398 #endif

The reader should have no difficulty reading this code. The constant TASK_UNMAPPED_BASE is defined in include/asm-i386/processor.h:

==================== include/asm-i386/processor.h 263 266 ====================
263 /* This decides where the kernel will search for a free chunk of vm
264  * space during mmap's.
265  */
266 #define TASK_UNMAPPED_BASE	(TASK_SIZE / 3)

In other words, when the given target address is 0, the kernel searches upward in the virtual space of the current process from (TASK_SIZE/3), that is from 1GB, for a range large enough to hold the given length; when the given target address is not 0, the search starts upward from the given address. The function find_vma() finds, among the mapped regions of the current process, the first one whose vma->vm_end is greater than the given address. If no such region is found, the given address is not yet mapped and can therefore be used.

At this point, as long as the returned address is non-zero, addr is a virtual address that meets all the requirements. We return to do_mmap_pgoff() and continue reading (mm/mmap.c):

==================== mm/mmap.c 262 322 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()]
262 	/* Determine the object being mapped and call the appropriate
263 	 * specific mapper. the address has already been validated, but
264 	 * not unmapped, but the maps are removed from the list.
265 	 */
266 	vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
267 	if (!vma)
268 		return -ENOMEM;
269 
270 	vma->vm_mm = mm;
271 	vma->vm_start = addr;
272 	vma->vm_end = addr + len;
273 	vma->vm_flags = vm_flags(prot,flags) | mm->def_flags;
274 
275 	if (file) {
276 		VM_ClearReadHint(vma);
277 		vma->vm_raend = 0;
278 
279 		if (file->f_mode & FMODE_READ)
280 			vma->vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
281 		if (flags & MAP_SHARED) {
282 			vma->vm_flags |= VM_SHARED | VM_MAYSHARE;
283 
284 			/* This looks strange, but when we don't have the file open
285 			 * for writing, we can demote the shared mapping to a simpler
286 			 * private mapping. That also takes care of a security hole
287 			 * with ptrace() writing to a shared mapping without write
288 			 * permissions.
289 			 *
290 			 * We leave the VM_MAYSHARE bit on, just to get correct output
291 			 * from /proc/xxx/maps..
292 			 */
293 			if (!(file->f_mode & FMODE_WRITE))
294 				vma->vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
295 		}
296 	} else {
297 		vma->vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
298 		if (flags & MAP_SHARED)
299 			vma->vm_flags |= VM_SHARED | VM_MAYSHARE;
300 	}
301 	vma->vm_page_prot = protection_map[vma->vm_flags & 0x0f];
302 	vma->vm_ops = NULL;
303 	vma->vm_pgoff = pgoff;
304 	vma->vm_file = NULL;
305 	vma->vm_private_data = NULL;
306 
307 	/* Clear old maps */
308 	error = -ENOMEM;
309 	if (do_munmap(mm, addr, len))
310 		goto free_vma;
311 
312 	/* Check against address space limit. */
313 	if ((mm->total_vm << PAGE_SHIFT) + len
314 	    > current->rlim[RLIMIT_AS].rlim_cur)
315 		goto free_vma;
316 
317 	/* Private writable mapping? Check memory availability.. */
318 	if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) == VM_WRITE &&
319 	    !(flags & MAP_NORESERVE) &&
320 	    !vm_enough_memory(len >> PAGE_SHIFT))
321 		goto free_vma;
322 
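Every logical region needs a vm_area_struct data structure, so one is allocated for the region to be mapped through kmem_cache_alloc() and then filled in. It is worth comparing this with the code of do_brk() in the previous section: there a vm_area_struct was allocated only when the new space could not be merged with an existing region, whereas here the allocation is unconditional. As mentioned earlier, segments with different attributes cannot coexist in one logical region, and being mapped to a particular file is itself an attribute, so a separate logical region is always created for such a mapping.

If the file structure pointer is 0 when do_mmap_pgoff() is called, the purpose is only to create a virtual memory region, or in other words only to set up a mapping from physical space to the virtual region. If the purpose is to set up a mapping from a file to a virtual region, the access permissions set for the file must be taken into account (see lines 275 to 296).

Note that line 303 stores the parameter pgoff in the vm_pgoff field of the vm_area_struct. This parameter gives the starting point of the mapped contents within the file. With this starting point, the position of a page within the file can be computed from its virtual address when a page fault occurs. So when a mapping is broken, a file-mapped page does not need to record its whereabouts in the page table entry the way an ordinary swappable page does. This also explains why such a region has to be a separate one.

At this point the data structure representing the virtual region we need has been created, but it has not yet been inserted into the mm_struct structure that represents the virtual space of the current process. Under certain conditions, however, it still has to be discarded. Why? The function do_munmap() is called here. It checks whether the target address is already in use in the virtual space of the current process, and if so, removes the old mapping. If this operation fails, the same target address obviously cannot be mapped again, so control goes to free_vma, where the vm_area_struct that was allocated is released. We read the code of do_munmap() in the previous section. The reader may find this strange: was this range not found by the earlier call to get_unmapped_area()? How could it already be mapped? A look back shows that this holds only when the flag MAP_FIXED in the parameter flags is 0; when the flag is 1, nothing has been checked yet. Besides this, two other situations lead to the release of the allocated vm_area_struct: one is that the current process's use of virtual space exceeds the limit set for it; the other is that a private writable region is requested while the number of physical pages is (temporarily) insufficient.

The reader may also ask why the checks of all these conditions are not made before the vm_area_struct is allocated. The point is that while a vm_area_struct is being allocated through kmem_cache_alloc(), the slab dedicated to this data structure may turn out to be used up, so that more physical pages have to be allocated. The allocation of physical pages, in turn, may not be satisfiable at once, in which case other processes have to be scheduled to run first. Since other processes or threads, especially threads created by this process through clone() (see Chapter 4), may have run in the meantime, it cannot be ruled out that the conditions have changed. This is why the kernel often allocates a resource first, then checks the conditions, and releases the resource if the conditions are not met, rather than checking first and allocating afterwards. The key questions are whether scheduling can occur while the resource is being allocated, and whether the execution of other processes or threads can change the conditions. Taking the third condition here as an example: if scheduling occurs, it clearly can change.

We continue with the code of do_mmap_pgoff() (mm/mmap.c):

==================== mm/mmap.c 323 372 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()]
323 	if (file) {
324 		if (vma->vm_flags & VM_DENYWRITE) {
325 			error = deny_write_access(file);
326 			if (error)
327 				goto free_vma;
328 			correct_wcount = 1;
329 		}
330 		vma->vm_file = file;
331 		get_file(file);
332 		error = file->f_op->mmap(file, vma);
333 		if (error)
334 			goto unmap_and_free_vma;
335 	} else if (flags & MAP_SHARED) {
336 		error = shmem_zero_setup(vma);
337 		if (error)
338 			goto free_vma;
339 	}
340 
341 	/* Can addr have changed??
342 	 *
343 	 * Answer: Yes, several device drivers can do it in their
344 	 *         f_op->mmap method. -DaveM
345 	 */
346 	flags = vma->vm_flags;
347 	addr = vma->vm_start;
348 
349 	insert_vm_struct(mm, vma);
350 	if (correct_wcount)
351 		atomic_inc(&file->f_dentry->d_inode->i_writecount);
352 
353 	mm->total_vm += len >> PAGE_SHIFT;
354 	if (flags & VM_LOCKED) {
355 		mm->locked_vm += len >> PAGE_SHIFT;
356 		make_pages_present(addr, addr + len);
357 	}
358 	return addr;
359 
360 unmap_and_free_vma:
361 	if (correct_wcount)
362 		atomic_inc(&file->f_dentry->d_inode->i_writecount);
363 	vma->vm_file = NULL;
364 	fput(file);
365 	/* Undo any partial mapping done by a device driver. */
366 	flush_cache_range(mm, vma->vm_start, vma->vm_end);
367 	zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start);
368 	flush_tlb_range(mm, vma->vm_start, vma->vm_end);
369 free_vma:
370 	kmem_cache_free(vm_area_cachep, vma);
371 	return error;
372 }

If a mapping from a file to a virtual region is being set up, and the MAP_DENYWRITE flag was 1 in the parameter flags when do_mmap() was called (this flag is converted to VM_DENYWRITE by the macro vm_flags() used on line 273 above), the file may not be accessed through regular file operations, so deny_write_access() is called to exclude them; see the relevant part of the "File Systems" chapter for details. As for get_file(), its only effect is to increment the share count in the file structure.

For now we are not concerned with mappings set up for shared memory areas, so we skip lines 335 to 339. When we come to shared memory areas we will return to the code of shmem_zero_setup().

Every file system has a file_operations data structure, in which the function pointer mmap provides the operation for setting up a mapping from a file of that type to a virtual region. What is this function in the case of the Linux Ext2 file system? Let us look at the file_operations data structure of Ext2 (fs/ext2/file.c):

==================== fs/ext2/file.c 100 100 ====================
100 struct file_operations ext2_file_operations = {
......
==================== fs/ext2/file.c 105 105 ====================
105 	mmap:		generic_file_mmap,
......
==================== fs/ext2/file.c 109 109 ====================
109 };

When a file is opened and the file resides in an Ext2 file system, the kernel sets the pointer f_op in the file structure to point to this data structure, so file->f_op->mmap on line 332 above points to generic_file_mmap(). The code of this function is in mm/filemap.c:

==================== mm/filemap.c 1705 1725 ====================
[sys_mmap2()>do_mmap2()>do_mmap_pgoff()>generic_file_mmap()]
1705 /* This is used for a general mmap of a disk file */
1706 
1707 int generic_file_mmap(struct file * file, struct vm_area_struct * vma)
1708 {
1709 	struct vm_operations_struct * ops;
1710 	struct inode *inode = file->f_dentry->d_inode;
1711 
1712 	ops = &file_private_mmap;
1713 	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE)) {
1714 		if (!inode->i_mapping->a_ops->writepage)
1715 			return -EINVAL;
1716 		ops = &file_shared_mmap;
1717 	}
1718 	if (!inode->i_sb || !S_ISREG(inode->i_mode))
1719 		return -EACCES;
1720 	if (!inode->i_mapping->a_ops->readpage)
1721 		return -ENOEXEC;
1722 	UPDATE_ATIME(inode);
1723 	vma->vm_ops = ops;
1724 	return 0;
1725 }

This function is very simple. Its substantive operation is line 1723, which sets the pointer vm_ops in the control structure of the virtual region to ops. As for ops, it points to the data structure file_private_mmap or file_shared_mmap, depending on whether the mapping is private or shared. Both structures are defined in mm/filemap.c:

==================== mm/filemap.c 1686 1703 ====================
1686 /*
1687  * Shared mappings need to be able to do the right thing at
1688  * close/unmap/sync. They will also use the private file as
1689  * backing-store for swapping..
1690  */
1691 static struct vm_operations_struct file_shared_mmap = {
1692 	nopage:		filemap_nopage,
1693 };
1694 
1695 /*
1696  * Private mappings just need to be able to load in the map.
1697  *
1698  * (This is actually used for shared mappings as well, if we
1699  * know they can't ever get write permissions..)
1700  */
1701 static struct vm_operations_struct file_private_mmap = {
1702 	nopage:		filemap_nopage,
1703 };

This way of initializing a data structure is also one of gcc's improvements to the C language. It says that in the vm_operations_struct concerned, every member other than nopage has the initial value 0 or NULL, while the initial value of nopage is filemap_nopage. In older versions, by contrast, this had to be written as {NULL, NULL, filemap_nopage}, which is both tedious and makes the correspondence between the fields of the structure and their initial values hard to see.

The two structures are in fact identical; each merely provides the nopage operation for page faults. In addition, generic_file_mmap() verifies that the functions used for reading and writing pages exist (see lines 1714 and 1720). These two functions are supplied indirectly by the file's inode data structure. The inode structure contains a pointer i_mapping, which points to an address_space data structure; the reader should go back to the section "Use and Turnover of Physical Pages" to look at its definition. What concerns us here is the pointer a_ops in the address_space structure, which points to an address_space_operations data structure. Different file systems (the page swap device can be regarded as a special kind of file system) have different address_space_operations structures. For the Ext2 file system it is ext2_aops, defined in fs/ext2/inode.c:

==================== fs/ext2/inode.c 669 676 ====================
669 struct address_space_operations ext2_aops = {
670 	readpage: ext2_readpage,
671 	writepage: ext2_writepage,
672 	sync_page: block_sync_page,
673 	prepare_write: ext2_prepare_write,
674 	commit_write: generic_commit_write,
675 	bmap: ext2_bmap
676 };

This data structure provides the functions ext2_readpage() and ext2_writepage() for reading and writing the pages of ext2 files. These data structures and pointers, too, are set up when the file is opened.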
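The difference between the two initializer styles, and the way a caller dispatches through such an operations table, can be seen in a small stand-alone program. The sketch below is purely illustrative: demo_vm_ops, demo_nopage() and demo_fault() are hypothetical names, not kernel symbols, and the initializer uses the C99 spelling (.nopage =) of the gcc-specific "nopage:" form shown in the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an operations table; the field names echo
 * vm_operations_struct, but nothing here is kernel code. */
struct demo_vm_ops {
    void (*open)(void *area);
    void (*close)(void *area);
    int  (*nopage)(void *area, unsigned long address);
};

/* Stand-in fault handler: pretends the result is the 4KB page index. */
static int demo_nopage(void *area, unsigned long address)
{
    (void)area;
    return (int)(address >> 12);
}

/* Designated initializer: only nopage is named, and every member that is
 * not named is zero-initialized, which means NULL for pointers. */
static struct demo_vm_ops demo_file_mmap = {
    .nopage = demo_nopage,
};

/* Positional form: each preceding member has to be written out. */
static struct demo_vm_ops demo_file_mmap_old = { NULL, NULL, demo_nopage };

/* A caller tests a slot before using it, in the same way that
 * unmap_fixup() tests vm_ops and vm_ops->close before calling close. */
static int demo_fault(struct demo_vm_ops *ops, void *area, unsigned long addr)
{
    if (ops == NULL || ops->nopage == NULL)
        return -1;
    return ops->nopage(area, addr);
}
```

Both tables end up with the same contents; the designated form keeps working if members are later added to or reordered in the structure, which is the practical reason it reads better for large operations tables.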
· mremap(void *old_adress, size_t old_size, size_t new_size, unsigned long flags) 㾷䰸⬅ mmap()᠔ᓎゟⱘ᭛ӊ᯴ᇘDŽ · munmap(void *start, size_t length) ៥ӀাᡞᅗӀ߫ߎѢϟˈ᳝݈䍷៪䳔㽕ⱘ䇏㗙ৃ㞾㸠䯙䇏䖭ѯߑ᭄ⱘ⑤ҷⷕDŽ DŽ䰤Ѣ㆛ᐙˈܙ䰸 mmap()ҹ໪ˈLinux ݙḌ䖬ᦤկњ޴ϾϢП᳝݇ⱘ㋏㒳䇗⫼ˈ԰Ўᇍ mmap()ⱘ㸹 ៥Ӏᡞ䖭ѯᚙ᱃⬭㒭䇏㗙԰ЎĀᆊᒁ԰ϮāDŽ ᤶܹˋᤶߎ义䴶䙷ḋ䖯ܹ do_swap_page()DŽڣdo_no_page()ˈ㗠ϡ (3) 㾷䰸њ᯴ᇘⱘ义䴶೼ݡ⃵ফࠄ䆓䯂ᯊজӮথ⫳㔎义ᓖᐌˈҡᮻ಴义䴶᮴᯴ᇘ㗠䖯ܹ ৢ㗙ᰃᮁᓔ义䴶᯴ᇘˈՓ义䴶㸼乍ᣛ৥ⲬϞ义䴶DŽ ᱂䗮ⱘᤶܹˋᤶߎ义䴶᳝ϡৠⱘ໘⧚DŽᇍѢࠡ㗙ᰃ㾷䰸义䴶᯴ᇘˈᡞ义䴶㸼乍䆒㕂៤ 0˗㗠ᇍ Ё䇗⫼ ext2_writepage()DŽ៥Ӏ೼ try_to_swap_out()ⱘҷⷕЁ᳒ⳟࠄˈᇍ⫼Ѣ᭛ӊ᯴ᇘⱘ义䴶Ϣ try_to_swap_out()Ё㹿㾷䰸᯴ᇘ㗠䕀ܹϡ⌏䎗⢊ᗕDŽབᵰ义䴶ᰃĀ㛣āⱘˈ߭гӮ೼ page_launder() ݙᆍݭܹ᭛ӊDŽབᵰ义䴶ᕜ䭓ᯊ䯈≵᳝ফࠄ䆓䯂ˈ߭义䴶Ӯ㗫ሑᅗⱘᇓੑˈҢ㗠೼ϔ⃵ ⬅ݙḌ㒓⿟ bdflush()਼ᳳᗻ䖤㸠ᯊ䗮䖛 page_launder()䯈᥹ഄ䇗⫼ ext2_writepage()ˈᇚ义䴶ⱘ (2) ᓎゟ䍋᯴ᇘҹৢˈᇍ义䴶ⱘݭ᪡԰Փ义䴶বĀ㛣āˈԚᰃ义䴶ⱘݙᆍᑊϡゟेݭಲ᭛ӊЁˈ㗠 ߚ䜡ϔϾぎ䯆ݙᄬ义䴶ᑊҢ᭛ӊ䇏ܹⳌᑨⱘ义䴶ˈ✊ৢᓎゟ䍋᯴ᇘDŽ ⱘᓖᐌ໘⧚⿟ᑣЎ do_no_page()DŽᇍѢ Ext2 ᭛ӊ㋏㒳ˈdo_no_page()Ӯ䗮䖛 ext2_readpage() ᔧ䖭Ͼऎ䯈ЁⱘϔϾ义䴶佪⃵ফࠄ䆓䯂ᯊˈӮ⬅Ѣ㾕䴶᮴᯴ᇘ㗠থ⫳㔎义ᓖᐌˈⳌᑨˈܜ(1) 佪 䙷МˈҔМᯊ׭ˈ⬅䇕ᴹ䇗⫼䖭ѯߑ᭄ਸ਼˛ ህᰃ filemap_nopage()ǃext2_readpage()ҹঞ ext2_writepage()DŽ ໛ϔѯߑ᭄ˈ䖭ޚ㸠DŽ݋ԧഄˈህᰃЎ᯴ᇘⱘᓎゟǃ⠽⧚义䴶ⱘᤶܹ੠ᤶߎ˄ҹঞ᯴ᇘⱘᢚ䰸˅ߚ߿ ᓎゟ೼㒳䅵᭄᥂ⱘ෎⸔ϞDŽ䖭䞠ℷᰃ䖤⫼њ䖭Ͼὖᗉˈᡞ݋ԧ义䴶ⱘ᯴ᇘ᥼䖳ࠄⳳℷ䳔㽕ⱘᯊ׭ᠡ䖯 ᅮᕔᕔ㽕އЎߚᬷ໘⧚㗠Փ݋ԧ᯴ᇘ↣ϔϾ义䴶ⱘᓔ䫔๲ࡴDŽ᠔ҹ䖭䞠᳝Ͼ߽ᓞᴗ㸵ⱘ䯂乬ˈ݋ԧⱘ 䳔㽕⫼ࠄϔϾ义䴶ᯊݡᴹᓎゟ䆹义䴶ⱘ᯴ᇘˈ⫼ࠄ޴Ͼ义䴶ህ᯴ᇘ޴Ͼ义䴶DŽᔧ✊ˈ䙷ḋᕜৃ㛑Ӯ಴ 䭓ᳳϡ⫼ⱘ义䴶䖬ᕫ䌍ࢆᡞᅗӀᤶߎઽDŽ㗗㰥ࠄ䖭ѯ಴㋴ˈ䖬ϡབࠄⳳℷˈމᰃϡ㛑ᗑ⬹ϡ䅵ⱘDŽԩ ᯴ᇘњ 100 Ͼ义䴶ˈ㗠ᅲ䰙Ϟ೼Ⳍᔧ䭓ⱘᯊ䯈䞠া⫼ࠄњ݊ЁⱘϔϾ义䴶ˈ㗠᯴ᇘ 99 Ͼ义䴶ⱘᓔ䫔ै ᅲ䰙Ϟैḍᴀ≵᳝⫼ࠄ៪㗙া⫼ࠄњᕜᇣϔ䚼ߚˈҢ㗠䗴៤њ⌾䌍DŽህҹ䖭䞠ⱘ᭛ӊ᯴ᇘᴹ䇈ˈг䆌 ໛ˈޚगবϛ࣪ˈ᳝ᯊ׭㢅њ㗕໻ⱘࢆᠡᅠ៤њމ᥼䖳ࠄⳳℷ䳔㽕ᯊᠡ䖯㸠DŽ䖭ᰃ಴Ўᅲ䰙䖤㸠Ёⱘᚙ ໛㗠䖯㸠ⱘ᪡԰˄䅵ㅫ˅ৃ㛑ᑊ᮴ᖙ㽕ˈ᠔ҹᑨ䆹ޚ⾡computationāⱘὖᗉˈህᰃ䇈᳝ѯЎᇚᴹ԰ᶤ ᓎゟ䍋ᮄⱘ᯴ᇘDŽ঺ϔᮍ䴶ˈ೼䅵ㅫᴎᡔᴃЁ᳝ϔϾ⿄Ў“lazyމᯊህৃҹḍ᥂ᔧᯊⱘ݋ԧᚙ Ѡ㗙䛑ᰃࡼᗕⱘDŽ᠔ҹˈ䞡㽕ⱘᑊϡᰃᓎゟ䍋ϔϾ⡍ᅮⱘ᯴ᇘˈ㗠ᰃᓎゟ䍋ϔ༫ᴎࠊˈՓᕫϔᮺ䳔㽕 ⴔϸϾ⦃㡖ˈϔᰃ⠽⧚义䴶Ϣ᭛ӊ᯴䈵П䯈ⱘᤶܹˋᤶߎˈѠᰃ⠽⧚义䴶Ϣ㰮ᄬ义䴶П䯈ⱘ᯴ᇘDŽ䖭 义䴶᯴ᇘⱘᓎゟʽ݊ᅲˈ݋ԧⱘ᯴ᇘᰃ䴲ᐌࡼᗕǃ㒣ᐌ೼বⱘDŽ᠔䇧᭛ӊϢ㰮ᄬऎ䯈П䯈ⱘ᯴ᇘࣙ৿ 䇏㗙г䆌ᛳࠄೄᚥˈ೼᭛ӊϢ㰮ᄬऎ䯈П䯈ᓎゟ᯴ᇘ䲒䘧ህ䖭Мㅔऩ˛㗠Ϩ៥Ӏḍᴀህ≵᳝ⳟࠄ ྟⱘ义䴶᯴ᇘˈ䖭Ͼߑ᭄ⱘҷⷕᏆ㒣೼ࠡϔ㡖Ёⳟࠄ䖛њDŽ ෎ᴀᅠ៤њ do_mmap_pgoff()ⱘ᪡԰ˈҙ೼㽕∖ᇍऎ䯈ࡴ䫕ᯊᠡ䇗⫼ make_pages_present()ˈᓎゟ䍋߱ ᅠ៤њ䖭ѯẔᶹ੠໘⧚ˈᡞᮄᓎゟⱘ vm_area_struct 㒧ᵘᦦܹࠄᔧࠡ䖯⿟ⱘ mm_struct 㒧ᵘЁˈህ 192 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
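This one is specific to Linux; it enlarges or shrinks a region that has already been mapped.

· msync(const void *start, size_t length, int flags)
After an open file has been mapped into the virtual address space of a process and read or written there, msync() can be used to "flush" length bytes starting at address start to the actual file, so that the contents of the file agree with the contents in memory. The parameter flags has three flag bits: MS_SYNC, MS_ASYNC and MS_INVALIDATE. MS_SYNC means the flush is carried out immediately and the system call returns only when the flush has completed. MS_ASYNC means the flush may complete asynchronously; the system call returns at once and the kernel performs the flush at a suitable time. MS_INVALIDATE is meant for the case where the same file is mapped more than once (by several processes); it indicates that other images of the same file should be treated as invalid and refreshed.

· mlock(const void *addr, size_t len)
Once virtual memory has been mapped to physical memory, it is in general the kernel that decides, using the LRU algorithm, which pages to swap in or out. Sometimes, however, a process needs to "lock" certain pages in memory for reasons of efficiency. It can then use mlock() to lock len bytes of virtual memory starting at addr, or rather the pages containing those bytes, in memory so that they may not be swapped out.

· mprotect(const void *addr, size_t len, int prot)
Finally, mprotect() changes the protection attributes of a range of virtual memory.


Chapter 3 Interrupts, Exceptions and System Calls

We assume that readers of this book already have a basic knowledge of computer architecture, so this chapter does not go deeply into the principles and mechanisms of interrupt and exception handling. Readers who lack this background may wish to read some material on microprocessors first. We do not, however, require a deep understanding of these topics. As our presentation and analysis proceed, and especially as the scenarios unfold and the code is read, the reader's understanding will deepen step by step.

Briefly, there are two kinds of interrupts: those generated outside the CPU, and those generated by the CPU itself while executing a program.

External interrupts are what is usually meant by "interrupt". For the software being executed, such an interrupt is completely "asynchronous": there is no way to predict when it will occur. The response of the CPU (or the software) to external interrupts is therefore entirely passive. Software can, however, use a "disable interrupts" instruction to stop responding to interrupts, cutting off the channel through which they "report in", so that what is out of sight is out of mind (non-maskable interrupts are not considered here).

Interrupts generated by software are different. They are produced deliberately in a program by a dedicated instruction, such as "INT n" on the X86, so they are active and "synchronous". Once the CPU executes an INT instruction, it is known that it will enter the interrupt service routine before starting the next instruction. This kind of active interrupt is called a "trap".

There is also a mechanism similar to interrupts called the "exception". It too is generally asynchronous, and mostly happens when a rule is broken "by accident". For example, if a program issues a divide instruction DIV and the divisor is 0, an exception occurs. This is mostly due to carelessness rather than intent, so it too is passive. Of course, intent cannot be ruled out: in Chapter 2 we saw the scenario in which the stack region is extended by way of a page fault, and that is arranged on purpose.

Altogether, then, there are three similar mechanisms: interrupts, traps and exceptions.

Whether it is an external interrupt, a trap or an exception, whether unintended and passive or deliberate and active, the CPU responds in basically the same way. After finishing the current instruction, or part-way through it, the CPU uses the "interrupt vector" supplied by the interrupt source to find the entry of the corresponding service routine in memory and calls it. The vector of an external interrupt is set up by software or hardware; the vector of a trap is given in the trap instruction itself (the n in INT n); and the vectors of the various exceptions are fixed in advance by the CPU hardware. These different cases are thus told apart by their interrupt vectors. In practice they are therefore often considered and implemented as one unified pattern, and are often all referred to simply as "interrupts". As for system calls, they are generally implemented with the INT instruction, so they too are closely related to interrupts.

The first part of this chapter covers interrupts, including hardware support, software handling, and the process of interrupt response and service; the second part covers system calls.

3.1 Hardware Support for Interrupts in the X86 CPU

This section does not discuss the whole interrupt response process in the strict sense (for example, how the interrupt vector is obtained). It concentrates on how the CPU, when responding to an interrupt and having obtained the interrupt vector, enters the corresponding interrupt service routine. From the operating system's point of view,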
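this is the question that matters. The Intel X86 CPU supports 256 different interrupt vectors, and this has not changed to this day. The interrupt response mechanism of the early X86 CPU, however, was very primitive and very simple. In real-address mode the CPU treats the 1K bytes of memory starting at address 0 as an interrupt vector table. Each entry takes four bytes, made up of a two-byte segment address and a two-byte offset. The address formed this way is the entry address of the corresponding interrupt service routine, which is consistent with addressing in 16-bit real-address mode. But a modern operating system cannot be built on such a mechanism, even if 16-bit addressing is changed to 32-bit addressing, and even if paged memory management is implemented. The reason is that this mechanism provides no means of switching address space, or rather of switching running mode.

To understand this, let us look at how another CPU does it. The reader may know that early UNIX was implemented on the PDP-11. The PDP-11 CPU has a control and status register, similar to the X86 FLAGS register, called the PSW (processor status word). A bit field in the PSW determines the CPU's current running priority and mode (system or user). A user program cannot raise its priority by modifying the PSW directly. In the PDP-11 interrupt vector table each entry has two parts: the entry address of the interrupt service routine, and the PSW to be used once the CPU has entered that routine. Naturally, the interrupt vector table can be changed only while the CPU is in system mode. When an interrupt occurs, the CPU loads the PSW from the vector table into its control and status register and loads the entry address into the program counter. This serves a double purpose: it enters the service routine, and it switches from one running mode (or priority) to another. The original PSW is pushed onto the stack (the user stack) together with the return address, so that on return from the service routine the CPU can go back to its original running mode. The switching of running state is thus achieved quite naturally. The CPU is normally in user state; whether because of an external interrupt, a system call (a software-generated interrupt) or some exception, it enters system state through the interrupt vector table, and when the service routine returns it is restored to user state. By comparison, it is clear that what the interrupt response of X86 real-address mode lacks is exactly something like the PDP-11's handling of the PSW.

For this reason, when Intel implemented protected mode it changed the CPU's interrupt response mechanism substantially.

First, the entries of the interrupt vector table were changed from plain entry addresses to more complex descriptors, something like a PSW plus an entry address, called "gates". The name means that when an interrupt occurs the CPU must first pass through such a gate before it can enter the service routine. These gates, however, are not just for interrupts: any switch of the CPU's running state, that is, of its privilege level, for example from user level 3 to system level 0, has to go through a gate. Nor are interrupts (or exceptions, or traps) the only way from user mode into system mode; the subroutine call instruction CALL and the jump instruction JMP can also be used. Furthermore, when an interrupt occurs the CPU can not only switch running state and enter the service routine, but can also be arranged to perform a task switch (a "context switch") and switch immediately to another process. An operating system could thus set up an "interrupt service process (task)" and switch to it whenever an interrupt occurs.

According to use and purpose, the CPU has four kinds of gates: the task gate, the interrupt gate, the trap gate and the call gate. Apart from the task gate, the three other kinds have basically the same structure, although the call gate is not associated with the interrupt vector table.

We look at the task gate first. It is 64 bits in size, and its structure is shown in Figure 3.1.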
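Figure 3.1 Structure of a task gate

The TSS segment selector plays a role similar to that of the segment registers CS, DS and so on: through the GDT or LDT it points to one kind of special "system segment", called the "task state segment" (TSS). A TSS is really a data structure for saving the running context of a task. It holds the contents of all CPU registers that relate to a particular process (including the page directory pointer CR3), and also three stack pointers. When an interrupt occurs, the CPU finds the corresponding entry in the interrupt vector table. If the entry is a task gate and the privilege check is passed, the CPU saves the running context of the current task in its TSS, makes the TSS pointed to by the task gate the current task, and loads its contents into the CPU registers, completing a task switch. For this purpose the CPU has an additional "task register" TR that points to the TSS of the current task. In the Linux kernel a task is a process, but the process "control block", the task_struct, has to hold more information. In this sense a Linux process is not quite a task as Intel intended. As the reader will see later, the Linux kernel does not use task gates as its means of process switching. A task gate is also not the only way to switch to a new task; a program can, for example, use a CALL or JMP instruction through a call gate to the same effect. The role of the DPL field is discussed later.

Apart from the task gate, the other three kinds of gates have basically the same structure; each is also 64 bits in size, see Figure 3.2.

Figure 3.2 Structure of interrupt, trap and call gates

The three kinds of gates differ in their 3-bit type code. The type code of an interrupt gate is 110, that of a trap gate is 111, and that of a call gate is 100. Compared with the task gate, the main difference is that a task gate needs no offset within a segment, because it does not point to the entry of a subroutine; the TSS itself is treated as a segment. Interrupt, trap and call gates all have to point to a subroutine, so they must use a segment selector together with an offset within the segment. Also, in a task gate the position corresponding to the D flag is always 0.

The difference in use between interrupt gates and trap gates is not whether the interrupt is generated externally or by the CPU itself. It is that on entering a service routine through an interrupt gate the CPU automatically disables interrupts, that is, clears the IF flag in the EFLAGS register to 0, to prevent nested interrupts; on entering through a trap gate the IF flag is left unchanged. This is the only difference between the two.

Whatever the kind of gate, it points to a memory segment through a segment selector, which works like an ordinary segment register. As explained in Chapter 2, in protected mode the contents of a segment register do not point directly to the start address of a segment, but to an entry in a segment descriptor table determined by GDTR or LDTR; hence the name "segment selector". Which of the two tables is used depends on the TI flag in the selector. The Linux kernel in practice uses only the global descriptor table GDT; the local descriptor table LDT is used only in special applications (mainly WINE). For interrupt, trap and call gates the corresponding descriptor obviously has to be a code segment descriptor. The descriptor that a task gate points to is a TSS descriptor, provided specially for the TSS. Its structure is basically the same as what was described in Chapter 2, but the S flag at bit 44 is 0, indicating that it is not an ordinary code or data segment.

Every segment descriptor has a DPL field, the "descriptor privilege level". When the CPU finds a code segment descriptor through an interrupt gate and goes on to enter the service routine, it loads that code segment descriptor, and the descriptor's DPL becomes the CPU's current privilege level, called the CPL. This matches what we said about the PDP-11 loading both the PSW and the entry address from the vector table on an interrupt. But the interrupt gate has a DPL too; what is that for? This brings us to the mechanism by which i386 protected mode checks and compares running and access levels.

Intel implemented in the i386 CPU a privilege checking mechanism that can fairly be called astonishingly complex. Here we describe only part of it, following the implementation of the Linux kernel. The Linux kernel avoids the most complex parts of the mechanism: it does not use task gates and basically does not use call gates (entering a system call through a call gate is indeed supported for compatibility, but it is not the mainstream path). And since here we are concerned only with access to code segments, what remains is not too complicated.

When an interrupt service routine is entered through an INT instruction, the instruction supplies an interrupt vector. The CPU first uses the vector to find a gate (descriptor) in the interrupt vector table; in this case it is generally an interrupt gate. The DPL of the gate is then compared with the CPU's CPL: the CPL must be less than or equal to the DPL, that is, the privilege must be no lower than the DPL, to pass through the gate. If the interrupt is generated externally or by a CPU exception, however, this check is skipped. After passing the gate, the DPL of the target code segment descriptor is compared with the CPL: the DPL of the target segment must be less than or equal to the CPL. In other words, passing through an interrupt gate may keep or raise the CPU's privilege level, but may not lower it. Failure at either step produces a general protection exception.

On entering the interrupt service routine, the CPU pushes the current contents of the EFLAGS register and the return address onto the stack; the return address consists of the contents of the segment register CS and of the instruction pointer EIP. If the interrupt was caused by an exception, an error code indicating the cause is pushed as well. Further, if the privilege level of the service routine, that is, the DPL of the target code segment, differs from the CPL at the time of the interrupt, the stack is switched. As mentioned above, the TSS contains, besides all the ordinary register contents (including the current SS and ESP), three additional stack pointers (SS plus ESP). These are used when the CPU's privilege level in the target code segment is 0, 1 and 2 respectively. So the CPU finds the current TSS through the register TR, takes from it the new stack pointer (SS plus ESP) according to the DPL of the target code segment, and loads it into the stack segment register SS and the stack pointer register ESP, thereby switching stacks. In this case the CPU must push not only EFLAGS, the return address and the error code, but first also the original stack pointer, onto the (new) stack. Figure 3.3 may help in understanding this.

Figure 3.3 The stack of an interrupt service routine

Now to the Linux kernel specifically. When an interrupt occurs in user state, that is, while the CPU is running in user space, the stack is switched, because the privilege level in user mode is 3 while that of the service routine in the kernel is 0. In other words, the CPU switches from the user stack to the system stack. When an interrupt occurs in system state, while the CPU is running in the kernel, the stack is not switched.

Finally, in protected mode the interrupt vector table is no longer confined to memory starting at address 0; like the GDT and LDT it can be placed anywhere in memory. For this purpose the CPU has a further register, IDTR, that points to the current interrupt vector table IDT, or interrupt descriptor table.

Figure 3.4 shows schematically the structure of the interrupt mechanism in i386 protected mode when interrupt gates or trap gates are used.

The mechanisms of the real i386 architecture are more complex than described above; we have left out what is irrelevant to the Linux kernel implementation. Seen from another angle, this shows that for an operating system like Linux (proven in practice to be one of the most capable and most stable systems) much of the i386 architecture is unnecessary, even superfluous. No wonder some scholars criticize Intel for making the i386 architecture too complicated. It is possible, of course, that new techniques will appear that prove Intel far-sighted; we shall see. If simplicity, given that the same goal is reached, is beauty, then the i386 architecture is clearly not beautiful, while the implementation of the Linux kernel by comparison truly is. In any case, there is no doubt that the i386 architecture can meet the needs of a modern operating system like Linux.

Figure 3.4 The interrupt mechanism

3.2 Initialization of the Interrupt Vector Table IDT

During initialization, after paged virtual memory management has been set up, the Linux kernel calls the two functions trap_init() and init_IRQ() to initialize the interrupt mechanism. trap_init() mainly initializes the interrupt vectors reserved by the system, while init_IRQ() mainly deals with device interrupts.

The function trap_init() is defined in arch/i386/kernel/traps.c:

==================== arch/i386/kernel/traps.c 949 996 ====================
949 void __init trap_init(void)
950 {
951 #ifdef CONFIG_EISA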
952         if (isa_readl(0x0FFFD9) == 'E'+('I'<<8)+('S'<<16)+('A'<<24))
953                 EISA_bus = 1;
954 #endif
955
956         set_trap_gate(0,&divide_error);
957         set_trap_gate(1,&debug);
958         set_intr_gate(2,&nmi);
959         set_system_gate(3,&int3);       /* int3-5 can be called from all */
960         set_system_gate(4,&overflow);
961         set_system_gate(5,&bounds);
962         set_trap_gate(6,&invalid_op);
963         set_trap_gate(7,&device_not_available);
964         set_trap_gate(8,&double_fault);
965         set_trap_gate(9,&coprocessor_segment_overrun);
966         set_trap_gate(10,&invalid_TSS);
967         set_trap_gate(11,&segment_not_present);
968         set_trap_gate(12,&stack_segment);
969         set_trap_gate(13,&general_protection);
970         set_trap_gate(14,&page_fault);
971         set_trap_gate(15,&spurious_interrupt_bug);
972         set_trap_gate(16,&coprocessor_error);
973         set_trap_gate(17,&alignment_check);
974         set_trap_gate(18,&machine_check);
975         set_trap_gate(19,&simd_coprocessor_error);
976
977         set_system_gate(SYSCALL_VECTOR,&system_call);
978
979         /*
980          * default LDT is a single-entry callgate to lcall7 for iBCS
981          * and a callgate to lcall27 for Solaris/x86 binaries
982          */
983         set_call_gate(&default_ldt[0],lcall7);
984         set_call_gate(&default_ldt[4],lcall27);
985
986         /*
987          * Should be a barrier for any external CPU state.
988          */
989         cpu_init();
990
991 #ifdef CONFIG_X86_VISWS_APIC
992         superio_init();
993         lithium_init();
994         cobalt_init();
995 #endif
996 }

The code first sets up the gates at the start of the interrupt vector table (vectors 0 through 19 in the listing). These interrupt vectors are all reserved by the CPU for exception handling. For example, interrupt vector 14 is reserved for page faults: when the CPU hardware runs into a problem during page mapping or access (such as a missing page),
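it raises an exception with 14 (0xe) as its interrupt vector. The design and implementation of an operating system must comply with these rules.

Next comes the initialization of the system call vector. The constant SYSCALL_VECTOR is defined as 0x80 in include/asm-i386/hw_irq.h (line 24), so executing the instruction "int $0x80" performs a system call.

The Linux operating system itself does not use call gates, but some Unix variants do use call gates to implement system calls, such as the iBCS and Solaris/x86 mentioned in the comment. To be compatible with executable code of applications compiled on those systems, the Linux kernel sets up two corresponding call gates; lines 983 and 984 initialize them. Since we are not concerned here with SGI's special workstation display hardware, we skip the few lines of conditionally compiled code starting at line 991.

As the code shows, three functions are used to initialize most of these entries: set_trap_gate(), set_system_gate() and set_call_gate(). A fourth function of the same group, set_intr_gate(), is intended mainly for device interrupts; here it is called only once, at line 958, for the NMI. All these functions are defined in arch/i386/kernel/traps.c:

==================== arch/i386/kernel/traps.c 808 826 ====================
808 void set_intr_gate(unsigned int n, void *addr)
809 {
810         _set_gate(idt_table+n,14,0,addr);
811 }
812
813 static void __init set_trap_gate(unsigned int n, void *addr)
814 {
815         _set_gate(idt_table+n,15,0,addr);
816 }
817
818 static void __init set_system_gate(unsigned int n, void *addr)
819 {
820         _set_gate(idt_table+n,15,3,addr);
821 }
822
823 static void __init set_call_gate(void *a, void *addr)
824 {
825         _set_gate(a,12,3,addr);
826 }

These functions all call the same macro _set_gate() to set entry n of the interrupt descriptor table idt_table; they differ in the second and third arguments. The second argument corresponds to the D flag plus the type field in the interrupt or trap gate format. The value 14 means the D flag is 1 and the type is 110, so set_intr_gate() sets up an interrupt gate. The third argument corresponds to the DPL field. There is a good reason why the DPL of interrupt gates is always set to 0. When an interrupt is generated externally or by a CPU exception, the DPL of the gate is ignored, so the gate can always be passed. But if a user process in user space tries to enter the service routine for the non-maskable interrupt with an "INT 2" instruction, the software-generated interrupt is refused (the CPU raises an exception), because the privilege level in user state is 3 while the DPL of the interrupt gate is 0 (the highest level); the attempt fails. Likewise, set_trap_gate() sets the DPL to 0; the difference is that the second argument to _set_gate() is 15, that is, type 111, meaning that a trap gate is set up. As explained earlier, the only difference between a trap gate and an interrupt gate is that interrupts are disabled automatically on entry through an interrupt gate and left unchanged on entry through a trap gate. So when, for example, a service routine is entered because of a CPU page fault, interrupts are most likely enabled; the routines we saw in Chapter 2, such as handle_mm_fault(), are all interruptible. In addition,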
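the gate set up by set_system_gate() is also a trap gate, so system calls are interruptible too. Its DPL, however, is 3, because system calls are made from user space through "int $0x80"; only with the DPL of this trap gate set to 3 can a system call pass through, otherwise system calls would be refused.

Let us look further at how these IDT entries are actually set. _set_gate() is defined in the same file (arch/i386/kernel/traps.c):

==================== arch/i386/kernel/traps.c 788 799 ====================
788 #define _set_gate(gate_addr,type,dpl,addr) \
789 do { \
790   int __d0, __d1; \
791   __asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
792         "movw %4,%%dx\n\t" \
793         "movl %%eax,%0\n\t" \
794         "movl %%edx,%1" \
795         :"=m" (*((long *) (gate_addr))), \
796          "=m" (*(1+(long *) (gate_addr))), "=&a" (__d0), "=&d" (__d1) \
797         :"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
798          "3" ((char *) (addr)),"2" (__KERNEL_CS << 16)); \
799 } while (0)

First, do {} while (0) ensures that its body, lines 790 to 798, is executed exactly once. In particular it causes no problems at compile time in any context (see Chapter 1). The part between the first ":" on line 795 and the second ":" on line 797 is the output section; it declares that four variables will be changed, bound to %0, %1, %2 and %3. %0 is bound to the argument gate_addr and %1 to (gate_addr+1), both memory locations; %2 is bound to the local variable __d0 and kept in register %%eax, and %3 is bound to the local variable __d1 and kept in register %%edx. Lines 797 to 798 are the input section. Since the output section has already defined %0 to %3, the first variable in the input section is %4, and the two that follow are equivalent to %3 and %2 of the output section. The values of all input variables declared in the input section, including those of %3 and %2, are set up before the variables are referenced.

For convenience, the required format of an interrupt gate (or trap gate) is shown again in Figure 3.5.

Figure 3.5 Format of interrupt gates and trap gates

Because line 791 uses %%dx and %%ax, the compiled (and assembled) code first sets %%edx to addr and %%eax to (__KERNEL_CS << 16), as specified in the input section. Line 791 then moves the low 16 bits of %%edx into the low 16 bits of %%eax (note the difference between %%dx and %%edx). This forms in %%eax the first long word of the required interrupt gate: its high 16 bits are __KERNEL_CS and its low 16 bits are the low 16 bits of addr. Next, line 792 loads (0x8000+(dpl<<13)+(type<<8))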
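into the low 16 bits of %%edx. The high 16 bits of %%edx are then the high 16 bits of addr, while in the low 16 bits the P bit is 1 (because of 0x8000), the DPL field is dpl (because of dpl<<13), the D bit plus the type field is type (because of type<<8), and all other bits are 0. This forms the second long word of the interrupt gate. Line 793 then writes %%eax to *gate_addr and line 794 writes %%edx to *(gate_addr+1). The reader may like to try writing more efficient code! Of course, this efficiency comes at the expense of readability. For an operation such as setting IDT entries, which does not happen often, whether it is worth it is open to debate. Still, this is inside the kernel, and at a very low level, where not many people will read or maintain the code anyway.

At system initialization, once trap_init() has set up the IDT entries reserved for the CPU and the trap gate used for system calls, init_IRQ() is entered to set up the large number of general-purpose interrupt gates used for devices. The code of init_IRQ() is in arch/i386/kernel/i8259.c:

==================== arch/i386/kernel/i8259.c 438 457 ====================
438 void __init init_IRQ(void)
439 {
440         int i;
441
442 #ifndef CONFIG_X86_VISWS_APIC
443         init_ISA_irqs();
444 #else
445         init_VISWS_APIC_irqs();
446 #endif
447         /*
448          * Cover the whole vector space, no vector can escape
449          * us. (some of these will be overridden and become
450          * 'special' SMP interrupts)
451          */
452         for (i = 0; i < NR_IRQS; i++) {
453                 int vector = FIRST_EXTERNAL_VECTOR + i;
454                 if (vector != SYSCALL_VECTOR)
455                         set_intr_gate(vector, interrupt[i]);
456         }
457

First, init_ISA_irqs() initializes the PC's interrupt controller, the 8259A, and initializes an array of structures, irq_desc[]. Why is such an array needed? The i386 architecture supports 256 interrupt vectors, from which those reserved for the CPU itself must be subtracted. For a general-purpose operating system it is hard to say whether the remaining vectors are enough. Moreover, many devices may, for various reasons, have to share interrupt vectors in the first place. So in a system like Linux it is unrealistic to require every interrupt source to have an interrupt vector to itself. The solution is to provide a means of sharing vectors. The system therefore sets up a queue for each interrupt vector and hooks the service routine of each interrupt source into the queue for the vector it uses (generates); each element of irq_desc[] is the head and control structure of one such queue. When an interrupt occurs, a general service routine corresponding to the interrupt vector is executed first, and it finds and executes the specific service routine in the queue according to the device number of the actual interrupt source. We describe this process in detail later; for now it is enough to know that such an array is needed.

Next, starting from FIRST_EXTERNAL_VECTOR, IDT entries are set up for NR_IRQS interrupt vectors. The constant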
FIRST_EXTERNAL_VECTOR is defined as 0x20 in include/asm-i386/hw_irq.h (line 22), and NR_IRQS is 224, defined in include/asm-i386/irq.h (line 24). The vector 0x80 used for system calls must be skipped, however; it has already been set up earlier. The entry addresses of the service routines set here all come from an array of function pointers, interrupt[].

The contents of the function pointer array interrupt[] are also defined in arch/i386/kernel/i8259.c:

==================== arch/i386/kernel/i8259.c 98 119 ====================
98 #define IRQ(x,y) \
99         IRQ##x##y##_interrupt
100
101 #define IRQLIST_16(x) \
102         IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), \
103         IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \
104         IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
105         IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)
106
107 void (*interrupt[NR_IRQS])(void) = {
108         IRQLIST_16(0x0),
109
110 #ifdef CONFIG_X86_IO_APIC
111                          IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
112         IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
113         IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
114         IRQLIST_16(0xc), IRQLIST_16(0xd)
115 #endif
116 };
117
118 #undef IRQ
119 #undef IRQLIST_16

The first part of the array is defined starting at line 107. Following the definitions of IRQLIST_16(x) and IRQ(x,y) back to line 98, we see that the text of the function pointers is generated automatically by gcc's preprocessor, because the operator ## joins tokens together. For example, when line 108 invokes IRQLIST_16() with the argument 0x0 (as a string), IRQ(x,0) on line 102 is replaced during preprocessing by IRQ0x00_interrupt. It is followed in turn by IRQ0x01_interrupt, IRQ0x02_interrupt, and so on, up to IRQ0x0f_interrupt. The required text is thus generated automatically by the gcc preprocessor, avoiding tedious typing and editing. This part therefore supplies the first 16 function pointers in interrupt[]. On a single-CPU system the remaining pointers are all NULL. On a multiprocessor (SMP) system they are followed by the 208 function pointers IRQ0x10 through IRQ0xdf.

Where, then, are the 16 functions IRQ0x00_interrupt to IRQ0x0f_interrupt themselves defined? Look at a few other lines in arch/i386/kernel/i8259.c:

==================== arch/i386/kernel/i8259.c 38 51 ====================
38 #define BI(x,y) \
39         BUILD_IRQ(x##y)
40
41 #define BUILD_16_IRQS(x) \
42         BI(x,0) BI(x,1) BI(x,2) BI(x,3) \
43         BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
44         BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
45         BI(x,c) BI(x,d) BI(x,e) BI(x,f)
46
47 /*
48  * ISA PIC or low IO-APIC triggered (INTA-cycle or APIC) interrupts:
49  * (these are usually mapped to vectors 0x20-0x2f)
50  */
51 BUILD_16_IRQS(0x0)

The macro invocation BUILD_16_IRQS(0x0) on line 51 is thus expanded during preprocessing into 16 macro invocations, from BUILD_IRQ(0x00) to BUILD_IRQ(0x0f). BUILD_IRQ() in turn is defined in include/asm-i386/hw_irq.h:

==================== include/asm-i386/hw_irq.h 172 178 ====================
172 #define BUILD_IRQ(nr) \
173 asmlinkage void IRQ_NAME(nr); \
174 __asm__( \
175 "\n"__ALIGN_STR"\n" \
176 SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
177         "pushl $"#nr"-256\n\t" \
178         "jmp common_interrupt");

==================== include/asm-i386/hw_irq.h 110 111 ====================
110 #define IRQ_NAME2(nr) nr##_interrupt(void)
111 #define IRQ_NAME(nr) IRQ_NAME2(IRQ##nr)

After gcc's preprocessing this expands into a series of code fragments of the following form:

asmlinkage void IRQ0x01_interrupt();
__asm__ ( \
"\n" \
"IRQ0x01_interrupt: \n\t" \
"pushl $0x01 - 256 \n\t" \
"jmp common_interrupt");

From this we can see that the handling of all device interrupts actually enters one common piece of code, common_interrupt. The only purpose of first going to IRQ0x01_interrupt, IRQ0x02_interrupt and so on is to obtain a value related to the interrupt vector (pushed onto the stack). For IRQ0x00_interrupt through IRQ0x0f_interrupt this value is 0xffffff00 through 0xffffff0f respectively, and so on for the rest. As for common_interrupt, it too is generated by the gcc preprocessor, from the macro BUILD_COMMON_IRQ(). We come back to this code in a later scenario and skip it for now.

Back in init_IRQ(), we continue (arch/i386/kernel/i8259.c):

==================== arch/i386/kernel/i8259.c 458 458 ====================
458 #ifdef CONFIG_SMP
......
==================== arch/i386/kernel/i8259.c 485 505 ====================
485 #endif
486
487         /*
488          * Set the clock to HZ Hz, we already have a valid
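489          * vector now:
490          */
491         outb_p(0x34,0x43);              /* binary, mode 2, LSB/MSB, ch 0 */
492         outb_p(LATCH & 0xff , 0x40);    /* LSB */
493         outb(LATCH >> 8 , 0x40);        /* MSB */
494
495 #ifndef CONFIG_VISWS
496         setup_irq(2, &irq2);
497 #endif
498
499         /*
500          * External FPU? Set up irq13 if so, for
501          * original braindamaged IBM FERR coupling.
502          */
503         if (boot_cpu_data.hard_math && !cpu_has_fpu)
504                 setup_irq(13, &irq13);
505 }

Since we are concerned here neither with multiprocessor SMP systems nor with the special handling for SGI workstations, all that remains is the initialization of the system clock. A comment in the code says that we already have a valid vector; it refers to IRQ0x00_interrupt. Note, however, that although the entry address of this interrupt service has been put into the interrupt vector table, the actual clock interrupt service routine has not yet been hooked into the queue for IRQ0. At this point all the irq queues are still empty, so even if interrupts were enabled and a clock interrupt occurred, it would merely make an idle trip through common_interrupt. As the reader will see later, the clock interrupt and its servicing are like the heartbeat or pulse of an animal, and the kernel's pulse has not yet started. Why not let it start now? Because the system has not yet finished initializing the process scheduling mechanism, and once clock interrupts begin, process scheduling begins with them. So the pulse may start beating only after the initialization of process scheduling is complete and everything is ready.

This shows how many things require careful, thorough consideration in the design of a truly practical operating system!

3.3 Initialization of the Interrupt Request Queues

In the previous section we said that the interrupt vector table (more precisely, the "interrupt descriptor table") IDT has two kinds of entries. The first kind are gates reserved for the CPU itself. They are used mainly for exceptions generated by the CPU, such as "divide by zero" and "page fault", and for interrupts generated by user programs through the INT instruction (also called "traps"), which serve mainly for system calls (there is also INT 3, used for debugging). The vectors of these gates, except for 0x80 used for system calls, are all below 0x20. The second kind starts at 0x20: 224 entries in all, which are general-purpose interrupt gates for devices. The difference is that a general-purpose interrupt gate can be shared by several interrupt sources, whereas a dedicated gate is used by one particular interrupt source only.

Because general-purpose interrupt gates are shared by several interrupt sources, and because this sharing arrangement is allowed to change dynamically while the system is running, the IDT initialization stage only prepares an "interrupt request queue" for each interrupt vector, that is, for each entry. This forms an array of interrupt request queues, the array irq_desc[]. The data structure for the head of an interrupt request queue is defined in include/linux/irq.h:

==================== include/linux/irq.h 23 55 ====================
23 /*
24  * Interrupt controller descriptor. This is all we need
25  * to describe about the low-level hardware.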
26  */
27 struct hw_interrupt_type {
28         const char * typename;
29         unsigned int (*startup)(unsigned int irq);
30         void (*shutdown)(unsigned int irq);
31         void (*enable)(unsigned int irq);
32         void (*disable)(unsigned int irq);
33         void (*ack)(unsigned int irq);
34         void (*end)(unsigned int irq);
35         void (*set_affinity)(unsigned int irq, unsigned long mask);
36 };
37
38 typedef struct hw_interrupt_type  hw_irq_controller;
39
40 /*
41  * This is the "IRQ descriptor", which contains various information
42  * about the irq, including what kind of hardware handling it has,
43  * whether it is disabled etc etc.
44  *
45  * Pad this out to 32 bytes for cache and indexing reasons.
46  */
47 typedef struct {
48         unsigned int status;            /* IRQ status */
49         hw_irq_controller *handler;
50         struct irqaction *action;       /* IRQ action list */
51         unsigned int depth;             /* nested irq disables */
52         spinlock_t lock;
53 } ____cacheline_aligned irq_desc_t;
54
55 extern irq_desc_t irq_desc [NR_IRQS];

In each queue head, besides the pointer action, which maintains a singly linked queue of interrupt service routine descriptors, there is a pointer handler that points to another data structure, the hw_interrupt_type structure. This consists mainly of function pointers used to control the queue, or rather the shared "interrupt channel" (not to service a particular interrupt source). The actual functions depend on the interrupt controller in use (usually the i8259A). For example, the function pointers enable and disable turn the channel they belong to on and off, ack is used to acknowledge the interrupt controller, and end is used just before each return from interrupt service. These functions are all set up by init_ISA_irqs(), called from init_IRQ(); see arch/i386/kernel/i8259.c:

==================== arch/i386/kernel/i8259.c 413 436 ====================
413 void __init init_ISA_irqs (void)
414 {
415         int i;
416
417         init_8259A(0);
418
419         for (i = 0; i < NR_IRQS; i++) {
420                 irq_desc[i].status = IRQ_DISABLED;
421                 irq_desc[i].action = 0;
422                 irq_desc[i].depth = 1;
423
424                 if (i < 16) {
425                         /*
426                          * 16 old-style INTA-cycle interrupts:
427                          */
428                         irq_desc[i].handler = &i8259A_irq_type;
429                 } else {
430                         /*
431                          * 'high' PCI IRQs filled in on demand
432                          */
433                         irq_desc[i].handler = &no_irq_type;
434                 }
435         }
436 }

The code first calls init_8259A() to initialize the 8259A interrupt controller (its code is also in arch/i386/kernel/i8259.c), and then sets the handler pointer of the first 16 interrupt request queues to point to the data structure i8259A_irq_type, which is also defined in arch/i386/kernel/i8259.c:

==================== arch/i386/kernel/i8259.c 148 157 ====================
148 static struct hw_interrupt_type i8259A_irq_type = {
149         "XT-PIC",
150         startup_8259A_irq,
151         shutdown_8259A_irq,
152         enable_8259A_irq,
153         disable_8259A_irq,
154         mask_and_ack_8259A,
155         end_8259A_irq,
156         NULL
157 };

The data structure irqaction, used as the descriptor of an individual interrupt service routine, is defined in include/linux/interrupt.h:

==================== include/linux/interrupt.h 14 21 ====================
14 struct irqaction {
15         void (*handler)(int, void *, struct pt_regs *);
16         unsigned long flags;
17         unsigned long mask;
18         const char *name;
19         void *dev_id;
20         struct irqaction *next;
21 };

The most important member is the function pointer handler, which points to the actual interrupt service routine.

Right after the IDT has been initialized, every interrupt service queue is empty. At this point, even if interrupts are enabled and some device interrupt
really does occur, it receives no actual service. From the point of view of the interrupting hardware and of the interrupt controller it may appear to have been serviced, because formally the CPU did pass through the interrupt gate into the general service routine for some interrupt vector, for example IRQ0x01_interrupt(), did perform the required ack() and end() on the interrupt controller, and then executed an iret instruction to return from the interrupt. But logically and functionally no real service has been given, because no specific interrupt service routine was executed. Real interrupt service can take place only after the initialization code of a particular device has "registered" its interrupt service routine with the system through request_irq(), hooking it into an interrupt request queue.

The code of request_irq() is in arch/i386/kernel/irq.c:

==================== arch/i386/kernel/irq.c 630 705 ====================
630 /**
631  *      request_irq - allocate an interrupt line
632  *      @irq: Interrupt line to allocate
633  *      @handler: Function to be called when the IRQ occurs
634  *      @irqflags: Interrupt type flags
635  *      @devname: An ascii name for the claiming device
636  *      @dev_id: A cookie passed back to the handler function
637  *
638  *      This call allocates interrupt resources and enables the
639  *      interrupt line and IRQ handling. From the point this
640  *      call is made your handler function may be invoked. Since
641  *      your handler function must clear any interrupt the board
642  *      raises, you must take care both to initialise your hardware
643  *      and to set up the interrupt handler in the right order.
644  *
645  *      Dev_id must be globally unique. Normally the address of the
646  *      device data structure is used as the cookie. Since the handler
647  *      receives this value it makes sense to use it.
648  *
649  *      If your interrupt is shared you must pass a non NULL dev_id
650  *      as this is required when freeing the interrupt.
651  *
652  *      Flags:
653  *
654  *      SA_SHIRQ                Interrupt is shared
655  *
656  *      SA_INTERRUPT            Disable local interrupts while processing
657  *
658  *      SA_SAMPLE_RANDOM        The interrupt can be used for entropy
659  *
660  */
661
662 int request_irq(unsigned int irq,
663                 void (*handler)(int, void *, struct pt_regs *),
664                 unsigned long irqflags,
665                 const char * devname,
666                 void *dev_id)
667 {
668         int retval;
669         struct irqaction * action;
670
671 #if 1
672         /*
673          * Sanity-check: shared interrupts should REALLY pass in
674          * a real dev-ID, otherwise we'll have trouble later trying
675          * to figure out which interrupt is which (messes up the
676          * interrupt freeing logic etc).
677          */
678         if (irqflags & SA_SHIRQ) {
679                 if (!dev_id)
680                         printk("Bad boy: %s (at 0x%x) called us without a dev_id!\n", devname, (&irq)[-1]);
681         }
682 #endif
683
684         if (irq >= NR_IRQS)
685                 return -EINVAL;
686         if (!handler)
687                 return -EINVAL;
688
689         action = (struct irqaction *)
690                         kmalloc(sizeof(struct irqaction), GFP_KERNEL);
691         if (!action)
692                 return -ENOMEM;
693
694         action->handler = handler;
695         action->flags = irqflags;
696         action->mask = 0;
697         action->name = devname;
698         action->next = NULL;
699         action->dev_id = dev_id;
700
701         retval = setup_irq(irq, action);
702         if (retval)
703                 kfree(action);
704         return retval;
705 }

The parameter irq is the index of the interrupt request queue, what is usually called the "interrupt request number". It corresponds to one channel of the interrupt controller, and sometimes has to be set with DIP switches or jumpers on the interface card. Note that this interrupt request number is not the same as the "interrupt number" or "interrupt vector" used by the CPU: interrupt request number IRQ0 corresponds to interrupt vector 0x20. One might regard the interrupt request number as a "logical" interrupt vector and the latter as the "physical" interrupt vector. Usually the first 16 interrupt request channels, IRQ0 to IRQ15, are controlled by the i8259A interrupt controller. The parameter irqflags is a set of flag bits. The SA_SHIRQ flag indicates that the interrupt request channel is shared with other interrupt sources; in that case a non-zero dev_id must be supplied to tell them apart. When an interrupt occurs, dev_id is passed back as an argument to the specified service routine. What dev_id actually is does not matter to request_irq() or to the general interrupt service control, as long as each specific service routine can recognize and use it; that is why its type here is void *. request_irq() does check for it, though. Incidentally, printk() produces an error message, which is usually written to the file /var/log/messages or shown on the screen, depending on whether the daemons syslogd and klogd are already running. The interesting part is the argument (&irq)[-1] in that statement. irq is the first argument of the call and is therefore the last one pushed onto the stack, so &irq is the position of the argument irq on the stack. What lies just below &irq? The return address of the function. So this printk() statement shows where request_irq() was called from, and the programmer can use this address to find out which function made the call.

After an irqaction structure, action, has been allocated and filled in, setup_irq() is called to link it into the corresponding interrupt request queue. Its code is in the same file (arch/i386/kernel/irq.c):

==================== arch/i386/kernel/irq.c 958 1014 ====================
958 /* this was setup_x86_irq but it seems pretty generic */
959 int setup_irq(unsigned int irq, struct irqaction * new)
960 {
961         int shared = 0;
962         unsigned long flags;
963         struct irqaction *old, **p;
964         irq_desc_t *desc = irq_desc + irq;
965
966         /*
967          * Some drivers like serial.c use request_irq() heavily,
968          * so we have to be careful not to interfere with a
969          * running system.
970          */
971         if (new->flags & SA_SAMPLE_RANDOM) {
972                 /*
973                  * This function might sleep, we want to call it first,
974                  * outside of the atomic block.
975                  * Yes, this might clear the entropy pool if the wrong
976                  * driver is attempted to be loaded, without actually
977                  * installing a new handler, but is this really a problem,
978                  * only the sysadmin is able to do this.
979                  */
980                 rand_initialize_irq(irq);
981         }
982
983         /*
984          * The following block of code has to be executed atomically
985          */
986         spin_lock_irqsave(&desc->lock,flags);
987         p = &desc->action;
988         if ((old = *p) != NULL) {
989                 /* Can't share interrupts unless both agree to */
990                 if (!(old->flags & new->flags & SA_SHIRQ)) {
991                         spin_unlock_irqrestore(&desc->lock,flags);
992                         return -EBUSY;
993                 }
994
995                 /* add new interrupt at end of irq queue */
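996                 do {
997                         p = &old->next;
998                         old = *p;
999                 } while (old);
1000                 shared = 1;
1001         }
1002
1003         *p = new;
1004
1005         if (!shared) {
1006                 desc->depth = 0;
1007                 desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
1008                 desc->handler->startup(irq);
1009         }
1010         spin_unlock_irqrestore(&desc->lock,flags);
1011
1012         register_irq_proc(irq);
1013         return 0;
1014 }

Computer systems often need to generate random numbers, but generating truly random numbers is impossible (which is why numbers generated by a computer are called "pseudo-random"). To get as close to random as possible, some random factors, known as "entropy", have to be introduced while the system runs. Interrupt requests from the various interrupt sources are for the most part fairly random in time and can serve as such a factor. The Linux kernel therefore provides a means of introducing some randomness based on the times at which interrupts occur. When this randomness is wanted for some interrupt request queue, or interrupt request channel, the flag SA_SAMPLE_RANDOM can be set to 1 in the argument irqflags. The function rand_initialize_irq() called here then initializes a data structure for that interrupt request queue to record the timing of the interrupt.

Operations on an interrupt request queue must obviously not be disturbed, and have to take place inside a critical section: not only must interrupts be disabled, interference from other processors must be prevented as well. The call to spin_lock_irqsave() on line 986 puts the CPU into such a critical section. We introduce and discuss spin_lock_irqsave() in the chapter "Multiprocessor SMP Architecture" in the second volume of this book; its counterpart spin_unlock_irqrestore() is the exit from the critical section.

Handling the first irqaction structure added to a queue is relatively simple (line 1003), although the head of the queue then needs some initialization (lines 1006 to 1008), including a call to the queue's startup function. An irqaction structure added later has to be checked a little: the check is whether sharing the interrupt channel is allowed, and only if both the new structure and the first structure in the queue allow sharing is the new one linked in at the tail of the queue.

In the kernel, device drivers generally register their interrupt service routines with the system through request_irq().

3.4 Interrupt Response and Service

Having understood the interrupt mechanism of the i386 CPU and the related initialization in the kernel, we can now follow the path taken by the CPU from the occurrence of an interrupt request, through the CPU's response, to the call of the interrupt service routine and the return from it. This lets us understand the overall layout and arrangement of interrupt response and service in the Linux kernel, and along the way introduce some of the related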
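"infrastructure" inside the kernel. An understanding of both will help the reader understand the kernel as a whole.

Here we assume that the device drivers have all completed their initialization and have hooked their interrupt service routines into the appropriate interrupt request queues, that the system is running normally in user space (so interrupts are necessarily enabled), and that some device has generated an interrupt request. The request has reached the CPU's "interrupt request" pin INTR through the i8259A interrupt controller. Since interrupts are enabled, the CPU responds to the request after finishing the current instruction.

The CPU obtains the interrupt vector from the interrupt controller and uses it to find the corresponding entry in the interrupt vector table IDT; that entry should be an interrupt gate. Following the gate, the CPU reaches the entry of the general service routine for the channel, which we assume is IRQ0x03_interrupt. The interrupt occurred while the CPU was running in user space, so the current privilege level CPL is 3, while the interrupt service routine belongs to the kernel and its level DPL is 0; the two differ. The CPU therefore takes the stack pointer for the kernel (level 0) from the current TSS pointed to by register TR and switches to the kernel stack, that is, the system-space stack of the current process. It should be pointed out that the CPU's operations on the kernel stack are always balanced each time it is used, so every time the CPU returns from system space to user space the stack pointer is back at its origin, the "bottom of the stack". In other words, when the CPU takes the kernel stack pointer from the TSS and switches to the kernel stack, that stack is always empty. So when the CPU enters IRQ0x03_interrupt, the stack holds nothing but the contents of the EFLAGS register and the return address (together with the old stack pointer, pushed because the stack was switched, as described in Section 3.1). Also, since the gate passed through is an interrupt gate (not a trap gate), interrupts have been disabled; no other interrupt can occur until they are enabled again.

We have already seen the code of the general entry IRQ0xYY_interrupt, but for convenience we list it here again. Besides, our understanding can now go a little deeper.

As said before, the general entries of the service routines for all shared interrupt requests are generated by gcc's preprocessor, and all follow the same pattern:

__asm__ ( \
"\n" \
"IRQ0x03_interrupt: \n\t" \
"pushl $0x03 - 256 \n\t" \
"jmp common_interrupt");

The purpose of this code is to push a value related to the interrupt request number onto the stack, so that common_interrupt can use it to determine the source of this interrupt. But why subtract 256 from the interrupt request number 0x03 to make it negative? Would it not be more straightforward to use the value 0x03 itself? The reason is that this position on the system stack is used to hold the system call number when the kernel is entered for a system call, and system calls share some subroutines with interrupt service. So some means of telling them apart is needed. Of course, distinguishing system call numbers from interrupt request numbers does not strictly require making one of them negative. Adding a constant such as 0x1000 to the interrupt request number would also work. But if run-time efficiency is considered, making one of them negative is clearly the most efficient. Once an integer has been loaded into a general-purpose register, testing whether it is greater than or equal to 0 is easy and takes a single register instruction, such as "orl %%eax, %%eax" or "testl %%ecx, %%ecx". Comparing it with another constant would take at least one more memory access. This example also shows that some code in the kernel that looks simple, as if it were an arbitrary decision by the author, has in fact been thought out carefully.

The common jump target common_interrupt() is defined in include/asm-i386/hw_irq.h:

==================== include/asm-i386/hw_irq.h 152 161 ====================
[IRQ0x03_interrupt->common_interrupt]
152 #define BUILD_COMMON_IRQ() \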
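153 asmlinkage void call_do_IRQ(void); \
154 __asm__( \
155         "\n" __ALIGN_STR"\n" \
156         "common_interrupt:\n\t" \
157         SAVE_ALL \
158         "pushl $ret_from_intr\n\t" \
159         SYMBOL_NAME_STR(call_do_IRQ)":\n\t" \
160         "jmp "SYMBOL_NAME_STR(do_IRQ));
161

The main operation here is the macro SAVE_ALL, the so-called "saving of context": the contents of all registers as they were just before the interrupt are saved on the stack, to be "restored" just before returning when interrupt service is finished. SAVE_ALL is defined in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 86 99 ====================
86 #define SAVE_ALL \
87         cld; \
88         pushl %es; \
89         pushl %ds; \
90         pushl %eax; \
91         pushl %ebp; \
92         pushl %edi; \
93         pushl %esi; \
94         pushl %edx; \
95         pushl %ecx; \
96         pushl %ebx; \
97         movl $(__KERNEL_DS),%edx; \
98         movl %edx,%ds; \
99         movl %edx,%es;

Two points should be noted here. First, the contents of the flags register EFLAGS are not saved by SAVE_ALL, because the CPU already pushed them onto the stack, together with the return address, when it entered interrupt service. Second, the original contents of the segment registers DS and ES are saved on the stack and the registers are then changed to point to __KERNEL_DS, the segment used by the kernel. As explained in Chapter 2, __KERNEL_DS and __USER_DS both refer to the address space starting at 0; they differ only in privilege level DPL, one being level 0 and the other level 3. As for the original contents of the stack segment register SS and the stack pointer ESP, either they have already been pushed onto the stack (if the stack was switched), or they simply continue in use and need not be saved (if it was not). After SAVE_ALL, the contents of the stack are thus as shown in Figure 3.6.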
Figure 3.6 The system stack on entry to the interrupt service routine

At this point the position of each item on the system stack, relative to the stack pointer, is as shown in Figure 3.6. arch/i386/kernel/entry.S defines a set of constants that reflect these relations:

==================== arch/i386/kernel/entry.S 50 64 ====================
 50 EBX       = 0x00
 51 ECX       = 0x04
 52 EDX       = 0x08
 53 ESI       = 0x0C
 54 EDI       = 0x10
 55 EBP       = 0x14
 56 EAX       = 0x18
 57 DS        = 0x1C
 58 ES        = 0x20
 59 ORIG_EAX  = 0x24
 60 EIP       = 0x28
 61 CS        = 0x2C
 62 EFLAGS    = 0x30
 63 OLDESP    = 0x34
 64 OLDSS     = 0x38

When EAX, for example, appears in the code of entry.S, it does not denote the register %eax; it denotes the displacement, relative to the current stack pointer, of the place on the system stack where that register's contents are saved. The slot holding the value (interrupt number - 256), which was pushed before the jump to common_interrupt, is called ORIG_EAX. To the interrupt service code it represents the interrupt request number.

Back to the code of common_interrupt. After SAVE_ALL, a program label (an entry point), ret_from_intr, is pushed on the stack, and a jmp instruction transfers control to another piece of code, do_IRQ(). The reader may have noticed that IRQ0x03_interrupt and common_interrupt are not functions in any real sense: neither contains an instruction equivalent to a return, so control cannot return from common_interrupt to IRQ0x03_interrupt, nor can IRQ0x03_interrupt perform the return from interrupt. do_IRQ(), however, is a function. Pushing the return address ret_from_intr before jumping to do_IRQ() therefore simulates a function call, as if do_IRQ() had been called just before the CPU reached the first instruction of ret_from_intr. When do_IRQ() returns, it "returns" to ret_from_intr and execution continues there. do_IRQ() is defined in arch/i386/kernel/irq.c. Let us look at its first few lines:

==================== arch/i386/kernel/irq.c 543 565 ====================
[IRQ0x03_interrupt->common_interrupt->do_IRQ()]
 543 /*
 544  * do_IRQ handles all normal device IRQ's (the special
 545  * SMP cross-CPU interrupts have their own specific
 546  * handlers).
 547  */
 548 asmlinkage unsigned int do_IRQ(struct pt_regs regs)
 549 {
 550     /*
 551      * We ack quickly, we don't want the irq controller
 552      * thinking we're snobs just because some other CPU has
 553      * disabled global interrupts (we have already done the
 554      * INT_ACK cycles, it's too late to try to pretend to the
 555      * controller that we aren't taking the interrupt).
 556      *
 557      * 0 return value means that this irq is already being
 558      * handled by some other CPU. (or is disabled)
 559      */
 560     int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
 561     int cpu = smp_processor_id();
 562     irq_desc_t *desc = irq_desc + irq;
 563     struct irqaction * action;
 564     unsigned int status;
 565

The function's argument is a pt_regs data structure. Note that it is the structure itself, not a pointer to it. In other words, the stack locations above the return address must hold an image of that structure. The pt_regs structure is defined in include/asm-i386/ptrace.h:

==================== include/asm-i386/ptrace.h 23 42 ====================
 23 /* this struct defines the way the registers are stored on the
 24    stack during a system call. */
 25
 26 struct pt_regs {
 27     long ebx;
 28     long ecx;
 29     long edx;
 30     long esi;
 31     long edi;
 32     long ebp;
 33     long eax;
 34     int  xds;
 35     int  xes;
 36     long orig_eax;
 37     long eip;
 38     int  xcs;
 39     long eflags;
 40     long esp;
 41     int  xss;
 42 };

The reader will surely connect this with the stack contents described above and see the point at once: everything done so far, including what the CPU does automatically on entering the interrupt, serves to build a simulated subroutine-call environment for do_IRQ(). Inside do_IRQ() the contents of every register just before the interrupt are conveniently available, and when do_IRQ() finishes it returns to ret_from_intr, from where the return from interrupt is performed. It follows that when do_IRQ() calls a specific interrupt service routine it must pass the pt_regs contents on as well, although at that stage a pointer is enough. Recall the page fault service routine do_page_fault() from Chapter 2, whose parameter list is:

asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code);

Its first parameter is a pointer to struct pt_regs, which in fact points to that very area of the system stack. We could not explain this clearly at the time and so passed over it; seen together with the interrupt entry sequence it becomes obvious. A page fault, however, is not a general interrupt request but a vector reserved for the CPU, so it does not travel through do_IRQ(). Even so, the arrangement of the system stack is essentially the same. As the reader will see later, this stack arrangement is used not only for interrupts but also for system calls.

As explained earlier, IRQ0x03_interrupt pushes the value (0x03 - 256) on the stack so that the common interrupt handling code can tell where the interrupt came from, and establishing this is the first thing do_IRQ() does. Taking IRQ3 as the example, the value pushed is 0xffffff03; reading it back through regs.orig_eax and masking off the high bits yields 0x03 again. Since do_IRQ() serves interrupts only, it need not take the system call case into account.

smp_processor_id() on line 561 exists for multiprocessor (SMP) configurations; on a uniprocessor system it always returns 0. Now that the interrupt request number has been recovered, finding the corresponding interrupt request queue in the array irq_desc[] is trivial (line 562). What follows operates on that particular queue. We continue reading do_IRQ():

==================== arch/i386/kernel/irq.c 566 587 ====================
[IRQ0x03_interrupt->common_interrupt->do_IRQ()]
 566     kstat.irqs[cpu][irq]++;
 567     spin_lock(&desc->lock);
 568     desc->handler->ack(irq);
 569     /*
 570        REPLAY is when Linux resends an IRQ that was dropped earlier
 571        WAITING is used by probe to mark irqs that are being tested
 572        */
 573     status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
 574     status |= IRQ_PENDING; /* we _want_ to handle it */
 575
 576     /*
 577      * If the IRQ is disabled for whatever reason, we cannot
 578      * use the action we have.
 579      */
 580     action = NULL;
 581     if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
 582         action = desc->action;
 583         status &= ~IRQ_PENDING; /* we commit to handling */
 584         status |= IRQ_INPROGRESS; /* we are handling it */
 585     }
 586     desc->status = status;
 587

When the CPU enters interrupt service through an interrupt gate, its interrupt response is switched off automatically. If interrupts are already disabled, why does line 567 still call spin_lock()? The lock is there for the multiprocessor case, which is covered in the chapter on the multiprocessor SMP architecture; for now we consider uniprocessor systems only.

After the interrupt controller (for example an i8259A) has reported an interrupt request to the CPU, it expects an acknowledgement (ACK) from the CPU meaning "I am handling it". Line 568 does exactly that. How the function pointer desc->handler->ack is set up was described earlier. Lines 569 to 586 examine and update desc->status, the state of the interrupt channel. The key step is to set the IRQ_INPROGRESS flag to 1 and clear the IRQ_PENDING flag to 0. IRQ_INPROGRESS exists mainly for multiprocessor systems; the role of IRQ_PENDING will become clear in a moment:

==================== arch/i386/kernel/irq.c 588 623 ====================
[IRQ0x03_interrupt->common_interrupt->do_IRQ()]
 588     /*
 589      * If there is no IRQ handler or it was disabled, exit early.
 590        Since we set PENDING, if another processor is handling
 591        a different instance of this same irq, the other processor
 592        will take care of it.
 593      */
 594     if (!action)
 595         goto out;
 596
 597     /*
 598      * Edge triggered interrupts need to remember
 599      * pending events.
 600      * This applies to any hw interrupts that allow a second
 601      * instance of the same irq to arrive while we are in do_IRQ
 602      * or in the handler. But the code here only handles the _second_
 603      * instance of the irq, not the third or fourth. So it is mostly
 604      * useful for irq hardware that does not mask cleanly in an
 605      * SMP environment.
 606      */
 607     for (;;) {
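 608         spin_unlock(&desc->lock);
 609         handle_IRQ_event(irq, &regs, action);
 610         spin_lock(&desc->lock);
 611
 612         if (!(desc->status & IRQ_PENDING))
 613             break;
 614         desc->status &= ~IRQ_PENDING;
 615     }
 616     desc->status &= ~IRQ_INPROGRESS;
 617 out:
 618     /*
 619      * The ->end() handler has to deal with interrupts which got
 620      * disabled while the handler was running.
 621      */
 622     desc->handler->end(irq);
 623     spin_unlock(&desc->lock);

If service for an interrupt request queue is switched off (the IRQ_DISABLED flag is 1), or the IRQ_INPROGRESS flag is 1, or the queue is empty, then the pointer action is NULL (see lines 580 and 582). Nothing further can be done and the function has to return. In each of these cases, however, the IRQ_PENDING flag in desc->status is left at 1 (see lines 574 and 583). Later, when a CPU (on a multiprocessor system possibly a different CPU) enables service for this queue, it will see the flag and make up the missed service; this is called "IRQ_REPLAY". If the queue is empty, the whole channel is necessarily disabled as well, because a channel is enabled only when the first service routine is hooked into its queue, so these two cases are really the same. The remaining case is that service is enabled and the queue is not empty, yet IRQ_INPROGRESS is 1. This can happen in only two situations. The first arises on a multiprocessor SMP system: one CPU is servicing the interrupt when another CPU enters do_IRQ(). Because the queue's IRQ_INPROGRESS flag is 1, the second CPU returns through line 595, leaving IRQ_PENDING in desc->status set to 1. The second arises on a uniprocessor system: the CPU is already inside an interrupt service routine, interrupts have for some reason been enabled again, and another interrupt arrives on the same channel. The later interrupt likewise returns through line 595 because IRQ_INPROGRESS is 1, and it too leaves IRQ_PENDING in desc->status set to 1. In both situations the outcome is the same: the IRQ_PENDING flag in desc->status is 1.

How, then, does IRQ_PENDING take effect? Look at lines 612 and 613. They sit inside an endless for loop in which the actual interrupt service is carried out by handle_IRQ_event() on line 609. On entry to line 609 the IRQ_PENDING flag in desc->status is necessarily 0. When the CPU has finished the service and is back at line 610, the loop ends at line 613 if the flag is still 0. If it has become 1, one of the situations above has occurred in the meantime, so the loop returns to line 609 and services the queue once more. In this way a nesting of interrupts on the same channel (possibly even from the same source) is converted into a loop.

Interrupt handling on any one channel is thus strictly serialized. For a single CPU, nested interrupt service is not allowed; for different CPUs, concurrent entry into the same interrupt service routine is not allowed. Without this treatment every interrupt service routine would have to be reentrant "pure code", which would complicate its design and implementation. The mechanism is undeniably thorough and ingenious, and the stability and reliability of Linux are rooted in designs of this kind, inherited from the Unix era and tested by time. In extreme cases, of course, the following can still happen: the interrupt service routine always enables interrupts, and the source keeps generating requests, so that IRQ_PENDING is 1 every time the CPU returns from handle_IRQ_event() and the for loop at line 607 becomes a truly infinite loop. If this really occurs and goes uncorrected, the author of that service routine should look for another line of work.

Note also that every change to desc->status is made with the lock held, again out of consideration for multiprocessor SMP systems.

Finally, after the loop ends, and provided service for this queue is still enabled, an "end of interrupt service" operation is performed on the interrupt controller (line 622). What it does depends on the requirements of the controller hardware, and the function called was also set up when the queue was initialized.

Now for handle_IRQ_event(), called in the for loop above. This function runs each interrupt service routine in the queue in turn, letting each determine whether the request came from its own device, that is, its interrupt source, and if so provide the appropriate service. Its code is also in arch/i386/kernel/irq.c:

==================== arch/i386/kernel/irq.c 418 449 ====================
[IRQ0x03_interrupt->common_interrupt->do_IRQ()>handle_IRQ_event()]
 418 /*
 419  * This should really return information about whether
 420  * we should do bottom half handling etc. Right now we
 421  * end up _always_ checking the bottom half, which is a
 422  * waste of time and is not what some drivers would
 423  * prefer.
 424  */
 425 int handle_IRQ_event(unsigned int irq, struct pt_regs * regs, struct irqaction * action)
 426 {
 427     int status;
 428     int cpu = smp_processor_id();
 429
 430     irq_enter(cpu, irq);
 431
 432     status = 1; /* Force the "do bottom halves" bit */
 433
 434     if (!(action->flags & SA_INTERRUPT))
 435         __sti();
 436
 437     do {
 438         status |= action->flags;
 439         action->handler(irq, action->dev_id, regs);
 440         action = action->next;
 441     } while (action);
 442     if (status & SA_SAMPLE_RANDOM)
 443         add_interrupt_randomness(irq);
 444     __cli();
 445
 446     irq_exit(cpu, irq);
 447
 448     return status;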
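 449 }

irq_enter() on line 430 and irq_exit() on line 446 merely operate on a counter. Both are defined in include/asm-i386/hardirq.h:

==================== include/asm-i386/hardirq.h 34 35 ====================
 34 #define irq_enter(cpu, irq) (local_irq_count(cpu)++)
 35 #define irq_exit(cpu, irq)  (local_irq_count(cpu)--)

A nonzero value of this counter means that the CPU is inside an interrupt service routine. As the reader will see later, some operations are not permitted during that time.

In general, interrupt service routines run with interrupts disabled (the non-maskable interrupt NMI excepted), which is why the CPU disables interrupts automatically when passing through an interrupt gate. Disabling interrupts is a tool that can neither be done without nor be abused. When a service routine is long and its work is complex, keeping interrupts off for too long may cause other interrupts to be lost. Experience shows that nesting interrupts from the same source or on the same channel should be avoided, and the kernel guarantees this in do_IRQ() through the IRQ_PENDING flag. Nesting interrupts on different channels, on the other hand, is workable if handled properly, though great care is required. For this reason, when request_irq() is called to hook a service routine into an interrupt request queue, the SA_INTERRUPT flag in the irqflags argument may be set to 0, indicating that the routine should run with interrupts enabled. Lines 434 to 435 and line 444 exist for this purpose (__sti() enables interrupts, __cli() disables them).

The do-while loop on lines 437 to 441 does the real work: it calls every interrupt service routine in the queue in turn. Three arguments are passed. irq is the interrupt request number. action->dev_id is a void pointer whose interpretation and use are left to the service routine; the device driver chose it when it called request_irq(). The last is regs, the pt_regs pointer mentioned above. The service routines themselves belong to the domain of device drivers and are not discussed here.

The reader may ask: if a queue holds several service routines, and every request on that channel causes all of them to be run, does efficiency not suffer badly? It does suffer, but not seriously. First, every service routine should (and usually does) begin by checking its own interrupt source, typically by reading the interrupt status register of the device (on its interface card) to see whether the device has raised a request, and return at once if it has not. This takes only a few machine instructions. Second, the number of service routines in any one queue is generally small. In practice, then, the effect is not significant.

Finally, on lines 442 to 443, if some service routine in the queue is meant to contribute randomness to the system, add_interrupt_randomness() is called to do so. Details follow in the chapter on device drivers.

The lowest bit of the status returned by handle_IRQ_event() is always 1; it is set on line 432. The code carries a comment about this (lines 418 to 424) whose meaning becomes clear from the next fragment. We follow the CPU back into do_IRQ() and read on:

==================== arch/i386/kernel/irq.c 624 628 ====================
[IRQ0x03_interrupt->common_interrupt->do_IRQ()]
 624
 625     if (softirq_active(cpu) & softirq_mask(cpu))
 626         do_softirq();
 627     return 1;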
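 628 }

By line 624 the servicing of the interrupt request appears, logically speaking, to be complete, and the function could return. The Linux kernel, however, has one more consideration at this point: the softirq, an interrupt request that is "soft" (in its timing), formerly known as the "bottom half". In Linux the designer of a device driver may split interrupt service into two "halves", which are really two parts and need not be equal. The first part must be executed immediately, normally with interrupts disabled, and separately for every request. The second part, the "bottom half", can be executed a little later with interrupts enabled, and the leftover work of several interrupts can often be merged and done together. Such operations tend to be time-consuming, and so are unsuitable for execution with interrupts disabled, or for occupying the CPU so long at a stretch that service to other interrupt requests suffers. This is the so-called bottom half, usually abbreviated bh in the kernel code. As an analogy, consider typing a string on the keyboard in "cooked mode" (see the chapter on device drivers). Each time a key is pressed, the character must first be read in; this belongs in the "top half". Checking whether the key was Enter, deciding whether a complete string has been entered, and waking the sleeping process can all be left to the "bottom half".

The mechanism for executing bh functions is part of the kernel's infrastructure and is described separately in the next section. For now the reader need only know that it exists.

Once do_softirq() has executed the relevant bh functions (if any), it is time to return from do_IRQ(). Return to where? To the label ret_from_intr in entry.S, as the kernel has so carefully arranged. Its code is in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 273 280 ====================
[IRQ0x03_interrupt->common_interrupt->...->ret_from_intr]
 273 ENTRY(ret_from_intr)
 274     GET_CURRENT(%ebx)
 275     movl EFLAGS(%esp),%eax  # mix EFLAGS and CS
 276     movb CS(%esp),%al
 277     testl $(VM_MASK | 3),%eax  # return to VM86 mode or non-supervisor?
 278     jne ret_with_reschedule
 279     jmp restore_all
 280

GET_CURRENT(%ebx) places a pointer to the task_struct of the current process in register EBX. Lines 275 and 276 assemble in EAX a 32-bit value consisting of the saved EFLAGS, whose upper 16 bits are the part of interest, with its lowest byte replaced by the low 8 bits of the saved code segment register CS. The purpose is to check:

· whether the CPU was running in VM86 mode just before the interrupt;
· whether the CPU was running in user space or in system space just before the interrupt.

VM86 mode exists to run DOS software in emulation under i386 protected mode. A flag in the upper 16 bits of EFLAGS indicates that the CPU is running in VM86 mode. We are not interested in VM86 mode and will not pursue it. The lowest two bits of CS are another matter: they give the CPU's current privilege level (CPL) at the time of the interrupt. Linux uses only two privilege levels, 0 for the system and 3 for the user, so if the lowest two bits of CS are nonzero, the interrupt occurred in user space.

Incidentally, EFLAGS(%esp) on line 275 denotes the contents of the address obtained by adding the constant EFLAGS to the current value of the stack pointer %esp, which is where the pre-interrupt contents of %eflags are saved on the stack. The constant EFLAGS was introduced earlier; its value is 0x30. The same applies to CS(%esp) on line 276.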
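The test performed by lines 275 to 277 of ret_from_intr can be restated in C. The sketch below is illustrative rather than kernel code. It assumes that VM_MASK is bit 17 of EFLAGS (the architectural VM flag); the function name returns_to_user_or_vm86 is hypothetical, and the selector values in the usage note are example numbers, not taken from the listings above.

```c
#include <assert.h>
#include <stdint.h>

#define VM_MASK 0x00020000u   /* assumed: EFLAGS.VM, bit 17 */

/*
 * Mirror of "movl EFLAGS(%esp),%eax; movb CS(%esp),%al; testl ...":
 * start from the saved EFLAGS, overwrite its low byte with the low
 * byte of the saved CS, then test the VM flag and the two CPL bits.
 * Returns nonzero when the interrupted context was VM86 or user mode.
 */
int returns_to_user_or_vm86(uint32_t saved_eflags, uint16_t saved_cs)
{
    uint32_t mixed = (saved_eflags & ~0xffu) | (saved_cs & 0xffu);
    return (mixed & (VM_MASK | 3u)) != 0;
}
```

With a code selector whose low two bits are 0 (say 0x10) and an EFLAGS value without the VM bit, the function returns 0 and the assembly would jump to restore_all; with a selector whose low two bits are 3 (say 0x23) it returns 1, corresponding to the jump to ret_with_reschedule.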
If the interrupt occurred in system space, control passes directly to restore_all; if it occurred in user space (or in VM86 mode), control passes to ret_with_reschedule. We assume here that the interrupt occurred in user space, since ret_with_reschedule eventually reaches restore_all as well. This code is in the same file (arch/i386/kernel/entry.S):

==================== arch/i386/kernel/entry.S 217 233 ====================
[IRQ0x03_interrupt->common_interrupt->...->ret_from_intr->ret_with_reschedule]
 217 ret_with_reschedule:
 218     cmpl $0,need_resched(%ebx)
 219     jne reschedule
 220     cmpl $0,sigpending(%ebx)
 221     jne signal_return
 222 restore_all:
 223     RESTORE_ALL
 224
 225     ALIGN
 226 signal_return:
 227     sti    # we can get here from an interrupt handler
 228     testl $(VM_MASK),EFLAGS(%esp)
 229     movl %esp,%eax
 230     jne v86_signal_return
 231     xorl %edx,%edx
 232     call SYMBOL_NAME(do_signal)
 233     jmp restore_all

The first check is whether a process reschedule is needed. As seen above, register EBX holds the pointer to the current process's task_struct, so need_resched(%ebx) denotes the contents at displacement need_resched within that task_struct. The same goes for sigpending(%ebx) on line 220. The constants need_resched and sigpending are defined as follows (see arch/i386/kernel/entry.S):

==================== arch/i386/kernel/entry.S 71 79 ====================
 71 /*
 72  * these are offsets into the task-struct.
 73  */
 74 state        =  0
 75 flags        =  4
 76 sigpending   =  8
 77 addr_limit   = 12
 78 exec_domain  = 16
 79 need_resched = 20

A nonzero need_resched field in the current process's task_struct means that rescheduling is required. reschedule is also in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 287 289 ====================
[IRQ0x03_interrupt->common_interrupt->...->ret_from_intr->ret_with_reschedule->reschedule]
 287 reschedule:
 288     call SYMBOL_NAME(schedule)    # test
 289     jmp ret_from_sys_call
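The order of the two checks in ret_with_reschedule can be summarized as a small decision function. This is a sketch for illustration only: struct task_view is a hypothetical stand-in carrying just the two fields tested by the assembly, not the kernel's task_struct, and the enum names are invented.

```c
#include <assert.h>

/* Hypothetical stand-in for the two task_struct fields that are tested. */
struct task_view {
    long need_resched;
    int  sigpending;
};

enum exit_path {
    PATH_RESCHEDULE,     /* jne reschedule     (line 219) */
    PATH_SIGNAL_RETURN,  /* jne signal_return  (line 221) */
    PATH_RESTORE_ALL     /* fall through to restore_all   */
};

/* need_resched is examined first, so it wins when both fields are set. */
enum exit_path ret_with_reschedule_path(const struct task_view *t)
{
    if (t->need_resched != 0)
        return PATH_RESCHEDULE;
    if (t->sigpending != 0)
        return PATH_SIGNAL_RETURN;
    return PATH_RESTORE_ALL;
}
```

Rescheduling takes precedence over signal delivery simply because its comparison comes first; pending signals are not forgotten in that case, since the path taken after schedule() eventually arrives at the same checks again.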
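Here the code calls the function schedule() to reschedule and then jumps to ret_from_sys_call, which is discussed further in the section on system calls. schedule() itself is covered in the chapter on processes; for now we assume that no rescheduling is needed. The reader will see later that even when rescheduling does take place, the path from ret_from_sys_call, though somewhat winding, also ends at restore_all.

Likewise, a nonzero sigpending field in the current process's task_struct means that the process has "signals" waiting to be handled. These must be dealt with before the final return from the interrupt, so control first goes to line 226. Line 228 distinguishes VM86 mode, then register %edx is cleared to 0 (line 231) and do_signal() is called. A "signal" is essentially a means of inter-process communication and is described in the chapter on inter-process communication. After the signals have been handled, control returns to restore_all on line 222. In fact ret_from_sys_call eventually comes back to ret_from_intr as well, so all paths converge on restore_all, from where the return from interrupt is performed. The macro RESTORE_ALL is defined in the same file (arch/i386/kernel/entry.S):

==================== arch/i386/kernel/entry.S 101 112 ====================
 101 #define RESTORE_ALL \
 102     popl %ebx; \
 103     popl %ecx; \
 104     popl %edx; \
 105     popl %esi; \
 106     popl %edi; \
 107     popl %ebp; \
 108     popl %eax; \
 109 1:  popl %ds; \
 110 2:  popl %es; \
 111     addl $4,%esp; \
 112 3:  iret; \

This is clearly the counterpart of the macro SAVE_ALL executed on entry to the kernel. For ease of comparison, SAVE_ALL (arch/i386/kernel/entry.S) is listed here once more:

==================== arch/i386/kernel/entry.S 86 100 ====================
 86 #define SAVE_ALL \
 87     cld; \
 88     pushl %es; \
 89     pushl %ds; \
 90     pushl %eax; \
 91     pushl %ebp; \
 92     pushl %edi; \
 93     pushl %esi; \
 94     pushl %edx; \
 95     pushl %ecx; \
 96     pushl %ebx; \
 97     movl $(__KERNEL_DS),%edx; \
 98     movl %edx,%ds; \
 99     movl %edx,%es;
 100

Why does line 111 of RESTORE_ALL add 4 to the stack pointer? To skip ORIG_EAX, the (transformed) interrupt request number pushed at the very start of interrupt entry. As we have seen, the first thing do_IRQ() does is extract its lowest 8 bits and use them as the index into irq_desc[] to find the corresponding interrupt service descriptor. Its further uses will appear when system calls and exceptions are discussed. The reader may ask why it is not removed with a popl instruction like the other items on the stack. Normally a popl would indeed be used, but a pop instruction needs a register as its destination, and by now every register is occupied: where could the value be popped to?

Thus, when the CPU reaches the iret instruction on line 112, the system stack is back in the state it had just after passing through the interrupt gate, and iret makes the CPU return from the interrupt. Mirroring what happened on entry, a return from system mode to user mode switches the current stack back to the user stack.

3.5 Softirqs and Bottom Halves

Interrupt service is generally carried out with interrupts disabled, to avoid nesting and the complicated control it entails. But if interrupts stay disabled for too long, the CPU cannot respond to other interrupt requests in time and interrupts (requests) may be lost. For this reason the kernel allows the SA_INTERRUPT flag to be set to 0 when a service routine is hooked into an interrupt request queue, so that the routine runs with interrupts enabled. In practice, however, the choice is often awkward: disabling interrupts for the whole of the service casts the net too wide, while enabling them throughout introduces instability. Generally speaking, the servicing of an interrupt can be divided into two parts. The first part usually has to run with interrupts disabled, so that certain critical operations can be completed "atomically", free from disturbance. This part is also usually time-critical: it must be done "immediately" after the request occurs, or at least within a fixed time limit, and successive requests cannot be merged and handled together. The second part can, and should, run with interrupts enabled, so that other interrupts are not lost through interrupts being disabled too long. These operations can usually be deferred until somewhat later, and the related work of several interrupts may be merged and handled together. These differing properties often separate the two halves of interrupt service clearly, and they can and should be implemented differently. The second part is called the "bottom half", usually abbreviated bh in the kernel code. The concept derives to a considerable extent from RISC architectures. A RISC CPU typically has a large number of registers. Pushing all of them on the stack when an interrupt occurs, and restoring them on return, is very costly. RISC systems therefore often split interrupt service in two. The first part saves only a handful of registers and uses them to carry out a limited set of critical operations; it is called a "lightweight interrupt". The other part corresponds to the bh described here. Although the i386 is mainly a CISC architecture and the problems it faces are not quite the same, the issues described above already make the need for bh evident in many cases.

The Linux kernel makes it convenient to split interrupt service into two halves and provides a mechanism for it. In earlier kernels this mechanism was simply called bh. In version 2.4 (more precisely, 2.3.43) it was developed further and generalized.

Earlier kernels had an array of function pointers, bh_base[], of size 32, each element of which could point to one bh function. There were also two 32-bit unsigned integers, bh_active and bh_mask, whose 32 bits correspond to the 32 elements of bh_base[].

An analogy can be drawn between interrupts and bh.

(1) The array bh_base[] corresponds to the array irq_desc[] of the hardware interrupt mechanism. Each element of irq_desc[] represents an interrupt channel and is therefore a queue of service routines, whereas each element of bh_base[] can represent at most one bh function. Conceptually, though, the two are the same.

(2) The unsigned integer bh_active corresponds conceptually to a hardware "interrupt request register", and bh_mask to an "interrupt mask register".

(3) When a bh function needs to be executed, the function mark_bh() sets a bit in bh_active to 1. This corresponds to an interrupt source raising a request, and the particular bit set is akin to an "interrupt vector".

(4) If the corresponding bit in bh_mask, the "interrupt mask register", is also 1, that is, if the system permits this bh function to run, the bh function is executed in a function do_bottom_half() each time the interrupt service routines in do_IRQ() have completed and each time a system call ends. do_bottom_half() is thus the counterpart of do_IRQ().

To simplify the design of bh functions, do_bottom_half(), like do_IRQ(), strictly serializes their execution. The serialization involves two considerations and two corresponding measures.

First, bh functions may not nest. If an interrupt occurs while a bh function is running, nesting could arise, because do_IRQ() checks for and processes bh functions after every interrupt service. do_bottom_half() therefore takes a lock against nested execution on the same CPU. If, on entering do_bottom_half(), the lock is found to be held, the function returns at once, since the CPU must already have been inside this function before the current interrupt occurred.

Second, on a multi-CPU system at most one CPU may be executing bh functions at any time, to prevent two or more CPUs from running bh functions simultaneously and interfering with one another. do_bottom_half() therefore also takes a lock against simultaneous execution on different CPUs. If, on entering do_bottom_half(), this lock is found to be held, some CPU is already executing bh functions, and the function again returns at once.

These two measures, the second in particular, ensured a smooth transition from uniprocessor to multi-CPU SMP systems. But once the Linux kernel of the time was running stably on SMP hardware, it gradually became apparent that this treatment hurt SMP performance. The cause is that the second measure serializes bh execution completely. When many bh functions need to run, the system may have several CPUs, yet only one of them can pass over this single-plank bridge. A comparison with do_IRQ() shows that serialization in do_IRQ() applies to one particular interrupt channel only, whereas serialization of bh functions is global, which is more protection than necessary. The second measure therefore ought to be relaxed. Relaxing it, however, places higher demands on the design and implementation of the bh functions themselves (for example, mutual exclusion when using global variables), and the existing bh functions clearly do not meet them. The better approach is to keep bh, add one or more new mechanisms alongside it, and bring them all into a single framework. This is the "software interrupt" (softirq) mechanism of version 2.4.

Taken literally, softirq means software interrupt. But the term "software interrupt" (in Chinese-language usage especially) has long been used as a synonym for "signal", since a signal is in effect an interrupt mechanism implemented by software. On the other hand, calling a bh-like mechanism a "software interrupt" is also apt: it reflects both the analogy between bh functions and interrupts described above, and the fact that these are interrupt requests with softer timing requirements. What differs is the level. If a "hardware interrupt" is usually an interrupt raised by an external device against the CPU, then a softirq is usually an interrupt raised by a hardware interrupt service routine against the kernel, and a signal is an interrupt raised by the kernel (or another process) against a process. The latter two are both "software interrupts" in that software generates them. The meaning of the term must therefore be judged from context.

In what follows we take bh functions as the main thread and describe the softirq mechanism of the 2.4 kernel by reading its code.

During initialization the system sets up the kernel's softirq mechanism through the function softirq_init(). Its code is in kernel/softirq.c:

==================== kernel/softirq.c 281 290 ====================
 281 void __init softirq_init()
 282 {
 283     int i;
 284
 285     for (i=0; i<32; i++)
 286         tasklet_init(bh_task_vec+i, bh_action, i);
 287
 288     open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
 289     open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
 290 }

The softirq is a mechanism in its own right and at the same time a framework. Within this framework there is the bh mechanism, a special kind of softirq that may be called the most conservative in design, but also the simplest and safest. There are other softirqs besides, defined in include/linux/interrupt.h:

==================== include/linux/interrupt.h 56 62 ====================
 56 enum
 57 {
 58     HI_SOFTIRQ=0,
 59     NET_TX_SOFTIRQ,
 60     NET_RX_SOFTIRQ,
 61     TASKLET_SOFTIRQ
 62 };

The one most worth noting is TASKLET_SOFTIRQ, which stands for a mechanism called the tasklet. The word was perhaps chosen to suggest a small piece of "task", but it easily brings to mind "task" in the sense of a process and so causes confusion; the two are entirely unrelated. NET_TX_SOFTIRQ and NET_RX_SOFTIRQ are evidently intended for network operations, so softirq_init() initializes only TASKLET_SOFTIRQ and HI_SOFTIRQ.

We look at the initialization of the bh mechanism first. The kernel provides for it an array bh_task_vec[] of tasklet_struct data structures:

==================== kernel/softirq.c 233 233 ====================
 233 struct tasklet_struct bh_task_vec[32];

This data structure is also defined in include/linux/interrupt.h:

==================== include/linux/interrupt.h 97 124 ====================
 97 /* Tasklets --- multithreaded analogue of BHs.
 98
 99    Main feature differing them of generic softirqs: tasklet
 100    is running only on one CPU simultaneously.
 101
 102    Main feature differing them of BHs: different tasklets
 103    may be run simultaneously on different CPUs.
 104
 105    Properties:
 106    * If tasklet_schedule() is called, then tasklet is guaranteed
 107      to be executed on some cpu at least once after this.
 108    * If the tasklet is already scheduled, but its excecution is still not
 109      started, it will be executed only once.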
 110    * If this tasklet is already running on another CPU (or schedule is called
 111      from tasklet itself), it is rescheduled for later.
 112    * Tasklet is strictly serialized wrt itself, but not
 113      wrt another tasklets. If client needs some intertask synchronization,
 114      he makes it with spinlocks.
 115  */
 116
 117 struct tasklet_struct
 118 {
 119     struct tasklet_struct *next;
 120     unsigned long state;
 121     atomic_t count;
 122     void (*func)(unsigned long);
 123     unsigned long data;
 124 };

The author of the code has added a detailed comment describing tasklets as the "multithreaded" analogue of bh functions, meaning multiple concurrent sequences of execution, not multiple processes or threads. Why so? Because tasklets are not serialized as strictly as bh functions: tasklets may run on different CPUs at the same time, provided they are different tasklets. One tasklet_struct represents one tasklet, and the function pointer func in it points to its service routine. Why, then, does the bh mechanism use this data structure? Because the execution of a bh function (not the bh function itself) is implemented as a tasklet, and adding stricter restrictions on top of that yields bh.

The code of tasklet_init() is in kernel/softirq.c:

==================== kernel/softirq.c 203 210 ====================
[softirq_init()>tasklet_init()]
 203 void tasklet_init(struct tasklet_struct *t,
 204           void (*func)(unsigned long), unsigned long data)
 205 {
 206     t->func = func;
 207     t->data = data;
 208     t->state = 0;
 209     atomic_set(&t->count, 0);
 210 }

After softirq_init() has called tasklet_init() on the 32 tasklet_struct structures used for bh, their func pointers all point to bh_action().

The other softirqs are initialized through open_softirq(), whose code is in the same file:

==================== kernel/softirq.c 103 117 ====================
[softirq_init()>open_softirq()]
 103 static spinlock_t softirq_mask_lock = SPIN_LOCK_UNLOCKED;
 104
 105 void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
 106 {
 107     unsigned long flags;
 108     int i;
 109
 110     spin_lock_irqsave(&softirq_mask_lock, flags);
 111     softirq_vec[nr].data = data;
 112     softirq_vec[nr].action = action;
 113
 114     for (i=0; i<NR_CPUS; i++)
 115         softirq_mask(i) |= (1<<nr);
 116     spin_unlock_irqrestore(&softirq_mask_lock, flags);
 117 }

For softirqs the kernel maintains an array softirq_vec[], indexed by "softirq number", much like irq_desc[] in the interrupt mechanism.

==================== kernel/softirq.c 48 48 ====================
 48 static struct softirq_action softirq_vec[32] __cacheline_aligned;

It is an array of softirq_action data structures, defined as follows:

==================== include/linux/interrupt.h 64 72 ====================
 64 /* softirq mask and active fields moved to irq_cpustat_t in
 65  * asm/hardirq.h to get better cache usage.  KAO
 66  */
 67
 68 struct softirq_action
 69 {
 70     void (*action)(struct softirq_action *);
 71     void *data;
 72 };

The array softirq_vec[] is a global: every CPU in the system sees the same array. Each CPU, however, has its own "softirq control/status structure", and these structures form an array irq_stat[] indexed by CPU number. This array is global as well, but each CPU accesses the element corresponding to its own number. The relevant definitions are listed below for the reader to study:

==================== include/asm-i386/hardirq.h 8 16 ====================
 8 /* entry.S is sensitive to the offsets of these fields */
 9 typedef struct {
 10     unsigned int __softirq_active;
 11     unsigned int __softirq_mask;
 12     unsigned int __local_irq_count;
 13     unsigned int __local_bh_count;
 14     unsigned int __syscall_count;
 15     unsigned int __nmi_count; /* arch dependent */
 16 } ____cacheline_aligned irq_cpustat_t;

==================== kernel/softirq.c 45 45 ====================
 45 irq_cpustat_t irq_stat[NR_CPUS];

==================== include/linux/irq_cpustat.h 22 30 ====================
 22 #ifdef CONFIG_SMP
 23 #define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
 24 #else
 25 #define __IRQ_STAT(cpu, member) ((void)(cpu), irq_stat[0].member)
 26 #endif
 27
 28   /* arch independent irq_stat fields */
 29 #define softirq_active(cpu) __IRQ_STAT((cpu), __softirq_active)
 30 #define softirq_mask(cpu) __IRQ_STAT((cpu), __softirq_mask)

......
 32     RISCOM8_BH,
 33     SPECIALIX_BH,
 34     AURORA_BH,
 35     ESP_BH,
 36     SCSI_BH,
 37     IMMEDIATE_BH,
 38     CYCLADES_BH,
 39     CM206_BH,
 40     JS_BH,
 41     MACSERIAL_BH,
 42     ISICOM_BH
 43 };

Next, the code of init_bh(), which is in kernel/softirq.c:

==================== kernel/softirq.c 269 273 ====================
 269 void init_bh(int nr, void (*routine)(void))
 270 {
 271     bh_base[nr] = routine;
 272     mb();
 273 }

The array bh_base[] here is evidently the array of function pointers described earlier. The function mb() called here has to do with the instruction pipeline of the CPU, which is not our concern at the moment.

When a particular bh function needs to be executed, the request is made through the inline function mark_bh(). In the section on the clock interrupt the reader will see do_timer() request the execution of timer_bh() with "mark_bh(TIMER_BH);". The code of mark_bh() is in include/linux/interrupt.h:

==================== include/linux/interrupt.h 232 235 ====================
 232 static inline void mark_bh(int nr)
 233 {
 234     tasklet_hi_schedule(bh_task_vec+nr);
 235 }

As noted above, the kernel keeps an array of tasklet_struct structures, bh_task_vec[], for the execution of bh functions. Indexing it by the number of the bh function yields the corresponding structure, which is passed to tasklet_hi_schedule(); that function's code is also in include/linux/interrupt.h. The reader will recall that in every tasklet_struct of bh_task_vec[] the function pointer func points to bh_action().

==================== include/linux/interrupt.h 171 183 ====================
[mark_bh()>tasklet_hi_schedule()]
 171 static inline void tasklet_hi_schedule(struct tasklet_struct *t)
 172 {
 173     if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
 174         int cpu = smp_processor_id();
 175         unsigned long flags;
 176
 177         local_irq_save(flags);
 178         t->next = tasklet_hi_vec[cpu].list;
 179         tasklet_hi_vec[cpu].list = t;
 180         __cpu_raise_softirq(cpu, HI_SOFTIRQ);
 181         local_irq_restore(flags);
 182     }
 183 }

smp_processor_id() returns the number of the CPU on which the current process is running. Using it as an index into tasklet_hi_vec[] gives the queue head for that CPU, and the tasklet_struct pointed to by the parameter t is linked into this queue. It follows that a bh function is "scheduled" to run on whichever CPU the request was made on. That is all the word "schedule" in the function name means; it has nothing to do with process scheduling. Furthermore, a tasklet_struct represents one execution of a bh function and can be linked into only one queue at a time; it cannot appear in several queues at once. If tasklet_hi_schedule() has already been called for a tasklet_struct that has not yet run, it may not be linked into a queue again, and the flag TASKLET_STATE_SCHED in the data structure guarantees this. Finally, the softirq request is formally raised through __cpu_raise_softirq().

==================== include/linux/interrupt.h 77 80 ====================
[mark_bh()>tasklet_hi_schedule()>__cpu_raise_softirq()]
 77 static inline void __cpu_raise_softirq(int cpu, int nr)
 78 {
 79     softirq_active(cpu) |= (1<<nr);
 80 }

The reader has already seen the definition of softirq_active(): it operates on the "softirq control/status structure" of the given CPU. Here the corresponding bit in the __softirq_active field is set to 1.

Each time the kernel has finished the interrupt service routines of a channel in do_IRQ(), and each time it returns from a system call, it checks whether softirq requests are waiting to be executed. Here is the fragment from do_IRQ():

==================== arch/i386/kernel/irq.c 625 626 ====================
 625     if (softirq_active(cpu) & softirq_mask(cpu))
 626         do_softirq();

The other piece of code is from arch/i386/kernel/entry.S and is executed on return from a system call:

==================== arch/i386/kernel/entry.S 205 215 ====================
 205 ENTRY(ret_from_sys_call)
 206 #ifdef CONFIG_SMP
 207     movl processor(%ebx),%eax
 208     shll $CONFIG_X86_L1_CACHE_SHIFT,%eax
 209     movl SYMBOL_NAME(irq_stat)(,%eax),%ecx   # softirq_active
 210     testl SYMBOL_NAME(irq_stat)+4(,%eax),%ecx   # softirq_mask
 211 #else
 212     movl SYMBOL_NAME(irq_stat),%ecx   # softirq_active
 213     testl SYMBOL_NAME(irq_stat)+4,%ecx   # softirq_mask
 214 #endif
 215     jne handle_softirq

==================== arch/i386/kernel/entry.S 282 284 ====================
 282 handle_softirq:
 283     call SYMBOL_NAME(do_softirq)
 284     jmp ret_from_intr

Note that processor here denotes the displacement of that field within the task_struct data structure, so line 207 fetches the current CPU number from the task_struct of the current process. SYMBOL_NAME(irq_stat)(,%eax) corresponds to irq_stat[cpu], and specifically to its first field; accordingly, SYMBOL_NAME(irq_stat)+4(,%eax) corresponds to the second field, which requires the first field to be 32 bits wide. The reader may wish to look back at the definition of irq_cpustat_t, where a comment says that the code in arch/i386/kernel/entry.S is sensitive to the positions of the fields in this structure; this is what it means. These assembly instructions are therefore equivalent to the two lines of C in do_IRQ() shown above.

Once a softirq request has been detected, it is executed by do_softirq(), whose code is in kernel/softirq.c:

==================== kernel/softirq.c 50 100 ====================
 50 asmlinkage void do_softirq()
 51 {
 52     int cpu = smp_processor_id();
 53     __u32 active, mask;
 54
 55     if (in_interrupt())
 56         return;
 57
 58     local_bh_disable();
 59
 60     local_irq_disable();
 61     mask = softirq_mask(cpu);
 62     active = softirq_active(cpu) & mask;
 63
 64     if (active) {
 65         struct softirq_action *h;
 66
 67 restart:
 68         /* Reset active bitmask before enabling irqs */
 69         softirq_active(cpu) &= ~active;
 70
 71         local_irq_enable();
 72
 73         h = softirq_vec;
 74         mask &= ~active;
 75
 76         do {
 77             if (active & 1)
 78                 h->action(h);
 79             h++;
 80             active >>= 1;
 81         } while (active);
 82
 83         local_irq_disable();
 84
 85         active = softirq_active(cpu);
 86         if ((active &= mask) != 0)
 87             goto retry;
 88     }
 89
 90     local_bh_enable();
 91
 92     /* Leave with locally disabled hard irqs. It is critical to close
 93      * window for infinite recursion, while we help local bh count,
 94      * it protected us. Now we are defenceless.
 95      */
 96     return;
 97
 98 retry:
 99     goto restart;
 100 }

A softirq service routine may not run inside a hardware interrupt service routine, nor inside another softirq service routine. This is checked by the macro in_interrupt(), defined in include/asm-i386/hardirq.h:

==================== include/asm-i386/hardirq.h 20 25 ====================
 20 /*
 21  * Are we in an interrupt context? Either doing bottom half
 22  * or hardware interrupt processing?
 23  */
 24 #define in_interrupt() ({ int __cpu = smp_processor_id(); \
 25     (local_irq_count(__cpu) + local_bh_count(__cpu) != 0); })

This test clearly prevents softirq service routines from nesting; it is the first of the serialization measures described earlier. The definitions related to local_bh_disable() are in include/asm-i386/softirq.h:

==================== include/asm-i386/softirq.h 7 11 ====================
 7 #define cpu_bh_disable(cpu) do { local_bh_count(cpu)++; barrier(); } while (0)
 8 #define cpu_bh_enable(cpu) do { barrier(); local_bh_count(cpu)--; } while (0)
 9
 10 #define local_bh_disable() cpu_bh_disable(smp_processor_id())
 11 #define local_bh_enable() cpu_bh_enable(smp_processor_id())

The code of do_softirq() shows that only one checkpoint can stop a CPU from executing softirq service routines, namely in_interrupt(). The second serialization measure is therefore not applied to softirq service routines. In other words, different CPUs may enter the execution of softirq service routines at the same time (see line 78), each serving the softirqs requested on it. In this sense softirq service routines execute concurrently, in multiple sequences. Their design and implementation must, however, be very careful not to let them interfere with one another (for example through shared global variables). The rest of do_softirq() should give the reader no difficulty and is not discussed further.

In our scenario, as noted, the service routine executed is bh_action(), whose code is in kernel/softirq.c:

==================== kernel/softirq.c 235 267 ====================
[do_softirq()>bh_action()]
 235 /* BHs are serialized by spinlock global_bh_lock.
 236
 237    It is still possible to make synchronize_bh() as
 238    spin_unlock_wait(&global_bh_lock). This operation is not used
 239    by kernel now, so that this lock is not made private only
 240    due to wait_on_irq().
 241
 242    It can be removed only after auditing all the BHs.
 243  */
 244 spinlock_t global_bh_lock = SPIN_LOCK_UNLOCKED;
 245
 246 static void bh_action(unsigned long nr)
 247 {
 248     int cpu = smp_processor_id();
 249
 250     if (!spin_trylock(&global_bh_lock))
 251         goto resched;
 252
 253     if (!hardirq_trylock(cpu))
 254         goto resched_unlock;
 255
 256     if (bh_base[nr])
 257         bh_base[nr]();
 258
 259     hardirq_endlock(cpu);
 260     spin_unlock(&global_bh_lock);
 261     return;
 262
 263 resched_unlock:
 264     spin_unlock(&global_bh_lock);
 265 resched:
 266     mark_bh(nr);
 267 }

Two further checkpoints guard the execution of the bh function itself (see line 257). One is hardirq_trylock(), defined as:

==================== include/asm-i386/hardirq.h 31 31 ====================
 31 #define hardirq_trylock(cpu) (local_irq_count(cpu) == 0)

A comparison with in_interrupt() above shows that this again prevents bh_action() from being called from inside a hardware interrupt service routine. The other checkpoint, spin_trylock(), is different. Its code is in include/linux/spinlock.h:

==================== include/linux/spinlock.h 74 74 ====================
 74 #define spin_trylock(lock) (!test_and_set_bit(0,(lock)))

The "lock" in question is the global variable global_bh_lock. As long as one CPU is running between lines 253 and 260, no other CPU can enter that region, so at any time at most one CPU is executing a bh function. This is the second serialization measure described earlier. Executing the function that corresponds to the bh number is then straightforward. In our scenario the bh function is timer_bh(), whose code we will read in the section on the clock interrupt.

For comparison we list the code of another softirq service routine, tasklet_action(). The reader can compare it with bh_action() and see what the important differences are. Its code is in kernel/softirq.c:

==================== kernel/softirq.c 122 163 ====================
 122 struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
 123
 124 static void tasklet_action(struct softirq_action *a)
 125 {
 126     int cpu = smp_processor_id();
 127     struct tasklet_struct *list;
 128
 129     local_irq_disable();
 130     list = tasklet_vec[cpu].list;
 131     tasklet_vec[cpu].list = NULL;
 132     local_irq_enable();
 133
 134     while (list != NULL) {
 135         struct tasklet_struct *t = list;
 136
 137         list = list->next;
 138
 139         if (tasklet_trylock(t)) {
 140             if (atomic_read(&t->count) == 0) {
 141                 clear_bit(TASKLET_STATE_SCHED, &t->state);
 142
 143                 t->func(t->data);
 144                 /*
 145                  * talklet_trylock() uses test_and_set_bit that imply
 146                  * an mb when it returns zero, thus we need the explicit
 147                  * mb only here: while closing the critical section.
 148                  */
 149 #ifdef CONFIG_SMP
 150                 smp_mb__before_clear_bit();
 151 #endif
 152                 tasklet_unlock(t);
 153                 continue;
 154             }
 155             tasklet_unlock(t);
 156         }
 157         local_irq_disable();
 158         t->next = tasklet_vec[cpu].list;
 159         tasklet_vec[cpu].list = t;
 160         __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
 161         local_irq_enable();
 162     }
 163 }

Finally, separating softirq service routines, bh functions included, from the regular interrupt service routines is not mandatory. Whether to do so depends on the particulars of the device driver (and perhaps on the skill of its designer).

3.6 Entering and Returning from a Page Fault

When Chapter 2 described how the kernel handles page faults, it began at do_page_fault(). The CPU's interrupt and exception mechanism had not yet been introduced, so the response to a page fault was skipped for the time being: the stretch from the occurrence of the exception to the CPU's arrival at do_page_fault(), and the stretch from the return of do_page_fault() to the CPU's return to user space. We can now fill that gap.

Unlike device interrupts, each exception has a dedicated interrupt vector reserved for it, so the corresponding initialization is straightforward, as we saw in the section on initialization.

The interrupt gate set up for the page fault points to the entry point page_fault (see line 970 of trap_init(), quoted in the section on IDT initialization), so when a page fault occurs the CPU passes through the gate and arrives directly at page_fault. The process by which the CPU passes through an interrupt gate because of an exception, including the changes to the stack, is essentially the same as for a device interrupt; the reader may refer to the section on device interrupts. There is, however, one important difference. When an interrupt occurs, the CPU pushes the contents of EFLAGS, and of CS and EIP, which represent the return address, on the stack. If the privilege level changes, a stack switch takes place first, and the contents of SS and ESP, which represent the old stack pointer, are pushed as well. All this has been described before. When an exception occurs, an additional step follows these operations: if the exception produces an error code, the error code is also pushed on the stack. Not all exceptions produce an error code (see Intel's technical documentation or the relevant literature for details), but a number of them do, including the page fault that concerns us here. Indeed, we saw in Chapter 2 how do_page_fault() uses the error code to identify the cause of the exception. The CPU, however, knows whether an error code should be pushed only at the moment it enters the exception. By the time the handler returns through iret, circumstances have changed and the CPU can no longer tell what caused the exception, so it does not skip this stack entry automatically. The exception handling code must adjust the stack itself so that the return address is on top of the stack when the CPU starts executing iret. Because of this difference, the code that handles exceptions differs somewhat from the code that handles interrupts.

The page fault entry point page_fault is defined in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 410 412 ====================
 410 ENTRY(page_fault)
 411     pushl $ SYMBOL_NAME(do_page_fault)
 412     jmp error_code

The jump target error_code plays the same role as common_interrupt in device interrupt handling: it is the entry point shared by all exception handlers. Pushing the address of the service routine do_page_fault() on the stack prepares for entry into that specific routine. The code at error_code is in the same file (arch/i386/kernel/entry.S):

==================== arch/i386/kernel/entry.S 295 321 ====================
 295 error_code:
 296     pushl %ds
 297     pushl %eax
 298     xorl %eax,%eax
 299     pushl %ebp
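300     pushl %edi
301     pushl %esi
302     pushl %edx
303     decl %eax                       # eax = -1
304     pushl %ecx
305     pushl %ebx
306     cld
307     movl %es,%ecx
308     movl ORIG_EAX(%esp), %esi       # get the error code
309     movl ES(%esp), %edi             # get the function address
310     movl %eax, ORIG_EAX(%esp)
311     movl %ecx, ES(%esp)
312     movl %esp,%edx
313     pushl %esi                      # push the error code
314     pushl %edx                      # push the pt_regs pointer
315     movl $(__KERNEL_DS),%edx
316     movl %edx,%ds
317     movl %edx,%es
318     GET_CURRENT(%ebx)
319     call *%edi
320     addl $8,%esp
321     jmp ret_from_exception

Readers may have noticed that SAVE_ALL is not used here as it is on entry to interrupt handling. Let us see what the difference is, and why. In Figure 3.7 we compare the stack when the CPU has reached line 307 here (left) with the stack after SAVE_ALL in a device interrupt (right).

Incidentally, the stack in a system call after SAVE_ALL is almost identical to the right-hand side (interrupt) of Figure 3.7, except that the ORIG_EAX slot holds the system call number rather than the interrupt request number.

Figure 3.7 System stack in exception handling compared with interrupt handling

The comparison shows that the two stacks differ in only two slots. One is the slot corresponding to ORIG_EAX, which now holds the error code the CPU pushed when the exception occurred. The other is the slot corresponding to ES, which now holds the entry address of do_page_fault(). Everything else is the same. The code that follows, however, moves the contents of the ORIG_EAX slot into register %esi and replaces it with the contents of %eax. The error code thus ends up in %esi, and ORIG_EAX on the stack becomes -1 (see lines 298 and 303). Likewise, the function pointer in the ES slot is replaced by the contents of %ecx and moved into register %edi. Line 307 had already loaded %es into %ecx, so after line 311 the pointer to do_page_fault() is in %edi and the stack slot holds a copy of %es. From this point, that is, after line 311, the stack looks exactly as it does in an interrupt or a system call, except that the ORIG_EAX slot is -1. The stack has now been adjusted. We saw in the section on interrupts that RESTORE_ALL skips over ORIG_EAX on the way back.

Readers may ask how exceptions that produce no error code are handled. The answer is simple: one is supplied before error_code is entered. Look at the exception coprocessor_error, caused by a coprocessor fault, in the same source file (arch/i386/kernel/entry.S):

==================== arch/i386/kernel/entry.S 323 326 ====================
323 ENTRY(coprocessor_error)
324     pushl $0
325     pushl $ SYMBOL_NAME(do_coprocessor_error)
326     jmp error_code

The only addition is the line "pushl $0", which pushes 0 into the slot where the error code would be; from there on everything is the same.

Back in the code at error_code, lines 313 and 314 push %esi and %edx in turn. We know that %esi holds the error code, and line 312 has copied the current stack pointer into %edx. As explained in the section on interrupts,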
the kernel treats the stack contents after SAVE_ALL as a pt_regs structure, and the stack pointer at that moment points to the start of that structure. So one of the two items is the error code and the other is a pointer to pt_regs, which are exactly the two arguments of do_page_fault(). With the arguments on the stack, everything is ready for the function call on line 319. The remaining preparations were seen in interrupt handling and are not repeated here.

When the called function, here do_page_fault(), returns, the CPU goes on to ret_from_exception. Since do_page_fault() is of type void, there is no return value. The code at ret_from_exception is also in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 260 279 ====================
[page_fault->error_code->...->ret_from_exception]
260 ret_from_exception:
261 #ifdef CONFIG_SMP
262     GET_CURRENT(%ebx)
263     movl processor(%ebx),%eax
264     shll $CONFIG_X86_L1_CACHE_SHIFT,%eax
265     movl SYMBOL_NAME(irq_stat)(,%eax),%ecx      # softirq_active
266     testl SYMBOL_NAME(irq_stat)+4(,%eax),%ecx   # softirq_mask
267 #else
268     movl SYMBOL_NAME(irq_stat),%ecx             # softirq_active
269     testl SYMBOL_NAME(irq_stat)+4,%ecx          # softirq_mask
270 #endif
271     jne handle_softirq
272
273 ENTRY(ret_from_intr)
274     GET_CURRENT(%ebx)
275     movl EFLAGS(%esp),%eax      # mix EFLAGS and CS
276     movb CS(%esp),%al
277     testl $(VM_MASK | 3),%eax   # return to VM86 mode or non-supervisor?
278     jne ret_with_reschedule
279     jmp restore_all

If no softirq request is pending, the CPU goes straight into ret_from_intr. The code from there on is already familiar; readers who still find it difficult can return to the preceding sections.

3.7 The Clock Interrupt

Among all external interrupts the clock interrupt plays a special role, one that goes far beyond plain timekeeping. Timekeeping alone is important enough, of course. To name just one thing, without correct time relationships the make tool used to rebuild the kernel could not work, because make relies on timestamps to decide whether something must be recompiled and relinked. But the importance of the clock interrupt goes much further.

We saw in the section on interrupts that each time the kernel finishes servicing an interrupt (or a system call or exception), it checks before returning to user space whether rescheduling is needed, and schedules if so. In fact scheduling can only happen while the CPU is running in the kernel. The chapter on processes will show that scheduling happens in two situations. One is "voluntary": through a system call such as sleep(), or when a process has entered the kernel through some other system call, is blocked for some reason and must wait, and therefore "voluntarily" lets the kernel schedule other processes first. The other is "forced": when a process has run continuously for longer than a certain limit, the kernel forcibly schedules other processes. Without a clock the kernel would lose both the basis and the opportunity for time-related forced scheduling and could rely only on the "good conscience" of each process. Imagine a process that falls into an endless loop in user space, makes no system call inside the loop, and no device interrupt occurs. Without a clock interrupt the whole system would spin in place and get nothing done, because no scheduling would ever take place and the process clinging to the CPU is stuck in its loop. Taking a step back, even if we had other criteria (such as process priority) for deciding whether to schedule, an interrupt, exception or system call would still be needed to bring the CPU into the kernel so that scheduling could happen. The only event that can be predicted to occur within a given time is the clock interrupt. For a time-sharing system like Linux, the clock interrupt is therefore a necessary condition for staying "alive"; no wonder people call it the "heart beat".

During initialization, once the infrastructure for external interrupts (the IRQ queues) and the scheduling mechanism have been initialized, it is the turn of the clock interrupt. Look at this fragment of start_kernel() in init/main.c:

==================== init/main.c 534 537 ====================
534     trap_init();
535     init_IRQ();
536     sched_init();
537     time_init();

This too shows how closely the clock interrupt and scheduling are tied together. As mentioned before, once clock interrupts begin, scheduling may be required, so the scheduling mechanism must be initialized first. The code of time_init() is in arch/i386/kernel/time.c:

==================== arch/i386/kernel/time.c 626 631 ====================
626 void __init time_init(void)
627 {
628     extern int x86_udelay_tsc;
629
630     xtime.tv_sec = get_cmos_time();
631     xtime.tv_usec = 0;
......
==================== arch/i386/kernel/time.c 704 704 ====================
704     setup_irq(0, &irq0);
......
==================== arch/i386/kernel/time.c 706 706 ====================
706 }

When we speak of the "system clock" we actually mean one of two global quantities in the kernel. One is the data structure xtime, of type struct timeval, defined in include/linux/time.h:

==================== include/linux/time.h 88 91 ====================
88 struct timeval {
89     time_t tv_sec;          /* seconds */
90     suseconds_t tv_usec;    /* microseconds */
91 };

The structure records the "absolute" time counted from some moment in history. Its value comes from a CMOS chip in the computer, often called the "real-time clock". This chip is battery powered, so it keeps correct time even when the machine is switched off. Line 630 above reads the current time from the CMOS clock chip into xtime through get_cmos_time(), with a resolution of one second. The clock interrupt itself is generated by a different chip.

The other global quantity is an unsigned integer called jiffies, which counts the clock interrupts since boot. The length of one jiffy is the period of the clock interrupt, sometimes also called a tick; it is determined by the system constant HZ, defined in include/asm-i386/param.h. Readers will see later that inside the kernel jiffies is far more important than xtime and is used all the time.

Many factors affect the timing accuracy of the clock interrupt, so several means are used to correct it. Newer i386 CPUs (mainly the Pentium and later) have a special 64-bit register called the Time Stamp Counter, TSC. It counts the clock pulses driving the CPU; with a 500MHz CPU clock, for example, the TSC has a resolution of 2ns. Because the TSC is 64 bits wide, it would take more than a thousand years of continuous running to overflow. Clearly, TSC readings can be used to improve the accuracy of the clock interrupt. We are not concerned with accuracy here, though, so we skip the related code and concentrate on the essential parts.

Readers met setup_irq() in the section on interrupts and may want to look back at it. Its first argument is the interrupt request number, which is 0 for the clock interrupt. The second is a pointer to an irqaction structure, irq0, also defined in arch/i386/kernel/time.c:

==================== arch/i386/kernel/time.c 547 547 ====================
547 static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, 0, "timer", NULL, NULL};

So the service routine of the clock interrupt is timer_interrupt(); interrupt request 0 is dedicated to the clock, because the SA_SHIRQ flag in irq0.flags is 0; and interrupts are not allowed while timer_interrupt() runs, because the SA_INTERRUPT flag is 1. The code of timer_interrupt() is in the same file (arch/i386/kernel/time.c):

==================== arch/i386/kernel/time.c 454 505 ====================
454 /*
455  * This is the same as the above, except we _also_ save the current
456  * Time Stamp Counter value at the time of the timer interrupt, so that
457  * we later on can estimate the time of day more exactly.
458  */
459 static void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
460 {
461     int count;
462
463     /*
464      * Here we are in the timer irq handler. We just have irqs locally
465      * disabled but we don't know if the timer_bh is running on the other
466      * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
467      * the irq version of write_lock because as just said we have irq
468      * locally disabled. -arca
469      */
470     write_lock(&xtime_lock);
471
472     if (use_tsc)
473     {
474         /*
475          * It is important that these two operations happen almost at
476          * the same time. We do the RDTSC stuff first, since it's
477          * faster. To avoid any inconsistencies, we need interrupts
478          * disabled locally.
479          */
480
481         /*
482          * Interrupts are just disabled locally since the timer irq
483          * has the SA_INTERRUPT flag set. -arca
484          */
485
486         /* read Pentium cycle counter */
487
488         rdtscl(last_tsc_low);
489
490         spin_lock(&i8253_lock);
491         outb_p(0x00, 0x43);     /* latch the count ASAP */
492
493         count = inb_p(0x40);    /* read the latched count */
494         count |= inb(0x40) << 8;
495         spin_unlock(&i8253_lock);
496
497         count = ((LATCH-1) - count) * TICK_SIZE;
498         delay_at_last_interrupt = (count + LATCH/2) / LATCH;
499     }
500
501     do_timer_interrupt(irq, NULL, regs);
502
503     write_unlock(&xtime_lock);
504
505 }

We are not concerned here with multiprocessor (SMP) systems or with timing accuracy, so in effect only the call to do_timer_interrupt() on line 501 remains:

==================== arch/i386/kernel/time.c 380 386 ====================
[timer_interrupt()>do_timer_interrupt()]
380 /*
381  * timer_interrupt() needs to keep up the real-time clock,
382  * as well as call the "do_timer()" routine every clocktick
383  */
384 static inline void do_timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
385 {
386 #ifdef CONFIG_X86_IO_APIC
......
==================== arch/i386/kernel/time.c 400 435 ====================
400 #endif
401
402 #ifdef CONFIG_VISWS
403     /* Clear the interrupt */
404     co_cpu_write(CO_CPU_STAT,co_cpu_read(CO_CPU_STAT) & ~CO_STAT_TIMEINTR);
405 #endif
406     do_timer(regs);
407 /*
408  * In the SMP case we use the local APIC timer interrupt to do the
409  * profiling, except when we simulate SMP mode on a uniprocessor
410  * system, in that case we have to call the local interrupt handler.
411  */
412 #ifndef CONFIG_X86_LOCAL_APIC
413     if (!user_mode(regs))
414         x86_do_profile(regs->eip);
415 #else
416     if (!smp_found_config)
417         smp_local_timer_interrupt(regs);
418 #endif
419
420     /*
421      * If we have an externally synchronized Linux clock, then update
422      * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
423      * called as close as possible to 500 ms before the new second starts.
424      */
425     if ((time_status & STA_UNSYNC) == 0 &&
426         xtime.tv_sec > last_rtc_update + 660 &&
427         xtime.tv_usec >= 500000 - ((unsigned) tick) / 2 &&
428         xtime.tv_usec <= 500000 + ((unsigned) tick) / 2) {
429         if (set_rtc_mmss(xtime.tv_sec) == 0)
430             last_rtc_update = xtime.tv_sec;
431         else
432             last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */
433     }
434
435 #ifdef CONFIG_MCA
......
==================== arch/i386/kernel/time.c 449 450 ====================
449 #endif
450 }

Again, we are not concerned with the special handling needed when an SMP system uses the APIC, nor with the special cases of the SGI workstation (lines 402 to 405) and the PS/2 "Micro Channel" (lines 435 to 449), nor with clock accuracy (lines 420 to 433). That leaves two things: do_timer() and x86_do_profile(). The purpose of x86_do_profile() is to accumulate statistics, which is not our focus either. So in the end only do_timer() remains; it is in kernel/timer.c:

==================== kernel/timer.c 674 685 ====================
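[timer_interrupt()>do_timer_interrupt()>do_timer()]
674 void do_timer(struct pt_regs *regs)
675 {
676     (*(unsigned long *)&jiffies)++;
677 #ifndef CONFIG_SMP
678     /* SMP process accounting uses the local APIC timer */
679
680     update_process_times(user_mode(regs));
681 #endif
682     mark_bh(TIMER_BH);
683     if (TQ_ACTIVE(tq_timer))
684         mark_bh(TQUEUE_BH);
685 }

Line 676 increments jiffies. The careful reader may ask why this odd form is used instead of a plain "jiffies++". The author of the code wants the increment to be carried out in a single instruction, so that it is an "atomic" operation. gcc translates this statement into one INC instruction on the memory location. With "jiffies++" the compiler might instead MOV the contents of jiffies into register EAX, increment it, and MOV it back. The two variants cost almost the same number of CPU clock cycles, but the former guarantees atomicity.

The function update_process_times() has to do with process scheduling and will be covered in the section on scheduling. Its name already tells us that it handles the time-related variables of the current process, partly for statistics and partly for scheduling. The operations on these timing and accounting variables may be regarded as the "top half" of the clock interrupt, but the "bottom half" and the "second job" arranged for the clock interrupt on lines 682 and 684 take far more effort.

Earlier sections introduced the "bottom half" of an interrupt service routine, the bh. Before returning from an interrupt the CPU always checks whether some bh queue still has work waiting. Line 682 here uses mark_bh() to link bh_task_vec[TIMER_BH] into the tasklet_hi_vec queue, so that before the interrupt returns the CPU executes timer_bh(), the function associated with TIMER_BH. This association was set up in advance by three important lines in sched_init() in kernel/sched.c:

==================== kernel/sched.c 1260 1262 ====================
1260     init_bh(TIMER_BH, timer_bh);
1261     init_bh(TQUEUE_BH, tqueue_bh);
1262     init_bh(IMMEDIATE_BH, immediate_bh);

Three bh entries are initialized here. The first is obviously the one to be executed before every clock interrupt ends. It carries out operations that logically belong to clock interrupt service but are less urgent, or that can be done in a more relaxed environment (with interrupts enabled); its function is timer_bh(). TQUEUE_BH and IMMEDIATE_BH are two further important pieces of kernel infrastructure. We said earlier that the Linux kernel can have at most 32 bh entries. Readers may already be wondering whether 32 is enough, and what to do if more are needed. More importantly, practice often requires hooking some operation dynamically onto an existing interrupt service, so that operations can be "attached" at run time to some interrupt or even to some other kind of event. Suppose, for example, that we have to write a driver for an external device that requires its status register to be read every 20ms, some computation to be done on the information read, and the result to be written to its control register to drive a stepper motor, and that the device cannot generate interrupts. Since control of this device is purely periodic, it needs no interrupt of its own anyway; all that has to be solved is how to hook onto the system clock interrupt. As mentioned, the frequency of the Linux system clock is determined by the constant HZ, defined in include/asm-i386/param.h. HZ is normally defined as 100, that is, one clock interrupt every 10ms, and the required 20ms is an integer multiple of that. So we could write a routine that is called once in every clock interrupt and keeps a counter, sampling data when the count is even and computing and writing the output when it is odd. That would solve the problem. But how do we get the clock interrupt to call the routine every time? TQUEUE_BH was created for exactly this need. The global tq_timer points to a queue; to have the system call a function (in kernel space, of course) on every clock interrupt, one links it into that queue. Line 683 checks whether tq_timer is empty. If it is not, mark_bh() links bh_task_vec[TQUEUE_BH] into the tasklet_hi_vec queue as well, so that when the kernel executes the bh entries, tqueue_bh() calls every function in the queue once. TQUEUE_BH is thus indeed an important piece of infrastructure. Besides the tq_timer queue hooked to the clock there are other bh entries with their own queues, IMMEDIATE_BH among them. Details will be given in the chapters on processes and on device drivers. If the top half timer_interrupt() and the bottom half timer_bh() are the clock interrupt's "proper job", then executing tqueue_bh() is its "second job".

Once these preparations are made, the top half of clock interrupt service is complete. But as readers saw in the section on interrupts, on the way back, before leaving do_IRQ(), the CPU first turns into do_softirq() to carry out the bottom half and the second job. In our scenario timer_bh() is certain to be executed, while tqueue_bh() is executed when the tq_timer queue is not empty. Readers may ask: since timer_bh() will certainly be executed, why not simply run it inside do_timer() and save all this trouble? First, as we have seen, interrupts are disabled throughout timer_interrupt() (see the SA_INTERRUPT flag above), whereas timer_bh() has no such strict requirement. Second, the code of do_IRQ() shows that the execution of the specific interrupt service routines and the execution of do_softirq() are not in a one-to-one relationship. The service routines are executed in a loop, whereas do_softirq() is executed only once. So when several interrupts arrive on the same interrupt channel in quick succession, the execution of do_softirq(), and hence of timer_bh(), is deferred and merged.

The function timer_bh() that corresponds to TIMER_BH is in kernel/timer.c:

==================== kernel/timer.c 668 672 ====================
668 void timer_bh(void)
669 {
670     update_times();
671     run_timer_list();
672 }

Let us first look at update_times() in the same file (kernel/timer.c):

==================== kernel/timer.c 643 666 ====================
[timer_bh()>update_times()]
643 /*
644  * This spinlock protect us from races in SMP while playing with xtime. -arca
645  */
646 rwlock_t xtime_lock = RW_LOCK_UNLOCKED;
647
648 static inline void update_times(void)
649 {
650     unsigned long ticks;
651
652     /*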
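653      * update_times() is run from the raw timer_bh handler so we
654      * just know that the irqs are locally enabled and so we don't
655      * need to save/restore the flags of the local CPU here. -arca
656      */
657     write_lock_irq(&xtime_lock);
658
659     ticks = jiffies - wall_jiffies;
660     if (ticks) {
661         wall_jiffies += ticks;
662         update_wall_time(ticks);
663     }
664     write_unlock_irq(&xtime_lock);
665     calc_load(ticks);
666 }

Two things are done here. The first is update_wall_time(), which maintains the value in xtime, the so-called "real-time clock" or "wall clock": counting, carrying, and corrections made for the sake of accuracy. This is mostly numerical computation and we will not go into it. Like jiffies, wall_jiffies is a global quantity. It holds the jiffies value that corresponds to the current value of xtime, that is, the point on the time axis up to which the wall clock reading has been brought.

The second is calc_load(), which computes and accumulates statistics on CPU load. Every 5 seconds the kernel computes, accumulates and updates the average number of processes in the runnable state over the past 15 minutes, 5 minutes and 1 minute, as a measure of how heavily the system is loaded. This too is mainly numerical computation, so we do not go into it.

After the return from update_times() comes the main part of timer_bh(), run_timer_list(). It examines the timers that have been set in the system, and if a timer has expired it executes the function scheduled for it (this is that timer's bh function). Setting timers is described in the chapter "Processes and Process Scheduling"; we will come back to read the code of run_timer_list() then.

Each timer is represented by a timer_list structure, defined in include/linux/timer.h:

==================== include/linux/timer.h 20 25 ====================
20 struct timer_list {
21     struct list_head list;
22     unsigned long expires;
23     unsigned long data;
24     void (*function)(unsigned long);
25 };

This is a structure meant for a linked list. The length of the list is dynamic and unbounded, so the number of timers that can be set in the system is unbounded as well (early implementations used an array and were limited by its size). Every timer has an expiry time, expires. The function pointer function points to the bh function to be executed at expiry, and it can take one argument, data (in early implementations it could take none). As stated before, interrupts are enabled while bh functions run.

We can see that most of the work of clock interrupt service is done in the bottom half, that is, in bh functions. Only a small number of critical operations are really executed with interrupts disabled; the bulk is placed, as far as possible, in a more relaxed environment, with interrupts enabled and with some tolerance in timing, so that the impact on the system is kept to a minimum. On the one hand, this should be a guiding principle of systems programming (device drivers in particular); on the other hand, it places high demands on the people who design and develop such code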
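, because deciding whether an operation must run in the top half, and whether it must run with interrupts disabled, requires a deep understanding of the system.

3.8 System Calls

If an external interrupt is a means by which the CPU enters kernel space passively and asynchronously, then a system call is the means by which it enters kernel space actively and synchronously. "Actively" means that the CPU does so "voluntarily", as planned behaviour. "Synchronously" means that the CPU (in reality, the software designer) knows exactly after which instruction it will enter kernel space. An interrupt, by contrast, is largely unpredictable. Despite this difference the two have a great deal in common, because both move the CPU from user mode to kernel mode, that is, from user space to kernel space. An interrupt may of course occur while the CPU is already running in kernel space, whereas a system call is made only from user space; that is another difference. The key point is the change of CPU mode. Without such a means there would be no "protected mode" to speak of. In operating systems that do not distinguish user mode from kernel mode, such as DOS, a so-called system call is really nothing more than a dynamically linked library call. DOS system calls are also made through the interrupt instruction INT, but they differ little from ordinary function calls with entry addresses fixed in advance. A user program that knows the entry address of a function can bypass the "system call" and call the function directly.

Linux system calls are implemented through the interrupt instruction "int $0x80". Earlier sections discussed how a process enters kernel space through a trap gate or interrupt gate, and how trap gates in the IDT are initialized. This section concentrates on how a process enters kernel space in a system call and how it returns after the requested service is complete. This path is not specific to any one call; it is common to all system calls. We pick one particular call as an example, but we are interested in the common path, not in what that call does. System calls are the most fundamental and most important infrastructure the kernel provides. Because system calls and interrupts have so much in common, this section should be read together with the preceding ones, especially the section on interrupt processing. Some of the code is in fact shared, and whatever has been covered before is not repeated here.

Since we are not interested in the service a particular system call provides, we choose a very simple one, sethostname(), as our scenario, and introduce the kernel's system call mechanism by following the route the CPU takes through the whole call.

The system call sethostname() does something very simple: it sets the computer's host name (on the network). Its use is equally simple:

int sethostname(const char *name, size_t len);

The argument name is the host name to be set and len is the length of that string. The call returns 0 on success and -1 on failure, in which case the global variable errno in the user program holds the specific error code. From a programming point of view Linux system calls fall into two classes. One class is close to a function in the true sense: the result of the call is the function value, as with getpid(). The other class is like sethostname(): the returned value is merely a success flag, and the purpose of the call is achieved through its side effect. In C, however, any piece of code that can be invoked with a call instruction, that is, any piece of code ending in a ret instruction, is called a "function". Interrupt service routines and system calls, by virtue of the ret (in fact iret) instruction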
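they contain, also count as "functions". In our discussion we follow the rules and tradition of C and call them all functions.

To help readers understand the whole course of a system call, we start our scenario from the call to sethostname() in user space. sethostname() is in fact a library function (in /usr/lib/libc.a), and the actual system call is issued from inside it. The source code of the GNU C library is public and can be downloaded from the GNU web site, but here we use code obtained by disassembling libc.a. One reason is convenience; the other is that more exposure to assembly code does readers good. For systems programmers in particular, reading and using assembly language is a useful skill.

36 sethostname.o:     file format elf32-i386
37
38 Disassembly of section .text:
39
40 00000000 <sethostname>:
41    0: 89 da                 mov    %ebx,%edx
42    2: 8b 4c 24 08           mov    0x8(%esp,1),%ecx
43    6: 8b 5c 24 04           mov    0x4(%esp,1),%ebx
44    a: b8 4a 00 00 00        mov    $0x4a,%eax
45    f: cd 80                 int    $0x80
46   11: 89 d3                 mov    %edx,%ebx
47   13: 3d 01 f0 ff ff        cmp    $0xfffff001,%eax
48   18: 0f 83 fc ff ff ff     jae    1a <sethostname+0x1a>
                        1a: R_386_PC32 __syscall_error
49   1e: c3                    ret

On entry to sethostname() the stack pointer %esp points to the return address; the first argument of the call (name) is at %esp plus 4, the second argument len at %esp plus 8, and so on. Because the i386 runs in 32-bit mode, all arguments are pushed as 32-bit long integers. The instruction "movl 0x8(%esp, 1), %ecx" loads into %ecx the contents at displacement 0x8 from %esp (with a scale of 1), which in our scenario is the argument len. Next, the argument name is loaded from the stack into %ebx. Finally the system call number of sethostname(), 0x4a, is loaded into %eax, followed by the interrupt instruction "int $0x80". Readers can see here that the Linux kernel passes system call arguments in registers, not on the stack.

Why pass arguments in registers? Readers may recall that when the CPU passes through a trap gate from user space into kernel space, the change of privilege level causes a switch from the user stack to the kernel stack. Arguments pushed before the INT instruction would be on the user stack, whereas after entry the kernel stack is in use. The arguments could still be read from the user stack, but that is rather cumbersome. Passing them in registers is, as readers will see below, a neat arrangement.

For the moment we do not follow the CPU into the kernel but look at what happens after the system call returns. First %ebx is restored from %edx, where its original contents were saved before the call (the original contents of %edx are lost; this is a convention that gcc observes when it allocates registers). Then the return value of the system call, which is in %eax, is checked. If %eax lies between 0xfffff001 and 0xffffffff, that is, between -4095 and -1, an error has occurred, and control is transferred to __syscall_error(), which returns from there. The entry 1a: R_386_PC32 marks address sethostname+0x1a as relocation information; at link time the address of __syscall_error() is filled in at that location. The function __syscall_error() is also in libc.a:

1 sysdep.o:     file format elf32-i386
2
3 Disassembly of section .text: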
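4
5 00000000 <__syscall_error>:
6    0: f7 d8                 neg    %eax
7
8 00000002 <__syscall_error_1>:
9    2: 50                    push   %eax
10   3: e8 fc ff ff ff        call   4 <__syscall_error_1+0x2>
11   8: 59                    pop    %ecx
12   9: 89 08                 mov    %ecx,(%eax)
13   b: b8 ff ff ff ff        mov    $0xffffffff,%eax
14  10: c3                    ret
15
16
17 errno-loc.o:     file format elf32-i386
18
19 Disassembly of section .text:
20
21 00000000 <__errno_location>:
22   0: 55                    push   %ebp
23   1: 89 e5                 mov    %esp,%ebp
24   3: 83 ec 08              sub    $0x8,%esp
25   6: a1 00 00 00 00        mov    0x0,%eax
26   b: 85 c0                 test   %eax,%eax
27   d: 75 07                 jne    16 <__errno_location+0x16>
28   f: b8 00 00 00 00        mov    $0x0,%eax
29  14: c9                    leave
30  15: c3                    ret
31  16: e8 fc ff ff ff        call   17 <__errno_location+0x17>
32  1b: 8b 80 b8 01 00 00     mov    0x1b8(%eax),%eax
33  21: eb f1                 jmp    14 <__errno_location+0x14>

In __syscall_error the contents of %eax are first negated, giving a value between 1 and 4095; this is the error code, and it is pushed onto the stack. Then __errno_location() is called to fetch the address of the global variable errno into %eax. The error code is popped from the stack into %ecx and stored into errno. Finally, before returning, %eax is set to -1. The value returned to the user process in %eax is therefore -1, while errno holds the specific error code. This is the return value convention for most system calls (those that return an integer).

With the user-space side clear, we now enter the kernel, that is, kernel space. The passage of the CPU through a trap gate is the same as through an interrupt gate on an interrupt and is not repeated here. It should be pointed out, though, that when a gate is traversed because of an external interrupt, the access level specified by the gate is not checked, whereas when an interrupt or trap gate is traversed through an INT instruction, the specified access level is checked against the CPU's current privilege level. The trap gate set up for system calls has an access level (DPL) of 3. Register IDTR points to the current interrupt descriptor table IDT, and the IDT entry for 0x80 is the trap gate set up for INT $0x80; its function pointer points to system_call(). By the time the CPU reaches system_call() it has switched from user mode to kernel mode and from the user stack to the kernel stack. Its state is the same as when it reaches IRQ0xYY_interrupt in an external interrupt that occurred in user space; readers may wish to go back and review that first.

As stated earlier, the CPU does not automatically disable interrupts when it enters the kernel through a trap gate, so a system call can be interrupted.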
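The return value convention described above for __syscall_error can be sketched in C. This is only an illustration under stated assumptions, not the library's actual source: the helper name syscall_result and the variable my_errno are hypothetical stand-ins for the hand-written assembly and for the real errno.

```c
/* Hypothetical stand-in for the C library's errno variable. */
static int my_errno;

/*
 * Hypothetical helper: turn the raw value the kernel leaves in %eax
 * into the result the user program sees.
 * Raw values from -4095 to -1 (0xfffff001..0xffffffff on i386) mean failure.
 */
static long syscall_result(long raw)
{
    /* Unsigned comparison, like the "cmp $0xfffff001,%eax; jae" pair. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        my_errno = (int)-raw;   /* "neg %eax": positive error code 1..4095 */
        return -1;              /* the caller sees -1 */
    }
    return raw;                 /* success: pass the value through unchanged */
}
```

With this sketch, a raw kernel result of -EPERM (assuming the usual value 1) would reach the caller as -1 with my_errno set to 1, while a raw result of 0 would be returned as 0 and leave my_errno untouched.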
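The code of system_call() is in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 195 205 ====================
195 ENTRY(system_call)
196     pushl %eax                  # save orig_eax
197     SAVE_ALL
198     GET_CURRENT(%ebx)
199     cmpl $(NR_syscalls),%eax
200     jae badsys
201     testb $0x02,tsk_ptrace(%ebx)    # PT_TRACESYS
202     jne tracesys
203     call *SYMBOL_NAME(sys_call_table)(,%eax,4)
204     movl %eax,EAX(%esp)         # save the return value
205 ENTRY(ret_from_sys_call)

First the contents of %eax are pushed onto the stack. This slot of the kernel stack is called orig_ax in the code; in an external interrupt it holds the (transformed) interrupt request number, and in a system call it holds the system call number. SAVE_ALL was covered in the section on interrupt processing. It must be pointed out, however, that the register contents pushed onto the stack are used differently here. In an interrupt, after SAVE_ALL, the contents saved on the stack are passed as one pt_regs structure to do_IRQ() and then on to the specific service routine, as readers saw in the section on interrupt service. In a system call, the contents of each register on the stack can be passed to the service routine as a separate argument as needed. For sethostname(), two arguments are to be passed, in %ebx and %ecx. In SAVE_ALL, %ebx is pushed last and %ecx just before it, so on the stack the contents of %ebx become argument 1 and the contents of %ecx argument 2. Looking back at SAVE_ALL, the registers are pushed in the order %es, %ds, %eax, %ebp, %edi, %esi, %edx, %ecx and %ebx. Here %eax holds the system call number (the same as orig_ax) and obviously cannot carry an argument as well, and %ebp is used as the frame pointer in subroutine calls and cannot carry one either. So only the last 5 registers are available for arguments, and a system call can pass at most 5 separate arguments. This also shows that the order in which SAVE_ALL pushes the registers was not chosen arbitrarily but with a specific purpose.

The macro call GET_CURRENT(%ebx) makes %ebx point to the task_struct of the current process (GET_CURRENT is described in the chapter on processes). Next the system call number in %eax is checked against its range. The task_struct structure has a field flags containing a flag bit called PT_TRACESYS. Through the system call ptrace() a process can set the PT_TRACESYS flag of a child to 1 and thereby trace the child's system calls. The Linux command strace does exactly this and is a very useful tool. Line 201 of system_call() checks whether PT_TRACESYS of the current process is 1. Note that flags(%ebx) is not a function call; it denotes the address at displacement flags from the contents of %ebx, that is, from the task_struct pointer of the current process, and flags is defined as 4 on line 75 of arch/i386/kernel/entry.S. This was explained before and is repeated here as a reminder.

When the PT_TRACESYS flag (0x20) is 1, control goes to tracesys, whose code is also in arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 244 254 ====================
244 tracesys: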
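245     movl $-ENOSYS,EAX(%esp)
246     call SYMBOL_NAME(syscall_trace)
247     movl ORIG_EAX(%esp),%eax
248     cmpl $(NR_syscalls),%eax
249     jae tracesys_exit
250     call *SYMBOL_NAME(sys_call_table)(,%eax,4)
251     movl %eax,EAX(%esp)         # save the return value
252 tracesys_exit:
253     call SYMBOL_NAME(syscall_trace)
254     jmp ret_from_sys_call

Comparing this code with line 203 of the normal path shows the difference: when PT_TRACESYS is 1, syscall_trace() is called both before and after the specific service routine, reporting entry to and return from the system call to the parent process. We will go into syscall_trace() when we discuss interprocess communication, but interested readers may look at it themselves first. Now back to line 203 of system_call(). It is a call instruction whose target address is held in a function pointer, and that function pointer is the element of the array sys_call_table[] indexed by the contents of %eax, with an element size of 4 bytes. In the expression (,%eax,4), the empty field before the first comma means that there is no additional base on top of %eax, and the 4 means that the displacement (of %eax relative to sys_call_table) is scaled by 4 bytes. The system call jump table sys_call_table[] is an array of function pointers. Because of its length we present it as a separate section appended to this chapter.

Every system call number in the table that the kernel does not support points to sys_ni_syscall(), a function that merely returns the error code -ENOSYS, meaning that the system call is not implemented. Given the libc.a processing described above, the user program then gets the return value -1, with the global variable errno set to ENOSYS.

The function pointer at index 0x4a, that is, 74, in the jump table (see line 500 of the jump table later) is sys_sethostname, so in our scenario the CPU enters sys_sethostname(), which is defined in kernel/sys.c:

==================== kernel/sys.c 971 987 ====================
971 asmlinkage long sys_sethostname(char *name, int len)
972 {
973     int errno;
974
975     if (!capable(CAP_SYS_ADMIN))
976         return -EPERM;
977     if (len < 0 || len > __NEW_UTS_LEN)
978         return -EINVAL;
979     down_write(&uts_sem);
980     errno = -EFAULT;
981     if (!copy_from_user(system_utsname.nodename, name, len)) {
982         system_utsname.nodename[len] = 0;
983         errno = 0;
984     }
985     up_write(&uts_sem);
986     return errno;
987 }

As one would expect, sethostname is an operation only a privileged user may perform, so this is checked first. The function capable(CAP_SYS_ADMIN) checks whether the current process holds the CAP_SYS_ADMIN capability. If it does not, the function returns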
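the error code -EPERM. Next the length of the string is checked for safety.

On a multiprocessor system several processes can run on different CPUs at the same time. Two processes may therefore call sethostname() simultaneously, with the following result:

(1) Process A calls sethostname() to set the host name to "AB".
(2) Process C, running on another CPU, also calls sethostname(), to set the host name to "CD".
(3) Process A enters the kernel first and, in sys_sethostname(), has written "A" into system_utsname.nodename in the kernel, but before it can write "B" an interrupt occurs, and C cuts in at this moment.
(4) Process C enters the kernel and completes its call to sethostname(), successfully setting system_utsname.nodename in the kernel to "CD".
(5) A little later process A resumes and goes on to write "B" into system_utsname.nodename.
(6) When process A completes its call to sethostname() and returns "successfully", system_utsname.nodename in the kernel contains "CB".

In operating system theory this phenomenon is called a "race condition". To prevent it, operations on system_utsname.nodename must be placed in a critical section protected by a semaphore, and this is precisely the protection implemented by down_write() on line 979 and up_write() on line 985 of sys_sethostname(). With this protection, when process C in the sequence above reaches line 979 it finds that another process is already at work inside ("do not disturb"), so it voluntarily holds back and lets other processes run first, and the race is avoided.

Next comes the substantive operation of this system call: writing the string pointed to by the argument name into system_utsname.nodename in the kernel. The source of this operation is in user space and the destination in kernel space, so the copy is done through the macro copy_from_user(). As stated before, system call arguments are passed in registers. The amount of information that can be passed in registers is obviously small, so most arguments are pointers, through which larger blocks of data can be reached. Operations like copy_from_user() that copy data between user space and kernel space are therefore very important, and very common, in the implementation of system calls. For the i386 CPU the macro copy_from_user() is defined in include/asm-i386/uaccess.h:

==================== include/asm-i386/uaccess.h 567 570 ====================
567 #define copy_from_user(to,from,n) \
568     (__builtin_constant_p(n) ? \
569      __constant_copy_from_user((to),(from),(n)) : \
570      __generic_copy_from_user((to),(from),(n)))

When the length to be copied is one of certain special constants, such as 4, 8, ..., 512, the operation is slightly simpler; in the general case it is carried out by __generic_copy_from_user(), whose code is in arch/i386/lib/usercopy.c:

==================== arch/i386/lib/usercopy.c 50 56 ====================
50 unsigned long
51 __generic_copy_from_user(void *to, const void *from, unsigned long n)
52 {
53     if (access_ok(VERIFY_READ, from, n))
54         __copy_user_zeroing(to,from,n);
55     return n;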
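56 }

For a read, access_ok() only checks that the arguments from and n are reasonable, for example whether (from + n) exceeds the upper limit of user space; it does not check whether the range is actually mapped. The copy from user space is then done by another macro, __copy_user_zeroing(). The code of __copy_user_zeroing() is a "hard bone" to chew. But the operation is very important for system calls, and several similar operations, such as __copy_user(), which is called in copy_to_user(), as well as __constant_copy_user(), __do_strncpy_from_user(), get_user() and so on, are very much like it, so it is worth the effort. Moreover, our account of do_page_fault() in Chapter 2 left a loose end that has to do with exactly these operations. The macro __copy_user_zeroing() is defined in include/asm-i386/uaccess.h:

==================== include/asm-i386/uaccess.h 263 289 ====================
263 #define __copy_user_zeroing(to,from,size) \
264 do { \
265     int __d0, __d1; \
266     __asm__ __volatile__( \
267         "0: rep; movsl\n" \
268         " movl %3,%0\n" \
269         "1: rep; movsb\n" \
270         "2:\n" \
271         ".section .fixup,\"ax\"\n" \
272         "3: lea 0(%3,%0,4),%0\n" \
273         "4: pushl %0\n" \
274         " pushl %%eax\n" \
275         " xorl %%eax,%%eax\n" \
276         " rep; stosb\n" \
277         " popl %%eax\n" \
278         " popl %0\n" \
279         " jmp 2b\n" \
280         ".previous\n" \
281         ".section __ex_table,\"a\"\n" \
282         " .align 4\n" \
283         " .long 0b,3b\n" \
284         " .long 1b,4b\n" \
285         ".previous" \
286         : "=&c"(size), "=&D" (__d0), "=&S" (__d1) \
287         : "r"(size & 3), "0"(size / 4), "1"(to), "2"(from) \
288         : "memory"); \
289 } while (0)

Let us first look at the regular part of __copy_user_zeroing(), the code executed when the operation goes smoothly and everything is normal. This part really consists of only the four lines 267 to 270, plus the three lines 286 to 288. Line 286 is the output section and declares three operands, %0, %1 and %2. %0 corresponds to the argument size and is bound to register %%ecx; %1 corresponds to the local variable __d0 and is bound to %%edi; %2 corresponds to the local variable __d1 and is bound to %%esi. Line 287 is the input section and declares four operands. The first is %3, a register operand with the initial value (size & 3); the remaining three are tied to %0, %1 and %2 and are initialized to (size / 4), the argument to and the argument from respectively. Once the initialization specified by the input section is done, the assembly code on lines 267 to 270 is executed. It uses the REP and MOVS instructions of the x86 processor to perform a string move, with %%ecx as the counter, %%esi as the source pointer and %%edi as the destination pointer. The copy proceeds first in units of 32-bit long words, then byte by byte for the remaining (at most 3) bytes. Written in C, the code would be equivalent to:

void __copy_user_zeroing(void *to, const void *from, unsigned long size)
{
    unsigned long r = size & 3;             /* leftover bytes, at most 3 */
    int *dl = to; const int *sl = from;
    char *db; const char *sb;

    for (size /= 4; size--; ) *dl++ = *sl++;    /* 32-bit words first */
    db = (char *)dl; sb = (const char *)sl;
    while (r--) *db++ = *sb++;              /* then the remaining bytes */
}

Obviously the efficiency of the two cannot be compared. Readers have seen similar code in earlier sections, so this part is easy to understand.

But why the code on lines 271 to 280? The author of the code wrote a note explaining the reason, in the file "Documentation/exception.txt" (readers with Linux installed can find it in the directory /usr/src/linux/Documentation). Readers may still find that note hard going, so we supplement it here in the light of our scenario. When the kernel receives a pointer passed in from user space by a process, such as name in this scenario, it is hard to guarantee that the pointer is "legal", and harder still to guarantee that the whole range of length len is. For safety, the legality of the range should therefore be checked first, to see whether the virtual memory range determined by the pointer and the length is mapped. Every process has an mm_struct structure representing its virtual address space, which records all the mapped regions of the process in user space. By searching the list in this structure one can find out whether the range of length len starting at name is mapped and whether the required operation (read or write) is permitted. The kernel has a function for this purpose, verify_area(), and older versions of the Linux kernel did exactly that. But performing this check on every read from or write to user space is a real burden, and tests showed that in typical applications it noticeably hurt performance. In practice, bad pointers do occur, perhaps not even rarely, but they are always a minority; one might say that "more than ninety-five percent of pointers are good". It is not worth penalizing everyone, and lowering overall efficiency, for the sake of a few bad pointers. Newer versions therefore dropped the check on pointer legality. If a bad pointer is encountered, the page fault is simply allowed to happen, and the kernel deals with the problem case by case in the page fault service routine.

Now let us look back at do_page_fault(). When a bad pointer is hit and the page fault really occurs, do_page_fault() first searches the list of virtual memory regions of the current process through find_vma(); if the search fails, control goes to bad_area. In Chapter 2 we discussed bad_area only for the case where the exception occurs while the CPU is running in user space. In the present scenario the exception occurs while the CPU is running in kernel space: the target address of the failed access is in user space, but the CPU's execution address is in kernel space. For convenience we list the relevant lines of do_page_fault() again:

==================== arch/i386/mm/fault.c 299 299 ====================
[do_page_fault()]
299 do_sigbus:
......
==================== arch/i386/mm/fault.c 315 318 ====================
315     /* Kernel mode? Handle exceptions or die */
316     if (!(error_code & 4))
317         goto no_context;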
318     return;
==================== arch/i386/mm/fault.c 255 260 ====================
255 no_context:
256     /* Are we prepared to handle this kernel fault? */
257     if ((fixup = search_exception_table(regs->eip)) != 0) {
258         regs->eip = fixup;
259         return;
260     }

In other words, if the kernel can find the address of the faulting instruction in an "exception table" and obtain the corresponding fixup address, it replaces the address at which the CPU would resume after returning from the exception with this fixup address. Why do this? Because in this situation the kernel cannot simply supply a page for the current process (the string pointed to by name would then be blank). And if matters were left to take their course, then after returning from the exception the current process would fault again and again on the same instruction, beyond all hope of recovery. It must therefore be "pulled out of the mire". The function search_exception_table() is defined in arch/i386/mm/extable.c:

==================== arch/i386/mm/extable.c 33 55 ====================
[do_page_fault()>search_exception_table()]
33 unsigned long
34 search_exception_table(unsigned long addr)
35 {
36     unsigned long ret;
37
38 #ifndef CONFIG_MODULES
39     /* There is only the kernel to search. */
40     ret = search_one_table(__start___ex_table, __stop___ex_table-1, addr);
41     if (ret) return ret;
42 #else
43     /* The kernel is the last "module" -- no need to treat it special. */
44     struct module *mp;
45     for (mp = module_list; mp != NULL; mp = mp->next) {
46         if (mp->ex_table_start == NULL)
47             continue;
48         ret = search_one_table(mp->ex_table_start,
49                                mp->ex_table_end - 1, addr);
50         if (ret) return ret;
51     }
52 #endif
53
54     return 0;
55 }

Whether or not CONFIG_MODULES on line 38 is defined, that is, whether or not loadable modules are supported (this depends on the system configuration), search_one_table() is always called in the end. It is in the same source file, arch/i386/mm/extable.c:

==================== arch/i386/mm/extable.c 12 31 ====================
[do_page_fault()>search_exception_table()>search_one_table()]
12 static inline unsigned long
13 search_one_table(const struct exception_table_entry *first,
14                  const struct exception_table_entry *last,
15                  unsigned long value)
16 {
17     while (first <= last) {
18         const struct exception_table_entry *mid;
19         long diff;
20
21         mid = (last - first) / 2 + first;
22         diff = mid->insn - value;
23         if (diff == 0)
24             return mid->fixup;
25         else if (diff < 0)
26             first = mid+1;
27         else
28             last = mid-1;
29     }
30     return 0;
31 }

Clearly this is a binary search over an array of exception_table_entry structures. The structure struct exception_table_entry is again defined in include/asm-i386/uaccess.h:

==================== include/asm-i386/uaccess.h 67 83 ====================
67 /*
68  * The exception table consists of pairs of addresses: the first is the
69  * address of an instruction that is allowed to fault, and the second is
70  * the address at which the program should continue. No registers are
71  * modified, so it is entirely up to the continuation code to figure out
72  * what to do.
73  *
74  * All the routines below use bits of fixup code that are out of line
75  * with the main instruction path. This means when everything is well,
76  * we don't even have to jump over them. Further, they do not intrude
77  * on our cache or tlb entries.
78  */
79
80 struct exception_table_entry
81 {
82     unsigned long insn, fixup;
83 };

In this structure insn is the address of an instruction that may fault, and fixup is the address to be substituted for it. Readers may ask: with so many instructions that could go wrong, how can such a structure be set up for every one of them? The answer is, first, that the instructions that can go wrong are in fact not as numerous as one might imagine. Second, as to who sets up these structures, the rule is simply "whoever uses it is responsible for it". Our __copy_user_zeroing(), for example, copies from user space and may run into trouble, so it is responsible for creating entries in the exception table for those of its instructions that may fault.

We can now return to the code of __copy_user_zeroing(). First, the instructions that can go wrong here are really only
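two: one is the movsl at label 0 on line 267, the other is the movsb at label 1 on line 269. So two table entries should be created, and that is what lines 282 to 284 specify; the key lines are 283 and 284. Line 283 says that if an exception occurs at the address of label 0 above, that is, at the movsl instruction, then its fixup address is the address of label 3 above, that is, the address of the lea instruction. What kind of repair does the CPU then carry out, starting from the fixup address? Here it uses stosb to set the remaining part of system_utsname.nodename to 0 (doing nothing at all would of course also be possible). Then the JMP instruction on line 279 jumps to label 2 above, which is the end. In this way, although the copy from user space did not achieve its purpose, the endless loop of exception and re-execution that could otherwise occur is avoided.

As everyone knows, after a program has been compiled (or assembled) and linked, its executable image is divided into a text segment and a data segment. In fact, though, GNU gcc and ld support two further sections. One is fixup, used specifically for repair after an exception; it is really not much different from the text segment. The other is __ex_table, used specifically for the exception address table. Lines 271 and 281 of __copy_user_zeroing() tell gcc to place the corresponding code in the fixup and __ex_table sections respectively; at link time ld collects these entries, sorted by address, into the exception address table.

In reality, it is not only functions like __copy_user_zeroing() that have to prepare a fixup address. Anything that may run into trouble while executing in the kernel has to be prepared, and this includes the RESTORE_ALL that we saw in the previous section. At the time, in order to keep the reader's attention on the basic interrupt mechanism, we did not discuss this; we fill it in below when we come to the return from a system call.

The reader should also pay attention to the return value of the function __generic_copy_from_user(). The code shows that what is returned is the call argument n, that is, the length to be copied from user space. How can that be? It is because __copy_user_zeroing() is not a function but a macro. During execution, n decreases as the copy proceeds, all the way down to 0. If the copy fails part way, n represents the size of the part that remains uncopied. Look back at line 273 of __copy_user_zeroing(): the %0 there is the parameter size, and hence n. At the same time, it is the register %%ecx. While movsl or movsb executes, the value of %%ecx keeps decreasing, and movsl or movsb finishes when it reaches 0. When the operation fails part way and control arrives at line 273, the value of %%ecx is therefore certainly nonzero. However, %%ecx is needed again on line 276, so it is first saved on the stack and restored on line 278. So the n finally returned from __generic_copy_from_user() says how many bytes are still uncopied. In sys_sethostname(), this return value is used to judge whether copy_from_user() succeeded. When the return value is 0, errno is also set to 0. Thus sys_sethostname() finally returns 0 for success, and returns -EFAULT if copy_from_user() failed.

Since sys_sethostname() itself is very simple, we now return to system_call() from the beginning of this section. When the CPU returns from the service routine of a particular system call, the return value prepared by the service routine is in register %eax, so line 204 writes it into the slot on the stack that corresponds to %eax; after RESTORE_ALL, this return value is then still passed back to user space through %eax. After that, the CPU arrives at ret_from_sys_call.

==================== arch/i386/kernel/entry.S 205 223 ====================
[system_call()>ret_from_sys_call()]
205 ENTRY(ret_from_sys_call)
206 #ifdef CONFIG_SMP
207         movl processor(%ebx),%eax
208         shll $CONFIG_X86_L1_CACHE_SHIFT,%eax
209         movl SYMBOL_NAME(irq_stat)(,%eax),%ecx          # softirq_active
210         testl SYMBOL_NAME(irq_stat)+4(,%eax),%ecx       # softirq_mask
211 #else
212         movl SYMBOL_NAME(irq_stat),%ecx                 # softirq_active
213         testl SYMBOL_NAME(irq_stat)+4,%ecx              # softirq_mask
214 #endif
215         jne handle_softirq
216
217 ret_with_reschedule:
218         cmpl $0,need_resched(%ebx)
219         jne reschedule
220         cmpl $0,sigpending(%ebx)
221         jne signal_return
222 restore_all:
223         RESTORE_ALL
......
==================== arch/i386/kernel/entry.S 282 284 ====================
282 handle_softirq:
283         call SYMBOL_NAME(do_softirq)
284         jmp ret_from_intr

The reader has already read the code for returning from an interrupt, so the code above should present no difficulty.

What needs to be added is that three instructions in RESTORE_ALL may raise an exception, so a "fixup" has to be prepared for each of them. The three instructions are popl %ds, popl %es and iret. Let us look at the code first (arch/i386/kernel/entry.S) and then discuss it:

==================== arch/i386/kernel/entry.S 101 130 ====================
101 #define RESTORE_ALL     \
102         popl %ebx;      \
103         popl %ecx;      \
104         popl %edx;      \
105         popl %esi;      \
106         popl %edi;      \
107         popl %ebp;      \
108         popl %eax;      \
109 1:      popl %ds;       \
110 2:      popl %es;       \
111         addl $4,%esp;   \
112 3:      iret;           \
113 .section .fixup,"ax";   \
114 4:      movl $0,(%esp); \
115         jmp 1b;         \
116 5:      movl $0,(%esp); \
117         jmp 2b;         \
118 6:      pushl %ss;      \
119         popl %ds;       \
120         pushl %ss;      \
121         popl %es;       \
122         pushl $11;      \
123         call do_exit;   \
124 .previous;              \
125 .section __ex_table,"a";\
126         .align 4;       \
127         .long 1b,4b;    \
128         .long 2b,5b;    \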
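129         .long 3b,6b;    \
130 .previous

Three fixup addresses are prepared here, on lines 127 to 129; the instructions that may go wrong are on lines 109, 110 and 112 respectively. Why might restoring %ds from the stack go wrong? The reader may remember that whenever a segment register is loaded, the CPU uses the new segment selector, together with the contents of GDTR or LDTR, to locate the selected descriptor in the corresponding descriptor table, and checks it. If the descriptor and the selector are both valid and consistent, the descriptor is loaded into the "invisible" part of the segment register in the CPU, so that memory does not have to be accessed for that descriptor every time. But if, for whatever reason, the selector or the descriptor is invalid or inconsistent, the CPU raises a General Protection exception (GP exception). When such an exception occurs in system space, a means of repair must be prepared for it. Here, the repair prepared for "popl %ds" is the code starting at label 4, the "movl $0,(%esp)" instruction on line 114, which is really only two lines long. This instruction first clears the copy of %ds on the stack to 0, and then line 115 jumps back to line 109 to execute "popl %ds" again. Why does that "fix" anything? It is not really a fix; it merely avoids a further GP exception. A segment selector of 0 is called a "null selector". Loading a null selector into a segment register (other than CS and SS) does not by itself cause a GP exception; the exception arises only later, when an attempt is made to access memory through that null selector, and that happens after the return to user space. An exception in user space at worst gets the process killed, without causing a problem at the system level. So the repair here really just pushes the problem downward and further along. The "popl %es" on line 110 is handled the same way.

Finally, why can "iret" also go wrong, and how is that "fixed"? When the i386 CPU returns from an interrupt in system space to user space, it restores the user stack pointer from the system stack, including the contents of the stack segment register, and also restores the return address in user space, including the contents of the code segment register. As with the data segment register %ds, both of these steps can fail and raise a GP exception, so that the CPU cannot get back to user space. How can this be repaired? For CS and SS the trick of slipping in a null selector does not work, because CS and SS do not accept a null selector at all (it would raise a GP exception). So the problem is more serious than the one "popl %ds" can cause. The only way out is to kill the current process through do_exit() (see the chapter "Processes and Process Scheduling"), sacrificing a pawn to save the game (lines 118 to 123). Once the current process has been killed, the kernel schedules another process to become the current one. So the next return from system space to user space goes to the user space of a different process, and the register copies to be restored from the system stack are then those of that other process.

Although the implementation of the system call sethostname() is simple, the code from the kernel entry point system_call up to the entry into sys_sethostname(), and the code from the return from sys_sethostname() up to the completion of the iret instruction in RESTORE_ALL, is shared by all system calls. Whatever the system call, the process of entering and leaving the kernel is the same. From now on, when we discuss a system call, we start directly from its implementation in the kernel, as with sys_sethostname().

Lastly, one more fact should be pointed out which the reader has already seen but may not have consciously noticed: the kernel can directly access the user space of the current process, and the virtual addresses it uses are exactly the same as those used when the process is in user space. The reverse, of course, is not possible.

3.9 System Call Numbers and the Jump Table

The file include/asm-i386/unistd.h defines a unique number for each system call, called the system call number. Some of these numbers are shown below: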
==================== include/asm-i386/unistd.h 8 21 ====================
8 #define __NR_exit                 1
9 #define __NR_fork                 2
10 #define __NR_read                 3
11 #define __NR_write                4
12 #define __NR_open                 5
13 #define __NR_close                6
14 #define __NR_waitpid              7
15 #define __NR_creat                8
16 #define __NR_link                 9
17 #define __NR_unlink              10
18 #define __NR_execve              11
19 #define __NR_chdir               12
20 #define __NR_time                13
21 #define __NR_mknod               14

The system call jump table is an array of function pointers; on dispatch, the system call number is used as an index into the array to find the corresponding function pointer. The array is defined in arch/i386/kernel/entry.S. Its size is determined by the constant NR_syscalls, which is defined as 256 in include/linux/sys.h. At present Linux defines 221 system calls; the remaining thirty-odd entries are available for users to add their own. Every index (system call number) that has no definition is given a function pointer to sys_ni_syscall(), whose code is in kernel/sys.c:

==================== kernel/sys.c 169 172 ====================
169 asmlinkage long sys_ni_syscall(void)
170 {
171         return -ENOSYS;
172 }

Below is the assembly code of the array sys_call_table in arch/i386/kernel/entry.S. The ".rept NR_syscalls-221" on line 656 is not an instruction but an assembler (GNU as) directive: when the file is assembled, line 657 that follows is repeated (NR_syscalls-221) times, that is, 35 times.

==================== arch/i386/kernel/entry.S 425 658 ====================
425 ENTRY(sys_call_table)
426         .long SYMBOL_NAME(sys_ni_syscall)       /* 0 - old "setup()" system call*/
427         .long SYMBOL_NAME(sys_exit)
428         .long SYMBOL_NAME(sys_fork)
429         .long SYMBOL_NAME(sys_read)
430         .long SYMBOL_NAME(sys_write)
431         .long SYMBOL_NAME(sys_open)             /* 5 */
432         .long SYMBOL_NAME(sys_close)
433         .long SYMBOL_NAME(sys_waitpid)
434         .long SYMBOL_NAME(sys_creat)
435         .long SYMBOL_NAME(sys_link)
436         .long SYMBOL_NAME(sys_unlink)           /* 10 */
437         .long SYMBOL_NAME(sys_execve)
438         .long SYMBOL_NAME(sys_chdir)
439         .long SYMBOL_NAME(sys_time)
440         .long SYMBOL_NAME(sys_mknod)
441         .long SYMBOL_NAME(sys_chmod)            /* 15 */
442         .long SYMBOL_NAME(sys_lchown16)
443         .long SYMBOL_NAME(sys_ni_syscall)       /* old break syscall holder */
444         .long SYMBOL_NAME(sys_stat)
445         .long SYMBOL_NAME(sys_lseek)
446         .long SYMBOL_NAME(sys_getpid)           /* 20 */
447         .long SYMBOL_NAME(sys_mount)
448         .long SYMBOL_NAME(sys_oldumount)
449         .long SYMBOL_NAME(sys_setuid16)
450         .long SYMBOL_NAME(sys_getuid16)
451         .long SYMBOL_NAME(sys_stime)            /* 25 */
452         .long SYMBOL_NAME(sys_ptrace)
453         .long SYMBOL_NAME(sys_alarm)
454         .long SYMBOL_NAME(sys_fstat)
455         .long SYMBOL_NAME(sys_pause)
456         .long SYMBOL_NAME(sys_utime)            /* 30 */
457         .long SYMBOL_NAME(sys_ni_syscall)       /* old stty syscall holder */
458         .long SYMBOL_NAME(sys_ni_syscall)       /* old gtty syscall holder */
459         .long SYMBOL_NAME(sys_access)
460         .long SYMBOL_NAME(sys_nice)
461         .long SYMBOL_NAME(sys_ni_syscall)       /* 35 */ /* old ftime syscall holder */
462         .long SYMBOL_NAME(sys_sync)
463         .long SYMBOL_NAME(sys_kill)
464         .long SYMBOL_NAME(sys_rename)
465         .long SYMBOL_NAME(sys_mkdir)
466         .long SYMBOL_NAME(sys_rmdir)            /* 40 */
467         .long SYMBOL_NAME(sys_dup)
468         .long SYMBOL_NAME(sys_pipe)
469         .long SYMBOL_NAME(sys_times)
470         .long SYMBOL_NAME(sys_ni_syscall)       /* old prof syscall holder */
471         .long SYMBOL_NAME(sys_brk)              /* 45 */
472         .long SYMBOL_NAME(sys_setgid16)
473         .long SYMBOL_NAME(sys_getgid16)
474         .long SYMBOL_NAME(sys_signal)
475         .long SYMBOL_NAME(sys_geteuid16)
476         .long SYMBOL_NAME(sys_getegid16)        /* 50 */
477         .long SYMBOL_NAME(sys_acct)
478         .long SYMBOL_NAME(sys_umount)           /* recycled never used phys() */
479         .long SYMBOL_NAME(sys_ni_syscall)       /* old lock syscall holder */
480         .long SYMBOL_NAME(sys_ioctl)
481         .long SYMBOL_NAME(sys_fcntl)            /* 55 */
482         .long SYMBOL_NAME(sys_ni_syscall)       /* old mpx syscall holder */
483         .long SYMBOL_NAME(sys_setpgid)
484         .long SYMBOL_NAME(sys_ni_syscall)       /* old ulimit syscall holder */
485         .long SYMBOL_NAME(sys_olduname)
486         .long SYMBOL_NAME(sys_umask)            /* 60 */
487         .long SYMBOL_NAME(sys_chroot)
488         .long SYMBOL_NAME(sys_ustat)
489         .long SYMBOL_NAME(sys_dup2)
490         .long SYMBOL_NAME(sys_getppid)
491         .long SYMBOL_NAME(sys_getpgrp)          /* 65 */
492         .long SYMBOL_NAME(sys_setsid)
493         .long SYMBOL_NAME(sys_sigaction)
494         .long SYMBOL_NAME(sys_sgetmask)
495         .long SYMBOL_NAME(sys_ssetmask)
496         .long SYMBOL_NAME(sys_setreuid16)       /* 70 */
497         .long SYMBOL_NAME(sys_setregid16)
498         .long SYMBOL_NAME(sys_sigsuspend)
499         .long SYMBOL_NAME(sys_sigpending)
500         .long SYMBOL_NAME(sys_sethostname)
501         .long SYMBOL_NAME(sys_setrlimit)        /* 75 */
502         .long SYMBOL_NAME(sys_old_getrlimit)
503         .long SYMBOL_NAME(sys_getrusage)
504         .long SYMBOL_NAME(sys_gettimeofday)
505         .long SYMBOL_NAME(sys_settimeofday)
506         .long SYMBOL_NAME(sys_getgroups16)      /* 80 */
507         .long SYMBOL_NAME(sys_setgroups16)
508         .long SYMBOL_NAME(old_select)
509         .long SYMBOL_NAME(sys_symlink)
510         .long SYMBOL_NAME(sys_lstat)
511         .long SYMBOL_NAME(sys_readlink)         /* 85 */
512         .long SYMBOL_NAME(sys_uselib)
513         .long SYMBOL_NAME(sys_swapon)
514         .long SYMBOL_NAME(sys_reboot)
515         .long SYMBOL_NAME(old_readdir)
516         .long SYMBOL_NAME(old_mmap)             /* 90 */
517         .long SYMBOL_NAME(sys_munmap)
518         .long SYMBOL_NAME(sys_truncate)
519         .long SYMBOL_NAME(sys_ftruncate)
520         .long SYMBOL_NAME(sys_fchmod)
521         .long SYMBOL_NAME(sys_fchown16)         /* 95 */
522         .long SYMBOL_NAME(sys_getpriority)
523         .long SYMBOL_NAME(sys_setpriority)
524         .long SYMBOL_NAME(sys_ni_syscall)       /* old profil syscall holder */
525         .long SYMBOL_NAME(sys_statfs)
526         .long SYMBOL_NAME(sys_fstatfs)          /* 100 */
527         .long SYMBOL_NAME(sys_ioperm)
528         .long SYMBOL_NAME(sys_socketcall)
529         .long SYMBOL_NAME(sys_syslog)
530         .long SYMBOL_NAME(sys_setitimer)
531         .long SYMBOL_NAME(sys_getitimer)        /* 105 */
532         .long SYMBOL_NAME(sys_newstat)
533         .long SYMBOL_NAME(sys_newlstat)
534         .long SYMBOL_NAME(sys_newfstat)
535         .long SYMBOL_NAME(sys_uname)
536         .long SYMBOL_NAME(sys_iopl)             /* 110 */
537         .long SYMBOL_NAME(sys_vhangup)
538         .long SYMBOL_NAME(sys_ni_syscall)       /* old "idle" system call */
539         .long SYMBOL_NAME(sys_vm86old)
540         .long SYMBOL_NAME(sys_wait4)
541         .long SYMBOL_NAME(sys_swapoff)          /* 115 */
542         .long SYMBOL_NAME(sys_sysinfo)
543         .long SYMBOL_NAME(sys_ipc)
544         .long SYMBOL_NAME(sys_fsync)
545         .long SYMBOL_NAME(sys_sigreturn)
546         .long SYMBOL_NAME(sys_clone)            /* 120 */
547         .long SYMBOL_NAME(sys_setdomainname)
548         .long SYMBOL_NAME(sys_newuname)
549         .long SYMBOL_NAME(sys_modify_ldt)
550         .long SYMBOL_NAME(sys_adjtimex)
551         .long SYMBOL_NAME(sys_mprotect)         /* 125 */
552         .long SYMBOL_NAME(sys_sigprocmask)
553         .long SYMBOL_NAME(sys_create_module)
554         .long SYMBOL_NAME(sys_init_module)
555         .long SYMBOL_NAME(sys_delete_module)
556         .long SYMBOL_NAME(sys_get_kernel_syms)  /* 130 */
557         .long SYMBOL_NAME(sys_quotactl)
558         .long SYMBOL_NAME(sys_getpgid)
559         .long SYMBOL_NAME(sys_fchdir)
560         .long SYMBOL_NAME(sys_bdflush)
561         .long SYMBOL_NAME(sys_sysfs)            /* 135 */
562         .long SYMBOL_NAME(sys_personality)
563         .long SYMBOL_NAME(sys_ni_syscall)       /* for afs_syscall */
564         .long SYMBOL_NAME(sys_setfsuid16)
565         .long SYMBOL_NAME(sys_setfsgid16)
566         .long SYMBOL_NAME(sys_llseek)           /* 140 */
567         .long SYMBOL_NAME(sys_getdents)
568         .long SYMBOL_NAME(sys_select)
569         .long SYMBOL_NAME(sys_flock)
570         .long SYMBOL_NAME(sys_msync)
571         .long SYMBOL_NAME(sys_readv)            /* 145 */
572         .long SYMBOL_NAME(sys_writev)
573         .long SYMBOL_NAME(sys_getsid)
574         .long SYMBOL_NAME(sys_fdatasync)
575         .long SYMBOL_NAME(sys_sysctl)
576         .long SYMBOL_NAME(sys_mlock)            /* 150 */
577         .long SYMBOL_NAME(sys_munlock)
578         .long SYMBOL_NAME(sys_mlockall)
579         .long SYMBOL_NAME(sys_munlockall)
580         .long SYMBOL_NAME(sys_sched_setparam)
581         .long SYMBOL_NAME(sys_sched_getparam)   /* 155 */
582         .long SYMBOL_NAME(sys_sched_setscheduler)
583         .long SYMBOL_NAME(sys_sched_getscheduler)
584         .long SYMBOL_NAME(sys_sched_yield)
585         .long SYMBOL_NAME(sys_sched_get_priority_max)
586         .long SYMBOL_NAME(sys_sched_get_priority_min)   /* 160 */
587         .long SYMBOL_NAME(sys_sched_rr_get_interval)
588         .long SYMBOL_NAME(sys_nanosleep)
589         .long SYMBOL_NAME(sys_mremap)
590         .long SYMBOL_NAME(sys_setresuid16)
591         .long SYMBOL_NAME(sys_getresuid16)      /* 165 */
592         .long SYMBOL_NAME(sys_vm86)
593         .long SYMBOL_NAME(sys_query_module)
594         .long SYMBOL_NAME(sys_poll)
595         .long SYMBOL_NAME(sys_nfsservctl)
596         .long SYMBOL_NAME(sys_setresgid16)      /* 170 */
597         .long SYMBOL_NAME(sys_getresgid16)
598         .long SYMBOL_NAME(sys_prctl)
599         .long SYMBOL_NAME(sys_rt_sigreturn)
600         .long SYMBOL_NAME(sys_rt_sigaction)
601         .long SYMBOL_NAME(sys_rt_sigprocmask)   /* 175 */
602         .long SYMBOL_NAME(sys_rt_sigpending)
603         .long SYMBOL_NAME(sys_rt_sigtimedwait)
604         .long SYMBOL_NAME(sys_rt_sigqueueinfo)
605         .long SYMBOL_NAME(sys_rt_sigsuspend)
606         .long SYMBOL_NAME(sys_pread)            /* 180 */
607         .long SYMBOL_NAME(sys_pwrite)
608         .long SYMBOL_NAME(sys_chown16)
609         .long SYMBOL_NAME(sys_getcwd)
610         .long SYMBOL_NAME(sys_capget)
611         .long SYMBOL_NAME(sys_capset)           /* 185 */
612         .long SYMBOL_NAME(sys_sigaltstack)
613         .long SYMBOL_NAME(sys_sendfile)
614         .long SYMBOL_NAME(sys_ni_syscall)       /* streams1 */
615         .long SYMBOL_NAME(sys_ni_syscall)       /* streams2 */
616         .long SYMBOL_NAME(sys_vfork)            /* 190 */
617         .long SYMBOL_NAME(sys_getrlimit)
618         .long SYMBOL_NAME(sys_mmap2)
619         .long SYMBOL_NAME(sys_truncate64)
620         .long SYMBOL_NAME(sys_ftruncate64)
621         .long SYMBOL_NAME(sys_stat64)           /* 195 */
622         .long SYMBOL_NAME(sys_lstat64)
623         .long SYMBOL_NAME(sys_fstat64)
624         .long SYMBOL_NAME(sys_lchown)
625         .long SYMBOL_NAME(sys_getuid)
626         .long SYMBOL_NAME(sys_getgid)           /* 200 */
627         .long SYMBOL_NAME(sys_geteuid)
628         .long SYMBOL_NAME(sys_getegid)
629         .long SYMBOL_NAME(sys_setreuid)
630         .long SYMBOL_NAME(sys_setregid)
631         .long SYMBOL_NAME(sys_getgroups)        /* 205 */
632         .long SYMBOL_NAME(sys_setgroups)
633         .long SYMBOL_NAME(sys_fchown)
634         .long SYMBOL_NAME(sys_setresuid)
635         .long SYMBOL_NAME(sys_getresuid)
636         .long SYMBOL_NAME(sys_setresgid)        /* 210 */
637         .long SYMBOL_NAME(sys_getresgid)
638         .long SYMBOL_NAME(sys_chown)
639         .long SYMBOL_NAME(sys_setuid)
640         .long SYMBOL_NAME(sys_setgid)
641         .long SYMBOL_NAME(sys_setfsuid)         /* 215 */
642         .long SYMBOL_NAME(sys_setfsgid)
643         .long SYMBOL_NAME(sys_pivot_root)
644         .long SYMBOL_NAME(sys_mincore)
645         .long SYMBOL_NAME(sys_madvise)
646         .long SYMBOL_NAME(sys_getdents64)       /* 220 */
647         .long SYMBOL_NAME(sys_fcntl64)
648         .long SYMBOL_NAME(sys_ni_syscall)       /* reserved for TUX */
649
650         /*
651          * NOTE!! This doesn't have to be exact - we just have
652          * to make sure we have _enough_ of the "sys_ni_syscall"
653          * entries. Don't panic if you notice that this hasn't
654          * been shrunk every time we add a new system call.
655          */
656         .rept NR_syscalls-221
657                 .long SYMBOL_NAME(sys_ni_syscall)
658         .endr
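The dispatch scheme described in this section, a table of function pointers indexed by the call number with every unused slot pointing at a routine that returns -ENOSYS, can be sketched in ordinary user-space C. This is only an illustration under stated assumptions: demo_call_table, demo_dispatch(), DEMO_NR_SYSCALLS and the demo handlers are hypothetical names invented for the sketch, not kernel code, and the range check in demo_dispatch() is the sketch's own guard.

```c
#include <errno.h>

#define DEMO_NR_SYSCALLS 6      /* hypothetical table size (the kernel table has 256 slots) */

typedef long (*demo_syscall_fn)(long arg);

/* Plays the role of sys_ni_syscall(): the target of every unused slot. */
static long demo_ni_syscall(long arg)
{
	(void)arg;
	return -ENOSYS;
}

/* Two made-up "system calls" so that the table has something to dispatch to. */
static long demo_double(long arg) { return 2 * arg; }
static long demo_negate(long arg) { return -arg; }

/* The jump table: the call number is the array index. */
static const demo_syscall_fn demo_call_table[DEMO_NR_SYSCALLS] = {
	demo_ni_syscall,        /* 0: unused */
	demo_double,            /* 1 */
	demo_negate,            /* 2 */
	demo_ni_syscall,        /* 3: unused, filled the way .rept fills the tail */
	demo_ni_syscall,        /* 4: unused */
	demo_ni_syscall,        /* 5: unused */
};

/* Look up the handler by number and call it; reject out-of-range numbers. */
long demo_dispatch(unsigned long nr, long arg)
{
	if (nr >= DEMO_NR_SYSCALLS)
		return -ENOSYS;
	return demo_call_table[nr](arg);
}
```

Because every slot holds a valid pointer, the dispatcher never needs a per-entry null check; calling an undefined number simply runs the placeholder and yields -ENOSYS, which is the design choice the padded table expresses.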
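Chapter 4  Processes and Process Scheduling

4.1 The Four Elements of a Process

Giving an exact definition of "process" is not easy. Generally speaking, however, a process in a Linux system has the following elements:

(1) It has a piece of program to execute, just as a play needs a script. This program need not belong exclusively to the process; it may be shared with other processes, just as many performances by different troupes can share one script.
(2) It has a minimum of "private property", namely a system stack space dedicated to the process.
(3) It has a "residence registration", namely a task_struct data structure in the kernel, which operating system textbooks usually call the "process control block". Only with this data structure can a process become a basic unit of scheduling and be scheduled by the kernel. At the same time, this structure is the process's "property register", recording the resources the process holds.
(4) It has independent storage space, which means it owns a private user space; going further, it also means that besides the system space stack mentioned above it has its own user space stack. Note that system space cannot be independent: no process can change the contents of system space directly (that is, without going through a system call), apart from its own system space stack.

All four are necessary conditions; if any one is missing, the thing is not called a "process". If only the first three are present and the fourth is lacking, it is called a "thread". In particular, if there is no user space at all, it is called a "kernel thread"; if user space is shared, it is called a "user thread". Where no confusion can arise, both are often simply called "threads". The kswapd that the reader saw in Chapter 2 is a kernel thread. The reader should take care not to confuse "thread" as used here with the "threads" that some systems implement inside a single process in user space. Threads of that kind obviously have no independent, dedicated system stack, and are not scheduled directly by the kernel as a unit of scheduling. Moreover, since the Linux kernel supports threads, there is generally no need to implement threads privately inside a process, that is, in user space.

On the other hand, the distinction between processes and threads is not very strict, and talk of processes usually covers threads as well. In fact, in Linux (and Unix) systems many processes share one storage space with their parent when they are first "born", so strictly speaking they are still threads; but a child process can set up its own storage space and part ways with its parent, becoming a process in the true sense. Besides, a thread also has a "pid" and a task_struct structure, so in use the two words are sometimes not strictly distinguished, and their meaning has to be understood from context.

Furthermore, in Linux "process" and "task" mean the same thing, and the kernel code often mixes the two terms and concepts. For example, every process has a task_struct data structure, yet its number is the pid; and the function that wakes up a sleeping process is called wake_up_process(). This is so because Linux derives from both Unix and the i386 architecture, and what Unix calls a process is called a "task" in Intel's technical documentation (strictly speaking there is a slight difference, but for the implementation of Linux and Unix they are the same thing).

The first process of a running Linux system is "fabricated" during the initialization stage. Every later process or thread is copied from an already existing process through a system call, much like cell division; this is called "fork" or "clone".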
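Besides the bare minimum of "property" mentioned above, that is, the task_struct data structure and the system stack, a process needs some additional resources. For example, as said above, "independent" storage space means the process owns a user space, so it must have an mm_struct data structure for virtual memory management with its subordinate vm_area data structures, as well as the corresponding page directory entries and page tables. But those are secondary resources, subordinate to task_struct, and the task_struct data structure serves as the register for them. As for the concrete implementation of a process, it depends to a considerable extent on the architecture of the host CPU.

Before going into the elements of a process in detail, we first describe the process management mechanism provided by the i386 architecture, and the particular way the Linux kernel uses and handles it. The reader may want to read this together with the related material in Chapter 2.

In designing the i386 architecture, Intel took the management and scheduling of processes (tasks) into account and provided hardware support for switching between tasks. For this purpose Intel added another new kind of segment to the i386 architecture, called the "task state segment", TSS. Although a TSS is a "segment" like a code segment or a data segment, it is in fact just a 104-byte data structure, or control block, that records the key state information of a task, including:

· The contents of the task's general-purpose registers just before the task switch (that is, at the switch point).
· The contents of the task's segment registers (ES, CS, SS, DS, FS and GS) just before the task switch.
· The contents of the task's EFLAGS register just before the task switch.
· The contents of the task's instruction pointer register EIP just before the task switch.
· The segment selector of the TSS of the previous task. When the current task executes an IRET instruction, it returns to the task designated by this selector (that is, represented by that TSS); the return address is determined by the stack.
· The task's LDT segment selector, which points to the task's LDT.
· The contents of control register CR3, which points to the task's page directory.
· Three stack pointers, used when the task runs at level 0, level 1 and level 2 respectively, consisting of the stack segment registers SS0, SS1 and SS2 and the contents of ESP0, ESP1 and ESP2. Note that the CPU has only one SS and one ESP register, but on entering a new privilege level the CPU automatically loads the corresponding SS and ESP contents from the TSS of the current task, thereby switching stacks.
· A flag bit T used for program tracing. When T is 1, the CPU raises a debug exception when it switches into the task, so that the required operations, such as recording or displaying something, can be arranged in the debug exception handler.

Besides the basic 104-byte TSS structure, a TSS segment may contain some additional information. One item is a bitmap of I/O permissions. The i386 architecture allows I/O instructions to execute at a privilege level lower than level 0, which means a device driver can be implemented in a space that is neither kernel (level 0) nor user (level 3); the bitmap serves that purpose. Another item is the "interrupt redirection bitmap", used in vm86 mode.

Like other "segments", a TSS must have an entry in a segment descriptor table. But a TSS descriptor can only be in the GDT; it cannot be placed in any LDT or in the IDT. If a TSS is accessed through a segment selector whose TI flag is 1 (meaning the LDT is to be used), a General Protection (GP) exception is raised. The structure of a TSS descriptor is basically the same as that of other segment descriptors (see Chapter 2), but it has a B (Busy) flag indicating whether the task represented by the TSS is currently running or has been interrupted.

In addition, the CPU has a "task register", TR, which points to the TSS of the current task. Correspondingly, an instruction LTR was added to load the TR register. Like CS and DS, TR has an invisible part: whenever a segment selector is loaded into TR, the CPU automatically finds the selected TSS descriptor and loads it into the invisible part of TR, to speed up later accesses to that TSS segment.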
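Furthermore, besides interrupt gates and trap gates, the IDT may also contain a further kind of gate, the "task gate" (call gates, by contrast, reside in the GDT or an LDT). A task gate contains a TSS segment selector. When the CPU passes through a task gate because of an interrupt, it automatically loads the segment selector in the task gate into TR, so that TR points to the new TSS, and completes a task switch. The CPU can also perform a task switch through a JMP or CALL instruction: when the target segment (code segment) of the jump or call actually refers to a TSS descriptor in the GDT, a task switch takes place.

This design by Intel is certainly thorough, and it gives task switching a very concise mechanism. But the reader should note that a task switch carried out automatically by the CPU in this way is not, as one might mistakenly suppose, merely the equivalent of "one instruction". In fact, the i386 architecture is basically CISC, and the process of completing a task switch through a JMP or CALL instruction (or an interrupt) can be called a typical, even extreme, example of a "complex instruction": it takes more than 300 CPU clock cycles (a POP instruction takes 12). During its execution, the CPU in effect does everything that might possibly need doing, yet some of those things could be simplified under certain conditions, and others might better be combined in a different way under certain conditions. So the task switch mechanism provided by the i386 CPU is rather like a construct of a "high-level language". You can certainly use it, but for the design and implementation of an operating system you will often choose to implement the mechanism in "assembly language" instead, for higher efficiency and greater flexibility. More importantly, a task switch is usually not an isolated event; it is often tied to other operations. For example, in Unix and Linux systems task switches happen only in system space, so they are closely tied to system calls and interrupts, and many operations can be merged.

As with many other features provided by the i386, the reader will see that the Linux kernel does not actually use the hardware task switch mechanism of the i386 CPU. However, since the i386 CPU requires software to set up TR and a TSS, the kernel goes through the motions of setting up TR and a TSS to satisfy the CPU. But the kernel uses no task gates, and does not allow task switches by JMP or CALL instructions. The kernel only sets TR during initialization so that it points to a TSS, and never changes the contents of TR again. That is, each CPU (if there is more than one) uses its own single TSS throughout its entire run after initialization. Also, the kernel does not rely on the TSS to save the register copies of each process at a switch; it saves those copies on each process's own system space stack, as the reader saw in Chapter 3.

As a result, most of the contents of the TSS have lost their original meaning. However, as explained in Chapter 3, when the CPU enters system space from user space because of an interrupt or a system call, it automatically switches stacks because of the change of privilege level. The new stack pointer, comprising the contents of the stack segment register SS and of the stack pointer register ESP, is taken from the TSS of the "current" task. Since Linux uses only two privilege levels, level 0 and level 3, the stack pointer copies in the TSS for the other two levels (level 1 and level 2) are also meaningless. So for the Linux kernel, the only meaningful items left in the TSS are the level-0 stack pointer, that is, SS0 and ESP0. Intel's original intention was for the contents of TR to rotate like a revolving lantern, following different TSSs as tasks are switched. In the Linux kernel this has turned into "a camp of iron with soldiers flowing through like water": there is one TSS, like a camp, which never moves once it is set up, while its contents, that is, the system stack pointer of the current task, change like flowing water as processes are scheduled and switched. The reason is that changing SS0 and ESP0 in the TSS costs far less than switching to another TSS by loading TR. Therefore, in the Linux kernel a TSS is not a resource belonging to any particular process but a global, shared resource. On a multiprocessor, although the kernel does hold several TSSs, each CPU still has only one, which never changes once loaded.

So what does this TSS look like? See the definition of INIT_TSS in include/asm-i386/processor.h:

==================== include/asm-i386/processor.h 392 406 ====================
392 #define INIT_TSS  {                                             \
393         0,0, /* back_link, __blh */                             \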
394         sizeof(init_stack) + (long) &init_stack, /* esp0 */     \
395         __KERNEL_DS, 0, /* ss0 */                               \
396         0,0,0,0,0,0, /* stack1, stack2 */                       \
397         0, /* cr3 */                                            \
398         0,0, /* eip,eflags */                                   \
399         0,0,0,0, /* eax,ecx,edx,ebx */                          \
400         0,0,0,0, /* esp,ebp,esi,edi */                          \
401         0,0,0,0,0,0, /* es,cs,ss */                             \
402         0,0,0,0,0,0, /* ds,fs,gs */                             \
403         __LDT(0),0, /* ldt */                                   \
404         0, INVALID_IO_BITMAP_OFFSET, /* tace, bitmap */         \
405         {~0, } /* ioperm */                                     \
406 }

Here SS0 of the first process in the system is set to __KERNEL_DS, and ESP0 is set to point to the top of init_stack. init_stack is defined as follows:

==================== include/asm-i386/processor.h 452 452 ====================
452 #define init_stack      (init_task_union.stack)

==================== arch/i386/kernel/init_task.c 22 24 ====================
22 union task_union init_task_union
23         __attribute__((__section__(".data.init_task"))) =
24                 { INIT_TASK(init_task_union.task) };

==================== include/linux/sched.h 484 487 ====================
484 union task_union {
485         struct task_struct task;
486         unsigned long stack[INIT_TASK_SIZE/sizeof(long)];
487 };

==================== include/linux/sched.h 480 482 ====================
480 #ifndef INIT_TASK_SIZE
481 # define INIT_TASK_SIZE 2048*sizeof(long)
482 #endif

INIT_TSS itself is used in arch/i386/kernel/init_task.c:

==================== arch/i386/kernel/init_task.c 26 33 ====================
26 /*
27  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
28  * no more per-task TSS's. The TSS size is kept cacheline-aligned
29  * so they are allowed to end up in the .data.cacheline_aligned
30  * section. Since TSS's are completely CPU-local, we want them
31  * on exact cacheline boundaries, to eliminate cacheline ping-pong.
32  */
33 struct tss_struct init_tss[NR_CPUS] __cacheline_aligned = { [0 ... NR_CPUS-1] = INIT_TSS };

The size of the structure array init_tss is NR_CPUS, the number of CPUs in the system. Every TSS has the same contents, as defined by INIT_TSS. In addition, the start address of each TSS is aligned with a cache line.

The data structure tss_struct is defined in include/asm-i386/processor.h and reflects the layout of a TSS segment:
==================== include/asm-i386/processor.h 327 356 ====================
327 struct tss_struct {
328         unsigned short  back_link,__blh;
329         unsigned long   esp0;
330         unsigned short  ss0,__ss0h;
331         unsigned long   esp1;
332         unsigned short  ss1,__ss1h;
333         unsigned long   esp2;
334         unsigned short  ss2,__ss2h;
335         unsigned long   __cr3;
336         unsigned long   eip;
337         unsigned long   eflags;
338         unsigned long   eax,ecx,edx,ebx;
339         unsigned long   esp;
340         unsigned long   ebp;
341         unsigned long   esi;
342         unsigned long   edi;
343         unsigned short  es, __esh;
344         unsigned short  cs, __csh;
345         unsigned short  ss, __ssh;
346         unsigned short  ds, __dsh;
347         unsigned short  fs, __fsh;
348         unsigned short  gs, __gsh;
349         unsigned short  ldt, __ldth;
350         unsigned short  trace, bitmap;
351         unsigned long   io_bitmap[IO_BITMAP_SIZE+1];
352         /*
353          * pads the TSS to be cacheline-aligned (size is 0x100)
354          */
355         unsigned long __cacheline_filler[5];
356 };

As said earlier, every process has a task_struct data structure and a piece of storage used as its system space stack. Neither can be dispensed with, and the two are closely related, so they are also placed together in physical memory. When the kernel allocates a task_struct structure for a process, it actually allocates two contiguous physical pages (8192 bytes in all). The bottom of these two pages is used for the process's task_struct structure, and the area above the structure is used as the process's system space stack; see Figure 4.1.
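One consequence of this layout is that any address inside the stack identifies the block it lives in: clearing the low 13 bits of the address yields the start of the 8192-byte block, which is where the descriptor sits. The following user-space sketch illustrates the idea under stated assumptions: demo_task, demo_task_union, DEMO_THREAD_SIZE, demo_task_of() and demo_alloc_task() are hypothetical names made up for the illustration, not kernel code, and C11 aligned_alloc() merely stands in for the kernel's allocation of two contiguous pages.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_THREAD_SIZE 8192UL   /* two 4 KB pages, as in the text */

struct demo_task {                /* hypothetical stand-in for task_struct */
	int pid;
};

/* The descriptor and the stack overlay one block; the descriptor is at the bottom. */
union demo_task_union {
	struct demo_task task;
	unsigned long stack[DEMO_THREAD_SIZE / sizeof(unsigned long)];
};

/* Clear the low 13 bits of any address inside the block to find the block's base. */
struct demo_task *demo_task_of(const void *addr_in_block)
{
	uintptr_t base = (uintptr_t)addr_in_block & ~(uintptr_t)(DEMO_THREAD_SIZE - 1);
	return (struct demo_task *)base;
}

/* Allocate one block aligned to its own size; the caller releases it with free(). */
union demo_task_union *demo_alloc_task(int pid)
{
	union demo_task_union *u = aligned_alloc(DEMO_THREAD_SIZE, sizeof(*u));
	if (u != NULL)
		u->task.pid = pid;
	return u;
}
```

The masking only works because the block is aligned to its own size; that alignment is the property the sketch obtains from aligned_alloc() and that an allocation of a power-of-two number of pages provides.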
Figure 4.1  Schematic of a process's system stack

The task_struct data structure is about 1K bytes in size, so the system space stack of a process is about 7K bytes. Note that, unlike the user space stack, the system space stack cannot be extended dynamically at run time (see Chapter 2); its size is fixed statically. Therefore, in designing interrupt service routines, kernel softirq service routines and other device driver code, care must be taken not to nest these functions too deeply, and not to use too many or too large local variables in them. A local variable like the one in the following code should be avoided:

int something()
{
        char buf[1024];
        ......
}

Here buf is a local variable, and because it lives on the stack it consumes 1K bytes at one stroke, which is clearly inappropriate.

This special arrangement of a process's task_struct structure and system space stack determines the definitions of some macros in the kernel (include/asm-i386/processor.h):

==================== include/asm-i386/processor.h 446 448 ====================
446 #define THREAD_SIZE (2*PAGE_SIZE)
447 #define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
448 #define free_task_struct(p) free_pages((unsigned long) (p), 1)

THREAD_SIZE is defined as two pages, the amount of physical memory occupied by these two basic resources of each kernel thread (a process is necessarily also a kernel thread). As for the implementation of alloc_task_struct(), the reader might imagine something like this:

struct task_struct *t = kmalloc(sizeof(struct task_struct));

In fact it is not so, because what is allocated is not just the size of the task_struct data structure but that plus the space needed for the system space stack. Note that the value 1 of the second argument to __get_free_pages() means 2^1, that is, two pages.

While a process runs in system space, it often needs to access its own task_struct data structure. For this purpose the kernel defines a macro current (in include/asm-i386/current.h), which yields a pointer to the task_struct structure of the current process:

==================== include/asm-i386/current.h 6 13 ====================
6 static inline struct task_struct * get_current(void)
7 {
8         struct task_struct *current;
9         __asm__("andl %%esp,%0; ":"=r" (current) : "0" (~8191UL));
10         return current;
11 }
12
13 #define current get_current()

Line 9 obtains the start address of the current process's task_struct structure by ANDing the contents of the stack pointer register ESP with ~8191UL (0xffffe000). (For an explanation of the inline assembly, see the examples in Chapters 2 and 3.) With Figure 4.1 and the description above, the reader should have no difficulty seeing why this yields the required address.

Why, then, is this address not kept in a global variable, so that each time a new process is scheduled to run, the start address of its task_struct structure is written into the variable and is available at any time thereafter? Would that not be more efficient? The answer is quite the opposite. An AND instruction takes only 4 CPU clock cycles, and a register-to-register MOV instruction only 2, so computing the address on the spot when it is needed is in fact more efficient. The reader can also see from this how extremely frugal a first-rate systems programmer can be.

Similar to this is the macro GET_CURRENT used on entry to interrupts and system calls, defined in include/asm-i386/hw_irq.h:

==================== include/asm-i386/hw_irq.h 113 115 ====================
113 #define GET_CURRENT \
114         "movl %esp, %ebx\n\t" \
115         "andl $-8192, %ebx\n\t"

We skipped the explanation of this piece of code in Chapter 2, because at that point the relationship between a process's system space stack and its task_struct structure had not yet been discussed.

The definition of task_struct is given in include/linux/sched.h:

==================== include/linux/sched.h 277 397 ====================
277 struct task_struct {
278         /*
279          * offsets of these are hardcoded elsewhere - touch with care
280          */
281         volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
282         unsigned long flags;    /* per process flags, defined below */
283         int sigpending;
284         mm_segment_t addr_limit;        /* thread address space:
285                                                 0-0xBFFFFFFF for user-thead
286                                                 0-0xFFFFFFFF for kernel-thread
287                                          */
288         struct exec_domain *exec_domain;
289         volatile long need_resched;
290         unsigned long ptrace;
291
292         int lock_depth;         /* Lock depth */
293
294 /*
295  * offset 32 begins here on 32-bit platforms. We keep
296  * all fields in a single cacheline that are needed for
297  * the goodness() loop in schedule().
298  */
299         long counter;
300         long nice;
301         unsigned long policy;
302         struct mm_struct *mm;
303         int has_cpu, processor;
304         unsigned long cpus_allowed;
305         /*
306          * (only the 'next' pointer fits into the cacheline, but
307          * that's just fine.)
308          */
309         struct list_head run_list;
310         unsigned long sleep_time;
311
312         struct task_struct *next_task, *prev_task;
313         struct mm_struct *active_mm;
314
315 /* task state */
316         struct linux_binfmt *binfmt;
317         int exit_code, exit_signal;
318         int pdeath_signal;  /*  The signal sent when the parent dies  */
319         /* ??? */
320         unsigned long personality;
321         int dumpable:1;
322         int did_exec:1;
323         pid_t pid;
324         pid_t pgrp;
325         pid_t tty_old_pgrp;
326         pid_t session;
327         pid_t tgid;
328         /* boolean value for session group leader */
329         int leader;
330         /*
331          * pointers to (original) parent process, youngest child, younger sibling,
332          * older sibling, respectively.  (p->father can be replaced with
333          * p->p_pptr->pid)
334          */
335         struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
336         struct list_head thread_group;
337
338         /* PID hash table linkage. */
339         struct task_struct *pidhash_next;
340         struct task_struct **pidhash_pprev;
341
342         wait_queue_head_t wait_chldexit;        /* for wait4() */
343         struct semaphore *vfork_sem;            /* for vfork() */
344         unsigned long rt_priority;
345         unsigned long it_real_value, it_prof_value, it_virt_value;
346         unsigned long it_real_incr, it_prof_incr, it_virt_incr;
347         struct timer_list real_timer;
348         struct tms times;
349         unsigned long start_time;
350         long per_cpu_utime[NR_CPUS], per_cpu_stime[NR_CPUS];
351 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
352         unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
353         int swappable:1;
354 /* process credentials */
355         uid_t uid,euid,suid,fsuid;
356         gid_t gid,egid,sgid,fsgid;
357         int ngroups;
358         gid_t   groups[NGROUPS];
359         kernel_cap_t   cap_effective, cap_inheritable, cap_permitted;
360         int keep_capabilities:1;
361         struct user_struct *user;
362 /* limits */
363         struct rlimit rlim[RLIM_NLIMITS];
364         unsigned short used_math;
365         char comm[16];
366 /* file system info */
367         int link_count;
368         struct tty_struct *tty; /* NULL if no tty */
369         unsigned int locks; /* How many file locks are being held */
370 /* ipc stuff */
371         struct sem_undo *semundo;
372         struct sem_queue *semsleeping;
373 /* CPU-specific state of this task */
374         struct thread_struct thread;
375 /* filesystem information */
376         struct fs_struct *fs;
377 /* open file information */
378         struct files_struct *files;
379 /* signal handlers */
380         spinlock_t sigmask_lock;        /* Protects signal and blocked */
381         struct signal_struct *sig;
382
383         sigset_t blocked;
384         struct sigpending pending;
385
386         unsigned long sas_ss_sp;
387         size_t sas_ss_size;
388         int (*notifier)(void *priv);
389         void *notifier_data;
390         sigset_t *notifier_mask;
391
392 /* Thread group tracking */
393         u32 parent_exec_id;
394         u32 self_exec_id;
395 /* Protection of (de-)allocation: mm, files, fs, tty */
396         spinlock_t alloc_lock;
397 };

We first introduce a few particularly important members of the structure, leaving the rest until they are needed. These members fall roughly into several broad categories: state, nature, resources and organization.

The member state on line 281 gives the current running state of the process; the definitions are in include/linux/sched.h:

==================== include/linux/sched.h 84 88 ====================
84 #define TASK_RUNNING            0
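85 #define TASK_INTERRUPTIBLE      1
86 #define TASK_UNINTERRUPTIBLE    2
87 #define TASK_ZOMBIE             4
88 #define TASK_STOPPED            8

The states TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE both mean that the process is asleep. However, TASK_UNINTERRUPTIBLE means the process is in "deep sleep" and is not disturbed by "signals" (also called "software interrupts"), whereas a process in TASK_INTERRUPTIBLE can be woken up by the arrival of a signal. The kernel provides different functions for putting a process into sleep of different depths and for waking it up. Specifically, sleep_on() and wake_up() are used for deep sleep, while interruptible_sleep_on() and wake_up_interruptible() are used for light sleep. Deep sleep is generally used only in critical sections and other crucial places, while "interruptible" sleep is the general-purpose kind. In particular, when a process waits for some event inside a "blocking" system call, it should enter interruptible sleep rather than deep sleep. For example, while a process waits for the operator to press a key it should not go into deep sleep; otherwise it could not respond to other events, and other processes could not "kill" it by sending it a signal. It should also be noted that INTERRUPTIBLE and UNINTERRUPTIBLE here have nothing to do with hardware "interrupts"; they only say whether the sleep can be interrupted, that is, ended by a wake-up, because of some other event. However, the "other event" is chiefly a signal, and the concept of a signal is really the same as that of an interrupt, so INTERRUPTIBLE here does refer to this kind of "software interrupt".

The state TASK_RUNNING does not mean that a process is actually executing, or that it is the "current process"; it means that the process can be scheduled to run and become the current process. When a process is in this runnable (or ready) state, the kernel links its task_struct structure into a "run queue" through its list head run_list (see line 309).

The state TASK_ZOMBIE means that the process has "passed away" (exit) but its "registration" has not yet been cancelled.

TASK_STOPPED is used mainly for debugging. On receiving a SIGSTOP signal, a process changes its running state to TASK_STOPPED and becomes "suspended"; it resumes running when it later receives a SIGCONT signal.

The section "Process Scheduling and Switching" in this chapter contains a state transition diagram for processes (Figure 4.4, page 367); the reader may wish to turn there and have a look first.

The member flags on line 282 also reflects the state of the process, not its running state but other, management-related information. These flag bits are likewise defined in include/linux/sched.h:

==================== include/linux/sched.h 399 413 ====================
399 /*
400  * Per process flags
401  */
402 #define PF_ALIGNWARN    0x00000001      /* Print alignment warning msgs */
403                                         /* Not implemented yet, only for 486*/
404 #define PF_STARTING     0x00000002      /* being created */
405 #define PF_EXITING      0x00000004      /* getting shut down */
406 #define PF_FORKNOEXEC   0x00000040      /* forked but didn't exec */
407 #define PF_SUPERPRIV    0x00000100      /* used super-user privileges */
408 #define PF_DUMPCORE     0x00000200      /* dumped core */
409 #define PF_SIGNALED     0x00000400      /* killed by a signal */
410 #define PF_MEMALLOC     0x00000800      /* Allocating memory */
411 #define PF_VFORK        0x00001000      /* Wake up parent in mm_release */
412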
413  #define PF_USEDFPU 0x00100000 /* task used FPU this quantum (SMP) */

The comments added by the authors of the code already explain what each flag bit does, so we will not repeat them here.

Besides state and flags, the following members also reflect the current state of a process:

sigpending: indicates that the process has received a signal that has not yet been handled. Associated with this flag are the pointers related to the signal queue, such as sigqueue, sigqueue_tail and sig, as well as members such as sigmask_lock, signal and blocked. See the section on signals in "Interprocess Communication" and the related discussion in this chapter.

counter: related to scheduling; see the section "Process Scheduling and Switching".

need_resched: related to scheduling; indicates that a scheduling pass is required just before the CPU returns from system space to user space.

The members above all reflect dynamic characteristics of a process. Others reflect static characteristics:

addr_limit: the upper bound of the virtual address space. For a process this is the upper bound of its user space, hence 0xbfffffff; for a kernel thread it is the upper bound of system space, hence 0xffffffff.

personality: because Unix exists in many versions and variants, an application has a limited range of systems it fits. An application written for Unix SVR4, for example, is not necessarily fully compatible with other software developed for Linux. Depending on the program it executes, each process therefore has its own "personality". The relevant constants are defined in include/linux/personality.h:

==================== include/linux/personality.h 8 32 ====================
8   /* Flags for bug emulation. These occupy the top three bytes. */
9   #define STICKY_TIMEOUTS 0x4000000
10  #define WHOLE_SECONDS 0x2000000
11  #define ADDR_LIMIT_32BIT 0x0800000
12
13  /* Personality types. These go in the low byte. Avoid using the top bit,
14   * it will conflict with error returns.
15   */
16  #define PER_MASK (0x00ff)
17  #define PER_LINUX (0x0000)
18  #define PER_LINUX_32BIT (0x0000 | ADDR_LIMIT_32BIT)
19  #define PER_SVR4 (0x0001 | STICKY_TIMEOUTS)
20  #define PER_SVR3 (0x0002 | STICKY_TIMEOUTS)
21  #define PER_SCOSVR3 (0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS)
22  #define PER_WYSEV386 (0x0004 | STICKY_TIMEOUTS)
23  #define PER_ISCR4 (0x0005 | STICKY_TIMEOUTS)
24  #define PER_BSD (0x0006)
25  #define PER_SUNOS (PER_BSD | STICKY_TIMEOUTS)
26  #define PER_XENIX (0x0007 | STICKY_TIMEOUTS)
27  #define PER_LINUX32 (0x0008)
28  #define PER_IRIX32 (0x0009 | STICKY_TIMEOUTS) /* IRIX5 32-bit */
29  #define PER_IRIXN32 (0x000a | STICKY_TIMEOUTS) /* IRIX6 new 32-bit */
30  #define PER_IRIX64 (0x000b | STICKY_TIMEOUTS) /* IRIX6 64-bit */
31  #define PER_RISCOS (0x000c)
32  #define PER_SOLARIS (0x000d | STICKY_TIMEOUTS)

exec_domain: apart from personality, applications differ between versions in some other respects as well, and these differences give rise to different
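"execution domains". This pointer refers to the data structure describing the execution domain to which the process belongs.

binfmt: the file format of the application, such as a.out or elf. See the section "System Call exec()".

exit_code, exit_signal, pdeath_signal: see "System Calls exit() and wait4()".

pid: the process ID.

pgrp, session, leader: when a user logs in to the system, a group of processes (a session) begins, and all processes created afterwards belong to that same session. In addition, several processes can be combined through pipes, as in "ls | wc -l", and thus form a process group. See the section "System Call exec".

priority, rt_priority: the priority and the "real-time" priority; see "Process Scheduling and Switching".

policy: the scheduling policy that applies to this process; see "Process Scheduling and Switching".

parent_exec_id, self_exec_id: related to the process group (session); see "System Calls exit() and wait4()".

uid, euid, suid, fsuid, gid, egid, sgid, fsgid: mainly related to file access permissions; see the chapter "File Systems".

cap_effective, cap_inheritable, cap_permitted: an ordinary process cannot do whatever it pleases; each is granted its own set of capabilities. For example, whether a process may trace another process with the ptrace() system call is decided by whether it holds the CAP_SYS_PTRACE capability, and whether it may reboot the operating system depends on whether it holds CAP_SYS_BOOT. In this way the privileges of a process are divided finely instead of depending wholesale on whether the process belongs to the "privileged user". Many such capabilities are defined in include/linux/capability.h, where the authors of the code have added fairly detailed comments (see the chapter "File Systems"). Each capability is represented by one flag bit, and the kernel provides an inline function, capable(), for checking whether the current process holds a given capability. For example, capable(CAP_SYS_BOOT) checks whether the current process may reboot the operating system (a nonzero return value means it may). It is worth noting that this division of operating privileges, combined with file access permissions, forms the foundation of system security. In today's networked world this kind of security is becoming more and more important, and research and development in this area is an important topic.

user: points to a user_struct structure, which represents the user to whom the process belongs. Note that this is not the same thing as the per-process user structure in the Unix kernel. The user structure in the Linux kernel is very simple; see the section "System Call fork()".

rlim: an array of structures stating the limits on the amount of each kind of resource the process may use. The reader has already seen it in use in the chapter "Memory Management". The rlimit data structure is defined in include/linux/resource.h:

==================== include/linux/resource.h 40 43 ====================
40  struct rlimit {
41      unsigned long rlim_cur;
42      unsigned long rlim_max;
43  };

On the i386 there are RLIM_NLIMITS kinds of resources available to a process, that is, 11 (numbered 0 through 10). The individual resource limits are listed in include/asm-i386/resource.h:

==================== include/asm-i386/resource.h 4 20 ====================
4   /*
5    * Resource limits
6    */
7
8   #define RLIMIT_CPU 0 /* CPU time in ms */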
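9   #define RLIMIT_FSIZE 1 /* Maximum filesize */
10  #define RLIMIT_DATA 2 /* max data size */
11  #define RLIMIT_STACK 3 /* max stack size */
12  #define RLIMIT_CORE 4 /* max core file size */
13  #define RLIMIT_RSS 5 /* max resident set size */
14  #define RLIMIT_NPROC 6 /* max number of processes */
15  #define RLIMIT_NOFILE 7 /* max number of open files */
16  #define RLIMIT_MEMLOCK 8 /* max locked-in-memory address space */
17  #define RLIMIT_AS 9 /* address space limit */
18  #define RLIMIT_LOCKS 10 /* maximum file locks held */
19
20  #define RLIM_NLIMITS 11

Other members represent the resources a process holds and uses, such as mm, active_mm, fs, files, tty, real_timer, times and it_real_value. Each of these is covered in its own chapter or section and is not repeated here.

As for statistics, there are mainly the two arrays per_cpu_utime[] and per_cpu_stime[], which record the accumulated time the process has spent running in user space and in system space on each processor (in a multiprocessor SMP configuration a process may be scheduled to run on different processors). The times data structure holds the totals of these times. There are also counts of page faults, min_flt and maj_flt, and of swap-ins and swap-outs, nswap, among others. When a process ends its life through do_exit(), its statistics are merged into those of its parent, so for every statistic there is a corresponding "cumulative" item: cmin_flt for min_flt, tms_cutime for tms_utime in the times structure, and so on.

Finally, no process exists in the system in isolation; each is connected to other processes according to various purposes, relationships and needs. From the kernel's point of view, each process has to be placed into several organizations according to purpose and nature. The first organization is the "clan" or "family tree" formed by each process's "family and social relations". It is a tree, built from the pointers p_opptr, p_pptr, p_cptr, p_ysptr and p_osptr. Of these, p_opptr and p_pptr point to the task_struct of the parent process, p_cptr points to the "youngest" child, and p_ysptr and p_osptr point to the process's "younger brother" and "elder brother" respectively, so that the children form a chain. These pointers determine a process's position, up, down, left and right, within its clan; see the discussion of fork() and exit() in this chapter.

Figure 4.2 is a diagram of such a process family tree.

Figure 4.2  Diagram of the process family tree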
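The family-tree pointers described above can be pictured with a small user-space sketch. This is not kernel code: the structure `task` and the helpers `add_child()` and `count_children()` are hypothetical names introduced only for illustration, and only the pointer names p_pptr, p_cptr, p_ysptr and p_osptr are taken from the text. The sketch assumes, as the text says, that p_cptr always refers to the youngest child and that siblings are chained through p_osptr and p_ysptr.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for task_struct: only the
 * family-tree pointers named in the text are modelled. */
struct task {
    int pid;
    struct task *p_pptr;   /* parent */
    struct task *p_cptr;   /* youngest child */
    struct task *p_ysptr;  /* next younger sibling */
    struct task *p_osptr;  /* next older sibling */
};

/* Link a newly created child under its parent. The newest child
 * becomes the "youngest" one, i.e. the one p_cptr points to. */
static void add_child(struct task *parent, struct task *child)
{
    child->p_pptr = parent;
    child->p_ysptr = NULL;
    child->p_osptr = parent->p_cptr;     /* former youngest is now older */
    if (child->p_osptr)
        child->p_osptr->p_ysptr = child;
    parent->p_cptr = child;
}

/* Walk the child chain from the youngest child towards the oldest. */
static int count_children(const struct task *parent)
{
    int n = 0;
    const struct task *t;

    for (t = parent->p_cptr; t != NULL; t = t->p_osptr)
        n++;
    return n;
}
```

Starting from any process, following p_pptr repeatedly leads up towards the root of the tree, while p_cptr followed by p_osptr enumerates all of its children.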
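Although this organization determines the "clan" relationship of every process and covers all processes in the system, finding a process in it by its process ID, pid, is not easy. Process IDs are assigned more or less arbitrarily, and a pid contains no path information that could be used to locate the process. Yet finding the task_struct of a process from a given pid is an operation that is needed all the time. Hence the second organization: an array of process queues built on a hash table. Given a pid, a hash is first computed over it, the result is used as an index to pick a queue in the hash table, and the particular process can then be found fairly easily by following that queue. The hash table pidhash is defined in kernel/fork.c:

==================== kernel/fork.c 35 35 ====================
35  struct task_struct *pidhash[PIDHASH_SZ];

The size of the hash table, PIDHASH_SZ, is defined in include/linux/sched.h:

==================== include/linux/sched.h 495 495 ====================
495  #define PIDHASH_SZ (4096 >> 2)

The hash table thus has 1024 entries. Since each pointer is 4 bytes, the whole table (not counting the queues) occupies exactly one page. The task_struct of every process is placed on one of the queues of the hash table through its two pointers pidhash_next and pidhash_pprev, and the pids of all processes on the same queue have the same hash value. Thanks to the hash table, finding the process with a given pid is fast.

When the kernel needs to do something to every process, it also needs all processes in the system organized into one linear queue, so that the task_struct structures of all processes can be traversed with a simple for or while loop. The third organization is therefore such a linear queue. The first process established in the system is init_task, which is the root of all processes, so the queue starts at init_task (which can also be regarded as the queue head). Each process created afterwards is linked into this queue through the two pointers next_task and prev_task in its task_struct.

Every process is necessarily a member of all three of these organizations at the same time and is removed from them only when it dies, so all three are static.

While it runs, a process can in addition be linked dynamically into the "runnable queue" to be scheduled by the system. This is in fact the most important queue: only a process on the runnable queue can be scheduled and put into execution. Unlike the previous organizations, a task_struct is linked into the runnable queue through its list_head data structure run_list rather than through individual pointers. As mentioned before, list_head is a general-purpose structure for doubly linked lists with a matching set of functions and macros; it is efficient and keeps the code simple. The runnable queue changes very frequently: a process is unlinked from it when it goes to sleep and linked in again when it is woken up, and scheduling may also change a process's position in the queue. See the related discussion in "Process Scheduling and Process Switching" and "System Call nanosleep()" in this chapter.

4.2 The Process Trilogy: Creation, Execution, and Termination

Just as everything in the world comes into being, develops and passes away, every process is created, executes some program, and finally dies. In Linux the first process is inherent in the system, there from the beginning, arranged by the designers of the kernel. Once the kernel has booted and completed its basic initialization, the first process of the system (actually a kernel thread) exists. Apart from it, all other processes and kernel threads are created by this original process or by its descendants, and all are this original process's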
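"offspring". In Linux a new process must be "copied" from a process that already exists rather than "made from scratch" (so-called creation is in fact copying). Linux (like Unix) therefore does not offer the user (that is, a process) a system call such as this:

int creat_proc(int (*fn)(void*), void *arg, unsigned long options);

Many operating systems (including some Unix variants) do take such an "all-in-one" approach. It "makes" a process and has it start executing at the place the function pointer fn points to. Depending on circumstances and design, the parameter fn may be replaced by the file name of an executable program. "Making" here includes allocating the resources the process needs, at a minimum the task_struct data structure and a system-space stack, and initializing them; the system-space stack must also be set up so that the new process looks as if it were a process that had existed all along and was merely asleep. When this process is scheduled to run, its "return address", that is, the next instruction on "resuming", is where fn points. Such a "child process" is born empty-handed but can be completely independent; it shares no resources with its parent.

Linux (and Unix), however, takes a different approach.

Linux divides process creation and execution of the target program into two steps. The first step is to copy a "child process" from an existing "parent process", rather like cell division. "Like cell division" is only a figure of speech: in fact the copied child has its own task_struct structure and system-space stack but shares all other resources with its parent. For example, if the parent has five files open, the child also has five open files, and the current read/write pointers of those files are at the same positions. What this step does, then, is "copy". Linux provides two system calls for it, fork() and clone(). The difference is that fork() copies everything: all of the parent's resources are "inherited" by the child through copying of data structures. clone() can copy resources to the child selectively, and the data structures that are not copied are shared by the child through copied pointers. In the extreme case a process can clone() a thread. That is why fork() takes no parameters while clone() does. The reader may have noticed that fork() is actually closer to "cloning" in the original sense of the word than clone() is. That is true; the reason is that fork() has existed since the early days of Unix, when the word "clone" was not as popular as it is now, and having existed for so long it could hardly be renamed. Otherwise the two names should perhaps be swapped. Later another system call, vfork(), was added. It also takes no parameters, but all resources other than the task_struct structure and the system-space stack are "inherited" through copying of pointers to data structures, so what vfork() produces is a thread rather than a process. The reader will see that vfork() was designed and provided mainly for reasons of efficiency.

The second step is the execution of the target program. In general a new process is created because there is a different target program for it to execute (though not necessarily), so once the copy is complete the child usually parts ways with its parent and goes its own way. For this Linux provides the system call execve(), which lets a process execute the image of an executable program stored as a file.

The reader may ask which of the two schemes is better. Each has its advantages and disadvantages. But it should be said that the scheme Linux inherited from Unix, with two steps and copying in the first, has far more advantages than disadvantages. From the point of view of efficiency, two steps are beneficial. The copying involves only the basic resources of a process, such as the task_struct data structure, the system-space stack and the page tables. The parent's code and global variables need not be copied; they are shared through read-only access, and only when a write is needed is a new copy of the affected page made by means of copy_on_write. On the whole the cost of copying is low, while the resources inherited through it are often very useful to the child. The reader will see later that in the implementation of computer networks, and of the server side of client/server systems, fork() or clone() is often the most natural, most effective and most suitable means. The author sometimes truly wonders which came first, fork() or client/server, because fork() seems to have been designed specifically for it. A still more important benefit is that it makes it easy for parent and child to set up, through pipe, a simple and effective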
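interprocess communication channel, which in turn gave rise to the "pipe" mechanism of the shell, the operating system's user interface. This has had a profound influence on the development and spread of Unix, on the formation of the Unix programming environment, and on the formation of the Unix programming style. It can be called an invention of genius, one that to a large extent changed the direction in which operating systems developed.

From another angle, that of the programming interface, the "all-in-one" scheme is of course more concise. But fork() plus execve() is not much more complicated. Moreover, like martial arts or acting, it has a fixed "routine" that no longer feels complicated once mastered and seldom varies. And if necessary, a library can provide an "all-in-one" function that wraps the two steps together.

After creating a child, the parent has three choices. The first is to continue on its own way, parting from the child; if the child "dies" before the parent, the kernel sends the parent a signal announcing the death. The second is to stop, that is, go to sleep, wait for the child to complete its mission and finally die, and then continue running. For this Linux provides two system calls, wait4() and wait3(). They are essentially the same: wait4() waits for a particular child to die, while wait3() waits for any child to die. The third choice is to "leave the stage of history" and end its own life, for which Linux provides the system call exit(). The third choice is really just a special case of the first, so in essence there are two choices: one in which the parent is not blocked (non_blocking), also called "asynchronous", and one in which the parent is blocked (blocking), also called "synchronous".

Here is a simple program that demonstrates this process "life cycle":

1   #include <stdio.h>
2
3   int main()
4   {
5       int child;
6       char *args[] = { "/bin/echo", "Hello", "World!", NULL};
7
8       if (!(child = fork()))
9       {
10          /* child */
11          printf("pid %d: %d is my father\n", getpid(), getppid());
12          execve("/bin/echo", args, NULL);
13          printf("pid %d: I am back, something is wrong!\n", getpid());
14      }
15      else
16      {
17          int myself = getpid();
18          printf("pid %d: %d is my son\n", myself, child);
19          wait4(child, NULL, 0, NULL);
20          printf("pid %d: done\n", myself);
21      }
22      return 0;
23  }

Here the process that enters main() is the parent. On line 8 it executes the system call fork() to create a child, that is, to copy one. Once copied, the child is scheduled by the kernel just like its parent and has the same return address. So when parent and child are scheduled to continue and return from kernel space, both return to the same point. The code before that point was executed by only one process, but from that point on two processes are executing. The copied child inherits all of the parent's resources and characteristics, but there are some subtle yet important differences. First, the child has a process ID, pid, different from the parent's, and several fields in the child's task_struct say who its parent is, much as a person's household registration or file has corresponding entries. Second, and perhaps more important, the two get different return values from fork(). When the child "returns" from fork() its return value is 0, while the parent's return value from fork() is the pid of the child, which cannot be 0. The if statement on line 8 can therefore tell the two apart by this feature, so that each process knows "who I am". Lines 9 to 14 then belong to the child and lines 16 to 21 to the parent. Although both processes have the same view and can "see" the code the other is to execute, the if statement separates their paths of execution. In this program we chose to have the parent stop and wait, so the parent executes wait4(), while the child executes "/bin/echo" through execve(). After executing echo the child does not come back to line 13 here; it is "gone, never to return". That is because /bin/echo necessarily contains a call to exit() that ends the child's life. A call to exit() is present in every executable image. Our program does not call it explicitly but returns from main() with a return statement; gcc adds the call automatically when compiling and linking, so nobody escapes it.

Because the child is scheduled by the kernel just like the parent, and every system call may trigger scheduling, the order in which the two return is indeterminate, and one cannot decide which is the parent and which the child by the order of their return.

It should also be pointed out that the Linux kernel does have a function (often called a "primitive"), kernel_thread(), that looks like an "all-in-one" way to create kernel threads and is available to kernel threads. In fact it is only a wrapper around clone(). It cannot execute an executable image file the way a call to execve() does; it only executes some function inside the kernel. Let us look at its code, which is in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 439 463 ====================
439  int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
440  {
441      long retval, d0;
442
443      __asm__ __volatile__(
444          "movl %%esp,%%esi\n\t"
445          "int $0x80\n\t" /* Linux/i386 system call */
446          "cmpl %%esp,%%esi\n\t" /* child or parent? */
447          "je 1f\n\t" /* parent - jump */
448          /* Load the argument into eax, and push it. That way, it does
449           * not matter whether the called function is compiled with
450           * -mregparm or not. */
451          "movl %4,%%eax\n\t"
452          "pushl %%eax\n\t"
453          "call *%5\n\t" /* call fn */
454          "movl %3,%0\n\t" /* exit */
455          "int $0x80\n"
456          "1:\t"
457          :"=&a" (retval), "=&S" (d0)
458          :"0" (__NR_clone), "i" (__NR_exit),
459           "r" (arg), "r" (fn),
460           "b" (flags | CLONE_VM)
461          : "memory");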
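462      return retval;
463  }

The instruction "int $0x80" on lines 445 and 455 is the system call. Where, then, is the system call number set? Look at the output section on line 457: register EAX is bound to the variable retval as %0, and the input section beginning on line 458 specifies that %0 is to be preloaded with __NR_clone. So on entering line 444 register EAX has already been set to __NR_clone, the system call number of clone(). After the return from clone(), a different method is used here to distinguish parent from child: the stack pointer at the time of return is compared with the parent's stack pointer saved in register ESI. Since every kernel thread has its own system-space stack, the child's stack pointer must differ from the parent's. Why not use the method used on return from fork()? Because a child thread produced by clone() may have the same pid as its parent thread; if a kernel thread whose pid is 0 clones a child thread, the child's pid may also be 0. Comparing stack pointers, as is done here, is therefore more reliable. Of course this method works only for kernel threads, since ordinary processes live in user space and have no idea where their system-space stacks are.

As said before, a kernel thread cannot execute an executable image file the way a process does; it can only execute a function inside the kernel. The call instruction on line 453 is the call to that function. What is the function pointer %5? Counting from the output section on line 457 shows that %5 is bound to the variable fn, which is the first parameter of kernel_thread(). This difference between kernel threads and processes in how the target program is executed leads to another important difference: a process does not return after calling execve() but "dies far from home", inside the program it executes, whereas a kernel thread merely calls a target function and naturally returns from it. That is why another system call is made on line 455; this time the system call number is in %3, which is __NR_exit.

In what follows we will use the program shown earlier as the thread along which to present the implementation of the system calls fork(), clone(), execve(), wait4() and exit(), so that the reader gains a deeper understanding of the creation, execution and termination of processes.

4.3 System Calls fork(), vfork(), and clone()

The roles of fork() and clone() and the difference between them were briefly introduced above. Let us first look at how their programming interfaces differ:

pid_t fork(void);
int clone(int (*fn)(void *arg), void *child_stack, int flags, void *arg);

The main use of the system call __clone() is to create a thread, which may be a kernel thread or a user thread. When creating a user-space thread, the location of the child thread's user-space stack can be given, and the point at which the child starts running can be specified. __clone() can also be used to create a process, copying the parent's resources selectively. fork(), by contrast, copies everything. There is also the system call vfork(), which likewise creates a thread, but mainly as an intermediate step in creating a process; its purpose is to make creation more efficient and reduce system overhead. Its programming interface is the same as that of fork().

The code for these system calls is all in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 690 720 ====================
690  asmlinkage int sys_fork(struct pt_regs regs)
691  {
692      return do_fork(SIGCHLD, regs.esp, &regs, 0);
693  }
694
695  asmlinkage int sys_clone(struct pt_regs regs)
696  {
697      unsigned long clone_flags;
698      unsigned long newsp;
699
700      clone_flags = regs.ebx;
701      newsp = regs.ecx;
702      if (!newsp)
703          newsp = regs.esp;
704      return do_fork(clone_flags, newsp, &regs, 0);
705  }
706
707  /*
708   * This is trivial, and on the face of it looks like it
709   * could equally well be done in user mode.
710   *
711   * Not so, for quite unobvious reasons - register pressure.
712   * In user mode vfork() cannot have a stack frame, and if
713   * done by calling the "clone()" system call directly, you
714   * do not have enough call-clobbered registers to hold all
715   * the information you need.
716   */
717  asmlinkage int sys_vfork(struct pt_regs regs)
718  {
719      return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0);
720  }

As can be seen, all three system calls are implemented through do_fork(); they differ only in the arguments they pass to it. What these arguments do will become clear after reading the code of do_fork(). Note that regs.ecx in sys_clone() is the child_stack argument of the call to __clone(); a reader who is not yet clear about this can go back to the section "System Calls" in Chapter 3 and walk through the code once more. A call to __clone() can set up a separate user-space stack for the child (within the same user space); if child_stack is 0, the parent's user-space stack is used. The main body of these three system calls, do_fork(), is defined in kernel/fork.c. The function is fairly large, so let us read it piece by piece:

==================== kernel/fork.c 546 575 ====================
[sys_fork()>do_fork()]
546  /*
547   * Ok, this is the main fork-routine. It copies the system process
548   * information (task[nr]) and sets up the necessary registers. It also
549   * copies the data segment in its entirety. The "stack_start" and
550   * "stack_top" arguments are simply passed along to the platform
551   * specific copy_thread() routine. Most platforms ignore stack_top.
552   * For an example that's using stack_top, see
553   * arch/ia64/kernel/process.c.
554   */
555  int do_fork(unsigned long clone_flags, unsigned long stack_start,
556              struct pt_regs *regs, unsigned long stack_size)
557  {
558      int retval = -ENOMEM;
559      struct task_struct *p;
560      DECLARE_MUTEX_LOCKED(sem);
561
562      if (clone_flags & CLONE_PID) {
563          /* This is only allowed from the boot up thread */
564          if (current->pid)
565              return -EPERM;
566      }
567
568      current->vfork_sem = &sem;
569
570      p = alloc_task_struct();
571      if (!p)
572          goto fork_out;
573
574      *p = *current;
575

The macro DECLARE_MUTEX_LOCKED() on line 560 defines and creates a semaphore used for mutual exclusion and synchronization between processes; its definition and implementation are covered in Chapter 6, "Interprocess Communication".

The parameter clone_flags consists of two parts. Its lowest byte is a signal type, which specifies the signal to be sent to the parent when the child dies. As we have seen, for fork() and vfork() this signal is SIGCHLD, whereas for __clone() this field is chosen by the caller. The second part consists of flag bits that stand for resources and properties. These flag bits are defined in include/linux/sched.h:

==================== include/linux/sched.h 30 44 ====================
30  /*
31   * cloning flags:
32   */
33  #define CSIGNAL 0x000000ff /* signal mask to be sent at exit */
34  #define CLONE_VM 0x00000100 /* set if VM shared between processes */
35  #define CLONE_FS 0x00000200 /* set if fs info shared between processes */
36  #define CLONE_FILES 0x00000400 /* set if open files shared between processes */
37  #define CLONE_SIGHAND 0x00000800 /* set if signal handlers and blocked signals shared */
38  #define CLONE_PID 0x00001000 /* set if pid shared */
39  #define CLONE_PTRACE 0x00002000 /* set if we want to let tracing continue on the child too */
40  #define CLONE_VFORK 0x00004000 /* set if the parent wants the child to wake it up on mm_release */
41  #define CLONE_PARENT 0x00008000 /* set if we want to have the same parent as the cloner */
42  #define CLONE_THREAD 0x00010000 /* Same thread group? */
43
44  #define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)

For fork() this part is all zeros, meaning that all the resources concerned are to be copied rather than shared through pointers. For vfork() it is CLONE_VFORK | CLONE_VM, meaning that parent and child share the (user) virtual memory space and that the parent is to be woken up when the child releases its virtual memory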
space. For __clone() this part is set entirely by the caller and passed down as an argument. Among these flags CLONE_PID has a special role: when it is 1, parent and child (threads) share one process ID, that is, the child has its own task_struct data structure but uses the parent's pid. Only process 0, the original process of the system (actually a thread), is allowed to call __clone() in this way, which is what line 564 checks.

Next, alloc_task_struct() allocates two contiguous physical pages for the child; the low end is used for the child's task_struct structure and the high end for its system-space stack.

Note that the assignment on line 574 assigns an entire data structure. In this way the parent's whole task_struct is copied into the child's data structure. After compilation such an assignment is carried out with memcpy(), so it is efficient.

Let us look at the next piece (kernel/fork.c):

==================== kernel/fork.c 576 601 ====================
[sys_fork()>do_fork()]
576      retval = -EAGAIN;
577      if (atomic_read(&p->user->processes) >= p->rlim[RLIMIT_NPROC].rlim_cur)
578          goto bad_fork_free;
579      atomic_inc(&p->user->__count);
580      atomic_inc(&p->user->processes);
581
582      /*
583       * Counter increases are protected by
584       * the kernel lock so nr_threads can't
585       * increase under us (but it may decrease).
586       */
587      if (nr_threads >= max_threads)
588          goto bad_fork_cleanup_count;
589
590      get_exec_domain(p->exec_domain);
591
592      if (p->binfmt && p->binfmt->module)
593          __MOD_INC_USE_COUNT(p->binfmt->module);
594
595      p->did_exec = 0;
596      p->swappable = 0;
597      p->state = TASK_UNINTERRUPTIBLE;
598
599      copy_flags(clone_flags, p);
600      p->pid = get_pid(clone_flags);
601

The task_struct structure contains a pointer, user, to a user_struct structure. A user often has many processes, so some information about the user does not belong to any one process. Processes belonging to the same user can share that information through the pointer user. Obviously each user has one and only one user_struct structure. The structure contains a counter, __count, which counts the processes belonging to the user. As one would expect, a kernel thread does not belong to any user, so the user pointer in its task_struct is 0. The data structure is defined in include/linux/sched.h:

==================== include/linux/sched.h 256 267 ====================
256  /*
257   * Some day this will be a full-fledged user tracking system..
258   */
259  struct user_struct {
260      atomic_t __count; /* reference count */
261      atomic_t processes; /* How many processes does this user have? */
262      atomic_t files; /* How many open files does this user have? */
263
264      /* Hash table maintenance information */
265      struct user_struct *next, **pprev;
266      uid_t uid;
267  };

Readers familiar with the Unix kernel should take care not to confuse the user area in the Unix process control structures with the user_struct structure here; the two are entirely different concepts. kernel/user.c also defines an array of pointers to user_struct structures, uidhash_table:

==================== kernel/user.c 19 20 ====================
19  #define UIDHASH_BITS 8
20  #define UIDHASH_SZ (1 << UIDHASH_BITS)

==================== kernel/user.c 26 26 ====================
26  static struct user_struct *uidhash_table[UIDHASH_SZ];

This is a hash table. Applying a hash to the user's identity yields an index with which the user's user_struct structure can be found.

The task_struct of each process also contains the array rlim, which limits the amount of each resource the process may occupy, and rlim[RLIMIT_NPROC] specifies how many processes the user owning this process may have. So if the current process is a user process and the number of processes owned by that user has already reached the limit, it is no longer allowed to fork(). What about kernel threads, which do not belong to any user? The two counters on line 587 exist to limit the total number of processes.

Besides belonging to a user, a process also belongs to an "execution domain". On the whole Linux is a variant of Unix and conforms to POSIX. But many other operating systems are also Unix variants that conform to POSIX and nevertheless differ noticeably from one another in implementation details. AT&T's Sys V and BSD 4.2, for example, differ considerably, and Sun's Solaris differs again; this gives rise to different execution domains. If the program a process executes was developed for Solaris, the process belongs to the Solaris execution domain, PER_SOLARIS. The vast majority of programs running on Linux belong, of course, to the Linux execution domain. The task_struct structure contains a pointer, exec_domain, which can point to an exec_domain data structure defined in include/linux/personality.h:

==================== include/linux/personality.h 38 51 ====================
38  /* Description of an execution domain - personality range supported,
39   * lcall7 syscall handler, start up / shut down functions etc.
40   * N.B. The name and lcall7 handler must be where they are since the
41   * offset of the handler is hard coded in kernel/sys_call.S.
42   */
43  struct exec_domain {
44      const char *name;
45      lcall7_func handler;
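46      unsigned char pers_low, pers_high;
47      unsigned long * signal_map;
48      unsigned long * signal_invmap;
49      struct module * module;
50      struct exec_domain *next;
51  };

The function pointer handler is used to implement system calls through a call gate and does not concern us. The byte pers_low is the code of a domain, such as PER_LINUX, PER_SVR4, PER_BSD or PER_SOLARIS.

The member we mainly care about here is module, a pointer to a module data structure. As the reader will see in the chapters on file systems and device drivers, a device driver in Linux can be designed and implemented as a dynamically installed "module" that is installed and removed at run time. These dynamically installed modules are closely related to the execution domains of running processes. A process belonging to the Solaris execution domain, for instance, is quite likely to need modules set up specifically for Solaris, and as long as even one such process is running, the modules needed for Solaris must not be removed. The data structure describing each installed module therefore contains a counter that tells how many processes need the module. Accordingly, do_fork() uses get_exec_domain() on line 590 (defined in include/linux/personality.h) to increment the counter in the data structure of the module concerned:

==================== include/linux/personality.h 59 60 ====================
59  #define get_exec_domain(it) \
60      if (it && it->module) __MOD_INC_USE_COUNT(it->module);

By the same reasoning, the program a process executes is in some executable image format, such as the a.out format, the elf format, or even the java virtual machine format. Support for these formats is usually provided by dynamically installed driver modules. The task_struct structure therefore also contains a pointer, binfmt, to a linux_binfmt data structure, and the __MOD_INC_USE_COUNT() on line 593 of do_fork() operates on the use counter of the module concerned.

Why is the state set to TASK_UNINTERRUPTIBLE on line 597? Because the operation in get_pid() that produces a new pid must be exclusive; the current process may be unable to enter the critical section for the moment and have to sleep temporarily while waiting, so the state is set to UNINTERRUPTIBLE beforehand. The function copy_flags() supplements and transforms the flag bits in the parameter clone_flags slightly and then writes them into p->flags. Its code is also in kernel/fork.c, and the reader can study it independently.

As for get_pid() on line 600, depending on the value of the CLONE_PID flag in clone_flags it either returns the pid of the parent (the current process) or returns a new pid, which is placed in the child's task_struct. The code of get_pid() is also in kernel/fork.c:

==================== kernel/fork.c 82 123 ====================
[sys_fork()>do_fork()>get_pid()]
82  static int get_pid(unsigned long flags)
83  {
84      static int next_safe = PID_MAX;
85      struct task_struct *p;
86
87      if (flags & CLONE_PID)
88          return current->pid;
89
90      spin_lock(&lastpid_lock);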
91      if((++last_pid) & 0xffff8000) {
92          last_pid = 300; /* Skip daemons etc. */
93          goto inside;
94      }
95      if(last_pid >= next_safe) {
96  inside:
97          next_safe = PID_MAX;
98          read_lock(&tasklist_lock);
99      repeat:
100         for_each_task(p) {
101             if(p->pid == last_pid ||
102                p->pgrp == last_pid ||
103                p->session == last_pid) {
104                 if(++last_pid >= next_safe) {
105                     if(last_pid & 0xffff8000)
106                         last_pid = 300;
107                     next_safe = PID_MAX;
108                 }
109                 goto repeat;
110             }
111             if(p->pid > last_pid && next_safe > p->pid)
112                 next_safe = p->pid;
113             if(p->pgrp > last_pid && next_safe > p->pgrp)
114                 next_safe = p->pgrp;
115             if(p->session > last_pid && next_safe > p->session)
116                 next_safe = p->session;
117         }
118         read_unlock(&tasklist_lock);
119     }
120     spin_unlock(&lastpid_lock);
121
122     return last_pid;
123 }

The constant PID_MAX here is defined as 0x8000, so the largest process ID is 0x7fff, that is, 32767. Process IDs 0 to 299 are reserved for system processes (including kernel threads) and are mainly used by the various daemon processes. The logic of this code is not complicated, so we will not explain it further.

Let us return to do_fork() and read on (kernel/fork.c):

==================== kernel/fork.c 602 640 ====================
[sys_fork()>do_fork()]
602     p->run_list.next = NULL;
603     p->run_list.prev = NULL;
604
605     if ((clone_flags & CLONE_VFORK) || !(clone_flags & CLONE_PARENT)) {
606         p->p_opptr = current;
607         if (!(p->ptrace & PT_PTRACED))
608             p->p_pptr = current;
609     }
610     p->p_cptr = NULL;
611     init_waitqueue_head(&p->wait_chldexit);
612     p->vfork_sem = NULL;
613     spin_lock_init(&p->alloc_lock);
614
615     p->sigpending = 0;
616     init_sigpending(&p->pending);
617
618     p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
619     p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;
620     init_timer(&p->real_timer);
621     p->real_timer.data = (unsigned long) p;
622
623     p->leader = 0; /* session leadership doesn't inherit */
624     p->tty_old_pgrp = 0;
625     p->times.tms_utime = p->times.tms_stime = 0;
626     p->times.tms_cutime = p->times.tms_cstime = 0;
627 #ifdef CONFIG_SMP
628     {
629         int i;
630         p->has_cpu = 0;
631         p->processor = current->processor;
632         /* ?? should we just memset this ?? */
633         for(i = 0; i < smp_num_cpus; i++)
634             p->per_cpu_utime[i] = p->per_cpu_stime[i] = 0;
635         spin_lock_init(&p->sigmask_lock);
636     }
637 #endif
638     p->lock_depth = -1; /* -1 = no lock */
639     p->start_time = jiffies;
640

In the previous section we mentioned wait4() and wait3(), with which a process can stop and wait for a child to complete its mission. For this purpose task_struct contains a queue head, wait_chldexit. It was copied verbatim along with the rest when the task_struct structure was copied earlier, but the child has not even been "born" yet and cannot possibly have a queue of its own waiting for children, so the queue head is initialized on line 611.

Similarly, the various pieces of signal-related information must be initialized. Lines 615 and 616 initialize the child's queue of pending signals and the related members. These signal-related members are described in detail in the signals section of "Interprocess Communication". Next comes the initialization of the various timing variables in the task_struct structure, which are introduced in the section on process scheduling. We are not concerned here with the special considerations for multiprocessor SMP configurations, so we skip lines 627 to 637.

Finally, start_time in the task_struct structure records the time at which the process was created; the value of the global variable jiffies is the time elapsed from system initialization up to this moment, measured in clock interrupt periods.

With this, the copying and initialization of the task_struct data structure is essentially complete. Now it is the turn of the other resources:

==================== kernel/fork.c 641 655 ====================
[sys_fork()>do_fork()]
641     retval = -ENOMEM;
642     /* copy all the process information */
643     if (copy_files(clone_flags, p))
644         goto bad_fork_cleanup;
645     if (copy_fs(clone_flags, p))
646         goto bad_fork_cleanup_files;
647     if (copy_sighand(clone_flags, p))
648         goto bad_fork_cleanup_fs;
649     if (copy_mm(clone_flags, p))
650         goto bad_fork_cleanup_sighand;
651     retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
652     if (retval)
653         goto bad_fork_cleanup_sighand;
654     p->semundo = NULL;
655

The function copy_files() conditionally copies the control structures for open files. The copy is really made only when the CLONE_FILES flag in clone_flags is 0; otherwise the child simply shares the parent's open files. When a process has open files, the pointer files in its task_struct points to a files_struct data structure; otherwise it is 0. For all user processes associated with a terminal device (tty), the first three files, stdin, stdout and stderr, are opened in advance, so the pointer is generally not 0. The files_struct data structure is defined in include/linux/sched.h (see the chapter "File Systems"), and the code of copy_files() is again in kernel/fork.c:

==================== kernel/fork.c 408 513 ====================
[sys_fork()>do_fork()>copy_files]
408 static int copy_files(unsigned long clone_flags, struct task_struct * tsk)
409 {
410     struct files_struct *oldf, *newf;
411     struct file **old_fds, **new_fds;
412     int open_files, nfds, size, i, error = 0;
413
414     /*
415      * A background process may not have any files ...
416      */
417     oldf = current->files;
418     if (!oldf)
419         goto out;
420
421     if (clone_flags & CLONE_FILES) {
422         atomic_inc(&oldf->count);
423         goto out;
424     }
425
426     tsk->files = NULL;
427     error = -ENOMEM;
428     newf = kmem_cache_alloc(files_cachep, SLAB_KERNEL);
429     if (!newf)
430         goto out;
431
432     atomic_set(&newf->count, 1);
433
434     newf->file_lock = RW_LOCK_UNLOCKED;
435     newf->next_fd = 0;
436     newf->max_fds = NR_OPEN_DEFAULT;
437     newf->max_fdset = __FD_SETSIZE;
438     newf->close_on_exec = &newf->close_on_exec_init;
439     newf->open_fds = &newf->open_fds_init;
440     newf->fd = &newf->fd_array[0];
441
442     /* We don't yet have the oldf readlock, but even if the old
443        fdset gets grown now, we'll only copy up to "size" fds */
444     size = oldf->max_fdset;
445     if (size > __FD_SETSIZE) {
446         newf->max_fdset = 0;
447         write_lock(&newf->file_lock);
448         error = expand_fdset(newf, size);
449         write_unlock(&newf->file_lock);
450         if (error)
451             goto out_release;
452     }
453     read_lock(&oldf->file_lock);
454
455     open_files = count_open_files(oldf, size);
456
457     /*
458      * Check whether we need to allocate a larger fd array.
459      * Note: we're not a clone task, so the open count won't
460      * change.
461      */
462     nfds = NR_OPEN_DEFAULT;
463     if (open_files > nfds) {
464         read_unlock(&oldf->file_lock);
465         newf->max_fds = 0;
466         write_lock(&newf->file_lock);
467         error = expand_fd_array(newf, open_files);
468         write_unlock(&newf->file_lock);
469         if (error)
470             goto out_release;
471         nfds = newf->max_fds;
472         read_lock(&oldf->file_lock);
473     }
474
475     old_fds = oldf->fd;
476     new_fds = newf->fd;
477
478     memcpy(newf->open_fds->fds_bits, oldf->open_fds->fds_bits, open_files/8);
479     memcpy(newf->close_on_exec->fds_bits, oldf->close_on_exec->fds_bits, open_files/8);
480
481     for (i = open_files; i != 0; i--) {
482         struct file *f = *old_fds++;
483         if (f)
484             get_file(f);
485         *new_fds++ = f;
486     }
487     read_unlock(&oldf->file_lock);
488
489     /* compute the remainder to be cleared */
490     size = (newf->max_fds - open_files) * sizeof(struct file *);
491
492     /* This is long word aligned thus could use a optimized version */
493     memset(new_fds, 0, size);
494
495     if (newf->max_fdset > open_files) {
496         int left = (newf->max_fdset-open_files)/8;
497         int start = open_files / (8 * sizeof(unsigned long));
498
499         memset(&newf->open_fds->fds_bits[start], 0, left);
500         memset(&newf->close_on_exec->fds_bits[start], 0, left);
501     }
502
503     tsk->files = newf;
504     error = 0;
505 out:
506     return error;
507
508 out_release:
509     free_fdset (newf->close_on_exec, newf->max_fdset);
510     free_fdset (newf->open_fds, newf->max_fdset);
511     kmem_cache_free(files_cachep, newf);
512     goto out;
513 }

The reader can come back and study this code carefully after the chapter "File Systems"; here we give some preliminary explanation.

First, the direction of the copy. Since it is the current process that is creating a child, the copy goes from the current process to the child, so the files_struct pointer in the task_struct of the current process is taken as oldf.

Next, the condition for copying. If the CLONE_FILES flag in the parameter clone_flags is 1, the function merely increments, with atomic_inc(), the share count in the files_struct of the current process, indicating that this data structure now has one more "user", and returns. Because the entire task_struct of the current process was copied to the child earlier by structure assignment, the pointer files in it was copied into the child's task_struct too, so the child shares the files_struct of the current process through this pointer. Otherwise, if CLONE_FILES is 0, a copy must be made. First a files_struct data structure is allocated for the child with kmem_cache_alloc() as newf, and then the contents are copied from oldf to newf. A files_struct data structure has three main "components". The first is a bitmap named close_on_exec_init; the second is also a bitmap, named open_fds_init; the third is fd_array[], an array of pointers to file structures. All three have a fixed size. If the number of open files exceeds their capacity, replacement space has to be allocated outside the files_struct data structure by expand_fdset() and expand_fd_array(). Whether the three components inside the files_struct are used or the external replacement space, the pointers close_on_exec, open_fds and fd always point to these three sets of information. How the copy is made therefore depends on the number of open files.

Obviously, sharing is much simpler than copying. What, then, is the practical difference between the two, and if sharing achieves the purpose, why take the trouble to copy? The difference lies in whether the child (and the parent itself) can be "independent". Right after the copy the child has a duplicate whose contents are essentially the same as the parent's original, so at that moment there seems to be no difference from sharing. The difference shows afterwards. With sharing, the two processes constrain each other: if the child closes one of the open files or opens a new one, the parent's table of open files changes with it, because both use the same files_struct. With copying, the child has its own table, so from then on each can open and close files without disturbing the other. Note, however, that lines 481 to 486 copy only the file pointers and call get_file() on each of them. The file structures themselves, and with them the read/write context of each file, remain shared for every file that was already open at the time of the copy, so an lseek() by the child on such a file still moves the parent's read/write position as well.

Besides the files_struct data structure there is also the fs_struct data structure, which is likewise related to the file system and must likewise be passed on to the child by sharing or copying. Similarly, copy_fs() makes a copy only when the CLONE_FS flag in clone_flags is 0. The pointer fs in the task_struct points to an fs_struct data structure, which records the process's root directory root, its current working directory pwd, a umask used for managing file access permissions, and a counter; it is defined in include/linux/fs_struct.h (see the chapter "File Systems"). The code of copy_fs(), together with a few related low-level functions, is also in kernel/fork.c. We leave this code to the reader:

==================== kernel/fork.c 383 393 ====================
[sys_fork()>do_fork()>copy_fs()]
383 static inline int copy_fs(unsigned long clone_flags, struct task_struct * tsk)
384 {
385     if (clone_flags & CLONE_FS) {
386         atomic_inc(&current->fs->count);
387         return 0;
388     }
389     tsk->fs = __copy_fs_struct(current->fs);
390     if (!tsk->fs)
391         return -1;
392     return 0;
393 }

==================== kernel/fork.c 378 381 ====================
[sys_fork()>do_fork()>copy_fs()>copy_fs_struct()]
378 struct fs_struct *copy_fs_struct(struct fs_struct *old)
379 {
380     return __copy_fs_struct(old);
381 }

==================== kernel/fork.c 353 376 ====================
[sys_fork()>do_fork()>copy_fs()>copy_fs_struct()>__copy_fs_struct()]
353 static inline struct fs_struct *__copy_fs_struct(struct fs_struct *old)
354 {
355     struct fs_struct *fs = kmem_cache_alloc(fs_cachep, GFP_KERNEL);
356     /* We don't need to lock fs - think why ;-) */
357     if (fs) {
358         atomic_set(&fs->count, 1);
359         fs->lock = RW_LOCK_UNLOCKED;
360         fs->umask = old->umask;
361         read_lock(&old->lock);
362         fs->rootmnt = mntget(old->rootmnt);
363         fs->root = dget(old->root);
364         fs->pwdmnt = mntget(old->pwdmnt);
365         fs->pwd = dget(old->pwd);
366         if (old->altroot) {
367             fs->altrootmnt = mntget(old->altrootmnt);
368             fs->altroot = dget(old->altroot);
369         } else {
370             fs->altrootmnt = NULL;
371             fs->altroot = NULL;
372         }
373         read_unlock(&old->lock);
374     }
375     return fs;
376 }

The functions mntget() and dget() in this code both increment the share count in the corresponding data structure, since those data structures now have one more user. Note that what is copied here is the fs_struct data structure, not the data structures at deeper levels. Copying the fs_struct gives independence at this level, while the deeper data structures are still shared, which is why their share counts are incremented.

Next comes the way signals are handled. Whether the parent's handling of signals is copied is controlled by the flag CLONE_SIGHAND. A signal is basically a means of interprocess communication; a signal is to a process what an interrupt is to a processor. A process can install a handler for each kind of signal, just as the system can install an interrupt service routine for each interrupt source. If a process has installed signal handlers, the pointer sig in its task_struct points to a signal_struct data structure. This structure is defined in include/linux/sched.h:

==================== include/linux/sched.h 243 247 ====================
243 struct signal_struct {
244     atomic_t count;
245     struct k_sigaction action[_NSIG];
246     spinlock_t siglock;
247 };

The array action[] determines how the process reacts to and handles each kind of signal (indexed by signal number). A child can inherit it from its parent by copying or by sharing. The code of copy_sighand() is as follows (kernel/fork.c):

==================== kernel/fork.c 515 531 ====================
[sys_fork()>do_fork()>copy_sighand()]
515 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk)
516 {
517     struct signal_struct *sig;
518
519     if (clone_flags & CLONE_SIGHAND) {
520         atomic_inc(&current->sig->count);
521         return 0;
522     }
523     sig = kmem_cache_alloc(sigact_cachep, GFP_KERNEL);
524     tsk->sig = sig;
525     if (!sig)
526         return -1;
527     spin_lock_init(&sig->siglock);
528     atomic_set(&sig->count, 1);
529     memcpy(tsk->sig->action, current->sig->action, sizeof(tsk->sig->action));
530     return 0;
531 }

Like copy_files() and copy_fs(), copy_sighand() really copies only when CLONE_SIGHAND is 0; otherwise the child shares the parent's sig pointer and the share count in the parent's signal_struct is incremented by 1.

Then comes the inheritance of user space. The task_struct of a process contains the pointer mm, with which the reader is by now quite familiar; it points to the mm_struct data structure that represents the user space of the process. Since a kernel thread owns no user space, this pointer is 0 in the task_struct of a kernel thread. The mm_struct and its subordinate data structures such as vm_area_struct were introduced in Chapter 2 and are not described again here. The code of copy_mm() is again in kernel/fork.c:

==================== kernel/fork.c 279 351 ====================
[sys_fork()>do_fork()>copy_mm()]
279 static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
280 {
281     struct mm_struct * mm, *oldmm;
282     int retval;
283
284     tsk->min_flt = tsk->maj_flt = 0;
285     tsk->cmin_flt = tsk->cmaj_flt = 0;
286     tsk->nswap = tsk->cnswap = 0;
287
288     tsk->mm = NULL;
289     tsk->active_mm = NULL;
290
291     /*
292      * Are we cloning a kernel thread?
293      *
294      * We need to steal a active VM for that..
295      */
296     oldmm = current->mm;
297     if (!oldmm)
298         return 0;
299
300     if (clone_flags & CLONE_VM) {
301         atomic_inc(&oldmm->mm_users);
302         mm = oldmm;
303         goto good_mm;
304     }
305
306     retval = -ENOMEM;
307     mm = allocate_mm();
308     if (!mm)
309         goto fail_nomem;
310
311     /* Copy the current MM stuff.. */
312     memcpy(mm, oldmm, sizeof(*mm));
313     if (!mm_init(mm))
314         goto fail_nomem;
315
316     down(&oldmm->mmap_sem);
317     retval = dup_mmap(mm);
318     up(&oldmm->mmap_sem);
319
320     /*
321      * Add it to the mmlist after the parent.
322      *
323      * Doing it this way means that we can order
324      * the list, and fork() won't mess up the
325      * ordering significantly.
326      */
327     spin_lock(&mmlist_lock);
328     list_add(&mm->mmlist, &oldmm->mmlist);
329     spin_unlock(&mmlist_lock);
330
331     if (retval)
332         goto free_pt;
333
334     /*
335      * child gets a private LDT (if there was an LDT in the parent)
336      */
337     copy_segments(tsk, mm);
338
339     if (init_new_context(tsk,mm))
340         goto free_pt;
341
342 good_mm:
343     tsk->mm = mm;
344     tsk->active_mm = mm;
345     return 0;
346
347 free_pt:
348     mmput(mm);
349 fail_nomem:
350     return retval;
351 }

Clearly, the mm_struct too is really copied only when the CLONE_VM flag in clone_flags is 0; otherwise the child merely shares the parent's user space through the pointer that has already been copied. Copying the mm_struct is not limited to this data structure itself; it also includes copying the deeper data structures. The most important of these are the vm_area_struct data structures and the page mapping tables, which are copied by dup_mmap(). The code of dup_mmap() is also in kernel/fork.c. A reader who has studied Chapter 2 of this book carefully should have no difficulty reading this code, and it makes a good exercise as well.

==================== kernel/fork.c 125 193 ====================
[sys_fork()>do_fork()>copy_mm()>dup_mmap()]
125 static inline int dup_mmap(struct mm_struct * mm)
126 {
127     struct vm_area_struct * mpnt, *tmp, **pprev;
128     int retval;
129
130     flush_cache_mm(current->mm);
131     mm->locked_vm = 0;
132     mm->mmap = NULL;
133     mm->mmap_avl = NULL;
134     mm->mmap_cache = NULL;
135     mm->map_count = 0;
136     mm->cpu_vm_mask = 0;
137     mm->swap_cnt = 0;
138     mm->swap_address = 0;
139     pprev = &mm->mmap;
140     for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
141         struct file *file;
142
143         retval = -ENOMEM;
144         if(mpnt->vm_flags & VM_DONTCOPY)
145             continue;
146         tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
147         if (!tmp)
148             goto fail_nomem;
149         *tmp = *mpnt;
150         tmp->vm_flags &= ~VM_LOCKED;
151         tmp->vm_mm = mm;
152         mm->map_count++;
153         tmp->vm_next = NULL;
154         file = tmp->vm_file;
155         if (file) {
156             struct inode *inode = file->f_dentry->d_inode;
157             get_file(file);
158             if (tmp->vm_flags & VM_DENYWRITE)
159                 atomic_dec(&inode->i_writecount);
160
161             /* insert tmp into the share list, just after mpnt */
162             spin_lock(&inode->i_mapping->i_shared_lock);
163             if((tmp->vm_next_share = mpnt->vm_next_share) != NULL)
164                 mpnt->vm_next_share->vm_pprev_share =
165                     &tmp->vm_next_share;
166             mpnt->vm_next_share = tmp;
167             tmp->vm_pprev_share = &mpnt->vm_next_share;
168             spin_unlock(&inode->i_mapping->i_shared_lock);
169         }
170
171         /* Copy the pages, but defer checking for errors */
172         retval = copy_page_range(mm, current->mm, tmp);
173         if (!retval && tmp->vm_ops && tmp->vm_ops->open)
174             tmp->vm_ops->open(tmp);
175
176         /*
177          * Link in the new vma even if an error occurred,
178          * so that exit_mmap() can clean up the mess.
179          */
180         *pprev = tmp;
181         pprev = &tmp->vm_next;
182
183         if (retval)
184             goto fail_nomem;
185     }
186     retval = 0;
187     if (mm->map_count >= AVL_MIN_MAP_COUNT)
188         build_mmap_avl(mm);
189
190 fail_nomem:
191     flush_tlb_mm(current->mm);
192     return retval;
193 }

Here the for loop at lines 140 to 185 copies each region of the user space in turn. For regions that have been mapped to a file through mmap(), lines 155 to 169 perform some special additional processing. The call to copy_page_range() at line 172 is the key; this function works level by level through the page directory entries and the page table entries. Its code is in mm/memory.c:

==================== mm/memory.c 144 257 ====================
[sys_fork()>do_fork()>copy_mm()>dup_mmap()>copy_page_range()]
144 /*
145  * copy one vm_area from one task to the other. Assumes the page tables
146  * already present in the new task to be cleared in the whole range
147  * covered by this vma.
148  *
149  * 08Jan98 Merged into one routine from several inline routines to reduce
150  *         variable count and make things faster. -jj
151  */
152 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
153             struct vm_area_struct *vma)
154 {
155     pgd_t * src_pgd, * dst_pgd;
156     unsigned long address = vma->vm_start;
157     unsigned long end = vma->vm_end;
158     unsigned long cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
159
160     src_pgd = pgd_offset(src, address)-1;
161     dst_pgd = pgd_offset(dst, address)-1;
162
163     for (;;) {
164         pmd_t * src_pmd, * dst_pmd;
165
166         src_pgd++; dst_pgd++;
167
168         /* copy_pmd_range */
169
170         if (pgd_none(*src_pgd))
171             goto skip_copy_pmd_range;
172         if (pgd_bad(*src_pgd)) {
173             pgd_ERROR(*src_pgd);
174             pgd_clear(src_pgd);
175 skip_copy_pmd_range:    address = (address + PGDIR_SIZE) & PGDIR_MASK;
176             if (!address || (address >= end))
177                 goto out;
178             continue;
179         }
180         if (pgd_none(*dst_pgd)) {
181             if (!pmd_alloc(dst_pgd, 0))
182                 goto nomem;
183         }
184
185         src_pmd = pmd_offset(src_pgd, address);
186         dst_pmd = pmd_offset(dst_pgd, address);
187
188         do {
189             pte_t * src_pte, * dst_pte;
190
191             /* copy_pte_range */
192
193             if (pmd_none(*src_pmd))
194                 goto skip_copy_pte_range;
195             if (pmd_bad(*src_pmd)) {
196                 pmd_ERROR(*src_pmd);
197                 pmd_clear(src_pmd);
198 skip_copy_pte_range:        address = (address + PMD_SIZE) & PMD_MASK;
199                 if (address >= end)
200                     goto out;
201                 goto cont_copy_pmd_range;
202             }
203             if (pmd_none(*dst_pmd)) {
204                 if (!pte_alloc(dst_pmd, 0))
205                     goto nomem;
206             }
207
208             src_pte = pte_offset(src_pmd, address);
209             dst_pte = pte_offset(dst_pmd, address);
210
211             do {
212                 pte_t pte = *src_pte;
213                 struct page *ptepage;
214
215                 /* copy_one_pte */
216
217                 if (pte_none(pte))
218                     goto cont_copy_pte_range_noset;
219                 if (!pte_present(pte)) {
220                     swap_duplicate(pte_to_swp_entry(pte));
221                     goto cont_copy_pte_range;
222                 }
223                 ptepage = pte_page(pte);
224                 if ((!VALID_PAGE(ptepage)) ||
225                     PageReserved(ptepage))
226                     goto cont_copy_pte_range;
227
228                 /* If it's a COW mapping, write protect it both in the parent and the child */
229                 if (cow) {
230                     ptep_set_wrprotect(src_pte);
231                     pte = *src_pte;
232                 }
233
234                 /* If it's a shared mapping, mark it clean in the child */
235                 if (vma->vm_flags & VM_SHARED)
236                     pte = pte_mkclean(pte);
237                 pte = pte_mkold(pte);
238                 get_page(ptepage);
239
240 cont_copy_pte_range:        set_pte(dst_pte, pte);
241 cont_copy_pte_range_noset:  address += PAGE_SIZE;
242                 if (address >= end)
243                     goto out;
244                 src_pte++;
245                 dst_pte++;
246             } while ((unsigned long)src_pte & PTE_TABLE_MASK);
247
248 cont_copy_pmd_range:    src_pmd++;
249             dst_pmd++;
250         } while ((unsigned long)src_pmd & PMD_TABLE_MASK);
251     }
252 out:
253     return 0;
254
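255 nomem:
256     return -ENOMEM;
257 }

In this code, the for loop at line 163 iterates over page directory entries, the do loop at line 188 iterates over middle directory entries, and the do loop at line 211 iterates over page table entries. We concentrate on the do-while loop over page table entries at lines 211 to 246.

The loop examines every entry in one page table of the parent and decides what to do according to the contents of the entry. The contents of an entry can only be one of the following:

(1) The entry is all 0, so pte_none() returns 1. This means that no mapping has been established for the page, or in other words that it is a "hole", so nothing needs to be done.

(2) The lowest bit of the entry, the _PAGE_PRESENT flag, is 0, so pte_present() returns 0. This means that a mapping has been established but the page is currently not in memory; it has been swapped out to the swap device. In this case the entry indicates the location of the "on-disk page", and since that on-disk page now has one more "user", its share count is incremented through swap_duplicate(). Control then passes to cont_copy_pte_range, where the entry is copied into the child's page table.

(3) A mapping has been established, but the physical page is not a valid memory page, so VALID_PAGE() returns 0. The reader may recall that, as explained earlier, some physical pages reside on peripheral interface cards; the corresponding addresses are called "bus addresses", and such pages are not memory pages. Pages of this kind, as well as pages that are memory pages but are reserved by the kernel, are not under the control of the page swap-in/swap-out mechanism and do not consume dynamically allocated memory pages. So control again passes to cont_copy_pte_range, where the entry is copied into the child's page table.

(4) A writable page that needs to be copied from the parent. In principle, a free memory page should be allocated at this point, the contents copied over from the parent's page, and a mapping established for it. Obviously the cost of this operation is not small. Yet will the child necessarily use the pages copied with so much effort? In particular, will it write to them? If it only reads them, then as long as the parent no longer writes to the page either, the page can be shared simply by copying the pointer, which saves a great deal of work. The Linux kernel therefore adopts a technique called "copy on write": the page is first shared temporarily by copying the page table entry, and a page is allocated and copied only when the child (or the parent) really wants to write to it. The local variable cow in the code is defined at line 158, and its name is an abbreviation of "copy on write". Whenever a virtual memory region is writable (VM_MAYWRITE is 1) and not shared (VM_SHARED is 0), it is a copy_on_write region. In practice cow is 1 for the great majority of writable virtual memory regions. When a page is shared temporarily by copying its page table entry, two important things have to be done: first, lines 230 and 231 make the parent's page table entry write-protected; then line 240 stores the write-protected entry into the child's page table. As a result the page becomes "read-only" in both processes, and whenever either the parent or the child attempts to write to it, a page fault occurs. The page fault handler responds by allocating another physical page and really "copying" the contents into it, so that parent and child each have their own physical page, and the entry concerned is then made writable again. So the reason the Linux kernel can "copy" a process so quickly lies entirely in "copy on write" (otherwise every physical page would have to be copied when forking a process). However, copy_on_write is possible only when parent and child each have their own page tables. When the CLONE_VM flag is 1, so that parent and child share the user space through a pointer, copy_on_write cannot be used. In that case parent and child share the user space in the true sense: whatever the parent writes into its user space is at the same time "written" into the child's user space.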
(5) A read-only page of the parent. Such a page never needs to be copied, so the page table entry can be copied and the physical page shared.

As can be seen, in spite of its name copy_page_range() does not really "copy" a single page. This is the secret of why the Linux kernel can fork() or clone() a process so quickly.

Back in the code of copy_mm(). The function copy_segments() deals with the local descriptor table, LDT, that a process may have. As we said in Chapter 2, only processes running in VM86 mode have an LDT. Although we are not concerned with VM86 mode, interested readers may want to see for themselves how it is copied. The code of copy_segments() is in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 499 521 ====================
[sys_fork()>do_fork()>copy_mm()>copy_segments()]
499 /*
500  * we do not have to muck with descriptors here, that is
501  * done in switch_mm() as needed.
502  */
503 void copy_segments(struct task_struct *p, struct mm_struct *new_mm)
504 {
505     struct mm_struct * old_mm;
506     void *old_ldt, *ldt;
507
508     ldt = NULL;
509     old_mm = current->mm;
510     if (old_mm && (old_ldt = old_mm->context.segments) != NULL) {
511         /*
512          * Completely new LDT, we initialize it from the parent:
513          */
514         ldt = vmalloc(LDT_ENTRIES*LDT_ENTRY_SIZE);
515         if (!ldt)
516             printk(KERN_WARNING "ldt allocation failed\n");
517         else
518             memcpy(ldt, old_ldt, LDT_ENTRIES*LDT_ENTRY_SIZE);
519     }
520     new_mm->context.segments = ldt;
521 }

Back again in the code of copy_mm(). For the i386 CPU, init_new_context() at line 339 of copy_mm() is an empty statement.

When the CPU returns from copy_mm() to do_fork(), all the resources that are copied conditionally have been dealt with. The reader may wish to look back: when the system call fork() enters do_fork() through sys_fork(), its clone_flags is SIGCHLD, that is, all the flag bits are 0, so copy_files(), copy_fs(), copy_sighand() and copy_mm() all really do their work and all four resources are copied. When vfork() enters do_fork() through sys_vfork(), its clone_flags is CLONE_VFORK | CLONE_VM | SIGCHLD, so only copy_files(), copy_fs() and copy_sighand() really copy; copy_mm(), because the flag bit CLONE_VM is 1, merely shares the parent's mm_struct through a pointer, and the child has no copy of its own. In other words, what vfork() creates is a thread, which can only live by sharing its parent's memory space, including the user-space stack.
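The rule just described, that a resource is privately copied exactly when its CLONE_ bit is 0, can be sketched in a few lines of C. This is only an illustration and not kernel code: the flag values are the ones defined in include/linux/sched.h of the 2.4 kernel and SIGCHLD is 17 on i386, but every DEMO_ name and the helper functions are hypothetical names introduced here.

```c
#include <assert.h>

/* Flag values as in include/linux/sched.h (2.4); DEMO_ prefix added for this sketch. */
#define DEMO_CSIGNAL        0x000000ffUL  /* low byte: signal sent to the parent at exit */
#define DEMO_CLONE_VM       0x00000100UL
#define DEMO_CLONE_FS       0x00000200UL
#define DEMO_CLONE_FILES    0x00000400UL
#define DEMO_CLONE_SIGHAND  0x00000800UL
#define DEMO_CLONE_VFORK    0x00004000UL
#define DEMO_SIGCHLD        17UL          /* value of SIGCHLD on i386 */

/* Hypothetical result bits: which resources the child gets a private copy of. */
enum {
    DEMO_PRIV_FILES   = 1,
    DEMO_PRIV_FS      = 2,
    DEMO_PRIV_SIGHAND = 4,
    DEMO_PRIV_MM      = 8
};

/* Mirrors the tests made by copy_files(), copy_fs(), copy_sighand() and copy_mm():
 * a set CLONE_ bit means "share", a clear bit means "make a private copy". */
unsigned int demo_private_copies(unsigned long clone_flags)
{
    unsigned int copied = 0;

    if (!(clone_flags & DEMO_CLONE_FILES))
        copied |= DEMO_PRIV_FILES;
    if (!(clone_flags & DEMO_CLONE_FS))
        copied |= DEMO_PRIV_FS;
    if (!(clone_flags & DEMO_CLONE_SIGHAND))
        copied |= DEMO_PRIV_SIGHAND;
    if (!(clone_flags & DEMO_CLONE_VM))
        copied |= DEMO_PRIV_MM;
    return copied;
}

/* clone_flags with which fork() and vfork() reach do_fork(), as stated in the text. */
unsigned long demo_fork_flags(void)
{
    return DEMO_SIGCHLD;
}

unsigned long demo_vfork_flags(void)
{
    return DEMO_CLONE_VFORK | DEMO_CLONE_VM | DEMO_SIGCHLD;
}
```

Calling demo_private_copies(demo_fork_flags()) reports all four resources as private copies, while the vfork() flags leave out only the mm, which matches the description above. The low byte masked by DEMO_CSIGNAL carries the exit signal and does not influence any of the four tests.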
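As for __clone(), it depends on the arguments given in the call. In the end, of course, it also depends on what resources the parent has: if the parent has no open files, then the result is still empty even though copy_files() is executed.

Back in the code of do_fork(). Two contiguous pages were allocated earlier through alloc_task_struct(). The low end, used for the task_struct, has by now been largely copied; the high end, used as the system-space stack, has not yet been copied. This is now done by copy_thread(). The code of this function is in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 523 552 ====================
523 /*
524  * Save a segment.
525  */
526 #define savesegment(seg,value) \
527     asm volatile("movl %%" #seg ",%0":"=m" (*(int *)&(value)))
528
529 int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
530     unsigned long unused,
531     struct task_struct * p, struct pt_regs * regs)
532 {
533     struct pt_regs * childregs;
534
535     childregs = ((struct pt_regs *) (THREAD_SIZE + (unsigned long) p)) - 1;
536     struct_cpy(childregs, regs);
537     childregs->eax = 0;
538     childregs->esp = esp;
539
540     p->thread.esp = (unsigned long) childregs;
541     p->thread.esp0 = (unsigned long) (childregs+1);
542
543     p->thread.eip = (unsigned long) ret_from_fork;
544
545     savesegment(fs,p->thread.fs);
546     savesegment(gs,p->thread.gs);
547
548     unlazy_fpu(current);
549     struct_cpy(&p->thread.i387, &current->thread.i387);
550
551     return 0;
552 }

In spite of its name, copy_thread() in fact only copies the parent's system-space stack. The contents of the stack record how the parent got from entering system space through the system call to entering copy_thread(). The child is going to return along the same route, so these contents must be copied to the child. However, if the child's system-space stack were exactly the same as the parent's, there would be no way to tell after the return which one is the child, so a few adjustments are made after the copy. This is an interesting piece of code. Let us look first at line 535. In Chapter 3 the reader saw that when a process enters the kernel through a system call or an interrupt, the top of its system-space stack holds the contents of the registers as they were just before the CPU entered the kernel, forming a pt_regs data structure. Here, p in line 535 is the child's task_struct pointer, which points to the start address of the two contiguous physical pages, and THREAD_SIZE+(unsigned long)p points to the top end of these two pages. Converting this to a struct pt_regs* and then subtracting 1 yields a pointer to the pt_regs structure in the child's system-space stack, as shown in Figure 4.3.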
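The pointer arithmetic of line 535 and the two adjustments of lines 537 and 538 can be tried out in user space on an ordinary buffer. The following is a minimal sketch under stated assumptions: struct demo_pt_regs is a simplified stand-in with the field order of the i386 pt_regs (all fields widened to long), DEMO_THREAD_SIZE assumes two 4KB pages, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_THREAD_SIZE (2 * 4096)   /* assumption: two contiguous 4KB pages */

/* Simplified stand-in for the i386 struct pt_regs (demo only). */
struct demo_pt_regs {
    long ebx, ecx, edx, esi, edi, ebp, eax;
    long xds, xes, orig_eax, eip, xcs, eflags, esp, xss;
};

/* Same expression as line 535: go to the top of the area, then step back one pt_regs. */
struct demo_pt_regs *demo_childregs(void *task_area)
{
    return ((struct demo_pt_regs *)(DEMO_THREAD_SIZE + (uintptr_t)task_area)) - 1;
}

/* Counterpart of lines 535 to 538: copy the parent's frame, then adjust eax and esp. */
struct demo_pt_regs *demo_prepare_child(void *task_area,
                                        const struct demo_pt_regs *parent_regs,
                                        long esp)
{
    struct demo_pt_regs *childregs = demo_childregs(task_area);

    memcpy(childregs, parent_regs, sizeof(*childregs));
    childregs->eax = 0;     /* the child's return value from the system call */
    childregs->esp = esp;   /* the child's user-space stack pointer */
    return childregs;
}

/* Returns 1 if the frame sits exactly at the top of the area and was adjusted as
 * expected, 0 if not, and -1 if the buffer could not be allocated. */
long demo_check_layout(void)
{
    struct demo_pt_regs parent = {0};
    struct demo_pt_regs *child;
    char *area = malloc(DEMO_THREAD_SIZE);
    long ok;

    if (!area)
        return -1;
    parent.eax = 1234;      /* arbitrary values standing in for the parent's registers */
    parent.esp = 0x1000;
    parent.eip = 0x2000;
    child = demo_prepare_child(area, &parent, parent.esp);
    ok = ((char *)(child + 1) == area + DEMO_THREAD_SIZE) &&
         ((char *)child == area + DEMO_THREAD_SIZE - sizeof(*child)) &&
         child->eax == 0 && child->esp == 0x1000 && child->eip == 0x2000;
    free(area);
    return ok;
}
```

In this sketch childregs + 1 coincides with the top of the area, which is the value line 541 of copy_thread() stores in p->thread.esp0, while childregs itself is the value line 540 stores in p->thread.esp.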
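Figure 4.3 Schematic view of the child process's system-space stack

Once the pointer childregs to the pt_regs structure in the child's system-space stack has been obtained, the pt_regs structure in the current process's system-space stack is first copied over and then adjusted slightly. What adjustments? First, eax in this structure is set to 0. When the child is scheduled and "resumes" running, "returning" from the system call, this is its return value. As mentioned before, the child's return value is 0. Second, esp in the structure is set to the parameter esp given here, which determines the position of the process's stack in user space. In a __clone() call this parameter is supplied by the caller. In fork() and vfork() it comes from regs.esp just before do_fork() is called, so in fact nothing changes and it still points to the parent's original stack in user space.

The task_struct of a process contains an important component, thread, which is itself a data structure, thread_struct. It records key information such as the (system-space) stack pointer and the instruction address (that is, the "return address") of the process at the time it is switched out. When the task_struct data structure was copied, this information was copied over unchanged. But the child has its own system-space stack, so this information has to be adjusted accordingly. Specifically, line 540 sets p->thread.esp to the start address of the pt_regs structure in the child's system-space stack, just as if the child had run before and had been switched out after entering the kernel, just as it was about to return to user space. p->thread.esp0, on the other hand, should point to the top of the child's system-space stack. When a process is scheduled to run, the kernel writes the value of this variable into the esp0 field of the TSS, indicating where the stack of this process is when it enters ring 0. In addition, the value of p->thread.eip indicates the point at which the process resumes the next time it is switched in, similar to the return address of a function call or an interrupt. This address is set to ret_from_fork, so that the newly created child starts from there when it is first scheduled to run; we come back to this when reading the code for process switching. The savesegment at lines 545 and 546 is a macro, defined at line 526. After preprocessing by gcc, line 545 therefore becomes

asm volatile("movl %%fs,%0":"=m" (*(int *)&p->thread.fs))

that is, the value of the current segment register fs is saved in p->thread.fs. Line 546 is similar. Lines 548 and 549 are there for the i387 floating-point processor and are not our concern.

Back in do_fork(), we read on:

==================== kernel/fork.c 656 706 ====================
[sys_fork()>do_fork()]
656     /* Our parent execution domain becomes current domain
657        These must match for thread signalling to apply */
658
659     p->parent_exec_id = p->self_exec_id;
660
661     /* ok, now we should be set up.. */
662     p->swappable = 1;
663     p->exit_signal = clone_flags & CSIGNAL;
664     p->pdeath_signal = 0;
665
666     /*
667      * "share" dynamic priority between parent and child, thus the
668      * total amount of dynamic priorities in the system doesnt change,
669      * more scheduling fairness. This is only important in the first
670      * timeslice, on the long run the scheduling behaviour is unchanged.
671      */
672     p->counter = (current->counter + 1) >> 1;
673     current->counter >>= 1;
674     if (!current->counter)
675         current->need_resched = 1;
676
677     /*
678      * Ok, add it to the run-queues and make it
679      * visible to the rest of the system.
680      *
681      * Let it rip!
682      */
683     retval = p->pid;
684     p->tgid = retval;
685     INIT_LIST_HEAD(&p->thread_group);
686     write_lock_irq(&tasklist_lock);
687     if (clone_flags & CLONE_THREAD) {
688         p->tgid = current->tgid;
689         list_add(&p->thread_group, &current->thread_group);
690     }
691     SET_LINKS(p);
692     hash_pid(p);
693     nr_threads++;
694     write_unlock_irq(&tasklist_lock);
695
696     if (p->ptrace & PT_PTRACED)
697         send_sig(SIGSTOP, p, 1);
698
699     wake_up_process(p);     /* do this last */
700     ++total_forks;
701
702 fork_out:
703     if ((clone_flags & CLONE_VFORK) && (retval > 0))
704         down(&sem);
705     return retval;
706

In this code, parent_exec_id is the execution domain of the parent and self_exec_id that of the process itself; swappable indicates that the memory pages of this process may be swapped out; exit_signal is the signal this process should send to its parent when it executes exit(); and pdeath_signal is the signal that the parent is asked to send to this process when the parent executes exit(). In addition, the value of the counter field in the task_struct is the running-time quota of the process; here the parent's quota is divided in two, so that parent and child each get half of the original value. If what is being created is a thread, it is also linked with the parent through the list head thread_group in the task_struct, forming a "thread group". Next, the child has to be entered into its web of relationships. First SET_LINKS(p) links the child's task_struct into the kernel's process list, and then hash_pid() links it into the hash queue computed from its pid. For details of these queues see the relevant passages in the sections "Processes" and "Process Scheduling and Switching". Finally, wake_up_process() "wakes up" the child, that is, puts it on the queue of runnable processes to wait for scheduling. For details see the section "Sleeping and Waking Up of Processes".

At this point the creation of the new process is complete, and the process has been put on the queue of runnable processes to be scheduled. The child and the parent have the same return address in user space, and only afterwards do they part ways according to what the program in user space arranges. Moreover, since the parent (the current process) may be rescheduled just before returning from the system call, it is uncertain which of the two returns to user space first. Generally speaking, though, since the same scheduling policy applies to parent and child and the parent is ahead of the child in the queue of runnable processes, the parent is more likely to run first.

There is one more special case to consider. When the CLONE_VFORK flag bit is 1 in the arguments to do_fork(), it must be guaranteed that the child runs first, and the parent may resume running only after the child has executed a new executable program through the system call execve() or has left the system through the system call exit(). Why? This goes back to the question of copying or sharing the user space. As the reader has seen, when a child is created, the parent's user space can be inherited by copying the parent's mm_struct and its subordinate vm_area_struct data structures together with the parent's page directory and page tables; or it can be shared simply by copying the pointer to the mm_struct in the parent's task_struct. Which happens depends on the value of the CLONE_VM flag bit. When CLONE_VM is 1, so that parent and child share the user space through a pointer, they share it in the true sense: whatever the parent writes into its user space is at the same time "written" into the child's user space, and vice versa. If writes by parent and child to their data areas may cause problems in this situation, writes to the stack area are downright fatal. And every call to a subroutine is a write to the stack area! It follows that in such a situation the two processes must never both be allowed to return to user space and run concurrently; otherwise both are bound to end up running wild or dying from illegal out-of-bounds accesses. The only solution is to "detain" one of the processes and let just one return to user space, until the two processes no longer share their user space or one of them (necessarily the one that returned to user space) has ceased to exist.

Therefore, when the CLONE_VFORK flag is 1 and the child has been forked successfully, lines 703 and 704 of do_fork() make the current process (the parent) execute a down() operation on a semaphore, in order to detain the parent. Let us see how this is actually implemented.

First, the semaphore sem is a local variable defined at line 560 near the beginning of the function (it is called DECLARE, but space is in fact allocated for it):

==================== kernel/fork.c 560 560 ====================
[sys_fork()>do_fork()]
560     DECLARE_MUTEX_LOCKED(sem);

DECLARE_MUTEX_LOCKED is defined in include/asm-i386/semaphore.h:

==================== include/asm-i386/semaphore.h 70 71 ====================
70 #define DECLARE_MUTEX(name) __DECLARE_SEMAPHORE_GENERIC(name,1)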
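71 #define DECLARE_MUTEX_LOCKED(name) __DECLARE_SEMAPHORE_GENERIC(name,0)

Comparing DECLARE_MUTEX_LOCKED with DECLARE_MUTEX shows that the number of resources in a semaphore is normally 1, whereas in this semaphore it is 0. When the number of resources is 1, the first process to execute down() enters the critical section and the number of resources becomes 0; processes that execute down() after that are turned away because the resource count is 0 and go to sleep, to be woken up only when the first process returns the resource on leaving the critical section. The resource count of the present semaphore, however, is 0 from the very beginning, so the first process that executes down() on it goes to sleep and is woken up only when some process puts a resource into the semaphore, that is, executes an up() operation.

Who, then, puts in the resource? In the section "The execve() System Call" the reader will see that the child does this when it executes a new executable program through execve(). The child also does it when it leaves the system through exit(). It should further be pointed out that this semaphore is a local variable of do_fork() and therefore lives in the parent's system-space stack, while the child has a pointer to it in its task_struct (namely vfork_sem; see line 586 of do_fork()). Since the parent sleeps until the child has used the semaphore, the space occupied by the semaphore is not disturbed. It should also be pointed out that CLONE_VM should be used together with CLONE_VFORK, otherwise the problem described above arises, unless special precautions are taken in the user program.

In any case, the creation of the child process is finally complete; let us wish this new life well! But if the child merely has the same executable program and data as its parent and is only the parent's "shadow", what is the point? The child must go its own way, and that is the subject of the next section, "The execve() System Call".

4.4 The execve() System Call

The reader saw in the previous section that a process is normally created as an exact copy of its parent. In most cases, if the copied child could not part company with the parent and "go its own way", it would not be of much use. Executing a new executable program is therefore a crucial step in the life of a process. For this purpose Linux provides the system call execve(), and on this basis the C library offers applications a whole set of library functions, including execl(), execlp(), execle(), execv() and execvp(). There is also the library function system(), which is related to execve() too, although system() is a combination of fork(), execve() and wait4(). We showed in Section 2 of this chapter how an application calls execve(); we now turn to the implementation of execve().

The kernel entry point of the system call execve() is sys_execve(); its code is in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 722 740 ====================
722 /*
723  * sys_execve() executes a new program.
724  */
725 asmlinkage int sys_execve(struct pt_regs regs)
726 {
727     int error;
728     char * filename;
729
730     filename = getname((char *) regs.ebx);
731     error = PTR_ERR(filename);
732     if (IS_ERR(filename))
733         goto out;
734     error = do_execve(filename, (char **) regs.ecx, (char **) regs.edx, &regs);
735     if (error == 0)
736         current->ptrace &= ~PT_DTRACE;
737     putname(filename);
738 out:
739     return error;
740 }

As explained earlier, when a system call enters the kernel, regs.ebx holds the first argument given by the application in its call to the corresponding library function. In the example in Section 2 of this chapter, this argument is a pointer to the string "/bin/echo". The pointer is now in regs.ebx, but the string itself is still in user space, so getname() at line 730 has to copy the string from user space to system space and build a copy of it there. Let us see how this is done. The code of getname() is in fs/namei.c:

==================== fs/namei.c 129 145 ====================
[sys_execve()>getname()]
129 char * getname(const char * filename)
130 {
131     char *tmp, *result;
132
133     result = ERR_PTR(-ENOMEM);
134     tmp = __getname();
135     if (tmp)  {
136         int retval = do_getname(filename, tmp);
137
138         result = tmp;
139         if (retval < 0) {
140             putname(tmp);
141             result = ERR_PTR(retval);
142         }
143     }
144     return result;
145 }

First a physical page is allocated as a buffer through __getname(), and then do_getname() is called to copy the string from user space. Why allocate a whole physical page as a buffer just for this? First, the string may indeed be quite long, since it is an absolute path name. Second, as explained earlier, the system-space stack of a process is only about 7KB and must not be used wastefully, so it would be unwise to define a local 4KB character array in getname() (note that the space occupied by local variables is allocated on the stack). The code of do_getname() is also in the file fs/namei.c:

==================== fs/namei.c 102 127 ====================
[sys_execve()>getname()>do_getname()]
102 /* In order to reduce some races, while at the same time doing additional
103  * checking and hopefully speeding things up, we copy filenames to the
104  * kernel data space before using them..
105  *
106  * POSIX.1 2.4: an empty pathname is invalid (ENOENT).
107  */
108 static inline int do_getname(const char *filename, char *page)
109 {
110     int retval;
111     unsigned long len = PATH_MAX + 1;
112
113     if ((unsigned long) filename >= TASK_SIZE) {
114         if (!segment_eq(get_fs(), KERNEL_DS))
115             return -EFAULT;
116     } else if (TASK_SIZE - (unsigned long) filename < PAGE_SIZE)
117         len = TASK_SIZE - (unsigned long) filename;
118
119     retval = strncpy_from_user((char *)page, filename, len);
120     if (retval > 0) {
121         if (retval < len)
122             return 0;
123         return -ENAMETOOLONG;
124     } else if (!retval)
125         retval = -ENOENT;
126     return retval;
127 }

If the value of the pointer filename is greater than or equal to TASK_SIZE, filename is actually in system space. The reader will remember that TASK_SIZE is 3GB. The actual copying is done by strncpy_from_user(), whose code is in arch/i386/lib/usercopy.c:

==================== arch/i386/lib/usercopy.c 100 107 ====================
[sys_execve()>getname()>do_getname()>strncpy_from_user()]
100 long
101 strncpy_from_user(char *dst, const char *src, long count)
102 {
103     long res = -EFAULT;
104     if (access_ok(VERIFY_READ, src, 1))
105         __do_strncpy_from_user(dst, src, count, res);
106     return res;
107 }

The body of this function, __do_strncpy_from_user(), is a macro in the same source file, arch/i386/lib/usercopy.c. It is very similar to __generic_copy_from_user(), which was introduced in Chapter 3, and the reader can study the two side by side.

Once a copy of the path name of the executable file has been built in system space, sys_execve() calls do_execve() to do the main part of the work. Afterwards, of course, the physical page that was allocated has to be released through putname(). The code of do_execve() is in fs/exec.c; we read through it piece by piece:

==================== fs/exec.c 835 850 ====================
[sys_execve()>do_execve()]
835 /*
836  * sys_execve() executes a new program.
837  */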
838 int do_execve(char * filename, char ** argv, char ** envp, struct pt_regs * regs)
839 {
840     struct linux_binprm bprm;
841     struct file *file;
842     int retval;
843     int i;
844
845     file = open_exec(filename);
846
847     retval = PTR_ERR(file);
848     if (IS_ERR(file))
849         return retval;
850

Obviously, the given executable program file first has to be found and opened; this is what open_exec() is called for. Its code is also in fs/exec.c, and the reader can study it independently together with the material on opening files in the chapter "File Systems", in particular the code of path_walk().

Assuming that the target file has been opened, the next step is to load the executable program from the file. For the loading of executable programs the kernel defines a data structure, linux_binprm, which gathers together the information needed to run an executable file. It is defined in include/linux/binfmts.h:

==================== include/linux/binfmts.h 19 33 ====================
19 /*
20  * This structure is used to hold the arguments that are used when loading binaries.
21  */
22 struct linux_binprm{
23     char buf[BINPRM_BUF_SIZE];
24     struct page *page[MAX_ARG_PAGES];
25     unsigned long p; /* current top of mem */
26     int sh_bang;
27     struct file * file;
28     int e_uid, e_gid;
29     kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
30     int argc, envc;
31     char * filename;    /* Name of binary */
32     unsigned long loader, exec;
33 };

The role of each component becomes clear from the code that follows. We continue reading do_execve():

==================== fs/exec.c 851 887 ====================
[sys_execve()>do_execve()]
851     bprm.p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
852     memset(bprm.page, 0, MAX_ARG_PAGES*sizeof(bprm.page[0]));
853
854     bprm.file = file;
855     bprm.filename = filename;
856     bprm.sh_bang = 0;
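857     bprm.loader = 0;
858     bprm.exec = 0;
859     if ((bprm.argc = count(argv, bprm.p / sizeof(void *))) < 0) {
860         allow_write_access(file);
861         fput(file);
862         return bprm.argc;
863     }
864
865     if ((bprm.envc = count(envp, bprm.p / sizeof(void *))) < 0) {
866         allow_write_access(file);
867         fput(file);
868         return bprm.envc;
869     }
870
871     retval = prepare_binprm(&bprm);
872     if (retval < 0)
873         goto out;
874
875     retval = copy_strings_kernel(1, &bprm.filename, &bprm);
876     if (retval < 0)
877         goto out;
878
879     bprm.exec = bprm.p;
880     retval = copy_strings(bprm.envc, envp, &bprm);
881     if (retval < 0)
882         goto out;
883
884     retval = copy_strings(bprm.argc, argv, &bprm);
885     if (retval < 0)
886         goto out;
887

The linux_binprm data structure bprm in this code is a local variable. The function open_exec() returns a pointer to a file structure, which represents the context for reading the executable file, so it is saved in the data structure bprm. The value of the variable bprm.sh_bang indicates the nature of the executable file: it is set to 1 when the executable file is a shell script (a command file written in the shell language and interpreted by the shell). This is not yet known, so for the time being it is set to 0, which means that the file is assumed to be a binary. The other two variables in the data structure are also set to 0 for now. Next the arguments and environment variables of the executable file are processed.

As with the path name of the executable file, the argument and environment strings are kept in physical pages, so bprm contains an array of page pointers whose size is MAX_ARG_PAGES, a constant currently defined as 32; this bounds the space available for them. The pointer array was initialized to all 0 by memset() above. bprm.p is now set to the total size of these pages minus the size of one pointer, because argument 0, that is argv[0], is the path name of the executable program itself. The function count() is defined in exec.c; it is used here to count the number of arguments in the string pointer array argv[], and bprm.p/sizeof(void *) is the maximum allowed. The environment variables passed as arguments are counted through count() in the same way. Note that the arrays argv[] and envp[] are in user space rather than system space, so counting them is not as simple as it may seem. The code of count() is in fs/exec.c. Its own code is simple, but the macro get_user() that it uses is rather challenging and well worth reading. It too resembles __generic_copy_from_user(), which was introduced in Chapter 3, and we leave it to the reader as an exercise. The relevant code is in include/asm-i386/uaccess.h and arch/i386/lib/getuser.S, and the call path is [count()>get_user()>_get_user()>_get_user_4()]. If count() fails, that is, returns a negative value, allow_write_access() has to be executed on the target file. This function is used in a pair with deny_write_access(); the purpose is to prevent other processes (possibly running on another CPU) from changing the contents of the executable file through a memory mapping while it is being read in (see "File Systems" and the system call mmap() for details). Its counterpart deny_write_access() is called in open_exec() when the executable file is opened.

After counting the arguments and environment variables, do_execve() calls prepare_binprm() to make further preparations in the data structure bprm, reading the first 128 bytes of the executable file into the buffer in the linux_binprm structure bprm. Before reading, of course, it must be checked whether the current process has the right to do so and whether the file has the executable attribute. If the executable file has the "set uid" property, the corresponding settings are made. The code of this function is also in fs/exec.c. Since it involves details of file operations, we suggest that the reader come back to it independently after studying "File Systems". For now we only explain why just 128 bytes are read at first. The reason is that whether the target file is in elf format, a.out format or some other format, the first 128 bytes contain the necessary and sufficient information about the properties of the executable file. The reader will shortly see what this information is used for.

The last preparation is to copy the arguments for execution, that is argv[], and the run-time environment, that is envp[], from user space into the data structure bprm. The first argument, argv[0], is the path name of the executable file, which is already in bprm.filename, so it is copied from system space with copy_strings_kernel(); the rest have to be copied from user space with copy_strings().

At this point all preparations are complete and all the necessary information has been gathered in the linux_binprm structure bprm. The next step is to load and run the target program (fs/exec.c):

==================== fs/exec.c 888 906 ====================
[sys_execve()>do_execve()]
888     retval = search_binary_handler(&bprm,regs);
889     if (retval >= 0)
890         /* execve success */
891         return retval;
892
893 out:
894     /* Something went wrong, return the inode and free the argument pages*/
895     allow_write_access(bprm.file);
896     if (bprm.file)
897         fput(bprm.file);
898
899     for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
900         struct page * page = bprm.page[i];
901         if (page)
902             __free_page(page);
903     }
904
905     return retval;
906 }

Clearly the key here is search_binary_handler(). Before going into this function, we first give an outline. The kernel has a queue called formats; the members linked on this queue are "agents" that represent the various executable file formats.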
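Each member recognizes and handles the running of executable files of one particular format only. In the preceding preparation stage, 128 bytes have been read from the head of the executable file into the buffer of bprm, and the arguments and environment variables needed for running have also been collected in bprm. Now the members of the formats queue come forward one by one to claim the file; whichever recognizes the executable file format it represents is handed the job of running it. And what if none of them recognizes it? Then, based on the information at the head of the file, a further search is made to see whether an "agent" designed for this format, but implemented as a dynamically installable module, exists in the file system. If so, the module is installed and linked into the formats queue, and the "agents" in the formats queue are then given another try.

The code of search_binary_handler() is also in fs/exec.c. One part of it is conditionally compiled specifically for the alpha processor; this conditionally compiled part is skipped in the code listed below:

==================== fs/exec.c 747 754 ====================
[sys_execve()>do_execve()>search_binary_handler()]
747 /*
748  * cycle the list of binary formats handler, until one recognizes the image
749  */
750 int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
751 {
752     int try,retval=0;
753     struct linux_binfmt *fmt;
754 #ifdef __alpha__

==================== fs/exec.c 785 832 ====================
785 #endif
786     for (try=0; try<2; try++) {
787         read_lock(&binfmt_lock);
788         for (fmt = formats ; fmt ; fmt = fmt->next) {
789             int (*fn)(struct linux_binprm *, struct pt_regs *) = fmt->load_binary;
790             if (!fn)
791                 continue;
792             if (!try_inc_mod_count(fmt->module))
793                 continue;
794             read_unlock(&binfmt_lock);
795             retval = fn(bprm, regs);
796             if (retval >= 0) {
797                 put_binfmt(fmt);
798                 allow_write_access(bprm->file);
799                 if (bprm->file)
800                     fput(bprm->file);
801                 bprm->file = NULL;
802                 current->did_exec = 1;
803                 return retval;
804             }
805             read_lock(&binfmt_lock);
806             put_binfmt(fmt);
807             if (retval != -ENOEXEC)
808                 break;
809             if (!bprm->file) {
810                 read_unlock(&binfmt_lock);
811                 return retval;
812             }
813         }
814         read_unlock(&binfmt_lock);
815         if (retval != -ENOEXEC) {
816             break;
817 #ifdef CONFIG_KMOD
818         }else{
819 #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
820             char modname[20];
821             if (printable(bprm->buf[0]) &&
822                 printable(bprm->buf[1]) &&
823                 printable(bprm->buf[2]) &&
824                 printable(bprm->buf[3]))
825                 break; /* -ENOEXEC */
826             sprintf(modname, "binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
827             request_module(modname);
828 #endif
829         }
830     }
831     return retval;
832 }

The program contains two nested for loops. The inner loop runs over the members of the formats queue, letting each member in turn try its load_binary() function to see whether the file matches. If there is a match, the target file is loaded and put into execution, and a positive number or 0 is returned. When the CPU returns from the system call, execution of the target file really begins. Otherwise, if the file cannot be recognized or an error occurs during processing, a negative number is returned. The error code -ENOEXEC means that the file simply does not match and no other error has occurred, so the loop goes round again and lets the next member of the queue try. But if an error occurred and it is not -ENOEXEC, this means that the file matched but something else went wrong, and there is no need to let the other members try.

After the inner loop has finished, if the reason for failure is -ENOEXEC, none of the members of the queue recognizes the format of the target file. At this point, if the kernel supports dynamically installed modules (depending on the compile-time option CONFIG_KMOD), a binfmt module name is generated from bytes 2 and 3 of the target file, and an attempt is made to load the corresponding module through request_module() (see the relevant material in the chapters "File Systems" and "Device Drivers" of this book). The outer for loop runs twice precisely so that another attempt can be made after the module has been installed.

The first few bytes, especially the first 4 bytes, of an executable program that can run on a Linux system often form a so-called magic number, and when it is broken down into bytes these are often characters that indicate the file format. For example, the first four bytes of an executable file in elf format are 0x7F, "E", "L" and "F", while the first four bytes of a Java class file are 0xCA, 0xFE, 0xBA and 0xBE (which read "cafebabe" in hexadecimal). If the executable file is a shell script or a perl file, that is, if its first line has the form #! /bin/sh or #! /usr/bin/perl, then the first character is "#", the second is "!", and what follows is the path name of the corresponding interpreter.

The data structure linux_binfmt is defined in include/linux/binfmts.h, as we have already seen. It contains three function pointers: load_binary is used to load an executable program, load_shlib is used to load a dynamically installed shared library, and the purpose of core_dump is self-evident. Clearly the most fundamental of these is load_binary. At the same time, without a clear picture of how the actual loader works it is hard to gain a deep understanding of execve() and hence of how Linux processes run. Below we use the a.out format as an example to describe the process of loading a target program and starting its execution. In fact, executable files in a.out format are gradually being phased out and replaced by the elf format.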
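The way the members of the formats queue are tried in turn, with -ENOEXEC meaning "not my format, let the next one try", can be illustrated with a small user-space sketch. It is not the kernel's code: all demo_ names are hypothetical, the handlers only inspect the first bytes of the buffer and load nothing, the return values 1 and 2 are a convention of this sketch (a real load_binary() returns 0 or a positive number on success), and the locking, module reference counting and the second pass after request_module() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Demo stand-ins for linux_binprm and linux_binfmt, reduced to what the sketch needs. */
struct demo_binprm {
    unsigned char buf[128];             /* first bytes of the "executable" */
};

struct demo_binfmt {
    struct demo_binfmt *next;
    int (*load_binary)(struct demo_binprm *bprm);
};

/* Each handler returns >= 0 if it recognizes the header, -ENOEXEC if it does not. */
int demo_load_elf(struct demo_binprm *bprm)
{
    return memcmp(bprm->buf, "\177ELF", 4) == 0 ? 1 : -ENOEXEC;
}

int demo_load_script(struct demo_binprm *bprm)
{
    return (bprm->buf[0] == '#' && bprm->buf[1] == '!') ? 2 : -ENOEXEC;
}

/* Counterpart of the inner loop: try every handler until one claims the file
 * or one fails with an error other than -ENOEXEC. */
int demo_search_binary_handler(struct demo_binfmt *formats, struct demo_binprm *bprm)
{
    struct demo_binfmt *fmt;
    int retval = -ENOEXEC;

    for (fmt = formats; fmt; fmt = fmt->next) {
        if (!fmt->load_binary)
            continue;
        retval = fmt->load_binary(bprm);
        if (retval >= 0)
            return retval;              /* recognized and "loaded" */
        if (retval != -ENOEXEC)
            break;                      /* recognized, but a real error occurred */
    }
    return retval;
}

/* Builds a two-member queue (elf first, then script) and runs the search on a header. */
int demo_identify(const void *header, size_t len)
{
    struct demo_binfmt script = { NULL, demo_load_script };
    struct demo_binfmt elf = { &script, demo_load_elf };
    struct demo_binprm bprm;

    memset(&bprm, 0, sizeof(bprm));
    if (len > sizeof(bprm.buf))
        len = sizeof(bprm.buf);
    memcpy(bprm.buf, header, len);
    return demo_search_binary_handler(&elf, &bprm);
}
```

With this sketch, a header starting with 0x7F "ELF" is claimed by the first handler, a header starting with "#!" falls through to the second, and anything else comes back as -ENOEXEC, which is the situation in which the kernel would try to load a binfmt module and search once more.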
The a.out format, however, is much simpler and makes it convenient to describe how a target program is loaded and put into execution, so for reasons of space we have chosen a.out. Once the reader is clear about the loading and starting of the a.out format, the code for the elf format can be studied independently.

4.4.1 Loading and Running an Object File in a.out Format

All the code related to executable files in a.out format is in fs/binfmt_aout.c. We look first at the linux_binfmt data structure for the a.out format; this is the data structure that represents the a.out format in the formats queue:

==================== fs/binfmt_aout.c 38 40 ====================
38 static struct linux_binfmt aout_format = {
39     NULL, THIS_MODULE, load_aout_binary, load_aout_library, aout_core_dump, PAGE_SIZE
40 };

The reader can compare it with the type definition of the data structure given earlier. The function that loads and starts an object file in a.out format is load_aout_binary(). As one would expect, this is a fairly complex process and the function is fairly large. We proceed as usual, reading it piece by piece. Its code is in fs/binfmt_aout.c:

==================== fs/binfmt_aout.c 249 268 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
249 /*
250  * These are the functions used to load a.out style executables and shared
251  * libraries.  There is no binary dependent code anywhere else.
252  */
253
254 static int load_aout_binary(struct linux_binprm * bprm, struct pt_regs * regs)
255 {
256     struct exec ex;
257     unsigned long error;
258     unsigned long fd_offset;
259     unsigned long rlim;
260     int retval;
261
262     ex = *((struct exec *) bprm->buf);        /* exec-header */
263     if ((N_MAGIC(ex) != ZMAGIC && N_MAGIC(ex) != OMAGIC &&
264          N_MAGIC(ex) != QMAGIC && N_MAGIC(ex) != NMAGIC) ||
265         N_TRSIZE(ex) || N_DRSIZE(ex) ||
266         bprm->file->f_dentry->d_inode->i_size < ex.a_text+ex.a_data+N_SYMSIZE(ex)+N_TXTOFF(ex)) {
267         return -ENOEXEC;
268     }

The first step is to check the format of the target file to see whether it matches. Every executable file (binary code) in a.out format should begin with an exec data structure, which is defined in include/asm-i386/a.out.h:

==================== include/asm-i386/a.out.h 4 18 ====================
4 struct exec
5 {
6   unsigned long a_info;     /* Use macros N_MAGIC, etc for access */
7   unsigned a_text;          /* length of text, in bytes */
8   unsigned a_data;          /* length of data, in bytes */
9   unsigned a_bss;           /* length of uninitialized data area for file, in bytes */
10   unsigned a_syms;         /* length of symbol table data in file, in bytes */
11   unsigned a_entry;        /* start address */
12   unsigned a_trsize;       /* length of relocation info for text, in bytes */
13   unsigned a_drsize;       /* length of relocation info for data, in bytes */
14 };
15
16 #define N_TRSIZE(a)  ((a).a_trsize)
17 #define N_DRSIZE(a)  ((a).a_drsize)
18 #define N_SYMSIZE(a) ((a).a_syms)

The first unsigned long in the structure, a_info, is logically divided into two parts: its high 16 bits are a code representing the type of the target CPU, which for the i386 CPU has the value 100 (0x64), and its low 16 bits are the magic number. However, unlike in some other formats, the magic number of an a.out file does not consist of printable characters; it is a code denoting certain properties. There are four of them, namely ZMAGIC, OMAGIC, QMAGIC and NMAGIC, which are defined in include/linux/a.out.h:

==================== include/linux/a.out.h 60 71 ====================
60 /* Code indicating object file or impure executable.  */
61 #define OMAGIC 0407
62 /* Code indicating pure executable.  */
63 #define NMAGIC 0410
64 /* Code indicating demand-paged executable.  */
65 #define ZMAGIC 0413
66 /* This indicates a demand-paged executable with the header in the text.
67    The first page is unmapped to help trap NULL pointer references */
68 #define QMAGIC 0314
69
70 /* Code indicating core file.  */
71 #define CMAGIC 0421

If the magic number does not match, or if the information provided in the exec structure does not agree with reality, the target file cannot be regarded as being in a.out format, so -ENOEXEC is returned.

We continue reading in fs/binfmt_aout.c:

==================== fs/binfmt_aout.c 270 287 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
270     fd_offset = N_TXTOFF(ex);
271
272     /* Check initial limits. This avoids letting people circumvent
273      * size limits imposed on them by creating programs with large
274      * arrays in the data or bss.
275      */
276     rlim = current->rlim[RLIMIT_DATA].rlim_cur;
277     if (rlim >= RLIM_INFINITY)
535 534 if (retval) goto flush_failed; 533 retval = make_private_signals(); 532 oldsig = current•>sig; 531 */ 530 * Make sure we have a private signal table 529 /* 528 527 struct signal_struct * oldsig; 526 int i, ch, retval; 525 char * name; 524 { 523 int flush_old_exec(struct linux_binprm * bprm) [sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()] ==================== fs/exec.c 523 585 ==================== ߑ᭄ flush_old_exec()ⱘҷⷕг೼ fs/exec.c Ё˖ 㺖DŽއ㒻ᡓϟᴹⱘDŽϡ䖛ˈϟ䴶䇏㗙Ӯⳟࠄˈ䖭⾡ਞ߿гᑊ䴲ᕏᑩⱘ 䖭⾡Āਞ߿䖛এāᛣੇⴔᬒᓗҢ⠊䖯⿟Ā㒻ᡓāϟᴹⱘܼ䚼⫼᠋ぎ䯈ˈϡㅵᰃ䗮䖛໡ࠊ䖬ᰃ䗮䖛݅ѿ 乎߽䗮䖛њ䖭ѯẔ偠ህ㸼⼎݋໛њᠻ㸠䆹Ⳃᷛ᭛ӊⱘᴵӊˈ᠔ҹህࠄњĀϢ䖛এਞ߿āⱘᯊ׭DŽ 㛑䍙ߎ䖭Ͼ䰤ࠊDŽ ݊Ёгࣙᣀᇍ⫼Ѣ᭄᥂ⱘݙᄬぎ䯈ⱘ䰤ࠊDŽ᠔ҹˈⳂᷛ᭛ӊ᠔⹂ᅮⱘ data ੠ bss ϸϾĀ↉āⱘᘏ੠ϡ ҹࠡ᳒㒣䆆䖛ˈ↣Ͼ䖯⿟ⱘ task_struct 㒧ᵘЁ᳝Ͼ᭄㒘 rlimˈ㾘ᅮњ䆹䖯⿟Փ⫼৘⾡䌘⑤ⱘ䰤ࠊˈ 86 #endif 85 (N_MAGIC(x) == QMAGIC ? 0 : sizeof (struct exec))) 84 (N_MAGIC(x) == ZMAGIC ? _N_HDROFF((x)) + sizeof (struct exec) : \ 83 #define N_TXTOFF(x) \ 82 #if !defined (N_TXTOFF) 81 80 #define _N_HDROFF(x) (1024 • sizeof (struct exec)) ==================== include/linux/a.out.h 80 86 ==================== ЁᅮНⱘ˖ ԰ N_TXTOFF()ˈҹ֓ḍ᥂ҷⷕⱘ⡍ᗻপᕫℷ᭛೼Ⳃᷛ᭛ӊЁⱘ䍋ྟԡ㕂ˈ䖭ᰃ೼ include/linux/a.out.h ৘⾡ a.out Ḑᓣⱘ᭛ӊ಴Ⳃᷛҷⷕⱘ⡍ᗻϡৠˈ݊ℷ᭛ⱘ䍋ྟԡ㕂гህϡৠDŽЎℸᦤկњϔϾᅣ᪡ 287 /* OK, This is the point of no return */ 286 285 return retval; 284 if (retval) 283 retval = flush_old_exec(bprm); 282 /* Flush all traces of the currently running executable */ 281 280 return •ENOMEM; 279 if (ex.a_data + ex.a_bss > rlim) rlim = ~0; 278 319 320 536 /* 537 * Release all of the old mmap stuff 538 */ 539 retval = exec_mmap(); 540 if (retval) goto mmap_failed; 541 542 /* This is the point of no return */ 543 release_old_signals(oldsig); 544 545 current•>sas_ss_sp = current•>sas_ss_size = 0; 546 547 if (current•>euid == current•>uid && current•>egid == current•>gid) 548 current•>dumpable = 1; 549 name = bprm•>filename; 550 for (i=0; (ch = *(name++)) != '\0';) { 551 if (ch == '/') 552 i = 0; 553 else 554 if (i < 15) 555 current•>comm[i++] = ch; 556 } 557 current•>comm[i] = '\0'; 558 559 flush_thread(); 560 561 
de_thread(current); 562 563 if (bprm•>e_uid != current•>euid || bprm•>e_gid != current•>egid || 564 permission(bprm•>file•>f_dentry•>d_inode,MAY_READ)) 565 current•>dumpable = 0; 566 567 /* An exec changes our domain. We are no longer part of the thread 568 group */ 569 570 current•>self_exec_id++; 571 572 flush_signal_handlers(current); 573 flush_old_files(current•>files); 574 575 return 0; 576 577 mmap_failed: 578 flush_failed: 579 spin_lock_irq(¤t•>sigmask_lock); 580 if (current•>sig != oldsig) 581 kfree(current•>sig); 582 current•>sig = oldsig; 583 spin_unlock_irq(¤t•>sigmask_lock); 584 return retval; Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! ҷⷕ೼ৠϔ᭛ӊ˄fs/exec.c˅Ё˖ Ⳍ↨Пϟˈexec_mmap()ᰃ᳈Ў݇䬂ⱘ㸠ࡼˈҢ⠊䖯⿟㒻ᡓϟᴹⱘ⫼᠋ぎ䯈ህᰃ೼䖭䞠ᬒᓗⱘDŽ݊ ϞಲᴹњDŽ __clone()˅ˈ䙷ህϔᅮ䛑Ꮖ㒣໡ࠊདњˈ䖭䞠ⱘ make_private_signals()াϡ䖛ᰃẔᶹϔϟ݅ѿ䅵᭄ህ偀 ݅ѿ䅵᭄˅᠔㢅䌍ⱘҷӋᰃᕜᇣⱘDŽᔧ✊ˈབᵰᄤ䖯⿟ᰃ䗮䖛 fork()߯ᓎߎᴹⱘ䆱˄㗠ϡᰃ vfork()៪ ໡ࠊህৃ㛑䗴៤⌾䌍㗠Ϩϡヺড়㽕∖DŽݡ䇈ˈẔᶹϔϟᰃ৺䖬೼Ϣ⠊䖯⿟݅ѿֵো໘⧚㸼˄䗮䖛Ẕᶹ 㞾Ꮕⱘ䏃āˈԚ䖭ᰃ≵ֱ᳝䆕ⱘDŽབᵰ߯ᓎⱘᰃ㒓⿟䙷ህϡϔᅮӮᠻ㸠 execve()ˈབᵰϔᕟ೼߯ᓎᯊህ DŽ㱑✊ᮄ߯ᓎⱘ䖯⿟ϔ㠀䛑Ӯᠻ㸠 execve()ˈĀ䍄خϡৃᯊᠡخcomputationāⱘὖᗉ˖ϔӊџᚙা᳝೼䴲 䇏㗙г䆌㽕䯂˖᮶✊᳔㒜䖬ᰃ㽕ᡞᅗ໡ࠊ䖛ᴹˈԩϡ೼ᔧ߱ϔℹህᡞᅗ໡ࠊདњ˛䖭ህᰃ᠔䇧“lazy 452 } 451 return 0; 450 spin_unlock_irq(¤t•>sigmask_lock); 449 current•>sig = newsig; 448 spin_lock_irq(¤t•>sigmask_lock); 447 memcpy(newsig•>action, current•>sig•>action, sizeof(newsig•>action)); 446 atomic_set(&newsig•>count, 1); 445 spin_lock_init(&newsig•>siglock); 444 return •ENOMEM; 443 if (newsig == NULL) 442 newsig = kmem_cache_alloc(sigact_cachep, GFP_KERNEL); 441 return 0; 440 if (atomic_read(¤t•>sig•>count) <= 1) 439 438 struct signal_struct * newsig; 437 { 436 static inline int make_private_signals(void) 435 434 */ 433 * table via the CLONE_SIGNAL option to clone().) 432 * disturbing other processes. 
(Other processes might share the signal 431 * so that flush_signal_handlers can later reset the handlers without 430 * This function makes sure the current process has its own signal table, 429 /* [sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>make_private_signals()] ==================== fs/exec.c 429 452 ==================== copy_sighand()෎ᴀⳌৠDŽ ⧚㸼ⱘ䆱ˈህ㽕ᡞᅗ໡ࠊ䖛ᴹDŽℷ಴Ў䖭ḋˈmake_private_signals()ⱘҷⷕϢ do_fork()Ё䇗⫼ⱘ ⿟ⱘֵো໘⧚㸼DŽ⦄೼ˈᄤ䖯⿟᳔㒜㽕Ā㞾ゟ䮼᠋āњˈ᠔ҹ㽕ⳟϔϟབᵰ䖬೼݅ѿ⠊䖯⿟ⱘֵো໘ 㛑Ꮖ㒣໡ࠊ䖛ᴹˈԚг᳝ৃ㛑াᰃᡞ⠊䖯⿟ⱘֵো໘⧚㸼ᣛ䩜໡ࠊњ䖛ᴹˈ㗠䗮䖛䖭ᣛ䩜ᴹ݅ѿ⠊䖯 ᮁ৥䞣㸼ˈ㱑✊䖤⫼ⱘሖ⃵ϡৠˈ݊ὖᗉᰃⳌԐⱘDŽᔧᄤ䖯⿟㹿߯ᓎߎᴹᯊˈ⠊䖯⿟ⱘֵো໘⧚㸼ৃ ϔϾ㋏㒳ЁⱘЁڣᰃ䖯⿟ⱘֵো˄䕃Ёᮁ˅໘⧚㸼DŽ៥Ӏ䆆䖛ˈϔϾ䖯⿟ⱘֵো໘⧚㸼ህདܜ佪 { 585 321 322 ==================== fs/exec.c 385 427 ==================== [sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>exec_mmap()] 385 static int exec_mmap(void) 386 { 387 struct mm_struct * mm, * old_mm; 388 389 old_mm = current•>mm; 390 if (old_mm && atomic_read(&old_mm•>mm_users) == 1) { 391 flush_cache_mm(old_mm); 392 mm_release(); 393 exit_mmap(old_mm); 394 flush_tlb_mm(old_mm); 395 return 0; 396 } 397 398 mm = mm_alloc(); 399 if (mm) { 400 struct mm_struct *active_mm = current•>active_mm; 401 402 if (init_new_context(current, mm)) { 403 mmdrop(mm); 404 return •ENOMEM; 405 } 406 407 /* Add it to the list of mm's */ 408 spin_lock(&mmlist_lock); 409 list_add(&mm•>mmlist, &init_mm.mmlist); 410 spin_unlock(&mmlist_lock); 411 412 task_lock(current); 413 current•>mm = mm; 414 current•>active_mm = mm; 415 task_unlock(current); 416 activate_mm(active_mm, mm); 417 mm_release(); 418 if (old_mm) { 419 if (active_mm != old_mm) BUG(); 420 mmput(old_mm); 421 return 0; 422 } 423 mmdrop(active_mm); 424 return 0; 425 } 426 return •ENOMEM; 427 } ৠḋˈᄤ䖯⿟ⱘ⫼᠋ぎ䯈ৃ㛑ᰃ⠊䖯⿟⫼᠋ぎ䯈ⱘ໡ࠊકˈгৃ㛑াᰃ䗮䖛ϔϾᣛ䩜ᴹ݅ѿ⠊䖯 ⿟ⱘ⫼᠋ぎ䯈ˈ䖭ϔ⚍া㽕Ẕᶹϔϟᇍ⫼᠋ぎ䯈ǃгህᰃ current•>mm ⱘ݅ѿ䅵᭄ህৃ⏙ἮDŽᔧ݅ѿ Click to buy NOW! PDF•XC H ANGE w w w . d o c u • t r a c k.com w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
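When the share count is 1, the space is used exclusively, which means that it was copied from the parent. In that case all the vm_area_struct data structures under the mm_struct (but not the mm_struct itself) must first be released, and the entries in the page tables set to 0. Concretely this is done by exit_mmap(), whose code is in mm/mmap.c and is left for readers to study. Before exit_mmap() is called, a function mm_release() is also called; we will discuss it a little later, because it is called again further on. As for flush_cache_mm() and flush_tlb_mm(), they merely keep the caches consistent with memory and are not our concern here; on i386 processors the former is in fact an empty statement.

One question does have to be asked here. When the parent process fork()s a child, it painstakingly copies all the data structures that represent the user space. Is the purpose really just to release them all, equally painstakingly, a moment later in execve()? If it was going to come to this, why bother in the first place? Indeed, this is unreasonable, and it is the reason why a vfork() system call was added (beginning with BSD Unix) after fork() already existed. Let us review how sys_fork() and sys_vfork() differ when they call do_fork() (arch/i386/kernel/process.c):

==================== arch/i386/kernel/process.c 690 693 ====================
690  asmlinkage int sys_fork(struct pt_regs regs)
691  {
692      return do_fork(SIGCHLD, regs.esp, &regs, 0);
693  }

==================== arch/i386/kernel/process.c 717 720 ====================
717  asmlinkage int sys_vfork(struct pt_regs regs)
718  {
719      return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0);
720  }

As can be seen, sys_vfork() passes two more flag bits to do_fork() than sys_fork() does: CLONE_VFORK and CLONE_VM. When CLONE_VM is 1, the kernel does not copy the parent's user space (data structures) for the child; it copies only the pointer to the mm_struct data structure, and the child shares the parent's user space through this pointer. Creating the child thus avoids the trouble of copying the user space, and when the child calls execve() the step of releasing the user space can be skipped and a new user space allocated for the child directly.

This saves work, but it can bring new problems. As explained before, after fork() and before execve() the child has its own complete set of data structures representing the user space, yet physically it still shares the same pages with the parent. Because the child has its own page directory and page tables, however, the access permission for every page can be set to "read only" in the child's page tables. When the child then tries to change the content of a page, the permission mismatch causes a page fault, and the page fault handler copies the required physical page for the child. This is called "copy_on_write". If instead the child shares the user space with the parent, that is, shares all the data structures including the page tables, "copy_on_write" cannot be applied, and whatever the child writes really does go into the parent's space. We know that when a process runs in user space its stack is in user space too. This means that in this situation the child can change the parent's stack and, conversely, the parent can change the child's stack!

For this reason vfork() is dangerous to use: until the child has given up sharing the parent's user space, the two processes must never both be allowed to run in user space. So sys_vfork() also passes another flag bit, CLONE_VFORK, when it calls do_fork(). When this flag is 1, the parent goes to sleep after creating the child and waits until the child either executes another target program through execve() or comes to its end through exit(). In both cases the child releases the user space it was sharing, so the parent can safely continue. Even so there is still danger: the child must never return from the function that called vfork().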
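The rule that a vfork() child should do nothing but call execve() or exit can be illustrated from user space. The following is a minimal sketch, not taken from the kernel source: the helper name run_with_vfork() is hypothetical, and a POSIX system that provides vfork() is assumed. The child calls _exit() rather than exit() on failure so that it does not run the parent's exit handlers or flush the parent's stdio buffers through the shared address space.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a vfork()ed child.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int run_with_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: still shares the parent's address space and stack.
         * Only an exec call or _exit() is safe here; never return. */
        execv(path, argv);
        _exit(127);             /* reached only if the exec failed */
    }
    /* Parent: resumes only after the child has exec'ed or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller might use it as run_with_vfork("/bin/sh", args) with args holding {"sh", "-c", "exit 3", NULL}; the parent is suspended until the shell image has replaced the child.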
A child that did return from the function that called vfork() could still corrupt the parent's return address. So vfork() is really built on the premise that the child will call execve() immediately after it is created.

How, then, is the parent put to sleep to wait for the child to call execve() or exit()? Different implementations are of course possible. Readers have already seen in the code of do_fork() that the kernel has the parent perform a down() operation on a "semaphore" with 0 resources and thereby go to sleep; mm_release() here has the child perform an up() operation on that semaphore to wake the parent. The code of mm_release() is in kernel/fork.c:

==================== kernel/fork.c 255 277 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>exec_mmap()>mm_release()]
255  /* Please note the differences between mmput and mm_release.
256   * mmput is called whenever we stop holding onto a mm_struct,
257   * error success whatever.
258   *
259   * mm_release is called after a mm_struct has been removed
260   * from the current process.
261   *
262   * This difference is important for error handling, when we
263   * only half set up a mm_struct for a new process and need to restore
264   * the old one.  Because we mmput the new mm_struct before
265   * restoring the old one. . .
266   * Eric Biederman 10 January 1998
267   */
268  void mm_release(void)
269  {
270      struct task_struct *tsk = current;
271
272      /* notify parent sleeping on vfork() */
273      if (tsk->flags & PF_VFORK) {
274          tsk->flags &= ~PF_VFORK;
275          up(tsk->p_opptr->vfork_sem);
276      }
277  }

Back in exec_mmap(): if the child's user space is shared through a pointer rather than copied, or if there is no user space at all, there is no need to call exit_mmap() to release the data structures that represent the user space. A mm_struct data structure and a page directory must, however, now be allocated for the child, so that the child's user space can later be built on this basis. On CPUs of the i386 architecture, init_new_context() is a null operation that always returns 0, so we skip it. After the pointers mm and active_mm in the current process's task_struct have been set to point to the newly allocated mm_struct, activate_mm() is used to switch to this new user space. This is a macro defined in include/asm-i386/mmu_context.h:

==================== include/asm-i386/mmu_context.h 61 62 ====================
61  #define activate_mm(prev, next) \
62      switch_mm((prev),(next),NULL,smp_processor_id())

We will read the code of switch_mm() in the section "Process Scheduling and Switching"; here it is enough to know that the user space of the current process has been switched to the space represented by the newly allocated mm_struct data structure.
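It should also be pointed out that the new "user space" is at this point really only a framework, an "empty shell" without a single page in it. On the other hand, we are now running in the kernel, so switching the user space has no effect on the current execution.

The original user space, however, from now on has nothing to do with the current process. In other words, the current process has finally given up sharing the original user space. At this point mm_release() must of course be executed to wake the parent. In practice CLONE_VFORK is usually combined with the CLONE_VM flag, so this call to mm_release() is the more important one, while the earlier call is there only "just in case". What, then, about the parent's user space? Its share count must of course be decremented. Moreover, if the share count reaches 0 after being decremented, the data structures under it must also be released, because no process is using the space any more. This is done by mmput(), whose code is in kernel/fork.c:

==================== kernel/fork.c 242 253 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>exec_mmap()>mmput()]
242  /*
243   * Decrement the use count and release all resources for an mm.
244   */
245  void mmput(struct mm_struct *mm)
246  {
247      if (atomic_dec_and_lock(&mm->mm_users, &mmlist_lock)) {
248          list_del(&mm->mmlist);
249          spin_unlock(&mmlist_lock);
250          exit_mmap(mm);
251          mmdrop(mm);
252      }
253  }

That is, mm->mm_users is decremented by 1, and if the result is 0, exit_mmap() and mmdrop() are executed on mm. We have already described what exit_mmap() does: it releases all the vm_area_struct data structures under the mm_struct and sets all the page table entries that correspond to user space to 0, turning the whole "user space" into an "empty shell". mmdrop() then goes further and releases the shell itself, that is, the page tables, the page directory and the mm_struct data structure. All of this, however, happens only in the special case where the parent's mm->mm_users becomes 0 after being decremented. In our present scenario, since the child shares the parent's user space through a pointer, the parent should be asleep and waiting, so the share count does not reach 0 when the child gives up its share of the space.

Returning to the code of exec_mmap() above, there is one last special case to consider: when the child enters exec_mmap(), the mm_struct pointer mm in its task_struct may be 0, meaning that it has no user space (so it is a kernel thread). The other mm_struct pointer, active_mm, is nevertheless not 0; this results from a special requirement of process switching. The task_struct of a process contains two mm_struct pointers: one is mm, which points to the process's user space, and the other is active_mm. For a process that has a user space the two pointers are always the same. But when a process without a user space (a kernel thread) is scheduled to run, its active_mm is required to point to some mm_struct, so one has to be borrowed temporarily. In that case the kernel sets its active_mm to the active_mm of the process that ran just before it, and sets the pointer back to 0 when the thread is scheduled out. In other words, when a kernel thread is scheduled to run it "borrows" the active_mm of the process that ran before it (see "Process Scheduling and Switching" for details), and the use count of that mm_struct therefore has to be incremented. Now that the kernel thread has been given an mm_struct of its own and has thus been promoted to a process, it no longer uses the borrowed active_mm. So mmdrop() must be called to decrement its use count.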
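mmdrop() is an inline function whose code is in include/linux/sched.h:

==================== include/linux/sched.h 709 715 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>exec_mmap()>mmdrop()]
709  /* mmdrop drops the mm and the page tables */
710  extern inline void FASTCALL(__mmdrop(struct mm_struct *));
711  static inline void mmdrop(struct mm_struct * mm)
712  {
713      if (atomic_dec_and_test(&mm->mm_count))
714          __mmdrop(mm);
715  }

The code of __mmdrop() is in kernel/fork.c:

==================== kernel/fork.c 229 240 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>exec_mmap()>mmdrop()>__mmdrop()]
229  /*
230   * Called when the last reference to the mm
231   * is dropped: either by a lazy thread or by
232   * mmput. Free the page directory and the mm.
233   */
234  inline void __mmdrop(struct mm_struct *mm)
235  {
236      if (mm == &init_mm) BUG();
237      pgd_free(mm->pgd);
238      destroy_context(mm);
239      free_mm(mm);
240  }

As can be seen, before releasing a mm_struct data structure mmdrop() also decrements and checks its use count mm_count, and releases the structure only if the count has dropped to 0. Note the difference between the two counters, mm_users and mm_count. Both are set to 1 when the mm_struct is first allocated; after that, mm_users goes up and down as child processes share the user space, while mm_count goes up and down with uses of the mm_struct data structure inside the kernel.

On return from exec_mmap() to flush_old_exec(), the user space that the child inherited from its parent has been released, and the child's user space has become an independent "empty shell", that is, an independent user space of size 0. The process has now passed the point of no return and cannot go back to its original user space (see the comment in the code). As mentioned earlier, the current process (the child) may originally have shared the parent's signal handler table through a pointer; it now has an independent signal handler table of its own, so the share count of the parent's table must be decremented as well, and if the count drops to 0 the space the table occupies must be released. This is what release_old_signals() does. In addition, the task_struct of a process contains a character array comm[] that holds the name of the program the process is executing, so the last component of the target program's pathname in bprm->filename has to be copied into it. The flush_thread() that follows deals only with matters related to debugging and the i387 coprocessor, which do not concern us.

If the "current process" was originally just a thread, its task_struct is linked through the queue head thread_group in the structure into the "thread group" queue headed by its parent. Having been promoted to a process through execve() and having given up sharing the parent's user space, it must now leave this thread group through de_thread().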
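The interplay of the two counters described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct toy_mm and the functions toy_mm_alloc(), toy_mmput() and toy_mmdrop() are hypothetical names, plain int counters stand in for the kernel's atomic operations and locking, and nr_areas stands in for the vm_area_struct list and page tables.

```c
#include <stdlib.h>

/* Toy model of an address-space descriptor; not the kernel's mm_struct. */
struct toy_mm {
    int users;        /* tasks sharing the user space (models mm_users)   */
    int count;        /* references to the descriptor (models mm_count)   */
    int nr_areas;     /* stands in for the areas and page table entries   */
    int *freed_flag;  /* set to 1 when the descriptor itself is freed     */
};

struct toy_mm *toy_mm_alloc(int nr_areas, int *freed_flag)
{
    struct toy_mm *mm = malloc(sizeof(*mm));

    if (!mm)
        return NULL;
    mm->users = 1;    /* both counters start at 1, as in the text */
    mm->count = 1;
    mm->nr_areas = nr_areas;
    mm->freed_flag = freed_flag;
    *freed_flag = 0;
    return mm;
}

/* Like mmdrop(): free the shell only when the last reference goes. */
void toy_mmdrop(struct toy_mm *mm)
{
    if (--mm->count == 0) {
        *mm->freed_flag = 1;
        free(mm);
    }
}

/* Like mmput(): when the last user goes, empty the space (the role of
 * exit_mmap()) and drop the reference held on behalf of the users. */
void toy_mmput(struct toy_mm *mm)
{
    if (--mm->users == 0) {
        mm->nr_areas = 0;
        toy_mmdrop(mm);
    }
}
```

In this model a borrower that has incremented count (as a kernel thread does with a borrowed active_mm) keeps the emptied shell alive after the last user has gone, until it calls toy_mmdrop() itself.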
The code of de_thread() is in fs/exec.c:

==================== fs/exec.c 502 521 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>de_thread()]
502  /*
503   * An execve() will automatically "de-thread" the process.
504   * Note: we don't have to hold the tasklist_lock to test
505   * whether we might need to do this. If we're not part of
506   * a thread group, there is no way we can become one
507   * dynamically. And if we are, we only need to protect the
508   * unlink - even if we race with the last other thread exit,
509   * at worst the list_del_init() might end up being a no-op.
510   */
511  static inline void de_thread(struct task_struct *tsk)
512  {
513      if (!list_empty(&tsk->thread_group)) {
514          write_lock_irq(&tasklist_lock);
515          list_del_init(&tsk->thread_group);
516          write_unlock_irq(&tasklist_lock);
517      }
518
519      /* Minor oddity: this might stay the same. */
520      tsk->tgid = tsk->pid;
521  }

As said before, the signal handler table of a process is like an interrupt vector table. There is, however, an important difference: an entry in an interrupt vector table either points to a service routine or is absent, whereas a signal handler table can also hold a preset (default) response to each signal and need not point to a service routine. When the signal handler table is copied from the parent, each entry can have one of three kinds of value: SIG_IGN, meaning the signal is ignored; SIG_DFL, meaning the preset response is taken (for example, exit() on receiving SIGQUIT); or a pointer to a routine in user space. But the whole user space has now been given up, so how can entries of the signal handler table still be allowed to point to routines in user space? The table therefore has to be scanned once, and every entry that points to a service routine changed to SIG_DFL. This is done by flush_signal_handlers(), whose code is in kernel/signal.c:

==================== kernel/signal.c 127 143 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>flush_signal_handlers()]
127  /*
128   * Flush all handlers for a task.
129   */
130
131  void
132  flush_signal_handlers(struct task_struct *t)
133  {
134      int i;
135      struct k_sigaction *ka = &t->sig->action[0];
136      for (i = _NSIG ; i != 0 ; i--) {
137          if (ka->sa.sa_handler != SIG_IGN)
138              ka->sa.sa_handler = SIG_DFL;
139          ka->sa.sa_flags = 0;
140          sigemptyset(&ka->sa.sa_mask);
141          ka++;
142      }
143  }

Finally, the files that are already open have to be dealt with; this is done by flush_old_files(). The task_struct of a process contains a pointer "files" to a files_struct structure, and the structure it points to holds the information about the open files. The files_struct contains a bitmap close_on_exec, which records which files should be closed when a new target program is executed. What flush_old_files() has to do is close those files as the bitmap indicates and clear the bitmap to all 0. Its code is in fs/exec.c:

==================== fs/exec.c 469 500 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>flush_old_exec()>flush_old_files()]
469  /*
470   * These functions flushes out all traces of the currently running executable
471   * so that a new one can be started
472   */
473
474  static inline void flush_old_files(struct files_struct * files)
475  {
476      long j = -1;
477
478      write_lock(&files->file_lock);
479      for (;;) {
480          unsigned long set, i;
481
482          j++;
483          i = j * __NFDBITS;
484          if (i >= files->max_fds || i >= files->max_fdset)
485              break;
486          set = files->close_on_exec->fds_bits[j];
487          if (!set)
488              continue;
489          files->close_on_exec->fds_bits[j] = 0;
490          write_unlock(&files->file_lock);
491          for ( ; set ; i++,set >>= 1) {
492              if (set & 1) {
493                  sys_close(i);
494              }
495          }
496          write_lock(&files->file_lock);
497
498      }
499      write_unlock(&files->file_lock);
500  }

Generally speaking, the first three files of a process, those with fd 0, 1 and 2 (stdin, stdout and stderr), are not closed; the other open files should all be closed, but this can be changed through the ioctl() system call.

On return from flush_old_exec() to load_aout_binary(), the current process has completed its farewell to the past and is ready to take up its new mission. We continue down fs/binfmt_aout.c (skipping the conditional compilation for sparc processors):

==================== fs/binfmt_aout.c 287 307 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
287      /* OK, This is the point of no return */
288  #if !defined(__sparc__)
289      set_personality(PER_LINUX);
290  #else
291      set_personality(PER_SUNOS);
292  #if !defined(__sparc_v9__)
293      memcpy(&current->thread.core_exec, &ex, sizeof(struct exec));
294  #endif
295  #endif
296
297      current->mm->end_code = ex.a_text +
298          (current->mm->start_code = N_TXTADDR(ex));
299      current->mm->end_data = ex.a_data +
300          (current->mm->start_data = N_DATADDR(ex));
301      current->mm->brk = ex.a_bss +
302          (current->mm->start_brk = N_BSSADDR(ex));
303
304      current->mm->rss = 0;
305      current->mm->mmap = NULL;
306      compute_creds(bprm);
307      current->flags &= ~PF_FORKNOEXEC;

Here some variables in the new mm_struct data structure are initialized, in preparation for allocating storage and reading in the image of the executable code later. The image of the target code is divided into three segments, text, data and bss, and the mm_struct has a start pointer and an end pointer for each segment. The start address of each segment is defined in include/linux/a.out.h:

==================== include/linux/a.out.h 108 111 ====================
108  /* Address of text segment in memory after it is loaded.  */
109  #if !defined (N_TXTADDR)
110  #define N_TXTADDR(x) (N_MAGIC(x) == QMAGIC ? PAGE_SIZE : 0)
111  #endif

==================== include/linux/a.out.h 141 154 ====================
141  #define _N_SEGMENT_ROUND(x) (((x) + SEGMENT_SIZE - 1) & ~(SEGMENT_SIZE - 1))
142
143  #define _N_TXTENDADDR(x) (N_TXTADDR(x)+(x).a_text)
144
145  #ifndef N_DATADDR
146  #define N_DATADDR(x) \
147      (N_MAGIC(x)==OMAGIC? (_N_TXTENDADDR(x)) \
148       : (_N_SEGMENT_ROUND (_N_TXTENDADDR(x))))
149  #endif
150
151  /* Address of bss segment in memory after it is loaded.  */
152  #if !defined (N_BSSADDR)
153  #define N_BSSADDR(x) (N_DATADDR(x) + (x).a_data)
154  #endif

As can be seen, once loaded into memory the program image begins with the text segment (code segment), whose start address is 0 or PAGE_SIZE depending on the particular format. Above the text segment is the data segment, followed by the bss segment, which is the uninitialized data segment. Further up come the dynamically allocated memory "heap" and the user-space stack.

Next, compute_creds() determines the privileges the process will have once it starts executing the new target code; this is decided from the contents of bprm and the current privileges. Its code is in fs/exec.c and is left for readers to study.

What follows depends on the characteristics of the particular kind of a.out executable code (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 308 309 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
308  #ifdef __sparc__
309      if (N_MAGIC(ex) == NMAGIC) {

==================== fs/binfmt_aout.c 321 351 ====================
321  #endif
322
323      if (N_MAGIC(ex) == OMAGIC) {
324          unsigned long text_addr, map_size;
325          loff_t pos;
326
327          text_addr = N_TXTADDR(ex);
328
329  #if defined(__alpha__) || defined(__sparc__)
330          pos = fd_offset;
331          map_size = ex.a_text+ex.a_data + PAGE_SIZE - 1;
332  #else
333          pos = 32;
334          map_size = ex.a_text+ex.a_data;
335  #endif
336
337          error = do_brk(text_addr & PAGE_MASK, map_size);
338          if (error != (text_addr & PAGE_MASK)) {
339              send_sig(SIGKILL, current, 0);
340              return error;
341          }
342
343          error = bprm->file->f_op->read(bprm->file, (char *)text_addr,
344                    ex.a_text+ex.a_data, &pos);
345          if (error < 0) {
346              send_sig(SIGKILL, current, 0);
347              return error;
348          }
349
350          flush_icache_range(text_addr, text_addr+ex.a_text+ex.a_data);
351      } else {

As explained earlier, the magic number in a.out format target code indicates the characteristics, or type, of the code. A magic number of OMAGIC means that the executable code in the file is not "pure code".
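For such code, do_brk() is first used to allocate space for the text segment and the data segment together, and the two parts are then read in from the file. We introduced do_brk() in Chapter 2, and reading from a file is described in detail in the "File Systems" and "Block Device Drivers" chapters, which readers may consult, so we do not repeat it here. It should be pointed out, though, that the code is read starting at offset 32 in the file and placed in the process's user space starting at address 0, and that the total length read is ex.a_text+ex.a_data. On an i386 CPU, flush_icache_range() is an empty statement. As for the bss segment, it does not need to be read from the file; allocating space for it is enough, so it is handled later. For an a.out executable file of type OMAGIC, the work of loading the program is then essentially complete.

But what if the type is not OMAGIC? Read on (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 351 402 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
351      } else {
352          static unsigned long error_time, error_time2;
353          if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
354              (N_MAGIC(ex) != NMAGIC) && (jiffies-error_time2) > 5*HZ)
355          {
356              printk(KERN_NOTICE "executable not page aligned\n");
357              error_time2 = jiffies;
358          }
359
360          if ((fd_offset & ~PAGE_MASK) != 0 &&
361              (jiffies-error_time) > 5*HZ)
362          {
363              printk(KERN_WARNING
364                     "fd_offset is not page aligned. Please convert program: %s\n",
365                     bprm->file->f_dentry->d_name.name);
366              error_time = jiffies;
367          }
368
369          if (!bprm->file->f_op->mmap||((fd_offset & ~PAGE_MASK) != 0)) {
370              loff_t pos = fd_offset;
371              do_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
372              bprm->file->f_op->read(bprm->file,(char *)N_TXTADDR(ex),
373                      ex.a_text+ex.a_data, &pos);
374              flush_icache_range((unsigned long) N_TXTADDR(ex),
375                         (unsigned long) N_TXTADDR(ex) +
376                         ex.a_text+ex.a_data);
377              goto beyond_if;
378          }
379
380          down(&current->mm->mmap_sem);
381          error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
382              PROT_READ | PROT_EXEC,
383              MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
384              fd_offset);
385          up(&current->mm->mmap_sem);
386
387          if (error != N_TXTADDR(ex)) {
388              send_sig(SIGKILL, current, 0);
389              return error;
390          }
391
392          down(&current->mm->mmap_sem);
393          error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
394                  PROT_READ | PROT_WRITE | PROT_EXEC,
395                  MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
396                  fd_offset + ex.a_text);
397          up(&current->mm->mmap_sem);
398          if (error != N_DATADDR(ex)) {
399              send_sig(SIGKILL, current, 0);
400              return error;
401          }
402      }

Among a.out format executable files, the three types other than OMAGIC are all pure code, that is, so-called "reentrant" code. In code of this kind, not only does the executable code of the text segment not change at run time, the contents of the data segment do not change at run time either. Anything whose contents must change during execution is either on the stack (local variables) or in dynamically allocated buffers. The kernel therefore simply maps the executable file into the process's user space, which also saves the disk space that swapping would normally require.

Of these three types of executable file, all except NMAGIC require the lengths of the text and data segments to be aligned to the page size. If they are found not to be aligned, a warning is issued through printk(). Issuing warnings too often is not good either, so a static variable error_time2 is used to make sure that warnings are at least 5 seconds apart. What happens next depends on whether the particular file system provides mmap, the operation that maps an open file into virtual memory space, and on whether the lengths of the text and data segments are aligned to the page size. If the conditions for mapping are not met, space is allocated and the text and data segments are read together into the process's user space; this time reading starts at offset fd_offset in the file, that is, N_TXTOFF(ex), the data is placed at the address N_TXTADDR(ex) specified by the file header, and the length is the sum of the two segments. If the conditions for mapping are met, so much the better: do_mmap() is used to map the text segment and the data segment of the file separately into the process's user space, at addresses identical to the load addresses. No space needs to be allocated before mmap() is called; that is already included in mmap().

At this point the text and data segments have both been loaded and are ready. Next come the bss segment and the stack segment (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 403 416 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
403  beyond_if:
404      set_binfmt(&aout_format);
405
406      set_brk(current->mm->start_brk, current->mm->brk);
407
408      retval = setup_arg_pages(bprm);
409      if (retval < 0) {
410          /* Someone check-me: is this error path enough? */
411          send_sig(SIGKILL, current, 0);
412          return retval;
413      }
414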
415      current->mm->start_stack =
416          (unsigned long) create_aout_tables((char *) bprm->p, bprm);

The operation of the function set_binfmt() is very simple (fs/exec.c):

==================== fs/exec.c 908 916 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>set_binfmt()]
908  void set_binfmt(struct linux_binfmt *new)
909  {
910      struct linux_binfmt *old = current->binfmt;
911      if (new && new->module)
912          __MOD_INC_USE_COUNT(new->module);
913      current->binfmt = new;
914      if (old && old->module)
915          __MOD_DEC_USE_COUNT(old->module);
916  }

If neither the code format the current process was executing before nor the new code format is supported by an installable module, only one statement is actually left: the one that sets current->binfmt.

The function set_brk() allocates space for the bss segment of the executable code and sets up the page mapping. Its code is in the same file (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 42 49 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>set_brk()]
42  static void set_brk(unsigned long start, unsigned long end)
43  {
44      start = PAGE_ALIGN(start);
45      end = PAGE_ALIGN(end);
46      if (end <= start)
47          return;
48      do_brk(start, end - start);
49  }

Readers who have read the code of do_brk() in Chapter 2 should understand why the initial contents of the bss segment are all 0.

Next, a virtual memory area must be set up for the process at the top of the stack region in user space, and the physical pages occupied by the execution arguments and environment variables must be mapped into this area. This is done by setup_arg_pages(), whose code is in fs/exec.c:

==================== fs/exec.c 288 332 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>setup_arg_pages()]
288  int setup_arg_pages(struct linux_binprm *bprm)
289  {
290      unsigned long stack_base;
291      struct vm_area_struct *mpnt;
292      int i;
293
294      stack_base = STACK_TOP - MAX_ARG_PAGES*PAGE_SIZE;
295
296      bprm->p += stack_base;
297      if (bprm->loader)
298          bprm->loader += stack_base;
299      bprm->exec += stack_base;
300
301      mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
302      if (!mpnt)
303          return -ENOMEM;
304
305      down(&current->mm->mmap_sem);
306      {
307          mpnt->vm_mm = current->mm;
308          mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
309          mpnt->vm_end = STACK_TOP;
310          mpnt->vm_page_prot = PAGE_COPY;
311          mpnt->vm_flags = VM_STACK_FLAGS;
312          mpnt->vm_ops = NULL;
313          mpnt->vm_pgoff = 0;
314          mpnt->vm_file = NULL;
315          mpnt->vm_private_data = (void *) 0;
316          insert_vm_struct(current->mm, mpnt);
317          current->mm->total_vm = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
318      }
319
320      for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
321          struct page *page = bprm->page[i];
322          if (page) {
323              bprm->page[i] = NULL;
324              current->mm->rss++;
325              put_dirty_page(current,page,stack_base);
326          }
327          stack_base += PAGE_SIZE;
328      }
329      up(&current->mm->mmap_sem);
330
331      return 0;
332  }

The highest addresses in a process's user space form the stack region; the constant STACK_TOP here is simply TASK_SIZE, that is, 3GB (0xC0000000). The top of the stack region is an array in which every element is one page. The size of the array is MAX_ARG_PAGES, while the number of pages actually mapped depends on the number of execution arguments and environment variables.

Below these pages lies the user-space stack of the process. On the other hand, as everyone knows, the entry point of any user program is main(), and main has two parameters, argc and argv[]. The parameter argv[] is an array of character pointers, and argc is the size of the array. In fact there is also a hidden array of character pointers, envp[], used to pass the environment variables; it is merely outside the user program's "field of view". So three items must be set up in the user-space stack from the very beginning: envp[], argv[] and argc. In addition, the saved arguments and environment variables (in string form) have to be copied to the top of user space. All of this is done by create_aout_tables(), whose code is also in the same file (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 187 200 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()>create_aout_tables()]
187  /*
188   * create_aout_tables() parses the env- and arg-strings in new user
189   * memory and creates the pointer tables from them, and puts their
190   * addresses on the "stack", returning the new stack pointer value.
191   */
192  static unsigned long * create_aout_tables(char * p, struct linux_binprm * bprm)
193  {
194      char **argv, **envp;
195      unsigned long * sp;
196      int argc = bprm->argc;
197      int envc = bprm->envc;
198
199      sp = (unsigned long *) ((-(unsigned long)sizeof(char *)) & (unsigned long) p);
200  #ifdef __sparc__

==================== fs/binfmt_aout.c 204 205 ====================
204  #endif
205  #ifdef __alpha__

==================== fs/binfmt_aout.c 217 247 ====================
217  #endif
218      sp -= envc+1;
219      envp = (char **) sp;
220      sp -= argc+1;
221      argv = (char **) sp;
222  #if defined(__i386__) || defined(__mc68000__) || defined(__arm__)
223      put_user((unsigned long) envp,--sp);
224      put_user((unsigned long) argv,--sp);
225  #endif
226      put_user(argc,--sp);
227      current->mm->arg_start = (unsigned long) p;
228      while (argc-->0) {
229          char c;
230          put_user(p,argv++);
231          do {
232              get_user(c,p++);
233          } while (c);
234      }
235      put_user(NULL,argv);
236      current->mm->arg_end = current->mm->env_start = (unsigned long) p;
237      while (envc-->0) {
238          char c;
239          put_user(p,envp++);
240          do {
241              get_user(c,p++);
242          } while (c);
243      }
244      put_user(NULL,envp);
245      current->mm->env_end = (unsigned long) p;
246      return sp;
247  }

Readers should be able to see that this builds envp[], argv[] and argc at the top of the stack. Please look at lines 228 to 234 of this code (and lines 237 to 243) and then answer one question: why is it get_user(c, p++) and not get_user(&c, p++)? We have said before that get_user() is a rather challenging piece of code and suggested that readers study it on their own. We now describe it briefly, so that you can check whether you understood it. It is a macro defined in include/asm-i386/uaccess.h:

==================== include/asm-i386/uaccess.h 89 124 ====================
89  /*
90   * These are the main single-value transfer routines.  They automatically
91   * use the right size if we just have the right pointer type.
92   *
93   * This gets kind of ugly. We want to return _two_ values in "get_user()"
94   * and yet we don't want to do any pointers, because that is too much
95   * of a performance impact. Thus we have a few rather ugly macros here,
96   * and hide all the uglyness from the user.
97   *
98   * The "__xxx" versions of the user access functions are versions that
99   * do not verify the address space, that must have been done previously
100  * with a separate "access_ok()" call (this is used when we do multiple
101  * accesses to the same area of user memory).
102  */
103
104  extern void __get_user_1(void);
105  extern void __get_user_2(void);
106  extern void __get_user_4(void);
107
108  #define __get_user_x(size,ret,x,ptr) \
109      __asm__ __volatile__("call __get_user_" #size \
110          :"=a" (ret),"=d" (x) \
111          :"0" (ptr))
112
113  /* Careful: we have to cast the result to the type of the pointer for sign reasons */
114  #define get_user(x,ptr)                                             \
115  ({  int __ret_gu,__val_gu;                                          \
116      switch(sizeof (*(ptr))) {                                       \
117      case 1:  __get_user_x(1,__ret_gu,__val_gu,ptr); break;          \
118      case 2:  __get_user_x(2,__ret_gu,__val_gu,ptr); break;          \
119      case 4:  __get_user_x(4,__ret_gu,__val_gu,ptr); break;          \
120      default: __get_user_x(X,__ret_gu,__val_gu,ptr); break;          \
121      }                                                               \
122      (x) = (__typeof__(*(ptr)))__val_gu;                             \
123      __ret_gu;                                                       \
124  })

Look first at line 122: it answers the question of why the first argument in the call is c and not &c. Next, after preprocessing by gcc, __get_user_x() becomes __get_user_1(), __get_user_2() or __get_user_4(), which read one byte, one short integer or one long integer from user space respectively. The macro get_user determines the size of the target from the type of its second argument.
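The two tricks in this macro, dispatching on sizeof(*(ptr)) and assigning to the first argument by name, can be tried out in user space. The following is a sketch only: toy_get_user() and toy_fetch() are hypothetical names, an ordinary C function stands in for the assembly routines and their address-limit check, and the GNU C extensions used by the kernel macro (statement expressions and __typeof__) are assumed, so it needs gcc or a compatible compiler.

```c
#include <string.h>

/* Hypothetical stand-in for __get_user_1/2/4: fetch `size` bytes from
 * `src` into *val. Returns 0 on success and -14 (the value of -EFAULT)
 * for a NULL source or an unsupported size. */
static int toy_fetch(const void *src, unsigned long *val, unsigned long size)
{
    unsigned char b;
    unsigned short s;
    unsigned int w;

    *val = 0;
    if (!src)
        return -14;
    switch (size) {
    case 1: memcpy(&b, src, 1); *val = b; return 0;
    case 2: memcpy(&s, src, 2); *val = s; return 0;
    case 4: memcpy(&w, src, 4); *val = w; return 0;
    default: return -14;
    }
}

/* A macro can assign to its argument x directly, so callers write
 * toy_get_user(c, p) and not toy_get_user(&c, p). sizeof and __typeof__
 * do not evaluate ptr, so a side effect such as p++ happens only once.
 * The cast restores the sign of narrow signed types. */
#define toy_get_user(x, ptr)                                        \
({  int ret_;                                                       \
    unsigned long val_;                                             \
    ret_ = toy_fetch((ptr), &val_, sizeof(*(ptr)));                 \
    (x) = (__typeof__(*(ptr)))val_;                                 \
    ret_;                                                           \
})
```

The value of the whole expression is the error code, while the fetched value arrives in x, which mirrors how the kernel macro returns two results without passing a pointer.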
Depending on the size of the target, get_user() calls __get_user_1(), __get_user_2() or __get_user_4(). On entry the target address (ptr) is in register EAX; on return EAX holds the function's return value (the error code) and EDX holds the value read from user space. The code of these functions is in arch/i386/lib/getuser.S; take __get_user_1() as an example:

==================== arch/i386/lib/getuser.S 24 36 ====================
24  addr_limit = 12
25
26  .text
27  .align 4
28  .globl __get_user_1
29  __get_user_1:
30      movl %esp,%edx
31      andl $0xffffe000,%edx
32      cmpl addr_limit(%edx),%eax
33      jae bad_get_user
34  1:  movzbl (%eax),%edx
35      xorl %eax,%eax
36      ret

==================== arch/i386/lib/getuser.S 64 68 ====================
64  bad_get_user:
65      xorl %edx,%edx
66      movl $-14,%eax
67      ret
68

Lines 30 and 31 here align the current process's system-space stack pointer to an 8K boundary, that is, a boundary of two pages, and thereby obtain the pointer to the current process's task_struct. At offset 12 in the task_struct is the upper limit of the current process's user-space addresses, so the address passed in as the argument must not be higher than this limit. This also shows that the definition of the task_struct structure (its first few members) cannot be changed at will. If the address is within range, the content is read from user space into register EDX, and EAX is cleared to 0 as the function's return value.

The other macro, put_user(), is similar, only in the opposite direction.

When the CPU returns from create_aout_tables() to load_aout_binary(), argv[] and argc at the top of the stack are ready. We continue reading (fs/binfmt_aout.c):

==================== fs/binfmt_aout.c 417 424 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_aout_binary()]
417  #ifdef __alpha__
418      regs->gp = ex.a_gpvalue;
419  #endif
420      start_thread(regs, ex.a_entry, current->mm->start_stack);
421      if (current->ptrace & PT_PTRACED)
422          send_sig(SIGTRAP, current, 0);
423      return 0;
424  }

Only one last crucial operation remains here: start_thread().
==================== include/asm-i386/processor.h 408 417 ====================
408 #define start_thread(regs, new_eip, new_esp) do { \
409     __asm__("movl %0,%%fs ; movl %0,%%gs": :"r" (0)); \
410     set_fs(USER_DS); \
411     regs->xds = __USER_DS; \
412     regs->xes = __USER_DS; \
413     regs->xss = __USER_DS; \
414     regs->xcs = __USER_CS; \
415     regs->eip = new_eip; \
416     regs->esp = new_esp; \
417 } while (0)

The reader is already familiar with the regs pointer: it points to the copies of the registers saved on the current process's kernel-space stack. When the process returns from the system call, these values are "restored" into the CPU registers. At that point the stack pointer will therefore be current->mm->start_stack, and the return address, that is, the content of EIP, will be ex.a_entry. This is exactly what we need.

At this point the loading and launching of the executable is complete, and do_execve() ends once search_binary_handler() has returned. When the CPU returns from the system call to user space, execution starts at the address given by ex.a_entry.

4.4.2 Executing executable files in text form

The preceding pages described how an a.out executable is loaded and started, which we took as representative of binary executables. Now let us look briefly at the execution of executables in text form (shell scripts or perl files). The relevant code is all in binfmt_script.c. Having read the handling of binary executables in some detail, the reader should find the code below fairly easy, so we give only brief hints (fs/binfmt_script.c):

==================== fs/binfmt_script.c 95 97 ====================
95 struct linux_binfmt script_format = {
96     NULL, THIS_MODULE, load_script, NULL, NULL, 0
97 };

As mentioned earlier, the first two characters of a script file must be "#!", followed by the path name of the interpreter, such as /bin/sh or /usr/bin/perl, optionally followed by an argument. The first line, however, may not be longer than 127 characters. Loading a script file is done by load_script() (fs/binfmt_script.c):

==================== fs/binfmt_script.c 17 58 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_script()]
17 static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
18 {
19     char *cp, *i_name, *i_arg;
20     struct file *file;
21     char interp[BINPRM_BUF_SIZE];
22     int retval;
23
24     if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
25         return -ENOEXEC;
26     /*
27      * This section does the #! interpretation.
28      * Sorta complicated, but hopefully it will work. -TYT
29      */
30
31     bprm->sh_bang++;
32     allow_write_access(bprm->file);
33     fput(bprm->file);
34     bprm->file = NULL;
35
36     bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
37     if ((cp = strchr(bprm->buf, '\n')) == NULL)
38         cp = bprm->buf+BINPRM_BUF_SIZE-1;
39     *cp = '\0';
40     while (cp > bprm->buf) {
41         cp--;
42         if ((*cp == ' ') || (*cp == '\t'))
43             *cp = '\0';
44         else
45             break;
46     }
47     for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
48     if (*cp == '\0')
49         return -ENOEXEC; /* No interpreter name found */
50     i_name = cp;
51     i_arg = 0;
52     for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
53         /* nothing */ ;
54     while ((*cp == ' ') || (*cp == '\t'))
55         *cp++ = '\0';
56     if (*cp)
57         i_arg = cp;
58     strcpy (interp, i_name);

Once the path name of the interpreter has been obtained, the problem turns into loading the interpreter, and the script file itself becomes a run-time argument of the interpreter. Although the script file is not a binary executable, the image of the interpreter is. We continue in fs/binfmt_script.c:

==================== fs/binfmt_script.c 59 93 ====================
[sys_execve()>do_execve()>search_binary_handler()>load_script()]
59     /*
60      * OK, we've parsed out the interpreter name and
61      * (optional) argument.
62      * Splice in (1) the interpreter's name for argv[0]
63      *           (2) (optional) argument to interpreter
64      *           (3) filename of shell script (replace argv[0])
65      *
66      * This is done in reverse order, because of how the
67      * user environment and arguments are stored.
68      */
69     remove_arg_zero(bprm);
70     retval = copy_strings_kernel(1, &bprm->filename, bprm);
71     if (retval < 0) return retval;
72     bprm->argc++;
73     if (i_arg) {
74         retval = copy_strings_kernel(1, &i_arg, bprm);
75         if (retval < 0) return retval;
76         bprm->argc++;
77     }
78     retval = copy_strings_kernel(1, &i_name, bprm);
79     if (retval) return retval;
80     bprm->argc++;
81     /*
82      * OK, now restart the process with the interpreter's dentry.
83      */
84     file = open_exec(interp);
85     if (IS_ERR(file))
86         return PTR_ERR(file);
87
88     bprm->file = file;
89     retval = prepare_binprm(bprm);
90     if (retval < 0)
91         return retval;
92     return search_binary_handler(bprm,regs);
93 }

So the use of script files introduces recursion into the loading process: load_script() ends by calling search_binary_handler() again. However deep the recursion goes, what is finally executed must be a binary executable, for example an interpreter such as /bin/sh or /usr/bin/perl. During the recursion, the path names of the executables at each level build up an argument stack that is passed to the final interpreter.

4.5 The system calls exit() and wait4()

The code for the system calls exit() and wait4() is essentially all in kernel/exit.c. In what follows, any code quoted without an explicit source comes from this file.

First the implementation of exit() (kernel/exit.c):

==================== kernel/exit.c 482 485 ====================
482 asmlinkage long sys_exit(int error_code)
483 {
484     do_exit((error_code&0xff)<<8);
485 }

Clearly its body is do_exit(). We look at the first half:

==================== kernel/exit.c 421 433 ====================
[sys_exit()>do_exit()]
421 NORET_TYPE void do_exit(long code)
422 {
423     struct task_struct *tsk = current;
424
425     if (in_interrupt())
426         panic("Aiee, killing interrupt handler!");
427     if (!tsk->pid)
428         panic("Attempted to kill the idle task!");
429     if (tsk->pid == 1)
430         panic("Attempted to kill init!");
431     tsk->flags |= PF_EXITING;
432     del_timer_sync(&tsk->real_timer);
433

First, the function's type void is preceded by the annotation NORET_TYPE. In include/linux/kernel.h NORET_TYPE is defined as "/* */", so it has no effect on compilation, but it serves as a reminder to the reader: once the CPU enters do_exit(), the current process meets its end partway through and never returns from this function. What "never returns" really means, and why, will become clear after reading the code below. Here we only point out that since the CPU does not return from do_exit(), it does not return from sys_exit() either, and hence not from the system call exit(). Only in this way is the purpose of "exit", leaving the system, achieved. On the other hand, exiting only makes sense for a process (or thread). An interrupt service routine should never call do_exit(), directly or indirectly. So the first step is a check with in_interrupt(); if the call turns out to come from an interrupt service routine, something has definitely gone wrong.

How can we tell whether we are in an interrupt service routine? Let us look at in_interrupt(), defined in include/asm-i386/hardirq.h:

==================== include/asm-i386/hardirq.h 20 25 ====================
20 /*
21  * Are we in an interrupt context? Either doing bottom half
22  * or hardware interrupt processing?
23  */
24 #define in_interrupt() ({ int __cpu = smp_processor_id(); \
25     (local_irq_count(__cpu) + local_bh_count(__cpu) != 0); })

On a single-CPU system __cpu is always 0. Chapter 3 discussed the function handle_IRQ_event(): at its entry and exit there are calls to irq_enter() and irq_exit(), which increment and decrement the counter local_irq_count[__cpu]. So whenever this counter is nonzero, the CPU is inside handle_IRQ_event(). Similarly, whenever the counter local_bh_count[__cpu] is nonzero, the CPU is executing some bh function, which is treated the same as an interrupt service routine. Conversely, if we are not in an interrupt context, we must be in the context of some process (or thread). However, process 0 and process 1, that is, the idle process and the init process, are not allowed to exit, so the pid of the current process is checked next.

Before deciding to exit, the process may have set a real-time timer, that is, linked the real_timer member of its task_struct into the kernel's timer queue. Now that the process is about to leave the system, the timer no longer serves any purpose; moreover, the task_struct is about to be torn down, and real_timer, being a member of it, cannot outlive the structure that contains it. It must therefore be taken off the queue first, which is what del_timer_sync() does. We continue (kernel/exit.c):

==================== kernel/exit.c 434 472 ====================
[sys_exit()>do_exit()]
434 fake_volatile:
435 #ifdef CONFIG_BSD_PROCESS_ACCT
436     acct_process(code);
437 #endif
438     __exit_mm(tsk);
439
440     lock_kernel();
441     sem_exit();
442     __exit_files(tsk);
443     __exit_fs(tsk);
444     exit_sighand(tsk);
445     exit_thread();
446
447     if (current->leader)
448         disassociate_ctty(1);
449
450     put_exec_domain(tsk->exec_domain);
451     if (tsk->binfmt && tsk->binfmt->module)
452         __MOD_DEC_USE_COUNT(tsk->binfmt->module);
453
454     tsk->exit_code = code;
455     exit_notify();
456     schedule();
457     BUG();
458 /*
459  * In order to get rid of the "volatile function does return" message
460  * I did this little loop that confuses gcc to think do_exit really
461  * is volatile. In fact it's schedule() that is volatile in some
462  * circumstances: when current->state = ZOMBIE, schedule() never
463  * returns.
464  *
465  * In fact the natural way to do all this is to have the label and the
466  * goto right after each other, but I put the fake_volatile label at
467  * the start of the function just in case something /really/ bad
468  * happens, and the schedule returns. This way we can try again. I'm
469  * not paranoid: it's just that everybody is out to get me.
470  */
471     goto fake_volatile;
472 }

As one would expect, a process must release all of its resources before it ends its life and leaves the system. In do_fork() in the previous section we saw that the resources "inherited" from the parent include the memory space, open files, working directory, signal handler table and so on. Correspondingly, here we find __exit_mm(), __exit_files(), __exit_fs() and exit_sighand(). There is, however, one kind of resource that is not "inherited" and therefore does not show up in do_fork(): the semaphores that a process creates and uses from user space. These are a resource for inter-process communication; if any semaphore has not been undone by the time exit() is called, it has to be undone now.

There is a simple rule here. Look at each member of the task_struct. If a member is a pointer for which a data structure or buffer is allocated in the kernel when the process is created or while it runs, and if this pointer is the only path to that data structure or buffer, then it must be released; otherwise kernel memory "leaks". For example, the pointer sig points to the process's signal handler table. The space for this table is allocated specifically for sig, and sig is the only way into the table, so it must be released. The pointer p_pptr, by contrast, points to the parent's task_struct, but the parent's task_struct was not allocated for the child's p_pptr, and p_pptr is not the only path to it. That structure therefore must not be released, or every other pointer to it would be left dangling. Coming to user-space semaphores: when a process creates and uses semaphores from user space, the kernel allocates buffers for the two task_struct pointers semundo and semsleeping (a sem_undo structure and a sem_queue structure; see "Inter-process communication" for details). These two pointers are the only paths to those structures, so they must be released. The code of sem_exit() is in ipc/sem.c:

==================== ipc/sem.c 966 1041 ====================
[sys_exit()>do_exit()>sem_exit()]
966 /*
967  * add semadj values to semaphores, free undo structures.
968  * undo structures are not freed when semaphore arrays are destroyed
969  * so some of them may be out of date.
970  * IMPLEMENTATION NOTE: There is some confusion over whether the
971  * set of adjustments that needs to be done should be done in an atomic
972  * manner or not. That is, if we are attempting to decrement the semval
973  * should we queue up and wait until we can do so legally?
974  * The original implementation attempted to do this (queue and wait).
975  * The current implementation does not do so. The POSIX standard
976  * and SVID should be consulted to determine what behavior is mandated.
977  */
978 void sem_exit (void)
979 {
980     struct sem_queue *q;
981     struct sem_undo *u, *un = NULL, **up, **unp;
982     struct sem_array *sma;
983     int nsems, i;
984
985     /* If the current process was sleeping for a semaphore,
986      * remove it from the queue.
987      */
988     if ((q = current->semsleeping)) {
989         int semid = q->id;
990         sma = sem_lock(semid);
991         current->semsleeping = NULL;
992
993         if (q->prev) {
994             if(sma==NULL)
995                 BUG();
996             remove_from_queue(q->sma,q);
997         }
998         if(sma!=NULL)
999             sem_unlock(semid);
1000     }
1001
1002     for (up = &current->semundo; (u = *up); *up = u->proc_next, kfree(u)) {
1003         int semid = u->semid;
1004         if(semid == -1)
1005             continue;
1006         sma = sem_lock(semid);
1007         if (sma == NULL)
1008             continue;
1009
1010         if (u->semid == -1)
1011             goto next_entry;
1012
1013         if (sem_checkid(sma,u->semid))
1014             goto next_entry;
1015
1016         /* remove u from the sma->undo list */
1017         for (unp = &sma->undo; (un = *unp); unp = &un->id_next) {
1018             if (u == un)
1019                 goto found;
1020         }
1021         printk ("sem_exit undo list error id=%d\n", u->semid);
1022         goto next_entry;
1023 found:
1024         *unp = un->id_next;
1025         /* perform adjustments registered in u */
1026         nsems = sma->sem_nsems;
1027         for (i = 0; i < nsems; i++) {
1028             struct sem * sem = &sma->sem_base[i];
1029             sem->semval += u->semadj[i];
1030             if (sem->semval < 0)
1031                 sem->semval = 0; /* shouldn't happen */
1032             sem->sempid = current->pid;
1033         }
1034         sma->sem_otime = CURRENT_TIME;
1035         /* maybe some queued-up processes were waiting for this */
1036         update_queue(sma);
1037 next_entry:
1038         sem_unlock(semid);
1039     }
1040     current->semundo = NULL;
1041 }

If the current process is sleeping while waiting to enter some critical section, the semsleeping pointer in its task_struct points to the queue entry it is waiting on. There is obviously no need to wait any longer, so the process is unlinked from that queue. Next comes a for loop that walks the process's list of sem_undo structures: for each semaphore set it applies the adjustments recorded in semadj[], calls update_queue() so that processes queued on that set get a chance to proceed, and frees the undo structure. The reader is advised to come back to this code after studying the relevant parts of "Inter-process communication".

Now the code of __exit_mm() (kernel/exit.c):

==================== kernel/exit.c 297 316 ====================
[sys_exit()>do_exit()>__exit_mm()]
297 /*
298  * Turn us into a lazy TLB process if we
299  * aren't already..
300  */
301 static inline void __exit_mm(struct task_struct * tsk)
302 {
303     struct mm_struct * mm = tsk->mm;
304
305     mm_release();
306     if (mm) {
307         atomic_inc(&mm->mm_count);
308         if (mm != tsk->active_mm) BUG();
309         /* more a memory barrier than a real lock */
310         task_lock(tsk);
311         tsk->mm = NULL;
312         task_unlock(tsk);
313         enter_lazy_tlb(mm, current, smp_processor_id());
314         mmput(mm);
315     }
316 }

The actual release of the memory space is done by mmput() (in fork.c), whose code we read in the previous section. What deserves attention here is the call to mm_release(). As seen in the sections on fork() and execve(), when do_fork() is called with the CLONE_VFORK flag set, the parent sleeps and can return to user space only after the child has performed an up() operation on a semaphore. The child must perform this operation when it releases its user memory space, so mm_release() is called here to do the up() and wake the sleeping parent. Its code was listed in the execve() section and is not repeated here.

Once the mm pointer in a process's task_struct has been cleared to 0, the process no longer has a user space.

Back in do_exit(), the reader can go through the other resource-releasing functions unaided. On the i386 processor exit_thread() is an empty function.

Next, the state of the current process is changed to TASK_ZOMBIE, which means that the life of the process has ended and it will no longer be scheduled. The remains of the process, however, still occupy a minimum of resources, including its task_struct and the two pages that hold it together with the kernel-space stack. When are these two pages released? The current process does not release them itself, just as people do not cancel their own household registration before they die. Instead it calls exit_notify() to notify its parent and lets the parent take care of the aftermath.
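Looking back at the "#!" parsing in load_script() (lines 36 to 57 of fs/binfmt_script.c), the pointer manipulation can be tried out in user space. The following is a minimal sketch, not kernel code: parse_shebang() is a hypothetical helper name, and the caller-supplied buffer stands in for bprm->buf. It follows the same steps as the listing: terminate the first line, trim trailing blanks, skip blanks after "#!", then split the interpreter name from the optional argument in place.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper modelled on load_script().
 * Splits the first line of buf in place. On success returns 0 and sets
 * *i_name to the interpreter path and *i_arg to the rest of the line
 * (or NULL if there is none). Returns -1 if buf is not a usable "#!" line.
 */
static int parse_shebang(char *buf, size_t size, char **i_name, char **i_arg)
{
    char *cp;

    if (size < 3 || buf[0] != '#' || buf[1] != '!')
        return -1;

    /* Force termination, then cut the buffer at the first newline. */
    buf[size - 1] = '\0';
    cp = strchr(buf, '\n');
    if (cp == NULL)
        cp = buf + size - 1;
    *cp = '\0';

    /* Trim trailing spaces and tabs. */
    while (cp > buf) {
        cp--;
        if (*cp == ' ' || *cp == '\t')
            *cp = '\0';
        else
            break;
    }

    /* Skip blanks between "#!" and the interpreter name. */
    for (cp = buf + 2; *cp == ' ' || *cp == '\t'; cp++)
        ;
    if (*cp == '\0')
        return -1;              /* no interpreter name found */
    *i_name = cp;
    *i_arg = NULL;

    /* Find the end of the name, then terminate it and skip blanks. */
    for (; *cp && *cp != ' ' && *cp != '\t'; cp++)
        ;
    while (*cp == ' ' || *cp == '\t')
        *cp++ = '\0';
    if (*cp)
        *i_arg = cp;            /* everything left is one single argument */
    return 0;
}
```

Note that, as in the kernel listing, whatever follows the interpreter name is kept as one argument string; it is not split further on blanks.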
Why is it arranged this way, instead of letting the current process, that is, the child, take care of everything itself? There are two reasons. First, the child's task_struct still contains a good deal of useful accounting information; letting the parent handle the aftermath allows this information to be merged into the parent's own accounting rather than lost. Second, and perhaps more important, once the system has entered multi-process operation there must be a "current process" at every moment. As the reader saw in Chapter 3, interrupt service routines and exception handlers use the kernel-space stack of the current process. If the child released its task_struct and kernel-space stack before the system had scheduled another process to run, there would be a gap, and an interrupt or exception occurring within that gap would cause trouble. Interrupts can of course be disabled, but exceptions cannot be prevented by disabling interrupts, and there are non-maskable interrupts as well. The child's task_struct and kernel-space stack must therefore be kept until another process has started running. Letting the parent handle the aftermath is thus a reasonable arrangement. It also helps to keep the code simple; otherwise the scheduler schedule() would have to consider additional special cases. Let us look at the source of exit_notify() in exit.c:

==================== kernel/exit.c 323 331 ====================
[sys_exit()>do_exit()>exit_notify()]
323 /*
324  * Send signals to all our closest relatives so that they know
325  * to properly mourn us..
326  */
327 static void exit_notify(void)
328 {
329     struct task_struct * p, *t;
330
331     forget_original_parent(current);

As with people, a process can have a "natural father" and a "foster father". In the task_struct the pointer p_opptr points to the "original parent", the natural father, and the pointer p_pptr points to the foster father. When a process is created the two coincide, so both pointers point to the same parent. While the process runs, however, p_pptr can change temporarily. This happens when one process traces another through the ptrace() system call: the traced process's p_pptr is set to point to the tracing process, which temporarily becomes its "foster father", while its p_opptr is unchanged and still points to the natural father. If a process "passes away" before its children, the children have to be entrusted to some other process. To whom? If the current process is a thread, they are entrusted to the next thread in the same thread group, and the children's p_opptr is made to point to that thread. Otherwise they have to be entrusted to the init process, which thus acts as an orphanage. So the "original parent" does not stay the same forever either. The reason is that pid values and the pages used for task_struct structures are recycled, so keeping such a record would be both pointless and technically difficult.

Now the current process is about to exit(), so all of its children must be sent to the "orphanage"; otherwise there would be no parent to take care of them when they in turn exit(). This is the purpose of the call to forget_original_parent() at line 331 (kernel/exit.c).

==================== kernel/exit.c 147 174 ====================
[sys_exit()>do_exit()>exit_notify()>forget_original_parent()]
147 /*
148  * When we die, we re-parent all our children.
149  * Try to give them to another thread in our process
150  * group, and if no such member exists, give it to
151  * the global child reaper process (ie "init")
152  */
153 static inline void forget_original_parent(struct task_struct * father)
154
155 {
156     struct task_struct * p, *reaper;
157     read_lock(&tasklist_lock);
158
159     /* Next in our thread group */
160     reaper = next_thread(father);
161     if (reaper == father)
162         reaper = child_reaper;
163
164     for_each_task(p) {
165         if (p->p_opptr == father) {
166             /* We dont want people slaying init */
167             p->exit_signal = SIGCHLD;
168             p->self_exec_id++;
169             p->p_opptr = reaper;
170             if (p->pdeath_signal) send_sig(p->pdeath_signal, p, 0);
171         }
172     }
173     read_unlock(&tasklist_lock);
174 }

The for_each_task used here is defined in include/linux/sched.h as:

==================== include/linux/sched.h 824 825 ====================
824 #define for_each_task(p) \
825     for (p = &init_task ; (p = p->next_task) != &init_task ; )

In other words, all task_struct structures are searched. For every process whose natural father is the current process, p_opptr is changed to point to the reaper (the next thread in the group or, failing that, child_reaper, that is, init); the process is told to send SIGCHLD when it exits in the future; and, if the child's own pdeath_signal is set, that signal is sent to the child to inform it of the natural father's death.

Back in exit_notify(), the "foster father" pointed to by p_pptr is dealt with next. This parent is something like the legal guardian of the current process and plays the more important role (kernel/exit.c):

==================== kernel/exit.c 332 419 ====================
[sys_exit()>do_exit()>exit_notify()]
332     /*
333      * Check to see if any process groups have become orphaned
334      * as a result of our exiting, and if they have any stopped
335      * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2)
336      *
337      * Case i: Our father is in a different pgrp than we are
338      * and we were the only connection outside, so our pgrp
339      * is about to become orphaned.
340      */
341
342     t = current->p_pptr;
343
344     if ((t->pgrp != current->pgrp) &&
345         (t->session == current->session) &&
346         will_become_orphaned_pgrp(current->pgrp, current) &&
347         has_stopped_jobs(current->pgrp)) {
348         kill_pg(current->pgrp,SIGHUP,1);
349         kill_pg(current->pgrp,SIGCONT,1);
350     }
351
352     /* Let father know we died
353      *
354      * Thread signals are configurable, but you aren't going to use
355      * that to send signals to arbitary processes.
356      * That stops right now.
357      *
358      * If the parent exec id doesn't match the exec id we saved
359      * when we started then we know the parent has changed security
360      * domain.
361      *
362      * If our self_exec id doesn't match our parent_exec_id then
363      * we have changed execution domain as these two values started
364      * the same after a fork.
365      *
366      */
367
368     if(current->exit_signal != SIGCHLD &&
369         ( current->parent_exec_id != t->self_exec_id ||
370           current->self_exec_id != current->parent_exec_id)
371         && !capable(CAP_KILL))
372         current->exit_signal = SIGCHLD;
373
374
375     /*
376      * This loop does two things:
377      *
378      * A. Make init inherit all the child processes
379      * B. Check to see if any process groups have become orphaned
380      *    as a result of our exiting, and if they have any stopped
381      *    jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2)
382      */
383
384     write_lock_irq(&tasklist_lock);
385     current->state = TASK_ZOMBIE;
386     do_notify_parent(current, current->exit_signal);
387     while (current->p_cptr != NULL) {
388         p = current->p_cptr;
389         current->p_cptr = p->p_osptr;
390         p->p_ysptr = NULL;
391         p->ptrace = 0;
392
393         p->p_pptr = p->p_opptr;
394         p->p_osptr = p->p_pptr->p_cptr;
395         if (p->p_osptr)
396             p->p_osptr->p_ysptr = p;
397         p->p_pptr->p_cptr = p;
398         if (p->state == TASK_ZOMBIE)
399             do_notify_parent(p, p->exit_signal);
400         /*
401          * process group orphan check
402          * Case ii: Our child is in a different pgrp
403          * than we are, and it was the only connection
404          * outside, so the child pgrp is now orphaned.
405          */
406         if ((p->pgrp != current->pgrp) &&
407             (p->session == current->session)) {
408             int pgrp = p->pgrp;
409
410             write_unlock_irq(&tasklist_lock);
411             if (is_orphaned_pgrp(pgrp) && has_stopped_jobs(pgrp)) {
412                 kill_pg(pgrp,SIGHUP,1);
413                 kill_pg(pgrp,SIGCONT,1);
414             }
415             write_lock_irq(&tasklist_lock);
416         }
417     }
418     write_unlock_irq(&tasklist_lock);
419 }

The authors of the code have added plenty of comments and the code itself is not complicated, so we largely leave it to the reader, with a few hints.

After a user logs in to the system, many different processes may be started, all using the same controlling terminal (or a window emulating a terminal). The processes that share a controlling terminal belong to one session. In addition, a user can start several processes with a single shell command or program; the command "ls | wc -l", for example, starts two processes at once, and these form a process group (sessions and groups are two different concepts). Every session and every process group has a leading, earliest-created process whose pid serves as the identifier of the session or group. If the current process and its parent are in the same session but in different process groups, and the current process is the only link between its group and the outside, then once the current process is gone the whole group becomes "orphaned". In that case, if the group contains stopped jobs, POSIX 3.2.2.2 requires that every process in the group be sent a SIGHUP followed by a SIGCONT; this is done by kill_pg().

As we said, the main purpose of exit_notify() is to send the parent a signal so that it learns that the child's life has ended and comes to take care of the aftermath. This is done by do_notify_parent(). Its code is in kernel/signal.c and is simple enough for the reader to go through unaided:

==================== kernel/signal.c 732 777 ====================
[sys_exit()>do_exit()>exit_notify()>do_notify_parent()]
732 /*
733  * Let a parent know about a status change of a child.
734  */
735
736 void do_notify_parent(struct task_struct *tsk, int sig)
737 {
738     struct siginfo info;
739     int why, status;
740
741     info.si_signo = sig;
742     info.si_errno = 0;
743     info.si_pid = tsk->pid;
744     info.si_uid = tsk->uid;
745
746     /* FIXME: find out whether or not this is supposed to be c*time. */
747     info.si_utime = tsk->times.tms_utime;
748     info.si_stime = tsk->times.tms_stime;
749
750     status = tsk->exit_code & 0x7f;
751     why = SI_KERNEL; /* shouldn't happen */
752     switch (tsk->state) {
753     case TASK_STOPPED:
754         /* FIXME -- can we deduce CLD_TRAPPED or CLD_CONTINUED? */
755         if (tsk->ptrace & PT_PTRACED)
756             why = CLD_TRAPPED;
757         else
758             why = CLD_STOPPED;
759         break;
760
761     default:
762         if (tsk->exit_code & 0x80)
763             why = CLD_DUMPED;
764         else if (tsk->exit_code & 0x7f)
765             why = CLD_KILLED;
766         else {
767             why = CLD_EXITED;
768             status = tsk->exit_code >> 8;
769         }
770         break;
771     }
772     info.si_code = why;
773     info.si_status = status;
774
775     send_sig_info(sig, &info, tsk->p_pptr);
776     wake_up_parent(tsk->p_pptr);
777 }

The argument tsk points to the task_struct of the current process. do_notify_parent() may only be called when the process is in state TASK_ZOMBIE (exiting) or TASK_STOPPED (for example, being traced). As the code shows, the "parent" here is the "foster father" of the current process, not the "natural father": the process pointed to by p_pptr, not by p_opptr. forget_original_parent() has already redirected each child's p_opptr to the reaper, whereas do_notify_parent() sends the signal to the process pointed to by p_pptr. Would those children then, when they exit() later, send a signal to a parent that no longer exists? No: a little further down in exit_notify() (line 393) each child's p_pptr is set to the same value as its p_opptr.

Processes are linked together by kinship into a "network of relations". Besides p_opptr and p_pptr, the pointers used are:

p_cptr, which points to a child; the c stands for "child". p_cptr is the counterpart of p_pptr. When a process has several children, p_cptr points to the "youngest", that is, the most recently created one.
p_ysptr, which points to the "younger brother" of the process; the y stands for "younger" and the s for "sibling".
p_osptr, which points to the "older brother" of the process; the o stands for "older".

In this way all children of the current process are linked through p_ysptr and p_osptr into a doubly linked queue. The p_pptr of every process in the queue points to the current process, and the current process's p_cptr points to the most recently created child in the queue. Interestingly, a child acts only with regard to its "foster father"; the "natural father" pointed to by p_opptr seems almost irrelevant. Of course, besides this kinship queue a process is also on other queues, so the task_struct contains further task_struct pointers, forming a network that is far from simple. A process enters this network when it is created, through SET_LINKS in do_fork(). SET_LINKS is defined in include/linux/sched.h:

==================== include/linux/sched.h 813 822 ====================
813 #define SET_LINKS(p) do { \
814     (p)->next_task = &init_task; \
815     (p)->prev_task = init_task.prev_task; \
816     init_task.prev_task->next_task = (p); \
817     init_task.prev_task = (p); \
818     (p)->p_ysptr = NULL; \
819     if (((p)->p_osptr = (p)->p_pptr->p_cptr) != NULL) \
820         (p)->p_osptr->p_ysptr = p; \
821     (p)->p_pptr->p_cptr = p; \
822 } while (0)

Now it is time to leave this network. When the CPU returns from do_notify_parent() to exit_notify(), the p_opptr of every child already points to the reaper, while p_pptr still points to the current process. The while loop that follows moves each process on the child queue over to the child queue of its new parent and makes its p_pptr point there as well. At the same time, each child is checked to see whether its process group has become orphaned.

If the current process is the leader of a session (current->leader is nonzero), the whole session must also be cut off from its controlling terminal and the tty released (note that the task_struct has a pointer tty to the controlling terminal). do_exit() does this at lines 447 and 448. The code of disassociate_ctty() is in drivers/char/tty_io.c:

==================== drivers/char/tty_io.c 560 606 ====================
[sys_exit()>do_exit()>disassociate_ctty()]
560 /*
561  * This function is typically called only by the session leader, when
562  * it wants to disassociate itself from its controlling tty.
563  *
564  * It performs the following functions:
565  *  (1) Sends a SIGHUP and SIGCONT to the foreground process group
566  *  (2) Clears the tty from being controlling the session
567  *  (3) Clears the controlling tty for all processes in the
568  *      session group.
569  *
570  * The argument on_exit is set to 1 if called when a process is
571  * exiting; it is 0 if called by the ioctl TIOCNOTTY.
572  */
573 void disassociate_ctty(int on_exit)
574 {
575     struct tty_struct *tty = current->tty;
576     struct task_struct *p;
577     int tty_pgrp = -1;
578
579     if (tty) {
580         tty_pgrp = tty->pgrp;
581         if (on_exit && tty->driver.type != TTY_DRIVER_TYPE_PTY)
582             tty_vhangup(tty);
583     } else {
584         if (current->tty_old_pgrp) {
585             kill_pg(current->tty_old_pgrp, SIGHUP, on_exit);
586             kill_pg(current->tty_old_pgrp, SIGCONT, on_exit);
587         }
588         return;
589     }
590     if (tty_pgrp > 0) {
591         kill_pg(tty_pgrp, SIGHUP, on_exit);
592         if (!on_exit)
593             kill_pg(tty_pgrp, SIGCONT, on_exit);
594     }
595
596     current->tty_old_pgrp = 0;
597     tty->session = 0;
598     tty->pgrp = -1;
599
600     read_lock(&tasklist_lock);
601     for_each_task(p)
602         if (p->session == current->session)
603             p->tty = NULL;
604     read_unlock(&tasklist_lock);
605 }
606

How, and when, was this link between a process and its controlling terminal first established? When a child is created, the parent's task_struct is copied to the child and the tty pointer is copied along with it, so the child has the same controlling terminal as the parent. A child can change its controlling terminal through the ioctl() system call, or close the current controlling terminal and then open another tty. Before doing so, however, it must call setsid() to create a new interactive session and make itself the leader of that session. When the leader of a session breaks with its controlling terminal, every process in the session breaks with it too, which is why the processes concerned are signalled. From then on these processes have no controlling terminal and become "background processes".
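The exit-code arithmetic seen in sys_exit() and in the default branch of do_notify_parent() earlier in this section can be checked with a few lines of user-space C. This is only an illustrative sketch: the function and enum names below are made up for the example and do not exist in the kernel; the bit operations are the ones shown in the listings.

```c
/* Illustrative names only; these mirror CLD_EXITED, CLD_KILLED, CLD_DUMPED. */
enum exit_reason { REASON_EXITED, REASON_KILLED, REASON_DUMPED };

/* As in sys_exit(): keep the low 8 bits of the code and move them to bits 8..15. */
static long encode_exit_code(int error_code)
{
    return (long)(error_code & 0xff) << 8;
}

/*
 * As in the default branch of do_notify_parent():
 *   bit 7 set         -> terminated by a signal with a core dump
 *   low 7 bits set    -> terminated by a signal (status = signal number)
 *   otherwise         -> normal exit (status = bits 8 and up)
 */
static enum exit_reason classify_exit_code(long exit_code, int *status)
{
    *status = (int)(exit_code & 0x7f);
    if (exit_code & 0x80)
        return REASON_DUMPED;
    if (exit_code & 0x7f)
        return REASON_KILLED;
    *status = (int)(exit_code >> 8);
    return REASON_EXITED;
}
```

For example, exit(3) leads to do_exit(0x300), which classify_exit_code() reports as a normal exit with status 3, while an exit_code of 9 (no shift, low bits only) is reported as termination by signal 9.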
Back to the code of do_exit() once more. When the CPU has finished exit_notify() and is back in do_exit(), only one major thing is left: schedule(), that is, process scheduling. As noted earlier, do_exit() does not return, and what actually keeps it from returning is this very call to schedule(). In other words, this particular call to schedule() does not return. Under normal conditions a call to schedule() does return, only the return is delayed until the process is scheduled to run again. schedule() selects, according to certain criteria, the most suitable process in the system to run. That may be the running process itself or another process; if it is another, a switch takes place. The current process, although temporarily deprived of the CPU, keeps its "running" state, that is, task->state is unchanged, and waits until it is selected again in some later schedule() (triggered by another process, or on the way back from kernel space to user space after an interrupt); it then continues and thereby returns from schedule(). So when a process returns from schedule() depends on when the scheduler selects it again. Here, however, the current process's task->state has become TASK_ZOMBIE, and this condition means it will never be selected in schedule() again, so it is gone for good. From the CPU's point of view this call to schedule() does in fact return, only it returns into another process; it is only from the point of view of the current process that it does not. Up to this point, though, the current process merely cannot return because it will never be selected. In theory it is postponed indefinitely, and its task_struct still exists. Only when the parent receives the child's signal, comes to handle the aftermath and releases the child's task_struct does the child finally disappear from the system. In our scenario the parent is waiting in wait4().

Like other system calls, wait4() has its kernel entry point in sys_wait4(); see the code in kernel/exit.c:

==================== kernel/exit.c 487 496 ====================
487 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru)
488 {
489     int flag, retval;
490     DECLARE_WAITQUEUE(wait, current);
491     struct task_struct *tsk;
492
493     if (options & ~(WNOHANG|WUNTRACED|__WNOTHREAD|__WCLONE|__WALL))
494         return -EINVAL;
495
496     add_wait_queue(&current->wait_chldexit,&wait);

The argument pid is the process number of some child.

First, DECLARE_WAITQUEUE allocates space on the current process's kernel-space stack and sets up a wait_queue_t structure. The relevant macros and data structures are defined in include/linux/wait.h:

==================== include/linux/wait.h 46 56 ====================
46 struct __wait_queue {
47     unsigned int flags;
48 #define WQ_FLAG_EXCLUSIVE 0x01
49     struct task_struct * task;
50     struct list_head task_list;
51 #if WAITQUEUE_DEBUG
52     long __magic;
53     long __waker;
54 #endif
55 };
56 typedef struct __wait_queue wait_queue_t;

==================== include/linux/wait.h 92 115 ====================
92 struct __wait_queue_head {
93     wq_lock_t lock;
94     struct list_head task_list;
95 #if WAITQUEUE_DEBUG
96     long __magic;
97     long __creator;
98 #endif
99 };
100 typedef struct __wait_queue_head wait_queue_head_t;
101
102 #if WAITQUEUE_DEBUG
103 # define __WAITQUEUE_DEBUG_INIT(name) \
104     , (long)&(name).__magic, 0
105 # define __WAITQUEUE_HEAD_DEBUG_INIT(name) \
106     , (long)&(name).__magic, (long)&(name).__magic
107 #else
108 # define __WAITQUEUE_DEBUG_INIT(name)
109 # define __WAITQUEUE_HEAD_DEBUG_INIT(name)
110 #endif
111
112 #define __WAITQUEUE_INITIALIZER(name,task) \
113     { 0x0, task, { NULL, NULL } __WAITQUEUE_DEBUG_INIT(name)}
114 #define DECLARE_WAITQUEUE(name,task) \
115     wait_queue_t name = __WAITQUEUE_INITIALIZER(name,task)

That is, right at the start sys_wait4() allocates a wait_queue_t structure (named wait) on the current process's kernel stack. Its flags field is 0, its task pointer points to the current process's task_struct, and both pointers in the list_head member task_list are NULL. Because the structure lives on the kernel-space stack of the current process, it ceases to exist as soon as sys_wait4() returns. Correspondingly, the task_struct of a process contains a wait_queue_head_t named wait_chldexit for this purpose.

Then add_wait_queue() adds this structure (wait) to the current process's wait_chldexit queue. The effect of this becomes clear once the code of do_notify_parent() has been revisited. Next comes a loop, and not a small one (kernel/exit.c: sys_wait4()):

==================== kernel/exit.c 497 583 ====================
[sys_wait4()]
497 repeat:
498     flag = 0;
499     current->state = TASK_INTERRUPTIBLE;
500     read_lock(&tasklist_lock);
501     tsk = current;
502     do {
503         struct task_struct *p;
504         for (p = tsk->p_cptr ; p ; p = p->p_osptr) {
505             if (pid>0) {
506                 if (p->pid != pid)
507                     continue;
508             } else if (!pid) {
509                 if (p->pgrp != current->pgrp)
510                     continue;
511             } else if (pid != -1) {
512                 if (p->pgrp != -pid)
513                     continue;
514             }
515             /* Wait for all children (clone and not) if __WALL is set;
516              * otherwise, wait for clone children *only* if __WCLONE is
517              * set; otherwise, wait for non-clone children *only*. (Note:
518              * A "clone" child here is one that reports to its parent
519              * using a signal other than SIGCHLD.) */
520             if (((p->exit_signal != SIGCHLD) ^ ((options & __WCLONE) != 0))
521                 && !(options & __WALL))
522                 continue;
523             flag = 1;
524             switch (p->state) {
525             case TASK_STOPPED:
526                 if (!p->exit_code)
527                     continue;
528                 if (!(options & WUNTRACED) && !(p->ptrace & PT_PTRACED))
529                     continue;
530                 read_unlock(&tasklist_lock);
531                 retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
532                 if (!retval && stat_addr)
533                     retval = put_user((p->exit_code << 8) | 0x7f, stat_addr);
534                 if (!retval) {
535                     p->exit_code = 0;
536                     retval = p->pid;
537                 }
538                 goto end_wait4;
539             case TASK_ZOMBIE:
540                 current->times.tms_cutime += p->times.tms_utime + p->times.tms_cutime;
541                 current->times.tms_cstime += p->times.tms_stime + p->times.tms_cstime;
542                 read_unlock(&tasklist_lock);
543                 retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
544                 if (!retval && stat_addr)
545                     retval = put_user(p->exit_code, stat_addr);
546                 if (retval)
547                     goto end_wait4;
548                 retval = p->pid;
549                 if (p->p_opptr != p->p_pptr) {
550                     write_lock_irq(&tasklist_lock);
551                     REMOVE_LINKS(p);
552                     p->p_pptr = p->p_opptr;
553                     SET_LINKS(p);
554                     do_notify_parent(p, SIGCHLD);
555                     write_unlock_irq(&tasklist_lock);
556                 } else
557                     release_task(p);
558                 goto end_wait4;
559             default:
560                 continue;
561             }
562         }
563         if (options & __WNOTHREAD)
564             break;
565         tsk = next_thread(tsk);
566     } while (tsk != current);
567     read_unlock(&tasklist_lock);
568     if (flag) {
569         retval = 0;
570         if (options & WNOHANG)
571             goto end_wait4;
572         retval = -ERESTARTSYS;
573         if (signal_pending(current))
574             goto end_wait4;
575         schedule();
576         goto repeat;
577     }
578     retval = -ECHILD;
579 end_wait4:
580     current->state = TASK_RUNNING;
581     remove_wait_queue(&current->wait_chldexit,&wait);
582     return retval;
583 }

This loop, implemented with goto, ends only when the current process is scheduled to run and one of the following conditions is met (see the "goto end_wait4" statements in the code):

· the state of the awaited child has become TASK_STOPPED or TASK_ZOMBIE;
· the awaited child exists but is in neither of these states, and either the WNOHANG flag is set in the options argument or the current process has received some other signal;
· the process with number pid does not exist at all, or is not a child of the current process.

Otherwise the current process sets its own state to TASK_INTERRUPTIBLE (line 499) and calls schedule() at line 575, going to sleep and letting other processes run. When it is woken by a signal and, once scheduled, returns from schedule(), the goto at line 576 takes it back to repeat, where the for loop scans the child queue again to see whether the state of the awaited child now meets the conditions. The for loop scans all children of a process, starting from the youngest and following the chain formed by the p_osptr pointers in the task_struct structures, looking for a child whose pid matches the one being waited for or which satisfies the other criteria. This for loop is nested inside a do-while loop. Why the outer do-while loop? Because the current process may be a thread, and the process being waited for may actually be a child of another thread cloned from the same process, so the do-while loop checks the children of every thread in the same thread_group. next_thread() finds the task_struct of the next thread on the same thread_group queue, and the local variable tsk is made to point to it.
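The two filters at the top of the for loop in sys_wait4() (lines 505 to 522) are compact enough to be easy to misread, so here is a small user-space restatement. It is a sketch under stated assumptions: struct child_info and both function names are invented for illustration, and the flags are passed as plain booleans rather than as the kernel's option bits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few task_struct fields the filters look at. */
struct child_info {
    int pid;
    int pgrp;
    bool reports_with_sigchld;   /* exit_signal == SIGCHLD */
};

/*
 * Lines 505-514: how the pid argument of wait4() selects children.
 *   pid > 0   : only the child with exactly that pid
 *   pid == 0  : any child in the caller's process group
 *   pid == -1 : any child
 *   pid < -1  : any child in process group -pid
 */
static bool wait4_pid_matches(int pid, const struct child_info *p, int caller_pgrp)
{
    if (pid > 0)
        return p->pid == pid;
    if (pid == 0)
        return p->pgrp == caller_pgrp;
    if (pid != -1)
        return p->pgrp == -pid;
    return true;
}

/*
 * Lines 520-522: the __WCLONE / __WALL test. The kernel skips a child when
 * ((exit_signal != SIGCHLD) ^ wclone) && !wall; this returns the opposite,
 * i.e. true when the child is accepted.
 */
static bool wait4_clone_filter_accepts(const struct child_info *p, bool wclone, bool wall)
{
    bool is_clone_child = !p->reports_with_sigchld;
    return wall || (is_clone_child == wclone);
}
```

Read this way, the XOR simply says that without __WALL the caller sees either only "clone" children (with __WCLONE) or only ordinary children (without it).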
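In the code, next_thread() finds the task_struct of the next thread in the same thread_group queue and makes the local variable tsk point to it. In our scenario, when the parent calls wait4() and scans its queue of children for the first time, the child is still running, so the parent goes to sleep through schedule(). When the child calls exit(), it sends the parent a signal and thereby wakes it up. How does the wake-up happen? As we saw earlier, the child sends the signal to its parent through do_notify_parent(), called from exit_notify(). That function prepares a siginfo structure, calls send_sig_info() to deliver it to the parent, and calls wake_up_process() to wake the parent. The code of send_sig_info() is covered in the section on signals in the chapter on interprocess communication. wake_up_process(), for its part, changes the parent's state from TASK_INTERRUPTIBLE to TASK_RUNNING and moves it onto the run queue, so that schedule() can "see" the parent and schedule it.
When the parent is woken by the signal the child sent in exit(), it goes back to repeat in sys_wait4() and scans its queue of children once more. This time the child's state has already become TASK_ZOMBIE, so the parent merges the child's two statistics, time spent in user space and time spent in system space, into its own. Then, in the typical case, it calls release_task() to free what remains of the child's resources, namely its task_struct structure and system-space stack (kernel/exit.c):

==================== kernel/exit.c 25 68 ====================
[sys_wait4()>release_task()]
25  static void release_task(struct task_struct * p)
26  {
27      if (p != current) {
28  #ifdef CONFIG_SMP
29          /*
30           * Wait to make sure the process isn't on the
31           * runqueue (active on some other CPU still)
32           */
33          for (;;) {
34              task_lock(p);
35              if (!p->has_cpu)
36                  break;
37              task_unlock(p);
38              do {
39                  barrier();
40              } while (p->has_cpu);
41          }
42          task_unlock(p);
43  #endif
44          atomic_dec(&p->user->processes);
45          free_uid(p->user);
46          unhash_process(p);
47
48          release_thread(p);
49          current->cmin_flt += p->min_flt + p->cmin_flt;
50          current->cmaj_flt += p->maj_flt + p->cmaj_flt;
51          current->cnswap += p->nswap + p->cnswap;
52          /*
53           * Potentially available timeslices are retrieved
54           * here - this way the parent does not get penalized
55           * for creating too many processes.
56           *
57           * (this cannot be used to artificially 'generate'
58           * timeslices, because any timeslice recovered here
59           * was given away by the parent in the first place.)
60           */
61          current->counter += p->counter;
62          if (current->counter >= MAX_COUNTER)
63              current->counter = MAX_COUNTER;
64          free_task_struct(p);
65      } else {
66          printk("task releasing itself\n");
67      }
68  }

Here unhash_process() removes the child's task_struct from the hash queue, after which several more of the child's statistics are merged into the parent. release_thread() merely checks that the process's LDT, if it had one, has indeed been freed. Finally, free_task_struct() frees the two physical pages occupied by the task_struct structure and the system-space stack.
sys_wait4() has one more special case to consider: the child's p_opptr and p_pptr may differ, that is, its "foster parent" and its "natural parent" are not the same process. As described earlier, when a process exits, do_notify_parent() is directed at its foster parent; but when the two differ, the natural parent may be waiting as well. The child's p_pptr pointer is therefore set equal to p_opptr, its task_struct is detached from the foster parent's queue with REMOVE_LINKS and handed back to the natural parent with SET_LINKS, which links it into the natural parent's queue again. A signal is then sent to the natural parent so that it can deal with the child itself.
In addition, depending on what the current process asked for when calling wait4(), some status and statistics information may have to be copied to user space with put_user(). If the copy fails, the child's task_struct cannot be freed for the time being (the "goto end_wait4" at that point skips the call to release_task()). In that case the child's "corpse" stays in the system, and a user inspecting process states with the "ps" command will see a process in state "ZOMBIE". The reader saw earlier that in exit_notify(), when a parent about to end its life hands its children over to a new guardian, it also checks whether a child's state is TASK_ZOMBIE and, if so, calls do_notify_parent() on its behalf to notify the new foster parent; this is the reason.
With release_task() executed, the child finally vanishes from the system without a trace.
But what if the parent is not waiting in wait4()? That does no harm. As the reader saw in Chapter 3, whenever a process returns from a system call, an interrupt or an exception, a check is made for pending signals, and if there are any, control goes to signal_return in entry.S, which calls do_signal(). do_signal() contains the following fragment (in arch/i386/kernel/signal.c):

==================== arch/i386/kernel/signal.c 643 651 ====================
643              ka = &current->sig->action[signr-1];
644              if (ka->sa.sa_handler == SIG_IGN) {
645                  if (signr != SIGCHLD)
646                      continue;
647                  /* Check for SIGCHLD: it's special.  */
648                  while (sys_wait4(-1, NULL, WNOHANG, NULL) > 0)
649                      /* nothing */;
650                  continue;
651              }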
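So after receiving SIGCHLD the parent still ends up calling sys_wait4(), passively. The pid argument in this call is -1, meaning that any child of the process qualifies (see the comparisons against pid in the for loop of sys_wait4()). Of course, if the parent has installed some other handler for SIGCHLD, that is a different matter.
The reader may go on to ask what guarantees that a system call, interrupt or exception will ever occur to force the parent through do_signal(). What if the parent, while running, makes no system calls, touches no peripherals, and does nothing that raises an exception? Do not forget that the clock interrupt occurs periodically; without it even scheduling might never take place, which is exactly why the clock interrupt is regarded as the "heartbeat" of the system.

4.6 Process Scheduling and Switching

In a multiprocess operating system, process scheduling is a global and critical issue. It has a decisive influence on the overall design of the system, its implementation, its feature set and every aspect of its performance. The speed of the process switch that carries out a scheduling decision is also an important measure of an operating system's performance. The design of the scheduling mechanism greatly affects system complexity as well, and the cost of implementation often forces trade-offs and concessions between features and performance. A good scheduling mechanism has to serve three different kinds of application:
· Interactive applications. The emphasis here is on response time, so that every user sharing the system (and every application program) feels as if the system were theirs alone. In particular, when a large number of processes coexist, each user must still get acceptable response without noticeable delay. According to measurements, users clearly notice the delay once it exceeds 150 milliseconds.
· Batch applications. These usually run as "background jobs", so there is no requirement on response time, but the time needed to finish a job is still an important factor; what matters is "average speed".
· Real-time applications. These are the most time-critical. Not only the average speed of a process but also its "instantaneous speed" matters; not only the response time (the time from the occurrence of an event until the system reacts and starts executing the relevant program) but also whether the relevant program (often in user space) can finish within a specified time. In real-time applications the emphasis is on the "predictability" of program execution.
The scheduling mechanism must also take "fairness" into account, giving every process in the system a chance to make progress, even though progress differs from process to process and is ultimately limited by CPU speed and load. More important still, it must prevent deadlock and prevent unreasonable use of CPU capacity, that is, situations in which the CPU has capacity to spare and processes are waiting to run, yet for some reason they go unexecuted for a long time. Should such situations arise, the scheduling mechanism ought to be able to recognize and resolve them. One can say that the study of process scheduling lies at the core of operating system theory. The aim of this book, however, is to dissect and explain the Linux kernel rather than to explore theory in depth; interested readers can consult specialized works on operating systems.
To meet these goals, the concrete questions to settle when designing a scheduling mechanism are mainly:
(1) The timing of scheduling: under what circumstances and at what moments scheduling takes place.
(2) The scheduling "policy": the criterion by which the next process to run is chosen.
(3) The mode of scheduling: "preemptive" or "nonpreemptive". When the running process does not voluntarily give up its "right to use" the CPU for the time being, can that right be taken away by force, stopping the process so that others get a chance? If scheduling is preemptive, can preemption happen under any condition, or are there "exceptions"?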
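These three questions, the first and third in particular, are closely tied together. For example, if scheduling is strictly nonpreemptive, that is, entirely voluntary, then its timing is essentially settled too: scheduling can happen only when some process volunteers. Correspondingly, a "primitive", that is, a system call, must be provided through which a process can express this wish. One must also consider what to do if a process falls into an infinite loop and never lets go of the CPU.
A note on terminology: in Chinese it might be more natural to call the question of preemptibility the "policy", but English-language literature already uses "policy" for the scheduling criterion, so we call this the "mode" to avoid unnecessary confusion.
Going a step further, if scheduling is conditionally preemptive, the important question becomes under what conditions preemption is allowed. For example, time can be divided into time slices with a clock interrupt at each slice, and scheduling can be performed at the time-slice interrupt. Processes are scheduled by priority once per time slice, and otherwise only when a process volunteers. With suitably chosen time slices this satisfies interactive applications. Such a system is plainly unsuitable for real-time applications, however, because an urgent need may meet an unhurried response: a high-priority process is anxious to run while the running process, not "enlightened" enough to put others first, keeps the CPU, and the others can only wait for it to use up its time slice, missing their opportunity. From another angle, this also depends on the state of technology, CPU speed in particular. If, in such a system, the time slice could be reduced to 0.5 millisecond and the CPU could still get enough done in so short a time, the requirements of ordinary real-time applications might well be met, although overall the share of CPU time spent on scheduling and switching would rise.
So what does the scheduling mechanism of the Linux kernel actually look like? We again answer in three parts. Before going on, here is a diagram of the state transitions of a process (Figure 4.4).

Figure 4.4 Process state transition diagram

First, the timing of scheduling.
To begin with, voluntary scheduling can take place at any time. Inside the kernel a process can start a round of scheduling by calling schedule(), and it can also set its own state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE before the call, giving up the CPU for a while and going to sleep. In user space the same effect is obtained with the pause() system call. A time limit can also be attached to this voluntary yielding: inside the kernel schedule_timeout() serves this purpose, and in user space the nanosleep() system call does (note that sleep() is a library function, not a system call, though it is ultimately carried out through a system call). It should be pointed out that, from the application's point of view, only yielding in user space is visible; yielding inside the kernel is invisible, hidden within other system calls that may block. Nearly every system call that involves a peripheral, such as open(), read(), write() and select(), may block.
Beyond this, scheduling can also occur involuntarily, that is, by force, just before each return from a system call and just before each return to user space from interrupt or exception handling. Note that the words "return to user space" are crucial here: they mean that only an interrupt or exception that occurs in user space (while the CPU is running in user space) can cause scheduling. We mentioned this in Chapter 3 when describing the return from an interrupt, but it deserves emphasis here, so let us revisit two fragments of arch/i386/kernel/entry.S: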
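==================== arch/i386/kernel/entry.S 260 261 ====================
260  ret_from_exception:
261  #ifdef CONFIG_SMP
==================== arch/i386/kernel/entry.S 267 279 ====================
267  #else
268      movl SYMBOL_NAME(irq_stat),%ecx      # softirq_active
269      testl SYMBOL_NAME(irq_stat)+4,%ecx   # softirq_mask
270  #endif
271      jne   handle_softirq
272
273  ENTRY(ret_from_intr)
274      GET_CURRENT(%ebx)
275      movl EFLAGS(%esp),%eax      # mix EFLAGS and CS
276      movb CS(%esp),%al
277      testl $(VM_MASK | 3),%eax   # return to VM86 mode or non-supervisor?
278      jne ret_with_reschedule
279      jmp restore_all

The contents of register EAX at line 277 come from two sources. Its lowest byte comes from the value of segment register CS that was saved on the stack on entry to the interrupt, and the lowest two bits of that byte give the privilege level at the time. The code shows that control goes to ret_with_reschedule only if the CPU was running at level 3, that is, in user mode, before the interrupt (or exception) occurred. (We are not concerned with VM_MASK here; it exists for VM86 mode.) This matters a great deal for the design and implementation of the system, because it means that while the CPU is running in the kernel the possibility of forced scheduling need not be considered. Interrupts and exceptions can of course occur in system space, but they do not cause scheduling. This simplifies the kernel; early Unix kernels relied on precisely this premise to simplify their design and implementation. Otherwise every variable and data structure in the kernel that might be shared by more than one process would have to be protected by a mutual-exclusion mechanism such as semaphores, in other words placed in a critical section. With the arrival and growing use of multiprocessor SMP architectures, however, this simplification is losing its importance. In an SMP system (see the chapter on multiprocessor SMP system architecture), even though mutual exclusion is not needed on account of scheduling inside the kernel, one cannot ignore the possibility that a process running on another processor accesses the shared resource. So regardless of whether scheduling can occur in the kernel on the same CPU, every variable and data structure that may be shared by several processes (possibly running on different CPUs) must be protected. This is why the reader sees so many semaphore operations such as up() and down(), and so many locking operations, when reading the code. The Linux kernel generally places SMP-specific code inside #ifdef CONFIG_SMP conditional compilation, but it does not do so for these mutual-exclusion operations. One reason may be that there are simply too many of them, and their run-time cost on a uniprocessor is small anyway; another is to lay a foundation for future improvements to the scheduling mechanism.
What, then, are the shortcomings of the present Linux scheduling mechanism, and why might it be improved later? Consider a real-time application in which an interrupt demands not only prompt interrupt service but also prompt scheduling of the relevant process, so that the event can be handled in time at a higher level, that is, in user space. If such an interrupt occurs while the CPU is in the kernel, the return from this interrupt does not cause scheduling; scheduling happens only on return from the system call or interrupt (or exception) that originally took the CPU from user space into the kernel. If the kernel code in question happens to take a long time, or several more interrupts arrive in succession, scheduling may be delayed excessively. Well-written kernel code can lessen the problem but cannot eliminate it at the root, so this is a design issue rather than an implementation issue. As CPUs become ever faster, though, the problem is gradually becoming less important.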
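To make the test at lines 275 to 277 above concrete, the following C sketch reproduces the same bit manipulation on ordinary integers. It is an illustration only, not kernel code: the function name returns_to_user is made up, VM_MASK is taken to be the VM flag of EFLAGS (bit 17), and the selector values 0x10 and 0x23 in the usage note are assumed to be the kernel and user code selectors of an i386 Linux kernel of that period.

```c
#include <assert.h>
#include <stdint.h>

#define VM_MASK 0x00020000u     /* EFLAGS.VM, bit 17 (VM86 mode) */

/* Hypothetical helper mirroring lines 275-277 of entry.S: the low byte of
 * the saved CS replaces the low byte of the saved EFLAGS, and the result is
 * tested against (VM_MASK | 3). A nonzero result means the interrupted code
 * was not running in kernel (supervisor) mode. */
int returns_to_user(uint32_t saved_eflags, uint16_t saved_cs)
{
    uint32_t eax = saved_eflags;                    /* movl EFLAGS(%esp),%eax */
    eax = (eax & ~0xffu) | (saved_cs & 0xffu);      /* movb CS(%esp),%al      */
    return (eax & (VM_MASK | 3u)) != 0;             /* testl $(VM_MASK|3),%eax */
}
```

With these assumptions, returns_to_user(0x202, 0x23) is 1 and returns_to_user(0x202, 0x10) is 0. Overwriting the low byte is what makes a single test instruction sufficient: bit 1 of EFLAGS is always set, so without the movb the low two bits would say nothing about the privilege level.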
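Note that "returning from system space to user space" is only a necessary condition for scheduling, not a sufficient one. Whether scheduling actually takes place depends on whether there is a request for it; look at this piece of arch/i386/kernel/entry.S:

==================== arch/i386/kernel/entry.S 217 223 ====================
217  ret_with_reschedule:
218      cmpl $0,need_resched(%ebx)
219      jne reschedule
220      cmpl $0,sigpending(%ebx)
221      jne signal_return
222  restore_all:
223      RESTORE_ALL
==================== arch/i386/kernel/entry.S 287 289 ====================
287  reschedule:
288      call SYMBOL_NAME(schedule)    # test
289      jmp ret_from_sys_call

Control reaches reschedule and calls schedule() only when the need_resched field in the current process's task_struct is nonzero. Who sets this field? The kernel, of course; a process's task_struct cannot be accessed from user space. But under what circumstances does the kernel set it? Apart from the cases in which the current process voluntarily yields through a system call or blocks for some reason inside a system call, it is set mainly when a process is woken up for some reason, and when the clock interrupt service routine finds that the current process has been running continuously for too long.
Next, the mode of scheduling.
The Linux kernel's mode of scheduling can be described as "conditionally preemptive". While a process runs in user space, then willing or not, whenever necessary (for example, after it has run long enough) the kernel can take the CPU away from it for a while and schedule another process. But once a process enters kernel space, in other words "supervisor" mode, it is as if it had joined the upper ranks and were beyond the reach of punishment. At that point, even though the kernel knows that scheduling is due, it does not actually happen; the process can be preempted only when it is about to "step down", that is, just before it returns to user space. So Linux scheduling is preemptive in principle, but in actual operation the restriction on timing makes it conditional. This is why some books say Linux scheduling is preemptive and others say it is not, and why even a single book may say one thing in one place and the opposite in another.
When, then, does preemptive scheduling occur? Likewise just before a process returns from system space (including entry into the kernel through a system call) to user space.
As for the scheduling policy, it is basically the priority-based scheduling inherited from UNIX. For each process in the system the kernel computes a weight reflecting its "eligibility" to run, and then picks the process with the highest weight. While a process runs, its eligibility decreases with time, so that at the next round of scheduling a process that was less eligible before may become more eligible. When the eligibility of all processes has dropped to 0, the eligibility of every process is computed afresh. Because this computation is based mainly on priority, the scheme is called priority-based scheduling.
To suit the needs of different applications, however, the kernel implements three different policies on this basis: SCHED_FIFO, SCHED_RR and SCHED_OTHER. Every process has its own applicable policy, and a process can set its policy through the sched_setscheduler() system call. SCHED_FIFO suits processes with fairly strict timing requirements whose individual runs are short; most real-time applications have this character.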
The "RR" in SCHED_RR stands for "Round Robin", meaning taking turns; this policy suits larger processes, that is, ones that need a longer time each time they run. SCHED_OTHER, the remaining policy, is the traditional one and is well suited to interactive time-sharing applications.
Since every process has its own scheduling policy, how does the kernel choose among processes with different policies? In the end everything still comes down to each process's weight; the policy is simply taken into account when eligibility is computed, much as candidates who meet certain special conditions receive bonus points in a university entrance examination. In addition, limits are placed on the priority levels of processes under the different policies. We will discuss the differences between these policies and their effects in more depth alongside the code.
Let us now go into the scheduling and switching process together with the code. In this section we first look at active scheduling, the scenario in which the current process voluntarily calls schedule() and gives up the CPU for a while. In the section on exit(), the reader saw that the last thing a process ending its life does in do_exit() is to call schedule(). We pick up from there and go inside schedule(), whose code is in kernel/sched.c:

==================== kernel/sched.c 498 529 ====================
498  /*
499   *  'schedule()' is the scheduler function. It's a very simple and nice
500   * scheduler: it's not perfect, but certainly works for most things.
501   *
502   * The goto is "interesting".
503   *
504   *   NOTE!!  Task 0 is the 'idle' task, which gets called when no other
505   * tasks can run. It can not be killed, and it cannot sleep. The 'state'
506   * information in task[0] is never used.
507   */
508  asmlinkage void schedule(void)
509  {
510      struct schedule_data * sched_data;
511      struct task_struct *prev, *next, *p;
512      struct list_head *tmp;
513      int this_cpu, c;
514
515      if (!current->active_mm) BUG();
516  need_resched_back:
517      prev = current;
518      this_cpu = prev->processor;
519
520      if (in_interrupt())
521          goto scheduling_in_interrupt;
522
523      release_kernel_lock(prev, this_cpu);
524
525      /* Do "administrative" work here while we don't hold any locks */
526      if (softirq_active(this_cpu) & softirq_mask(this_cpu))
527          goto handle_softirq;
528  handle_softirq_back:
529
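Before following schedule() any further, it may help to see the three policies from the user's side. The sketch below assumes a Linux system with the POSIX scheduling interface in <sched.h>; sched_getscheduler(), sched_get_priority_min() and sched_get_priority_max() are the standard calls, while the helper names policy_name, current_policy_name and rt_priority_range_is_sane are made up for illustration. It only queries the policy, because switching a process to SCHED_FIFO or SCHED_RR with sched_setscheduler() normally requires privileges.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: map a policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "unknown";
    }
}

/* Hypothetical helper: name of the calling process's policy (pid 0 = self). */
const char *current_policy_name(void)
{
    int policy = sched_getscheduler(0);
    return policy < 0 ? "error" : policy_name(policy);
}

/* Hypothetical helper: a policy's static priority range should be
 * non-negative and ordered; these are the values stored in rt_priority. */
int rt_priority_range_is_sane(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    return lo >= 0 && hi >= lo;
}
```

An ordinary interactive process would typically report SCHED_OTHER from current_policy_name(); the priority range returned for the two real-time policies is the range accepted for the real-time priority that is set together with the policy.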
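This function uses many goto statements. For a function executed as frequently as this one, putting run-time efficiency first is understandable, although it does make the code somewhat harder to read and follow.
As we said before, the task_struct structure holds two mm_struct pointers. One is mm, which points to the data structure representing the process's virtual (user) address space. If the current process is in fact a kernel thread, it has no user space, so its mm pointer is 0, and while it runs it temporarily borrows the active_mm of the process that ran before it. Hence for the process that is running, the current process, active_mm must never be 0 on entry to schedule() (see line 515). We will return to this topic later.
We also said that schedule() can only be called actively by a process inside the kernel, or occur passively just before the current process returns from system space to user space; it cannot occur inside an interrupt service routine. Even if an interrupt service routine needs scheduling to happen, it can express this only by setting the need_resched field of the current process to 1, not by calling schedule() directly. The reader may ask: Chapter 3 showed that interrupts may be enabled while an interrupt service routine executes; if a nested interrupt occurs in the meantime, will schedule() not be called on return from the nested interrupt, and does that not amount to calling it from inside an interrupt service routine? In fact schedule() is not called on return from a nested interrupt, because that return is not a return to user space. Note also that entering the kernel because of an interrupt is not the same as being inside a particular interrupt service routine, and by the time the CPU is about to return from system space to user space it has already left the service routine; see Chapter 3 for details. So if schedule() is called from inside an interrupt service routine, something is certainly wrong, and control goes to scheduling_in_interrupt. Continuing in kernel/sched.c:

==================== kernel/sched.c 686 689 ====================
[schedule()]
686  scheduling_in_interrupt:
687      printk("Scheduling in interrupt\n");
688      BUG();
689      return;

The kernel's reaction is to display an error message, or append one to the file /var/log/messages, and then to execute the macro BUG, which is defined in include/asm-i386/page.h:

==================== include/asm-i386/page.h 85 92 ====================
85  /*
86   * Tell the user there is some problem. Beep too, so we can
87   * see^H^H^Hhear bugs in early bootup as well!
88   */
89  #define BUG() do { \
90      printk("kernel BUG at %s:%d!\n", __FILE__, __LINE__); \
91      __asm__ __volatile__(".byte 0x0f,0x0b"); \
92  } while (0)

The trick here is that line 91 plants the two bytes 0x0f and 0x0b for the CPU to execute as an instruction. These two bytes form an illegal instruction, so an "invalid opcode" (invalid_op) exception is raised and the CPU executes do_invalid_op(). In actual operation, of course, this error (calling schedule() inside an interrupt service routine or a bh function) does not occur, unless a user-written interrupt service routine is being debugged.
Returning to schedule(): release_kernel_lock() at line 523 is an empty statement on a uniprocessor i386 system, so the next step is to check whether any kernel softirq service requests are pending (see Chapter 3). If there are, control goes to handle_softirq to service them: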
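==================== kernel/sched.c 675 677 ====================
[schedule()]
675  handle_softirq:
676      do_softirq();
677      goto handle_softirq_back;

After the softirq queue has been processed we continue from here:

==================== kernel/sched.c 528 541 ====================
[schedule()]
528  handle_softirq_back:
529
530      /*
531       * 'sched_data' is protected by the fact that we can run
532       * only one process per CPU.
533       */
534      sched_data = & aligned_data[this_cpu].schedule_data;
535
536      spin_lock_irq(&runqueue_lock);
537
538      /* move an exhausted RR process to be last.. */
539      if (prev->policy == SCHED_RR)
540          goto move_rr_last;
541  move_rr_back:

The pointer sched_data points to a schedule_data structure, which holds information for use at the next scheduling (kernel/sched.c):

==================== kernel/sched.c 91 101 ====================
91   /*
92    * We align per-CPU scheduling data on cacheline boundaries,
93    * to prevent cacheline ping-pong.
94    */
95   static union {
96       struct schedule_data {
97           struct task_struct * curr;
98           cycles_t last_schedule;
99       } schedule_data;
100      char __pad [SMP_CACHE_BYTES];
101  } aligned_data [NR_CPUS] __cacheline_aligned = { {{&init_task,0}}};

The type cycles_t is in fact an unsigned integer, used to record the time at which scheduling occurred. This data structure exists for multiprocessor SMP configurations, so it does not concern us here. The first element of the array, the schedule_data structure of CPU 0, is initialized to {&init_task, 0}; all the others are {0, 0}. __cacheline_aligned in the code means that the structure should start on a cache-line boundary.
The run queue now comes into play, so the queue is first locked to guard against interference from other processors. If the scheduling policy of the current process prev is SCHED_RR, that is, round-robin scheduling, a little special handling is needed first. SCHED_RR and SCHED_FIFO are both priority-based policies, but they differ in how processes of equal priority are scheduled. Once a SCHED_FIFO process has been scheduled and starts running, it keeps running until it yields voluntarily or is preempted by a process of higher priority. For processes that need only a short run each time they are scheduled, there is nothing wrong with this. For a process that may run for a long time once scheduled, however, it is unfair. The unfairness concerns processes of equal priority: a process of higher priority can preempt it, and a process of lower priority never had a chance to run in the first place, but other processes of the same priority are treated unfairly. Such processes should therefore use the SCHED_RR policy, which schedules in rotation within the same priority. In other words, a SCHED_RR process has a time quota, and once the quota is used up it must let other ready processes of the same priority run first. The following is this handling for a current process whose policy is SCHED_RR (kernel/sched.c):

==================== kernel/sched.c 679 685 ====================
[schedule()]
679  move_rr_last:
680      if (!prev->counter) {
681          prev->counter = NICE_TO_TICKS(prev->nice);
682          move_last_runqueue(prev);
683      }
684      goto move_rr_back;
685

What does this mean? prev->counter represents the run-time quota of the current process; its value is decremented at every clock interrupt. That is done in the function update_process_times(), described in the next section. However large a process's quota, it eventually decreases to 0 as run time accumulates. For a SCHED_RR process, once the quota has dropped to 0 the process is moved from its current position in the run queue to the end of the queue, and its quota is restored to the initial value. Among processes of equal priority, the one nearer the front of the queue wins at scheduling time, so this gives the other processes of the same priority in the queue the advantage. The macro NICE_TO_TICKS converts a process's priority level into a run-time quota according to the resolution of the system clock; it too is defined in kernel/sched.c:

==================== kernel/sched.c 44 67 ====================
44  /*
45   * Scheduling quanta.
46   *
47   * NOTE! The unix "nice" value influences how long a process
48   * gets. The nice value ranges from -20 to +19, where a -20
49   * is a "high-priority" task, and a "+10" is a low-priority
50   * task.
51   *
52   * We want the time-slice to be around 50ms or so, so this
53   * calculation depends on the value of HZ.
54   */
55  #if HZ < 200
56  #define TICK_SCALE(x)   ((x) >> 2)
57  #elif HZ < 400
58  #define TICK_SCALE(x)   ((x) >> 1)
59  #elif HZ < 800
60  #define TICK_SCALE(x)   (x)
61  #elif HZ < 1600
62  #define TICK_SCALE(x)   ((x) << 1)
63  #else
64  #define TICK_SCALE(x)   ((x) << 2)
65  #endif
66
67  #define NICE_TO_TICKS(nice)     (TICK_SCALE(20-(nice))+1)

Moving a process's task_struct from its current position in the run queue to the end of the queue is done by move_last_runqueue() (kernel/sched.c):

==================== kernel/sched.c 309 313 ====================
[schedule()>move_last_runqueue()]
309  static inline void move_last_runqueue(struct task_struct * p)
310  {
311      list_del(&p->run_list);
312      list_add_tail(&p->run_list, &runqueue_head);
313  }

Moving the process to the end of the run queue means this: if no process in the queue is more eligible, but there is one that is equally eligible, then that process, equal in eligibility but nearer the front, will be chosen. Continuing in schedule() (kernel/sched.c):

==================== kernel/sched.c 541 553 ====================
[schedule()]
541  move_rr_back:
542
543      switch (prev->state) {
544          case TASK_INTERRUPTIBLE:
545              if (signal_pending(prev)) {
546                  prev->state = TASK_RUNNING;
547                  break;
548              }
549          default:
550              del_from_runqueue(prev);
551          case TASK_RUNNING:
552      }
553      prev->need_resched = 0;

The current process is the process that is running, yet on entry to schedule() its state is not necessarily TASK_RUNNING. In our scenario, for example, the current process has already changed its state to TASK_ZOMBIE in do_exit(). As another example, we saw in the previous section that the state of the current process is TASK_INTERRUPTIBLE when it calls schedule() in sys_wait4(). So prev->state is less the state of the current process than its intention. For that reason, when the intention is neither to continue running nor to sleep interruptibly, the process is taken off the run queue with del_from_runqueue(). This also shows the difference between the two sleep states TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE: with the former, if the process has signals pending, the state is changed back to TASK_RUNNING so that it can handle those signals first, whereas the latter is unaffected by signals. Note that there is no break statement between lines 548 and 549, so when no signal is pending execution falls through into the default case and the process is likewise removed from the run queue. Conversely, if the intention of the current process is TASK_RUNNING, that is, to keep running (see line 551), no special handling is needed here.
Next prev->need_resched is reset to 0, because the requested scheduling is now under way. The next step is to pick a process to run (kernel/sched.c):

==================== kernel/sched.c 555 576 ====================
[schedule()]
555      /*
556       * this is the scheduler proper:
557       */
558
559  repeat_schedule:
560      /*
561       * Default process to select..
562       */
563      next = idle_task(this_cpu);
564      c = -1000;
565      if (prev->state == TASK_RUNNING)
566          goto still_running;
567
568  still_running_back:
569      list_for_each(tmp, &runqueue_head) {
570          p = list_entry(tmp, struct task_struct, run_list);
571          if (can_schedule(p, this_cpu)) {
572              int weight = goodness(p, this_cpu, prev->active_mm);
573              if (weight > c)
574                  c = weight, next = p;
575          }
576      }

In this code next always points to the best candidate known so far, and c is that process's overall weight, or eligibility to run. Selection starts from the idle process, process 0, whose weight is -1000, the lowest possible value, meaning that it runs only when no other process can. Then every process in the run queue is visited (on a single-CPU system can_schedule() always returns 1); these are what operating system textbooks call "ready" processes. For each one, the function goodness() computes its current weight, which is then compared with the highest value so far, c. Note the condition "weight > c": it means that the first comer wins, that is, if two processes have the same weight, the one nearer the front of the queue is chosen. list_for_each in the code is a macro, defined in include/linux/list.h:

==================== include/linux/list.h 144 150 ====================
144  /**
145   * list_for_each    -    iterate over a list
146   * @pos:    the &struct list_head to use as a loop counter.
147   * @head:   the head for your list.
148   */
149  #define list_for_each(pos, head) \
150      for (pos = (head)->next; pos != (head); pos = pos->next)

There is a small detour here: if the current process intends to keep running, still_running is executed first (kernel/sched.c):

==================== kernel/sched.c 670 674 ====================
[schedule()]
670  still_running:
671      c = goodness(prev, this_cpu, prev->active_mm);
672      next = prev;
673      goto still_running_back;
674

In other words, if the current process wants to keep running, the search for a candidate starts from the current process's weight at that moment. This means the current process takes precedence over other processes of equal weight.
How, then, is a process's current weight computed? See the code of goodness() (kernel/sched.c):

==================== kernel/sched.c 123 187 ====================
[schedule()>goodness()]
123  /*
124   * This is the function that decides how desirable a process is..
125   * You can weigh different processes against each other depending
126   * on what CPU they've run on lately etc to try to handle cache
127   * and TLB miss penalties.
128   *
129   * Return values:
130   *   -1000: never select this
131   *       0: out of time, recalculate counters (but it might still be
132   *          selected)
133   *     +ve: "goodness" value (the larger, the better)
134   *   +1000: realtime process, select this.
135   */
136
137  static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
138  {
139      int weight;
140
141      /*
142       * select the current process after every other
143       * runnable process, but before the idle thread.
144       * Also, dont trigger a counter recalculation.
145       */
146      weight = -1;
147      if (p->policy & SCHED_YIELD)
148          goto out;
149
150      /*
151       * Non-RT process - normal case first.
152       */
153      if (p->policy == SCHED_OTHER) {
154          /*
155           * Give the process a first-approximation goodness value
156           * according to the number of clock-ticks it has left.
157           *
158           * Don't do any other calculations if the time slice is
159           * over..
160           */
161          weight = p->counter;
162          if (!weight)
163              goto out;
164
165  #ifdef CONFIG_SMP
166          /* Give a largish advantage to the same processor...   */
167          /* (this is equivalent to penalizing other processors) */
168          if (p->processor == this_cpu)
169              weight += PROC_CHANGE_PENALTY;
170  #endif
171
172          /* .. and a slight advantage to the current MM */
173          if (p->mm == this_mm || !p->mm)
174              weight += 1;
175          weight += 20 - p->nice;
176          goto out;
177      }
178
179      /*
180       * Realtime process, select the first one on the
181       * runqueue (taking priorities within processes
182       * into account).
183       */
184      weight = 1000 + p->rt_priority;
185  out:
186      return weight;
187  }

First, if a process has explicitly "given way" through the sched_yield() system call, its weight is set to -1. That is a very low weight; an ordinary ready process has a weight of at least 0.
For a process without real-time requirements, that is, one whose policy is SCHED_OTHER, the weight depends mainly on two factors. One is the remaining time quota; if it is used up, the weight is 0. The other is the process's priority, nice, a negatively oriented priority carried over from early Unix whose value expresses how "yielding" the process is, hence the name. It ranges from 19 down to -20, with -20 the highest priority; only a privileged user can set nice to a value below 0. The expression (20 - p->nice) reverses its direction, giving a range of 1 to 40. So while the quota is not yet exhausted, the overall weight is basically the sum of the two. In addition, a kernel thread, or a process whose user space is the same as that of the current process so that no user-space switch is needed, gets a small "bonus" of 1 added to its weight.
For a real-time process, that is, one whose policy is SCHED_FIFO or SCHED_RR, there is a separate, positively oriented priority, the real-time priority rt_priority ("rt" stands for "real time"), and the weight is (1000 + p->rt_priority). The two time-critical policies SCHED_FIFO and SCHED_RR thus give a process a very high weight relative to SCHED_OTHER: at least 1000. At the same time, the value of rt_priority plays an important part in comparing weights among real-time processes; it is set together with the policy in the sched_setscheduler() system call. It can also be seen here that for these two real-time policies, how long a process has been running, that is, the current value of its quota p->counter, has no effect on the weight. As we have already seen, though, for a SCHED_RR process p->counter reaching 0 causes the process to be moved to the tail of the queue. The nice value of a real-time process has nothing to do with its priority, but it does determine the size of the time quota of a SCHED_RR process. Because the weight of a real-time process has such a large base, non-real-time processes get no chance to run while a real-time process is ready.
The computation of weights in the Linux kernel is thus very simple. In fact, early Unix systems implemented a rather complicated algorithm (there were no real-time processes then); in practice it was found too complicated and was simplified again and again, weighing scheduling efficiency, fairness and other criteria against one another, until it reached its present form. Real-time scheduling, for its part, is a requirement of the POSIX standard. Still, the function goodness() is not the whole of the Linux scheduling algorithm; it has to be analyzed together with the special handling of SCHED_RR described above, the special handling of a current process that wishes to keep running, and the recalculate step described below. For reasons of space this book concentrates on the logic and flow of the code itself and does not analyze the scheduling algorithm quantitatively.
Back to schedule(). When the list_for_each loop ends, the value in variable c falls into one of several cases. One is a positive number greater than 0, in which case next points to the chosen process. Another is that c is 0, which happens when the weights of all processes in the ready queue are 0. Since, apart from the idle process (init_task) and processes that have called the sched_yield() system call, every process has a weight of at least 0, c cannot be negative as long as some other ready process is in the queue. It should be pointed out that if the weights of all processes in the queue have dropped to 0, the policy of all of them must be SCHED_OTHER, because a process with policy SCHED_FIFO or SCHED_RR would have a weight of at least 1000.
Continuing (kernel/sched.c):

==================== kernel/sched.c 578 580 ====================
[schedule()]
578      /* Do we need to re-calculate counters? */
579      if (!c)
580          goto recalculate;

If the weight of the process selected so far (the one with the highest weight) is 0, the time quotas of all processes must be recomputed. As stated above, this means that no real-time process is currently ready. It also means that this has been the case for some time, since otherwise the weights of the SCHED_OTHER processes would not have had the chance to be "used up" down to 0.

==================== kernel/sched.c 658 669 ====================
[schedule()]
658  recalculate:
659      {
660          struct task_struct *p;
661          spin_unlock_irq(&runqueue_lock);
662          read_lock(&tasklist_lock);
663          for_each_task(p)
664              p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
665          read_unlock(&tasklist_lock);
666          spin_lock_irq(&runqueue_lock);
667      }
668      goto repeat_schedule;
669

The reader has already met the macro for_each_task(). The computation performed here divides each process's current quota p->counter by 2 and adds the number of ticks converted from the process's nice value. The definition of the macro NICE_TO_TICKS was also shown above (evidently, for a non-real-time process the nice value both expresses priority and determines the time quota). The computation is clearly very simple. Note that for_each_task() loops over all processes, not just those in the ready queue. Non-real-time processes that are not in the ready queue thus get the chance to raise their quota and with it their overall weight. The raise is quite limited, however: each recomputation halves the existing quota and then adds NICE_TO_TICKS(p->nice), so the recomputed quota can never reach twice NICE_TO_TICKS(p->nice). Even after a long period of lying low, therefore, such a process cannot come anywhere near competing with real-time processes (whose weight is at least 1000); the effect matters only for competition among non-real-time processes. For real-time processes an increase in quota does not raise the weight, and for SCHED_FIFO processes the quota is meaningless altogether. After the computation, control returns to the label repeat_schedule to select again. When the scan of the ready queue completes this time, the value of c should no longer be 0, and next points to the chosen process.
With the process chosen, what remains is the switch itself (kernel/sched.c):

==================== kernel/sched.c 581 596 ====================
[schedule()]
581      /*
582       * from this point on nothing can prevent us from
583       * switching to the next task, save this fact in
584       * sched_data.
585       */
586      sched_data->curr = next;
587  #ifdef CONFIG_SMP
588      next->has_cpu = 1;
589      next->processor = this_cpu;
590  #endif
591      spin_unlock_irq(&runqueue_lock);
592
593      if (prev == next)
594          goto same_process;
595
596  #ifdef CONFIG_SMP
==================== kernel/sched.c 612 657 ====================
612  #endif /* CONFIG_SMP */
613
614      kstat.context_swtch++;
615      /*
616       * there are 3 processes which are affected by a context switch:
617       *
618       * prev == .... ==> (last => next)
619       *
620       * It's the 'much more previous' 'prev' that is on next's stack,
621       * but prev is set to (the just run) 'last' process by switch_to().
622       * This might sound slightly confusing but makes tons of sense.
623       */
624      prepare_to_switch();
625      {
626          struct mm_struct *mm = next->mm;
627          struct mm_struct *oldmm = prev->active_mm;
628          if (!mm) {
629              if (next->active_mm) BUG();
630              next->active_mm = oldmm;
631              atomic_inc(&oldmm->mm_count);
632              enter_lazy_tlb(oldmm, next, this_cpu);
633          } else {
634              if (next->active_mm != mm) BUG();
635              switch_mm(oldmm, mm, next, this_cpu);
636          }
637
638          if (!prev->mm) {
639              prev->active_mm = NULL;
640              mmdrop(oldmm);
641          }
642      }
643
644      /*
645       * This just switches the register state and the
646       * stack.
647       */
648      switch_to(prev, next, prev);
649      __schedule_tail(prev);
650
651  same_process:
652      reacquire_kernel_lock(current);
653      if (current->need_resched)
654          goto need_resched_back;
655
656      return;
657

We skip the conditionally compiled SMP parts. First, if the chosen process next is the current process prev itself, no switch is needed; control goes straight to the label same_process and returns. reacquire_kernel_lock() is an empty statement on a single-CPU i386 system. The need_resched field of the current process was cleared to 0 earlier; if it is now nonzero again, an interrupt must have occurred and the situation has changed, so control goes back to need_resched_back to schedule once more. Otherwise, if the chosen process next differs from the current process prev, a switch is required. On a single-CPU i386 system prepare_to_switch() is also an empty statement, and __schedule_tail() at line 649 merely clears the SCHED_YIELD flag in the policy field of prev's task_struct. So only two things really remain: handling of the user virtual address space, and the process switch proper, switch_to().
Take the handling of user space first. A new scope is opened here because two more variables, mm and oldmm, are to be allocated on the stack; the former points to the mm_struct of the "new process" next, the latter is the active_mm of the "old process" prev. First, if the new process is a kernel thread without a user space, its active_mm pointer must also be 0; otherwise something has gone wrong. But the design and implementation of the kernel do not in fact allow a process, even a kernel thread, to run without an active_mm, because the pointer to the page directory resides in that structure. So if the new process has no mm_struct of its own (and hence is a kernel thread), it borrows an mm_struct from the process being switched out when it starts running (see lines 628 and 630). Can a borrowed mm_struct be used? Yes: since there is no user space, all that is needed is the mapping of system space, and the system-space mappings of all processes are identical. When is the borrowed mm_struct returned? At the next scheduling of another process, that is, when this kernel thread is switched out; that is what lines 638 to 641 do. mmdrop() here only decrements the share count in the borrowed mm_struct; it does not actually free the structure, because the count cannot reach 0 after this decrement. If the new process next has its own mm_struct (and hence is an ordinary process), next->active_mm must be the same as next->mm; otherwise something is wrong. Because the new process has its own user space, the user space must be switched with switch_mm(). This is an inline function whose code is in include/asm-i386/mmu_context.h:

==================== include/asm-i386/mmu_context.h 28 46 ====================
[schedule()>switch_mm()]
28  static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk,
        unsigned cpu)
29  {
30      if (prev != next) {
31          /* stop flush ipis for the previous mm */
32          clear_bit(cpu, &prev->cpu_vm_mask);
33          /*
34           * Re-load LDT if necessary
35           */
36          if (prev->context.segments != next->context.segments)
37              load_LDT(next);
38  #ifdef CONFIG_SMP
39          cpu_tlbstate[cpu].state = TLBSTATE_OK;
40          cpu_tlbstate[cpu].active_mm = next;
41  #endif
42          set_bit(cpu, &next->cpu_vm_mask);
43          /* Re-load page tables */
44          asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
45      }
46  #ifdef CONFIG_SMP
==================== include/asm-i386/mmu_context.h 58 59 ====================
58  #endif
59  }

On a single-CPU system only one statement here is essential: the assembly statement at line 44, which loads the physical start address of the new process's page directory into control register CR3. As explained in Chapter 2, CR3 always points to the page directory of the current process. The LDT is used only in VM86 mode and so does not concern us.
The reader may ask: the process itself has not yet been switched, yet the page directory pointer CR3 of the memory management mechanism has already been switched; will that not cause trouble? It will not, because at this moment the CPU is running in system space, and the directory entries corresponding to system space in the page directories of all processes point to the same page tables. Whichever process's page directory is installed makes no difference; only user space is affected, and the mapping of system space never changes.
Now comes the final moment of the process switch. Switching a process is mainly a matter of switching stacks, and it is done by the macro switch_to(), defined in include/asm-i386/system.h:

==================== include/asm-i386/system.h 15 33 ====================
[schedule()>switch_to()]
15  #define switch_to(prev,next,last) do {                                 \
16      asm volatile("pushl %%esi\n\t"                                     \
17                   "pushl %%edi\n\t"                                     \
18                   "pushl %%ebp\n\t"                                     \
19                   "movl %%esp,%0\n\t"        /* save ESP */             \
20                   "movl %3,%%esp\n\t"        /* restore ESP */          \
21                   "movl $1f,%1\n\t"          /* save EIP */             \
22                   "pushl %4\n\t"             /* restore EIP */          \
23                   "jmp __switch_to\n"                                   \
24                   "1:\t"                                                \
25                   "popl %%ebp\n\t"                                      \
26                   "popl %%edi\n\t"                                      \
27                   "popl %%esi\n\t"                                      \
28                   :"=m" (prev->thread.esp),"=m" (prev->thread.eip),     \
29                    "=b" (last)                                          \
30                   :"m" (next->thread.esp),"m" (next->thread.eip),       \
31                    "a" (prev), "d" (next),                              \
32                    "b" (prev));                                         \
33  } while (0)

Having been through the assembly code of the preceding chapters, the reader should by now be familiar with assembly statements embedded in C. The output section here has three operands, meaning that three items of data are changed by this code. Of these, %0 and %1 are in memory and are prev->thread.esp and prev->thread.eip respectively, while %2 is bound to register EBX and corresponds to the macro argument last. The input section has five operands: %3 and %4 are in memory, being next->thread.esp and next->thread.eip, while %5, %6 and %7 are bound to registers EAX, EDX and EBX and correspond to prev, next and prev respectively.
Short as it is, this code is quite subtle. Look first at the three push instructions at the beginning and the three pop instructions at the end. They look ordinary enough, but there is more to them than meets the eye. Consider lines 19 and 20. Line 19 stores the current ESP, that is, the system-space stack pointer of the current process prev, into prev->thread.esp; line 20 then loads ESP with next->thread.esp, the system-space stack pointer of the process next that has just been scheduled to run. As a result the CPU has already switched stacks between the instructions at lines 20 and 21. Suppose there are two processes A and B, with prev pointing to A and next pointing to B in this switch; that is, A is being "switched out" and B "switched in". Then lines 16 to 20 run on A's stack, while from line 21 onward B's stack is in use. In other words, from line 21 onward the "current process" is already B rather than A. As we explained before, the pointer current that kernel code uses to access the task_struct of the current process is actually a macro that computes the required address from the current stack pointer ESP. If current were referenced at line 21, it would already point to B's task_struct. In this sense the process switch is complete as soon as the instruction at line 20 has executed. But another element that makes up a process is the execution of its program, and in that respect the switch is evidently not yet complete. Why, then, are the pushes at lines 16 to 18 made onto A's stack while the pops at lines 25 to 27 come from B's stack? This is where the subtlety lies: lines 25 to 27 restore what the newly switched-in process pushed onto its stack the last time it was switched out.
How, concretely, is the switch of program execution achieved? Look at lines 21 to 24. Line 21 saves the address of label "1", which is the address of the pop instruction at line 25, into prev->thread.eip, to serve as the "return" address when process A is next scheduled and switched in. Then next->thread.eip is pushed onto the stack. This next->thread.eip is exactly what process B saved at line 21 the last time it was switched out, and it too points to label "1", the pop instruction at line 25. Next, line 23 enters the function __switch_to() with a jmp instruction rather than a call. Leaving aside for the moment what __switch_to() does, when the CPU reaches the ret instruction there, then because the function was entered with jmp, the next->thread.eip pushed last becomes the return address, which is the address of label "1", that is, of the pop instruction at line 25. Since every process executes line 21 when it is switched out, every process resumes at line 25 when it is scheduled to run again. There is one exception: a newly created process. A newly created process never executed lines 16 to 21 at a "previous switch-out", so the thread.eip in its task_struct has to be set up in advance, and the "return address" set there need not be label "1"; that depends on how its system-space stack was set up. In fact, as the reader saw in the section on fork(), copy_thread() (see arch/i386/kernel/process.c) sets this address to ret_from_fork, whose code is in entry.S:

==================== arch/i386/kernel/entry.S 179 186 ====================
179  ENTRY(ret_from_fork)
180      pushl %ebx
181      call SYMBOL_NAME(schedule_tail)
182      addl $4, %esp
183      GET_CURRENT(%ebx)
184      testb $0x02,tsk_ptrace(%ebx)    # PT_TRACESYS
185      jne tracesys_exit
186      jmp ret_from_sys_call

That is, a newly created process, after calling schedule_tail(), goes directly to ret_from_sys_call and "returns" to user space. Things become clearer if one compares the system-space stack of the child in our earlier scenario when it is switched in for the first time after creation with the system-space stack of the parent when, having created the child, it is scheduled and switched in to return from the fork() system call. Figure 4.5 is a schematic.

Figure 4.5 System stacks of parent and child on return from the system call, compared

At the top of the stack area, or the "bottom" of the stack, is the return context saved when the parent entered system space through the fork() system call, comprising what the CPU saved automatically on the system-space stack when passing through the trap gate and the register contents saved by SAVE_ALL in entry.S; together these form a data structure, regs. This part is copied unchanged onto the child's stack, except that EAX, which carries the function's return value, is set to 0, and the pointer ESP to the user-space stack is adjusted accordingly (see copy_thread()).
After forking the child, the parent does not immediately call schedule() itself; it merely sets the need_resched flag in its task_struct to 1 and then returns from do_fork() and sys_fork(). On reaching ret_with_reschedule by way of ret_from_sys_call in entry.S, if need_resched in its task_struct were 0 it would return directly; at that point its stack pointer already points to regs, so RESTORE_ALL takes the process back to user space (see Chapter 3). But need_resched is now 1, so schedule() is called, and the stack pointer turns around and grows downward again. If the outcome of scheduling is that the parent continues to run, it returns from schedule() at once, as if nothing had happened. If another process is scheduled instead, the parent's system-space stack takes the form shown in Figure 4.5. At the "top" of the stack is the process's switch-in point for the next time it is scheduled, set at line 21 of the switch_to() code above. Note that switch_to() is a macro, not a function, so the stack holds no return address from switch_to(). Later, when the parent is scheduled to resume, its stack pointer is restored at line 20 of switch_to(), and the ret instruction in __switch_to() then "returns" to line 25, so this stack entry can also be regarded as "the return address from __switch_to()". The parent finally returns to line 289 of entry.S and immediately jumps to ret_from_sys_call. By comparison, the child's "return address" was set to ret_from_fork, so the ret instruction in __switch_to() takes it straight there and on to ret_from_sys_call, a small shortcut.
Finally, what exactly does __switch_to() do? See the relevant code in arch/i386/kernel/process.c:

==================== arch/i386/kernel/process.c 604 688 ====================
[schedule()>switch_to()>__switch_to()]
604  /*
605   *  switch_to(x,yn) should switch tasks from x to y.
606   *
607   * We fsave/fwait so that an exception goes off at the right time
608   * (as a call from the fsave or fwait in effect) rather than to
609   * the wrong process. Lazy FP saving no longer makes any sense
610   * with modern CPU's, and this simplifies a lot of things (SMP
611   * and UP become the same).
612   *
613   * NOTE! We used to use the x86 hardware context switching. The
614   * reason for not using it any more becomes apparent when you
615   * try to recover gracefully from saved state that is no longer
616   * valid (stale segment register values in particular). With the
617   * hardware task-switch, there is no way to fix up bad state in
618   * a reasonable manner.
619   *
620   * The fact that Intel documents the hardware task-switching to
621   * be slow is a fairly red herring - this code is not noticeably
622   * faster. However, there _is_ some room for improvement here,
623   * so the performance issues may eventually be a valid point.
624   * More important, however, is the fact that this allows us much
625   * more flexibility.
626   */
627  void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
628  {
629      struct thread_struct *prev = &prev_p->thread,
630                   *next = &next_p->thread;
631      struct tss_struct *tss = init_tss + smp_processor_id();
632
633      unlazy_fpu(prev_p);
634
635      /*
636       * Reload esp0, LDT and the page table pointer:
637       */
638      tss->esp0 = next->esp0;
639
640      /*
641       * Save away %fs and %gs. No need to save %es and %ds, as
642       * those are always kernel segments while inside the kernel.
643       */
644      asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
645      asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
646
647      /*
648       * Restore %fs and %gs.
649       */
650      loadsegment(fs, next->fs);
651      loadsegment(gs, next->gs);
652
653      /*
654       * Now maybe reload the debug registers
655       */
656      if (next->debugreg[7]){
657          loaddebug(next, 0);
658          loaddebug(next, 1);
659          loaddebug(next, 2);
660          loaddebug(next, 3);
661          /* no 4 and 5 */
662          loaddebug(next, 6);
663          loaddebug(next, 7);
664      }
665
666      if (prev->ioperm || next->ioperm) {
667          if (next->ioperm) {
668              /*
669               * 4 cachelines copy ... not good, but not that
670               * bad either. Anyone got something better?
671               * This only affects processes which use ioperm().
672               * [Putting the TSSs into 4k-tlb mapped regions
673               * and playing VM tricks to switch the IO bitmap
674               * is not really acceptable.]
675               */
676              memcpy(tss->io_bitmap, next->io_bitmap,
677                   IO_BITMAP_SIZE*sizeof(unsigned long));
678              tss->bitmap = IO_BITMAP_OFFSET;
679          } else
680              /*
681               * a bitmap offset pointing outside of the TSS limit
682               * causes a nicely controllable SIGSEGV if a process
683               * tries to use a port IO instruction. The first
684               * sys_ioperm() call sets up the bitmap properly.
685               */
686              tss->bitmap = INVALID_IO_BITMAP_OFFSET;
687      }
688  }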
Ёᠻ㸠 ret ЎℶDŽ៪㗙гৃҹ䅸Ўˈߛܹ⚍೼ switch_to()Ёⱘ 25 㸠ˈϔⳈ䖤㸠ࠄ೼ϟϔ⃵䖯ܹ switch_to() Ёˈ಴Ў switch_to()ᰃϾᅣ᪡԰˅Ёⱘᷛো“1āˈϔⳈ䖤㸠ࠄϟϔ⃵䖯ܹ switch_to()ҹৢ೼__switch_to() ᘏПˈ䰸߮߯ᓎⱘᮄ䖯⿟໪ˈ᠔᳝䖯⿟೼ফࠄ䇗ᑺᯊⱘߛܹ⚍䛑ᰃ೼ switch_to()˄݊ᅲᰃ೼ schedule() ᰃϡ㿔㗠ஏⱘDŽצህᰃⳂࠡⱘ䖭⾡䕃ӊᅲ⦄⫮㟇↨⹀ӊᅲ⦄ৃҹ᳈ᖿDŽ㟇Ѣ㄀ϝϾॳ಴ˈे♉⌏ᗻˈ䙷 Ԛৢᴹϡ⫼њDŽ⊼䞞Ё䇈њϝϾॳ಴ˈ݊Ё㄀ϔϾॳ಴䇁⛝ϡ䆺DŽԚᰃˈ㄀ѠϾॳ಴ᰃᕜ᳝䍷ⱘˈ䙷 ⱘˈԚᰃᅲ䰙Ϟै᳾ᖙড়䗖DŽ䖭䞠ˈҷⷕⱘ԰㗙ࡴњϔ↉⊼䞞ˈ䇈 Linux ᳒㒣⫼䖛⬅⹀ӊᅲ⦄ⱘߛᤶˈ 䩜ǃгህᰃᆘᄬ఼ TR ⱘݙᆍˈ⬅ CPU ⱘ⹀ӊᴹᅲ⦄䖯⿟˄ӏࡵ˅ⱘߛᤶDŽ㸼䴶Ϟⳟ䖭ᰃᕜ᳝਌ᓩ࡯ ៥Ӏ೼㄀ 3 ゴЁᦤࠄ䖛ˈIntel ⱘॳᛣᰃ䅽᪡԰㋏㒳Ў↣Ͼ䖯⿟䛑䆒㕂ϔϾ TSSˈ䗮䖛ߛᤶ TSS ᣛ ϔѯᆘᄬ఼ˈҹঞ䇈ᯢ䖯⿟ I/O ᪡԰ᴗ䰤ⱘԡ೒˄㾕㄀ 3 ゴ˅ˈ䙷ህϡᰃ៥Ӏ೼䖭䞠᠔݇ᖗⱘњDŽ ˄䆺㾕㄀ 3 ゴ˅DŽ݊⃵ˈ↉ᆘᄬ఼ fs ੠ gs ⱘݙᆍг䱣ৢ԰њⳌᑨⱘߛᤶDŽ㟇Ѣ CPU ЁЎ debug 㗠䆒ⱘ 䖭ᰃ಴Ў CPU ೼こ䍞Ёᮁ䮼៪䱋䰅䮼ᯊ㽕ḍ᥂ᮄⱘ䖤㸠㑻߿Ң TSSЁপᕫ䖯⿟೼㋏㒳ぎ䯈ⱘේᷜᣛ䩜 䖭䞠໘⧚ⱘЏ㽕ᰃTSSˈ݊Ḍᖗህᰃ㄀638㸠 ˈᇚ TSSЁⱘݙḌぎ䯈˄0㑻˅ේᷜᣛ䩜ᤶ៤next•>esp0DŽ 688 } 687 } 686 tss•>bitmap = INVALID_IO_BITMAP_OFFSET; 685 */ 684 * sys_ioperm() call sets up the bitmap properly. 683 * tries to use a port IO instruction. The first 682 * causes a nicely controllable SIGSEGV if a process 681 * a bitmap offset pointing outside of the TSS limit 680 /* 679 } else 678 tss•>bitmap = IO_BITMAP_OFFSET; 677 IO_BITMAP_SIZE*sizeof(unsigned long)); 676 memcpy(tss•>io_bitmap, next•>io_bitmap, 675 */ 674 * is not really acceptable.] 673 * and playing VM tricks to switch the IO bitmap 672 * [Putting the TSSs into 4k•tlb mapped regions 671 * This only affects processes which use ioperm(). 670 * bad either. Anyone got something better? 669 * 4 cachelines copy ... not good, but not that 668 /* 667 if (next•>ioperm) { 666 if (prev•>ioperm || next•>ioperm) { 665 664 } 663 loaddebug(next, 7); 662 loaddebug(next, 6); 661 /* no 4 and 5 */ 660 loaddebug(next, 3); 659 loaddebug(next, 2); 658 loaddebug(next, 1); loaddebug(next, 0); 657 380 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
In short, the handover point between the old and the new current process lies within the switch_to() code.

Since all processes hand over at the same point, and the path from there back to user space is shared as well, how are the different "contexts" of different processes expressed? The difference lies in the contents of the system-space stack. Different processes entered system space with different run-time contexts, different return addresses and different user-space stack pointers, so once back in user space each is on its own route again and goes its own way.

Finally, let us return to the scenario in which a process gives up the CPU voluntarily through schedule() in the exit() system call (Figure 4.6). Since the call to schedule() is made in do_exit(), the system-space stack of this process at handover is as shown in Figure 4.6.

Figure 4.6 System-space stack at a process switch

As the figure shows, if (hypothetically) this process were scheduled to continue running like any other process, it would return to user space along the following route:

(1) Resume at label "1" in switch_to(). Since switch_to() is a macro and not a function, this is really inside schedule().
(2) Return from schedule() into do_exit().
(3) Return from do_exit() into sys_exit().
(4) Return from sys_exit() to system_call in entry.S, line 204 of that code.
(5) Return to user space through the macro RESTORE_ALL.

Comparing this route with that of the parent in the fork() system call discussed earlier shows that the system-space stack and return route of a process that gives up the CPU voluntarily differ somewhat from those of a process that is passively deprived of it. In the former case they depend on where the process called schedule(); in the latter case the call is always at reschedule in entry.S.

In the exit() scenario, however, the state of the process was changed to TASK_ZOMBIE before schedule() was called, so the process will never be scheduled to run again.

4.7 Forced Scheduling

Forced scheduling of processes in the Linux kernel, that is, involuntary, passive, preemptive scheduling, is caused mainly by time. As said before, this kind of scheduling takes place just before a process returns from system space to user space. Of course, it does not happen on every such return. From the ret_with_reschedule fragment of entry.S quoted in Chapter 3 and in the previous section, it can be seen that whether schedule() is really called at that point ultimately depends on whether need_resched in the task_struct of the current process is 1 (non-zero). So the question reduces to this: under what circumstances is the current process's need_resched set to 1? In the present kernel version, on a single CPU, there are mainly the following cases:

· In the clock interrupt service routine, the current process is found to have run (continuously) for too long.
· When a sleeping process is woken up, the woken process is found to be more eligible to run than the current process.
· A process changes its scheduling policy, or yields, through a system call. This case should really be regarded as active, voluntary scheduling, so such a system call causes scheduling at once.

Consider the first case. As the reader saw in the previous section, at scheduling time a weight is computed for every process in the queue of runnable (ready) processes. For ordinary interactive applications the value depends mainly on the time quota the process has left, that is, the current value of the counter field in its task_struct. For processes with real-time requirements, those whose scheduling policy is SCHED_RR or SCHED_FIFO, eligibility is unrelated to this and the weight is always very high. When all the processes in the queue are interactive, that is, SCHED_OTHER, and all of them have used up their quotas, the quota of every process is recomputed and reset; its value depends mainly on the priority assigned to each process. While a process runs, its quota is decremented at every clock interrupt, so that its eligibility gradually declines. When the counter drops to 0, a scheduling is forced and the current process is deprived of the CPU.

In the "Clock Interrupt" section of Chapter 3 the reader saw that the clock interrupt service routine do_timer_interrupt() calls a function do_timer(), and browsed the code of that function. In it, on a single-CPU machine (on SMP each CPU uses its own local timer, called the APIC timer), another function, update_process_times(), is called to adjust some time-related run parameters of the current process. Its code is in kernel/timer.c:

==================== kernel/timer.c 575 597 ====================
[do_timer_interrupt()>do_timer()>update_process_times()]
575  /*
576   * Called from the timer interrupt handler to charge one tick to the current
577   * process. user_tick is 1 if the tick is user time, 0 for system.
578   */
579  void update_process_times(int user_tick)
580  {
581      struct task_struct *p = current;
582      int cpu = smp_processor_id(), system = user_tick ^ 1;
583
584      update_one_process(p, user_tick, system, cpu);
585      if (p->pid) {
586          if (--p->counter <= 0) {
587              p->counter = 0;
588              p->need_resched = 1;
589          }
590          if (p->nice > 0)
591              kstat.per_cpu_nice[cpu] += user_tick;
592          else
593              kstat.per_cpu_user[cpu] += user_tick;
594          kstat.per_cpu_system[cpu] += system;
595      } else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
596          kstat.per_cpu_system[cpu] += system;
597  }

As long as the process is not process 0, its counter is decremented by 1. When the counter reaches 0, need_resched in the task_struct is set to 1. The other operations in the function, including update_one_process(), concern only statistics; we are not interested in them here, and the reader can study them unaided.

Now the second case. Inside the kernel, a sleeping process can be woken by calling the function wake_up_process(). Its code is also in kernel/sched.c:

==================== kernel/sched.c 321 344 ====================
321  /*
322   * Wake up a process. Put it on the run-queue if it's not
323   * already there. The "current" process is always on the
324   * run-queue (except when the actual re-schedule is in
325   * progress), and as such you're allowed to do the simpler
326   * "current->state = TASK_RUNNING" to mark yourself runnable
327   * without the overhead of this.
328   */
329  inline void wake_up_process(struct task_struct * p)
330  {
331      unsigned long flags;
332
333      /*
334       * We want the common case fall through straight, thus the goto.
335       */
336      spin_lock_irqsave(&runqueue_lock, flags);
337      p->state = TASK_RUNNING;
338      if (task_on_runqueue(p))
339          goto out;
340      add_to_runqueue(p);
341      reschedule_idle(p);
342  out:
343      spin_unlock_irqrestore(&runqueue_lock, flags);
344  }

So "waking up" means setting the state of the process to TASK_RUNNING and linking the process into the runqueue (the queue of runnable processes), and then calling the function reschedule_idle(). On a single CPU this function is very simple.

==================== kernel/sched.c 198 207 ====================
198  /*
199   * This is ugly, but reschedule_idle() is very timing-critical.
200   * We are called with the runqueue spinlock held and we must
201   * not claim the tasklist_lock.
202   */
203  static FASTCALL(void reschedule_idle(struct task_struct * p));
204
205  static void reschedule_idle(struct task_struct * p)
206  {
207  #ifdef CONFIG_SMP

==================== kernel/sched.c 286 294 ====================
286  #else /* UP */
287      int this_cpu = smp_processor_id();
288      struct task_struct *tsk;
289
290      tsk = cpu_curr(this_cpu);
291      if (preemption_goodness(tsk, p, this_cpu) > 1)
292          tsk->need_resched = 1;
293  #endif
294  }

Its purpose is to compare the woken process with the current process. If the woken process is more eligible to run, the need_resched flag of the current process is set to 1. The function preemption_goodness() computes the difference between the combined weights of the two processes; its code is also defined in kernel/sched.c:

==================== kernel/sched.c 189 196 ====================
[wake_up_process()>reschedule_idle()>preemption_goodness()]
189  /*
190   * the 'goodness value' of replacing a process on a given CPU.
191   * positive value means 'replace', zero or negative means 'dont'.
192   */
193  static inline int preemption_goodness(struct task_struct * prev, struct task_struct * p, int cpu)
194  {
195      return goodness(p, cpu, prev->active_mm) - goodness(prev, cpu, prev->active_mm);
196  }

The reader may have noticed that in reschedule_idle() the pointer to the current process is obtained not through the macro current but through another macro, cpu_curr. What is the difference between the two? Look first at the definition of cpu_curr, which is also in kernel/sched.c:

==================== kernel/sched.c 103 103 ====================
103  #define cpu_curr(cpu) aligned_data[(cpu)].schedule_data.curr

As the reader may recall, this is set in schedule() after the process to run next has been picked but before the switch is made (see kernel/sched.c, line 586). So most of the time it agrees with current, but during the short interval before the switch is complete, that process is not yet truly the current process. Even so, comparing the newly woken process with that process is clearly the more accurate choice, because by the time the CPU returns from system space to user space, that process will already be "in place".

The third case should really be regarded as a voluntary yield. But judged by the form of the kernel code, it works the same way: the need_resched flag of the current process is set to 1, so that scheduling takes place just before the process returns to user space. That is why it is placed in this section. There are two system calls of this kind, sched_setscheduler() and sched_yield().

The system call sched_setscheduler() changes the scheduling policy of a process. After a user logs in to the system, the first process has the policy SCHED_OTHER, that is, by default it is an interactive application with no real-time requirements. When a new process is created through fork(), the policy of the process is inherited by the child. The user can, however, change the applicable policy through the system call sched_setscheduler(). The kernel implementation of this system call, sys_sched_setscheduler(), is in kernel/sched.c:

==================== kernel/sched.c 957 966 ====================
957  asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
958                        struct sched_param *param)
959  {
960      return setscheduler(pid, policy, param);
961  }
962
963  asmlinkage long sys_sched_setparam(pid_t pid, struct sched_param *param)
964  {
965      return setscheduler(pid, -1, param);
966  }

==================== kernel/sched.c 887 955 ====================
887  static int setscheduler(pid_t pid, int policy,
888              struct sched_param *param)
889  {
890      struct sched_param lp;
891      struct task_struct *p;
892      int retval;
893
894      retval = -EINVAL;
895      if (!param || pid < 0)
896          goto out_nounlock;
897
898      retval = -EFAULT;
899      if (copy_from_user(&lp, param, sizeof(struct sched_param)))
900          goto out_nounlock;
901
902      /*
903       * We play safe to avoid deadlocks.
904       */
905      read_lock_irq(&tasklist_lock);
906      spin_lock(&runqueue_lock);
907
908      p = find_process_by_pid(pid);
909
910      retval = -ESRCH;
911      if (!p)
912          goto out_unlock;
913
914      if (policy < 0)
915          policy = p->policy;
916      else {
917          retval = -EINVAL;
918          if (policy != SCHED_FIFO && policy != SCHED_RR &&
919                  policy != SCHED_OTHER)
920              goto out_unlock;
921      }
922
923      /*
924       * Valid priorities for SCHED_FIFO and SCHED_RR are 1..99, valid
925       * priority for SCHED_OTHER is 0.
926       */
927      retval = -EINVAL;
928      if (lp.sched_priority < 0 || lp.sched_priority > 99)
929          goto out_unlock;
930      if ((policy == SCHED_OTHER) != (lp.sched_priority == 0))
931          goto out_unlock;
932
933      retval = -EPERM;
934      if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
935          !capable(CAP_SYS_NICE))
936          goto out_unlock;
937      if ((current->euid != p->euid) && (current->euid != p->uid) &&
938          !capable(CAP_SYS_NICE))
939          goto out_unlock;
940
941      retval = 0;
942      p->policy = policy;
943      p->rt_priority = lp.sched_priority;
944      if (task_on_runqueue(p))
945          move_first_runqueue(p);
946
947      current->need_resched = 1;
948
949  out_unlock:
950      spin_unlock(&runqueue_lock);
951      read_unlock_irq(&tasklist_lock);
952
953  out_nounlock:
954      return retval;
955  }

The code shows that the Linux kernel has three different scheduling policies: SCHED_FIFO, SCHED_RR and SCHED_OTHER. Every process necessarily uses one of them (see line 918). Besides the policy there are some parameters; the scheduling policy of a process together with its scheduling parameters determines how the kernel schedules it.

capable() here is an inline function that checks current->cap_effective to see whether a given flag bit is 1, that is, whether the process is allowed to perform a particular operation. All the flag bits are defined in the file include/linux/capability.h. The function move_first_runqueue() moves the process from its current position in the runqueue to the front of the queue (if the process is on the runqueue), which gives it an advantage at scheduling time over processes of equal eligibility. Finally, need_resched of the current process is set to 1, forcing a scheduling.

The other system call, sched_yield(), lets a running process "make way" for other processes without going to sleep. Its kernel implementation, sys_sched_yield(), is also in kernel/sched.c:

==================== kernel/sched.c 1019 1052 ====================
1019  asmlinkage long sys_sched_yield(void)
1020  {
1021      /*
1022       * Trick. sched_yield() first counts the number of truly
1023       * 'pending' runnable processes, then returns if it's
1024       * only the current processes. (This test does not have
1025       * to be atomic.) In threaded applications this optimization
1026       * gets triggered quite often.
1027       */
1028
1029      int nr_pending = nr_running;
1030
1031  #if CONFIG_SMP
1032      int i;
1033
1034      // Substract non-idle processes running on other CPUs.
1035      for (i = 0; i < smp_num_cpus; i++)
1036          if (aligned_data[i].schedule_data.curr != idle_task(i))
1037              nr_pending--;
1038  #else
1039      // on UP this process is on the runqueue as well
1040      nr_pending--;
1041  #endif
1042      if (nr_pending) {
1043          /*
1044           * This process can only be rescheduled by us,
1045           * so this is safe without any locking.
1046           */
1047          if (current->policy == SCHED_OTHER)
1048              current->policy |= SCHED_YIELD;
1049          current->need_resched = 1;
1050      }
1051      return 0;
1052  }

Unlike a change of scheduling policy or parameters, this does not change the position of the current process in the runqueue. Obviously, "yielding" makes sense only when there are other ready processes in the system, so nr_pending, the number of processes waiting to run, is checked first. The code sets the SCHED_YIELD flag in current->policy to 1; this flag bit is cleared to 0 in the scheduling that follows immediately. The relevant code is in __schedule_tail(), which is called in schedule() after the switch made through switch_to().

Compared with voluntary scheduling, scheduling that is forced by setting the need_resched flag of the current process to 1 differs in one important respect: there is a delay between finding that scheduling is necessary and scheduling actually taking place, called the "dispatch latency".
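The two involuntary triggers described in this section can be condensed into a few lines. What follows is only a user-space model, not kernel code: struct toy_task and both function names are invented for illustration, and the goodness values are passed in as plain integers instead of being computed by goodness().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the task_struct fields used here. */
struct toy_task {
    int pid;           /* 0 stands for process 0, which is never charged */
    int counter;       /* remaining time quota, in clock ticks */
    int need_resched;  /* 1 requests a scheduling before return to user space */
};

/* Models the counter handling of update_process_times(): charge one tick. */
void toy_charge_tick(struct toy_task *p)
{
    assert(p != NULL);
    if (p->pid) {
        if (--p->counter <= 0) {
            p->counter = 0;
            p->need_resched = 1;
        }
    }
}

/*
 * Models the uniprocessor branch of reschedule_idle(): the current task is
 * marked for rescheduling only if the woken task's weight exceeds its own
 * by more than 1.
 */
void toy_wake_check(struct toy_task *cur, int cur_goodness, int woken_goodness)
{
    assert(cur != NULL);
    if (woken_goodness - cur_goodness > 1)
        cur->need_resched = 1;
}
```

In both functions the only visible effect is the flag: nothing is switched at this point, and the actual call to schedule() is left to the return path to user space.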
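Of the three conditions listed above, the third (changing the scheduling policy, or yielding) is not sensitive to time. The first, although it is caused by time, in fact carries no real-time requirement either. The second, in which a process is woken and found to have a higher weight than the current process, that is, to be more urgent, does carry a timing requirement.

A sleeping process is woken from one of two sources. One is interprocess communication, for example when one process sends a signal to another, as the reader saw in the section on the exit() system call. Interprocess communication is of course not limited to signals; other examples are communication through pipes, message queues and sockets, which the reader will meet later in "Interprocess Communication". A typical scenario is a client/server application: a process sleeps while waiting for service requests from other processes, and must be woken when another process sends it a request by some means. The second source is usually more urgent: the occurrence of some event causes an interrupt, and in the interrupt service routine or in a bh function a process (or several) must be woken because of the event, so that it can process the event further in user space. This case often has stricter timing requirements. Two questions arise here. The first is whether the woken process is certain to be picked when scheduling occurs. This is guaranteed by the provision of the two scheduling policies SCHED_FIFO and SCHED_RR and by the use of priorities. The second is exactly when (within how many microseconds) scheduling will occur. The current Linux kernel gives no guarantee on this point; one can only judge in a statistical, average sense whether the requirement is met.

4.8 The System Calls nanosleep() and pause()

For various reasons a running process often needs to go to sleep on its own initiative and start a scheduling in order to give up the CPU. This can be done only through a system call, or inside one. Note that the system call sched_yield() discussed in the previous section is different: it merely has the kernel perform a scheduling, while the current process remains runnable. What is meant here is that the current process goes to sleep, that is, its state is changed to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE and it is unlinked from the runqueue, so that the outcome of scheduling is certain to be that some other process runs. Moreover, once a process has gone to sleep, it must be woken before its state can be restored to TASK_RUNNING and it can return to the runqueue.

Giving up the CPU voluntarily for a period of time in this way comes in two kinds. One is implicit and indeterminate: the possibility of temporarily giving up the CPU is implied in some other action. Giving up the CPU is then not the goal in itself; the process gives it up for a while, out of public spirit, only when its real goal cannot be reached for the moment and it must wait. Examples are read(), write(), open(), send(), recvfrom() and so on; almost every system call that involves a peripheral device may block during execution and go to sleep, giving up the CPU. The other kind is explicit: the purpose is precisely to go to sleep. There are two main system calls of this kind, nanosleep() and pause().

The system call nanosleep() puts the current process to sleep, but the kernel wakes the process after the specified time, so it is often used to implement periodic applications. The sleep() that programmers commonly use is a library function that is actually implemented through the system call nanosleep().

The system call pause() also puts the current process to sleep, but independently of time: the process is woken only when it receives a signal, so it is often used to coordinate the running of several processes. The system call wait4(), which the reader saw in earlier sections, and the similar wait3() can in fact be regarded as special cases of pause(), since they are woken only on receiving a specific signal, SIGCHLD, with certain special conditions satisfied.

There is one more special case. The current process has received the signal SIGSTOP; then, before it returns from system space to user space (whether from a system call, an interrupt or an exception), schedule() is called in do_signal(), the state of the process becomes TASK_STOPPED, and the process is unlinked from the runqueue. It returns to the runnable state only when it receives a SIGCONT signal. This case is really forced, but since in form the current process calls schedule() "voluntarily" in the course of do_signal(), it was not placed in the section on forced scheduling. We shall come back to this topic when we discuss interprocess communication.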
In this section we concentrate on the two system calls nanosleep() and pause().

The kernel implementation of the system call nanosleep() is sys_nanosleep(); its code is in kernel/timer.c:

==================== kernel/timer.c 797 836 ====================
797  asmlinkage long sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
798  {
799      struct timespec t;
800      unsigned long expire;
801
802      if(copy_from_user(&t, rqtp, sizeof(struct timespec)))
803          return -EFAULT;
804
805      if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
806          return -EINVAL;
807
808
809      if (t.tv_sec == 0 && t.tv_nsec <= 2000000L &&
810          current->policy != SCHED_OTHER)
811      {
812          /*
813           * Short delay requests up to 2 ms will be handled with
814           * high precision by a busy wait for all real-time processes.
815           *
816           * Its important on SMP not to do this holding locks.
817           */
818          udelay((t.tv_nsec + 999) / 1000);
819          return 0;
820      }
821
822      expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
823
824      current->state = TASK_INTERRUPTIBLE;
825      expire = schedule_timeout(expire);
826
827      if (expire) {
828          if (rmtp) {
829              jiffies_to_timespec(expire, &t);
830              if (copy_to_user(rmtp, &t, sizeof(struct timespec)))
831                  return -EFAULT;
832          }
833          return -EINTR;
834      }
835      return 0;
836  }

The argument of the library function sleep() is an integer number of seconds, whereas nanosleep() takes two pointers to timespec structures. The first pointer, rqtp, points to a structure that gives the required sleeping time; the second, rmtp, points to a structure in which the remaining sleeping time is returned. This is because a sleeping process may be woken early by the arrival of a signal. The call then fails with EINTR (the library function returns -1) and the remaining time is returned in the structure that rmtp points to (if rmtp is not NULL), after which the process can decide whether to sleep again and use up the time.

The data structure timespec is defined in include/linux/time.h:

==================== include/linux/time.h 9 12 ====================
9   struct timespec {
10      time_t  tv_sec;     /* seconds */
11      long    tv_nsec;    /* nanoseconds */
12  };

Here tv_sec is in seconds and tv_nsec is in nanoseconds, that is, 10^-9 second. Of course, this does not mean that sleeping times are accurate to the order of nanoseconds. As said before, in a typical kernel configuration the clock interrupt frequency HZ is 100 (see include/asm-i386/param.h), that is, the period of the clock interrupt is 10 milliseconds. This means that if a process goes to sleep and is woken in the normal way by the clock interrupt service routine, only a precision of 10 milliseconds can be achieved. That is the reason for the special handling on lines 809 to 821: if the requested sleeping time is no more than 2 milliseconds and the requesting process has real-time requirements (its scheduling policy is SCHED_FIFO or SCHED_RR), the process cannot really be put to sleep, because it might then not be woken until 10 milliseconds later, which is unacceptable for a real-time application. So in this situation all that can be provided is a delay, not sleep. Here the macro udelay() implements the delay by counting; it is defined in include/asm-i386/delay.h:

==================== include/asm-i386/delay.h 16 18 ====================
16  #define udelay(n) (__builtin_constant_p(n) ? \
17      ((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c6ul)) : \
18      __udelay(n))

Except for certain predetermined constants, the delay is carried out by the function __udelay(), whose code is in arch/i386/lib/delay.c. The functions involved are listed below level by level for the reader to study:

==================== arch/i386/lib/delay.c 76 79 ====================
[sys_nanosleep()>udelay()>__udelay()]
76  void __udelay(unsigned long usecs)
77  {
78      __const_udelay(usecs * 0x000010c6);  /* 2**32 / 1000000 */
79  }

==================== arch/i386/lib/delay.c 67 74 ====================
[sys_nanosleep()>udelay()>__udelay()>__const_udelay()]
67  inline void __const_udelay(unsigned long xloops)
68  {
69      int d0;
70      __asm__("mull %0"
71          :"=d" (xloops), "=&a" (d0)
72          :"1" (xloops),"0" (current_cpu_data.loops_per_jiffy));
73      __delay(xloops * HZ);
74  }

The value of the constant current_cpu_data.loops_per_jiffy depends on the particular CPU; it is determined by the kernel from measurements taken at system initialization and stored in the data structure current_cpu_data:

==================== arch/i386/lib/delay.c 59 65 ====================
[sys_nanosleep()>udelay()>__udelay()>__const_udelay()>__delay()]
59  void __delay(unsigned long loops)
60  {
61      if(x86_udelay_tsc)
62          __rdtsc_delay(loops);
63      else
64          __loop_delay(loops);
65  }

If the CPU supports hardware-based delay, the required delay is performed by __rdtsc_delay(); otherwise it is implemented in software by counting.

==================== arch/i386/lib/delay.c 42 57 ====================
[sys_nanosleep()>udelay()>__udelay()>__const_udelay()>__delay()>__loop_delay()]
42  /*
43   * Non TSC based delay loop for 386, 486, MediaGX
44   */
45
46  static void __loop_delay(unsigned long loops)
47  {
48      int d0;
49      __asm__ __volatile__(
50          "\tjmp 1f\n"
51          ".align 16\n"
52          "1:\tjmp 2f\n"
53          ".align 16\n"
54          "2:\tdecl %0\n\tjns 2b"
55          :"=&a" (d0)
56          :"0" (loops));
57  }

The reader is by now familiar with assembly statements embedded in C code, so no explanation is given here. The code shows that udelay() achieves the delay through a counting loop. In other words, in this case the current process does not really go to sleep and does not give up the CPU; it merely burns some time in a loop. This is certainly not a good method, but for a process with real-time requirements there is no alternative. Besides, even for such a process, this method is not used once the delay exceeds 2 milliseconds. But why should such short delays be needed at all? They are generally connected with operations on peripheral devices: some devices require that the interval between two successive operations be no less than a certain value, hence such short delay requests.
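The scaling done by __udelay() and __const_udelay() is fixed-point arithmetic, and it may help to see it worked through in portable C. The sketch below is an illustration only: TOY_HZ is assumed to be 100 as in the text, the function names are invented, and the loops_per_jiffy value in the usage note is hypothetical rather than measured on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HZ 100u   /* assumed clock interrupt frequency, as in the text */

/* High 32 bits of a 32x32-bit product: what "mull" leaves in EDX. */
uint32_t toy_mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/*
 * Loop count that __udelay(usecs) would pass on to __delay(), given a
 * loops_per_jiffy value. 0x10c6 is 2^32 / 10^6 rounded down, so
 * usecs * 0x10c6 is the delay in seconds as a 32-bit binary fraction.
 */
uint32_t toy_udelay_loops(uint32_t usecs, uint32_t loops_per_jiffy)
{
    uint32_t xloops = usecs * 0x10c6u;
    assert(usecs <= 1000000u);   /* keep the product inside 32 bits */
    return toy_mul_hi32(xloops, loops_per_jiffy) * TOY_HZ;
}
```

With a hypothetical loops_per_jiffy of 1000000 (10^8 loops per second), a request for 1000 microseconds comes out as 99900 loops rather than the exact 100000; the shortfall is the rounding in 0x10c6 and in the truncated high word.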
Back to the code of sys_nanosleep(). For a normal sleep request, timespec_to_jiffies() is called first to convert the values in the data structure t into a number of clock interrupts. The conversion is in time.h; we leave it for the reader to study (time.h):

==================== include/linux/time.h 17 42 ====================
[sys_nanosleep()>timespec_to_jiffies()]
17  /*
18   * Change timeval to jiffies, trying to avoid the
19   * most obvious overflows..
20   *
21   * And some not so obvious.
22   *
23   * Note that we don't want to return MAX_LONG, because
24   * for various timeout reasons we often end up having
25   * to wait "jiffies+1" in order to guarantee that we wait
26   * at _least_ "jiffies" - so "jiffies+1" had better still
27   * be positive.
28   */
29  #define MAX_JIFFY_OFFSET ((~0UL >> 1)-1)
30
31  static __inline__ unsigned long
32  timespec_to_jiffies(struct timespec *value)
33  {
34      unsigned long sec = value->tv_sec;
35      long nsec = value->tv_nsec;
36
37      if (sec >= (MAX_JIFFY_OFFSET / HZ))
38          return MAX_JIFFY_OFFSET;
39      nsec += 1000000000L / HZ - 1;
40      nsec /= 1000000000L / HZ;
41      return HZ * sec + nsec;
42  }

Note that (t.tv_sec || t.tv_nsec) on line 822 of sys_nanosleep() above is a logical expression whose value is 1 or 0.

Then the state of the current process is changed to TASK_INTERRUPTIBLE and schedule_timeout() is called to go to sleep. As said before, the difference between the sleeping states TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE is that in the latter the process is not woken when it receives a signal. The code of the function schedule_timeout() is also in sched.c:

==================== kernel/sched.c 369 419 ====================
[sys_nanosleep()>schedule_timeout()]
369  signed long schedule_timeout(signed long timeout)
370  {
371      struct timer_list timer;
372      unsigned long expire;
373
374      switch (timeout)
375      {
376      case MAX_SCHEDULE_TIMEOUT:
377          /*
378           * These two special cases are useful to be comfortable
379           * in the caller. Nothing more. We could take
380           * MAX_SCHEDULE_TIMEOUT from one of the negative value
381           * but I' d like to return a valid offset (>=0) to allow
382           * the caller to do everything it want with the retval.
383           */
384          schedule();
385          goto out;
386      default:
387          /*
388           * Another bit of PARANOID. Note that the retval will be
389           * 0 since no piece of kernel is supposed to do a check
390           * for a negative retval of schedule_timeout() (since it
391           * should never happens anyway). You just have the printk()
392           * that will tell you if something is gone wrong and where.
393           */
394          if (timeout < 0)
395          {
396              printk(KERN_ERR "schedule_timeout: wrong timeout "
397                     "value %lx from %p\n", timeout,
398                     __builtin_return_address(0));
399              current->state = TASK_RUNNING;
400              goto out;
401          }
402      }
403
404      expire = timeout + jiffies;
405
406      init_timer(&timer);
407      timer.expires = expire;
408      timer.data = (unsigned long) current;
409      timer.function = process_timeout;
410
411      add_timer(&timer);
412      schedule();
413      del_timer_sync(&timer);
414
415      timeout = expire - jiffies;
416
417   out:
418      return timeout < 0 ? 0 : timeout;
419  }

The kernel uses the number of clock interrupts as its uniform measure of time, and has given the interval between clock interrupts a name, the "jiffy" (meaning "an instant"). Correspondingly, the kernel keeps a global counter, jiffies, which counts the clock interrupts since system initialization. So before schedule_timeout() is called, the time to be slept is first converted into a number of clock interrupts; adding this number to the current value of jiffies gives the "expiry" time. But when the requested time is too long, so long that it cannot be expressed as a signed integer, the constant MAX_JIFFY_OFFSET is returned (this is actually the largest positive number minus 1; see the comment on timespec_to_jiffies() above and line 822 in the code of sys_nanosleep()).
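The conversion to jiffies and the extra tick added on line 822 are easy to check by hand. The sketch below restates timespec_to_jiffies() as a self-contained user-space function so that the rounding can be tried out; the toy_ names and struct toy_timespec are invented for the purpose, and HZ is assumed to be 100 as in the text.

```c
#include <assert.h>
#include <limits.h>

#define TOY_HZ 100L                                /* assumed, as in the text */
#define TOY_MAX_JIFFY_OFFSET ((~0UL >> 1) - 1)     /* same form as line 29 */

/* Hypothetical stand-in for struct timespec. */
struct toy_timespec {
    long tv_sec;
    long tv_nsec;
};

/* Same arithmetic as timespec_to_jiffies(): round up to whole ticks. */
unsigned long toy_timespec_to_jiffies(const struct toy_timespec *value)
{
    unsigned long sec = (unsigned long)value->tv_sec;
    long nsec = value->tv_nsec;

    if (sec >= (TOY_MAX_JIFFY_OFFSET / TOY_HZ))
        return TOY_MAX_JIFFY_OFFSET;
    nsec += 1000000000L / TOY_HZ - 1;
    nsec /= 1000000000L / TOY_HZ;
    return TOY_HZ * sec + (unsigned long)nsec;
}

/* The expression on line 822: one more tick for any non-zero request. */
unsigned long toy_nanosleep_ticks(const struct toy_timespec *t)
{
    return toy_timespec_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
}
```

A request of 15 milliseconds is rounded up to 2 ticks and becomes 3 with the extra tick, which guarantees that at least the requested time elapses no matter where in the current tick the call is made. A request too long to represent yields TOY_MAX_JIFFY_OFFSET, and the extra tick then brings it to the largest positive long.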
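In schedule_timeout() this constant is treated as "indefinite", so calling schedule() on line 384 is all that is needed (strictly speaking, it is MAX_JIFFY_OFFSET plus the 1 added on line 822 that reaches the largest positive value, which the case label on line 376 names MAX_SCHEDULE_TIMEOUT). Since the sleep is indefinite, the kernel takes no responsibility for waking the process on time; the process sleeps until another process sends it a signal.

The return value of schedule_timeout() is the time still left unslept when the process is woken. Consider the return value when the argument is MAX_JIFFY_OFFSET. In this case, when the process is woken and returns from schedule(), the goto statement jumps to the label out, and the value of the variable timeout has not changed in the whole course of this; it is still MAX_JIFFY_OFFSET. This illustrates the principle that infinity minus something finite is still infinity.

When the requested sleeping time is within the permitted range, the kernel must take on the responsibility of waking the process on time. For this purpose the kernel sets up a "timer", the data structure timer here, and links it into a timer queue; at every clock interrupt these timers are checked for expiry. The type of the data structure timer is timer_list, defined in include/linux/timer.h; see the "Clock Interrupt" section of Chapter 3. There we mentioned this data structure and its role but did not discuss it in depth, because process scheduling and the related mechanisms had not yet been covered and it was hard to explain properly. Now, together with the code of schedule_timeout(), the whole process and mechanism can be made clear. Here, after init_timer(), the expiry time of the timer is set to the computed value expire. The function to be executed at expiry is process_timeout(); we shall see shortly what it does. The argument to be passed to process_timeout() is current, which, as the reader will recall, is actually a macro that yields the task_struct pointer of the current process. The reader may ask why the field data in the data structure is not simply made a task_struct pointer. The reason is that this way is more flexible and general; besides, the function to be called at expiry is not always directly related to a process.

The function add_timer() links timer into a timer queue; its code is in timer.c:

==================== kernel/timer.c 176 190 ====================
[sys_nanosleep()>schedule_timeout()>add_timer()]
176  void add_timer(struct timer_list *timer)
177  {
178      unsigned long flags;
179
180      spin_lock_irqsave(&timerlist_lock, flags);
181      if (timer_pending(timer))
182          goto bug;
183      internal_add_timer(timer);
184      spin_unlock_irqrestore(&timerlist_lock, flags);
185      return;
186  bug:
187      spin_unlock_irqrestore(&timerlist_lock, flags);
188      printk("bug: kernel timer added twice at %p.\n",
189              __builtin_return_address(0));
190  }

The core operation is done by internal_add_timer(); the extra layer of "wrapping" here serves to protect the core queue operation. spin_lock_irqsave() first disables interrupts, and spin_unlock_irqrestore() restores the previous state after the operation. The code of the function internal_add_timer() is in the same file (timer.c):

==================== kernel/timer.c 122 160 ====================
[sys_nanosleep()>schedule_timeout()>add_timer()>internal_add_timer()]
122  static inline void internal_add_timer(struct timer_list *timer)
123  {
124      /*
125       * must be cli-ed when calling this
126       */
127      unsigned long expires = timer->expires;
128      unsigned long idx = expires - timer_jiffies;
129      struct list_head * vec;
130
131      if (idx < TVR_SIZE) {
132          int i = expires & TVR_MASK;
133          vec = tv1.vec + i;
134      } else if (idx < 1 << (TVR_BITS + TVN_BITS)) {
135          int i = (expires >> TVR_BITS) & TVN_MASK;
136          vec = tv2.vec + i;
137      } else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS)) {
138          int i = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
139          vec = tv3.vec + i;
140      } else if (idx < 1 << (TVR_BITS + 3 * TVN_BITS)) {
141          int i = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
142          vec = tv4.vec + i;
143      } else if ((signed long) idx < 0) {
144          /* can happen if you add a timer with expires == jiffies,
145           * or you set a timer to go off in the past
146           */
147          vec = tv1.vec + tv1.index;
148      } else if (idx <= 0xffffffffUL) {
149          int i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
150          vec = tv5.vec + i;
151      } else {
152          /* Can only get here on architectures with 64-bit jiffies */
153          INIT_LIST_HEAD(&timer->list);
154          return;
155      }
156      /*
157       * Timers are FIFO!
158       */
159      list_add(&timer->list, vec->prev);
160  }

timer_jiffies, referenced on line 128, is also a global variable. It indicates how far in time the processing of the timer queues has advanced, and it is also the reference point for setting timers. Its value may differ from that of jiffies; we shall see its role in a moment.

Before going further into the code of internal_add_timer(), it is necessary to outline how the timer queues are organized.

The simplest approach would be to link all the timer_list structures, that is, the timers, into one queue in order of expiry, and then, every time jiffies changes, to check and process these structures one by one starting from the head of the queue, stopping at the first timer that has not yet expired. The drawback is that whenever a new timer is added to the queue, the queue must be searched linearly for the right place of insertion, and in the worst case every structure in the queue is scanned. When the number of members in the queue may be large, the efficiency of this scheme is unsatisfactory.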
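Readers who have studied data structures and algorithms may think at once of improving efficiency by hashing. That is, the timer structures are organized into an array of queues rather than a single queue, and a hash computed from the expiry time of each timer decides which queue it is linked into. Spreading the timers over different queues in this way reduces the average length of each queue and so improves efficiency. The simplest hash is to extract the lowest few bits of the value, that is, to mask off the high bits with an AND operation, which amounts to taking the remainder after dividing the value by a power of 2. But in such a simple hash table each queue still holds many timers with different expiry times, because timers are linked into the same queue whenever their hash values coincide. For example, jiffies is a 32-bit unsigned integer. If we take the lowest 10 bits as the hash value, so that the array has 2^10 queues, then in theory one queue may in the worst case hold timers belonging to 2^22 different expiry times. Things will not be that bad in practice, of course, but the scheme still leaves something to be desired. The ideal solution is for each queue to hold only timers with the same expiry time. But one can hardly set up 2^32 timer queues. So the design must take into account both the efficiency of checking and processing the timers when a clock interrupt occurs and the efficiency of inserting timers into the queues, which makes the design and implementation of this mechanism a challenge. The Linux kernel solves the problem rather well, with a scheme that is quite ingenious.

The Linux kernel has five such hash tables, that is, arrays of timer queues, rather than one. See the following code (timer.c):

==================== kernel/timer.c 74 102 ====================
74   /*
75    * Event timer code
76    */
77   #define TVN_BITS 6
78   #define TVR_BITS 8
79   #define TVN_SIZE (1 << TVN_BITS)
80   #define TVR_SIZE (1 << TVR_BITS)
81   #define TVN_MASK (TVN_SIZE - 1)
82   #define TVR_MASK (TVR_SIZE - 1)
83
84   struct timer_vec {
85       int index;
86       struct list_head vec[TVN_SIZE];
87   };
88
89   struct timer_vec_root {
90       int index;
91       struct list_head vec[TVR_SIZE];
92   };
93
94   static struct timer_vec tv5;
95   static struct timer_vec tv4;
96   static struct timer_vec tv3;
97   static struct timer_vec tv2;
98   static struct timer_vec_root tv1;
99
100  static struct timer_vec * const tvecs[] = {
101      (struct timer_vec *)&tv1, &tv2, &tv3, &tv4, &tv5
102  };

Each of the data structures tv1, tv2, ..., tv5 contains an array of list heads, the so-called hash table (buckets), and each entry in the table heads one timer queue. tv1 differs from the other structures only in the size of the array: the array in tv1 has 2^8 entries, while those in the others have 2^6 each. The total number of queues is thus 2^8 + 4×2^6 = 512, which is acceptable. Each array is associated with a variable index, which indicates the queue to be processed when the next clock interrupt occurs. Correspondingly, the 32-bit expiry time is also divided into five fields: the lowest is 8 bits wide and corresponds to tv1, and the other four are 6 bits each. When a timer is to be linked into a queue, the number of clock interrupts after which it should expire is first computed from the expiry time and the current time. If this difference is less than 256, the lowest 8 bits of the expiry time are taken as the hash value, which is then used as an index into the array of tv1 to find the corresponding queue, and the timer is linked into that queue. Since the array of tv1 has 256 queues, all the timers in any one queue have the same expiry time. But what if the difference is 256 or more? Then the difference is checked against 2^14. If it is smaller, the second field of the expiry time (6 bits, bits 8 to 13) is taken as the hash value, or index, and the timer is inserted into one of the queues of tv2. Figure 4.7 is a schematic.

Figure 4.7 Rule for determining the index into the timer queue arrays

Clearly the queues in tv2 differ from those in tv1: in tv1 all the timers of one queue belong to the same expiry time, which is not so for the queues in tv2. In theory each queue of tv2 may hold timers belonging to 256 different expiry times. In other words, the "scale" of tv2 differs from that of tv1. When the difference is 2^14 or more, it is further checked against 2^20, and so on.

Now we can return to the code of internal_add_timer(). The reader should be able to follow it unaided. The actual linking of the timer into a queue is done by list_add(). That is, the timer is always inserted at the tail of the queue. For the queues in tv1, since all the timers in a queue expire at the same time, the position of insertion does not matter at all; for the other queues, as will be seen below, it does not matter either. Linking a timer into a queue thus becomes a very simple operation: there is no need to search the queue for a suitable place of insertion, so the cost is a constant, independent of the length of the queue. Meanwhile, when a clock interrupt occurs and jiffies advances by one step, it suffices to process all the timers in one queue of tv1, as indicated by index (executing the function that each timer specifies) and release these timers, and then advance index by one step as well. When tv1.index reaches 256 it is set back to 0, returning to the start of the array to begin another round of 256 clock interrupts. At that point, since one cycle of tv1 has been completed, one queue of tv2, as indicated by tv2.index, is moved into tv1. During the move, internal_add_timer() is called once more for each timer in the queue. By now the difference between the expiry time of every timer in that queue and the current time is less than 256 (because the current time has advanced), so all of them are spread over the queues of tv1, regardless of the position of each timer in the old queue. Thus timers linked into the queues of tv2 reach the queues of tv1 in two steps (first into tv2, then into tv1). And so on: when the difference between the expiry time and the current time is 2^26 or more, the timer goes first into tv5 and takes five steps to reach tv1. Although some timers need several steps to reach tv1, the cost is still independent of the length of the queue and has an upper bound of five steps. So this method is far better than linear search.

After the timer has been linked into a queue, schedule_timeout() calls schedule(), and the current process really goes to sleep to await waking.

So how does the clock interrupt wake this process?

In the "Clock Interrupt" section of Chapter 3 we saw that before the return from a clock interrupt, the clock-related bh function timer_bh() is executed, and timer_bh() calls a function run_timer_list() (kernel/timer.c):

==================== kernel/timer.c 668 672 ====================
668  void timer_bh(void)
669  {
670      update_times();
671      run_timer_list();
672  }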
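Before following run_timer_list(), the queue-selection rule of internal_add_timer() described above (Figure 4.7) can be made concrete with a small user-space sketch. It is an illustration under stated assumptions, not the kernel's code: the constants are copied from the listing, but struct toy_slot and toy_pick_slot() are invented names, 32-bit jiffies are modelled with uint32_t, and the function only reports which array and queue would be chosen instead of linking anything.

```c
#include <assert.h>
#include <stdint.h>

#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)

/* Hypothetical result type: which array (1..5 for tv1..tv5), which queue. */
struct toy_slot {
    int level;
    int index;
};

struct toy_slot toy_pick_slot(uint32_t expires, uint32_t timer_jiffies,
                              int tv1_index)
{
    uint32_t idx = expires - timer_jiffies;  /* ticks to expiry, mod 2^32 */
    struct toy_slot s;

    if (idx < TVR_SIZE) {
        s.level = 1;
        s.index = (int)(expires & TVR_MASK);
    } else if (idx < (1u << (TVR_BITS + TVN_BITS))) {
        s.level = 2;
        s.index = (int)((expires >> TVR_BITS) & TVN_MASK);
    } else if (idx < (1u << (TVR_BITS + 2 * TVN_BITS))) {
        s.level = 3;
        s.index = (int)((expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK);
    } else if (idx < (1u << (TVR_BITS + 3 * TVN_BITS))) {
        s.level = 4;
        s.index = (int)((expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK);
    } else if ((int32_t)idx < 0) {
        /* expiry already in the past: use the queue tv1 handles next
         * (the cast relies on the usual two's-complement conversion) */
        s.level = 1;
        s.index = tv1_index;
    } else {
        s.level = 5;
        s.index = (int)((expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK);
    }
    return s;
}
```

With timer_jiffies at 1000, an expiry of 1005 lands in queue 237 of tv1 (1005 modulo 256), while an expiry of 1300 lands in queue 5 of tv2. When the cascade later re-inserts the second timer, the difference has dropped below 256 and the first branch places it in tv1.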
The code of the function run_timer_list() is in kernel/timer.c:

==================== kernel/timer.c 288 324 ====================
[timer_bh()>run_timer_list()]
288      struct task_struct *tsk;
289
290      tsk = cpu_curr(this_cpu);
291      if (preemption_goodness(tsk, p, this_cpu) > 1)
292          tsk->need_resched = 1;
293  #endif
294  }
295
296  /*
297   * Careful!
298   *
299   * This has to add the process to the _beginning_ of the
300   * run-queue, not the end. See the comment about "This is
301   * subtle" in the scheduler proper..
302   */
303  static inline void add_to_runqueue(struct task_struct * p)
304  {
305      list_add(&p->run_list, &runqueue_head);
306      nr_running++;
307  }
308
309  static inline void move_last_runqueue(struct task_struct * p)
310  {
311      list_del(&p->run_list);
312      list_add_tail(&p->run_list, &runqueue_head);
313  }
314
315  static inline void move_first_runqueue(struct task_struct * p)
316  {
317      list_del(&p->run_list);
318      list_add(&p->run_list, &runqueue_head);
319  }
320
321  /*
322   * Wake up a process. Put it on the run-queue if it's not
323   * already there. The "current" process is always on the
324   * run-queue (except when the actual re-schedule is in

In the "Clock Interrupt" section we also mentioned that in special circumstances jiffies may advance by more than 1 at a time. For this reason a loop is used here to handle each single step of jiffies. In each step, tv1.index is checked first; if it is 0, a queue must be moved from tv2 into tv1. We set this case aside for the moment and look first at the case in which it is not 0. The loop implemented with goto in the code processes the queue that expires at this step. The processing itself is simple: going down the queue one by one, each timer is removed from the queue through detach_timer(), and the function that the timer specifies is then executed. When the whole queue has been processed, timer_jiffies and tv1.index are each advanced by one step. But the value of tv1.index is taken modulo 256 (TVR_MASK), so after 255 it returns to 0, and in the next iteration, or the next time this function is executed, a queue must be moved from tv2 into tv1 through cascade_timers(). tv2 also has an index, which must advance too. Every time jiffies has advanced by 256 steps, that is, after every 256 clock interrupts, tv2.index advances by one step. Unlike tv1.index, tv2.index is taken modulo 64, so after reaching 63 it returns to 0. When tv2.index is 1, a queue must be moved from tv3 into tv2 and tv1, and so on.

Why is a queue moved from tv3 when tv2.index is 1, rather than when it is 0? A look back at the code of internal_add_timer() makes this clear. When the difference idx between the expiry time and the current time is TVR_SIZE, that is, 256, the result of the computation on lines 135 and 136 is 1, not 0. In fact the queue with index 0 in tv2 is always empty. Also, for ease of implementation, the code places the five data structures tv1, tv2 and so on in an array, tvecs[]. Starting the subscript at 1 here means that moving starts from tv2, and line 298 means that if tv2.index has become 1 after advancing, a further move from tv3 is needed, and so on.

NOOF_TVECS here is a constant, in fact 5 (timer.c):

==================== kernel/timer.c 104 104 ====================
104  #define NOOF_TVECS (sizeof(tvecs) / sizeof(tvecs[0]))

The code of the function cascade_timers() is also in the same file. It is a simple piece of code, and we give no explanation.

==================== kernel/timer.c 264 286 ====================
[timer_bh()>run_timer_list()>cascade_timers()]
264  static inline void cascade_timers(struct timer_vec *tv)
265  {
266      /* cascade all the timers from tv up one level */
267      struct list_head *head, *curr, *next;
268
269      head = tv->vec + tv->index;
270      curr = head->next;
271      /*
272       * We are removing _all_ timers from the list, so we don't have to
273       * detach them individually, just clear the list afterwards.
274       */
275      while (curr != head) {
276          struct timer_list *tmp;
277
278          tmp = list_entry(curr, struct timer_list, list);
279          next = curr->next;
280          list_del(curr); // not needed
281          internal_add_timer(tmp);
282          curr = next;
283      }
284      INIT_LIST_HEAD(head);
285      tv->index = (tv->index + 1) & TVN_MASK;
286  }

In our scenario, the function pointer in the timer is process_timeout and the argument is the task_struct pointer of the sleeping process, so at expiry process_timeout() is called (sched.c):

==================== kernel/sched.c 362 367 ====================
[timer_bh()>run_timer_list()>process_timeout()]
362  static void process_timeout(unsigned long __data)
363  {
364      struct task_struct * p = (struct task_struct *) __data;
365
366      wake_up_process(p);
367  }

This function wakes the sleeping process through wake_up_process(), whose code the reader has already seen in the previous section, "Forced Scheduling". When the process has been woken and is again scheduled to run, it is back in schedule_timeout() above. In other words, the process returns from the call to schedule() in schedule_timeout().

Going back to the code of schedule_timeout(): right after the return from schedule(), del_timer_sync() is called. The reader may find this odd. Was the timer not removed from the queue just now through detach_timer() in run_timer_list()? Why is del_timer_sync() needed here as well? On a uniprocessor system del_timer_sync() is defined as del_timer(), so let us look at the code of detach_timer() and del_timer() (timer.c):

==================== kernel/timer.c 192 198 ====================
[timer_bh()>run_timer_list()>detach_timer()]
192  static inline int detach_timer (struct timer_list *timer)
193  {
194      if (!timer_pending(timer))
195          return 0;
196      list_del(&timer->list);
197      return 1;
198  }

==================== include/linux/timer.h 54 57 ====================
[timer_bh()>run_timer_list()>detach_timer()>timer_pending()]
54  long time_esterror = NTP_PHASE_LIMIT; /* estimated error (us) */
57  /* frequency offset

So detach_timer() removes the timer_list structure it is given from its queue only if the structure is actually on a queue. The function del_timer() in fact calls detach_timer():

==================== kernel/timer.c 213 223 ====================
[sys_nanosleep()>schedule_timeout()>del_timer()]
213  int del_timer(struct timer_list * timer)
214  {
215      int ret;
216      unsigned long flags;
217
218      spin_lock_irqsave(&timerlist_lock, flags);
219      ret = detach_timer(timer);
220      timer->list.next = timer->list.prev = NULL;
221      spin_unlock_irqrestore(&timerlist_lock, flags);
222      return ret;
223  }

It can be seen that calling del_timer() once more on a timer that has already been unlinked from its queue does no harm. But even if it does no harm, there is no reason to do useless work. True, but bear in mind that run_timer_list() is not the only function that can wake this process. When another process sends a signal to the sleeping process, that wakes it as well. So calling del_timer() once more in schedule_timeout() makes sure that all is safe. It should be pointed out that timer here is a local variable whose space is on the stack; once schedule_timeout() returns, this data is gone. This saves the trouble of dynamically allocating and freeing a buffer and also improves efficiency. But leaving such a data structure on a queue is very dangerous; it must be removed from the queue while the data structure is still valid.

Finally, the difference between the expected expiry time expire and the current time jiffies is the time still left unslept. This remaining time is measured in clock interrupts, so in sys_nanosleep() it is converted back into the seconds and nanoseconds of a timespec structure and then returned to user space. Of course, only when the process has been woken by a signal can any sleep be left over. Otherwise, oversleeping is quite possible. One reason is that in special circumstances several clock interrupts may be merged and jiffies processed for all of them together, advancing several steps at once. Another is that even if the process is woken on time, there is no guarantee that it will be scheduled to run at once.

The system call sys_nanosleep() is not the only "user" of schedule_timeout(). The kernel also provides a function interruptible_sleep_on_timeout() for use by device drivers inside the kernel; the reader will see it used in the chapter on device drivers. In addition, schedule_timeout() can also be called directly inside the kernel.

Compared with sys_nanosleep(), the code of sys_pause(), which is likewise a system call, is very simple. Its code is in arch/i386/kernel/sys_i386.c:

==================== arch/i386/kernel/sys_i386.c 250 255 ====================
250  asmlinkage int sys_pause(void)
(scaled ppm)*/ 56 long time_freq = ((1000000 + HZ/2) % HZ • HZ/2) << SHIFT_USEC; long time_phase; /* phase offset (scaled us) */ 55 401 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 䅵఼᭄ count ᠔䅵ⱘህᰃĀֵো䞣āЁⱘ䙷ϾĀ䞣āˈᅗҷ㸼ⴔৃՓ⫼䌘⑤ⱘ᭄䞣DŽ≵᳝ᄺд䖛᪡ 51 }; 50 #endif 49 long __magic; 48 #if WAITQUEUE_DEBUG 47 wait_queue_head_t wait; 46 int sleepers; 45 atomic_t count; 44 struct semaphore { ==================== include/asm•i386/semaphore.h 44 51 ==================== ⳟ᭄᥂㒧ᵘˈstruct semaphore ᰃ೼ include/asm/i386/semaphore.h ЁᅮНⱘ˖ܜ ⾡᪡԰DŽ㟇Ѣֵো䞣ˈ߭ᰃϔ⾡᭄᥂㒧ᵘ㉏ൟ semaphoreDŽ ⾡ᴎࠊᴹᅲ⦄ⱘDŽݙḌЁЎℸᦤկњ down()੠ up()ϸϾߑ᭄ˈߚ߿ᇍᑨѢ᪡԰㋏㒳⧚䆎Ёⱘ P ੠ V ϸ 䖯⿟䯈ᇍ݅ѿ䌘⑤ⱘѦ᭹䆓䯂ˈ៪㗙䇈ᇍ䖯⿟䯈ᑆᡄⱘ䰆㣗ˈᰃ䗮䖛Āֵো䞣ā˄semaphore˅䖭 ೼㋏㒳䇗⫼ vfork()Ёˈ⫼԰⠊䖯⿟Ϣᄤ䖯⿟П䯈ᇍѢ݅ѿ㰮ᄬぎ䯈ⱘѦ᭹ֱᡸ᠟↉DŽ⫼׳г㹿 䗄ˈབᵰϡࡴ䰆ℶˈऩ໘⧚఼㋏㒳Ё೼ϔᅮᴵӊϟгӮথ⫳䖯⿟䯈ⱘѦⳌᑆᡄDŽ঺ϔᮍ䴶ˈ䖭⾡᥾ᮑ ೼Ā໮໘⧚఼ SMP ㋏㒳㒧ᵘāϔゴЁˈ៥Ӏᇚ䅼䆎᳝݇໮໘⧚఼㒧ᵘⱘ⾡⾡䯂乬DŽԚᰃˈབϞ᠔ ᴹ㞾݊ᅗ໘⧚఼ⱘЁᮁ᳡ࡵ⿟ᑣⱘᑆᡄDŽ ৃ㛑ⱘDŽᑊϨˈ೼໮໘⧚఼㋏㒳ЁˈϡԚ㽕䰆ℶᴹ㞾ৠϔ໘⧚఼ϞⱘЁᮁ᳡ࡵ⿟ᑣⱘᑆᡄˈ䖬㽕䰆ℶ ㉏䌘⑤ⱘᯊ׭ህৃ㛑ফࠄ݊ᅗ䖯⿟ⱘᑆᡄDŽ㟇Ѣᴹ㞾Ёᮁ᳡ࡵ⿟ᑣ˄ࣙᣀ bh ߑ᭄˅ⱘᑆᡄˈ߭ᘏᰃ᳝ Ԛᰃ೼݋ԧՓ⫼ᳳ䯈ै䳔㽕⣀ऴˈ㗠Ϩᇍ䖭ѯ䌘⑤ⱘ䆓䯂ৃ㛑Ӯফ䰏㗠䳔㽕ⴵ⳴ㄝᕙDŽ䖯⿟೼䆓䯂ℸ ϟˈ߭ҡ᳝ৃ㛑থ⫳䖯⿟䯈ⱘᑆᡄDŽ㋏㒳Ё᳝ѯ䌘⑤ᰃ݅ѿⱘˈމЁᰃϡӮথ⫳ⱘDŽԚᰃˈ೼঺ϔ⾡ᚙ ᅲ䰙ϞাӮথ⫳೼໮໘⧚఼ⱘ㋏㒳Ёˈ೼ऩ໘⧚఼ⱘ㋏㒳މҹˈϞ䗄ϸϾ䖯⿟೼ݙḌЁѦⳌᑆᡄⱘᚙ Ͼ䖯⿟䯈ⱘᑆᡄᅲ䰙ϞϡӮথ⫳೼ݙḌЁDŽ䖭ϔ⚍೼Ā䖯⿟ⱘ䇗ᑺϢߛᤶāϔ㡖ЁᏆ㒣䅼䆎䖛њDŽ᠔ ⿟Ё䗨˅П໪ˈা᳝೼Ң㋏㒳ぎ䯈䖨ಲࠄ⫼᠋ぎ䯈ⱘࠡ໩ᠡ᳝ৃ㛑থ⫳䇗ᑺDŽ䖭ḋⱘᅝᥦՓᕫϞ䗄ϸ ᰒ✊ϡӮথ⫳೼ϡᆍ䆌ফࠄᠧᡄⱘ䖛˄މϡ䖛ˈ䰸њϔϾ䖯⿟Џࡼ䇗⫼ schedule()䅽ߎ CPU ⱘᚙ ೼໮໘⧚఼ SMP 㒧ᵘⱘ㋏㒳Ёˈ䖭⾡ᑆᡄ䖬᳝ৃ㛑ᴹ㞾঺ϔϾ໘⧚఼DŽ 䅽㄀ѠϾ䖯⿟ᦦњ䖯ᴹˈ㒧ᵰᕜৃ㛑ህхњDŽ㉏Ԑⱘᑆᡄг᳝ৃ㛑ᴹ㞾ᶤϾЁᮁ᳡ࡵ⿟ᑣ៪ bh ߑ᭄DŽ ⿟䛑㽕ᇚϔϾ᭄᥂㒧ᵘ䫒ܹࠄৠϔϾ䯳߫ⱘሒ䚼ˈ㽕ᰃ೼㄀ϔϾ䖯⿟ᅠ៤њϔञⱘᯊ׭থ⫳њ䇗ᑺˈ ݙḌЁⱘᕜ໮᪡԰೼䖯㸠ⱘ䖛⿟Ё䛑ϡᆍ䆌ফࠄᠧᡄˈ᳔݌ൟⱘ՟ᄤህᰃ䯳߫᪡԰DŽབᵰϸϾ䖯 4.9 ݙḌЁⱘѦ᭹᪡԰ ᰒ✊ˈᔧࠡ䖯⿟䗮䖛 sys_pause()ܹⴵҹৢˈা᳝೼᥹ᬊࠄֵোᯊᠡӮ㹿૸䝦DŽ 255 } 254 return •ERESTARTNOHAND; 253 schedule(); current•>state = TASK_INTERRUPTIBLE; 252 } 251 402 w . d o c u • t r a c k.com w w PDF•XC H ANGE Click to buy NOW! 
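The role of count as a stock of available resources, with sleepers tracking who is waiting, can be pictured with a small user-space model. This is only a single-threaded sketch under stated assumptions: struct ticket_gate and the gate_* functions are hypothetical names invented for illustration, there is no real sleeping or atomicity, and the kernel's own down() and up() work differently in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct semaphore: not kernel code. */
struct ticket_gate {
    int count;     /* tickets (resources) still available */
    int sleepers;  /* callers that found no ticket and are waiting */
};

static void gate_init(struct ticket_gate *g, int tickets)
{
    g->count = tickets;
    g->sleepers = 0;
}

/* Try to take a ticket; false means the caller would have to wait. */
static bool gate_try_enter(struct ticket_gate *g)
{
    if (g->count > 0) {
        g->count--;
        return true;
    }
    g->sleepers++;
    return false;
}

/* Hand the ticket back; true means somebody is waiting and should be woken. */
static bool gate_leave(struct ticket_gate *g)
{
    g->count++;
    return g->sleepers > 0;
}

/* A woken waiter competes for the ticket again. */
static bool gate_retry(struct ticket_gate *g)
{
    if (g->count > 0) {
        g->count--;
        g->sleepers--;
        return true;
    }
    return false;
}
```

With one ticket, a first caller enters, a second is counted as a sleeper, and the ticket returned by the first lets the second in on its retry.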
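Readers who have not studied operating system theory may picture this data structure as the gate of a courtyard, with count giving the number of admission tickets. When a process wants to go inside the courtyard walls to do something, it must first collect a ticket at the gate. The value of count therefore tells how many more processes may enter. In the typical case there is just one ticket, so only one process can go in.

When a process comes to the gate for a ticket and finds that they have all been handed out, it has to go to the "lounge" beside the gate to sleep and wait. That lounge is the queue wait, and the counter sleepers tells how many processes are waiting. A process that has entered the courtyard leaves through the same gate when it has finished its business and hands its ticket back. If the number of tickets was already 0 before the ticket was returned, some processes may be waiting in the lounge, so the leaving process must also call out to them that "a ticket is available now", that is, wake them up and let them compete for that ticket. The principle is really quite simple. Let us see through an example how this is actually implemented. The example is taken from the system call umount(). We are not concerned here with how a mounted filesystem is unmounted, only with how a critical part of the operation is protected. The code below is from the file fs/super.c:

==================== fs/super.c 44 50 ====================
44 /*
45  * We use a semaphore to synchronize all mount/umount
46  * activity - imagine the mess if we have a race between
47  * unmounting a filesystem and re-mounting it (or something
48  * else).
49  */
50 static DECLARE_MUTEX(mount_sem);

==================== fs/super.c 1117 1118 ====================
1117 asmlinkage long sys_umount(char * name, int flags)
1118 {

==================== fs/super.c 1144 1146 ====================
1144     down(&mount_sem);
1145     retval = do_umount(nd.mnt, 0, flags);
1146     up(&mount_sem);

==================== fs/super.c 1152 1153 ====================
1152     return retval;
1153 }

The aim here is to protect do_umount(), because at any one time only one process in the whole system is allowed to be mounting or unmounting a filesystem, while the mounting or unmounting itself may block (in practice it certainly will), so that scheduling occurs partway through. To achieve this, line 50 first sets up a one-gate "courtyard", the semaphore mount_sem, and the operation to be protected is placed between the entry (down) and exit (up) operations. To enter the courtyard a process must first execute down() to obtain a ticket, and on coming out after finishing the operation it must execute up() to return the ticket and wake any other process that may be waiting. Operating system theory calls such a stretch of work, which has to be done alone behind closed doors, a "critical section". Incidentally, the usual Chinese rendering of "critical section" as "boundary region" sounds somewhat bookish; "critical" here simply means "very important, with possibly serious consequences if mishandled".

The definition of DECLARE_MUTEX() is in include/asm-i386/semaphore.h:

==================== include/asm-i386/semaphore.h 53 71 ====================
53 #if WAITQUEUE_DEBUG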
54 # define __SEM_DEBUG_INIT(name) \
55         , (int)&(name).__magic
56 #else
57 # define __SEM_DEBUG_INIT(name)
58 #endif
59
60 #define __SEMAPHORE_INITIALIZER(name,count) \
61 { ATOMIC_INIT(count), 0, __WAIT_QUEUE_HEAD_INITIALIZER((name).wait) \
62     __SEM_DEBUG_INIT(name) }
63
64 #define __MUTEX_INITIALIZER(name) \
65     __SEMAPHORE_INITIALIZER(name,1)
66
67 #define __DECLARE_SEMAPHORE_GENERIC(name,count) \
68     struct semaphore name = __SEMAPHORE_INITIALIZER(name,count)
69
70 #define DECLARE_MUTEX(name) __DECLARE_SEMAPHORE_GENERIC(name,1)
71 #define DECLARE_MUTEX_LOCKED(name) __DECLARE_SEMAPHORE_GENERIC(name,0)

The macros ATOMIC_INIT() and __WAIT_QUEUE_HEAD_INITIALIZER() are defined in include/asm-i386/atomic.h and include/linux/wait.h respectively; readers can consult them on their own. In short, after gcc preprocessing, line 50 above becomes a statement similar to this:

static struct semaphore mount_sem = { {(1)}, 0, … }

That is, a semaphore set up with DECLARE_MUTEX() has exactly one ticket, so only one process can enter the critical section. The other kind, set up with DECLARE_MUTEX_LOCKED(), has no ticket at all; a ticket can be handed to a process, letting it through the gate, only after some process has delivered one through an up() operation. The reader has already seen this kind of semaphore in use in the section on the system call fork(). Each kind has its own use, which the names MUTEX and MUTEX_LOCKED reflect. A semaphore may exist as a global variable or as a local variable of some function.

There are only two operations on a semaphore, down() and up(). Both are inline functions defined in include/asm-i386/semaphore.h. First down():

==================== include/asm-i386/semaphore.h 109 132 ====================
109 /*
110  * This is ugly, but we want the default case to fall through.
111  * "__down_failed" is a special asm handler that calls the C
112  * routine that actually waits. See arch/i386/kernel/semaphore.c
113  */
114 static inline void down(struct semaphore * sem)
115 {
116 #if WAITQUEUE_DEBUG
117     CHECK_MAGIC(sem->__magic);
118 #endif
119
120     __asm__ __volatile__(
121         "# atomic down operation\n\t"
122         LOCK "decl %0\n\t" /* --sem->count */
123         "js 2f\n"
124         "1:\n"
125         ".section .text.lock,\"ax\"\n"
126         "2:\tcall __down_failed\n\t"
127         "jmp 1b\n"
128         ".previous"
129         :"=m" (sem->count)
130         :"c" (sem)
131         :"memory");
132 }

In this inline assembly the output part binds operand %0 to sem->count in memory, and the input part places the pointer sem in register ECX. Since count is the first member of the semaphore structure, the pointer sem is also a pointer to sem->count, and what the decl instruction on line 122 decrements is sem->count. If the result of the decrement is 0 or greater, that is, if a ticket has been obtained, the operation ends at label "1". Note the LOCK prefix in front of decl: it means that the bus is locked while the instruction executes, to prevent interference from other CPUs in the same system.

If the result of the decrement is negative, no ticket could be had, and control goes to label "2" to call __down_failed(). In practice the process goes to sleep inside __down_failed() and returns from there only after it has been woken and has obtained a ticket; it then jumps to label "1" and the down() operation ends, meaning the process has entered the critical section.

The function __down_failed() and the related code are in arch/i386/kernel/semaphore.c:

==================== arch/i386/kernel/semaphore.c 171 193 ====================
[down()>__down_failed()]
171 /*
172  * The semaphore operations have a special calling sequence that
173  * allow us to do a simpler in-line version of them. These routines
174  * need to convert that sequence back into the C sequence when
175  * there is contention on the semaphore.
176  *
177  * %ecx contains the semaphore pointer on entry. Save the C-clobbered
178  * registers (%eax, %edx and %ecx) except %eax when used as a return
179  * value..
180  */
181 asm(
182 ".align 4\n"
183 ".globl __down_failed\n"
184 "__down_failed:\n\t"
185     "pushl %eax\n\t"
186     "pushl %edx\n\t"
187     "pushl %ecx\n\t"
188     "call __down\n\t"
189     "popl %ecx\n\t"
190     "popl %edx\n\t"
191     "popl %eax\n\t"
192     "ret"
193 );

Clearly the only purpose here is to call __down(). The author of the code placed a comment at the beginning of this file (semaphore.c) that may help the reader understand it better:

==================== arch/i386/kernel/semaphore.c 20 39 ====================
20 /*
21  * Semaphores are implemented using a two-way counter:
22  * The "count" variable is decremented for each process
23  * that tries to acquire the semaphore, while the "sleeping"
24  * variable is a count of such acquires.
25  *
26  * Notably, the inline "up()" and "down()" functions can
27  * efficiently test if they need to do any extra work (up
28  * needs to do something only if count was negative before
29  * the increment operation.
30  *
31  * "sleeping" and the contention routine ordering is
32  * protected by the semaphore spinlock.
33  *
34  * Note that these functions are only called when there is
35  * contention on the lock, and as such all this is the
36  * "non-critical" part of the whole semaphore business. The
37  * critical part is the inline stuff in <asm/semaphore.h>
38  * where we want to avoid any extra jumps and calls.
39  */

Now the code of __down():

==================== arch/i386/kernel/semaphore.c 58 89 ====================
[down()>__down_failed()>__down()]
58 void __down(struct semaphore * sem)
59 {
60     struct task_struct *tsk = current;
61     DECLARE_WAITQUEUE(wait, tsk);
62     tsk->state = TASK_UNINTERRUPTIBLE;
63     add_wait_queue_exclusive(&sem->wait, &wait);
64
65     spin_lock_irq(&semaphore_lock);
66     sem->sleepers++;
67     for (;;) {
68         int sleepers = sem->sleepers;
69
70         /*
71          * Add "everybody else" into it. They aren't
72          * playing, because we own the spinlock.
73          */
74         if (!atomic_add_negative(sleepers - 1, &sem->count)) {
75             sem->sleepers = 0;
76             break;
77         }
78         sem->sleepers = 1; /* us - see -1 above */
79         spin_unlock_irq(&semaphore_lock);
80
81         schedule();
82         tsk->state = TASK_UNINTERRUPTIBLE;
83         spin_lock_irq(&semaphore_lock);
84     }
85     spin_unlock_irq(&semaphore_lock);
86     remove_wait_queue(&sem->wait, &wait);
87     tsk->state = TASK_RUNNING;
88     wake_up(&sem->wait);
89 }

The data structure wait_queue_t for the elements of a wait queue and the macro DECLARE_WAITQUEUE() were covered in the previous section and are not repeated here. add_wait_queue_exclusive() links the wait queue element wait, which represents the current process, onto the tail of the wait queue headed by sem->wait. When the CPU reaches the for(;;) loop, sem->sleepers tells how many processes in all (the current one included) are waiting to enter the critical section. On the other hand, although the current process arrived in __down() because it could not get a ticket, no lock was held before the spin_lock_irq() here, so some process (on another processor, of course) may well have executed an up() in the meantime, in which case a ticket is actually available by now. Without one more check the process would go to sleep needlessly, waiting for a ticket that already exists. Worse, there might never be another process to wake it. So the check made with atomic_add_negative() inside the for() loop is crucial. And it does more than check: it adds (sleepers-1) to sem->count, so that the value never falls below -1. For example, if sem->count was 0 before the current process executed down(), and no process has executed up() on this semaphore since, then (sleepers-1) is 0 and sem->count is -1; the sum is still -1, atomic_add_negative() returns nonzero, and the current process must keep waiting. If, however, some process did execute up() on this semaphore before line 65, then sleepers-1 is still 0 but sem->count has become 0; the sum is 0 rather than negative, atomic_add_negative() returns zero, and the current process need not wait and may enter the critical section, just as if this operation had changed sem->count from 1 to 0 inside down(). A positive or zero sem->count tells how many resources remain, or how many tickets are left; a negative sem->count says that no resources remain and that processes are waiting, without needing to say how many. This lets up() add 1 to sem->count with a single instruction and then decide from the result whether any process needs waking.

If the current process finds that it no longer has to wait, it leaves the for loop through the break statement and, before returning, wakes the other processes in the wait queue. If other processes are indeed waiting, however, the one woken will most likely fail the test on line 74, because sem->sleepers has by then been set to 0 (see line 75), so (sleepers-1) is -1, already negative; unless sem->count has become 1 by then, atomic_add_negative() is bound to return nonzero. We will come back to this point shortly.

When atomic_add_negative() returns nonzero, the current process really must go to sleep and wait, so it calls schedule() on line 81. Because it sleeps in state TASK_UNINTERRUPTIBLE, it will not be woken by a signal. And because the flag TASK_EXCLUSIVE is 1, only the first process in the queue is woken. Note also that when a sleeping process is woken, returns from schedule() and goes back to the top of the loop body, (sleepers-1) must be 0, since sem->sleepers was set to 1 on line 78; whether it can enter the critical section therefore depends on whether sem->count is negative (-1) at that moment. In the typical case a process woken by an up() operation finds sem->count at 0, so it can leave the for() loop, return from __down() and enter the critical section. Before returning from __down() it wakes one more process from the queue, and that process usually has to go on waiting.

The code shows that when several processes are waiting to enter a critical section, the current process has a slight advantage, after which it is "first come, first in"; process priority plays no part. For a system with real-time requirements this may well be a defect, and future versions may address it.

Another problem connected with priority and critical sections is called "priority inversion". Imagine this scenario: a process with very high priority is waiting outside a critical section, while the process inside happens to have very low priority and, once it has blocked and gone to sleep, gets no chance to run for a while after being woken; the one in a great hurry is stuck behind the one who takes his time. The high-priority process is held back because the priority of the process inside the critical section is too low. The remedy is to "lend" the high priority temporarily to the process inside the critical section while a high-priority process waits outside, to make it more competitive. The Linux kernel does not yet implement such a mechanism, which is another place for improvement. The problem is not as serious as it might seem, though, because operations that the kernel must carry out inside a critical section are generally very short and unlikely to block; conversely, operations that must be done inside a critical section and may block and sleep partway through should generally not be done by processes of very high priority.

Turning to up(), things are simpler. It is also in semaphore.h:

==================== include/asm-i386/semaphore.h 182 205 ====================
182 /*
183  * Note! This is subtle. We jump to wake people up only if
184  * the semaphore was negative (== somebody was waiting on it).
185  * The default case (no contention) will result in NO
186  * jumps for both down() and up().
187  */
188 static inline void up(struct semaphore * sem)
189 {
190 #if WAITQUEUE_DEBUG
191     CHECK_MAGIC(sem->__magic);
192 #endif
193     __asm__ __volatile__(
194         "# atomic up operation\n\t"
195         LOCK "incl %0\n\t" /* ++sem->count */
196         "jle 2f\n"
197         "1:\n"
198         ".section .text.lock,\"ax\"\n"
199         "2:\tcall __up_wakeup\n\t"
200         "jmp 1b\n"
201         ".previous"
202         :"=m" (sem->count)
203         :"c" (sem)
204         :"memory");
205 }

This is clearly similar to the code of down(). The only differences are that sem->count is incremented rather than decremented, and that __up_wakeup() is called when the result after the increment is 0 or negative. That routine is also in semaphore.c:

==================== arch/i386/kernel/semaphore.c 219 231 ====================
[up()>__up_wakeup()]
219 asm(
220 ".align 4\n"
221 ".globl __up_wakeup\n"
222 "__up_wakeup:\n\t"
223     "pushl %eax\n\t"
224     "pushl %edx\n\t"
225     "pushl %ecx\n\t"
226     "call __up\n\t"
227     "popl %ecx\n\t"
228     "popl %edx\n\t"
229     "popl %eax\n\t"
230     "ret"
231 );

Likewise, the code of __up() is in semaphore.c:

==================== arch/i386/kernel/semaphore.c 41 54 ====================
[up()>__up_wakeup()>__up()]
41 /*
42  * Logic:
43  * - only on a boundary condition do we need to care. When we go
44  * from a negative count to a non-negative, we wake people up.
45  * - when we go from a non-negative count to a negative do we
46  * (a) synchronize with the "sleeper" count and (b) make sure
47  * that we're on the wakeup list before we synchronize so that
48  * we cannot lose wakeup events.
49  */
50
51 void __up(struct semaphore *sem)
52 {
53     wake_up(&sem->wait);
54 }

wake_up() and some related macros are defined in sched.h:

==================== include/linux/sched.h 555 560 ====================
555 #define wake_up(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,WQ_FLAG_EXCLUSIVE)
556 #define wake_up_all(x) __wake_up((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,0)
557 #define wake_up_sync(x) __wake_up_sync((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,WQ_FLAG_EXCLUSIVE)
558 #define wake_up_interruptible(x) __wake_up((x),TASK_INTERRUPTIBLE,WQ_FLAG_EXCLUSIVE)
559 #define wake_up_interruptible_all(x) __wake_up((x),TASK_INTERRUPTIBLE,0)
560 #define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE,WQ_FLAG_EXCLUSIVE)

__wake_up() itself is in sched.c. The reader can see that this function wakes, one after another, all processes in a queue that meet the condition. But once a process is woken whose TASK_EXCLUSIVE flag is 1, it stops waking the remaining processes in the queue (kernel/sched.c).

==================== kernel/sched.c 766 769 ====================
[up()>__up_wakeup()>__up()>wake_up()>__wake_up()]
766 void __wake_up(wait_queue_head_t *q, unsigned int mode, unsigned int wq_mode)
767 {
768     __wake_up_common(q, mode, wq_mode, 0);
769 }

==================== kernel/sched.c 692 764 ====================
[up()>__up_wakeup()>__up()>wake_up()>__wake_up()>__wake_up_common()]
692 static inline void __wake_up_common (wait_queue_head_t *q, unsigned int mode,
693                      unsigned int wq_mode, const int sync)
694 {
695     struct list_head *tmp, *head;
696     struct task_struct *p, *best_exclusive;
697     unsigned long flags;
698     int best_cpu, irq;
699
700     if (!q)
701         goto out;
702
703     best_cpu = smp_processor_id();
704     irq = in_interrupt();
705     best_exclusive = NULL;
706     wq_write_lock_irqsave(&q->lock, flags);
707
708 #if WAITQUEUE_DEBUG
709     CHECK_MAGIC_WQHEAD(q);
710 #endif
711
712     head = &q->task_list;
713 #if WAITQUEUE_DEBUG
714     if (!head->next || !head->prev)
715         WQ_BUG();
716 #endif
717     tmp = head->next;
718     while (tmp != head) {
719         unsigned int state;
720         wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
721
722         tmp = tmp->next;
723
724 #if WAITQUEUE_DEBUG
725         CHECK_MAGIC(curr->__magic);
726 #endif
727         p = curr->task;
728         state = p->state;
729         if (state & mode) {
730 #if WAITQUEUE_DEBUG
731             curr->__waker = (long)__builtin_return_address(0);
732 #endif
733             /*
734              * If waking up from an interrupt context then
735              * prefer processes which are affine to this
736              * CPU.
737              */
738             if (irq && (curr->flags & wq_mode & WQ_FLAG_EXCLUSIVE)) {
739                 if (!best_exclusive)
740                     best_exclusive = p;
741                 if (p->processor == best_cpu) {
742                     best_exclusive = p;
743                     break;
744                 }
745             } else {
746                 if (sync)
747                     wake_up_process_synchronous(p);
748                 else
749                     wake_up_process(p);
750                 if (curr->flags & wq_mode & WQ_FLAG_EXCLUSIVE)
751                     break;
752             }
753         }
754     }
755     if (best_exclusive) {
756         if (sync)
757             wake_up_process_synchronous(best_exclusive);
758         else
759             wake_up_process(best_exclusive);
760     }
761     wq_write_unlock_irqrestore(&q->lock, flags);
762 out:
763     return;
764 }

As can be seen, a process waiting to enter a critical section is waiting for the release of an exclusive resource (one that must be held exclusively while in use), and a process that has entered a critical section holds such a resource. If a process that has entered critical section A then tries to enter another critical section B, it may be unable to get in, that is, unable to obtain the resource it needs, and so has to wait in B's queue. In whose hands, then, is the resource it is waiting for? If the process holding that resource happens to be waiting in A's queue, a so-called "deadlock" has occurred, because neither process can now make progress to the point where it could release its resource. Clearly the use of shared resources (those that may be shared even while in use) cannot lead to deadlock. In Linux most resources are sharable, and the use of exclusive resources is placed in critical sections. As long as no process tries to enter another critical section from inside one, deadlock cannot occur; this is also the simplest way to prevent it. One can go a step further: even when a process does try to enter another critical section from inside one, a fixed order can be laid down for all critical sections.
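The idea of a fixed order for critical sections can be sketched as a small checker that refuses any attempt to enter them out of order. This is a single-threaded illustration under stated assumptions, not kernel code: struct ranked_lock, struct lock_context and the ordered_* functions are hypothetical names, and the "locks" here only record state rather than block anyone.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8

/* Hypothetical critical section with a rank: lower ranks are entered first. */
struct ranked_lock {
    int rank;
    int held;
};

/* The critical sections one execution path is currently inside, in entry order. */
struct lock_context {
    struct ranked_lock *held[MAX_HELD];
    size_t depth;
};

/* Enter a critical section; returns -1 if that would break the global order. */
static int ordered_acquire(struct lock_context *ctx, struct ranked_lock *l)
{
    if (ctx->depth == MAX_HELD)
        return -1;
    if (ctx->depth > 0 && ctx->held[ctx->depth - 1]->rank >= l->rank)
        return -1; /* e.g. trying to enter A while already inside B */
    l->held = 1;
    ctx->held[ctx->depth++] = l;
    return 0;
}

/* Leave the most recently entered critical section. */
static void ordered_release(struct lock_context *ctx)
{
    if (ctx->depth == 0)
        return;
    ctx->depth--;
    ctx->held[ctx->depth]->held = 0;
}
```

With A ranked 1 and B ranked 2, entering A and then B succeeds, while entering B and then A is rejected, so the circular wait between two paths can never form.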
If all processes observe the same order when entering critical sections (for example, only A before B, never B before A), the deadlock caused by circular waiting described above cannot occur either. The current kernel has no measures to prevent or resolve deadlock, nor any measure to stop a process inside one critical section from entering another out of order. This, too, is an area for future improvement.

In the kernel, mutual exclusion is needed not only between one process and another; interference can also arise between a process and an interrupt service routine (or bh function). Nor is the semaphore the only means of keeping processes, especially processes running on different processors, from interfering with one another. Disabling interrupts, for instance, is certainly one way to ensure mutual exclusion between a process and an interrupt service routine on the same processor, but it cannot prevent interference from an interrupt service routine or a process on another processor.

Another effective means is locking. The spin_lock_irq() and spin_unlock_irq() that the reader saw in the code of __down() are one example. Particularly on multiprocessor SMP systems, the various locks implemented in software play an irreplaceable role. The file include/linux/spinlock.h defines a number of locking operations:

==================== include/linux/spinlock.h 6 32 ====================
6 /*
7  * These are the generic versions of the spinlocks and read-write
8  * locks..
9  */
10 #define spin_lock_irqsave(lock, flags) do { local_irq_save(flags); spin_lock(lock); } while (0)
11 #define spin_lock_irq(lock) do { local_irq_disable(); spin_lock(lock); } while (0)
12 #define spin_lock_bh(lock) do { local_bh_disable(); spin_lock(lock); } while (0)
13
14 #define read_lock_irqsave(lock, flags) do { local_irq_save(flags); read_lock(lock); } while (0)
15 #define read_lock_irq(lock) do { local_irq_disable(); read_lock(lock); } while (0)
16 #define read_lock_bh(lock) do { local_bh_disable(); read_lock(lock); } while (0)
17
18 #define write_lock_irqsave(lock, flags) do { local_irq_save(flags); write_lock(lock); } while (0)
19 #define write_lock_irq(lock) do { local_irq_disable(); write_lock(lock); } while (0)
20 #define write_lock_bh(lock) do { local_bh_disable(); write_lock(lock); } while (0)
21
22 #define spin_unlock_irqrestore(lock, flags) do { spin_unlock(lock); local_irq_restore(flags); } while (0)
23 #define spin_unlock_irq(lock) do { spin_unlock(lock); local_irq_enable(); } while (0)
24 #define spin_unlock_bh(lock) do { spin_unlock(lock); local_bh_enable(); } while (0)
25
26 #define read_unlock_irqrestore(lock, flags) do { read_unlock(lock); local_irq_restore(flags); } while (0)
27 #define read_unlock_irq(lock) do { read_unlock(lock); local_irq_enable(); } while (0)
28 #define read_unlock_bh(lock) do { read_unlock(lock); local_bh_enable(); } while (0)
29
30 #define write_unlock_irqrestore(lock, flags) do { write_unlock(lock); local_irq_restore(flags); } while (0)
31 #define write_unlock_irq(lock) do { write_unlock(lock); local_irq_enable(); } while (0)
32 #define write_unlock_bh(lock) do { write_unlock(lock); local_bh_enable(); } while (0)

First, the differences within one group of locking operations. The only difference between spin_lock_irqsave() and spin_lock_irq(), for example, is that the former calls local_irq_save() and the latter local_irq_disable(). The corresponding unlock operations spin_unlock_irqrestore() and spin_unlock_irq() differ accordingly: the former calls local_irq_restore() and the latter local_irq_enable().

Next, the differences between groups. The only difference between spin_lock_irq() and read_lock_irq(), for example, is that the former calls spin_lock() and the latter read_lock().

Each operation thus consists of two parts. One is the operation whose name begins with local_; its job is to disable or enable interrupt response on the local processor. The other is the operation whose name ends in _lock; its job is to prevent interference from other processors. We look first at the operations that enable and disable interrupts, all defined in include/asm-i386/system.h:

==================== include/asm-i386/system.h 304 307 ====================
304 #define local_irq_save(x) __asm__ __volatile__("pushfl ; popl %0 ; cli":"=g" (x): /* no input */ :"memory")
305 #define local_irq_restore(x) __restore_flags(x)
306 #define local_irq_disable() __cli()
307 #define local_irq_enable() __sti()

Both local_irq_save() and local_irq_disable() disable interrupts with the cli instruction, but the former first saves the contents of the processor's flags register, because its IF flag reflects whether interrupts are currently enabled or disabled (cli simply clears the IF bit to 0), so that the state can be restored on unlocking. Since the flags register is not a general-purpose register, push and pop instructions are used to move its contents through the stack into the argument x. The difference between local_irq_restore() and local_irq_enable() lies in the same point.

Now spin_lock(), which is defined in spinlock.h:

==================== include/asm-i386/spinlock.h 78 91 ====================
78 static inline void spin_lock(spinlock_t *lock)
79 {
80 #if SPINLOCK_DEBUG
81     __label__ here;
82 here:
83     if (lock->magic != SPINLOCK_MAGIC) {
84 printk("eip: %p\n", &&here);
85         BUG();
86     }
87 #endif
88     __asm__ __volatile__(
89         spin_lock_string
90         :"=m" (lock->lock) : : "memory");
91 }

The argument lock is of type spinlock_t, defined in include/asm-i386/spinlock.h:

==================== include/asm-i386/spinlock.h 17 26 ====================
17 /*
18  * Your basic SMP spinlocks, allowing only a single CPU anywhere
19  */
20
21 typedef struct {
22     volatile unsigned int lock;
23 #if SPINLOCK_DEBUG
24     unsigned magic;
25 #endif
26 } spinlock_t;

Debugging aside, this is really just an unsigned integer, but defining it this way helps keep gcc from applying harmful "optimizations" during compilation. The spin_lock_string used in the code is again a macro:

==================== include/asm-i386/spinlock.h 50 60 ====================
50 #define spin_lock_string \
51     "\n1:\t" \
52     "lock ; decb %0\n\t" \
53     "js 2f\n" \
54     ".section .text.lock,\"ax\"\n" \
55     "2:\t" \
56     "cmpb $0,%0\n\t" \
57     "rep;nop\n\t" \
58     "jle 2b\n\t" \
59     "jmp 1b\n" \
60     ".previous"

Here %0 is bound to the argument lock->lock. The instruction decb subtracts 1 from its operand, lock->lock, and the suffix b means the operand is 8 bits wide. The instruction carries the prefix "lock", meaning that the bus is locked during execution so that other processors cannot access it, which guarantees the "atomicity" of the instruction. If the result after subtracting 1 is non-negative (sign bit 0), the lock has been taken and the function returns. If the result is negative, some other operation has already taken the lock and the caller is locked out; control then moves to label "2", which tests in a loop, waiting for the holder to unlock and set lock->lock to a value greater than 0, and then tries to take the lock again.

The code shows that if lock->lock was already 0 or negative, the processor keeps testing its value in a loop until it becomes greater than 0; hence the name spin_lock. To "spin" is to keep turning round and round. A processor spinning like this is of course doing useless work. Why not go to sleep, as in the down() operation on a semaphore, and let the CPU do something useful for another process? Because the code that wants the lock is not necessarily called in the context of a process; it may be called from an interrupt service routine or a bh function, which cannot be scheduled at all. This also means that a lock must not be held for long, or too much is wasted.

The code of spin_unlock() is very simple:

==================== include/asm-i386/spinlock.h 93 104 ====================
93 static inline void spin_unlock(spinlock_t *lock)
94 {
95 #if SPINLOCK_DEBUG
96     if (lock->magic != SPINLOCK_MAGIC)
97         BUG();
98     if (!spin_is_locked(lock))
99         BUG();
100 #endif
101     __asm__ __volatile__(
102         spin_unlock_string
103         :"=m" (lock->lock) : : "memory");
104 }

Likewise, spin_unlock_string is a macro:

==================== include/asm-i386/spinlock.h 62 66 ====================
62 /*
63  * This works. Despite all the confusion.
64  */
65 #define spin_unlock_string \
66     "movb $1,%0"

The movb instruction sets lock->lock to 1, and that is all. It carries no "lock" prefix because the movb operation is atomic in itself. The earlier decb, by contrast, involves a read-modify-write cycle and so is not atomic from the point of view of the bus.

The reader may ask: since a locked-out processor is only doing useless work, why not simply lock the bus and release it only when the job is finished? The two are not the same. A processor that is spinning uselessly can still respond to interrupts, whereas with the bus locked not even interrupts could be serviced. Besides, there may be other processors in the system, and as long as they do not want to enter the same code, or code protected by the same lock, they can carry on running.

The implementations of read_lock() and write_lock() are much the same. We leave this code (also in spinlock.h) for readers to study on their own:

==================== include/asm-i386/spinlock.h 135 165 ====================
135 /*
136  * On x86, we implement read-write locks as a 32-bit counter
137  * with the high bit (sign) being the "contended" bit.
138  *
139  * The inline assembly is non-obvious. Think about it.
140  *
141  * Changed to use the same technique as rw semaphores. See
142  * semaphore.h for details. -ben
143  */
144 /* the spinlock helpers are in arch/i386/kernel/semaphore.c */
145
146 static inline void read_lock(rwlock_t *rw)
147 {
148 #if SPINLOCK_DEBUG
149     if (rw->magic != RWLOCK_MAGIC)
150         BUG();
151 #endif
152     __build_read_lock(rw, "__read_lock_failed");
153 }
154
155 static inline void write_lock(rwlock_t *rw)
156 {
157 #if SPINLOCK_DEBUG
158     if (rw->magic != RWLOCK_MAGIC)
159         BUG();
160 #endif
161     __build_write_lock(rw, "__write_lock_failed");
162 }
163
164 #define read_unlock(rw) asm volatile("lock ; incl %0" :"=m" ((rw)->lock) : : "memory")
165 #define write_unlock(rw) asm volatile("lock ; addl $" RW_LOCK_BIAS_STR ",%0":"=m" ((rw)->lock) : : "memory")

The macros used in this code are defined as follows:

==================== include/asm-i386/rwlock.h 20 81 ====================
20 #define RW_LOCK_BIAS 0x01000000
21 #define RW_LOCK_BIAS_STR "0x01000000"
22
23 #define __build_read_lock_ptr(rw, helper) \
24     asm volatile(LOCK "subl $1,(%0)\n\t" \
25         "js 2f\n" \
26         "1:\n" \
27         ".section .text.lock,\"ax\"\n" \
28         "2:\tcall " helper "\n\t" \
29         "jmp 1b\n" \
30         ".previous" \
31         ::"a" (rw) : "memory")
32
33 #define __build_read_lock_const(rw, helper) \
34     asm volatile(LOCK "subl $1,%0\n\t" \
35         "js 2f\n" \
36         "1:\n" \
37         ".section .text.lock,\"ax\"\n" \
38         "2:\tpushl %%eax\n\t" \
39         "leal %0,%%eax\n\t" \
40         "call " helper "\n\t" \
41         "popl %%eax\n\t" \
42         "jmp 1b\n" \
43         ".previous" \
44         :"=m" (*(volatile int *)rw) : : "memory")
45
46 #define __build_read_lock(rw, helper) do { \
47     if (__builtin_constant_p(rw)) \
48         __build_read_lock_const(rw, helper); \
49     else \
50         __build_read_lock_ptr(rw, helper); \
51 } while (0)
52
53 #define __build_write_lock_ptr(rw, helper) \
54     asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
55         "jnz 2f\n" \
56         "1:\n" \
57         ".section .text.lock,\"ax\"\n" \
58         "2:\tcall " helper "\n\t" \
59         "jmp 1b\n" \
60         ".previous" \
61         ::"a" (rw) : "memory")
62
63 #define __build_write_lock_const(rw, helper) \
64     asm volatile(LOCK "subl $" RW_LOCK_BIAS_STR ",(%0)\n\t" \
65         "jnz 2f\n" \
66         "1:\n" \
67         ".section .text.lock,\"ax\"\n" \
68         "2:\tpushl %%eax\n\t" \
69         "leal %0,%%eax\n\t" \
70         "call " helper "\n\t" \
71         "popl %%eax\n\t" \
72         "jmp 1b\n" \
73         ".previous" \
74         :"=m" (*(volatile int *)rw) : : "memory")
75
76 #define __build_write_lock(rw, helper) do { \
77     if (__builtin_constant_p(rw)) \
78         __build_write_lock_const(rw, helper); \
79     else \
80         __build_write_lock_ptr(rw, helper); \
81 } while (0)

In the calls to __build_read_lock() and __build_write_lock() the second argument names a helper routine, __read_lock_failed and __write_lock_failed respectively (see lines 152 and 161). Their code is as follows:

==================== arch/i386/kernel/semaphore.c 426 453 ====================
426 #if defined(CONFIG_SMP)
427 asm(
428 "
429 .align 4
430 .globl __write_lock_failed
431 __write_lock_failed:
432     " LOCK "addl $" RW_LOCK_BIAS_STR ",(%eax)
433 1:  cmpl $" RW_LOCK_BIAS_STR ",(%eax)
434     jne 1b
435
436     " LOCK "subl $" RW_LOCK_BIAS_STR ",(%eax)
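437     jnz __write_lock_failed
438     ret
439
440
441 .align 4
442 .globl __read_lock_failed
443 __read_lock_failed:
444     lock ; incl (%eax)
445 1:  cmpl $1,(%eax)
446     js 1b
447
448     lock ; decl (%eax)
449     js __read_lock_failed
450     ret
451 "
452 );
453 #endif

It is worth pointing out that if a CPU, having entered a stretch of locked code A, then tries to enter another stretch of locked code B, it may be locked out there; and if the processor already inside B happens to be locked out in its turn while trying to enter A, both processors end up spinning and a deadlock has formed. This is the same deadlock that can form when critical sections are entered in a nested fashion through down() operations on semaphores. The main ways to prevent it are:

(1) Do not allow code that has entered a locked section to enter any other locked section. This is the simplest.
(2) If that is to be allowed, establish a single order for all locked code sections, for example that A must always be entered before B.

The current Linux kernel takes no measures to prevent this kind of deadlock, which again awaits future improvement. Then again, not every programmer needs to write code that runs in the kernel. The programmers responsible for kernel code (device drivers, for example) are usually the more experienced and more skilled ones, and they will watch out for this problem themselves. Moreover, most resources in Linux are sharable, and the code that has to be placed in a critical section or under a lock is scarce and short. In practice, therefore, deadlock is not a very real problem.

Finally, critical sections and locks are conceptually similar but differ in how strongly they affect the system, so they must be distinguished in use. Locking indiscriminately at every turn is like putting a whole city under curfew to catch a petty thief: an excessive precaution.

Chapter 5 Filesystems

5.1 Overview

If one asks what the most important components of an "operating system" are, the answer can only be process management and the filesystem. In fact some operating systems (some "embedded" systems, for example) may have process management but no filesystem, while others (such as MSDOS) have a filesystem but no process management. A system with neither, however, could not be called an operating system. The earlier chapters of this book covered the parts of Linux that concern process management; from here on we turn our attention to its filesystem.

The meaning of the term "filesystem" is rather vague. To begin with, "file" itself has a narrow and a broad sense. In the narrow sense a "file" is a "disk file", and by extension a set of information stored in an organized, ordered way on any medium (including memory). In the broad sense, Unix has from the very beginning treated external devices as "files" too. In this sense anything that can produce or consume information is a file. Take the "socket" mechanism used to send and receive messages in a network environment: it does not represent stored information, but the sending end of a socket "consumes" information and the receiving end "produces" it, so regarding a socket as a file is logical. Yet even leaving the vagueness of "file" aside, the term "filesystem" has several further meanings of its own, which can be told apart only from context:

(1) A particular file format. When we say that the filesystem of Linux is Ext2, that of MSDOS is FAT16, and that of Windows NT is NTFS or FAT32, this is the sense intended.
(2) A piece of storage medium that has been "formatted" in a particular format. When we speak of "mounting" or "unmounting" a filesystem, this is the sense intended.
(3) The mechanism in the operating system (usually in the kernel) that manages filesystems and carries out operations on files, together with its implementation. This is the main topic of this chapter.

Linux originally used the minix filesystem, but minix is only an experimental operating system (for teaching); its filesystem is limited to 64 megabytes in size and its file names to 14 characters. So after a period of improvement and development, drawing in particular on the experience accumulated over many years of improvements to the traditional Unix filesystem, the present Ext2 filesystem took shape. This filesystem can be called "the Linux filesystem".

Besides Linux's own filesystem Ext2, the designers paid attention early on to the question of how Linux could support various other filesystems. To achieve this, the operation and management of all the different filesystems have to be brought into one unified framework. The filesystem interface in the kernel becomes a filesystem "bus", so that user programs can operate on all kinds of filesystems (and files) through one and the same filesystem interface, that is, the same set of system calls. The implementation details of the different filesystems are thereby hidden from user programs, which are offered a unified, abstract, virtual filesystem interface: the so-called "virtual filesystem", VFS (Virtual Filesystem Switch). This abstract interface consists mainly of a set of standard, abstract file operations offered to user programs as system calls, such as read(), write() and lseek(). User programs can thus regard all files as uniform, abstract "VFS files" and operate on them through these system calls, without caring what filesystem a particular file belongs to or how that filesystem is designed and implemented. In Linux, for example, a disk or partition in DOS format (that is, a filesystem) can be "mounted" into the system, after which user programs can access its files in exactly the same way, as if they were files in Ext2 format.

If the kernel is likened to the "motherboard" of a PC and VFS to a "slot" on that motherboard, then each concrete filesystem is like an "interface card". Different cards carry different circuitry, but how many lines connect them to the slot, and what each line is for, is precisely defined. In the same way, different filesystems implement their functions with different code, but their interface to VFS is precisely defined. The core of this interface is a file_operations data structure, defined in include/linux/fs.h:

==================== include/linux/fs.h 768 790 ====================
768 /*
769  * NOTE:
770  * read, write, poll, fsync, readv, writev can be called
771  *   without the big kernel lock held in all filesystems.
772  */
773 struct file_operations {
774     struct module *owner;
775     loff_t (*llseek) (struct file *, loff_t, int);
776     ssize_t (*read) (struct file *, char *, size_t, loff_t *);
777     ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
778     int (*readdir) (struct file *, void *, filldir_t);
779     unsigned int (*poll) (struct file *, struct poll_table_struct *);
780     int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
781     int (*mmap) (struct file *, struct vm_area_struct *);
782     int (*open) (struct inode *, struct file *);
783     int (*flush) (struct file *);
784     int (*release) (struct inode *, struct file *);
785     int (*fsync) (struct file *, struct dentry *, int datasync);
786     int (*fasync) (int, struct file *, int);
787     int (*lock) (struct file *, int, struct file_lock *);
788     ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
789     ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
790 };

Every filesystem has its own file_operations structure. Almost all of its members are function pointers, so it is in effect a function jump table; read, for example, points to the entry function with which the concrete filesystem implements the read operation. If a concrete filesystem does not support some operation, the corresponding function pointer in its file_operations structure is NULL. We do not describe the purpose of each pointer here; the reader will later see typical code for the main operations.

Each process establishes a connection with a concrete file, or in other words a read/write "context", by "opening the file" (open()). The connection is represented by a file data structure, which contains a pointer f_op to a file_operations structure. Setting f_op in the file structure to point to a particular file_operations structure specifies the filesystem to which the file belongs and hooks the file up to the set of functions that this filesystem provides, just like plugging a particular "interface card" into the "slot". The reader will see the details later.

The division of roles between VFS and the concrete filesystems in the Linux kernel can be depicted as in Figure 5.1.

Figure 5.1 Schematic of the relationship between VFS and concrete filesystems

The connection between a process and a file, that is, an "open file", is an item of the process's "property" and belongs to that particular process. The file structure representing the connection is therefore necessarily linked to the task_struct data structure representing the process. We looked at the definition of this data structure in the chapter "Processes and Process Scheduling" but ignored the filesystem-related parts at the time. The relevant lines of task_struct are listed again below (include/linux/sched.h):

==================== include/linux/sched.h 277 277 ====================
277 struct task_struct {
. . . . . .
==================== include/linux/sched.h 375 378 ====================
375 /* filesystem information */
376     struct fs_struct *fs;
377 /* open file information */
378     struct files_struct *files;
. . . . . .
==================== include/linux/sched.h 397 397 ====================
397 };

There are two pointers here, fs and files. One points to an fs_struct data structure holding information about the filesystem; the other points to a files_struct data structure holding information about open files. First the fs_struct structure, which is defined in include/linux/fs_struct.h:

==================== include/linux/fs_struct.h 5 11 ====================
5 struct fs_struct {
6     atomic_t count;
7     rwlock_t lock;
8     int umask;
9     struct dentry * root, * pwd, * altroot;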
Besides a "directory entry", that is, a dentry data structure, every file also has an "index node", that is, an inode data structure, which records information such as the location and layout of the file on the storage medium. The dentry structure contains an inode pointer, d_inode, which points to the corresponding inode structure. Readers may ask: since a file's dentry and inode both describe attributes of the file from different angles, why not merge the two into one instead of splitting one into two? The answer is that the two structures describe different things: a file may have several file names, and the permissions that apply when the same file is accessed under different names may differ. The dentry structure therefore represents a file in the logical sense and records its logical attributes, whereas the inode structure represents a file in the physical sense and records its physical attributes. The relationship between them is many-to-one.

Earlier we said that file_operations is the "core" of the interface between the virtual file system VFS and a particular file system, because there are other data structures besides it. The main ones are dentry_operations, associated with directory entries, and inode_operations, associated with index nodes. These two structures also consist of function pointers, but most of those functions are used only while opening a file, or only at the "lower levels" of file operations (for example, allocating space), so they are less commonly used, or less "well known", than the functions in file_operations. Take dentry_operations as an example; it is defined in include/linux/dcache.h:

==================== include/linux/dcache.h 79 86 ====================
struct dentry_operations {
    int (*d_revalidate)(struct dentry *, int);
    int (*d_hash) (struct dentry *, struct qstr *);
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
    int (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};

Here d_delete points to the entry function of a particular file system's "delete file" operation, and d_release is used for the "close file" operation. Another function pointer, d_compare, is interesting: it is used to compare file names. Readers may find this odd: comparing file names is just comparing strings, so how can it depend on the file system? In fact it is not surprising. Consider: in some file systems file names are limited to 8 characters, while in others they may have 255; some file systems allow spaces in file names and others do not; some support Chinese characters and others do not. So the comparison of file names does indeed vary from one file system to another.
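Why name comparison is left to the file system can be seen in a small user-space sketch in C. It is only an illustration: demo_compare_fn, demo_compare_exact, demo_compare_nocase and demo_names_match are hypothetical names, the hook takes plain C strings instead of qstr structures, and the convention that 0 means "match" is an assumption borrowed from strcmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison hook; returns 0 when the two names match. */
typedef int (*demo_compare_fn)(const char *a, const char *b);

/* Case-sensitive file system: names must match byte for byte. */
int demo_compare_exact(const char *a, const char *b)
{
    return strcmp(a, b) != 0;
}

/* Case-insensitive file system: "README" and "readme" are the same name. */
int demo_compare_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 1;
        a++;
        b++;
    }
    return *a != *b; /* equal only if both names ended together */
}

/* Generic lookup code: uses the hook if the file system supplies one,
 * otherwise falls back to the exact comparison. Returns 1 on a match. */
int demo_names_match(demo_compare_fn compare, const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (compare == NULL)
        compare = demo_compare_exact;
    return compare(a, b) == 0;
}
```

The generic code never needs to know the naming rules; a file system with unusual rules simply supplies its own hook.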
In short, the interface between a particular file system and the virtual file system VFS is a set of data structures, including file_operations, dentry_operations, inode_operations and others (omitted here for now). In principle every file system must provide these data structures in the kernel; we will discuss them in depth later.

Although we have not yet gone into the internal structure and operation of the file system, readers can see its basic outline in Figure 5.2. We will call it the "logical structure diagram of the file system"; readers will need to come back to it repeatedly. How the data structures in the diagram are allocated and set up, that is, how the diagram is constructed, is described in the section "Opening a File".

Figure 5.2 Logical structure of the Linux file system

So which particular file systems does Linux actually support? The inode data structure has a member u, which is a union. Depending on the particular file system, this union is interpreted as a different data structure. For example, when the file represented by the inode is a socket, u is used as a socket data structure; when the file belongs to an Ext2 file system, u is used as ext2_inode_info, the detailed descriptor of the Ext2 file system. A look at the definition of this union therefore shows how many file systems Linux currently supports. The definition is part of the definition of the inode data structure, in include/linux/fs.h:

==================== include/linux/fs.h 433 460 ====================
    union {
        struct minix_inode_info minix_i;
        struct ext2_inode_info ext2_i;
        struct hpfs_inode_info hpfs_i;
        struct ntfs_inode_info ntfs_i;
        struct msdos_inode_info msdos_i;
        struct umsdos_inode_info umsdos_i;
        struct iso_inode_info isofs_i;
        struct nfs_inode_info nfs_i;
        struct sysv_inode_info sysv_i;
        struct affs_inode_info affs_i;
        struct ufs_inode_info ufs_i;
        struct efs_inode_info efs_i;
        struct romfs_inode_info romfs_i;
        struct shmem_inode_info shmem_i;
        struct coda_inode_info coda_i;
        struct smb_inode_info smbfs_i;
        struct hfs_inode_info hfs_i;
        struct adfs_inode_info adfs_i;
        struct qnx4_inode_info qnx4_i;
        struct bfs_inode_info bfs_i;
        struct udf_inode_info udf_i;
        struct ncp_inode_info ncpfs_i;
        struct proc_inode_info proc_i;
        struct socket socket_i;
        struct usbdev_inode_info usbdev_i;
        void *generic_ip;
    } u;
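Some of these members are self-explanatory, such as ext2 and msdos, but most need a brief explanation:

hpfs: the file system used by OS/2, the operating system IBM developed for the PC. This format is used only on hard disks; the floppy disks used by OS/2 have the same format as msdos.
ntfs: the file system of Windows NT.
umsdos: a special "file system" that uses an msdos file system to emulate an Ext2 file system. Its advantage is that Linux can be run directly from a DOS partition on the disk, without repartitioning and reformatting first. It has its share of disadvantages too: it runs more slowly, and it loses its "immunity" to viruses that attack DOS file systems.
isofs: used for CDROMs.
nfs: the "Network File System", NFS.
sysv: S5FS, the file system of Unix System V.
affs: BSD made major improvements to S5FS, and the improved file system is called the "Fast File System", FFS. The file system of AmigaOS is also known as a fast file system, hence the name affs; despite the similar name it is a separate design from the BSD FFS.
ufs: another implementation of FFS, widely used in the various versions of BSD and in many Unix variants (such as SunOS, Solaris, FreeBSD, NetBSD, OpenBSD and Nextstep). It has thus in effect become the "Unix File System", hence the name ufs.
efs: the file system of IRIX from Silicon Graphics.
romfs: a "read-only" file system. As the name suggests, it can be built on read-only media such as EPROM and PROM. Another characteristic is that its implementation takes up very little room in the kernel. For example, the msdos file system occupies about 30K bytes in the kernel and the nfs file system about 57K bytes, whereas romfs occupies only one page. romfs is therefore often well suited to "embedded" systems.
coda: another network file system, an improvement on nfs.
smbfs: the SMB protocol, as used by Samba, which allows files to be shared over the network between Linux and systems such as Win 95 and Win NT.
hfs: the file system of the Apple Macintosh.
adfs: Acorn developed the RISC PC, based on the ARM processor; its operating system is called RISCOS, and its file system is the "Acorn Disk Filing System".
qnx4: QNX is an operating system often used in "embedded" systems; its file system is QNXFS.
bfs: a file system used in SCO UnixWare.
udf: "Universal Disk Format", the newest "universal file system", used for DVDs and writable optical discs and also usable on hard disks.
ncpfs: the Novell NetWare file system.
proc: the special files under the /proc directory, which are very useful for system administration and program debugging.
usbdev: the driver for the "Universal Serial Bus", USB.

The definition of this union reflects only roughly the file systems currently supported by the Linux kernel, because it is based on the different uses and interpretations of this portion of the inode structure. If two file systems interpret the union in the same way, the difference between them does not show up in this definition.

For example, early Linux developed and used another file system, ext (there was also a file system called Xiafs), which later evolved into Ext2 (meaning version 2 of ext). The current Linux kernel still supports ext (readers can check with the command "man mount" or "man fs"), but since it interprets u in the inode structure in the same way as Ext2, it does not show up here. On the other hand, although in principle every file system has its own function jump table, that is, its own file_operations structure, the converse, that every file_operations structure represents a different file system, is not accurate. As readers will see in the "Pipes" section of Chapter 6, within the framework of the Ext2 file system alone, files used as pipes have different file_operations structures depending on whether they are open for reading or for writing.

Furthermore, in Linux external devices are treated as files, so conceptually each kind of external device amounts to a different file system. Yet the definition of this union lists only usbdev as an independent file system. What about "block devices"? "Character devices"? "Network devices"? Why are none of them here? The reason is that these devices do not require the union in the inode structure to be interpreted differently. By the same token, among "special files" only proc and socket are listed here, while FIFO, another kind of special file used to implement "named pipes", is not listed separately.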
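The idea of one structure whose trailing union is interpreted differently by each file system can be sketched in user-space C as follows. All names (demo_inode, demo_ext2_info, demo_msdos_info, demo_fs_type, demo_first_unit) are hypothetical, and the explicit fs tag is an assumption added for this sketch so that the example is self-checking; it is not a field of the kernel's inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file-system private data. */
struct demo_ext2_info { unsigned long first_block; };
struct demo_msdos_info { unsigned long start_cluster; };

enum demo_fs_type { DEMO_FS_EXT2, DEMO_FS_MSDOS };

/* Hypothetical in-memory inode: common fields plus a union whose
 * interpretation depends on the file system the file belongs to. */
struct demo_inode {
    unsigned long i_ino;
    enum demo_fs_type fs; /* explicit tag, assumed for this sketch only */
    union {
        struct demo_ext2_info ext2_i;
        struct demo_msdos_info msdos_i;
        void *generic_ip;
    } u;
};

/* Each file system reads only its own view of the union. */
unsigned long demo_first_unit(const struct demo_inode *inode)
{
    assert(inode != NULL);
    switch (inode->fs) {
    case DEMO_FS_EXT2:
        return inode->u.ext2_i.first_block;
    case DEMO_FS_MSDOS:
        return inode->u.msdos_i.start_cluster;
    }
    return 0;
}
```

All members of the union share the same storage, so two file systems that happened to use the same member layout would be indistinguishable from the union definition alone, which is the point made above about ext and Ext2.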
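So the union in the inode structure reflects the differences among file systems in part of their data structures, while the file_operations structure reflects their differences in algorithms (operations).

When a file system represents files organized in a particular format on disk (or on some other medium), every operation on a file is ultimately translated into an operation on some portion of the disk medium. From a layered point of view there is therefore a "device driver" layer below the "file system". But since devices (in reality, device drivers) are also treated as files, what is the difference between a disk device "file", which stands on the same level as ordinary files, and the disk "device" that forms the bottom layer of the file system? The difference reflects two views of the data on the disk medium: one sees it as structured, organized data, the other sees it as data in a linear space. The following diagram (Figure 5.3) shows this layered structure for different kinds of files.

Figure 5.3 Layered structure of the Linux file system

The figure contains three different types of files.

5.1.1 Disk Files

Disk files should perhaps be called "storage files". They are "files" in the narrow, original sense, usually stored on disk but possibly on other media. romfs, for example, uses media such as EPROM, while RAMDISK emulates a disk medium in memory. A "file" is information stored on a medium in some organized form, so a "file" in fact contains two kinds of information: the stored data itself, and information about the organization and management of the file. For disk files both kinds of information are necessarily stored in the "file system", that is, on the disk. The information concerning organization and management is stored mainly in the file's "index node" and "directory entry". The index node on disk is similar to, but not the same as, the inode data structure described earlier (which exists in memory); it can be regarded as a simplified version of the inode structure. Unlike the inode structure, however, the index node on disk is not "volatile" (it is not lost when power is removed), and it occupies no memory. Every file has one and only one index node, and the index node exists even if the file temporarily contains no data. Files of this kind are the focus of this chapter, although in this book we are concerned only with the Ext2 file system. The directory entry on disk is likewise simpler than the dentry structure in memory and resides in a so-called "directory node". A directory node is in fact a file of a special form and purpose.

5.1.2 Device Files

Device files likewise contain information used for organization and management, and likewise have an index node and a directory entry on the storage medium, but they do not necessarily have stored data. Depending on the type and nature of the device, it may be used for storing and retrieving (such as a disk), for receiving and sending (such as a network card), for acquisition and control (such as some electromechanical equipment), or even for a combination of several of these. In practice, operating any device involves some degree of data acquisition and control, usually carried out through a "control/status register" on the device interface; for details see the "Device Drivers" chapter in the second volume of this book.

5.1.3 Special Files

Special files also have inode and dentry data structures in memory, but do not necessarily have an index node and a directory entry on a storage medium. The main difference from the previous two kinds is that special files generally have nothing to do with external devices; the media involved are usually memory and the CPU itself. When a special file is "read", the data returned is generated on the fly inside the system according to certain rules, or collected and processed from memory, and conversely for writing. In this sense the file "/dev/null" is a special file: all data written to it is discarded, and no external device is involved at all. So although this file lives under the "/dev" directory, it is in essence a special file. In the chapter "Interprocess Communication" readers will see that the files used to implement "pipes", in particular the FIFO files of "named pipes", as well as Unix domain sockets, are also special files. In this chapter we will introduce another important kind of special file, namely the files under the "/proc" directory.

The three types of files have one thing in common: they all have some information about organization and management. Every file therefore has an inode, which means "index node" (or "i-node"). To "access" a file, one must go through its index node to learn what type of file it is (for example, whether it is a device file), how it is organized, how much data it contains, where that data is, where the underlying driver is, and so on. The definition of the inode data structure is given in include/linux/fs.h. We list it here, but this is not yet the time to explain each member systematically; readers will need to come back to the definition repeatedly, and will eventually be able to explain it themselves. Here we introduce only some of the members. First, the definition of inode (include/linux/fs.h):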
==================== include/linux/fs.h 387 435 ====================
struct inode {
    struct list_head i_hash;
    struct list_head i_list;
    struct list_head i_dentry;

    struct list_head i_dirty_buffers;

    unsigned long i_ino;
    atomic_t i_count;
    kdev_t i_dev;
    umode_t i_mode;
    nlink_t i_nlink;
    uid_t i_uid;
    gid_t i_gid;
    kdev_t i_rdev;
    loff_t i_size;
    time_t i_atime;
    time_t i_mtime;
    time_t i_ctime;
    unsigned long i_blksize;
    unsigned long i_blocks;
    unsigned long i_version;
    struct semaphore i_sem;
    struct semaphore i_zombie;
    struct inode_operations *i_op;
    struct file_operations *i_fop; /* former ->i_op->default_file_ops */
    struct super_block *i_sb;
    wait_queue_head_t i_wait;
    struct file_lock *i_flock;
    struct address_space *i_mapping;
    struct address_space i_data;
    struct dquot *i_dquot[MAXQUOTAS];
    struct pipe_inode_info *i_pipe;
    struct block_device *i_bdev;

    unsigned long i_dnotify_mask; /* Directory notify events */
    struct dnotify_struct *i_dnotify; /* for directory notifications */

    unsigned long i_state;

    unsigned int i_flags;
    unsigned char i_sock;

    atomic_t i_writecount;
    unsigned int i_attr_flags;
    __u32 i_generation;
    union {
        struct minix_inode_info minix_i;
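Besides relatively static information, the inode structure also has members that hold dynamic information. For example, i_count is a share count for the inode structure, and its value changes frequently while the system runs. Likewise, the inode structure can be linked dynamically into several queues in memory through its list_head members, and these relationships obviously change dynamically too. In addition, much of the information in the union within the inode structure is dynamic. Clearly, the relatively static information in the inode structure needs to be kept on a "non-volatile" medium such as a disk. This goes without saying for disk files, which have a data part, but it is also necessary for device files and special files, which have none (with only a few exceptions among special files, such as unnamed pipes). So in the layered diagram of the file system shown earlier (Figure 5.3), there should really be one more line between VFS and the disk medium, indicating that the VFS layer, for management purposes, saves on disk and restores from disk the information needed for inode structures (and for some other data structures, such as dentry). As readers will see later, the formatting of a disk takes this into account as well. Taking the Ext2 format as an example, the blocks (sectors) on the disk are divided mainly into two parts, one for index nodes and one for file data. Given an index node number, the block containing that index node can be read into memory through the disk's device driver.

If only part of the information in the inode structure needs to be stored in the "index node" on disk, what do these nodes look like? That depends on the particular file system. For Ext2, Linux's own file system, it is the ext2_inode data structure, defined in include/linux/ext2_fs.h. Readers may wish first to compare it roughly with the inode structure.

==================== include/linux/ext2_fs.h 214 269 ====================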