火星蚊的地球記事

Static Link

Posted on 2016-04-06 Edited on 2017-02-12

Static linking: during the linking stage, fill in the addresses of symbols whose addresses were unknown at compile time, and glue a bunch of object files together into an executable.

That's it. (Just kidding.)

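An object file is the intermediate binary produced by compilation. It comes in several formats, and ELF is one of them. The main job of linking is to patch the correct address of a symbol defined in another object file into the instructions that reference it. For example, if an instruction in object file A references symbol x in object file B, A has no way of knowing x's address before linking. Only at link time is x's address known, and the linker writes it into the instruction in object file A.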

Two-pass linking

Static linking uses two-pass linking, which splits the work into two steps:

Allocate the virtual address space
Symbol resolution and relocation

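1. Allocate the virtual address space

Merge multiple object files into a single file.

Scan all object files and merge sections of the same kind, for example the .text sections of a.o and b.o.
The linker learns the length, attributes, and location of each section from the tables in each object file.
Collect the symbol definitions and symbol references from every object file's symbol table into a global symbol table.
Once the object files are merged, the layout of every section is fixed, so the linker can compute each symbol's virtual address.
- virtual address of a symbol = address of its section after merging + the symbol's offset within that section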
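The formula above can be sketched in a few lines of C++. This is only an illustration, not how ld is implemented: alignUp, placeInputSection, and symbolAddress are hypothetical helper names, and the numbers in the note below come from the readelf output of the example later in this post.

```cpp
#include <cassert>
#include <cstdint>

// Round addr up to the next multiple of align (align must be a power of two).
std::uint64_t alignUp(std::uint64_t addr, std::uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

// Address at which one input section lands inside the merged output section:
// the output section's base plus the bytes already placed before it,
// rounded up to the input section's alignment.
std::uint64_t placeInputSection(std::uint64_t outputBase,
                                std::uint64_t bytesAlreadyPlaced,
                                std::uint64_t align)
{
    return alignUp(outputBase + bytesAlreadyPlaced, align);
}

// Final virtual address of a symbol: where its section was placed plus the
// symbol's offset inside that section.
std::uint64_t symbolAddress(std::uint64_t placedSectionAddr,
                            std::uint64_t symbolOffset)
{
    return placedSectionAddr + symbolOffset;
}
```

With the example's numbers: staticvar sits at offset 0x10 in foo.o's .data, which is placed at 0x600180, so it ends up at 0x600190. main.o's .text goes right after foo.o's 0x25 bytes of .text, so main (offset 0) ends up at 0x4000e8 + 0x25 = 0x40010d.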

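2. Symbol resolution and relocation

This is the heart of static linking.

The compiler does not know the address of a variable or function that lives in another object file, so when it encounters such a symbol it puts a placeholder address into the instruction.

Symbol relocation means using the relocation table to patch the real virtual address into the instruction. A relocation table records where each instruction that needs adjusting is located and how to adjust it. Every section that needs relocation has its own relocation table, and the relocation table is itself a section in the ELF file. For example, .rel.text is the relocation table for .text (on x86-64 it is named .rela.text, as in the example below). Use objdump -r xxx.o to view the relocation table.

The linker looks up a symbol's address in the global symbol table and then patches it into the instruction according to the addressing mode. After resolution and relocation, every symbol that was undefined in some object file must have a matching address in the global symbol table; otherwise the linker reports an undefined reference error.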
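As a rough illustration of the resolution step, here is a toy resolver. It is a sketch under simplifying assumptions, not a real linker: ToySymbol, GlobalSymbolTable, and resolve are made-up names, every defined symbol is assumed to already carry its final address, and duplicate definitions are not checked.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One symbol table entry of a toy "object file" (hypothetical type).
struct ToySymbol
{
    std::string name;
    bool defined;            // false means the entry is only a reference (UND)
    std::uint64_t address;   // meaningful only when defined is true
};

typedef std::map<std::string, std::uint64_t> GlobalSymbolTable;

// Collect every definition into a global table, then verify that every
// reference can be found there.
GlobalSymbolTable resolve(const std::vector<std::vector<ToySymbol> >& objects)
{
    GlobalSymbolTable table;

    // First pass: gather definitions from all object files.
    for (const auto& object : objects)
        for (const auto& sym : object)
            if (sym.defined)
                table[sym.name] = sym.address;

    // Second pass: every reference must now have an address.
    for (const auto& object : objects)
        for (const auto& sym : object)
            if (!sym.defined && table.find(sym.name) == table.end())
                throw std::runtime_error("undefined reference to `" + sym.name + "'");

    return table;
}
```

Feeding it only foo.o's symbols (foo defined, sum referenced) throws, which mirrors the undefined reference error. Adding main.o's symbols, which define sum, makes the lookup succeed.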

Example

An example makes it easier to get a feel for this.

source

foo.c

extern int sum;
static int globalvar = 2;

int foo(int a, int b)
{
    static int staticvar = 1;
    sum = a + b;
    static int* p = &staticvar;
    p = &sum;
}

sum is declared extern, meaning it is a symbol defined in another file, so using it inside foo() is a cross-object-file reference.

The static global variable globalvar is visible only within this file.

main.c

extern int foo(int, int);
int sum = 1;

int main()
{
    foo(5, 3);
    return 0;
}

This declares that foo() is defined in another file.

Compile to object files:

$ gcc -c foo.c main.c

Link the two object files, using -e to specify the entry point:

$ ld foo.o main.o -e main -o foo

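PS: With this simplified compile and link, the program segfaults when it is about to exit. This is probably because the entry point was specified by hand and no C runtime is linked in to handle process startup and exit: when main() returns, there is no caller to return to.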

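Allocating space and addresses

First comes space and address allocation: sections of the same kind from the object files are placed together and assigned space and addresses. Look at the sections of the three files: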

$ readelf -S foo.o

There are 13 section headers, starting at offset 0x318:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000025  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  000002a0
       0000000000000048  0000000000000018   I      11     1     8
  [ 3] .data             PROGBITS         0000000000000000  00000068
       0000000000000014  0000000000000000  WA       0     0     8
  [ 4] .rela.data        RELA             0000000000000000  000002e8
       0000000000000018  0000000000000018   I      11     3     8
  [ 5] .bss              NOBITS           0000000000000000  0000007c
       0000000000000000  0000000000000000  WA       0     0     1
  [ 6] .comment          PROGBITS         0000000000000000  0000007c
       000000000000001e  0000000000000001  MS       0     0     1
  [ 7] .note.GNU-stack   PROGBITS         0000000000000000  0000009a
       0000000000000000  0000000000000000           0     0     1
  [ 8] .eh_frame         PROGBITS         0000000000000000  000000a0
       0000000000000038  0000000000000000   A       0     0     8
  [ 9] .rela.eh_frame    RELA             0000000000000000  00000300
       0000000000000018  0000000000000018   I      11     8     8
  [10] .shstrtab         STRTAB           0000000000000000  000000d8
       000000000000005e  0000000000000000           0     0     1
  [11] .symtab           SYMTAB           0000000000000000  00000138
       0000000000000138  0000000000000018          12    11     8
  [12] .strtab           STRTAB           0000000000000000  00000270
       000000000000002f  0000000000000000           0     0     1

$ readelf -S main.o

There are 12 section headers, starting at offset 0x268:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000001a  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  00000238
       0000000000000018  0000000000000018   I      10     1     8
  [ 3] .data             PROGBITS         0000000000000000  0000005c
       0000000000000004  0000000000000000  WA       0     0     4
  [ 4] .bss              NOBITS           0000000000000000  00000060
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .comment          PROGBITS         0000000000000000  00000060
       000000000000001e  0000000000000001  MS       0     0     1
  [ 6] .note.GNU-stack   PROGBITS         0000000000000000  0000007e
       0000000000000000  0000000000000000           0     0     1
  [ 7] .eh_frame         PROGBITS         0000000000000000  00000080
       0000000000000038  0000000000000000   A       0     0     8
  [ 8] .rela.eh_frame    RELA             0000000000000000  00000250
       0000000000000018  0000000000000018   I      10     7     8
  [ 9] .shstrtab         STRTAB           0000000000000000  000000b8
       0000000000000059  0000000000000000           0     0     1
  [10] .symtab           SYMTAB           0000000000000000  00000118
       0000000000000108  0000000000000018          11     8     8
  [11] .strtab           STRTAB           0000000000000000  00000220
       0000000000000015  0000000000000000           0     0     1

$ readelf -S foo

There are 8 section headers, starting at offset 0x3c8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000e8  000000e8
       000000000000003f  0000000000000000  AX       0     0     1
  [ 2] .eh_frame         PROGBITS         0000000000400128  00000128
       0000000000000058  0000000000000000   A       0     0     8
  [ 3] .data             PROGBITS         0000000000600180  00000180
       0000000000000018  0000000000000000  WA       0     0     8
  [ 4] .comment          PROGBITS         0000000000000000  00000198
       000000000000001d  0000000000000001  MS       0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  000001b5
       000000000000003a  0000000000000000           0     0     1
  [ 6] .symtab           SYMTAB           0000000000000000  000001f0
       0000000000000180  0000000000000018           7    10     8
  [ 7] .strtab           STRTAB           0000000000000000  00000370
       0000000000000053  0000000000000000           0     0     1

Static link step1

The .text and .data sections of foo.o and main.o have been merged in foo! The sizes add up: .text is 0x25 + 0x1a = 0x3f and .data is 0x14 + 0x4 = 0x18.

relocation

Once the sections are merged, symbol addresses can be computed, and we get to the main event of static linking: relocation.

First, the symbols of foo.o before relocation:

$ readelf -s foo.o

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     5: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 globalvar
     6: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    3 p.1749
     7: 0000000000000010     4 OBJECT  LOCAL  DEFAULT    3 staticvar.1748
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
    11: 0000000000000000    37 FUNC    GLOBAL DEFAULT    1 foo
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND sum

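Num: index into the symbol table array.
Name: st_name, the symbol name.
Value: st_value, the symbol value. In an object file this is the symbol's offset within its section; in the executable it is the symbol's virtual address.
Size: st_size, the size the symbol occupies. If the symbol is a variable defined in this object file, the size is nonzero, and whether the variable is initialized decides whether it goes in .data or .bss. Initialized global and local static variables get their space and section offset reserved at compile time, so they take up space in the executable file.
Type and Bind: both come from st_info. GLOBAL means the symbol is visible globally; LOCAL means it is visible only within this compilation unit.
Ndx: st_shndx, the section the symbol belongs to. UND means the symbol is still undefined.

globalvar and staticvar are both in the .data section (Ndx 3), and so is p. foo is in the code section .text. sum is declared extern, so it is undefined, and its location is only known at link time.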

Symbols of main.o:

$ readelf -s main.o

Symbol table '.symtab' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 sum
     9: 0000000000000000    26 FUNC    GLOBAL DEFAULT    1 main
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND foo

sum is defined in main.o, while foo is undefined in main.o.

Symbols of foo:

$ readelf -s foo

Symbol table '.symtab' contains 16 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000004000e8     0 SECTION LOCAL  DEFAULT    1 
     2: 0000000000400128     0 SECTION LOCAL  DEFAULT    2 
     3: 0000000000600180     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
     6: 0000000000600180     4 OBJECT  LOCAL  DEFAULT    3 globalvar
     7: 0000000000600188     8 OBJECT  LOCAL  DEFAULT    3 p.1749
     8: 0000000000600190     4 OBJECT  LOCAL  DEFAULT    3 staticvar.1748
     9: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
    10: 0000000000600194     4 OBJECT  GLOBAL DEFAULT    3 sum
    11: 0000000000600198     0 NOTYPE  GLOBAL DEFAULT    3 __bss_start
    12: 000000000040010d    26 FUNC    GLOBAL DEFAULT    1 main
    13: 00000000004000e8    37 FUNC    GLOBAL DEFAULT    1 foo
    14: 0000000000600198     0 NOTYPE  GLOBAL DEFAULT    3 _edata
    15: 0000000000600198     0 NOTYPE  GLOBAL DEFAULT    3 _end

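After linking, the symbols have their values filled in, and sum and foo, which used to be undefined, each have an address and a section. This merged symbol table is the global symbol table.

Next, the linker gets each symbol's address from the global symbol table and uses the relocation tables to find out which instructions need patching and how. The relocation tables of foo.o and main.o: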

$ readelf -r foo.o

Relocation section '.rela.text' at offset 0x2a0 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000014  000c00000002 R_X86_64_PC32     0000000000000000 sum - 4
00000000001b  000300000002 R_X86_64_PC32     0000000000000000 .data + 0
00000000001f  000c0000000b R_X86_64_32S      0000000000000000 sum + 0

Relocation section '.rela.data' at offset 0x2e8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000008  000300000001 R_X86_64_64       0000000000000000 .data + 10

Relocation section '.rela.eh_frame' at offset 0x300 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0


$ readelf -r main.o

Relocation section '.rela.text' at offset 0x238 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000f  000a00000002 R_X86_64_PC32     0000000000000000 foo - 4

Relocation section '.rela.eh_frame' at offset 0x250 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0

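The Offset column gives the location that needs relocating, as an offset from the start of the section. For example, 0x14 in foo.o is where the address of sum is needed.

How the address is patched depends on the instruction. Roughly, there are two kinds, relative addressing and absolute addressing, and the type of the relocation entry tells you which one applies. Relative addressing stores the offset from the address of the next instruction; absolute addressing stores the symbol's absolute address. So the executable contains instructions that reach a symbol through an offset as well as instructions that use an absolute address. R_X86_64_PC32 is relative and R_X86_64_32S is absolute. Let's see how the instructions change after linking: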

$ objdump -d foo.o

Disassembly of section .text:

0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   89 75 f8                mov    %esi,-0x8(%rbp)
   a:   8b 55 fc                mov    -0x4(%rbp),%edx
   d:   8b 45 f8                mov    -0x8(%rbp),%eax
  10:   01 d0                   add    %edx,%eax
  12:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 18 <foo+0x18>
  18:   48 c7 05 00 00 00 00    movq   $0x0,0x0(%rip)        # 23 <foo+0x23>
  1f:   00 00 00 00 
  23:   5d                      pop    %rbp
  24:   c3                      retq   
  

$ objdump -d main.o

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   be 03 00 00 00          mov    $0x3,%esi
   9:   bf 05 00 00 00          mov    $0x5,%edi
   e:   e8 00 00 00 00          callq  13 <main+0x13>
  13:   b8 00 00 00 00          mov    $0x0,%eax
  18:   5d                      pop    %rbp
  19:   c3                      retq   


$ objdump -d foo

Disassembly of section .text:

00000000004000e8 <foo>:
  4000e8:       55                      push   %rbp
  4000e9:       48 89 e5                mov    %rsp,%rbp
  4000ec:       89 7d fc                mov    %edi,-0x4(%rbp)
  4000ef:       89 75 f8                mov    %esi,-0x8(%rbp)
  4000f2:       8b 55 fc                mov    -0x4(%rbp),%edx
  4000f5:       8b 45 f8                mov    -0x8(%rbp),%eax
  4000f8:       01 d0                   add    %edx,%eax
  4000fa:       89 05 94 00 20 00       mov    %eax,0x200094(%rip)        # 600194 <sum>
  400100:       48 c7 05 7d 00 20 00    movq   $0x600194,0x20007d(%rip)        # 600188 <p.1749>
  400107:       94 01 60 00 
  40010b:       5d                      pop    %rbp
  40010c:       c3                      retq   

000000000040010d <main>:
  40010d:       55                      push   %rbp
  40010e:       48 89 e5                mov    %rsp,%rbp
  400111:       be 03 00 00 00          mov    $0x3,%esi
  400116:       bf 05 00 00 00          mov    $0x5,%edi
  40011b:       e8 c8 ff ff ff          callq  4000e8 <foo>
  400120:       b8 00 00 00 00          mov    $0x0,%eax
  400125:       5d                      pop    %rbp
  400126:       c3                      retq

When foo.c and main.c are compiled, the compiler does not know the addresses of the external symbols they reference, so it puts 0 in the instruction. The real address is filled in at link time.

The instructions in foo.o go from

12:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 18 <foo+0x18>
18:   48 c7 05 00 00 00 00    movq   $0x0,0x0(%rip)        # 23 <foo+0x23>
1f:   00 00 00 00

to

4000fa:       89 05 94 00 20 00       mov    %eax,0x200094(%rip)        # 600194 <sum>
400100:       48 c7 05 7d 00 20 00    movq   $0x600194,0x20007d(%rip)        # 600188 <p.1749>
400107:       94 01 60 00

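mov uses relative addressing to access sum (0x600194): adding 0x200094 to the address of the next instruction, 0x400100, gives the address of sum, which you can verify against the symbol table. movq uses absolute addressing: at 0x400107 above you can see the address of sum written straight into the instruction. The bytes in the instruction look reversed because Intel x86 CPUs are little-endian (Endianness wiki).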
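To make the byte order concrete, here is a small sketch that decodes four instruction bytes the way a little-endian CPU reads them. readLittleEndian32 is a hypothetical helper written for this note, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a 32-bit value from 4 bytes stored least-significant byte first.
std::uint32_t readLittleEndian32(const unsigned char* bytes)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        // Byte i contributes bits [8*i, 8*i + 7].
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

Applied to the bytes 94 01 60 00 at 0x400107 it gives 0x600194, the address of sum. Applied to 94 00 20 00 from the mov instruction it gives the displacement 0x200094.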

For main.o,

e:   e8 00 00 00 00          callq  13 <main+0x13>

becomes

40011b:       e8 c8 ff ff ff          callq  4000e8 <foo>

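The callq instruction calls foo(). e8 is the opcode and ffffffc8 is the offset, which as a two's complement number is -56 in decimal (-0x38). The offset is relative to the next instruction, so 0x400120 - 0x38 = 0x4000e8, which is exactly the address of foo().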
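The patched values can be reproduced from the relocation tables. The sketch below uses the formulas from the x86-64 psABI, S + A - P for R_X86_64_PC32 and S + A for R_X86_64_32S, where S is the symbol's address, A the addend, and P the address of the field being patched. The function names relocPc32 and reloc32S are made up for this note.

```cpp
#include <cstdint>

// R_X86_64_PC32: 32-bit offset from the patched field to the target.
// S = symbol address, A = addend, P = address of the field being patched.
std::uint32_t relocPc32(std::uint64_t S, std::int64_t A, std::uint64_t P)
{
    // Unsigned wrap-around plus truncation yields the two's complement value.
    return static_cast<std::uint32_t>(S + A - P);
}

// R_X86_64_32S: the symbol's absolute address (plus addend) in 32 bits.
std::uint32_t reloc32S(std::uint64_t S, std::int64_t A)
{
    return static_cast<std::uint32_t>(S + A);
}
```

For the call in main: S = 0x4000e8 (foo), A = -4, and P = 0x40010d + 0xf = 0x40011c, so the result is 0xffffffc8. For sum in foo: S = 0x600194, A = -4, P = 0x4000e8 + 0x14 = 0x4000fc, which gives 0x200094. The addend of -4 accounts for the CPU measuring the offset from the next instruction, 4 bytes past the patched field.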

Ref

《程式設計師的自我修養》 ch 4
Endianness wiki

Template Method Pattern

Posted on 2016-04-04 Edited on 2025-03-25

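This pattern builds a template for an algorithm. One method defines the skeleton of the algorithm, and the individual steps are defined in derived classes. This lets you change how certain steps are done without changing the structure of the algorithm.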

UML

Template Method

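The template method defines the skeleton of the algorithm, and derived classes change the algorithm's behavior by overriding the step functions it calls.

The base class can define the shared operations. Some operations are conceptually required to be implemented by the derived class; in C++ these can be pure virtual functions. A hook usually has a default implementation in the base class, and a derived class may optionally override it. Depending on how the template method uses the hook, overriding it can change the algorithm's behavior, for example whether a certain step runs at all.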
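A minimal sketch of these roles in C++. The Report, SalesReport, and BareReport classes are invented for illustration: render() is the template method, header() is a shared operation, body() is a required step, and wantsFooter() is a hook with a default.

```cpp
#include <string>

class Report
{
public:
    virtual ~Report() {}

    // Template method: the skeleton is fixed here and is not virtual.
    std::string render() const
    {
        std::string out = header();
        out += body();
        if (wantsFooter())      // the hook decides whether this step runs
            out += footer();
        return out;
    }

protected:
    std::string header() const { return "[report]\n"; }   // shared operation
    virtual std::string body() const = 0;                  // required step
    virtual bool wantsFooter() const { return true; }      // hook with default

private:
    std::string footer() const { return "[end]\n"; }
};

class SalesReport : public Report
{
protected:
    std::string body() const override { return "sales: 42\n"; }
};

class BareReport : public Report
{
protected:
    std::string body() const override { return "nothing to see\n"; }
    bool wantsFooter() const override { return false; }    // skip the footer
};
```

SalesReport only supplies the required step and keeps the footer. BareReport also overrides the hook, so render() skips the footer without the skeleton itself changing.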

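Applications

Many UI frameworks, such as Java's UI frameworks and Qt, have painting functions like paint() and event handling functions (for mouse events, say) that use the Template Method pattern. The framework has already decided when these functions get called, and the UI components written by the user override them as needed to decide what actually happens: what to draw, what to do on a mouse click, and so on.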

Facade Pattern

Posted on 2016-04-03 Edited on 2018-10-26

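A Facade defines a simpler (more abstract) interface so that clients can use a complex subsystem more easily.

The goal is to simplify how the subsystem is used.

When to use

The subsystem offers many features and interfaces but is too complicated, and you want a simple way to use it.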

UML

Facade Pattern

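A Facade does not encapsulate the subsystem. It only provides a simplified interface for convenience.

Clients can use the Facade's simple interface or the subsystem's original interfaces. It is like software whose settings page shows only the common options, while users who need fine-grained control click the "Advanced" button to get to them.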
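A small sketch of the idea, with an invented home theater subsystem (Projector, Amplifier, Player) and an invented HomeTheaterFacade. The facade bundles the common sequence into one call, while the subsystem classes stay directly usable.

```cpp
#include <string>
#include <vector>

// Hypothetical subsystem classes, each with its own interface.
class Projector
{
public:
    std::string on() const { return "projector on"; }
};

class Amplifier
{
public:
    std::string setVolume(int level) const { return "volume " + std::to_string(level); }
};

class Player
{
public:
    std::string play(const std::string& title) const { return "playing " + title; }
};

// Facade: one simple call for the common case. It holds references only and
// hides nothing, so clients can still talk to the subsystem directly.
class HomeTheaterFacade
{
public:
    HomeTheaterFacade(Projector& projector, Amplifier& amplifier, Player& player)
        : projector_(projector), amplifier_(amplifier), player_(player) {}

    // Returns the steps performed, in order, so the effect is observable.
    std::vector<std::string> watchMovie(const std::string& title) const
    {
        std::vector<std::string> steps;
        steps.push_back(projector_.on());
        steps.push_back(amplifier_.setVolume(5));
        steps.push_back(player_.play(title));
        return steps;
    }

private:
    Projector& projector_;
    Amplifier& amplifier_;
    Player& player_;
};
```

A client that just wants to watch something calls watchMovie(). A client that wants a different volume can still call Amplifier::setVolume() itself, which is the "Advanced" button in the analogy above.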

Adapter Pattern

Posted on 2016-04-02 Edited on 2025-03-25

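Converts the interface of one class into another interface for others to use, so that otherwise incompatible interfaces can work together.

Think of a plug adapter.

When to use

You want class B to do the job of class A without changing the code that uses class A, so you use an Adapter to convert class B's interface into class A's.

Because an Adapter is limited by what the Adaptee can do, it cannot always implement everything the interface promises perfectly. This is usually handled through documentation (everyone agrees on the limitation) or exceptions. I used to think an Adapter had to fully implement everything the interface offers, so when I ran into a limited case I was a bit confused about whether it still counted as an adapter...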

UML

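Adapter Pattern (composition)

The Client only knows the Target interface and not the Adaptee's, so the Client and the Adaptee are decoupled. If both kinds of interface are needed at once, an Adapter can also implement several Target interfaces. For example, with two interfaces Target1 and Target2, where older code uses Target1 and newer code uses Target2, an Adapter that supports both means the existing code does not have to change.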
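Here is a sketch of the composition form. All names are invented: Logger plays the Target, LegacyPrinter the Adaptee, and PrinterLoggerAdapter the Adapter. It also shows the limited case mentioned earlier, where the Adaptee cannot support part of the Target interface and the Adapter reports that with an exception.

```cpp
#include <stdexcept>
#include <string>

// Target: the interface the client code is written against.
class Logger
{
public:
    virtual ~Logger() {}
    virtual std::string info(const std::string& msg) = 0;
    virtual std::string debug(const std::string& msg) = 0;
};

// Adaptee: an existing class with an incompatible interface.
class LegacyPrinter
{
public:
    std::string printLine(const std::string& text) const { return "> " + text; }
};

// Adapter (composition): holds the Adaptee and translates Target calls.
class PrinterLoggerAdapter : public Logger
{
public:
    explicit PrinterLoggerAdapter(const LegacyPrinter& printer) : printer_(printer) {}

    std::string info(const std::string& msg) override
    {
        return printer_.printLine("INFO " + msg);
    }

    // In this sketch the Adaptee has no notion of debug output, so the
    // limitation is surfaced as an exception.
    std::string debug(const std::string&) override
    {
        throw std::logic_error("debug output is not supported by LegacyPrinter");
    }

private:
    const LegacyPrinter& printer_;
};
```

Client code holds a Logger& and never sees LegacyPrinter, which is the decoupling described above.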

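I am more used to having the Adapter use the Adaptee through composition. Another approach is inheritance. I do not fully understand its benefits or when to use it, so for now I am just noting that it exists:

Adapter Pattern (inheritance)

Comparison with other patterns

An Adapter converts an interface.
A Facade provides a simple interface that makes a subsystem easy for others to operate. The difference between Adapter and Facade lies in their purpose.
A Decorator adds functionality.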

select()

Posted on 2016-03-27 Edited on 2025-03-25

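The kernel watches whether certain fds become active (readable, writable, or in an error state) and returns when any do, so the application process can handle the active fds accordingly. Using select() saves the application process from polling each socket to see whether it is active, which wastes CPU. If no fd is active, no timeout is set, and no signal interrupts it, select() blocks.

Normally select() returns the total number of active fds across the three fd_sets. On timeout it returns 0. When interrupted by a signal it returns -1 with errno set to EINTR; it does not test the fds and does not modify the fd_sets, so the fd_sets cannot be used to tell which fds are active. The reason select() leaves the fd_sets untouched when interrupted by a signal is to avoid an infinite loop in which select() and a signal handler keep modifying the same flag. For example, select() sees a flag at 0 and sets it to 1, a signal handler sees the flag at 1 and sets it back to 0, and so on forever.

pselect() lets you specify which signals to block so that they do not interrupt pselect().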
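The three return cases can be handled with a small wrapper like the one below. It is a sketch, not a drop-in utility: waitReadable is a hypothetical helper, it watches a single fd, and it simply restarts with the full timeout after EINTR instead of tracking the remaining time.

```cpp
#include <cerrno>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Wait until fd is readable.
// Returns 1 if readable, 0 on timeout, -1 on an error other than EINTR.
int waitReadable(int fd, int timeoutSec)
{
    for (;;)
    {
        // Rebuild the set and the timeout on every attempt: a successful
        // select() overwrites the set, and the timeout may be modified too.
        fd_set rfdset;
        FD_ZERO(&rfdset);
        FD_SET(fd, &rfdset);

        struct timeval timeout;
        timeout.tv_sec = timeoutSec;
        timeout.tv_usec = 0;

        int active = select(fd + 1, &rfdset, nullptr, nullptr, &timeout);

        if (active == -1 && errno == EINTR)
            continue;           // interrupted by a signal: nothing was tested, retry

        if (active <= 0)
            return active;      // 0 = timeout, -1 = real error

        return FD_ISSET(fd, &rfdset) ? 1 : 0;
    }
}
```

On an empty pipe, waitReadable(readEnd, 0) returns 0 right away. After one byte is written to the other end it returns 1.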

Sample Code

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <netinet/in.h>   // struct sockaddr_in
#include <arpa/inet.h>
#include <unistd.h>
#include <strings.h>      // bzero()
#include <cstring>
#include <cstdlib>
#include <cerrno>
#include <set>
#include <iostream>

#define MAXBUF 1000

using namespace std;

void run(int listenPort, int qlen);
int CreateListenSock(int listenPort, int qlen);

int main()
{
    run(8899, 5);
    return 0;
}

void run(int listenPort, int qlen)
{
    fd_set afdset, rfdset, wfdset, efdset;
    int listenfd = CreateListenSock(listenPort, qlen);
    int maxfd = listenfd;
    set<int> fds;
    bool bNeedWrite = false;

    FD_ZERO(&afdset); FD_ZERO(&rfdset); FD_ZERO(&wfdset); FD_ZERO(&efdset);
    FD_SET(listenfd, &afdset);

    fds.insert(listenfd);

    while (true)
    {
        int iActive = 0;
        struct timeval timeout;

        rfdset = afdset;
        efdset = afdset;

        if (bNeedWrite)
        {
            wfdset = afdset;
        }
        else
        {
            FD_ZERO(&wfdset);
        }

        timeout.tv_sec = 3;
        timeout.tv_usec = 0;

        if ((iActive = select(maxfd + 1, &rfdset, &wfdset, &efdset, &timeout)) == -1)
        {
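            // handle error
            if (errno == EINTR)
            {
                // interrupted by a signal: the fd_sets were not modified, so just loop and retry
            }
            else
            {
                // any other error from select()
            }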
        }
        else
        {
            int iHandled = 0;
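            set<int>::iterator fdIter = fds.begin();
            while (fdIter != fds.end() && iHandled < iActive)
            {
                // Advance before handling fd: the read branch may erase fd from
                // fds, which would invalidate an iterator still pointing at it.
                int fd = *fdIter++;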

                if (FD_ISSET(fd, &rfdset))
                {
                    if (fd == listenfd)
                    {
                        // handle new connection
                        struct sockaddr_in cliaddr;
                        socklen_t cliaddrlen = sizeof(cliaddr);

                        bzero((char *)&cliaddr, sizeof(cliaddr));

                        int connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddrlen);

                        // only track the new fd if accept() succeeded
                        if (connfd >= 0)
                        {
                            fds.insert(connfd);
                            FD_SET(connfd, &afdset);
                            if (connfd > maxfd)
                            {
                                maxfd = connfd;
                            }
                        }
                    }
                    else
                    {
                        // handle read
                        char readBuf[MAXBUF];
                        int iRead = 0;

                        bzero(readBuf, MAXBUF);

                        if ((iRead = read(fd, &readBuf, MAXBUF - 1)) > 0)
                        {
                            readBuf[iRead] = 0;
                        }
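                        else if (iRead == 0)
                        {
                            // peer closed the connection
                            close(fd);
                            FD_CLR(fd, &afdset);
                            // fd is closed now, so make the write/error checks below skip it
                            FD_CLR(fd, &wfdset);
                            FD_CLR(fd, &efdset);
                            fds.erase(fd);
                        }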
                        else
                        {
                            // handle read error
                        }
                    }

                    iHandled++;
                }

                if (FD_ISSET(fd, &wfdset))
                {
                    // handle write
                    iHandled++;
                }

                if (FD_ISSET(fd, &efdset))
                {
                    // handle error
                    iHandled++;
                }
            }
        }
    }
}

int CreateListenSock(int listenPort, int qlen)
{
    struct sockaddr_in servAddr;
    int listenfd = -1;

    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        cerr << "Create socket failed" << endl;
        exit(1);
    }

    bzero((char *)&servAddr, sizeof(servAddr));
    servAddr.sin_family = AF_INET;
    servAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servAddr.sin_port = htons(listenPort);

    if (bind(listenfd, (struct sockaddr *)&servAddr, sizeof(servAddr)) < 0)
    {
        cerr << "Bind socket failed" << endl;
        exit(1);
    }

    if (listen(listenfd, qlen) < 0)
    {
        cerr << "Listen on socket failed" << endl;
        exit(1);
    }

    return listenfd;
}

C++ toolchain on windows and linux

Posted on 2016-03-06 Edited on 2025-03-25

cl /I <include path>
link /LIBPATH:<library path> /OUT:<output file>

cl /I "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include" /c hello.cpp
link /LIBPATH:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\lib" /LIBPATH:"C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib" /OUT:hello.exe hello.obj

The library path includes C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib mainly for kernel32.lib.

That said, I rarely use cl directly. Usually I just use the whole Visual Studio IDE.

Handling self assignment in C++ operator=()

Posted on 2015-12-13 Edited on 2025-03-25

self assignment: an object is assigned to itself.

When a class has pointer members, operator=() has to handle self assignment carefully. If you delete your own member first, you have also deleted rhs's member, and after the assignment you end up with a dangling pointer.

Foo& Foo::operator=(const Foo& rhs)
{
    Bitmap* pOrig = pb;    // pb is a Bitmap* member of Foo

    // Copy first and delete afterwards, so this is safe even when rhs is *this.
    if (rhs.pb != NULL)
        pb = new Bitmap(*rhs.pb);
    else
        pb = NULL;

    delete pOrig;
    return *this;
}

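This approach handles the pointer member, but after a self assignment the pointer member points to a different location. The other approach is to check whether this is the same as &rhs and only do the copy when they differ.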
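A sketch of the second approach, the identity test. The Bitmap and Foo definitions here are minimal stand-ins made up so the example compiles on its own; only the shape of operator=() matters.

```cpp
// Minimal stand-in for the Bitmap type used above (hypothetical).
struct Bitmap
{
    int pixels;
    explicit Bitmap(int p) : pixels(p) {}
};

class Foo
{
public:
    explicit Foo(int pixels) : pb(new Bitmap(pixels)) {}
    Foo(const Foo& rhs) : pb(rhs.pb ? new Bitmap(*rhs.pb) : nullptr) {}
    ~Foo() { delete pb; }

    Foo& operator=(const Foo& rhs)
    {
        if (this == &rhs)       // identity test: assigning to itself, nothing to do
            return *this;

        // Allocate the copy before deleting, so a throwing new leaves *this intact.
        Bitmap* pNew = rhs.pb ? new Bitmap(*rhs.pb) : nullptr;
        delete pb;
        pb = pNew;
        return *this;
    }

    const Bitmap* bitmap() const { return pb; }

private:
    Bitmap* pb;
};
```

With the identity test, a self assignment leaves the pointer member pointing at the same Bitmap, unlike the copy-then-delete version above.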

Ref

《Effective C++》

Strong Symbol and Weak Symbol

Posted on 2015-12-13 Edited on 2025-03-25

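A symbol definition is either a strong symbol or a weak symbol. By default the compiler treats functions and initialized global variables as strong symbols and uninitialized global variables as weak symbols. (This default for uninitialized globals is the traditional C behavior. In C++, and in C built with GCC's -fno-common, which is the default since GCC 10, an uninitialized global is an ordinary strong definition.) Strong and weak symbols matter when the linker deals with duplicate definitions:

A strong symbol may not be defined more than once; doing so is a link error.
If a symbol is strong in one object file and weak in all the others, the strong one is chosen.
If all of them are weak, the one with the largest type size is chosen.

Usage

In GCC, __attribute__((weak)) turns a symbol that would be strong into a weak symbol: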

weaksym.cpp

__attribute__((weak)) int x = 2; // weak symbol

main.cpp

#include <iostream>

int x = 123;    // strong symbol

int main() {
    std::cout << x << std::endl;    // result is 123
    return 0;
}

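Weak symbols make it possible to swap out a function at link time. Ship a default implementation marked as a weak symbol, and let users write their own function, compile it into an object file, and link it in. Because the user's function is a strong symbol, it overrides the default implementation, so the implementation is replaced at the link stage.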

weakfoo.cpp

#include <iostream>

extern void foo() __attribute__ ((weak));

void foo() { std::cout << "default foo" << std::endl; }

int main() {
    foo();
    return 0;
}

foo.cpp

#include <iostream>

void foo() { std::cout << "custom foo" << std::endl; }

> g++ -c weakfoo.cpp -o weakfoo.o
> g++ -c foo.cpp -o foo.o
> g++ weakfoo.o
> ./a.out
default foo
> g++ weakfoo.o foo.o
> ./a.out
custom foo

Ref

《程式設計師的自我修養》3.5.5
https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html
https://en.wikipedia.org/wiki/Weak_symbol

DLL usage

Posted on 2015-11-22 Edited on 2025-03-25

Linking against a DLL is either implicit or explicit.

Implicit link

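A DLL has to export its symbols, and a program that uses the DLL has to import them. In VC, __declspec(dllexport) marks a symbol to export and __declspec(dllimport) marks a symbol to import from outside. To make a C++ symbol compatible with C, add extern "C" (which turns off C++ name mangling).

A program that uses the library needs:

the header file declaring the library's exported symbols, at compile time
the library's .lib to link against; the .lib is produced together with the DLL when the DLL is built
the .dll at run time

The .lib produced when building a DLL is not the same as a static library's .lib. A DLL's .lib only tells the program where to find the DLL; it contains none of the actual functionality, so the file is smaller. A program that implicitly links a DLL must be able to find the DLL at load time, otherwise an error message pops up and the program cannot run.

For the library project, set project properties→General→Configuration Type to Dynamic Library (.dll) and Target Extension to .dll in VC. Also, if the code does not actually export any symbols, building the DLL does not produce a .lib.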

Sample

Foo library

FOO_DLL_EXPORTS must be defined when compiling.

Foo.h

#ifndef FOO_H
#define FOO_H

#ifdef FOO_DLL_EXPORTS
	#define FOO_API __declspec(dllexport)
#else
	#define FOO_API __declspec(dllimport)
#endif

FOO_API int Add(int a, int b);
extern "C" FOO_API int Sub(int a, int b);
#endif

Foo.cpp

#include "Foo.h"
int Add(int a, int b) { return (a + b); }
int Sub(int a, int b) { return (a - b); }

dumpbin can only be run from the VC command prompt.

Exported symbols

D:\tmp\FooLibrary\Debug>dumpbin /EXPORTS FooLibrary.dll
Dump of file FooLibrary.dll

File Type: DLL

  Section contains the following exports for FooLibrary.dll

    00000000 characteristics
    56507D3B time date stamp Sat Nov 21 22:18:35 2015
        0.00 version
           1 ordinal base
           2 number of functions
           2 number of names

    ordinal hint RVA      name

          1    0 0001107D ?Add@@YAHHH@Z = @ILT+120(?Add@@YAHHH@Z)
          2    1 000110FA Sub = @ILT+245(_Sub)

Test program

The project's include files must include Foo.h, the linked libraries must include FooLibrary.lib, and FooLibrary.dll must sit next to the executable.

main.cpp

#include <iostream>
#include "Foo.h"
int main()
{
    std::cout << Add(1, 2) << ", " << Sub(1, 2) << std::endl;
    return 0;
}

Explicit link

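The DLL is loaded at run time. Because loading happens at run time, the program can handle a failed DLL load itself and keep running.

A program that uses the library needs to:

call LoadLibrary() to load the DLL
call GetProcAddress() to get the address of the function it wants
call FreeLibrary() when it is done with the library
The library's header file is not strictly needed at compile time (though you do need to know the prototype of the function you call), the .lib is not needed at link time, and only the .dll is needed at run time

The function name passed to GetProcAddress() is the symbol the library exports, not the function name in the library's source code. After C++ name mangling the required name becomes hard to read, and probably nobody wants to use an interface like that. Besides __declspec(dllexport), another way to export function symbols is to declare the names in a .def module definition file. In effect this assigns an alias to the original symbol.

Sample load DLL

The library source code is the same as above.

Sub() has extern "C" and is not mangled, so its function name can be used directly. For Add(), you have to spell out the mangled C++ symbol name to get the function pointer.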

main.cpp

#include <iostream>
#include <windows.h>

typedef int(*pfn)(int, int);

int main()
{
    // LoadLibraryA takes a narrow string, so this also compiles when UNICODE is defined
    HINSTANCE dllHandle = LoadLibraryA("FooLibrary.dll");

    if (dllHandle != NULL)
    {
        // Get address of function
        pfn pSubFunc = (pfn)GetProcAddress(dllHandle, "Sub");

        if (!pSubFunc)
        {
            std::cout << "Load Sub() fail" << std::endl;  // handle the error
        }
        else
        {
            std::cout << pSubFunc(2, 3) << std::endl; // call the function
        }

        pfn pAddFunc = (pfn)GetProcAddress(dllHandle, "?Add@@YAHHH@Z");

        if (!pAddFunc)
        {
            std::cout << "Load Add() fail" << std::endl;
        }
        else
        {
            std::cout << pAddFunc(2, 3) << std::endl;
        }

        FreeLibrary(dllHandle);
    }
    else
    {
        std::cout << "Load FooLibrary.dll fail" << std::endl;
    }

    return 0;
}

Sample module definition file

Remove the __declspec(dllexport) from Foo.h.

Foo.def

LIBRARY FooLibrary
EXPORTS
Add
Sub

D:\tmp\FooLibrary\Debug>dumpbin /EXPORTS FooLibrary.dll
Dump of file FooLibrary.dll

File Type: DLL

  Section contains the following exports for FooLibrary.dll

    00000000 characteristics
    56516FE9 time date stamp Sun Nov 22 15:34:01 2015
        0.00 version
           1 ordinal base
           2 number of functions
           2 number of names

    ordinal hint RVA      name

          1    0 0001107D Add = @ILT+120(?Add@@YAHHH@Z)
          2    1 000110FA Sub = @ILT+245(_Sub)

Now GetProcAddress() in main.cpp can use Add and Sub directly.

~~This is really just notes from a small experiment...~~

trace code

Posted on 2015-08-30 Edited on 2025-03-25

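I have been reading a fair amount of code lately and noticed that I trace code differently in different situations, so here is a bit of murmuring about it.

When I get a codebase, I usually start with the structure of its main components. If there is a UI, I first get a rough idea of which components exist, what they are called, and how they nest, for example which container holds what. If it is a website, I first check whether the folder structure is hand-rolled or comes from a framework. After that, how I trace depends on what I need to do.

The first kind: debugging or looking for a specific feature.

I think I first became aware of this way of tracing code when I worked part-time at the university computer center. I wanted to do a simple integration between Wordpress and another system, so I had to find the corresponding functionality inside Wordpress. Since I started working, debugging has often meant reading code this way too.

If a UI action is involved, start from where the UI action is triggered and follow it all the way down.
For simple things it may be enough to find a specific function and tweak it or add a bit of functionality.
For anything slightly more complex you have to understand what the whole path is doing.

This approach looks at just one path through the program, one specific piece of logic, and only widens out if the bug turns out to be architectural. If you already know the code well, you do not have to start from the very top, of course...

The second kind: wanting to know the overall structure or how the program works.

The aim is a broad understanding of the structure and the high-level logic without going into detail. The focus is on architecture and concepts. I look at the rough flow of the logic but do not read how each function is implemented. This is a way of reading that I evolved(?) recently.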

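Find the most important component as a starting point. Usually the main component is named after the project. If you cannot find it, start from main().
Guess what each class is for from its name ~~guessing game~~
- If the name does not tell you, look at the public functions to see what the class offers
- If the function names do not help either, look at how it is used elsewhere or skim the implementation
- Sometimes you hit a case of "this class is just a pile of features and I cannot see what the name has to do with them"...
When you need to understand a particular flow
- Follow the logic much like the first kind, but only in outline, without reading the implementation closely.
- Occasionally I read some implementation, but it is like sweeping over it with a big brush... while reading implementation in the first kind is more like fine engraving... ~~what am I even writing... Orz...~~
Understand how the objects relate to each other, usually by looking at members.
- If a member is a pointer, note whether the object created it itself or it was passed in from outside.
When you meet certain keywords, such as XXXFactory or XXXObserver, apply the concept you already know.
- Only note who has this kind of relationship with whom, not how the relationship is implemented. Of course, sometimes the keyword is there but I do not know the concept, so I have nothing to apply... XD
When you meet a common idiom, apply the concept behind that idiom.
- For example, select() is usually a while loop that fills the fdsets, calls select(), and then acts on the fdsets. Some event loop implementations look similar.

I think that when you can apply known concepts to keywords and common idioms, things go much faster, because you do not have to read the implementation to understand what that part does. Sometimes I do need to read the implementation, because even after piecing together the other clues I still do not know what a piece of code is doing, and the only option is to abstract the implementation details back into a concept.

As for how to find code? Long live find all and grep --color -nr <pattern> *! (hey)

PS: I wrote this to figure out how I trace code myself, but somehow it still reads like something you can only grasp intuitively and cannot really put into words... Orz...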
