
My View of Mutual Exclusion in the Kernel

Date: 2006-12-12  Source: the web

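/*e4gle: While modifying the Linux source code I was plagued by a great many kernel mutual-exclusion problems that had to be solved with kernel locks. I eventually solved most of them and felt I should leave something behind, but never found the time to write it up. Then I happened upon this fellow's article, which is exactly what I had meant to put together, so I am sharing it here. On the subject of bottom halves and interrupts: you must never do file reads or writes in the TCP/IP bottom half, or the kernel panics. As it happens, my enhancements to Linux did exactly that, which frustrated me for a long time. Discussion is welcome.
  */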
  My View of Mutual Exclusion in the Kernel
  by wheelz

  Having read everyone's earlier discussion, I have some thoughts of my own to put forward for discussion.
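  It should be made clear that the choice of mutual-exclusion mechanism depends not on the size of the critical section but on its nature, and on which code, that is, which kernel execution paths, contend for it.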
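  Strictly speaking, semaphore and spinlock_XXX are mutual-exclusion mechanisms at different levels, and the implementation of the former relies on the latter. This is somewhat like the relationship between HTTP and TCP: both are protocols, but they sit at different layers.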
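  Take semaphore first. It is process-level: it provides mutual exclusion over a resource among multiple processes. Although it operates inside the kernel, the kernel execution path contends for the resource in the capacity of a process, on that process's behalf. If it fails to get the resource, a context switch occurs: the process can go to sleep, but the CPU does not stop; it goes on to run other execution paths. Conceptually, this has no direct relationship to whether there is one CPU or several. Only in the implementation of semaphore itself, to keep access to the semaphore structure atomic, is a spinlock needed for mutual exclusion on multiprocessors.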
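The layering described above, a sleeping lock whose own bookkeeping is guarded by a spinlock, can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the kernel's semaphore: the names sketch_semaphore, sketch_down_trylock, sketch_down and sketch_up are hypothetical, a C11 atomic exchange stands in for the kernel spinlock, and sched_yield() stands in for putting the process to sleep and calling schedule().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space analogue of a semaphore built on a spinlock. */
struct sketch_semaphore {
    atomic_bool guard;  /* spinlock protecting 'count' */
    int count;          /* how many more holders may enter */
};

static void guard_lock(struct sketch_semaphore *s)
{
    /* Spin: the guarded region is only a few instructions long. */
    while (atomic_exchange_explicit(&s->guard, true, memory_order_acquire))
        ;
}

static void guard_unlock(struct sketch_semaphore *s)
{
    atomic_store_explicit(&s->guard, false, memory_order_release);
}

void sketch_sema_init(struct sketch_semaphore *s, int count)
{
    assert(count >= 0);
    atomic_init(&s->guard, false);
    s->count = count;
}

/* One attempt: take the semaphore if it is free. The guard is
 * released before returning in every case. */
bool sketch_down_trylock(struct sketch_semaphore *s)
{
    bool acquired = false;

    guard_lock(s);
    if (s->count > 0) {
        s->count--;
        acquired = true;
    }
    guard_unlock(s);
    return acquired;
}

/* Block until the semaphore is obtained. Rather than spinning on the
 * resource, the caller gives the CPU away between attempts;
 * sched_yield() stands in for the kernel's sleep + schedule(). */
void sketch_down(struct sketch_semaphore *s)
{
    while (!sketch_down_trylock(s))
        sched_yield();
}

void sketch_up(struct sketch_semaphore *s)
{
    guard_lock(s);
    s->count++;
    guard_unlock(s);
}
```

The design point is that the spinlock only covers the test-and-update of count; the waiting itself happens with the spinlock released, so other CPUs are never left spinning on a sleeping holder.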
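  In the kernel, the more common need is mutual exclusion of data access among the kernel's various execution paths. This is the most fundamental mutual-exclusion problem: keeping data modifications atomic, and the implementation of semaphore depends on it too. On a single CPU the concern is mainly interrupts and bottom halves, so disabling and re-enabling interrupts is enough. On multiple CPUs there is additional interference from the other CPUs, so a spinlock is needed as well. Combining these two parts gives spinlock_XXX. Its defining property is that once a CPU enters spinlock_XXX, it does nothing else but spin idly until it acquires the lock. This dictates that a critical section protected by spinlock_XXX must not pause, let alone context switch: it should access the data and get out quickly, so that other execution paths spinning on the lock can acquire it. This is the core principle of spinlocks. If the current execution path really must context switch, it has to release the spinlock before calling schedule(); otherwise deadlock comes easily. The reason is that interrupts and bottom halves have no process context, cannot context switch, and can only spin while waiting for the spinlock; once you have context-switched away, there is no telling how long it will be before you come back.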
  Since the whole purpose of a spinlock is to make data modifications atomic, there is no reason to linger inside the critical section it protects.
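A spinlock of the kind just described can be sketched with a C11 atomic exchange. This is an illustrative user-space version with hypothetical names (sketch_spinlock_t and its functions); the kernel's real spinlocks are architecture-specific and, in their _irq and _bh forms, also mask interrupts or bottom halves. The bump_counter() helper shows the intended shape of a critical section: take the lock, touch the data, leave immediately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock. */
typedef struct {
    atomic_bool locked;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { false }

/* Busy-wait: the CPU does nothing else until the lock becomes free. */
void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ; /* spin */
}

/* Single attempt; returns true if the lock was taken. */
bool sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void sketch_spin_unlock(sketch_spinlock_t *l)
{
    /* Releasing a lock that is not held is a bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* A critical section in the intended style: short, nothing that sleeps. */
void bump_counter(sketch_spinlock_t *l, long *counter)
{
    sketch_spin_lock(l);
    ++*counter;
    sketch_spin_unlock(l);
}
```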
  spinlock_XXX comes in many forms:
  spin_lock()/spin_unlock()
  spin_lock_irq()/spin_unlock_irq()
  spin_lock_irqsave()/spin_unlock_irqrestore()
  spin_lock_bh()/spin_unlock_bh()
  local_irq_disable()/local_irq_enable()
  local_bh_disable()/local_bh_enable()
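How the _irq and _irqsave forms are layered on top of a plain spinlock can be modelled on a single simulated CPU. In this sketch a global boolean stands in for the interrupt-enable flag (the IF bit in EFLAGS on x86), and every name is hypothetical; in the kernel, flags is an unsigned long passed to the macro by name rather than through a pointer. The model shows why spin_unlock_irqrestore() is the safe choice when the caller may already have interrupts disabled: spin_unlock_irq() turns them back on unconditionally.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One simulated CPU; all names are hypothetical. */
static bool sim_irq_enabled = true;   /* stands in for the IF bit in EFLAGS */

typedef struct {
    atomic_bool locked;
} sim_spinlock_t;

static void sim_spin_lock(sim_spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ; /* spin */
}

static void sim_spin_unlock(sim_spinlock_t *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

bool sim_irqs_enabled(void) { return sim_irq_enabled; }

/* local_irq_disable()/local_irq_enable(): no state is remembered. */
static void sim_local_irq_disable(void) { sim_irq_enabled = false; }
static void sim_local_irq_enable(void)  { sim_irq_enabled = true; }

/* spin_lock_irq(): mask interrupts, then take the lock.
 * spin_unlock_irq(): release, then unmask unconditionally. */
void sim_spin_lock_irq(sim_spinlock_t *l)
{
    sim_local_irq_disable();
    sim_spin_lock(l);
}

void sim_spin_unlock_irq(sim_spinlock_t *l)
{
    sim_spin_unlock(l);
    sim_local_irq_enable();
}

/* spin_lock_irqsave(): remember the previous interrupt state first, so
 * spin_unlock_irqrestore() puts back exactly that state. */
void sim_spin_lock_irqsave(sim_spinlock_t *l, bool *flags)
{
    *flags = sim_irq_enabled;
    sim_local_irq_disable();
    sim_spin_lock(l);
}

void sim_spin_unlock_irqrestore(sim_spinlock_t *l, bool flags)
{
    sim_spin_unlock(l);
    sim_irq_enabled = flags;
}
```

In a nested case, restoring the saved state keeps interrupts off until the outer lock is released; with the _irq pair, the inner unlock re-enables interrupts while the outer lock is still held.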
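  So which one should be used, and when? That depends on which kernel execution path you are in and which kernel execution paths you must be mutually exclusive with. The main execution paths in the kernel are:
  1 The kernel mode of a user process. Here there is a process context; the path is mainly executing system calls and the like on behalf of the process.
  2 Interrupts, exceptions, traps and so on. Conceptually there is no process context here, and no context switch can be made.
  3 Bottom halves (bottom_half). Conceptually there is no process context here either.
  4 In addition, the same execution path may also be running on another CPU.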
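Paths 2 and 3 have no process context, so code running there must never sleep; the author applies the same rule to any code holding a spinlock. The bookkeeping needed to catch violations can be sketched as a per-CPU counter, loosely in the spirit of the kernel's own atomic-context accounting but with entirely hypothetical names (ctx_enter, ctx_exit, may_sleep). It is also consistent with e4gle's note that file I/O in a TCP/IP bottom half makes the kernel panic, on the assumption that the file I/O path may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Reasons a (simulated) CPU is in a state where it must not sleep. */
enum atomic_reason {
    IN_INTERRUPT,      /* path 2: interrupt, exception or trap handler */
    IN_BOTTOM_HALF,    /* path 3: bottom half */
    HOLDING_SPINLOCK,  /* any path while it holds a spinlock */
    NR_ATOMIC_REASONS
};

static int atomic_depth[NR_ATOMIC_REASONS];

void ctx_enter(enum atomic_reason r)
{
    atomic_depth[r]++;
}

void ctx_exit(enum atomic_reason r)
{
    assert(atomic_depth[r] > 0);   /* unbalanced exit */
    atomic_depth[r]--;
}

/* Only path 1 (a process in kernel mode, holding no spinlock) may sleep,
 * e.g. wait on a semaphore or block on file I/O. */
bool may_sleep(void)
{
    for (int r = 0; r < NR_ATOMIC_REASONS; r++)
        if (atomic_depth[r] != 0)
            return false;
    return true;
}
```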
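  Taking these four factors into account, work out which of them can access the data you need to protect; that decides which form of spinlock to use:
  If you only need mutual exclusion with other CPUs, use spin_lock/spin_unlock.
  If you need mutual exclusion with interrupts and other CPUs, use spin_lock_irq/spin_unlock_irq.
  If you need mutual exclusion with interrupts and other CPUs and must also preserve the previous state of EFLAGS, use spin_lock_irqsave/spin_unlock_irqrestore.
  If you need mutual exclusion with bottom halves and other CPUs, use spin_lock_bh/spin_unlock_bh.
  If you need no mutual exclusion with other CPUs, only with interrupts, use local_irq_disable/local_irq_enable.
  If you need no mutual exclusion with other CPUs, only with bottom halves, use local_bh_disable/local_bh_enable.
  And so on. It is worth pointing out that, for the same piece of data, the form used may differ from one kernel execution path to another (see the example below).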
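The selection rule above can be written down as a small decision function. This is only an encoding of the text's rule with hypothetical names (enum contender, enum lock_form, choose_lock), plus three assumptions added here and marked in the comments: local_irq_save()/local_irq_restore(), a real kernel pair not in the list above, is returned for the irq-only case when the previous interrupt state must be preserved; interrupt contention is treated as covering bottom-half contention; and "no lock" assumes a non-preemptive kernel such as 2.4, where only the current path on this CPU touches the data.

```c
#include <assert.h>
#include <stdbool.h>

/* Which other execution paths may touch the data (hypothetical flags). */
enum contender {
    FROM_OTHER_CPU = 1 << 0,  /* any path running on another CPU */
    FROM_IRQ       = 1 << 1,  /* an interrupt handler on this CPU */
    FROM_BH        = 1 << 2,  /* a bottom half on this CPU */
};

/* The lock/unlock pairs named in the text, plus LOCK_LOCAL_IRQSAVE. */
enum lock_form {
    LOCK_NONE,
    LOCK_SPIN,           /* spin_lock/spin_unlock */
    LOCK_SPIN_IRQ,       /* spin_lock_irq/spin_unlock_irq */
    LOCK_SPIN_IRQSAVE,   /* spin_lock_irqsave/spin_unlock_irqrestore */
    LOCK_SPIN_BH,        /* spin_lock_bh/spin_unlock_bh */
    LOCK_LOCAL_IRQ,      /* local_irq_disable/local_irq_enable */
    LOCK_LOCAL_IRQSAVE,  /* local_irq_save/local_irq_restore (assumption) */
    LOCK_LOCAL_BH,       /* local_bh_disable/local_bh_enable */
};

/* need_save: interrupts may already be off on entry, so their previous
 * state (EFLAGS) must be saved and restored rather than forced on. */
enum lock_form choose_lock(unsigned contenders, bool need_save)
{
    bool smp;

    assert((contenders & ~(unsigned)(FROM_OTHER_CPU | FROM_IRQ | FROM_BH)) == 0);
    smp = (contenders & FROM_OTHER_CPU) != 0;

    /* Assumption: masking interrupts also keeps bottom halves from
     * starting on this CPU, so FROM_IRQ covers FROM_BH. */
    if (contenders & FROM_IRQ) {
        if (smp)
            return need_save ? LOCK_SPIN_IRQSAVE : LOCK_SPIN_IRQ;
        return need_save ? LOCK_LOCAL_IRQSAVE : LOCK_LOCAL_IRQ;
    }
    if (contenders & FROM_BH)
        return smp ? LOCK_SPIN_BH : LOCK_LOCAL_BH;
    /* Assumption: non-preemptive kernel, as in 2.4. */
    return smp ? LOCK_SPIN : LOCK_NONE;
}
```

The same data can map to different forms depending on where the caller runs: a process-context path that shares data with an interrupt handler passes FROM_IRQ, while the handler itself already runs with interrupts off and only needs to exclude other CPUs.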
  Here is an example. The interrupt code has an array variable irq_desc[] of type irq_desc_t; each element describes one irq and holds, among other things, that irq's handler functions. The irq_desc_t structure contains a spinlock that ensures mutually exclusive access (modification).
  For a particular element irq_desc[irq], two kernel execution paths access it: one installs the irq's handler (setup_irq), which usually happens during module initialization or system initialization; the other is the interrupt-handling entry point (do_IRQ). The code is as follows:
  int setup_irq(unsigned int irq, struct irqaction * new)
  {
      int shared = 0;
      unsigned long flags;
      struct irqaction *old, **p;
      irq_desc_t *desc = irq_desc + irq;

      /*
       * Some drivers like serial.c use request_irq() heavily,
       * so we have to be careful not to interfere with a
       * running system.
       */
      if (new->flags & SA_SAMPLE_RANDOM) {
          /*
           * This function might sleep, we want to call it first,
           * outside of the atomic block.
           * Yes, this might clear the entropy pool if the wrong
           * driver is attempted to be loaded, without actually
           * installing a new handler, but is this really a problem,
           * only the sysadmin is able to do this.
           */
          rand_initialize_irq(irq);
      }

      /*
       * The following block of code has to be executed atomically
       */
      spin_lock_irqsave(&desc->lock, flags);                  /* [1] */
      p = &desc->action;
      if ((old = *p) != NULL) {
          /* Can't share interrupts unless both agree to */
          if (!(old->flags & new->flags & SA_SHIRQ)) {
              spin_unlock_irqrestore(&desc->lock, flags);     /* [2] */
              return -EBUSY;
          }

          /* add new interrupt at end of irq queue */
          do {
              p = &old->next;
              old = *p;
          } while (old);
          shared = 1;
      }

      *p = new;

      if (!shared) {
          desc->depth = 0;
          desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
          desc->handler->startup(irq);
      }
      spin_unlock_irqrestore(&desc->lock, flags);             /* [3] */

      register_irq_proc(irq);
      return 0;
  }
  asmlinkage unsigned int do_IRQ(struct pt_regs regs)
  {
      /*
       * We ack quickly, we don't want the irq controller
       * thinking we're snobs just because some other CPU has
       * disabled global interrupts (we have already done the
       * INT_ACK cycles, it's too late to try to pretend to the
       * controller that we aren't taking the interrupt).
       *
       * 0 return value means that this irq is already being
       * handled by some other CPU. (or is disabled)
       */
      int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
      int cpu = smp_processor_id();
      irq_desc_t *desc = irq_desc + irq;
      struct irqaction * action;
      unsigned int status;

      kstat.irqs[cpu][irq]++;
      spin_lock(&desc->lock);                                 /* [4] */
      desc->handler->ack(irq);
      /*
       * REPLAY is when Linux resends an IRQ that was dropped earlier
       * WAITING is used by probe to mark irqs that are being tested
       */
      status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
      status |= IRQ_PENDING; /* we _want_ to handle it */

      /*
       * If the IRQ is disabled for whatever reason, we cannot
       * use the action we have.
       */
      action = NULL;
      if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
          action = desc->action;
          status &= ~IRQ_PENDING; /* we commit to handling */
          status |= IRQ_INPROGRESS; /* we are handling it */
      }
      desc->status = status;

      /*
       * If there is no IRQ handler or it was disabled, exit early.
       * Since we set PENDING, if another processor is handling
       * a different instance of this same irq, the other processor
       * will take care of it.
       */
      if (!action)
          goto out;

      /*
       * Edge triggered interrupts need to remember
       * pending events.
       * This applies to any hw interrupts that allow a second
       * instance of the same irq to arrive while we are in do_IRQ
       * or in the handler. But the code here only handles the _second_
       * instance of the irq, not the third or fourth. So it is mostly
       * useful for irq hardware that does not mask cleanly in an
       * SMP environment.
       */
      for (;;) {
          spin_unlock(&desc->lock);                           /* [5] */
          handle_IRQ_event(irq, &regs, action);
          spin_lock(&desc->lock);                             /* [6] */
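Why setup_irq() needs spin_lock_irqsave() while do_IRQ() gets by with a plain spin_lock() on the same desc->lock can be illustrated with a single-CPU model. Everything below is hypothetical: a boolean stands in for the interrupt-enable flag, an atomic_bool for desc->lock, and a counter for desc->action. The interrupt path is entered with interrupts already disabled, as the hardware does for interrupt gates on x86, so it needs no extra masking. The process-context path must mask interrupts itself: if it took the lock with interrupts enabled, an interrupt arriving on the same CPU would spin on a lock that its own interrupted code holds and can never release. The model records that case instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Single-CPU model; all names hypothetical. */
static bool irqs_on = true;            /* simulated interrupt-enable flag */
static bool irq_pending;               /* interrupt raised while masked */
static atomic_bool desc_lock = false;  /* stands in for desc->lock */
static int desc_actions;               /* stands in for desc->action */
static bool deadlock_detected;

static bool try_lock(void) { return !atomic_exchange(&desc_lock, true); }
static void unlock(void)   { atomic_store(&desc_lock, false); }

/* Interrupt path, like do_IRQ(): entered with interrupts off, so a plain
 * spin_lock() suffices. On one CPU a held lock can only belong to the
 * code this interrupt preempted, which cannot run again to release it:
 * a real kernel would spin here forever. */
static void interrupt_path(void)
{
    assert(!irqs_on);
    if (!try_lock()) {
        deadlock_detected = true;
        return;
    }
    /* ... look up desc_actions and run the handlers ... */
    unlock();
}

static void deliver_irq(void)
{
    irqs_on = false;               /* hardware masks interrupts on entry */
    interrupt_path();
    irqs_on = true;
}

/* The device raises its interrupt: delivered now, or held until unmasked. */
static void raise_irq(void)
{
    if (irqs_on)
        deliver_irq();
    else
        irq_pending = true;
}

static void restore_irqs(bool saved)
{
    irqs_on = saved;
    if (irqs_on && irq_pending) {
        irq_pending = false;
        deliver_irq();
    }
}

/* Process-context path, like setup_irq(): spin_lock_irqsave(). */
void install_with_irqsave(void)
{
    bool flags = irqs_on;          /* local_irq_save() ... */
    irqs_on = false;
    while (!try_lock())            /* ... then spin_lock() */
        ;
    desc_actions++;
    raise_irq();                   /* arrives mid-update: kept pending */
    unlock();
    restore_irqs(flags);           /* delivered now, with the lock free */
}

/* The same path with only spin_lock(): the interrupt hits the held lock. */
void install_with_plain_lock(void)
{
    while (!try_lock())
        ;
    desc_actions++;
    raise_irq();
    unlock();
}
```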

Article URL: http://butianyuan.cn/article/258258.htm

