Under revision

Building a CenterNet model file from "BCCD", a small-scale dataset for blood cell detection.
BCCD: https://github.com/Shenggan/BCCD_Dataset https://www.tensorflow.org/datasets/catalog/bccd
Reference: https://qiita.com/otakoma/items/9a37c0f8d29583e36192

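You can browse the data with LabelImg (see CenterNet/LabelImg).
The dataset contains images together with Annotations that mark which regions of each image are
white blood cells (WBC: White Blood Cell), red blood cells (RBC: Red Blood Cell), or platelets (Platelets).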
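
To get a feel for one annotation file without opening LabelImg, the objects per class can be counted with the standard library. This is a minimal sketch: the XML below is a shortened, made-up sample in Pascal VOC layout (not a real BCCD file), and `count_classes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, shortened annotation in Pascal VOC layout (values are made up).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>WBC</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>320</ymax></bndbox>
  </object>
  <object>
    <name>RBC</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>RBC</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>290</xmax><ymax>125</ymax></bndbox>
  </object>
</annotation>
"""

def count_classes(xml_text):
    """Return a Counter of object class names found in one VOC annotation."""
    root = ET.fromstring(xml_text)
    return Counter(obj.findtext("name") for obj in root.iter("object"))

counts = count_classes(SAMPLE_XML)
print(dict(counts))
```

For a real file, read its text from "BCCD/Annotations/" and pass it to the same function.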

(base) [illya@rockylinux ~]$ git clone https://github.com/Shenggan/BCCD_Dataset
 
(base) [illya@rockylinux ~]$ cd BCCD_Dataset/BCCD/
(base) [illya@rockylinux BCCD]$ ls -l
total 32
drwxrwxr-x. 2 illya illya 12288 May 24 06:51 Annotations
drwxrwxr-x. 3 illya illya    18 May 24 06:51 ImageSets
drwxrwxr-x. 2 illya illya 12288 May 24 06:51 JPEGImages
(base) [illya@rockylinux BCCD]$

Converting the annotation data for CenterNet

The annotations stored in the "BCCD_Dataset/BCCD/Annotations/" folder are in Pascal VOC format (XML).
To build a model with CenterNet, this Pascal VOC format apparently has to be converted to COCO format (JSON).
I used the conversion script provided at the reference below.
Reference: https://qiita.com/otakoma/items/e391e5e6924945b8a852
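
The main geometric difference between the two formats is how a box is stored: Pascal VOC keeps two corners (xmin, ymin, xmax, ymax), while COCO keeps [x, y, width, height]. The sketch below shows only that step. It is not the referenced script; `voc_to_coco_bbox` is a hypothetical name, and it assumes no pixel offset is applied (converters differ on whether they shift the 1-based VOC coordinates by one).

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to a COCO-style [x, y, width, height] box.

    Assumption: coordinates are used as-is, without a 1-pixel shift.
    """
    if xmax < xmin or ymax < ymin:
        raise ValueError("max corner must not be smaller than min corner")
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def bbox_area(bbox):
    """Area of a COCO-style box, as a float like the 'area' field in the JSON."""
    _, _, width, height = bbox
    return float(width * height)

box = voc_to_coco_bbox(10, 20, 110, 120)
print(box, bbox_area(box))
```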

The script is "XML2JSON.py"; I placed it at "~/BCCD_Dataset/XML2JSON.py".
I then modified the original further to fit BCCD.

--- XML2JSON.py.orig    2022-05-24 07:06:07.513692378 +0900
+++ XML2JSON.py 2022-05-24 07:07:29.890425685 +0900
@@ -8,15 +8,9 @@
 
 def XML2JSON(xmlFiles):
     attrDict = dict()
-    attrDict["categories"]=[{"supercategory":"none","id":1,"name":"racket"},
-                    {"supercategory":"none","id":2,"name":"player"},
-                    {"supercategory":"none","id":3,"name":"tennisball"},
-                    {"supercategory":"none","id":4,"name":"umpire"},
-                {"supercategory":"none","id":5,"name":"ballperson"},
-                {"supercategory":"none","id":6,"name":"camera"},
-                {"supercategory":"none","id":7,"name":"player"},
-                {"supercategory":"none","id":8,"name":"tv"},
-                {"supercategory":"none","id":9,"name":"smartphone"}
+    attrDict["categories"]=[{"supercategory":"none","id":1,"name":"Platelets"},
+                    {"supercategory":"none","id":2,"name":"RBC"},
+                    {"supercategory":"none","id":3,"name":"WBC"}
                   ]
     images = list()
     annotations = list()
@@ -53,8 +47,6 @@
                         id1 +=1
                         annotations.append(annotation)
 
-            else:
-                print("File: {} doesn't have any object".format(file))
 
         else:
             print("File: {} not found".format(file))
@@ -69,6 +61,6 @@
         f.write(jsonString)
 
 
-path="./annotations/"
+path="./BCCD/Annotations/"
 trainXMLFiles=glob.glob(os.path.join(path, '*.xml'))
 XML2JSON(trainXMLFiles)

It did not run as-is, so I added "xmltodict" to the CenterNet environment created with conda, and then ran it.

(base) [illya@rockylinux ~]$ conda activate CenterNet
 
(CenterNet) [illya@rockylinux ~]$ cd BCCD_Dataset/
 
(CenterNet) [illya@rockylinux BCCD_Dataset]$ conda install xmltodict
(CenterNet) [illya@rockylinux BCCD_Dataset]$ cp -arp BCCD BCCD.orig      <-- backup
 
(CenterNet) [illya@rockylinux BCCD_Dataset]$ python XML2JSON.py

The converted file is written out as "train.json", and its contents look like this (this one was made from only 5 images).

{"categories": [
	{"supercategory": "none", "id": 1, "name": "Platelets"},
	{"supercategory": "none", "id": 2, "name": "RBC"},
	{"supercategory": "none", "id": 3, "name": "WBC"}],
"images": [
	{"file_name": "BloodImage_00000.jpg", "height": 480, "width": 640, "id": 1},
	{"file_name": "BloodImage_00001.jpg", "height": 480, "width": 640, "id": 2},
	{"file_name": "BloodImage_00002.jpg", "height": 480, "width": 640, "id": 3},
	{"file_name": "BloodImage_00003.jpg", "height": 480, "width": 640, "id": 4},
	{"file_name": "BloodImage_00004.jpg", "height": 480, "width": 640, "id": 5}],
"annotations": [
	{"iscrowd": 0, "image_id": 1, "bbox": [259, 176, 232, 200], "area": 46400.0, "category_id": 3, "ignore": 0, "id": 1, "segmentation": [[259, 176, 259, 376, 491, 376, 491, 176]]}, 
	{"iscrowd": 0, "image_id": 1, "bbox": [77, 335, 107, 100], "area": 10700.0, "category_id": 2, "ignore": 0, "id": 2, "segmentation": [[77, 335, 77, 435, 184, 435, 184, 335]]}, 
	{"iscrowd": 0, "image_id": 1, "bbox": [62, 236, 107, 100], "area": 10700.0, "category_id": 2, "ignore": 0, "id": 3, "segmentation": [[62, 236, 62, 336, 169, 336, 169, 236]]}, 
	{"iscrowd": 0, "image_id": 1, "bbox": [213, 361, 107, 100], "area": 10700.0, "category_id": 2, "ignore": 0, "id": 4, "segmentation": [[213, 361, 213, 461, 320, 461, 320, 361]]}, 
	{"iscrowd": 0, "image_id": 1, "bbox": [413, 351, 93, 94], "area": 8742.0, "category_id": 2, "ignore": 0, "id": 5, "segmentation": [[413, 351, 413, 445, 506, 445, 506, 351]]}, 
 :
	{"iscrowd": 0, "image_id": 5, "bbox": [0, 314, 78, 83], "area": 6474.0, "category_id": 2, "ignore": 0, "id": 11, "segmentation": [[0, 314, 0, 397, 78, 397, 78, 314]]}, 
	{"iscrowd": 0, "image_id": 5, "bbox": [390, 372, 79, 83], "area": 6557.0, "category_id": 2, "ignore": 0, "id": 12, "segmentation": [[390, 372, 390, 455, 469, 455, 469, 372]]}, 
	{"iscrowd": 0, "image_id": 5, "bbox": [126, 46, 37, 35], "area": 1295.0, "category_id": 1, "ignore": 0, "id": 13, "segmentation": [[126, 46, 126, 81, 163, 81, 163, 46]]}],
"type": "instances"}

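The file is split into four sections: the first is the categories, the second the image files, the third the location definitions (bounding boxes), and the fourth I am not sure about. It looks a bit like a STAR file.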
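
A quick way to inspect these sections is to load the JSON and cross-check the fields against each other. The sketch below embeds a two-annotation excerpt of the train.json shown above; `summarize` and `is_consistent` are hypothetical helper names. The consistency rule (area equals width times height, and the polygon is the box outline) is inferred from the sample values, not from any documentation.

```python
import json
from collections import Counter

# Excerpt of the train.json shown above (first image, first two annotations).
coco = json.loads("""
{"categories": [
  {"supercategory": "none", "id": 1, "name": "Platelets"},
  {"supercategory": "none", "id": 2, "name": "RBC"},
  {"supercategory": "none", "id": 3, "name": "WBC"}],
 "images": [
  {"file_name": "BloodImage_00000.jpg", "height": 480, "width": 640, "id": 1}],
 "annotations": [
  {"iscrowd": 0, "image_id": 1, "bbox": [259, 176, 232, 200], "area": 46400.0,
   "category_id": 3, "ignore": 0, "id": 1,
   "segmentation": [[259, 176, 259, 376, 491, 376, 491, 176]]},
  {"iscrowd": 0, "image_id": 1, "bbox": [77, 335, 107, 100], "area": 10700.0,
   "category_id": 2, "ignore": 0, "id": 2,
   "segmentation": [[77, 335, 77, 435, 184, 435, 184, 335]]}],
 "type": "instances"}
""")

def summarize(coco):
    """Count annotations per category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])

def is_consistent(ann):
    """Check that area and the rectangle polygon agree with bbox = [x, y, w, h]."""
    x, y, w, h = ann["bbox"]
    polygon = [x, y, x, y + h, x + w, y + h, x + w, y]
    return ann["area"] == w * h and ann["segmentation"] == [polygon]

print(dict(summarize(coco)))
print(all(is_consistent(a) for a in coco["annotations"]))
```

For the full file, replace the embedded string with `json.load()` on "train.json".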

Of the 364 files from "BloodImage_00000.xml" to "BloodImage_00410.xml", I used the 355 files from "BloodImage_00000.xml" to "BloodImage_00398.xml" as training data,
and the 8 files from "BloodImage_00400.xml" to "BloodImage_00409.xml" as test data. The image "BloodImage_00410.jpg", which corresponds to the remaining "BloodImage_00410.xml", is kept as unseen data.

The reference names the training data file "pascal_trainval0712.json" and the test data file "pascal_test2007.json". I follow the same naming.

(CenterNet) [illya@rockylinux BCCD_Dataset]$ rm -rf BCCD 
(CenterNet) [illya@rockylinux BCCD_Dataset]$ cp -arp BCCD.orig BCCD
(CenterNet) [illya@rockylinux BCCD_Dataset]$ ls -l BCCD/Annotations/BloodImage_00*.xml | wc -l
364
(CenterNet) [illya@rockylinux BCCD_Dataset]$ rm -rf BCCD/Annotations/BloodImage_004*
(CenterNet) [illya@rockylinux BCCD_Dataset]$ ls -l BCCD/Annotations/BloodImage_00*.xml | wc -l
355
(CenterNet) [illya@rockylinux BCCD_Dataset]$ python XML2JSON.py
(CenterNet) [illya@rockylinux BCCD_Dataset]$ mv train.json pascal_trainval0712.json
 
(CenterNet) [illya@rockylinux BCCD_Dataset]$ rm -rf BCCD
(CenterNet) [illya@rockylinux BCCD_Dataset]$ cp -arp BCCD.orig BCCD
(CenterNet) [illya@rockylinux BCCD_Dataset]$ rm -rf BCCD/Annotations/BloodImage_00[0123]*
(CenterNet) [illya@rockylinux BCCD_Dataset]$ rm -rf BCCD/Annotations/BloodImage_00410.xml
(CenterNet) [illya@rockylinux BCCD_Dataset]$ ls -l BCCD/Annotations/BloodImage_00*.xml | wc -l
8
(CenterNet) [illya@rockylinux BCCD_Dataset]$ python XML2JSON.py
(CenterNet) [illya@rockylinux BCCD_Dataset]$ mv train.json pascal_test2007.json
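
The same split can also be expressed as a rule on the file number, which avoids deleting and restoring the Annotations folder. This is only a sketch: `split_name` is a hypothetical helper, and wiring it into XML2JSON.py (which globs a fixed path and writes "train.json") would need changes that are not shown here.

```python
import re

def split_name(xml_name):
    """Assign an annotation file to 'train', 'test', or 'unseen' by its number.

    Mirrors the shell steps above: below 00400 is training data,
    00400 to 00409 is test data, and 00410 is kept as unseen data.
    """
    match = re.fullmatch(r"BloodImage_(\d{5})\.xml", xml_name)
    if match is None:
        raise ValueError(f"unexpected file name: {xml_name}")
    index = int(match.group(1))
    if index < 400:
        return "train"
    if index < 410:
        return "test"
    return "unseen"

for name in ("BloodImage_00398.xml", "BloodImage_00400.xml", "BloodImage_00410.xml"):
    print(name, split_name(name))
```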

Place the two COCO format (JSON) files that were created.

(CenterNet) [illya@rockylinux BCCD_Dataset]$ cd ~/apps/CenterNet/data/
(CenterNet) [illya@rockylinux data]$ rm -rf voc
(CenterNet) [illya@rockylinux data]$ mkdir -p voc/{annotations,images}
 
(CenterNet) [illya@rockylinux data]$ cp ~/BCCD_Dataset/pascal_trainval0712.json voc/annotations/
 
(CenterNet) [illya@rockylinux data]$ cp ~/BCCD_Dataset/pascal_test2007.json     voc/annotations/
 
(CenterNet) [illya@rockylinux data]$ cp ~/BCCD_Dataset/BCCD/JPEGImages/BloodImage_*    voc/images/
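
After copying, it may be worth confirming that every image listed in a JSON file actually exists in the images folder. The sketch below is one way to do that; `missing_images` is a hypothetical helper, and the demo runs in a temporary directory with placeholder file names rather than the real "voc/" tree.

```python
import json
import os
import tempfile

def missing_images(json_path, image_dir):
    """Return file names listed in a COCO JSON that are absent from image_dir."""
    with open(json_path, encoding="utf-8") as f:
        coco = json.load(f)
    return [img["file_name"] for img in coco["images"]
            if not os.path.isfile(os.path.join(image_dir, img["file_name"]))]

# Self-contained demo in a temporary directory (file names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "images")
    os.makedirs(image_dir)
    open(os.path.join(image_dir, "a.jpg"), "wb").close()
    json_path = os.path.join(tmp, "demo.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"images": [{"file_name": "a.jpg"}, {"file_name": "b.jpg"}]}, f)
    demo_missing = missing_images(json_path, image_dir)

print(demo_missing)
```

On the real data, the arguments would be "voc/annotations/pascal_trainval0712.json" (or the test JSON) and "voc/images".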

Following the reference, modify the source code.

(CenterNet) [illya@rockylinux data]$ cd
(CenterNet) [illya@rockylinux ~]$ cp apps/CenterNet/src/lib/datasets/dataset/pascal.py apps/CenterNet/src/lib/datasets/dataset/pascal.py.orig

Modify this "pascal.py" as follows.

--- apps/CenterNet/src/lib/datasets/dataset/pascal.py.orig      2022-05-24 22:10:02.218900033 +0900
+++ apps/CenterNet/src/lib/datasets/dataset/pascal.py   2022-05-24 22:10:30.314151990 +0900
@@ -27,10 +27,7 @@
       self.data_dir, 'annotations',
       'pascal_{}.json').format(_ann_name[split])
     self.max_objs = 50
-    self.class_name = ['__background__', "aeroplane", "bicycle", "bird", "boat",
-     "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
-     "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
-     "train", "tvmonitor"]
+    self.class_name = ['__background__', "Platelets", "RBC", "WBC"]
     self._valid_ids = np.arange(1, 21, dtype=np.int32)
     self.cat_ids = {v: i for i, v in enumerate(self._valid_ids)}
     self._data_rng = np.random.RandomState(123)
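
The unchanged context lines in this diff build a mapping from the category id in the JSON to a 0-based class index. The sketch below reproduces that mapping with plain Python (`range(1, 21)` stands in for `np.arange(1, 21)`) to show where the three BCCD categories end up. Note that the id range is left at 1 to 20 here, as in the diff; whether it should also be narrowed to three classes is not covered by the source, so no change is suggested.

```python
def build_cat_ids(valid_ids):
    """Same dict comprehension as in the diff context: category id -> 0-based index."""
    return {v: i for i, v in enumerate(valid_ids)}

class_name = ["__background__", "Platelets", "RBC", "WBC"]
cat_ids = build_cat_ids(range(1, 21))  # same id range as np.arange(1, 21)

# Category ids 1..3 from the JSON map to indices 0..2.
for coco_id in (1, 2, 3):
    print(coco_id, class_name[coco_id], "->", cat_ids[coco_id])
```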

Then start training.

(CenterNet) [illya@rockylinux ~]$ python ./apps/CenterNet/src/main.py ctdet --exp_id pascal_resdcn18_384 --dataset pascal  --num_epochs 500 --lr_step 50,100,200,300,400 --batch_size 14
 ;
 ;
ctdet/pascal_resdcn18_384 |################################| train: [499][24/25]|Tot: 0:00:21 |ETA: 0:00:01 |loss 0.2546 |hm_loss 0.1369 |wh_loss 0.2815 |off_loss 0.0895 |Data 0.005s(0.021s) |Net 0.851s
ctdet/pascal_resdcn18_384 |################################| train: [500][24/25]|Tot: 0:00:21 |ETA: 0:00:01 |loss 0.2454 |hm_loss 0.1266 |wh_loss 0.2768 |off_loss 0.0911 |Data 0.005s(0.021s) |Net 0.851s
ctdet/pascal_resdcn18_384 |################################| val: [500][7/8]|Tot: 0:00:00 |ETA: 0:00:01 |loss 8.8491 |hm_loss 8.1935 |wh_loss 3.5636 |off_loss 0.2992 |Data 0.000s(0.006s) |Net 0.024s
 
(CenterNet) [illya@rockylinux ~]$
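
In the last lines of this log the training loss is 0.2454 while the validation loss is 8.8491, so comparing the two over epochs is useful. The sketch below pulls the numbers out of one log line with a regular expression; `parse_losses` is a hypothetical helper, and the pattern is based only on the line format visible above.

```python
import re

# The final validation line from the log above.
LOG_LINE = ("ctdet/pascal_resdcn18_384 |################################| val: [500][7/8]"
            "|Tot: 0:00:00 |ETA: 0:00:01 |loss 8.8491 |hm_loss 8.1935 "
            "|wh_loss 3.5636 |off_loss 0.2992 |Data 0.000s(0.006s) |Net 0.024s")

def parse_losses(line):
    """Extract the phase, the epoch, and every '<name>loss <value>' field."""
    head = re.search(r"(train|val): \[(\d+)\]", line)
    if head is None:
        raise ValueError("not a train/val progress line")
    losses = {name: float(value)
              for name, value in re.findall(r"\|(\w*loss) ([0-9.]+)", line)}
    return head.group(1), int(head.group(2)), losses

phase, epoch, losses = parse_losses(LOG_LINE)
print(phase, epoch, losses)
```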

The default is "--batch_size 32", but that did not fit on an A2000 (6GB), so I lowered it. Training took about 3 hours.
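
The numbers in the log can be cross-checked with a little arithmetic. This sketch assumes that an incomplete last batch is dropped, which matches the "[24/25]" counter in the log, and it takes the "Tot: 0:00:21" value as the time per training epoch (validation time is ignored).

```python
train_images = 355      # training annotation files counted above
batch_size = 14         # value passed with --batch_size
epochs = 500            # value passed with --num_epochs
seconds_per_epoch = 21  # "Tot: 0:00:21" in the training log

# Assumption: the incomplete last batch is dropped.
iters_per_epoch = train_images // batch_size
total_hours = epochs * seconds_per_epoch / 3600

print(iters_per_epoch)        # 25
print(round(total_hours, 1))  # 2.9
```

The result of roughly 2.9 hours is in line with the observed training time of about 3 hours.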

(CenterNet) [illya@rockylinux ~]$ ls -l apps/CenterNet/exp/ctdet/pascal_resdcn18_384
total 1473540
drwxrwxr-x. 2 illya illya         6 May 25 01:18 debug
drwxrwxr-x. 2 illya illya        36 May 25 01:20 logs_2022-05-25-01-20
-rw-rw-r--. 1 illya illya 237995797 May 25 01:57 model_100.pth
-rw-rw-r--. 1 illya illya 237995797 May 25 02:34 model_200.pth
-rw-rw-r--. 1 illya illya 237995797 May 25 03:11 model_300.pth
-rw-rw-r--. 1 illya illya 237995797 May 25 03:48 model_400.pth
-rw-rw-r--. 1 illya illya 237995797 May 25 01:39 model_50.pth
-rw-rw-r--. 1 illya illya  80911841 May 25 01:22 model_best.pth
-rw-rw-r--. 1 illya illya 237995797 May 25 04:25 model_last.pth
-rw-rw-r--. 1 illya illya      2359 May 25 01:20 opt.txt
(CenterNet) [illya@rockylinux ~]$

The class names in debugger.py are replaced in the same way.

--- apps/CenterNet/src/lib/utils/debugger.py.orig       2022-05-22 01:44:09.215171030 +0900
+++ apps/CenterNet/src/lib/utils/debugger.py    2022-05-22 01:45:24.422858725 +0900
@@ -436,9 +436,7 @@
   'p', 'v'
 ]
 
-pascal_class_name = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
-  "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
-  "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
+pascal_class_name = [ "Platelets", "RBC", "WBC" ]
 
 coco_class_name = [
      'person', 'bicycle', 'car', 'motorcycle', 'airplane',

Then run the demo on one of the test images.

(CenterNet) [illya@centos7 ~]$ python ./apps/CenterNet/src/demo.py ctdet --demo BCCD_Dataset/BCCD.orig/JPEGImages/BloodImage_00400.jpg --dataset pascal --load_model apps/CenterNet/exp/ctdet/pascal_dla_384/model_last.pth --debug 2

Last-modified: 2022-05-25 (Wed) 07:33:39 (87d)