Previous version of this page: Alphafold-v2.0.0

Upstream: https://github.com/deepmind/alphafold
An AI-based program for predicting protein 3D structures.

This page covers the "alphafold_non_docker" version (https://github.com/kalininalab/alphafold_non_docker), which runs without the Docker setup that upstream uses.
For the original Docker-based version, see Alphafold.
Machine used: CentOS 7.9, CUDA 11.6, RTX A2000

Getting the alphafold code

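The repository explains how to obtain the "Genetic databases" and "model parameters" needed for prediction, so start by getting the code.
*v2.2.0 is the latest release at the time of writing, so this is probably unnecessary, but I pinned the checkout to the v2.2.0 tag anyway.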

[root@centos7 ~]# mkdir -p /apps/src  && cd /apps
[root@centos7 apps]# git clone https://github.com/deepmind/alphafold  && cd alphafold
[root@centos7 alphafold]# git tag
v2.0.0
v2.0.1
v2.1.0
v2.1.1
v2.1.2
v2.2.0
[root@centos7 alphafold]# git checkout refs/tags/v2.2.0
[root@centos7 alphafold]# git branch --all
* (detached from v2.2.0)
  main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
[root@centos7 alphafold]#
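To pick the newest release without reading the tag list by eye, the `git tag` output can be sorted by version number. A minimal sketch, assuming tags follow the `vMAJOR.MINOR.PATCH` pattern shown above; `latest_tag` and `repo_tags` are hypothetical helpers, not part of AlphaFold:

```python
import re
import subprocess

# Matches tags such as v2.2.0; anything else (e.g. "latest") is ignored.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_tag(tags):
    """Return the highest vX.Y.Z tag, comparing numerically rather than as strings."""
    versioned = []
    for tag in tags:
        tag = tag.strip()
        m = TAG_RE.match(tag)
        if m:
            versioned.append((tuple(int(p) for p in m.groups()), tag))
    if not versioned:
        raise ValueError("no vX.Y.Z tags found")
    return max(versioned)[1]


def repo_tags(repo_dir):
    """List the tags of a local clone by running `git tag` in it."""
    out = subprocess.run(["git", "tag"], cwd=repo_dir, check=True,
                         capture_output=True, text=True).stdout
    return out.split()


if __name__ == "__main__":
    print(latest_tag(["v2.0.0", "v2.0.1", "v2.1.0", "v2.1.1", "v2.1.2", "v2.2.0"]))
```

Numeric comparison matters once a component reaches two digits: as plain strings "v2.10.0" would sort before "v2.9.0".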

alphafold_non_docker runtime environment

Upstream recommends using Docker.
As noted at the top, this page builds the alphafold_non_docker version, which does not use Docker. For the original Docker-based version, see Alphafold.

https://github.com/kalininalab/alphafold_non_docker uses miniconda.
That works too, but this site already uses anaconda for crYOLO and topaz, so I use anaconda here to match.
I use the latest anaconda3-2021.11 rather than anaconda3-5.3.1.

git clone https://github.com/yyuu/pyenv.git /apps/pyenv
export PYENV_ROOT=/apps/pyenv
export PATH=$PYENV_ROOT/bin:$PATH
pyenv install anaconda3-2021.11
export PATH=$PYENV_ROOT/versions/anaconda3-2021.11/bin:$PATH
 
If you already have a pyenv/anaconda environment:
export PYENV_ROOT=/apps/pyenv
export PATH=$PYENV_ROOT/bin:$PATH
eval "$(pyenv init - --no-rehash)"
export PATH=$PYENV_ROOT/versions/anaconda3-2021.11/bin/:$PATH

Now create the alphafold_non_docker environment, with a few changes for the RTX A2000:

[root@centos7 ~]# conda create -n alphafold python==3.8
 
[root@centos7 ~]# source activate alphafold
(alphafold) [root@centos7 ~]# conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.3.1 pdbfixer==1.7
               *Original: "conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7"
 
(alphafold) [root@centos7 ~]# conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
   *Same as the original
 
(alphafold) [root@centos7 apps]# pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 \
 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 pandas==1.3.4 tensorflow-cpu==2.5.0
   *Same as the original
 
(alphafold) [root@centos7 apps]# pip install jax==0.2.25 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
                *Original: "pip install --upgrade jax==0.2.14 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html"
 
 
 
(alphafold) [root@centos7 apps]# pip install protobuf==3.20.0             <-- only if it does not work without this
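Because this setup mixes conda and pip and deviates from the original JAX pin, it can be worth confirming what actually ended up in the env. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+), meant to be run with the alphafold env's python; `PINS` is only a subset of the versions above and `check_pins` is a hypothetical helper:

```python
from importlib import metadata

# A subset of the pins used above (distribution name -> expected version).
PINS = {
    "jax": "0.2.25",
    "jaxlib": "0.1.69+cuda111",
    "dm-haiku": "0.0.4",
    "numpy": "1.19.5",
    "tensorflow-cpu": "2.5.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, found)} for every package that does not match.

    `found` is None when the distribution is not installed at all.
    `get_version` is injectable so the check can be tested without the env.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    bad = check_pins(PINS)
    for name, (want, got) in sorted(bad.items()):
        print(f"{name}: expected {want}, found {got}")
    print("all pins OK" if not bad else f"{len(bad)} mismatch(es)")
```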

Next, fetch the stereo_chemical_props.txt file needed for the MM (OpenMM) step into alphafold/common/:

(alphafold) [root@centos7 ~]# cd /apps
(alphafold) [root@centos7 apps]# wget -P  alphafold/alphafold/common/ \
https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate

Apply the OpenMM patch from inside the env's site-packages directory:

(alphafold) [root@centos7 apps]# cd /apps/pyenv/versions/anaconda3-2021.11/envs/alphafold/lib/python3.8/site-packages/
(alphafold) [root@centos7 site-packages]# patch -p0 < /apps/alphafold/docker/openmm.patch
 
(alphafold) [root@centos7 site-packages]# source deactivate
[root@centos7 site-packages]#
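The `cd` above hardcodes the pyenv/anaconda layout. A sketch that instead asks the running interpreter for its site-packages directory via `sysconfig` and builds the same `patch -p0` call; it assumes it is run with the alphafold env's python and uses GNU patch's `--dry-run` so you can check the patch applies before changing anything. `patch_command` and `apply_patch` are hypothetical helpers:

```python
import subprocess
import sysconfig


def site_packages_dir():
    """site-packages of the interpreter running this script (e.g. the conda env)."""
    return sysconfig.get_paths()["purelib"]


def patch_command(patch_file, dry_run=True):
    """Equivalent of `patch -p0 < patch_file`, optionally as a GNU patch dry run."""
    cmd = ["patch", "-p0", "-i", patch_file]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def apply_patch(patch_file, dry_run=True):
    # Run from site-packages so the -p0 paths inside openmm.patch resolve.
    return subprocess.run(patch_command(patch_file, dry_run),
                          cwd=site_packages_dir(), check=True)


if __name__ == "__main__":
    print(site_packages_dir())
    print(" ".join(patch_command("/apps/alphafold/docker/openmm.patch")))
```

Call `apply_patch(path, dry_run=False)` only after the dry run reports no rejected hunks.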

Preparing the script

[root@centos7 ~]# cd /apps/src
[root@centos7 src]# git clone https://github.com/kalininalab/alphafold_non_docker
 
[root@centos7 src]# cp alphafold_non_docker/run_alphafold.sh /apps/alphafold/

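/apps/alphafold/run_alphafold.sh is modified as follows, so that it looks for run_alphafold.py under $alphafold_path instead of the current directory: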

--- /apps/alphafold/run_alphafold.sh.orig       2022-04-04 17:49:07.215185888 +0900
+++ /apps/alphafold/run_alphafold.sh    2022-04-04 17:48:57.735109552 +0900
@@ -131,7 +131,7 @@
 fi
 
 # This bash script looks for the run_alphafold.py script in its current working directory, if it does not exist then exits
-current_working_dir=$(pwd)
+current_working_dir=$alphafold_path
 alphafold_script="$current_working_dir/run_alphafold.py"
 
 if [ ! -f "$alphafold_script" ]; then
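With this change the wrapper can be started from any working directory, as long as `alphafold_path` is set (the modulefile in the next section does that). The same lookup sketched in Python; the fallback to the current directory when the variable is unset is an assumption added here, since the patched shell script has none. `find_alphafold_script` is hypothetical:

```python
import os
import tempfile
from pathlib import Path


def find_alphafold_script(env=None, cwd=None):
    """Locate run_alphafold.py the way the patched wrapper does.

    Uses $alphafold_path when set; otherwise falls back to the current
    directory (the unpatched behaviour). Raises FileNotFoundError if the
    script is missing, like the wrapper's `[ ! -f ... ]` check.
    """
    env = os.environ if env is None else env
    base = env.get("alphafold_path") or cwd or os.getcwd()
    script = Path(base) / "run_alphafold.py"
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is alphafold_path set?")
    return script


if __name__ == "__main__":
    # Demo: a throwaway directory standing in for /apps/alphafold.
    with tempfile.TemporaryDirectory() as fake_checkout:
        (Path(fake_checkout) / "run_alphafold.py").write_text("")
        print(find_alphafold_script({"alphafold_path": fake_checkout}))
```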

EnvironmentModules

Create /etc/modulefiles/alphafold with the following contents:

#%Module1.0
set          alphafold_path  /apps/alphafold
set          root /apps/pyenv/versions/anaconda3-2021.11/envs/alphafold
setenv       alphafold_path  $alphafold_path
prepend-path PATH  $root/bin:$alphafold_path
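Roughly, `module load alphafold` sets `alphafold_path` and puts the env's `bin` and the AlphaFold checkout at the front of `PATH`, so the wrapper's `python` and the conda-installed search tools resolve to the alphafold env without activating it. A sketch of that effect on an environment dict, assuming `prepend-path` keeps the listed order and moves existing entries rather than duplicating them; `module_load_alphafold` is hypothetical:

```python
import os

# Values taken from /etc/modulefiles/alphafold above.
ROOT = "/apps/pyenv/versions/anaconda3-2021.11/envs/alphafold"
ALPHAFOLD_PATH = "/apps/alphafold"


def module_load_alphafold(env, root=ROOT, alphafold_path=ALPHAFOLD_PATH):
    """Return a copy of env with the modulefile's effect applied (sketch)."""
    new_env = dict(env)
    new_env["alphafold_path"] = alphafold_path      # setenv
    front = [f"{root}/bin", alphafold_path]          # prepend-path, order kept
    # Assumption: entries already on PATH are moved to the front, not duplicated.
    rest = [p for p in env.get("PATH", "").split(":") if p and p not in front]
    new_env["PATH"] = ":".join(front + rest)
    return new_env


if __name__ == "__main__":
    env = module_load_alphafold(os.environ)
    print(env["PATH"].split(":")[:2])
```

Passing the result as `env=` to `subprocess.run(["run_alphafold.sh", ...], env=...)` approximates running after `module load`, since on POSIX `subprocess` searches the `PATH` of the environment it is given.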

Trying it out

Since the EnvironmentModules file is in place, load the module first and then run the wrapper:

[saber@centos7 ~]$ module load alphafold
[saber@centos7 ~]$ run_alphafold.sh
 
Please make sure all required parameters are given
Usage: /apps/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>         Path to directory of supporting data
-o <output_dir>       Path to a directory that will store the results.
-f <fasta_path>       Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax>        Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting (...)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads>   OpenMM threads (default: all available cores)
-a <gpu_devices>      Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or (...)
-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (...)
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration (...)
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there (...)
-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time (...)
 
[saber@centos7 ~]$

It prints the usage as shown above (some lines are truncated).
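Per the usage text, a FASTA file with more than one sequence is folded as a multimer, so `-m` should match the file. A small sketch that counts records before launching; `read_fasta` and `suggest_model_preset` are hypothetical helpers, and `monomer`/`multimer` are two of the `-m` choices:

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def suggest_model_preset(text):
    """'monomer' for a single chain, 'multimer' for two or more chains."""
    n = len(read_fasta(text))
    if n == 0:
        raise ValueError("no FASTA records found")
    return "monomer" if n == 1 else "multimer"
```

For the example query.fasta below this returns "monomer", which matches `-m monomer`.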

[saber@centos7 ~]$ mkdir alphafold && cd $_
[saber@centos7 alphafold]$ cp /apps/src/alphafold_non_docker/example/query.fasta .
[saber@centos7 alphafold]$ cat query.fasta
>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
 
[saber@centos7 alphafold]$ run_alphafold.sh -d /apps/AlphafoldData -o . -f query.fasta -t 2020-05-14 -c reduced_dbs -g false -m monomer
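When launching many jobs, building the command in Python keeps the arguments consistent, and `-t` can be checked up front since the wrapper expects a YYYY-MM-DD date. A sketch, assuming `run_alphafold.sh` is on `PATH` (i.e. after `module load alphafold`); `build_command` and `launch` are hypothetical and use only options from the usage text:

```python
import datetime
import re
import subprocess


def build_command(data_dir, output_dir, fasta_path, max_template_date,
                  db_preset="reduced_dbs", use_gpu=False, model_preset="monomer"):
    """Assemble the run_alphafold.sh argv used in the example above."""
    # -t must look like YYYY-MM-DD; fromisoformat also rejects impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", max_template_date):
        raise ValueError(f"-t expects YYYY-MM-DD, got {max_template_date!r}")
    datetime.date.fromisoformat(max_template_date)
    return [
        "run_alphafold.sh",
        "-d", data_dir,
        "-o", output_dir,
        "-f", fasta_path,
        "-t", max_template_date,
        "-c", db_preset,
        "-g", "true" if use_gpu else "false",
        "-m", model_preset,
    ]


def launch(cmd):
    """Run the wrapper; needs `module load alphafold` (or an equivalent PATH)."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(" ".join(build_command("/apps/AlphafoldData", ".", "query.fasta",
                                 "2020-05-14")))
```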

Setting num_recycle and the number of cores used by jackhmmer via arguments

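I made it possible to specify each of the following as an argument: the number of recycles in alphafold, the number of CPUs for the jackhmmer sequence search,
the number of CPUs for hhsearch (template search in monomer predictions), and the number of CPUs for hmmsearch (template search in multimer predictions).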

/apps/alphafold/run_alphafold.sh

--- ../src/alphafold_non_docker/run_alphafold.sh.orig   2022-06-09 02:34:27.897005704 +0900
+++ run_alphafold.sh    2022-06-12 14:46:32.518462539 +0900
@@ -23,10 +23,15 @@
         echo "-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be (...)
         echo "-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation (...)
         echo ""
+        echo "-C <num_recycle>      ReCycle number [3]"
+        echo "-N <n_cpu>            jackhmmer: number of parallel CPU workers to use for multithreads [8]"
+        echo "-h <hhsearch_cpu>     hhsearch: number of CPUs to use (for shared memory SMPs) [2](monomer)"
+        echo "-H <hmmsearch_cpu>    hmmsearch: number of parallel CPU workers to use for multithreads [8](multimer)"
+        echo ""
         exit 1
 }
 
-while getopts ":d:o:f:t:g:r:e:n:a:m:c:p:l:b" i; do
+while getopts ":d:o:f:t:g:r:e:n:a:m:c:p:l:C:N:h:H:b" i; do
         case "${i}" in
         d)
                 data_dir=$OPTARG
@@ -67,6 +72,18 @@
         l)
                 num_multimer_predictions_per_model=$OPTARG
         ;;
+        C)
+                num_recycle=$OPTARG
+        ;;
+        N)
+                n_cpu=$OPTARG
+        ;;
+        h)
+                hhsearch_cpu=$OPTARG
+        ;;
+        H)
+                hmmsearch_cpu=$OPTARG
+        ;;
         b)
                 benchmark=true
         ;;
@@ -78,6 +95,18 @@
     usage
 fi
 
+if [[ "$hmmsearch_cpu" == "" ]] ; then
+    hmmsearch_cpu=8
+fi
+if [[ "$hhsearch_cpu" == "" ]] ; then
+    hhsearch_cpu=2
+fi
+if [[ "$n_cpu" == "" ]] ; then
+    n_cpu=8
+fi
+if [[ "$num_recycle" == "" ]] ; then
+    num_recycle=3
+fi
 if [[ "$benchmark" == "" ]] ; then
     benchmark=false
 fi
@@ -131,7 +160,7 @@
 fi
 
 # This bash script looks for the run_alphafold.py script in its current working directory, if it does not exist then exits
-current_working_dir=$(pwd)
+current_working_dir=$alphafold_path
 alphafold_script="$current_working_dir/run_alphafold.py"
 
 if [ ! -f "$alphafold_script" ]; then
@@ -197,5 +226,6 @@
        database_paths="$database_paths --uniclust30_database_path=$uniclust30_database_path --bfd_database_path=$bfd_database_path"
 fi
 
+extra_args="--num_recycle=$num_recycle --n_cpu=$n_cpu --hhsearch_cpu=$hhsearch_cpu --hmmsearch_cpu=$hmmsearch_cpu"
 # Run AlphaFold with required parameters
-$(python $alphafold_script $binary_paths $database_paths $command_args)
+$(python $alphafold_script $binary_paths $database_paths $command_args $extra_args)
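The shell changes follow one pattern: an empty option falls back to a default, and the four values are appended to the python command as `--flag=value`. The same logic in Python, useful for checking what a given set of options will pass through; the defaults mirror the patch and `extra_args` is a hypothetical helper:

```python
# Defaults from the patched run_alphafold.sh (option letter in the comment).
DEFAULTS = {
    "num_recycle": 3,     # -C
    "n_cpu": 8,           # -N, jackhmmer
    "hhsearch_cpu": 2,    # -h, monomer template search
    "hmmsearch_cpu": 8,   # -H, multimer template search
}


def extra_args(**given):
    """Build the extra --flag=value list; None or '' means 'use the default'."""
    unknown = set(given) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    args = []
    for name, default in DEFAULTS.items():
        value = given.get(name)
        if value in (None, ""):
            value = default
        args.append(f"--{name}={int(value)}")
    return args
```

Note that 0 is kept as a real value (`num_recycle=0` disables recycling) rather than treated as empty.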

/apps/alphafold/run_alphafold.py

--- run_alphafold.py.orig       2022-06-09 02:35:22.146479521 +0900
+++ run_alphafold.py    2022-06-12 14:43:24.842855059 +0900
@@ -128,6 +128,10 @@
                      'Relax on GPU can be much faster than CPU, so it is '
                      'recommended to enable if possible. GPUs must be available'
                      ' if this setting is enabled.')
+flags.DEFINE_integer('num_recycle', None,'num_recycle')
+flags.DEFINE_integer('n_cpu', 8,'n_cpu')
+flags.DEFINE_integer('hhsearch_cpu', 2,'hhsearch_cpu')
+flags.DEFINE_integer('hmmsearch_cpu', 8,'hmmsearch_cpu')
 
 FLAGS = flags.FLAGS
 
@@ -315,6 +319,7 @@
     template_searcher = hmmsearch.Hmmsearch(
         binary_path=FLAGS.hmmsearch_binary_path,
         hmmbuild_binary_path=FLAGS.hmmbuild_binary_path,
+        hmmsearch_cpu=FLAGS.hmmsearch_cpu,
         database_path=FLAGS.pdb_seqres_database_path)
     template_featurizer = templates.HmmsearchHitFeaturizer(
         mmcif_dir=FLAGS.template_mmcif_dir,
@@ -326,6 +331,7 @@
   else:
     template_searcher = hhsearch.HHSearch(
         binary_path=FLAGS.hhsearch_binary_path,
+        hhsearch_cpu=FLAGS.hhsearch_cpu,
         databases=[FLAGS.pdb70_database_path])
     template_featurizer = templates.HhsearchHitFeaturizer(
         mmcif_dir=FLAGS.template_mmcif_dir,
@@ -337,6 +343,7 @@
 
   monomer_data_pipeline = pipeline.DataPipeline(
       jackhmmer_binary_path=FLAGS.jackhmmer_binary_path,
+      n_cpu=FLAGS.n_cpu,
       hhblits_binary_path=FLAGS.hhblits_binary_path,
       uniref90_database_path=FLAGS.uniref90_database_path,
       mgnify_database_path=FLAGS.mgnify_database_path,
@@ -359,6 +366,10 @@
     num_predictions_per_model = 1
     data_pipeline = monomer_data_pipeline
 
+  num_recycle = FLAGS.num_recycle
+  if num_recycle is None:
+    num_recycle = 3
+
   model_runners = {}
   model_names = config.MODEL_PRESETS[FLAGS.model_preset]
   for model_name in model_names:
@@ -367,6 +378,8 @@
       model_config.model.num_ensemble_eval = num_ensemble
     else:
       model_config.data.eval.num_ensemble = num_ensemble
+      model_config.data.common.num_recycle = num_recycle
+    model_config.model.num_recycle = num_recycle
     model_params = data.get_model_haiku_params(
         model_name=model_name, data_dir=FLAGS.data_dir)
     model_runner = model.RunModel(model_config, model_params)
@@ -417,6 +430,7 @@
       'max_template_date',
       'obsolete_pdbs_path',
       'use_gpu_relax',
+      'num_recycle',
   ])
 
   app.run(main)
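In AlphaFold v2.2.0's `config.py`, monomer model configs carry `num_recycle` in two places (`data.common.num_recycle` for feature processing and `model.num_recycle` for the recycling loop), while multimer configs have no `data` section and only `model.num_recycle`. A sketch of applying one value consistently, using plain nested dicts as a stand-in for `ml_collections.ConfigDict`; `set_num_recycle` is hypothetical:

```python
import copy


def set_num_recycle(model_name, model_config, num_recycle):
    """Return a copy of model_config with num_recycle applied consistently.

    model_config is a nested dict standing in for AlphaFold's ConfigDict.
    """
    if num_recycle < 0:
        raise ValueError("num_recycle must be >= 0")
    cfg = copy.deepcopy(model_config)
    cfg["model"]["num_recycle"] = num_recycle
    if "multimer" not in model_name:
        # The monomer model slices its feature batch per recycling iteration,
        # so the data-side and model-side values must agree.
        cfg["data"]["common"]["num_recycle"] = num_recycle
    return cfg
```

With the real `ConfigDict` the same assignments are written as attribute access, e.g. `model_config.model.num_recycle = num_recycle`.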

/apps/alphafold/alphafold/data/pipeline.py

--- a/alphafold/data/pipeline.py
+++ b/alphafold/data/pipeline.py
@@ -124,15 +124,18 @@ class DataPipeline:
                use_small_bfd: bool,
                mgnify_max_hits: int = 501,
                uniref_max_hits: int = 10000,
+               n_cpu: int = 8,
                use_precomputed_msas: bool = False):
     """Initializes the data pipeline."""
     self._use_small_bfd = use_small_bfd
     self.jackhmmer_uniref90_runner = jackhmmer.Jackhmmer(
         binary_path=jackhmmer_binary_path,
+        n_cpu=n_cpu,
         database_path=uniref90_database_path)
     if use_small_bfd:
       self.jackhmmer_small_bfd_runner = jackhmmer.Jackhmmer(
           binary_path=jackhmmer_binary_path,
+          n_cpu=n_cpu,
           database_path=small_bfd_database_path)
     else:
       self.hhblits_bfd_uniclust_runner = hhblits.HHBlits(
@@ -140,6 +143,7 @@ class DataPipeline:
           databases=[bfd_database_path, uniclust30_database_path])
     self.jackhmmer_mgnify_runner = jackhmmer.Jackhmmer(
         binary_path=jackhmmer_binary_path,
+        n_cpu=n_cpu,
         database_path=mgnify_database_path)
     self.template_searcher = template_searcher
     self.template_featurizer = template_featurizer

/apps/alphafold/alphafold/data/tools/hhsearch.py

--- a/alphafold/data/tools/hhsearch.py
+++ b/alphafold/data/tools/hhsearch.py
@@ -33,6 +33,7 @@ class HHSearch:
                *,
                binary_path: str,
                databases: Sequence[str],
+               hhsearch_cpu: int = 2,
                maxseq: int = 1_000_000):
     """Initializes the Python HHsearch wrapper.
 
@@ -50,6 +51,7 @@ class HHSearch:
     self.binary_path = binary_path
     self.databases = databases
     self.maxseq = maxseq
+    self.hhsearch_cpu = hhsearch_cpu
 
     for database_path in self.databases:
       if not glob.glob(database_path + '_*'):
@@ -79,6 +81,7 @@ class HHSearch:
       cmd = [self.binary_path,
              '-i', input_path,
              '-o', hhr_path,
+             '-cpu', str(self.hhsearch_cpu),
              '-maxseq', str(self.maxseq)
              ] + db_cmd
 

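With the patch, the hhsearch command gains `-cpu N` next to the existing options, and each template database is still passed with its own `-d`. A sketch of the resulting argv for inspection; the paths are placeholders and `hhsearch_argv` is a hypothetical helper that mirrors the wrapper's option order rather than calling it:

```python
def hhsearch_argv(binary_path, input_path, hhr_path, databases,
                  hhsearch_cpu=2, maxseq=1_000_000):
    """Command list in the same order as the patched HHSearch wrapper builds it."""
    db_cmd = []
    for db in databases:
        db_cmd += ["-d", db]
    return [binary_path,
            "-i", input_path,
            "-o", hhr_path,
            "-cpu", str(hhsearch_cpu),
            "-maxseq", str(maxseq)] + db_cmd
```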
/apps/alphafold/alphafold/data/tools/hmmsearch.py

--- a/alphafold/data/tools/hmmsearch.py
+++ b/alphafold/data/tools/hmmsearch.py
@@ -33,6 +33,7 @@ class Hmmsearch(object):
                binary_path: str,
                hmmbuild_binary_path: str,
                database_path: str,
+               hmmsearch_cpu: int = 8,
                flags: Optional[Sequence[str]] = None):
     """Initializes the Python hmmsearch wrapper.
 
@@ -49,6 +50,7 @@ class Hmmsearch(object):
     self.binary_path = binary_path
     self.hmmbuild_runner = hmmbuild.Hmmbuild(binary_path=hmmbuild_binary_path)
     self.database_path = database_path
+    self.hmmsearch_cpu = hmmsearch_cpu
     if flags is None:
       # Default hmmsearch run settings.
       flags = ['--F1', '0.1',
@@ -89,7 +91,7 @@ class Hmmsearch(object):
       cmd = [
           self.binary_path,
           '--noali',  # Don't include the alignment in stdout.
-          '--cpu', '8'
+          '--cpu', str(self.hmmsearch_cpu)
       ]
       # If adding flags, we have to do so before the output and input:
       if self.flags:
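None of the patches check the requested counts against the machine, so for example `-N 32` on an 8-core host asks for more worker threads than there are cores. A small guard you could apply before passing the values on, assuming Linux's `os.sched_getaffinity` so that CPU pinning by `taskset` or a scheduler's cpuset is respected; `available_cpus` and `clamp_cpus` are hypothetical:

```python
import os


def available_cpus():
    """CPUs this process may run on (respects CPU affinity on Linux)."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available on macOS/Windows
        return os.cpu_count() or 1


def clamp_cpus(requested, limit=None):
    """Clamp a requested thread count into the range [1, limit]."""
    limit = available_cpus() if limit is None else limit
    return max(1, min(int(requested), limit))
```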

Last-modified: 2022-12-17 (Sat) 20:10:10