Previous page: Alphafold-v2.0.0
Upstream: https://github.com/deepmind/alphafold
AlphaFold is a protein structure prediction program that uses AI.
This page covers the "alphafold_non_docker" version (https://github.com/kalininalab/alphafold_non_docker), which runs without the docker setup the upstream project uses.
For the original docker-based version, see Alphafold.
Machine used: CentOS 7.9, CUDA 11.6, RTX A2000
Getting the alphafold code †
The repository explains how to obtain the "Genetic databases" and "model parameters" needed for prediction, so first fetch the code.
*The latest release is v2.2.0, so this is probably unnecessary, but I pinned the checkout to the v2.2.0 tag anyway.
[root@centos7 ~]# mkdir -p /apps/src && cd /apps
[root@centos7 apps]# git clone https://github.com/deepmind/alphafold && cd alphafold
[root@centos7 alphafold]# git tag
v2.0.0
v2.0.1
v2.1.0
v2.1.1
v2.1.2
v2.2.0
[root@centos7 alphafold]# git checkout refs/tags/v2.2.0
[root@centos7 alphafold]# git branch --all
* (detached from v2.2.0)
main
remotes/origin/HEAD -> origin/main
remotes/origin/main
[root@centos7 alphafold]#
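To double-check that the working tree is at the expected tag (a cheap sanity check, not in the original steps), git describe prints the tag name on an exact-tag checkout:
[root@centos7 alphafold]# git describe --tags
v2.2.0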
alphafold_non_docker runtime environment †
Upstream recommends running via docker.
As noted at the top, here we instead build the docker-free alphafold_non_docker version. For the original docker-based version, see Alphafold.
https://github.com/kalininalab/alphafold_non_docker uses miniconda.
That would work too, but since crYOLO and topaz are set up with anaconda here, we match that.
We use the latest anaconda3-2021.11 rather than anaconda3-5.3.1.
git clone https://github.com/yyuu/pyenv.git /apps/pyenv
export PYENV_ROOT=/apps/pyenv
export PATH=$PYENV_ROOT/bin:$PATH
pyenv install anaconda3-2021.11
export PATH=$PYENV_ROOT/versions/anaconda3-2021.11/bin:$PATH
If you already have a pyenv/anaconda environment:
export PYENV_ROOT=/apps/pyenv
export PATH=$PYENV_ROOT/bin:$PATH
eval "$(pyenv init - --no-rehash)"
export PATH=$PYENV_ROOT/versions/anaconda3-2021.11/bin/:$PATH
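Either way, it is worth confirming that the intended interpreter comes first in PATH; given the exports above, this should resolve into the anaconda3-2021.11 tree:
[root@centos7 ~]# which python
/apps/pyenv/versions/anaconda3-2021.11/bin/python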
Now build the alphafold_non_docker runtime environment. A few changes have been made for the RTX A2000.
[root@centos7 ~]# conda create -n alphafold python==3.8
[root@centos7 ~]# source activate alphafold
(alphafold) [root@centos7 ~]# conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.3.1 pdbfixer==1.7
*The original is "conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7"
(alphafold) [root@centos7 ~]# conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
*Same as the original
(alphafold) [root@centos7 apps]# pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 \
immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 pandas==1.3.4 tensorflow-cpu==2.5.0
*Same as the original
(alphafold) [root@centos7 apps]# pip install jax==0.2.25 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
*The original is "pip install --upgrade jax==0.2.14 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html"
(alphafold) [root@centos7 apps]# pip install protobuf==3.20.0 <-- only needed if alphafold fails to run without it
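Before going further, it may be worth checking that this jax build actually sees the GPU (a sanity check, not part of the original recipe); jax.devices() should list a GPU device rather than only a CPU:
(alphafold) [root@centos7 apps]# python -c "import jax; print(jax.devices())"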
Next, fetch the stereo_chemical_props.txt file the program needs:
(alphafold) [root@centos7 ~]# cd /apps
(alphafold) [root@centos7 apps]# wget -P alphafold/alphafold/common/ \
https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate
Apply the patch:
(alphafold) [root@centos7 apps]# cd /apps/pyenv/versions/anaconda3-2021.11/envs/alphafold/lib/python3.8/site-packages/
(alphafold) [root@centos7 site-packages]# patch -p0 < /apps/alphafold/docker/openmm.patch
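Before leaving the environment, a quick import check confirms that openmm is still usable after patching (an extra check, not in the original steps; version.version is OpenMM's version string):
(alphafold) [root@centos7 site-packages]# python -c "from simtk.openmm import version; print(version.version)"
7.5.1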
(alphafold) [root@centos7 site-packages]# source deactivate
[root@centos7 site-packages]#
Preparing the script
[root@centos7 ~]# cd /apps/src
[root@centos7 src]# git clone https://github.com/kalininalab/alphafold_non_docker
[root@centos7 src]# cp alphafold_non_docker/run_alphafold.sh /apps/alphafold/
"/apps/alphafold/run_alphafold.sh" has been modified as follows.
| --- /apps/alphafold/run_alphafold.sh.orig 2022-04-04 17:49:07.215185888 +0900
+++ /apps/alphafold/run_alphafold.sh 2022-04-04 17:48:57.735109552 +0900
@@ -131,7 +131,7 @@
fi
# This bash script looks for the run_alphafold.py script in its current working directory, if it does not exist then exits
-current_working_dir=$(pwd)
+current_working_dir=$alphafold_path
alphafold_script="$current_working_dir/run_alphafold.py"
if [ ! -f "$alphafold_script" ]; then
|
EnvironmentModules †
Create "/etc/modulefiles/alphafold" with the following contents (the alphafold_path environment variable set here is what the patched run_alphafold.sh reads):
#%Module1.0
set alphafold_path /apps/alphafold
set root /apps/pyenv/versions/anaconda3-2021.11/envs/alphafold
setenv alphafold_path $alphafold_path
prepend-path PATH $root/bin:$alphafold_path
Trying it out †
Since the EnvironmentModules file is defined, first load the module, then run:
[saber@centos7 ~]$ module load alphafold
[saber@centos7 ~]$ run_alphafold.sh
Please make sure all required parameters are given
Usage: /apps/alphafold/run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir> Path to directory of supporting data
-o <output_dir> Path to a directory that will store the results.
-f <fasta_path> Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax> Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting (snip)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads> OpenMM threads (default: all available cores)
-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset> Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or (snip)
-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (snip)
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration (snip)
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there (snip)
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time (snip)
[saber@centos7 ~]$
It prints the usage like this (some lines above are abbreviated).
[saber@centos7 ~]$ mkdir alphafold && cd $_
[saber@centos7 alphafold]$ cp /apps/src/alphafold_non_docker/example/query.fasta .
[saber@centos7 alphafold]$ cat query.fasta
>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
[saber@centos7 alphafold]$ run_alphafold.sh -d /apps/AlphafoldData -o . -f query.fasta -t 2020-05-14 -c reduced_dbs -g false -m monomer
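When the run completes, results are written to a subdirectory named after the FASTA file (here ./query/). With upstream AlphaFold's standard output layout you would expect files such as ranked_0.pdb through ranked_4.pdb, ranking_debug.json, and timings.json there; for example:
[saber@centos7 alphafold]$ ls query/ranked_0.pdb
query/ranked_0.pdb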
Changing num_recycle and the number of cores used by jackhmmer via arguments †
The patches below make it possible to specify the number of alphafold recycles, the number of CPUs for the jackhmmer sequence search, the number of CPUs for hhsearch (used for monomer prediction), and the number of CPUs for hmmsearch (used for multimer prediction). A usage example follows after the patches.
"/apps/alphafold/run_alphafold.sh"
| --- ../src/alphafold_non_docker/run_alphafold.sh.orig 2022-06-09 02:34:27.897005704 +0900
+++ run_alphafold.sh 2022-06-12 14:46:32.518462539 +0900
@@ -23,10 +23,15 @@
echo "-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be (略
echo "-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation (略
echo ""
+ echo "-C <num_recycle> ReCycle number [3]"
+ echo "-N <n_cpu> jackhmmer: number of parallel CPU workers to use for multithreads [8]"
+ echo "-h <hhsearch_cpu> hhsearch: number of CPUs to use (for shared memory SMPs) [2](monomer)"
+ echo "-H <hmmsearch_cpu> hmmsearch: number of parallel CPU workers to use for multithreads [8](multimer)"
+ echo ""
exit 1
}
-while getopts ":d:o:f:t:g:r:e:n:a:m:c:p:l:b" i; do
+while getopts ":d:o:f:t:g:r:e:n:a:m:c:p:l:C:N:h:H:b" i; do
case "${i}" in
d)
data_dir=$OPTARG
@@ -67,6 +72,18 @@
l)
num_multimer_predictions_per_model=$OPTARG
;;
+ C)
+ num_recycle=$OPTARG
+ ;;
+ N)
+ n_cpu=$OPTARG
+ ;;
+ h)
+ hhsearch_cpu=$OPTARG
+ ;;
+ H)
+ hmmsearch_cpu=$OPTARG
+ ;;
b)
benchmark=true
;;
@@ -78,6 +95,18 @@
usage
fi
+if [[ "$hmmsearch_cpu" == "" ]] ; then
+ hmmsearch_cpu=8
+fi
+if [[ "$hhsearch_cpu" == "" ]] ; then
+ hhsearch_cpu=2
+fi
+if [[ "$n_cpu" == "" ]] ; then
+ n_cpu=8
+fi
+if [[ "$num_recycle" == "" ]] ; then
+ num_recycle=3
+fi
if [[ "$benchmark" == "" ]] ; then
benchmark=false
fi
@@ -131,7 +160,7 @@
fi
# This bash script looks for the run_alphafold.py script in its current working directory, if it does not exist then exits
-current_working_dir=$(pwd)
+current_working_dir=$alphafold_path
alphafold_script="$current_working_dir/run_alphafold.py"
if [ ! -f "$alphafold_script" ]; then
@@ -197,5 +226,6 @@
database_paths="$database_paths --uniclust30_database_path=$uniclust30_database_path --bfd_database_path=$bfd_database_path"
fi
+extra_args="--num_recycle=$num_recycle --n_cpu=$n_cpu --hhsearch_cpu=$hhsearch_cpu --hmmsearch_cpu=$hmmsearch_cpu"
# Run AlphaFold with required parameters
-$(python $alphafold_script $binary_paths $database_paths $command_args)
+$(python $alphafold_script $binary_paths $database_paths $command_args $extra_args)
|
"/apps/alphafold/run_alphafold.py"
| --- run_alphafold.py.orig 2022-06-09 02:35:22.146479521 +0900
+++ run_alphafold.py 2022-06-12 14:43:24.842855059 +0900
@@ -128,6 +128,10 @@
'Relax on GPU can be much faster than CPU, so it is '
'recommended to enable if possible. GPUs must be available'
' if this setting is enabled.')
+flags.DEFINE_integer('num_recycle', None,'num_recycle')
+flags.DEFINE_integer('n_cpu', 8,'n_cpu')
+flags.DEFINE_integer('hhsearch_cpu', 2,'hhsearch_cpu')
+flags.DEFINE_integer('hmmsearch_cpu', 8,'hmmsearch_cpu')
FLAGS = flags.FLAGS
@@ -315,6 +319,7 @@
template_searcher = hmmsearch.Hmmsearch(
binary_path=FLAGS.hmmsearch_binary_path,
hmmbuild_binary_path=FLAGS.hmmbuild_binary_path,
+ hmmsearch_cpu=FLAGS.hmmsearch_cpu,
database_path=FLAGS.pdb_seqres_database_path)
template_featurizer = templates.HmmsearchHitFeaturizer(
mmcif_dir=FLAGS.template_mmcif_dir,
@@ -326,6 +331,7 @@
else:
template_searcher = hhsearch.HHSearch(
binary_path=FLAGS.hhsearch_binary_path,
+ hhsearch_cpu=FLAGS.hhsearch_cpu,
databases=[FLAGS.pdb70_database_path])
template_featurizer = templates.HhsearchHitFeaturizer(
mmcif_dir=FLAGS.template_mmcif_dir,
@@ -337,6 +343,7 @@
monomer_data_pipeline = pipeline.DataPipeline(
jackhmmer_binary_path=FLAGS.jackhmmer_binary_path,
+ n_cpu=FLAGS.n_cpu,
hhblits_binary_path=FLAGS.hhblits_binary_path,
uniref90_database_path=FLAGS.uniref90_database_path,
mgnify_database_path=FLAGS.mgnify_database_path,
@@ -359,6 +366,10 @@
num_predictions_per_model = 1
data_pipeline = monomer_data_pipeline
+ num_recycle = FLAGS.num_recycle
+ if num_recycle is None:
+ num_recycle = 3
+
model_runners = {}
model_names = config.MODEL_PRESETS[FLAGS.model_preset]
for model_name in model_names:
@@ -367,6 +378,7 @@
model_config.model.num_ensemble_eval = num_ensemble
else:
model_config.data.eval.num_ensemble = num_ensemble
+ model_config.data.common.num_recycle = num_recycle
model_params = data.get_model_haiku_params(
model_name=model_name, data_dir=FLAGS.data_dir)
model_runner = model.RunModel(model_config, model_params)
@@ -417,6 +429,7 @@
'max_template_date',
'obsolete_pdbs_path',
'use_gpu_relax',
+ 'num_recycle',
])
app.run(main)
|
"/apps/alphafold/alphafold/data/pipeline.py" (the Jackhmmer wrapper class already accepts an n_cpu argument upstream, so only the call sites in the pipeline need patching)
| --- a/alphafold/data/pipeline.py
+++ b/alphafold/data/pipeline.py
@@ -124,15 +124,18 @@ class DataPipeline:
use_small_bfd: bool,
mgnify_max_hits: int = 501,
uniref_max_hits: int = 10000,
+ n_cpu: int = 8,
use_precomputed_msas: bool = False):
"""Initializes the data pipeline."""
self._use_small_bfd = use_small_bfd
self.jackhmmer_uniref90_runner = jackhmmer.Jackhmmer(
binary_path=jackhmmer_binary_path,
+ n_cpu=n_cpu,
database_path=uniref90_database_path)
if use_small_bfd:
self.jackhmmer_small_bfd_runner = jackhmmer.Jackhmmer(
binary_path=jackhmmer_binary_path,
+ n_cpu=n_cpu,
database_path=small_bfd_database_path)
else:
self.hhblits_bfd_uniclust_runner = hhblits.HHBlits(
@@ -140,6 +143,7 @@ class DataPipeline:
databases=[bfd_database_path, uniclust30_database_path])
self.jackhmmer_mgnify_runner = jackhmmer.Jackhmmer(
binary_path=jackhmmer_binary_path,
+ n_cpu=n_cpu,
database_path=mgnify_database_path)
self.template_searcher = template_searcher
self.template_featurizer = template_featurizer
|
"/apps/alphafold/alphafold/data/tools/hhsearch.py"
| --- a/alphafold/data/tools/hhsearch.py
+++ b/alphafold/data/tools/hhsearch.py
@@ -33,6 +33,7 @@ class HHSearch:
*,
binary_path: str,
databases: Sequence[str],
+ hhsearch_cpu: int = 2,
maxseq: int = 1_000_000):
"""Initializes the Python HHsearch wrapper.
@@ -50,6 +51,7 @@ class HHSearch:
self.binary_path = binary_path
self.databases = databases
self.maxseq = maxseq
+ self.hhsearch_cpu = hhsearch_cpu
for database_path in self.databases:
if not glob.glob(database_path + '_*'):
@@ -79,6 +81,7 @@ class HHSearch:
cmd = [self.binary_path,
'-i', input_path,
'-o', hhr_path,
+ '-cpu', str(self.hhsearch_cpu),
'-maxseq', str(self.maxseq)
] + db_cmd
|
"/apps/alphafold/alphafold/data/tools/hmmsearch.py"
| --- a/alphafold/data/tools/hmmsearch.py
+++ b/alphafold/data/tools/hmmsearch.py
@@ -33,6 +33,7 @@ class Hmmsearch(object):
binary_path: str,
hmmbuild_binary_path: str,
database_path: str,
+ hmmsearch_cpu: int = 8,
flags: Optional[Sequence[str]] = None):
"""Initializes the Python hmmsearch wrapper.
@@ -49,6 +50,7 @@ class Hmmsearch(object):
self.binary_path = binary_path
self.hmmbuild_runner = hmmbuild.Hmmbuild(binary_path=hmmbuild_binary_path)
self.database_path = database_path
+ self.hmmsearch_cpu = hmmsearch_cpu
if flags is None:
# Default hmmsearch run settings.
flags = ['--F1', '0.1',
@@ -89,7 +91,7 @@ class Hmmsearch(object):
cmd = [
self.binary_path,
'--noali', # Don't include the alignment in stdout.
- '--cpu', '8'
+ '--cpu', str(self.hmmsearch_cpu)
]
# If adding flags, we have to do so before the output and input:
if self.flags:
|
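With all of the above patches in place, the new options combine with the earlier example like this (the values here are illustrative, not tuned recommendations):
[saber@centos7 alphafold]$ run_alphafold.sh -d /apps/AlphafoldData -o . -f query.fasta -t 2020-05-14 -c reduced_dbs -g false -m monomer -C 6 -N 16 -h 4
-C 6 raises the recycle count from the default 3 to 6, -N 16 gives jackhmmer 16 CPU workers, and -h 4 lets hhsearch use 4 CPUs; for a multimer run, -H sets the hmmsearch CPU count instead.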