Author: u1sbxhkjoqp2

iTransformer

Implementation of iTransformer, SOTA time series forecasting using attention networks, out of Tsinghua / Ant Group

All that remains is tabular data (xgboost is still champion here) before one can truly declare “Attention is all you need”.

In before Apple gets the authors to change the name.

The official implementation has been released here!

Appreciation

StabilityAI and 🤗 Huggingface for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source current artificial intelligence techniques.
Greg DeVos for sharing experiments he ran on iTransformer and some of the improvised variants

Install

$ pip install iTransformer

Usage

import torch
from iTransformer import iTransformer

# using solar energy settings

model = iTransformer(
    num_variates = 137,
    lookback_len = 96,                  # lookback length, as in the paper
    dim = 256,                          # model dimensions
    depth = 6,                          # depth
    heads = 8,                          # attention heads
    dim_head = 64,                      # head dimension
    pred_length = (12, 24, 36, 48),     # can be one prediction, or many
    # experimental setting that projects each variate to more than one token. the idea is that the
    # network can learn to divide up into time tokens for more granular attention across time.
    # thanks to flash attention, you should be able to accommodate long sequence lengths just fine
    num_tokens_per_variate = 1,
    # use reversible instance normalization, proposed here https://openreview.net/forum?id=cGDAkQo1C0p .
    # may be redundant given the layernorms within iTransformer (and whatever else attention learns
    # emergently on the first layer, prior to the first layernorm). if i come across some time, i'll gather
    # up all the statistics across variates, project them, and condition the transformer a bit further.
    # that makes more sense
    use_reversible_instance_norm = True
)

time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)

preds = model(time_series)

# preds -> Dict[int, Tensor[batch, pred_length, variate]]
#       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
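The use_reversible_instance_norm flag refers to reversible instance normalization (RevIN, Kim et al., 2022). The pure-Python sketch below illustrates only the idea. It is not the library's internal code. It normalizes each variate by its own lookback mean and standard deviation, then maps predictions back to the original scale with the same statistics. The function names are hypothetical, and RevIN's learnable affine parameters are omitted.

```python
from math import sqrt
from statistics import mean, pvariance


def revin_normalize(series, eps=1e-5):
    """Normalize each variate (column) of a (lookback, variates) series.

    Returns the normalized series and the per-variate (mean, std) pairs
    needed to undo the transform on the model's predictions.
    """
    num_variates = len(series[0])
    stats = []
    for v in range(num_variates):
        column = [row[v] for row in series]
        # eps keeps the scale strictly positive for constant variates
        stats.append((mean(column), sqrt(pvariance(column) + eps)))
    normed = [[(row[v] - stats[v][0]) / stats[v][1] for v in range(num_variates)]
              for row in series]
    return normed, stats


def revin_denormalize(preds, stats):
    """Map normalized predictions of shape (pred_len, variates) back to the original scale."""
    return [[value * std + mu for value, (mu, std) in zip(row, stats)]
            for row in preds]


# toy window: lookback of 3 steps, 2 variates on very different scales
window = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
normed, stats = revin_normalize(window)
restored = revin_denormalize(normed, stats)
```

In a forecasting setup, the model would see only the normalized window, and its outputs would be passed through revin_denormalize using the statistics of that same window.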

For an improvised version that does granular attention across time tokens (as well as the original per-variate tokens), just import iTransformer2D and set the additional num_time_tokens argument.

Update: It works! Thanks go out to Greg DeVos for running the experiment here!

Update 2: Got an email. Yes, you are free to write a paper on this, if the architecture holds up for your problem. I have no skin in the game.

import torch
from iTransformer import iTransformer2D

# using solar energy settings

model = iTransformer2D(
    num_variates = 137,
    num_time_tokens = 16,               # number of time tokens (patch size will be lookback_len // num_time_tokens)
    lookback_len = 96,                  # the lookback length in the paper
    dim = 256,                          # model dimensions
    depth = 6,                          # depth
    heads = 8,                          # attention heads
    dim_head = 64,                      # head dimension
    pred_length = (12, 24, 36, 48),     # can be one prediction, or many
    use_reversible_instance_norm = True # use reversible instance normalization
)

time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)

preds = model(time_series)

# preds -> Dict[int, Tensor[batch, pred_length, variate]]
#       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
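The comment on num_time_tokens states that the patch size is lookback_len // num_time_tokens. The minimal pure-Python sketch below shows how a (lookback, variates) window could be cut into per-variate time tokens under that rule. The function name is hypothetical. The library itself works on tensors and follows this step with a learned projection. This sketch simply rejects lookback lengths that are not divisible by num_time_tokens.

```python
def to_time_tokens(series, num_time_tokens):
    """Split a (lookback, variates) window into per-variate time patches.

    Returns nested lists of shape (variates, num_time_tokens, patch_size),
    with patch_size = lookback // num_time_tokens.
    """
    lookback, num_variates = len(series), len(series[0])
    if lookback % num_time_tokens != 0:
        raise ValueError("lookback length must be divisible by num_time_tokens")
    patch_size = lookback // num_time_tokens
    tokens = []
    for v in range(num_variates):
        column = [row[v] for row in series]  # one variate's full lookback window
        tokens.append([column[i * patch_size:(i + 1) * patch_size]
                       for i in range(num_time_tokens)])
    return tokens


# solar energy settings from the example: lookback 96, 137 variates, 16 time tokens
window = [[t * 1000 + v for v in range(137)] for t in range(96)]
tokens = to_time_tokens(window, num_time_tokens=16)
print(len(tokens), len(tokens[0]), len(tokens[0][0]))  # 137 16 6
```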

Experimental

iTransformer with Fourier tokens

An iTransformer, but also with Fourier tokens (the FFT of the time series is projected into tokens of its own and attended alongside the variate tokens, then spliced out at the end).

import torch
from iTransformer import iTransformerFFT

# using solar energy settings

model = iTransformerFFT(
    num_variates = 137,
    lookback_len = 96,                  # lookback length, as in the paper
    dim = 256,                          # model dimensions
    depth = 6,                          # depth
    heads = 8,                          # attention heads
    dim_head = 64,                      # head dimension
    pred_length = (12, 24, 36, 48),     # can be one prediction, or many
    # experimental setting that projects each variate to more than one token. the idea is that the
    # network can learn to divide up into time tokens for more granular attention across time.
    # thanks to flash attention, you should be able to accommodate long sequence lengths just fine
    num_tokens_per_variate = 1,
    # use reversible instance normalization, proposed here https://openreview.net/forum?id=cGDAkQo1C0p .
    # may be redundant given the layernorms within iTransformer (and whatever else attention learns
    # emergently on the first layer, prior to the first layernorm). if i come across some time, i'll gather
    # up all the statistics across variates, project them, and condition the transformer a bit further.
    # that makes more sense
    use_reversible_instance_norm = True
)

time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)

preds = model(time_series)

# preds -> Dict[int, Tensor[batch, pred_length, variate]]
#       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
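To make the Fourier-token idea concrete, here is a naive pure-Python sketch. It computes the discrete Fourier transform of each variate's lookback window and concatenates the real and imaginary parts into one feature vector per variate. In iTransformerFFT such vectors are projected into tokens of their own. The learned projection is omitted here. Whether the library keeps the full spectrum or only half of it is an assumption this sketch does not settle. The O(N^2) loop is for readability only.

```python
import cmath


def variate_spectra(series):
    """DFT of each variate's lookback window, as [real parts..., imaginary parts...].

    Output shape: (variates, 2 * lookback). In iTransformerFFT a vector like this
    would be projected into a Fourier token; that learned projection is omitted.
    """
    lookback = len(series)
    spectra = []
    for v in range(len(series[0])):
        column = [row[v] for row in series]
        # naive O(N^2) DFT for readability; an FFT (e.g. torch.fft.fft) would be used in practice
        coeffs = [sum(x * cmath.exp(-2j * cmath.pi * k * t / lookback)
                      for t, x in enumerate(column))
                  for k in range(lookback)]
        spectra.append([c.real for c in coeffs] + [c.imag for c in coeffs])
    return spectra


# two variates over a lookback of 4: a constant and an alternating signal
window = [[2.0, 1.0], [2.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
spectra = variate_spectra(window)
```

The constant variate puts all its energy in the zero-frequency bin, while the alternating one shows up only at the Nyquist bin (k = 2).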

Todo

beef up the transformer with latest findings
improvise a 2d version across both variates and time
improvise a version that includes fft tokens
improvise a variant that uses adaptive normalization conditioned on statistics across all variates

Citation

@misc{liu2023itransformer,
    title   = {iTransformer: Inverted Transformers Are Effective for Time Series Forecasting},
    author  = {Yong Liu and Tengge Hu and Haoran Zhang and Haixu Wu and Shiyu Wang and Lintao Ma and Mingsheng Long},
    year    = {2023},
    eprint  = {2310.06625},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@misc{shazeer2020glu,
    title   = {GLU Variants Improve Transformer},
    author  = {Noam Shazeer},
    year    = {2020},
    url     = {https://arxiv.org/abs/2002.05202}
}

@misc{burtsev2020memory,
    title   = {Memory Transformer},
    author  = {Mikhail S. Burtsev and Grigory V. Sapunov},
    year    = {2020},
    eprint  = {2006.11527},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

@inproceedings{Darcet2023VisionTN,
    title   = {Vision Transformers Need Registers},
    author  = {Timoth{\'e}e Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:263134283}
}

@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}

@Article{AlphaFold2021,
    author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
    journal = {Nature},
    title   = {Highly accurate protein structure prediction with {AlphaFold}},
    year    = {2021},
    doi     = {10.1038/s41586-021-03819-2},
    note    = {(Accelerated article preview)},
}

@inproceedings{kim2022reversible,
    title   = {Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift},
    author  = {Taesung Kim and Jinhee Kim and Yunwon Tae and Cheonbok Park and Jang-Ho Choi and Jaegul Choo},
    booktitle = {International Conference on Learning Representations},
    year    = {2022},
    url     = {https://openreview.net/forum?id=cGDAkQo1C0p}
}

@inproceedings{Katsch2023GateLoopFD,
    title   = {GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling},
    author  = {Tobias Katsch},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:265018962}
}

@article{Zhou2024ValueRL,
    title   = {Value Residual Learning For Alleviating Attention Concentration In Transformers},
    author  = {Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2410.17897},
    url     = {https://api.semanticscholar.org/CorpusID:273532030}
}

@article{Zhu2024HyperConnections,
    title   = {Hyper-Connections},
    author  = {Defa Zhu and Hongzhi Huang and Zihao Huang and Yutao Zeng and Yunyao Mao and Banggu Wu and Qiyang Min and Xun Zhou},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2409.19606},
    url     = {https://api.semanticscholar.org/CorpusID:272987528}
}

pqtl_pipeline_finemap

Fine-mapping analysis within the pQTL pipeline project at Human Technopole, Milan, Italy

We started this analysis pipeline in early April 2024. We adopted the Nextflow (NF) pipeline developed by the Statistical Genomics team at Human Technopole and deployed it in Snakemake (SMK). We independently validated each of the analyses described below before incorporating it into SMK.

Locus Breaker

We incorporated the Locus Breaker (LB) function written in R (see publication PMID:) to process the meta-analysis GWAS results of the proteins, and deployed it in SMK in mid-April 2024.

COJO Conditional Analysis

When the pipeline runs, the rule run_cojo generates the following output files:

list of independent variants resulting from GCTA cojo-slct (TSV/CSV)
conditional dataset for each independent signal resulting from GCTA cojo-cond (RDS)
fine-mapping results from the coloc::coloc.ABF function, containing values such as the log approximate Bayes factor (lABF) and the posterior probability (PPI) of each variant (RDS)
colocalization info table containing credible set variants (with cumulative PPI > 0.99) for each independent variant
regional association plots

These outputs are stored under the workspace_path provided by the user in config_finemap.yaml, in the following directory:
<workspace_path>/results/*/cojo/

Colocalization of Two Proteins

We performed colocalization (Giambartolomei et al., 2014) across the pQTL signals. Colocalization assumes only one causal variant per locus. To meet this assumption, we used the conditional datasets and performed one colocalization test per pair of independent SNPs in two overlapping loci. For each regional association and each target SNP, we defined a credible set as the smallest set of variants within the region whose cumulative posterior inclusion probability (PIP) exceeds 0.99. More precisely, using the conditional dataset, we computed Approximate Bayes Factors (ABF) with the ‘process.dataset’ function in the coloc v5.2.3 R package and calculated posterior probabilities by normalizing the ABFs across variants. We ranked variants by posterior probability and included in the credible set those needed to reach a cumulative posterior probability above 0.99. Among XXX protein pairs with overlapping loci, XXX protein pairs sharing a credible set variant were then tested for colocalization using the ‘coloc.abf’ function. We considered a pair colocalized when the posterior probability for hypothesis 4 (H4, a shared causal variant for the two proteins) exceeded 0.80.
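The credible-set step above can be sketched as follows. This is a minimal Python illustration, not the pipeline's R code, and all function names are hypothetical. The sketch proceeds in four steps:

1. Compute Wakefield's log approximate Bayes factor for each variant. This is the form coloc uses. A prior standard deviation of 0.15 is assumed here, matching coloc's usual default for a quantitative trait with unit variance.
2. Normalize the ABFs into posterior probabilities, with equal priors across variants.
3. Rank the variants by posterior probability.
4. Keep variants until the cumulative probability exceeds 0.99.

```python
import math


def wakefield_labf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor for one variant (Wakefield's approximation).

    prior_sd=0.15 is an assumption mirroring coloc's usual default for a
    quantitative trait with unit standard deviation.
    """
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def posterior_probs(labfs):
    """Normalize ABFs across variants (equal priors), using log-sum-exp for stability."""
    top = max(labfs)
    weights = [math.exp(x - top) for x in labfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(variant_ids, labfs, coverage=0.99):
    """Rank variants by posterior probability and keep them until the cumulative sum exceeds coverage."""
    ranked = sorted(zip(variant_ids, posterior_probs(labfs)),
                    key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant_id, pp in ranked:
        selected.append(variant_id)
        cumulative += pp
        if cumulative > coverage:
            break
    return selected


# toy conditional dataset: (variant, beta, se)
data = [("rs1", 0.40, 0.05), ("rs2", 0.10, 0.05), ("rs3", 0.02, 0.05)]
labfs = [wakefield_labf(b, s) for _, b, s in data]
print(credible_set([v for v, _, _ in data], labfs))  # ['rs1']
```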

New Features on Top of NF pipeline

We also added new features, such as optionally excluding signals in the HLA and NLRP12 regions from the results and follow-up analyses. The user controls this through the configuration file.

NOTE

This SMK pipeline, designed for the pQTL project, does not include munging or alignment of the input GWAS summary files. It is therefore a MUST to have your GWAS results fully harmonized with your genotype data. For example, variant IDs and reference/alternate (effect/other) alleles must be concordant across your input files. Our GWAS summary statistics from REGENIE were already aligned by the QC pipeline (built on GWASLab) developed by the pQTL analysts team at the Health Data Science Center.
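Because the pipeline assumes harmonized inputs, a quick pre-flight check can help. The sketch below is a hypothetical helper, not part of the pipeline. It compares each GWAS variant's effect/other alleles with the alternate/reference alleles from the genotype data, assuming the effect allele should match the alternate allele. It then labels each variant as concordant, swapped, mismatched or missing.

```python
def check_concordance(gwas_variants, genotype_alleles):
    """Compare GWAS alleles with the genotype data, variant by variant.

    gwas_variants: iterable of (variant_id, effect_allele, other_allele).
    genotype_alleles: dict mapping variant_id -> (alt_allele, ref_allele).
    Assumes the effect allele should equal the alternate allele.
    Returns a dict variant_id -> "ok", "swapped", "mismatch" or "missing".
    """
    status = {}
    for variant_id, effect, other in gwas_variants:
        expected = genotype_alleles.get(variant_id)
        if expected is None:
            status[variant_id] = "missing"  # ID not found in the genotype data
            continue
        alt, ref = (allele.upper() for allele in expected)
        pair = (effect.upper(), other.upper())
        if pair == (alt, ref):
            status[variant_id] = "ok"
        elif pair == (ref, alt):
            status[variant_id] = "swapped"  # alleles flipped: effect sign would need inverting
        else:
            status[variant_id] = "mismatch"
    return status
```

Any variant not labeled "ok" would need fixing before it enters the pipeline.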

How to run the pipeline:

You can use the default configuration file, config/config_finemap.yaml, or create your own configuration in the config/ folder. In the latter case, make sure that the configfile entry in workflow/Snakefile matches your new config file name. Then run the pipeline by typing the command below in bash:

sbatch submit.sh

Not interested in running colocalization?

To skip colocalization with your traits, uncomment the line #--until collect_credible_sets in the Makefile. To skip both COJO and colocalization and run only Locus Breaker, change that option in the Makefile to --until collect_loci, then run the pipeline as described above.

Workflow example

MoC Indexer

WARNING: DEPRECATED!

This repository is going to be archived.

Please use the following instead:

API: Stable Protocol API

Indexer: Stable Protocol Indexer

Introduction

To speed up the app, we need an indexer for our contracts on the blockchain.
The indexer queries the status of the contracts
and writes it to a MongoDB database, so the app queries MongoDB instead of the (slow) blockchain.
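The read path described above can be sketched as a small cache-refresh step. In this hypothetical pure-Python sketch, fetch_state stands in for the slow on-chain query and a plain dict stands in for the MongoDB collection. Neither name comes from the indexer's code.

```python
import time


class StateIndexer:
    """Copy contract state from the chain into a fast store that the app reads.

    fetch_state: callable(contract_name) -> dict; stands in for the slow chain query.
    store: dict standing in for a MongoDB collection, keyed by contract name.
    """

    def __init__(self, fetch_state, store):
        self.fetch_state = fetch_state
        self.store = store

    def index_once(self, contract_names):
        """One indexing pass: query each contract on chain and upsert its document."""
        for name in contract_names:
            state = self.fetch_state(name)
            self.store[name] = {"state": state, "indexed_at": time.time()}

    def read(self, contract_name):
        """The app's path: a store lookup that never touches the chain."""
        doc = self.store.get(contract_name)
        return None if doc is None else doc["state"]
```

With pymongo, the dict assignment in index_once would typically become collection.replace_one({"_id": name}, doc, upsert=True). Reads then hit MongoDB only.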

Indexer jobs

Scan Raw TX: Indexing blocks
Scan Events: Indexing event transactions
Scan Prices: Scan prices
Scan Moc State: Scan current moc state
Scan Moc Status
Scan MocState Status
Scan User State Update
Scan Blocks not processed
Reconnect on lost chain
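The "Scan Blocks not processed" job implies detecting gaps in the indexed block range. The minimal sketch below illustrates that idea. It assumes the indexer can list the block numbers it has already processed, and the helper names are hypothetical.

```python
def find_unprocessed_blocks(processed, from_block, to_block):
    """Block numbers in [from_block, to_block] that are not yet indexed, ascending."""
    done = set(processed)
    return [n for n in range(from_block, to_block + 1) if n not in done]


def group_into_ranges(blocks):
    """Collapse ascending block numbers into inclusive (start, end) ranges for batch re-scans."""
    ranges = []
    for n in blocks:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current run
        else:
            ranges.append((n, n))  # start a new run
    return ranges


missing = find_unprocessed_blocks([100, 101, 104, 105, 108], 100, 109)
print(group_into_ranges(missing))  # [(102, 103), (106, 107), (109, 109)]
```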

Usage

Requirements and installation

We need:
Python 3.6+
Brownie

Install libraries

pip install -r requirements.txt

Brownie is a Python-based development and testing framework for smart contracts.
Brownie is easy to use, so we integrated it with Money on Chain.

pip install eth-brownie==1.17.1

Network Connections

First, add the custom networks (RSK and BSC nodes) to Brownie:

console> brownie networks add RskNetwork rskTestnetPublic host=https://public-node.testnet.rsk.co chainid=31 explorer=https://blockscout.com/rsk/mainnet/api
console> brownie networks add RskNetwork rskTestnetLocal host=http://localhost:4444 chainid=31 explorer=https://blockscout.com/rsk/mainnet/api
console> brownie networks add RskNetwork rskMainnetPublic host=https://public-node.rsk.co chainid=30 explorer=https://blockscout.com/rsk/mainnet/api
console> brownie networks add RskNetwork rskMainnetLocal host=http://localhost:4444 chainid=30 explorer=https://blockscout.com/rsk/mainnet/api
console> brownie networks add BSCNetwork bscTestnet host=https://data-seed-prebsc-1-s1.binance.org:8545/ chainid=97 explorer=https://blockscout.com/rsk/mainnet/api

Connection table

Network name	Network node	Host	Chain ID
rskTestnetPublic	RSK Testnet Public	https://public-node.testnet.rsk.co	31
rskTestnetLocal	RSK Testnet Local	http://localhost:4444	31
rskMainnetPublic	RSK Mainnet Public	https://public-node.rsk.co	30
rskMainnetLocal	RSK Mainnet Local	http://localhost:4444	30
bscTestnet	BSC Testnet Public	https://data-seed-prebsc-1-s1.binance.org:8545/	97
bscTestnetPrivate	BSC Testnet Private	http://localhost:8545/	97
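Each row of the connection table corresponds to one brownie networks add call. The sketch below builds those argument lists from the table data so they can be reviewed or passed to subprocess.run. The helper names are hypothetical. The explorer URL is the one used in the commands above. bscTestnetPrivate is left out because no command is given for it.

```python
EXPLORER = "https://blockscout.com/rsk/mainnet/api"  # explorer URL used in the commands above

NETWORKS = [
    # (environment, network name, host, chain id), taken from the commands and table above
    ("RskNetwork", "rskTestnetPublic", "https://public-node.testnet.rsk.co", 31),
    ("RskNetwork", "rskTestnetLocal", "http://localhost:4444", 31),
    ("RskNetwork", "rskMainnetPublic", "https://public-node.rsk.co", 30),
    ("RskNetwork", "rskMainnetLocal", "http://localhost:4444", 30),
    ("BSCNetwork", "bscTestnet", "https://data-seed-prebsc-1-s1.binance.org:8545/", 97),
]


def brownie_add_command(environment, name, host, chain_id, explorer=EXPLORER):
    """Return the argument list for one `brownie networks add` call."""
    return ["brownie", "networks", "add", environment, name,
            f"host={host}", f"chainid={chain_id}", f"explorer={explorer}"]


def all_commands(networks=NETWORKS):
    """One argument list per table row."""
    return [brownie_add_command(*row) for row in networks]
```

Running them would then be a loop of subprocess.run(cmd, check=True) over all_commands(), which this sketch deliberately does not execute.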

Usage

Example

Make sure to change settings/settings-xxx.json so that it points to your MongoDB database.

python ./app_run_moc_indexer.py --config=settings/aws-moc-alpha-testnet.json --config_network=mocTestnetAlpha --connection_network=rskTestnetPublic

--config: path to the config JSON file

--config_network=mocTestnetAlpha: config network name in the JSON file

--connection_network=rskTestnetPublic: connection network name in Brownie
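The three options above map naturally onto argparse. This is a hypothetical re-creation for illustration, not the actual argument handling in app_run_moc_indexer.py. For instance, whether the options are required there is an assumption.

```python
import argparse


def build_parser():
    """Command-line options mirroring the ones documented above (sketch only)."""
    parser = argparse.ArgumentParser(description="Run the MoC indexer (illustrative sketch).")
    parser.add_argument("--config", required=True,
                        help="path to the settings JSON file")
    parser.add_argument("--config_network", required=True,
                        help="config network name inside the JSON, e.g. mocTestnetAlpha")
    parser.add_argument("--connection_network", required=True,
                        help="connection network registered in Brownie, e.g. rskTestnetPublic")
    return parser


# parse the example invocation from above instead of sys.argv
args = build_parser().parse_args([
    "--config=settings/aws-moc-alpha-testnet.json",
    "--config_network=mocTestnetAlpha",
    "--connection_network=rskTestnetPublic",
])
print(args.config, args.config_network, args.connection_network)
```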

Docker usage

Build

bash ./docker_build.sh -e ec2_alphatestnet -c ./settings/aws-moc-alpha-testnet.json

Run

docker run -d \
--name ec2_alphatestnet_1 \
--env APP_MONGO_URI=mongodb://192.168.56.2:27017/ \
--env APP_MONGO_DB=local_alpha_testnet2 \
--env APP_CONFIG_NETWORK=mocTestnetAlpha \
--env APP_CONNECTION_NETWORK=https://public-node.testnet.rsk.co,31 \
moc_indexer_ec2_alphatestnet

Custom node

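To use a custom node, set APP_CONNECTION_NETWORK to the node URL and the chain ID, separated by a comma:

APP_CONNECTION_NETWORK: https://public-node.testnet.rsk.co,31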
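Judging from the example, the value is a node URL and a chain ID separated by a comma. The small sketch below parses it under that assumption. The function name is hypothetical, and whether the indexer also accepts other forms here is not stated.

```python
def parse_connection_network(value):
    """Split an APP_CONNECTION_NETWORK value of the form "<host>,<chain id>"."""
    host, sep, chain_id = value.rpartition(",")
    if not sep or not host:
        raise ValueError(f"expected '<host>,<chain id>', got {value!r}")
    return host.strip(), int(chain_id)


print(parse_connection_network("https://public-node.testnet.rsk.co,31"))
# ('https://public-node.testnet.rsk.co', 31)
```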

AWS

Starting the build server

First, start the build server in EC2.

Connect to the builder through the bastion host:

ssh -F /home/martin/.ssh/bastion/moc_ssh_config moc-builder

Change user to builder:

sudo su builder -s /bin/bash

Building the AWS image

./aws_build_and_push.sh -e <environment> -c <config file> -i <aws id>

where <environment> is one of:

ec2_alphatestnet: alpha-testnet.moneyonchain.com
ec2_testnet: moc-testnet.moneyonchain.com
ec2_mainnet: alpha.moneyonchain.com
ec2_rdoc_mainnet: rif.moneyonchain.com
ec2_rdoc_testnet: rif-testnet.moneyonchain.com
ec2_rdoc_alphatestnet: rif-alpha.moneyonchain.com

The script then builds the Docker image.

Before pushing the image, check that the ECR repository exists; if it does not, create it at https://us-west-1.console.aws.amazon.com/ecr/repositories?region=us-west-1

Ensure you have installed the latest version of the AWS CLI and Docker.

Make sure you have built your image before pushing it.

This script tags the image with latest and pushes it to the proper repository.

Example:

$ ./aws_build_and_push.sh -e ec2_alphatestnet -c ./settings/aws-moc-mainnet2.json -i 123456

Setting up in AWS ECS

In the task definition, it is important to set the proper environment variables:

APP_CONFIG: the contents of your settings/deploy_XXX.json config file, as JSON
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: needed for the jobs' heartbeat function, which requires an account with write access to a CloudWatch metric
APP_CONFIG_NETWORK: the config network name (the same value as --config_network, e.g. mocTestnetAlpha)
APP_CONNECTION_NETWORK: the connection network, e.g. a node given as <host>,<chain id> (see Custom node)
APP_MONGO_URI: MongoDB URI
APP_MONGO_DB: MongoDB database name
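ECS task definitions take environment variables as a list of name/value objects in the container definition. The minimal sketch below assembles that list for the variables above, with the settings file serialized into APP_CONFIG as a JSON string. The function name and the settings content are hypothetical. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are left out but would be added the same way.

```python
import json


def ecs_environment(config, config_network, connection_network, mongo_uri, mongo_db):
    """Build the `environment` list of an ECS container definition for the indexer.

    config: the parsed settings JSON (e.g. loaded from settings/deploy_XXX.json);
    it is passed to the container as a JSON string in APP_CONFIG.
    """
    values = {
        "APP_CONFIG": json.dumps(config),
        "APP_CONFIG_NETWORK": config_network,
        "APP_CONNECTION_NETWORK": connection_network,
        "APP_MONGO_URI": mongo_uri,
        "APP_MONGO_DB": mongo_db,
    }
    # ECS expects a list of {"name": ..., "value": ...} objects with string values
    return [{"name": name, "value": str(value)} for name, value in values.items()]


env = ecs_environment(
    {"example_key": "example_value"},  # hypothetical settings content
    config_network="mocTestnetAlpha",
    connection_network="https://public-node.testnet.rsk.co,31",
    mongo_uri="mongodb://192.168.56.2:27017/",
    mongo_db="local_alpha_testnet2",
)
print(json.dumps(env, indent=2))
```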

Building the AWS webservice image

./aws_webservice_build_and_push.sh -e <environment> -c <config file> -i <aws id>

where <environment> is one of:

alpha-testnet: alpha-testnet.moneyonchain.com
testnet: moc-testnet.moneyonchain.com
mainnet: alpha.moneyonchain.com
rdoc-mainnet: rif.moneyonchain.com
rdoc-testnet: rif-testnet.moneyonchain.com
rdoc-alphatestnet: rif-alpha.moneyonchain.com

The script then builds the Docker image.

Example:

bash ./aws_webservice_build_and_push.sh -e alpha-testnet -i 123 -r us-west-1 -c ./settings/aws-moc-alpha-testnet.json

Setting up in AWS ECS

In the task definition, it is important to set the proper environment variables:

APP_CONFIG: the contents of your settings/deploy_XXX.json config file, as JSON
APP_MONGO_URI: MongoDB URI
APP_MONGO_DB: MongoDB database name
