Author: u1sbxhkjoqp2

  • iTransformer

    iTransformer

    Implementation of iTransformer – SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group

    All that remains is tabular data (xgboost still champion here) before one can truly declare “Attention is all you need”

    In before Apple gets the authors to change the name.

    The official implementation has been released here!

    Appreciation

    • StabilityAI and 🤗 Huggingface for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source current artificial intelligence techniques.

    • Greg DeVos for sharing experiments he ran on iTransformer and some of the improvised variants

    Install

    $ pip install iTransformer

    Usage

    import torch
    from iTransformer import iTransformer
    
    # using solar energy settings
    
    model = iTransformer(
        num_variates = 137,
        lookback_len = 96,                  # or the lookback length in the paper
        dim = 256,                          # model dimensions
        depth = 6,                          # depth
        heads = 8,                          # attention heads
        dim_head = 64,                      # head dimension
        pred_length = (12, 24, 36, 48),     # can be one prediction, or many
        num_tokens_per_variate = 1,         # experimental setting that projects each variate to more than one token. the idea is that the network can learn to divide up into time tokens for more granular attention across time. thanks to flash attention, you should be able to accommodate long sequence lengths just fine
        use_reversible_instance_norm = True # use reversible instance normalization, proposed here https://openreview.net/forum?id=cGDAkQo1C0p . may be redundant given the layernorms within iTransformer (and whatever else attention learns emergently on the first layer, prior to the first layernorm). if i come across some time, i'll gather up all the statistics across variates, project them, and condition the transformer a bit further. that makes more sense
    )
    
    time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)
    
    preds = model(time_series)
    
    # preds -> Dict[int, Tensor[batch, pred_length, variate]]
    #       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
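
    The use_reversible_instance_norm flag refers to reversible instance normalization (Kim et al., 2022). As a rough, library-independent illustration of the idea, the sketch below normalizes one variate's lookback window by its own mean and standard deviation and then reverses that transform. The function names are hypothetical, the learnable affine parameters from the paper are omitted, and this is not the package's internal implementation.

```python
import math
from typing import List, Tuple

def instance_normalize(series: List[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Normalize one variate's lookback window by its own statistics."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)  # eps avoids division by zero for constant series
    normed = [(x - mean) / std for x in series]
    return normed, mean, std

def instance_denormalize(preds: List[float], mean: float, std: float) -> List[float]:
    """Reverse the normalization so forecasts return to the original scale."""
    return [p * std + mean for p in preds]

lookback = [10.0, 12.0, 14.0, 16.0]
normed, mean, std = instance_normalize(lookback)
# A real model would forecast in normalized space; here we simply round-trip the input.
restored = instance_denormalize(normed, mean, std)
```

    In a forecasting setup, the stored mean and std of each lookback window would be applied to the model's predictions for that same window, so the network only has to model the normalized shape of the series.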

    For an improvised version that does granular attention across time tokens (as well as the original per-variate tokens), just import iTransformer2D and set the additional num_time_tokens

    Update: It works! Thanks goes out to Greg DeVos for running the experiment here!

    Update 2: Got an email. Yes you are free to write a paper on this, if the architecture holds up for your problem. I have no skin in the game

    import torch
    from iTransformer import iTransformer2D
    
    # using solar energy settings
    
    model = iTransformer2D(
        num_variates = 137,
        num_time_tokens = 16,               # number of time tokens (patch size will be (look back length // num_time_tokens))
        lookback_len = 96,                  # the lookback length in the paper
        dim = 256,                          # model dimensions
        depth = 6,                          # depth
        heads = 8,                          # attention heads
        dim_head = 64,                      # head dimension
        pred_length = (12, 24, 36, 48),     # can be one prediction, or many
        use_reversible_instance_norm = True # use reversible instance normalization
    )
    
    time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)
    
    preds = model(time_series)
    
    # preds -> Dict[int, Tensor[batch, pred_length, variate]]
    #       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
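
    The comment on num_time_tokens says the patch size is the lookback length divided by num_time_tokens (96 // 16 = 6 in the settings above). A minimal sketch of that split for a single variate, assuming the lookback length is evenly divisible; split_into_time_patches is a hypothetical helper written for illustration, not a function exported by the package.

```python
from typing import List

def split_into_time_patches(series: List[float], num_time_tokens: int) -> List[List[float]]:
    """Split one variate's lookback window into equal, contiguous time patches."""
    lookback_len = len(series)
    if lookback_len % num_time_tokens != 0:
        raise ValueError("lookback length must be divisible by num_time_tokens")
    patch_size = lookback_len // num_time_tokens
    return [series[i:i + patch_size] for i in range(0, lookback_len, patch_size)]

series = [float(t) for t in range(96)]          # lookback_len = 96
patches = split_into_time_patches(series, 16)   # num_time_tokens = 16
```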

    Experimental

    iTransformer with fourier tokens

    An iTransformer that also uses fourier tokens (the FFT of the time series is projected into tokens of its own, which are attended alongside the variate tokens and spliced out at the end)

    import torch
    from iTransformer import iTransformerFFT
    
    # using solar energy settings
    
    model = iTransformerFFT(
        num_variates = 137,
        lookback_len = 96,                  # or the lookback length in the paper
        dim = 256,                          # model dimensions
        depth = 6,                          # depth
        heads = 8,                          # attention heads
        dim_head = 64,                      # head dimension
        pred_length = (12, 24, 36, 48),     # can be one prediction, or many
        num_tokens_per_variate = 1,         # experimental setting that projects each variate to more than one token. the idea is that the network can learn to divide up into time tokens for more granular attention across time. thanks to flash attention, you should be able to accommodate long sequence lengths just fine
        use_reversible_instance_norm = True # use reversible instance normalization, proposed here https://openreview.net/forum?id=cGDAkQo1C0p . may be redundant given the layernorms within iTransformer (and whatever else attention learns emergently on the first layer, prior to the first layernorm). if i come across some time, i'll gather up all the statistics across variates, project them, and condition the transformer a bit further. that makes more sense
    )
    
    time_series = torch.randn(2, 96, 137)  # (batch, lookback len, variates)
    
    preds = model(time_series)
    
    # preds -> Dict[int, Tensor[batch, pred_length, variate]]
    #       -> {12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137)}
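
    To make the idea of frequency-domain features concrete, here is a small standard-library sketch that computes a discrete Fourier transform of one series and flattens the complex spectrum into real values that a linear layer could then project into a token. Both helper names are hypothetical and the naive transform is for illustration only; how the package itself computes and projects the FFT is not shown here.

```python
import cmath
import math
from typing import List

def real_dft(series: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, keeping the non-redundant half of the spectrum."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n // 2 + 1)
    ]

def fourier_features(series: List[float]) -> List[float]:
    """Flatten the complex spectrum into real-valued features (real parts, then imaginary parts)."""
    spectrum = real_dft(series)
    return [c.real for c in spectrum] + [c.imag for c in spectrum]

# A pure cosine with 2 cycles over 8 samples puts its energy in frequency bin 2.
series = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
magnitudes = [abs(c) for c in real_dft(series)]
features = fourier_features(series)
```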

    Todo

    • beef up the transformer with latest findings
    • improvise a 2d version across both variates and time
    • improvise a version that includes fft tokens
    • improvise a variant that uses adaptive normalization conditioned on statistics across all variates

    Citation

    @misc{liu2023itransformer,
      title   = {iTransformer: Inverted Transformers Are Effective for Time Series Forecasting}, 
      author  = {Yong Liu and Tengge Hu and Haoran Zhang and Haixu Wu and Shiyu Wang and Lintao Ma and Mingsheng Long},
      year    = {2023},
      eprint  = {2310.06625},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
    }

    @misc{shazeer2020glu,
        title   = {GLU Variants Improve Transformer},
        author  = {Noam Shazeer},
        year    = {2020},
        url     = {https://arxiv.org/abs/2002.05202}
    }

    @misc{burtsev2020memory,
        title   = {Memory Transformer},
        author  = {Mikhail S. Burtsev and Grigory V. Sapunov},
        year    = {2020},
        eprint  = {2006.11527},
        archivePrefix = {arXiv},
        primaryClass = {cs.CL}
    }

    @inproceedings{Darcet2023VisionTN,
        title   = {Vision Transformers Need Registers},
        author  = {Timoth{\'e}e Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
        year    = {2023},
        url     = {https://api.semanticscholar.org/CorpusID:263134283}
    }

    @inproceedings{dao2022flashattention,
        title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
        author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
        booktitle = {Advances in Neural Information Processing Systems},
        year    = {2022}
    }

    @Article{AlphaFold2021,
        author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
        journal = {Nature},
        title   = {Highly accurate protein structure prediction with {AlphaFold}},
        year    = {2021},
        doi     = {10.1038/s41586-021-03819-2},
        note    = {(Accelerated article preview)},
    }

    @inproceedings{kim2022reversible,
        title   = {Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift},
        author  = {Taesung Kim and Jinhee Kim and Yunwon Tae and Cheonbok Park and Jang-Ho Choi and Jaegul Choo},
        booktitle = {International Conference on Learning Representations},
        year    = {2022},
        url     = {https://openreview.net/forum?id=cGDAkQo1C0p}
    }

    @inproceedings{Katsch2023GateLoopFD,
        title   = {GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling},
        author  = {Tobias Katsch},
        year    = {2023},
        url     = {https://api.semanticscholar.org/CorpusID:265018962}
    }

    @article{Zhou2024ValueRL,
        title   = {Value Residual Learning For Alleviating Attention Concentration In Transformers},
        author  = {Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan},
        journal = {ArXiv},
        year    = {2024},
        volume  = {abs/2410.17897},
        url     = {https://api.semanticscholar.org/CorpusID:273532030}
    }

    @article{Zhu2024HyperConnections,
        title   = {Hyper-Connections},
        author  = {Defa Zhu and Hongzhi Huang and Zihao Huang and Yutao Zeng and Yunyao Mao and Banggu Wu and Qiyang Min and Xun Zhou},
        journal = {ArXiv},
        year    = {2024},
        volume  = {abs/2409.19606},
        url     = {https://api.semanticscholar.org/CorpusID:272987528}
    }

    Visit original content creator repository

  • pqtl_pipeline_finemap

    pqtl_pipeline_finemap

    Fine mapping analysis within the pQTL pipeline project at Human Technopole, Milan, Italy

    We started this analysis pipeline in early April 2024. We adopted the Nextflow (NF) pipeline developed by the Statistical Genomics team at Human Technopole and deployed it in Snakemake (SMK). We independently validated each of the analyses described below before incorporating it into SMK.

    Locus Breaker

    We incorporated the Locus Breaker (LB) function, written in R (see publication PMID:), for the example meta-analysis GWAS results of the proteins and deployed it in SMK in mid-April 2024.

    COJO Conditional Analysis

    When the pipeline runs, the rule run_cojo generates the following output files:

    • list of independent variants resulting from GCTA cojo-slct (TSV/CSV)
    • conditional dataset for each independent signal resulting from GCTA cojo-cond (RDS)
    • fine-mapping results from the coloc::coloc.ABF function, containing values such as l-ABF and posterior probabilities (PPI) for each variant (RDS)
    • colocalization info table containing the credible set variants (with cumulative PPI > 0.99) for each independent variant
    • regional association plots

    These outputs are stored under the workspace_path that the user provides in config_finemap.yaml, in the following directory:
    <workspace_path>/results/*/cojo/

    Colocalization of Two Proteins

    We performed colocalization (Giambartolomei et al., 2014) across the pQTL signals. To meet the fundamental assumption of colocalization of only one causal variant per locus, we used conditional datasets, thus performing one colocalization test per pair of independent SNPs in 2 overlapping loci. For each regional association and each target SNP, we identified a credible set as the set of variants with posterior inclusion probability (PIP) > 0.99 within the region. More precisely, using the conditional dataset, we computed Approximate Bayes Factors (ABF) with the ‘process.dataset’ function in the coloc v5.2.3 R package and calculated posterior probabilities by normalizing ABFs across variants. Variants were ranked, and those with a cumulative posterior probability exceeding 0.99 were included in the credible sets. Among XXX protein pairs with overlapping loci, XXX protein pairs sharing a credible set variant were then tested for colocalization using the ‘coloc.abf’ function. Colocalized pairs were identified when the posterior probability for hypothesis 4 assuming a shared causal variant for two proteins exceeded 0.80.
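
    The credible-set construction described above (normalize ABFs into posterior probabilities, rank the variants, and accumulate until 0.99 is reached) can be sketched as follows. This is a Python illustration of the logic only; the pipeline itself uses the coloc R package. It assumes equal prior weight for every variant, and the variant IDs and log ABF values are made up for the example.

```python
import math
from typing import Dict, List

def posterior_from_log_abf(log_abf: Dict[str, float]) -> Dict[str, float]:
    """Normalize log approximate Bayes factors into posterior probabilities."""
    max_l = max(log_abf.values())  # subtract the max for numerical stability
    weights = {v: math.exp(l - max_l) for v, l in log_abf.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def credible_set(posterior: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Add variants in decreasing posterior order until the cumulative probability reaches `coverage`."""
    selected, cumulative = [], 0.0
    for variant, prob in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected

# Hypothetical log ABFs for four variants in one conditional dataset
log_abf = {"rs1": 12.0, "rs2": 9.0, "rs3": 2.0, "rs4": 0.0}
post = posterior_from_log_abf(log_abf)
cs = credible_set(post)
```

    With these illustrative values the top variant alone carries about 95% of the posterior mass, so the second variant is needed to pass the 0.99 threshold and the credible set contains two variants.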

    New Features on Top of NF pipeline

    We also incorporated new features, such as excluding signals in the HLA and NLRP12 regions from the results and follow-up analyses; the user controls this through the configuration file.
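
    A region-based exclusion of this kind could look roughly like the sketch below. Everything here is illustrative: the field names, the enabled switch, and the HLA window (a commonly used extended range on chromosome 6) are assumptions, and NLRP12 coordinates are left out because the source does not give them. The actual option names live in config_finemap.yaml.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive

def overlaps(chrom: str, start: int, end: int, region: Region) -> bool:
    """True when the locus [start, end] on `chrom` overlaps the excluded region."""
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start <= r_end and end >= r_start

def drop_excluded(loci: List[Dict], excluded: List[Region], enabled: bool = True) -> List[Dict]:
    """Remove loci overlapping any excluded region; `enabled` mimics a config switch."""
    if not enabled:
        return list(loci)
    return [
        locus for locus in loci
        if not any(overlaps(locus["chr"], locus["start"], locus["end"], r) for r in excluded)
    ]

# Illustrative coordinates only: an extended HLA window (assumption, not taken from the pipeline)
excluded_regions = [("6", 25_000_000, 34_000_000)]
loci = [
    {"chr": "6", "start": 31_000_000, "end": 31_500_000},   # falls inside the window
    {"chr": "1", "start": 1_000_000, "end": 1_200_000},
]
kept = drop_excluded(loci, excluded_regions)
```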

    NOTE

    This SMK pipeline, which is designed for the pQTL project, does not include munging and alignment of input GWAS summary files. Your GWAS results therefore MUST be completely harmonized with your genotype data, e.g. variant IDs and reference/alternate (effect/other) alleles should be concordant across your input files. Our GWAS summary stats from REGENIE are already aligned by the QC pipeline (adopted by GWASLab) developed by the pQTL analysts team at the Health Data Science Center.
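
    Since the pipeline does not harmonize inputs, a quick pre-flight check along these lines may help catch discordant variants before submitting a run. It is a hedged sketch, not part of the pipeline: the function name, the ID format, and the in-memory dictionaries are hypothetical stand-ins for your own summary statistics and genotype variant tables.

```python
from typing import Dict, List, Tuple

Alleles = Tuple[str, str]  # (effect allele, other allele)

def find_discordant(gwas: Dict[str, Alleles], genotypes: Dict[str, Alleles]) -> List[str]:
    """Return variant IDs that are missing from the genotype data or whose alleles differ."""
    problems = []
    for variant_id, alleles in gwas.items():
        if genotypes.get(variant_id) != alleles:
            problems.append(variant_id)
    return sorted(problems)

# Hypothetical records keyed by variant ID
gwas = {"1:1000:A:G": ("G", "A"), "1:2000:C:T": ("T", "C"), "2:500:G:C": ("C", "G")}
genotypes = {"1:1000:A:G": ("G", "A"), "1:2000:C:T": ("C", "T")}  # swapped alleles, one ID missing
discordant = find_discordant(gwas, genotypes)
```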

    How to run the pipeline:

    You can use the default configuration file in config/config_finemap.yaml. Otherwise, prepare your own configuration in the config/ folder and make sure that configfile in workflow/Snakefile matches the name of your newly created config file. Then run the pipeline with the command below in bash.

    sbatch submit.sh

    Not interested in running colocalization?

    If you want to skip colocalization with your traits, uncomment the option #--until collect_credible_sets in the Makefile. If you want to skip both COJO and colocalization and run only Locus Breaker, change that option in the Makefile to --until collect_loci and run the pipeline as described above.

    Workflow example

    [Figure: example workflow]


  • MOC-Indexer

    MoC Indexer

    WARNING: DEPRECATED!

    This repository is going to be archived

    Please use instead:

    Introduction

    To speed up the app, we need an indexer for the blockchain state of our contracts.
    The indexer queries the status of the contracts
    and writes it to a MongoDB database, so the app queries MongoDB instead of the (slow) blockchain.

    Indexer jobs

    1. Scan Raw TX: Indexing blocks
    2. Scan Events: Indexing events transactions
    3. Scan Prices: Scan prices
    4. Scan Moc State: Scan current moc state
    5. Scan Moc Status
    6. Scan MocState Status
    7. Scan User State Update
    8. Scan Blocks not processed
    9. Reconnect on lost chain
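
    As an illustration of what a job like "Scan Blocks not processed" has to determine, the sketch below finds the gaps between the blocks already indexed and the current chain height. This is a guess at the underlying logic for explanatory purposes, not the repository's code; the function name and inputs are hypothetical, and a real job would read the indexed block numbers from MongoDB and the height from the node.

```python
from typing import Iterable, List

def find_unprocessed_blocks(indexed: Iterable[int], start_block: int, chain_height: int) -> List[int]:
    """Return block numbers in [start_block, chain_height] that have not been indexed yet."""
    seen = set(indexed)
    return [n for n in range(start_block, chain_height + 1) if n not in seen]

# Hypothetical state: blocks 100-105 exist on chain, two were skipped during indexing
missing = find_unprocessed_blocks(indexed=[100, 101, 103, 105], start_block=100, chain_height=105)
```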

    Usage

    Requirement and installation

    • We need Python 3.6+
    • Brownie

    Install libraries

    pip install -r requirements.txt

    Brownie is a Python-based development and testing framework for smart contracts.
    Brownie is easy to use, so we integrated it with Money on Chain.

    pip install eth-brownie==1.17.1

    Network Connections

    First we need to install custom networks (RSK Nodes) in brownie:

    console> brownie networks add RskNetwork rskTestnetPublic host=https://public-node.testnet.rsk.co chainid=31 explorer=https://blockscout.com/rsk/mainnet/api
    console> brownie networks add RskNetwork rskTestnetLocal host=http://localhost:4444 chainid=31 explorer=https://blockscout.com/rsk/mainnet/api
    console> brownie networks add RskNetwork rskMainnetPublic host=https://public-node.rsk.co chainid=30 explorer=https://blockscout.com/rsk/mainnet/api
    console> brownie networks add RskNetwork rskMainnetLocal host=http://localhost:4444 chainid=30 explorer=https://blockscout.com/rsk/mainnet/api
    console> brownie networks add BSCNetwork bscTestnet host=https://data-seed-prebsc-1-s1.binance.org:8545/ chainid=97 explorer=https://blockscout.com/rsk/mainnet/api
    

    Connection table

    Network Name       Network node         Host                                             Chain
    rskTestnetPublic   RSK Testnet Public   https://public-node.testnet.rsk.co               31
    rskTestnetLocal    RSK Testnet Local    http://localhost:4444                            31
    rskMainnetPublic   RSK Mainnet Public   https://public-node.rsk.co                       30
    rskMainnetLocal    RSK Mainnet Local    http://localhost:4444                            30
    bscTestnet         BSC Testnet Public   https://data-seed-prebsc-1-s1.binance.org:8545/  97
    bscTestnetPrivate  BSC Testnet Private  http://localhost:8545/                           97

    Usage

    Example

    Make sure to change settings/settings-xxx.json to point to your mongo db.

    python ./app_run_moc_indexer.py --config=settings/aws-moc-alpha-testnet.json --config_network=mocTestnetAlpha --connection_network=rskTestnetPublic

    --config: Path to config.json

    --config_network=mocTestnetAlpha: Config network name in the JSON

    --connection_network=rskTestnetPublic: Connection network in Brownie
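
    For readers who want to see how these three flags fit together, here is a minimal argparse sketch that accepts the same command line as the example. It is only an illustration: how app_run_moc_indexer.py actually parses its options is not shown in this README, and marking every flag as required is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the three flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Run the MoC indexer (illustrative sketch)")
    # required=True is an assumption made for this sketch
    parser.add_argument("--config", required=True, help="Path to config.json")
    parser.add_argument("--config_network", required=True, help="Config network name in the JSON")
    parser.add_argument("--connection_network", required=True, help="Connection network in Brownie")
    return parser

args = build_parser().parse_args([
    "--config=settings/aws-moc-alpha-testnet.json",
    "--config_network=mocTestnetAlpha",
    "--connection_network=rskTestnetPublic",
])
```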

    Usage Docker

    Build

    bash ./docker_build.sh -e ec2_alphatestnet -c ./settings/aws-moc-alpha-testnet.json
    

    Run

    docker run -d \
    --name ec2_alphatestnet_1 \
    --env APP_MONGO_URI=mongodb://192.168.56.2:27017/ \
    --env APP_MONGO_DB=local_alpha_testnet2 \
    --env APP_CONFIG_NETWORK=mocTestnetAlpha \
    --env APP_CONNECTION_NETWORK=https://public-node.testnet.rsk.co,31 \
    moc_indexer_ec2_alphatestnet
    

    Custom node

    APP_CONNECTION_NETWORK: https://public-node.testnet.rsk.co,31
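
    The custom-node value combines a host URL and a chain ID separated by a comma, whereas the earlier examples pass a named Brownie network. A small sketch of how such a value could be told apart and split; parse_connection_network is a hypothetical helper, and the repository's own parsing may differ.

```python
from typing import Optional, Tuple

def parse_connection_network(value: str) -> Tuple[str, Optional[int]]:
    """Split 'host,chain_id' into its parts; a bare name is returned with chain_id None."""
    if "," in value:
        host, chain_id = value.rsplit(",", 1)
        return host.strip(), int(chain_id)
    return value.strip(), None

custom = parse_connection_network("https://public-node.testnet.rsk.co,31")
named = parse_connection_network("rskTestnetPublic")
```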

    AWS

    Starting the build server

    First you have to start the build server in EC2

    Connect to builder with bastion

    ssh -F /home/martin/.ssh/bastion/moc_ssh_config moc-builder
    

    Change user to builder

    sudo su builder -s /bin/bash
    

    AWS Building image

    ./aws_build_and_push.sh -e <environment> -c <config file> -i <aws id>
    

    Where environment could be

    • ec2_alphatestnet: alpha-testnet.moneyonchain.com
    • ec2_testnet: moc-testnet.moneyonchain.com
    • ec2_mainnet: alpha.moneyonchain.com
    • ec2_rdoc_mainnet: rif.moneyonchain.com
    • ec2_rdoc_testnet: rif-testnet.moneyonchain.com
    • ec2_rdoc_alphatestnet: rif-alpha.moneyonchain.com

    Finally it will build the docker image.

    Example:

    Before pushing the image, check whether the ECR repository exists: go to https://us-west-1.console.aws.amazon.com/ecr/repositories?region=us-west-1 and create it if it does not.

    Ensure you have installed the latest version of the AWS CLI and Docker.

    Make sure you have built your image before pushing it.

    This script will tag with latest and push to the proper repository.

    $ ./aws_build_and_push.sh -e ec2_alphatestnet -c ./settings/aws-moc-mainnet2.json -i 123456 
    

    Setting up in AWS ECS

    On the task definition it’s important to set up the proper environment variables.

    1. APP_CONFIG: The config.json you find in your _settings/deploy_XXX.json folder as json
    2. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: these are needed for the heartbeat function of the jobs, as it needs an account that has write access to a metric in Cloudwatch
    3. APP_CONFIG_NETWORK: The network here is listed in APP_NETWORK
    4. APP_CONNECTION_NETWORK: The network here is listed in APP_CONNECTION_NETWORK
    5. APP_MONGO_URI: mongo uri
    6. APP_MONGO_DB: mongo db
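
    Because a task with a missing variable tends to fail in confusing ways, it can help to validate the environment at startup. The sketch below checks for the six settings listed above (the AWS credentials count as two variables) and reports any that are absent. It is an illustrative pattern rather than the indexer's actual startup code; load_settings and REQUIRED_VARS are hypothetical names.

```python
import os
from typing import Dict, List, Mapping

REQUIRED_VARS: List[str] = [
    "APP_CONFIG",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "APP_CONFIG_NETWORK",
    "APP_CONNECTION_NETWORK",
    "APP_MONGO_URI",
    "APP_MONGO_DB",
]

def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required variables, failing early with the names of any that are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}

# Example with an explicit mapping instead of the real process environment
example_env = {name: "placeholder" for name in REQUIRED_VARS}
example_env["APP_MONGO_URI"] = "mongodb://192.168.56.2:27017/"
settings = load_settings(example_env)
```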

    AWS Webservice Building image

    ./aws_webservice_build_and_push.sh -e <environment> -c <config file> -i <aws id>
    

    Where environment could be

    • alpha-testnet: alpha-testnet.moneyonchain.com
    • testnet: moc-testnet.moneyonchain.com
    • mainnet: alpha.moneyonchain.com
    • rdoc-mainnet: rif.moneyonchain.com
    • rdoc-testnet: rif-testnet.moneyonchain.com
    • rdoc-alphatestnet: rif-alpha.moneyonchain.com

    Finally it will build the docker image.

    Example:

    bash ./aws_webservice_build_and_push.sh -e alpha-testnet -i 123 -r us-west-1 -c ./settings/aws-moc-alpha-testnet.json
    

    Setting up in AWS ECS

    On the task definition it’s important to set up the proper environment variables.

    1. APP_CONFIG: The config.json you find in your _settings/deploy_XXX.json folder as json
    2. APP_MONGO_URI: mongo uri
    3. APP_MONGO_DB: mongo db
