cfe-lab/codeclub1

CodeClub1 Program1

This program extracts the vif gene sequence from each HIV sequence in a set by aligning it to the HXB2 reference genome.

What it does

  • Reads the HXB2 reference sequence from data/hxb2.fasta
  • Reads multiple query sequences from data/sequences.fasta
  • For each query sequence:
    • Performs a global alignment against HXB2
    • Maps the vif gene coordinates (positions 5243-5619 in HXB2) to the query sequence
    • Extracts and prints the corresponding vif sequence in FASTA format
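The coordinate-mapping step can be illustrated with a minimal pure-Python sketch. It assumes the global alignment is already available as two equal-length gapped strings (`-` for gaps) and that the HXB2 coordinates are 1-based and inclusive; both are assumptions for illustration. The function name `map_ref_interval` is hypothetical and is not the aligntools API, which the program uses for this step.

```python
def map_ref_interval(aligned_ref, aligned_query, ref_start, ref_end):
    """Map a 1-based inclusive reference interval onto the ungapped query.

    Returns a 0-based half-open (start, end) pair suitable for slicing the
    ungapped query, or None if no query base aligns inside the interval.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have the same length")

    ref_pos = 0      # 1-based position of the last reference base seen
    query_pos = 0    # number of query bases consumed so far
    q_start = q_end = None

    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        # A column counts as "in the region" only if it holds a reference
        # base whose position falls inside the requested interval.
        in_region = r != "-" and ref_start <= ref_pos <= ref_end
        if q != "-":
            if in_region:
                if q_start is None:
                    q_start = query_pos
                q_end = query_pos + 1
            query_pos += 1

    if q_start is None:
        return None
    return q_start, q_end


# Toy example (not real HIV data): the query lacks the first two reference
# bases and carries a one-base insertion.
aligned_ref = "AACCGG-TT"
aligned_query = "--CCGGATT"
query = aligned_query.replace("-", "")

region = map_ref_interval(aligned_ref, aligned_query, 3, 6)
if region is not None:
    start, end = region
    print(query[start:end])
```

Query insertions that fall between two mapped reference positions are kept in the extracted slice, while a region covered only by gaps in the query yields `None`, which a caller can treat as "could not be mapped".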

How to run

  1. Ensure you have uv installed.

  2. Navigate to the project root directory.

  3. Run the program:

    uv run program1

The program prints each extracted vif sequence to stdout in FASTA format.

Output format

For each input sequence, the program outputs:

  • A FASTA header: >{sequence_id}_vif
  • The extracted vif sequence
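As a rough illustration of this format, the sketch below builds one record from a sequence ID and an extracted sequence. The function name `format_vif_record` and the optional `width` parameter are hypothetical; the README does not say whether the program wraps sequence lines, so the default here leaves the sequence on a single line.

```python
def format_vif_record(sequence_id, vif_sequence, width=None):
    """Return one FASTA record with the header '>{sequence_id}_vif'.

    width is an illustrative option: when given, the sequence is split into
    lines of at most that many characters.
    """
    header = f">{sequence_id}_vif"
    if width is None:
        lines = [vif_sequence]
    else:
        lines = [vif_sequence[i:i + width]
                 for i in range(0, len(vif_sequence), width)]
    return "\n".join([header, *lines])


# 'sample1' and the short sequence are made-up values for demonstration.
print(format_vif_record("sample1", "ATGGAAAACAGATGGCAG"))
```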

Notes

  • The program performs global pairwise alignment with Biopython's PairwiseAligner.
  • Coordinate mapping is handled by the aligntools library.
  • The program skips sequences whose vif region cannot be mapped.
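The skipping behaviour could look roughly like the following sketch. It assumes an extraction callable that returns `None` when the vif region cannot be mapped; `write_vif_records`, `extract_vif`, and the returned list of skipped IDs are all hypothetical names and choices, not taken from the program's source.

```python
import io
import sys


def write_vif_records(records, extract_vif, out=sys.stdout):
    """Write a FASTA record for every sequence whose vif region maps.

    records: iterable of (sequence_id, sequence) pairs.
    extract_vif: callable returning the vif sequence, or None on failure.
    Returns the IDs of the sequences that were skipped.
    """
    skipped = []
    for sequence_id, sequence in records:
        vif = extract_vif(sequence)
        if vif is None:
            # Mapping failed: emit nothing for this sequence.
            skipped.append(sequence_id)
            continue
        out.write(f">{sequence_id}_vif\n{vif}\n")
    return skipped


# Toy stand-in for the real extraction: "maps" only sequences that contain
# the marker 'ATG' and returns everything from that point on.
def toy_extract(sequence):
    index = sequence.find("ATG")
    return None if index < 0 else sequence[index:]


buffer = io.StringIO()
skipped_ids = write_vif_records(
    [("seq1", "CCATGGAA"), ("seq2", "CCCCCC")], toy_extract, out=buffer
)
print(buffer.getvalue(), end="")
print("skipped:", skipped_ids, file=sys.stderr)
```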

About

Repository for CFE's first code club session
