uvarc · kah3f · Nov 24, 2025 · Nov 18, 2025
diff --git a/content/authors/det/_index.md b/content/authors/det/_index.md
@@ -0,0 +1,26 @@
+---
+# Display name
+title: Deb Triant
+
+# Username (this should match the folder name)
+authors:
+- det
+
+# Is this the primary user of the site?
+superuser: false
+
+# Role/position
+role: Research Computing Scientist
+
+# Organizations/Affiliations
+organizations:
+- name: University of Virginia Research Computing
+  url: "https://www.rc.virginia.edu"
+
+
+interests:
+- Bioinformatics
+- HPC 
+- Research
+
+---
diff --git a/content/notes/bioinfo-intro/02-intro.md b/content/notes/bioinfo-intro/02-intro.md
@@ -0,0 +1,17 @@
+---
+title: Bioinformatics
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 150
+menu: 
+    bioinfo-intro:
+---
+
+The term _Bioinformatics_ first appeared in the 1970s, and the field grew rapidly from the 1990s onward with the Human Genome Project and, later, the rise of high-throughput sequencing technologies. Earlier roots include computational tools for analyzing molecular data developed in the 1960s, with methodological precedents in wartime cryptanalytic work from the 1940s.
+
+> [Read More](https://www.nature.com/articles/35042090)
+
+{{< figure src=/notes/bioinfo-intro/img/bioinformatics-ss.png caption="Bioinformatics sits at the intersection of biology, computer science, mathematics/statistics, engineering, and biochemistry." width=70% height=70% >}}
+
+
+
diff --git a/content/notes/bioinfo-intro/03-analys-types.md b/content/notes/bioinfo-intro/03-analys-types.md
@@ -0,0 +1,59 @@
+---
+title: Types of Bioinformatics Analyses
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 200
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+The following are common categories of analyses performed in modern genomics and systems biology.
+
+**1. Proteomics**
+
+Proteomics is the large-scale study of proteins. 
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_6.png caption="Protein structure ribbon diagram" width=30% height=30% >}}
+
+**2. Metabolomics**
+
+Metabolomics focuses on the complete set of small molecules within a biological sample.
+
+**3. RNA-Seq**
+
+RNA sequencing (RNA-seq) is used to quantify RNA transcripts and measure gene expression.
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_7.png caption="RNA-seq protocol similarity heatmap" width=45% height=45% >}}
+
+**4. Single-cell Analysis**
+
+Single-cell analysis explores gene expression at the individual cell level. 
+
+**5. Genome Assembly and Annotation**
+
+Genome assembly reconstructs complete genomes from short or long sequencing reads; annotation then labels genes, regulatory regions, and other functional elements.
+
+**6. Regulatory Genomics**
+
+Regulatory genomics explores how DNA sequence elements (such as promoters and enhancers) and other factors control gene expression patterns.
+
+**7. Variant Calling and Haplotype Analysis**
+
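+Variant calling identifies differences between sequencing reads and a reference genome, such as single-base substitutions (single-nucleotide variants, SNVs), which can reveal mutations. Haplotype analysis then determines which variants are inherited together on the same chromosome.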
+
+Example SNV: 
+
+C <span style="color:#3469c0">A</span> GCTTA               <span style="color:#3469c0">G</span>
+
+<span style="color:#ff0000">T</span> GCTTA                <span style="color:#ff0000">T</span>
+
+<span style="color:#3469c0">A</span> GCTTA               <span style="color:#3469c0">G</span>
+
+A <span style="color:#3469c0">A</span> GCTTACG         <span style="color:#3469c0">G</span>
+
+><small>Blue = reference base (G), red = alternate base (T).</small>
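+
+The core idea can be sketched in a few lines of Python. The sketch counts the bases observed at one reference position across aligned reads and reports an alternate allele when enough reads support it. It is a minimal illustration, not how production variant callers work. The function name `call_snv` and the thresholds `min_depth` and `min_alt_fraction` are hypothetical, and real tools also model sequencing error, base and mapping quality, and genotype likelihoods.
+
+```python
+from collections import Counter
+
+
+def call_snv(ref_base, observed_bases, min_depth=4, min_alt_fraction=0.2):
+    """Naive SNV call at one reference position (thresholds are illustrative)."""
+    # Count only unambiguous nucleotides, ignoring case
+    counts = Counter(b.upper() for b in observed_bases if b.upper() in {"A", "C", "G", "T"})
+    depth = sum(counts.values())
+    if depth < min_depth:
+        return None  # too few reads to make a call
+    ref = ref_base.upper()
+    # Non-reference bases, most frequent first
+    alts = [(base, n) for base, n in counts.most_common() if base != ref]
+    if not alts:
+        return None  # every read agrees with the reference
+    alt, alt_count = alts[0]
+    fraction = alt_count / depth
+    if fraction < min_alt_fraction:
+        return None  # alternate base too rare; could be sequencing error
+    return {"ref": ref, "alt": alt, "depth": depth, "alt_fraction": fraction}
+
+
+# Bases observed at the variant column of a hypothetical pileup
+pileup_column = ["G", "T", "G", "G", "T", "T"]
+print(call_snv("G", pileup_column))
+# {'ref': 'G', 'alt': 'T', 'depth': 6, 'alt_fraction': 0.5}
+```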
+
+[Read More: RNA-Seq Methods](https://www.nature.com/articles/s41592-024-02298-3)
+
+
diff --git a/content/notes/bioinfo-intro/04-databases.md b/content/notes/bioinfo-intro/04-databases.md
@@ -0,0 +1,62 @@
+---
+title: Databases
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 250
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+**InterPro**
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_8.png width=45% height=45% >}}
+
+
+[https://www.ebi.ac.uk/interpro/entry/pfam](https://www.ebi.ac.uk/interpro/entry/pfam)
+
+---
+
+**National Center for Biotechnology Information (NCBI), National Library of Medicine**
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_9.png width=45% height=45% >}}
+
+
+[https://www.ncbi.nlm.nih.gov](https://www.ncbi.nlm.nih.gov)
+
+---
+
+**Ensembl** 
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_11.png width=45% height=45% >}}
+
+
+[https://www.ensembl.org/index.html](https://www.ensembl.org/index.html)
+
+---
+
+**FAANG (Functional Annotation of Animal Genomes)**
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_10.png width=45% height=45% >}}
+
+
+[https://data.faang.org/home](https://data.faang.org/home)
+
+---
+
+**EMBL-EBI**
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_13.png width=65% height=65% >}}
+
+
+[https://www.ebi.ac.uk](https://www.ebi.ac.uk)
+
+---
+
+**RGD (Rat Genome Database)**
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_12.png width=75% height=75% >}}
+
+
+[https://rgd.mcw.edu](https://rgd.mcw.edu)
+
diff --git a/content/notes/bioinfo-intro/05-technologies.md b/content/notes/bioinfo-intro/05-technologies.md
@@ -0,0 +1,21 @@
+---
+title: Sequencing Technologies
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 300
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+
+**Illumina**: Illumina sequencers generate short reads (typically up to 300 bp). They are widely used for whole-genome and exome sequencing, small RNA/microRNA profiling, and many single-cell applications.
+
+**PacBio**: PacBio generates long reads (~25 kb). The PacBio Revio sequencer is available at UVA.
+
+**Nanopore**: Oxford Nanopore generates "ultra-long" reads (which can exceed 1 Mb).
+
+**Hi-C**: Hi-C is a chromosome conformation capture technique that uses crosslinking and sequencing to capture 3D interactions within a genome.
+
+
+
diff --git a/content/notes/bioinfo-intro/06-pacbio.md b/content/notes/bioinfo-intro/06-pacbio.md
@@ -0,0 +1,19 @@
+---
+title: PacBio HiFi Reads
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 350
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+The figure below compares how reads from different sequencing technologies map to the STRC gene.
+
+{{< figure src=/notes/bioinfo-intro/img/pacbio.png width=90% height=90% >}}
+
+It shows that PacBio HiFi reads are both long and highly accurate.
+
+[Read More](https://www.pacb.com)
+
+
diff --git a/content/notes/bioinfo-intro/07-fileformats.md b/content/notes/bioinfo-intro/07-fileformats.md
@@ -0,0 +1,70 @@
+---
+title: File Formats
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 400
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_16.png caption="Source: https://xkcd.com/927/" width=80% height=80% >}}
+
+The file suffix usually indicates the format.
+
+**FASTA** files (suffix: `.fasta`, `.fna`, `.fa`) store nucleotide or protein sequences without quality scores.
+
+**FASTQ** files (suffix: `.fastq`) store sequencing reads together with a quality score for each base.
+
+**SAM/BAM** files (suffix: `.sam`/`.bam`) were developed for next-generation sequencing (NGS) data. SAM stands for Sequence Alignment/Map, and BAM is its binary equivalent. These files store information about how reads align to a reference sequence.
+
+**VCF** (suffix: `.vcf`) stands for Variant Call Format. These files are used to store information about genetic variants. [Read More](https://samtools.github.io/hts-specs/VCFv4.2.pdf)
+
+**GFF3** (suffix: `.gff3`) stands for Generic Feature Format (version 3). These files are used to store information about genomic features. [Read More](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) 
+
+**BED** (suffix: `.bed`) stands for Browser Extensible Data format. These files are used to store genomic regions. [Read More](https://github.com/arq5x/bedtools2)
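+
+One practical difference between these interval formats is the coordinate convention. BED uses 0-based, half-open intervals, while GFF3 and VCF use 1-based, closed intervals. The minimal Python sketch below converts a BED interval to 1-based coordinates. It assumes a tab-separated line with the three required columns (chromosome, start, end); the example interval and the function names are hypothetical.
+
+```python
+def parse_bed_line(line):
+    """Return (chrom, start, end) from a BED line: start is 0-based, end is exclusive."""
+    fields = line.rstrip("\n").split("\t")
+    return fields[0], int(fields[1]), int(fields[2])
+
+
+def bed_to_one_based(start, end):
+    """Convert a 0-based, half-open interval to 1-based, closed (GFF3/VCF style)."""
+    return start + 1, end
+
+
+chrom, start, end = parse_bed_line("chr1\t99\t200\tfeature_A")
+print(chrom, bed_to_one_based(start, end))  # chr1 (100, 200)
+```
+
+Both forms describe the same 101 bases. Mixing up the two conventions is a common source of off-by-one errors.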
+
+### FASTA Format 
+
+A FASTA file begins with a header line, indicated by the `>` symbol, that contains an identifier and an optional description. The following lines contain the biological sequence itself.
+
+
+```plaintext
+>NP_000552.2 Human glutathione transferase M1 (GSTM1)
+MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLPYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEFEKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPKCLDAFPNLKDFISRFEGLEKISAYMKSSRFLPRPVFSKMAVWGNK
+```
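+
+Because each record starts at a `>` line, a FASTA file can be parsed by collecting sequence lines until the next header. This minimal Python sketch assumes the identifier is the first whitespace-separated word of the header. `read_fasta` and the example records are hypothetical. For real work, an established library (for example, Biopython's `SeqIO`) handles edge cases more robustly.
+
+```python
+def read_fasta(text):
+    """Parse FASTA text into a dict mapping each record's identifier to its sequence."""
+    records = {}
+    current_id = None
+    for line in text.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        if line.startswith(">"):
+            # Identifier = first whitespace-separated word after '>'
+            current_id = line[1:].split()[0]
+            records[current_id] = []
+        elif current_id is not None:
+            records[current_id].append(line)  # sequences may wrap over several lines
+    return {rid: "".join(parts) for rid, parts in records.items()}
+
+
+example = """>seq1 first example
+ACGT
+ACGT
+>seq2
+MPMILGYW
+"""
+seqs = read_fasta(example)
+print(seqs)  # {'seq1': 'ACGTACGT', 'seq2': 'MPMILGYW'}
+```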
+
+
+### FASTQ Format
+
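+FASTQ files are typically produced by base calling and serve as input for quality control and read trimming.
+
+Most sequencing platforms return reads in FASTQ format, with each base's quality score encoded as a single ASCII character.
+
+Each FASTQ record consists of four lines:
+1. Read ID, beginning with `@`
+2. Sequence
+3. Separator line beginning with `+` (optionally followed by the read ID again)
+4. Base quality scores, one ASCII character per base (the same length as the sequence)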
+
+```plaintext
+@SEQ_ID
+GATTTGGGGTTCAAAGCAGTATCGATCAAATA
++
+!''*((((***+))%%%++)(%%%%).1***-
+```
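+
+Since every record spans exactly four lines, a simple reader can process the file in groups of four. This minimal Python sketch assumes unwrapped records, which is the usual case for modern sequencer output. It checks that each quality string matches its sequence length. `read_fastq` is a hypothetical helper, applied here to the record above.
+
+```python
+def read_fastq(lines):
+    """Yield (header_without_@, sequence, quality) from unwrapped 4-line FASTQ records."""
+    lines = [line.rstrip("\n") for line in lines if line.strip()]
+    if len(lines) % 4 != 0:
+        raise ValueError("FASTQ input is not a multiple of four lines")
+    for i in range(0, len(lines), 4):
+        header, seq, sep, qual = lines[i:i + 4]
+        # Check record markers by position rather than scanning for '@',
+        # because '@' can also appear as a character in a quality string
+        if not header.startswith("@") or not sep.startswith("+"):
+            raise ValueError(f"Malformed record starting at line {i + 1}")
+        if len(seq) != len(qual):
+            raise ValueError(f"Sequence/quality length mismatch in {header}")
+        yield header[1:], seq, qual
+
+
+record = """@SEQ_ID
+GATTTGGGGTTCAAAGCAGTATCGATCAAATA
++
+!''*((((***+))%%%++)(%%%%).1***-
+"""
+for read_id, seq, qual in read_fastq(record.splitlines()):
+    print(read_id, len(seq), len(qual))  # SEQ_ID 32 32
+```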
+
+**FASTQ File Example: Multiple Reads**
+
+```plaintext
+@M00747:32:000000000-A16RG:1:1112:15153:29246 1:N:0:1
+TCGATCGAGTAACTCGCTGCTGTCAGACTGGTTTTTGGTCGATCGACTATTGTTTCAGTCGCAAGAATATTGTGTCCAGTCGATCGACTGAATTCTGCTGTACGGCCACGGCGGATGCACGGTACAGCAGGCTCAGACGGATTAAACTGTT
++
+5=9=9<=9,-5@<<55>,6+8AC>EE.88AE9CDD7>+7.CC9CD+++5@=-FCCA@EF@+**+*--55--AA---AA-5A<9C+3+<9)4++=E=+===<D94)00=9)))2@624(/(/2/-(.(6;9(((((.(.'((6-66<6(///
+@M00747:32:000000000-A16RG:1:1112:15536:29246 1:N:0:1
+GTAAAATTGAGGTAAATTGTGCGGAATTTAGCAATACCGTTTTTTTTATTATCACCGGATATCTATTCTGCTGTACGGCCAAGGAGGATGTACGGTACAGCAGGTGCGAACTCACTCCGACGCTCAAGTCAGTGACTTAATGATAAGCGTG
++
+?????<BBBBBB5<?BFFFFFFECHEFFECCFF?9AAC>7@FHHHHHHFG?EAFGF@EEDEHHDGHHCBDFFGDFHF)<CCD@F,+3=CFBDFHBD++??DBDEEEDE:):CBEEEBCE68>?))5?**0?:AE*A*0//:/*:*:**.0)
+```
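+
+The read IDs above follow the Illumina header convention used since CASAVA 1.8. The first part is `instrument:run:flowcell:lane:tile:x:y`, and after a space comes `read:is_filtered:control:index`. The minimal Python sketch below splits the first header into named fields. `parse_illumina_header` is a hypothetical helper and assumes headers in exactly this layout.
+
+```python
+def parse_illumina_header(header):
+    """Split a CASAVA 1.8+-style Illumina FASTQ header into named fields."""
+    left, _, right = header.lstrip("@").partition(" ")
+    instrument, run, flowcell, lane, tile, x, y = left.split(":")
+    read, is_filtered, control, index = right.split(":")
+    return {
+        "instrument": instrument, "run": int(run), "flowcell": flowcell,
+        "lane": int(lane), "tile": int(tile), "x": int(x), "y": int(y),
+        "read": int(read), "filtered": is_filtered == "Y",
+        "control": int(control), "index": index,
+    }
+
+
+h = parse_illumina_header("@M00747:32:000000000-A16RG:1:1112:15153:29246 1:N:0:1")
+print(h["instrument"], h["lane"], h["tile"], h["filtered"])  # M00747 1 1112 False
+```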
+
diff --git a/content/notes/bioinfo-intro/08-qualityscores.md b/content/notes/bioinfo-intro/08-qualityscores.md
@@ -0,0 +1,27 @@
+---
+title: Quality Scores
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 450
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
+Quality (`Q`) scores are logarithmically related to base-calling error probabilities (`P`).
+
+### Calculating Phred Quality Scores - Base calling accuracy
+
+$$
+Q = -10 \log_{10} P
+$$
+
+`Q` represents the sequencing quality score of a given base.
+
+`P` represents the probability that the base call is incorrect.
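+
+Rearranging the formula gives the error probability for a given score, `P = 10^(-Q/10)`. So Q20 corresponds to a 1 in 100 chance of an incorrect call, and Q30 to 1 in 1,000. In FASTQ files each score is stored as one ASCII character. The Python sketch below assumes the common Phred+33 encoding (character code minus 33), though some older Illumina data used an offset of 64. The function names are hypothetical.
+
+```python
+import math
+
+
+def phred_from_error(p):
+    """Q = -10 * log10(P)"""
+    return -10 * math.log10(p)
+
+
+def error_from_phred(q):
+    """P = 10 ** (-Q / 10)"""
+    return 10 ** (-q / 10)
+
+
+def decode_quality(qual_string, offset=33):
+    """ASCII-encoded FASTQ qualities -> Phred scores (Phred+33 by default)."""
+    return [ord(ch) - offset for ch in qual_string]
+
+
+print(round(phred_from_error(0.001)))  # 30
+print(error_from_phred(20))            # 0.01
+print(decode_quality("!''*((((***+"))  # [0, 6, 6, 9, 7, 7, 7, 7, 9, 9, 9, 10]
+```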
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_21.png caption="Table source: https://www.illumina.com/Documents/products/technotes/technote_Q-Scores.pdf " width=90% height=90% >}}
+
+While next-generation sequencing metrics differ from those of Sanger sequencing (e.g., there are no electropherogram peak heights), the process of generating a Phred quality score is largely the same.
+
+[More on Quality Scores](https://help.basespace.illumina.com/files-used-by-basespace/quality-scores)
diff --git a/content/notes/bioinfo-intro/09-sambam.md b/content/notes/bioinfo-intro/09-sambam.md
@@ -0,0 +1,38 @@
+---
+title: SAM/BAM Sequence Alignment
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 500
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
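+A SAM file records how each sequencing read aligns to a reference sequence. An optional header (lines beginning with `@`) is followed by one alignment record per line. Each record has 11 mandatory tab-delimited fields followed by optional `TAG:TYPE:VALUE` fields.
+
+`SAM` is plain text (human readable), whereas `BAM` is its compressed binary equivalent.
+
+[SAMtools](http://samtools.sourceforge.net) is a suite of utilities for working with SAM/BAM files. [Picard](https://broadinstitute.github.io/picard/) is a set of Java command-line tools for manipulating high-throughput sequencing data.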
+
+### Example SAM file
+
+```plaintext
+D4ZHLFP1:53:D2386ACXX:6:2115:17945:68812 0 Mle_000001 18 42 108M * 0 0 TCCCCCTGCATGGTCCGTCTGCGTGCAATCGCATGAGTATGCCTCCAGCATGAGTTACCGATCGTGGACACCTGCTTGGCCAAGATGTACTGAGATGCAT C@CFDEFFHHGHHFGBGFEGGDGGGEHGHGGGJJJJIIGIIB9BFBFHGHHICEAHGGEGEDHIGEEDBECCACBDDC@CCDBCDD<?2+4>@4>>CCCAA@@ AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A107 YT:Z:UU
+D4ZHLFP1:53:D2386ACXX:7:2110:5214:83081 0 Mle_000001 18 42 108M * 0 0 TCCCCCTGCATGGTCCGTCTGCGTGCAATCGCATGAGTATGCCTCCAGCATGAGTTACCGATCGTGGCAACCTGCTTGCCAAGATGTACTGAGATGCAT CCCFFFFHHHHHHHGGGEGIJIIGJFHJJJJIJIJJIJIJGIJJIJJIJFHJJJIJJHHFFCEEEEEDDDDDDDDDDDDD AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A107 YT:Z:UU
+D4ZHLFP1:53:D2386ACXX:7:2206:9985:31556 0 Mle_000001 18 42 108M * 0 0 TCCCCCTGCATGGTCCGTCTGCGTGCAATCGCATGAGTATGCCTCCAGCATGAGTTACCGATCGTGGCAACCTGCTTGCCAAGATGTACTGAGATGCAT CCCEFFFFHHHHHJJIJHJJIJIJJIJIJJJJIJIJJJIJJIJJJIJJJGEFFEEEEDDDDDDDDDDDDDDDDDDDDDDD AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A107 YT:Z:UU
+```
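+
+The first 11 fields are fixed and positional, so a record can be split on tabs and mapped to field names (`QNAME`, `FLAG`, `RNAME`, `POS`, `MAPQ`, `CIGAR`, `RNEXT`, `PNEXT`, `TLEN`, `SEQ`, `QUAL`). This minimal Python sketch uses a short hypothetical record rather than the reads above, and `parse_sam_line` is a hypothetical helper. For real files, SAMtools or a library such as pysam is the better choice.
+
+```python
+SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
+              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
+
+
+def parse_sam_line(line):
+    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
+    if line.startswith("@"):
+        return None  # header line, not an alignment record
+    fields = line.rstrip("\n").split("\t")
+    if len(fields) < 11:
+        raise ValueError("SAM records need at least 11 tab-delimited fields")
+    record = dict(zip(SAM_FIELDS, fields[:11]))
+    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
+        record[key] = int(record[key])
+    # Optional fields have the form TAG:TYPE:VALUE, e.g. NM:i:1
+    record["TAGS"] = {}
+    for tag in fields[11:]:
+        name, typ, value = tag.split(":", 2)
+        record["TAGS"][name] = int(value) if typ == "i" else value
+    return record
+
+
+# A short, hypothetical 8-base record (not one of the reads shown above)
+line = "\t".join(["read1", "0", "chr1", "18", "42", "8M", "*", "0", "0",
+                  "TCCCCCTG", "IIIIIIII", "NM:i:1", "MD:Z:0A7"])
+rec = parse_sam_line(line)
+print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["TAGS"])
+# chr1 18 8M {'NM': 1, 'MD': '0A7'}
+```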
+
+A helpful site for decoding SAM `FLAG` values: [https://broadinstitute.github.io/picard/explain-flags.html](https://broadinstitute.github.io/picard/explain-flags.html)
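+
+The `FLAG` field is a bitwise sum, so each set bit answers one yes/no question about the read. This minimal Python sketch performs the kind of decoding such sites do, using the bit meanings from the SAM specification (descriptions paraphrased; `explain_flag` is a hypothetical helper). The records shown above all have `FLAG` 0, meaning a mapped, unpaired read on the forward strand.
+
+```python
+SAM_FLAG_BITS = {
+    0x1: "read paired",
+    0x2: "read mapped in proper pair",
+    0x4: "read unmapped",
+    0x8: "mate unmapped",
+    0x10: "read reverse strand",
+    0x20: "mate reverse strand",
+    0x40: "first in pair",
+    0x80: "second in pair",
+    0x100: "not primary alignment",
+    0x200: "read fails platform/vendor quality checks",
+    0x400: "read is PCR or optical duplicate",
+    0x800: "supplementary alignment",
+}
+
+
+def explain_flag(flag):
+    """List the descriptions of every bit set in a SAM FLAG value."""
+    return [desc for bit, desc in SAM_FLAG_BITS.items() if flag & bit]
+
+
+print(explain_flag(0))   # []
+print(explain_flag(99))  # ['read paired', 'read mapped in proper pair', 'mate reverse strand', 'first in pair']
+```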
diff --git a/content/notes/bioinfo-intro/10-fastqc-qualityreads.md b/content/notes/bioinfo-intro/10-fastqc-qualityreads.md
@@ -0,0 +1,21 @@
+---
+title: Checking Read Quality - FastQC
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 550
+menu: 
+    bioinfo-intro:
+        parent: Bioinformatics
+---
+
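+FastQC runs a set of quality-control checks on raw sequencing reads (for example, FASTQ files) and summarizes the results in a report. The checks include per-base sequence quality, GC content, and adapter content.
+
+Sample FastQC per-base sequence quality plots, from good to poor: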
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_24.png width=85% height=85% caption="FastQC report showing per-base sequence quality, with most bases maintaining high quality across reads and slight drops at the read ends typical of Illumina data." >}}
+
+{{< figure src=/notes/bioinfo-intro/img/Intro-Bioinformatics-for-posting_20250604_25.png width=85% height=85% caption="FastQC report showing per-base sequence quality, where read quality declines toward the end, indicating potential sequencing degradation or lower confidence in base calls at later positions." >}}
+
+{{< figure src=/notes/bioinfo-intro/img/readq3.png width=85% height=85% caption="FastQC report showing a decline in per-base sequence quality toward the end of reads, indicating significant quality drop-off and potential sequencing errors in later positions." >}}
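+
+The per-base sequence quality plot summarizes the quality scores of all reads at each read position. This simplified Python sketch computes only the mean at each position; FastQC's plot also shows the median and quartiles. It assumes Phred+33 quality strings, and `per_base_mean_quality` and the example reads are hypothetical.
+
+```python
+def per_base_mean_quality(quality_strings, offset=33):
+    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
+    totals, counts = [], []
+    for qual in quality_strings:
+        for pos, ch in enumerate(qual):
+            if pos == len(totals):  # first read long enough to reach this position
+                totals.append(0)
+                counts.append(0)
+            totals[pos] += ord(ch) - offset
+            counts[pos] += 1
+    return [t / c for t, c in zip(totals, counts)]
+
+
+# Hypothetical 5-base reads whose quality drops toward the 3' end
+quals = ["IIIII", "IIII5", "III5+"]
+print([round(m, 1) for m in per_base_mean_quality(quals)])
+# [40.0, 40.0, 40.0, 33.3, 23.3]
+```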
+
+[Read More](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
diff --git a/content/notes/bioinfo-intro/11-rcresources.md b/content/notes/bioinfo-intro/11-rcresources.md
@@ -0,0 +1,16 @@
+---
+title: Research Computing Resources
+date: 2025-08-23-03:19:53Z
+type: docs 
+weight: 600
+menu: 
+    bioinfo-intro:
+---
+
+Relevant Tutorials: 
+
+[Using UVA's HPC System from the Terminal](https://learning.rc.virginia.edu/notes/hpc-from-terminal/)
+
+[HPC orientation session and office hours](https://www.rc.virginia.edu/support/#office-hours)
+
+