Quick Start
===========

Installation
------------

Install Peptacular using pip:

.. code-block:: bash

   pip install peptacular

Basic Usage
-----------

Import as a namespace:

.. testcode::

   import peptacular as pt

.. testsetup:: *

   import peptacular as pt

Parse/Serialize
~~~~~~~~~~~~~~~

`pt.parse` will return a ProFormaAnnotation object representing the sequence with modifications.

.. testcode::
   
   # Parse a ProForma sequence
   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   print(peptide.charge_state)
   print(peptide.serialize())

.. testoutput::

   2
   PEM[Oxidation]TIDE/2


Parse and Modify Sequences
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A number of methods exist to modify the ProFormaAnnotation object after parsing. You can set, append, extend, and remove modifications.

There are a lot of accepted input values for these methods, see the API documentation for details. 
But generally, strings representing modification names or identifiers, and float / int values are accepted.

append / extend methods will add to existing modifications (increasing the count if the modification already exists at that position),
while set methods will overwrite existing modifications at that position.

Annotations have the following modification types: static, isotope, labile, nterm, cterm, internal, interval, charge (charge state or adducts).

.. testcode::

   # Parse ProForma sequences
   simple = pt.parse("PEPTIDE")
   modified = pt.parse("[Acetyl]-PEP[Oxidation]TIDE/2")
   labeled = pt.parse("<13C>PEPTIDE")

   # Programmatic modification
   annot = pt.ProFormaAnnotation(sequence="PEPTIDE")
   annot.set_nterm_mods({"Acetyl": 1})
   annot.set_internal_mods_at_index(2, {"Oxidation": 1})
   annot.set_charge(2)
   print(annot.serialize())

.. testoutput::

   [Acetyl]-PEP[Oxidation]TIDE/2


Fragment Generation
~~~~~~~~~~~~~~~~~~~

The fragment method requires that you provide a list of ion types and charge states to generate. It will return a list of FragmentAnnotation objects.
If no charge states are provided, it will default to the charge state of the peptide, or charge state 1 if the peptide does not have a charge state assigned.

.. testcode::

   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   
   # Fragment the peptide
   for frag in peptide.fragment(ion_types=['b', 'y'], charges=[1, 2])[:5]:
       print(f"{frag.ion_type}{frag.position}+{frag.charge_state}: {frag.mz:.3f}")

.. testoutput::
   
   b1+1: 98.060
   b1+2: 49.534
   b2+1: 227.103
   b2+2: 114.055
   b3+1: 374.138

Protein Digestion
~~~~~~~~~~~~~~~~~

The digest method will return spans representing the (start, end, missed_cleavages) of each peptide generated from the digestion.
You can readily create peptide annotations by slicing the protein annotation with the spans.

A number of proteases are available in pt.Proteases, but you can also provide your own cleavage rules via a regex pattern.

.. testcode::

   # Digest a protein
   protein = pt.parse("PEM[Oxidation]TRPEPTIDEKPEPTIDEIDE/2")
   for span in protein.digest(pt.Proteases.TRYPSIN, missed_cleavages=1):
       print(f"  {protein[span].serialize()}")

.. testoutput::
   :hide:

   ...

There is an additional digest option which is based on amino acids rather than regex patterns.

.. testcode::

   # Digest a protein
   protein = pt.parse("PEM[Oxidation]TRPEPTIDEKPEPTIDEIDE/2")
   for span in protein.simple_digest(cleave_on="KR", restrict_after="P", missed_cleavages=1):
       print(f"  {protein[span].serialize()}")

.. testoutput::
   :hide:

   ...


Mass / MZ / Composition
~~~~~~~~~~~~~~~~~~~~~~~

There are a number of methods available to calculate masses, m/z values, and elemental compositions of peptides. 

This is where modifications are parsed and applied to the calculations. So previously parsed ProFormaAnnotation objects just required valid syntax, 
but for mass/comp calculations the modifications must also be recognized. For mass calculations to work, all modifications must be mass resolvable. For compositions,
all modifications must be composition resolvable (This means delta mass modifications eg. [+15.99] will not work for compositions).

.. testcode::

   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   
   # Calculate masses with different parameters
   precursor_mass = peptide.mass(ion_type="p")
   neutral_mass = peptide.neutral_mass(ion_type="p")
   mz_2plus = peptide.mz(charge=2)

   # With isotopes and losses
   mass_c13 = peptide.mass(isotopes=1)
   mass_h2o_loss = peptide.mass(deltas={'H-2O-1': 1})

   # Adduct masses
   mass_na = peptide.mass(charge='Na:z+1')

Isotopic Patterns
~~~~~~~~~~~~~~~~~

Peptacular also includes functionality to generate isotopic distributions for peptides and fragments. This relies on parsing a valid composition for the peptide/modifications.
If the composition cannot be resolved, you may use estimate_isotopic_distribution instead which will use an averagine model to estimate the composition based on the peptide's mass.

.. testcode::

   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   
   # Generate isotopic distribution
   dist = peptide.isotopic_distribution(
       max_isotopes=5,
       min_abundance_threshold=0.01,
       distribution_resolution=3
   )

   # For fragments with modifications
   dist_fragment = peptide.isotopic_distribution(
       ion_type='y',
       charge=2,
       isotopes=1,
       deltas={'H-2O-1': 1}
   )


Physicochemical Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~
These are based solely on the amino acid sequence, modifications are not considered in these calculations.

Properties are accessible via the `prop` property. See the API documentation for a full list of available properties and methods. There are also options to calculate custom properties via the calc_property method.


.. testcode::

   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   
   # Simple properties
   hydro = peptide.prop.hydrophobicity
   pi = peptide.prop.pi
   arom = peptide.prop.aromaticity

   # Secondary structure prediction
   ss = peptide.prop.secondary_structure()
   print(f"Alpha helix: {ss['alpha_helix']:.1f}%")
   print(f"Beta sheet: {ss['beta_sheet']:.1f}%")

.. testoutput::
   :hide:

   ...

.. testcode::

   peptide = pt.parse("PEM[Oxidation]TIDE/2")
   
   # Custom property calculations
   custom = peptide.prop.calc_property(
       scale=pt.HydrophobicityScale.KYTE_DOOLITTLE,
       aggregation_method='avg'
   )

Format Conversion
~~~~~~~~~~~~~~~~~

Peptacular supports conversion from and to a number of popular peptide sequence formats.

.. testcode::

   # From other formats
   ip2_seq = pt.convert_ip2_sequence('K.PEPTIDE.K')
   diann_seq = pt.convert_diann_sequence('_PEM[Carbamidomethyl]TIDE_')
   casanovo_seq = pt.convert_casanovo_sequence('+43.006PEPTIDE')

   # To MS2PIP format
   peptide = pt.parse("PEM[Oxidation]TIDE")
   unmod_seq, mod_str = peptide.to_ms2_pip()

Batch Processing
~~~~~~~~~~~~~~~~

Most functions in Peptacular support batch processing of multiple sequences via lists. Parallelization is handled automatically when 
providing a list of sequences to one of the sequential API functions. The sequence API functions support strings and annotation objects as input.

.. testcode::

   # Process multiple sequences in parallel
   sequences = ['PEPTIDE', 'PEPTIDES', 'PEPTIDEST']

   # Automatic parallelization for lists
   masses = pt.mass(sequences)
   compositions = pt.comp(sequences)
   annotations = pt.parse(sequences)

   # Properties
   hydrophobicity = pt.hydrophobicity(sequences)
   pi_values = pt.pi(sequences)