JCelestial/molFrame

molFrame v0.4.8

For when you are tired of writing new scripts for every molecular simulation...

molFrame is a workflow tool that takes you from simulation to plot-ready data in minutes!

And more!


What is molFrame?

molFrame is a utility for processing large-scale trajectory data from molecular dynamics software such as VMD. Programs like VMD can export simulation trajectory coordinates in multiple formats, e.g. .xyz, .pdb, or .dcd. As of this version, molFrame works best with .xyz; classes that extend it to more formats are planned.


Current methods available:

  • Print coordinates (no header lines, just pure trajectories)
  • Center of Mass
  • Angular Conformation
  • Order parameter
  • Energy Average and standard deviation
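As a point of reference for the Center of Mass method, the center of mass of a set of atoms is the mass-weighted mean of their positions. The sketch below is written in Python for brevity and is not molFrame's Fortran implementation. The names `ATOMIC_MASS`, `element_of`, and `center_of_mass` are hypothetical. It also assumes the element can be read from an atom label by stripping its trailing digits, and the mass table only covers a few elements.

```python
# Approximate atomic masses in unified atomic mass units (u).
# Assumption: only these elements appear; extend the table as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def element_of(label):
    """Strip the trailing index from a label such as 'C1' to get 'C'."""
    return label.rstrip("0123456789")


def center_of_mass(atoms):
    """Return the mass-weighted mean position of (label, x, y, z) tuples."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for label, x, y, z in atoms:
        mass = ATOMIC_MASS[element_of(label)]
        total_mass += mass
        weighted[0] += mass * x
        weighted[1] += mass * y
        weighted[2] += mass * z
    return tuple(w / total_mass for w in weighted)


# Two atoms in the .xyz label style used by VMD exports.
atoms = [
    ("C1", 31.869669, -20.711391, -34.581379),
    ("N1", 30.844368, -20.086567, -33.779266),
]
print(center_of_mass(atoms))
```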

Planned methods:


Why do we need molFrame?

Simulation coordinate files are cumbersome to parse because they are littered with unnecessary headers and columns, which is why this version begins with one of the least complex formats, .xyz. A typical .xyz file looks like this:

36288
 generated by VMD
  C1        31.869669      -20.711391      -34.581379
  N1        30.844368      -20.086567      -33.779266
  H1        31.610401      -21.606871      -35.055351
  H2        32.800102      -20.909853      -34.014919
  H3        32.144650      -20.023277      -35.347126
  C2        30.857540      -18.913921      -33.099659
  C3        29.650688      -18.775717      -32.479210
  N2        28.933077      -19.915432      -32.859100
  C4        29.690832      -20.693037      -33.642181

On top of that, the first two lines (the atom count and a comment line) reappear at the start of every simulation frame. A large simulation containing hundreds of thousands of frames therefore carries just as many of these headers, which can cause parsing errors. Shell scripts that lean on grep and regular expressions to work around them tend to get messy.
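To make the frame layout concrete, here is a minimal sketch of reading a multi-frame .xyz stream by trusting the atom count in each header rather than pattern-matching lines. It is written in Python for brevity and is not molFrame's code. The function name `read_xyz_frames` is hypothetical, and the sketch assumes a well-formed file with no termination sequence.

```python
import io


def read_xyz_frames(handle):
    """Yield (comment, atoms) for each frame in a multi-frame .xyz stream."""
    while True:
        count_line = handle.readline()
        if not count_line.strip():
            break  # end of file, or a trailing blank line
        n_atoms = int(count_line)            # header line 1: atom count
        comment = handle.readline().strip()  # header line 2: free-text comment
        atoms = []
        for _ in range(n_atoms):
            label, x, y, z = handle.readline().split()[:4]
            atoms.append((label, float(x), float(y), float(z)))
        yield comment, atoms


# A tiny two-frame sample in the same layout as the excerpt above.
sample = io.StringIO(
    "2\n"
    " generated by VMD\n"
    "  C1        31.869669      -20.711391      -34.581379\n"
    "  N1        30.844368      -20.086567      -33.779266\n"
    "2\n"
    " generated by VMD\n"
    "  C1        31.900000      -20.700000      -34.600000\n"
    "  N1        30.800000      -20.100000      -33.800000\n"
)
frames = list(read_xyz_frames(sample))
print(len(frames), "frames;", len(frames[0][1]), "atoms in the first frame")
```

Because the reader consumes exactly the number of coordinate lines the header announces, the repeated headers act as structure to rely on instead of noise to filter out.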

molFrame takes advantage of these headers. First, the terminate_xyz script attaches a termination sequence to the very end of the .xyz file. molFrame then uses the header lines as checkpoints for allocating and deallocating memory, which prevents memory leaks.

$ ./terminate_xyz trajectories.xyz > terminatedTrajectories.txt
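The idea behind a termination sequence can be sketched as follows. This is an illustration in Python, not the terminate_xyz script itself: the marker text `END_OF_TRAJECTORY` and the helper names `terminate` and `count_frames` are hypothetical, since the actual sequence molFrame writes is not documented here.

```python
import io

# Hypothetical marker; the sequence terminate_xyz really writes may differ.
SENTINEL = "END_OF_TRAJECTORY"


def terminate(text, sentinel=SENTINEL):
    """Return the trajectory text with a sentinel line appended."""
    if not text.endswith("\n"):
        text += "\n"
    return text + sentinel + "\n"


def count_frames(handle, sentinel=SENTINEL):
    """Count frames, treating each header as a checkpoint, until the sentinel."""
    frames = 0
    for line in handle:
        if line.strip() == sentinel:
            return frames
        n_atoms = int(line)       # header line 1: atom count
        next(handle)              # header line 2: comment, skipped
        for _ in range(n_atoms):  # skip this frame's coordinate lines
            next(handle)
        frames += 1
    raise ValueError("sentinel not found: file is unterminated")


raw = (
    "1\n generated by VMD\n  H1 0.0 0.0 0.0\n"
    "1\n generated by VMD\n  H1 0.1 0.0 0.0\n"
)
print(count_frames(io.StringIO(terminate(raw))))
```

With an explicit end marker, the reader can tell a complete file from one that stops short without the marker, instead of inferring the end from a failed read.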

Once the file is terminated, molFrame can parse it properly. Because of how MD programs arrange their data, molFrame can also report the dimensions of the simulation, including:

  • Total number of simulation frames
  • Total number of molecules within simulation frames
  • Number of atoms per molecule
========================================================
molFrame : The data has the following array dimensions... 

Simulation Frames: 500
Molecules per Frame: 672 
Atoms per Molecule: 56
========================================================
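The three numbers in this report are linked: atoms per frame equals molecules per frame times atoms per molecule, so the values above imply 672 × 56 = 37,632 atoms in each frame. How molFrame derives the split is not described here. The Python sketch below shows one possible approach under two loud assumptions: every molecule in the frame is identical, and atom labels restart at the beginning of each molecule, so the first label reappearing marks the next molecule. The function name `frame_dimensions` is hypothetical.

```python
def frame_dimensions(labels):
    """Return (molecules_per_frame, atoms_per_molecule) for one frame's labels."""
    first = labels[0]
    try:
        # The second occurrence of the first label starts molecule number two.
        atoms_per_molecule = labels.index(first, 1)
    except ValueError:
        atoms_per_molecule = len(labels)  # the frame holds a single molecule
    if len(labels) % atoms_per_molecule:
        raise ValueError("frame does not divide evenly into molecules")
    return len(labels) // atoms_per_molecule, atoms_per_molecule


# Four copies of a hypothetical three-atom molecule.
labels = ["C1", "N1", "H1"] * 4
print(frame_dimensions(labels))
```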

Why Fortran?

One reason: speed. Typically, multiple simulated systems have to be analyzed, and because of their size it is difficult to analyze all of them concurrently without a high-performance computing cluster, so the next best thing is to analyze them quickly one by one. To put it in perspective, interpreted languages like Python or R take minutes to analyze files that are roughly half a GB in size, whereas molFrame takes about 30 seconds to give the user simulation metrics and analyses.

That said, there are plans to port molFrame to another language, such as C++, to support more object orientation and richer data structures for more complex methods. Another proposed alternative is to strip molFrame of its main interface, keep its methods, and turn the Fortran components into a dynamic library that can be loaded at run time.


Bugs and Future Plans

Bugs

  • Fix the garray subroutine to reset the simulation frame metric

Future Plans

  • Incorporate methods to analyze protein files, such as surface area
  • Expand the Energy module's methods to include other statistical metrics
  • Greater support for multithreading to allow concurrent execution of multiple analyses
  • Development of a GUI to make it more user friendly
