Adapting FIMO file to the correct format for centipede_data #17

@Rosmaninho

FIMO from the MEME suite website outputs data in the following format:

motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
ZNF528 MA1597.1 Peak_31367#chr12#10213230#10213429 54 70 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
ZNF528 MA1597.1 Peak_31367#chr12#10213177#10213376 107 123 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
SP4 MA0685.1 Peak_73465#chr19#45001886#45002085 50 66 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
SP4 MA0685.1 Peak_73465#chr19#45001835#45002034 101 117 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
SP4 MA0685.1 Peak_73465#chr19#45001828#45002027 108 124 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
THAP11 MA1573.1 Peak_110384#chr3#141370283#141370482 140 158 - 27.4944 3.59e-10 6.36e-05 AGGACTACATTTCCCAGAC
CTCF MA0139.1 Peak_71057#chr19#2474615#2474814 96 114 + 25.2247 4.23e-10 0.000166 CGGCCACCAGATGGCGCCA
ZNF16 MA1654.1 Peak_181996#chr9#129485761#129485960 1 23 + 27.5244 5.42e-10 0.000109 AATGGGGAGCCATCGAAGGCCTT
ZNF16 MA1654.1 Peak_181996#chr9#129485656#129485855 106 128 + 27.5244 5.42e-10 0.000109 AATGGGGAGCCATCGAAGGCCTT

Your tutorial, however, seems to expect FIMO output in the following format, so I assume I need to adapt mine:

    sequence.name   start    stop X.pattern.name strand score  p.value q.value matched.sequence
307          chr1  753016  753228              1      + 13.53 1.14e-05      NA    TTTCCCAGAAGGA
315          chr1  876197  876409              1      - 12.07 3.73e-05      NA    CTTCCCCGAAGGG
29           chr1 1365483 1365695              1      - 11.88 4.24e-05      NA    TTTCCAAGAAAGT
30           chr1 1365877 1366089              1      - 12.72 2.24e-05      NA    CTTCCCAGGAGAG
31           chr1 1406705 1406917              1      - 11.20 6.73e-05      NA    CTTCACAGAATTA
64           chr1 1566358 1566570              1      + 13.99 7.75e-06      NA    TTTCCAAGAACCG

I am getting the following error:

-- Column specification ------------------------------------------------------------------
cols(
  sequence_name = col_character(),
  chr = col_character(),
  start = col_double(),
  stop = col_double(),
  strand = col_character(),
  score = col_double(),
  `p-value` = col_double(),
  `q-value` = col_double(),
  matched_sequence = col_character(),
  motif_id = col_character(),
  motif_alt_id = col_character()
)

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'which' in selecting a method for function 'ScanBamParam': In range 4685: at least two out of 'start', 'end', and 'width', must be supplied.

How should I adapt my FIMO output so that it matches the format the tutorial expects?
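
One possible reading of the mismatch is that the tutorial table holds a chromosome and genomic coordinates, while the FIMO table above holds a composite sequence name plus offsets inside each peak. The sketch below illustrates that conversion in plain Python (the same arithmetic carries over to R). It rests on several assumptions that are not confirmed by the tutorial: that `sequence_name` always has the form `peak#chrom#peakStart#peakEnd`, that peak coordinates are 1-based and inclusive, that FIMO `start`/`stop` are 1-based offsets within the peak, and that `motif_id` is an acceptable stand-in for `X.pattern.name`. The function name `fimo_to_genomic` is hypothetical, and whether this resolves the `ScanBamParam` error has not been verified.

```python
import csv
import io


def fimo_to_genomic(rows):
    """Convert FIMO rows (dicts keyed by the FIMO header) to tutorial-style rows.

    Assumes sequence_name looks like 'Peak_1#chr1#100#299' with 1-based,
    inclusive peak coordinates; rows that do not match are skipped.
    """
    converted = []
    for row in rows:
        name = row.get("sequence_name") or ""
        parts = name.split("#")
        if len(parts) != 4:
            continue  # e.g. trailing comment lines or unexpected names
        _peak_id, chrom, peak_start, _peak_end = parts
        # A 1-based offset of 1 should map onto the peak start itself.
        shift = int(peak_start) - 1
        converted.append({
            "sequence.name": chrom,
            "start": shift + int(row["start"]),
            "stop": shift + int(row["stop"]),
            "X.pattern.name": row["motif_id"],  # assumption: motif id as pattern name
            "strand": row["strand"],
            "score": float(row["score"]),
            "p.value": float(row["p-value"]),
            "q.value": float(row["q-value"]),
            "matched.sequence": row["matched_sequence"],
        })
    return converted


# Two rows copied from the FIMO output above. They are space-separated here
# for readability; a real FIMO TSV file would need delimiter="\t".
SAMPLE = """\
motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
ZNF528 MA1597.1 Peak_31367#chr12#10213230#10213429 54 70 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
ZNF528 MA1597.1 Peak_31367#chr12#10213177#10213376 107 123 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=" "))
converted = fimo_to_genomic(rows)
for rec in converted:
    print(rec["sequence.name"], rec["start"], rec["stop"], rec["strand"])
```

Under these assumptions both sample rows map to the same genomic interval, which suggests the overlapping peak windows report the same motif site more than once, so deduplicating on chromosome, start, stop, and strand may be worthwhile before passing the table on. If the peak coordinates turn out to be 0-based (BED-style), the `- 1` in `shift` would need to be dropped.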
