Let us assume that genomic sequences G1 and G2 come from antibiotic-resistant bacterial strains, while genomic sequences G3 and G4 come from bacterial strains sensitive to this antibiotic (G1, G2, G3 and G4 are stored in the genomes_fasta directory).
The objective here is to search genomes G1 and G2 for specific sequences that may be involved in antibiotic resistance.
To do so, we first create a list file (group1.txt) with the names of the genomic sequences that belong to the antibiotic-resistant strains.
SkIf then searches for k-mers (length 24 in this example) that are specific to the genomes listed in group1.txt, here G1 and G2:
cd example
../bin/SkIf -b genomes_fasta/ -k 24 -s n -o specific_output -a dna -g group1.txt
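To make the notion of a group-specific k-mer concrete, here is a minimal Python sketch. It is not SkIf's implementation: it assumes that "specific" means present in every genome of the group and absent from every other genome, and it uses short toy sequences (hypothetical, not taken from genomes_fasta) with k = 5 instead of 24.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_specific_kmers(group, others, k):
    """k-mers found in every sequence of `group` and in none of `others`.

    Assumption: this is one plausible definition of "specific";
    the tool itself may apply a different rule.
    """
    shared = set.intersection(*(kmers(s, k) for s in group))
    background = set().union(*(kmers(s, k) for s in others))
    return shared - background


# Toy sequences standing in for G1..G4 (hypothetical data).
g1 = "ACGTACGGTTCA"
g2 = "TTACGGTTCAGG"
g3 = "ACGTACGTTTCA"
g4 = "GGACGTTTCAGG"

specific = group_specific_kmers([g1, g2], [g3, g4], k=5)
print(sorted(specific))
```

In this toy case the five specific 5-mers all start at consecutive positions of g1, which mirrors the situation described in the next step.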
Check the outputs:
We can observe that many specific k-mers are located in the same genomic region. We therefore build long-mers (concatenations of consecutive k-mers) with the following command line:
../scripts/getLongestKmers.pl -f specific_output_24-mers_present_gp1_G1.csv -o longspecific
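The idea of concatenating consecutive k-mers can be sketched in Python as follows. This is only an illustration of the principle, not a port of getLongestKmers.pl: the function name, the 0-based half-open coordinates, and the toy input are assumptions, and the actual CSV layout is not parsed here.

```python
def merge_consecutive(positions, k):
    """Merge k-mers with consecutive start positions into long-mers.

    positions: start positions of specific k-mers in one genome (0-based).
    Returns a list of (start, end) tuples, end exclusive.
    """
    runs = []
    for pos in sorted(set(positions)):
        # The last k-mer of the current run starts at end - k, so the
        # next consecutive k-mer must start at end - k + 1.
        if runs and pos == runs[-1][1] - k + 1:
            runs[-1][1] = pos + k
        else:
            runs.append([pos, pos + k])
    return [tuple(r) for r in runs]


# Toy example (hypothetical data): five 5-mers starting at positions 3..7.
genome = "ACGTACGGTTCA"
long_mers = merge_consecutive([3, 4, 5, 6, 7], k=5)
for start, end in long_mers:
    print(start, end, genome[start:end])
```

Each run of k-mers shifted by one base collapses into a single region, so the number of records to inspect drops sharply when specific k-mers cluster together.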
Check the outputs:
We can also observe that these two specific long-mers overlap. We therefore concatenate the overlapping regions:
../scripts/getLongestKmersNC.pl -f specific_output_24-mers_present_gp1_G1.csv -k 24 -o longlong
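Merging overlapping long-mers amounts to a classic interval merge. The Python sketch below illustrates that principle under stated assumptions: regions are (start, end) pairs with an exclusive end, the coordinates are invented for the example, and nothing here reproduces the internals or options of getLongestKmersNC.pl.

```python
def merge_overlapping(regions):
    """Merge regions that overlap or touch into longer fragments.

    regions: iterable of (start, end) pairs, end exclusive.
    Returns a sorted list of merged (start, end) tuples.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous fragment: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


# Hypothetical long-mer coordinates: the first two overlap, the third does not.
fragments = merge_overlapping([(100, 150), (140, 200), (300, 330)])
print(fragments)
```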
Check the outputs:
As G1 and G2 are antibiotic-resistant bacterial strains, we can postulate that the CDS involved in this trait is probably located in this long fragment.