Good evening,
Welcome to the forum. I will try to answer your question about VMSP and thanks for the feedback about the bug.
First, I have to say that there was a bug in how maxgap was implemented in VMSP that I have fixed a few weeks ago but I forgot to mention it in the release notes of SPMF. If you download the newest version of SPMF, I think it is possible that the bug has been fixed.
So please try the newest version and if the bug is still there, please let me know and send me the input file that you have used at philfv8 AT yahoo DOT com with the minsup value and maxpgap value so that I can reproduce the bug and fix it.
Second, yes, you are right that the support is not necessary to find maximal patterns as the definition of being maximal does not depend on the support. The reason for using the support in that code is as an optimization. Let me explain.
If we want to find only the maximal patterns, we need to do two basic operations : check for super-patterns and check for sub-patterns.
But these two operations are quite costly. For example, say that the algorithm find a new pattern X= <(a).(bc).(d)> .
A naive solution to determine if this pattern X is maximal is to compare Xwith all patterns that the algorithm has seen before and those that the algorithm will find after.
But if we want the algorithm to be efficient we should reduce the number of comparison to find super-patterns and sub-patterns.
To do that, we can make some basic observations:
- First, a pattern X can only be a (proper) sub-pattern of a pattern Y if Y is larger than X.
- Second, a pattern X can only be a (proper) super-pattern of a pattern Y if Y is smaller than X.
- Third, a pattern X can only be a sub-pattern of a pattern Y if Y has a support than is smaller or equal to the support of X. Maybe this is not obvious, but it is true due to the "Apriori property" of the support. Let me convince you about this with an example. If I tell you that the pattern X = <(a).(bc).(d)> appears in three sequences. Do you think that a super-pattern Y = <(f) (a).(bc).(d),(e)> could appear in more than three sequences?
Of course, not. So if we have two patterns X and Y like those and we know that X appears in 3 sequences and that Y appears in 4 sequences, we dont need to compare these patterns to know that X cannot be a sub-pattern of Y.
- Four, similarly, a pattern X can only be a super-pattern of a pattern Y if X has a support that is smaller or equal to the support of Y.
Using these basic ideas we can reduce the number of comparisons. This is what is done in the strategies of VMSP (Strategy 1,2,3) More details about this are in the
paper about VMSP.
Hope that this helps a little bit to understand more about the algorithm !
alevern wrote: ↑Mon Jan 09, 2023 2:59 pm
Hi all,
I have been implementing VMSP in typescript for the past few days (I'd like to thank the authors for creating and implementing this algorithm in SPMF), and I have a question regarding how it was implemented in Java.
When saving a pattern
p, the condition to decide whether another pattern
pPrime is a super-pattern of
p involves checking that the support of
p is superior or equal to the support of
pPrime (lines 774-778 of AlgoVMSP.java):
Code: Select all
if (p.prefix.sumOfEvenItems <= pPrime.prefix.sumOfEvenItems
&& p.prefix.sumOfOddItems <= pPrime.prefix.sumOfOddItems && p.bitmap.getSupport() >= pPrime.support
&& strictlyContains(pPrime.prefix, p.prefix)) {
return true;
}
Why the support of
p must be superior or equal to the support of
pPrime for
pPrime to be a superpattern of
p? More generally, why is the support involved here? My understanding of maximal sequential patterns is that they are not contained by another pattern w.r.t. minsup.
As a result, the java implementation returns the following two patterns:
41 -1 42 -1 support : 302
41 -1 8 -1 42 -1 support : 307
It is clear that the first is contained in the second (I have maxGap set to 5), but because the support of the first is lower than the support of the second, p.bitmap.getSupport() >= pPrime.support is false and thus the superpattern check fails.
Is this a bug or did I miss something?
Thank you so much for your time.