SPMF Implementation of VMSP

Ask any questions, discuss or post suggestions for SPMF
Post Reply
alevern
Posts: 4
Joined: Wed Jan 04, 2023 12:17 pm

SPMF Implementation of VMSP

Post by alevern »

Hi all,
I have been implementing VMSP in typescript for the past few days (I'd like to thank the authors for creating and implementing this algorithm in SPMF), and I have a question regarding how it was implemented in Java.

When saving a pattern p, the condition to decide whether another pattern pPrime is a super-pattern of p involves checking that the support of p is superior or equal to the support of pPrime (lines 774-778 of AlgoVMSP.java):

Code: Select all

				
if (p.prefix.sumOfEvenItems <= pPrime.prefix.sumOfEvenItems
		&& p.prefix.sumOfOddItems <= pPrime.prefix.sumOfOddItems && p.bitmap.getSupport() >= pPrime.support
		&& strictlyContains(pPrime.prefix, p.prefix)) {
	return true;
}
Why the support of p must be superior or equal to the support of pPrime for pPrime to be a superpattern of p? More generally, why is the support involved here? My understanding of maximal sequential patterns is that they are not contained by another pattern w.r.t. minsup.
As a result, the java implementation returns the following two patterns:
41 -1 42 -1 support : 302
41 -1 8 -1 42 -1 support : 307
It is clear that the first is contained in the second (I have maxGap set to 5), but because the support of the first is lower than the support of the second, p.bitmap.getSupport() >= pPrime.support is false and thus the superpattern check fails.
Is this a bug or did I miss something?
Thank you so much for your time.
User avatar
admin
Site Admin
Posts: 121
Joined: Tue Apr 05, 2022 12:47 am
Location: China
Contact:

Re: SPMF Implementation of VMSP

Post by admin »

Good evening,

Welcome to the forum. I will try to answer your question about VMSP and thanks for the feedback about the bug.

First, I have to say that there was a bug in how maxgap was implemented in VMSP that I have fixed a few weeks ago but I forgot to mention it in the release notes of SPMF. If you download the newest version of SPMF, I think it is possible that the bug has been fixed.

So please try the newest version and if the bug is still there, please let me know and send me the input file that you have used at philfv8 AT yahoo DOT com with the minsup value and maxpgap value so that I can reproduce the bug and fix it.

Second, yes, you are right that the support is not necessary to find maximal patterns as the definition of being maximal does not depend on the support. The reason for using the support in that code is as an optimization. Let me explain.
If we want to find only the maximal patterns, we need to do two basic operations : check for super-patterns and check for sub-patterns.
But these two operations are quite costly. For example, say that the algorithm find a new pattern X= <(a).(bc).(d)> .
A naive solution to determine if this pattern X is maximal is to compare Xwith all patterns that the algorithm has seen before and those that the algorithm will find after.
But if we want the algorithm to be efficient we should reduce the number of comparison to find super-patterns and sub-patterns.
To do that, we can make some basic observations:
- First, a pattern X can only be a (proper) sub-pattern of a pattern Y if Y is larger than X.
- Second, a pattern X can only be a (proper) super-pattern of a pattern Y if Y is smaller than X.
- Third, a pattern X can only be a sub-pattern of a pattern Y if Y has a support than is smaller or equal to the support of X. Maybe this is not obvious, but it is true due to the "Apriori property" of the support. Let me convince you about this with an example. If I tell you that the pattern X = <(a).(bc).(d)> appears in three sequences. Do you think that a super-pattern Y = <(f) (a).(bc).(d),(e)> could appear in more than three sequences? :-) Of course, not. So if we have two patterns X and Y like those and we know that X appears in 3 sequences and that Y appears in 4 sequences, we dont need to compare these patterns to know that X cannot be a sub-pattern of Y.

- Four, similarly, a pattern X can only be a super-pattern of a pattern Y if X has a support that is smaller or equal to the support of Y.

Using these basic ideas we can reduce the number of comparisons. This is what is done in the strategies of VMSP (Strategy 1,2,3) More details about this are in the paper about VMSP.

Hope that this helps a little bit to understand more about the algorithm ! :-)
alevern wrote: Mon Jan 09, 2023 2:59 pm Hi all,
I have been implementing VMSP in typescript for the past few days (I'd like to thank the authors for creating and implementing this algorithm in SPMF), and I have a question regarding how it was implemented in Java.

When saving a pattern p, the condition to decide whether another pattern pPrime is a super-pattern of p involves checking that the support of p is superior or equal to the support of pPrime (lines 774-778 of AlgoVMSP.java):

Code: Select all

				
if (p.prefix.sumOfEvenItems <= pPrime.prefix.sumOfEvenItems
		&& p.prefix.sumOfOddItems <= pPrime.prefix.sumOfOddItems && p.bitmap.getSupport() >= pPrime.support
		&& strictlyContains(pPrime.prefix, p.prefix)) {
	return true;
}
Why the support of p must be superior or equal to the support of pPrime for pPrime to be a superpattern of p? More generally, why is the support involved here? My understanding of maximal sequential patterns is that they are not contained by another pattern w.r.t. minsup.
As a result, the java implementation returns the following two patterns:
41 -1 42 -1 support : 302
41 -1 8 -1 42 -1 support : 307
It is clear that the first is contained in the second (I have maxGap set to 5), but because the support of the first is lower than the support of the second, p.bitmap.getSupport() >= pPrime.support is false and thus the superpattern check fails.
Is this a bug or did I miss something?
Thank you so much for your time.
alevern
Posts: 4
Joined: Wed Jan 04, 2023 12:17 pm

Re: SPMF Implementation of VMSP

Post by alevern »

Thank you for the crystal clear, detailed explanations. I understand why we would compare the support of the potential containee and container patterns, if maxgap is not set. But can we still have this optimization when maxgap is set?
Given the two patterns I previously cited (obtained with maxgap == 5):
#1: 41 -1 42 -1 support : 302
#2: 41 -1 8 -1 42 -1 support : 307
I understand that pattern #2 is a superpattern of pattern #1, yet it can make sense that pattern #2 has a bigger support, because there are five sequences that match pattern #2 but they have a gap > 5 between item 41 and 42, and so these 5 sessions do not match pattern #1. We end up with #1 having a smaller support, and so we bypass superpattern checking although #2 is a superpattern of #1.
Shouldn't we compare supportWithoutGapTotal instead?
I checked with the latest jar and I still obtain these two patterns. I did a diff between the latest Bitmap.java and my version, and they are identical.
I'll send you the input file I use, I case I made a mistake in my reasoning.
Thanks!
User avatar
admin
Site Admin
Posts: 121
Joined: Tue Apr 05, 2022 12:47 am
Location: China
Contact:

Re: SPMF Implementation of VMSP

Post by admin »

alevern wrote: Tue Jan 10, 2023 10:00 am Thank you for the crystal clear, detailed explanations. I understand why we would compare the support of the potential containee and container patterns, if maxgap is not set. But can we still have this optimization when maxgap is set?
Given the two patterns I previously cited (obtained with maxgap == 5):
#1: 41 -1 42 -1 support : 302
#2: 41 -1 8 -1 42 -1 support : 307
I understand that pattern #2 is a superpattern of pattern #1, yet it can make sense that pattern #2 has a bigger support, because there are five sequences that match pattern #2 but they have a gap > 5 between item 41 and 42, and so these 5 sessions do not match pattern #1. We end up with #1 having a smaller support, and so we bypass superpattern checking although #2 is a superpattern of #1.
Shouldn't we compare supportWithoutGapTotal instead?
I checked with the latest jar and I still obtain these two patterns. I did a diff between the latest Bitmap.java and my version, and they are identical.
I'll send you the input file I use, I case I made a mistake in my reasoning.
Thanks!
Good evening,

Yes, I think you are right. This optimization may not work with maxgap. In fact, I have added maxgap to VMSP after the original paper was published. It seems that I have overlooked that situation. Thanks for pointing this out.

Actually, I realize that the bug that I fixed recently was in VGEN rather than VMSP but it was also a problem with maxgap but it was a different problem. I got confused and was thinking that it was VMSP because they are very similar.

So for VMSP, it is indeed a problem. I will try to fix it in the next few days. It seems that as you said, a solution is to deactivate that optimization with the support when maxgap is used. Maybe that is enough to fix this problem. I will also think about it.

Thanks again!

Philippe
alevern
Posts: 4
Joined: Wed Jan 04, 2023 12:17 pm

Re: SPMF Implementation of VMSP

Post by alevern »

Great! Glad I could help. Thank you for the fast answers.
alevern
Posts: 4
Joined: Wed Jan 04, 2023 12:17 pm

Re: SPMF Implementation of VMSP

Post by alevern »

By the way, I've released the JS implementation of VMSP I was working on => https://www.npmjs.com/package/@smartesting/vmsp
Feel free to experiment with it, any feedback is welcome.
I am currently adding a feature to allow for mining closed patterns instead. I'll just have to change VMSP's meaning from Vertical mining of Maximal Sequential Patterns to Vertical Mining of Sequential Patterns :P
Post Reply