Chicago: A new dataset about crime for high utility itemset mining and frequent itemset mining



Post by admin

Hi all,

This is to let you know that a new dataset called Chicago_Crimes_2001_to_2017 has been released on the datasets page of SPMF.

This dataset can be used for high utility itemset mining and frequent itemset mining.

To download the Chicago dataset, go to the Datasets page and select either the high utility itemset mining version or the frequent itemset mining version.

The Chicago dataset was obtained from UCI, converted by Chongjie Zhang into a format suitable for itemset mining, and donated to SPMF.

Here is a brief description of the high utility itemset mining version of the dataset. It contains 2,662,309 transactions and 35 items, and its utility values are real (not synthetic).
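The utility of an itemset is commonly defined as the sum, over every transaction that contains all of its items, of those items' utilities in that transaction. Since the utilities here are crime counts, the utility of {THEFT, BATTERY} would be the total number of thefts and batteries in the transactions where both crimes appear. Below is a minimal Python sketch of that definition, assuming each transaction has already been loaded as a dict from item ID to utility; the function name and the toy transactions are illustrative only and are not taken from the dataset or from SPMF.

```python
from typing import Dict, Iterable, List

# A transaction maps item IDs to their utility (here: a crime count).
Transaction = Dict[int, int]


def itemset_utility(itemset: Iterable[int], transactions: List[Transaction]) -> int:
    """Sum the utilities of the itemset's items over every transaction
    that contains all of them (the usual HUIM definition)."""
    items = set(itemset)
    total = 0
    for t in transactions:
        if items.issubset(t.keys()):
            total += sum(t[i] for i in items)
    return total


if __name__ == "__main__":
    # Toy transactions (not from the dataset); 1 = THEFT, 8 = BATTERY.
    toy = [
        {1: 2, 2: 2},
        {1: 5, 8: 3},
        {8: 1},
        {1: 1, 8: 4, 2: 1},
    ]
    print(itemset_utility({1, 8}, toy))  # 5 + 3 + 1 + 4 = 13
```

A high utility itemset mining algorithm looks for all itemsets whose utility reaches a user-given threshold, without enumerating them naively like this.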

The dataset records the crimes that occurred in Chicago from 2001 to 2017.

Each transaction corresponds to a <month, area> pair and describes the crimes that occurred in that area during that month. The utility of an item is the number of crimes of that type, and the item names are listed under NAMES below.
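To make the transaction layout concrete, here is one way such a conversion could be sketched: group raw crime records by (month, area), count each crime type, and write one line per group in SPMF's high utility format (item IDs, then the transaction utility, then the per-item utilities, separated by colons). This is only an illustration, assuming each raw record has already been reduced to a (month, area, crime type) triple; the record layout, the function name and the ascending item order are assumptions, and it is not necessarily how the donated file was produced.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Subset of the item IDs listed under NAMES in the post.
NAME_TO_ID = {"THEFT": 1, "OTHER OFFENSE": 2, "BATTERY": 8, "NARCOTICS": 12}


def to_spmf_utility_lines(records: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Group (month, area, crime_type) records into one transaction per
    (month, area) and format each as 'items:TU:utilities'."""
    groups: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for month, area, crime in records:
        groups[(month, area)][NAME_TO_ID[crime]] += 1

    lines = []
    for key in sorted(groups):
        counts = groups[key]
        items = sorted(counts)  # ascending item IDs (a choice, for readability)
        utils = [counts[i] for i in items]
        lines.append(
            " ".join(map(str, items))
            + f":{sum(utils)}:"
            + " ".join(map(str, utils))
        )
    return lines


if __name__ == "__main__":
    # Hypothetical raw records; the month and area labels are made up.
    raw = [
        ("2001-01", "A", "THEFT"),
        ("2001-01", "A", "OTHER OFFENSE"),
        ("2001-01", "A", "THEFT"),
        ("2001-01", "A", "OTHER OFFENSE"),
        ("2001-01", "B", "BATTERY"),
    ]
    for line in to_spmf_utility_lines(raw):
        print(line)  # "1 2:4:2 2", then "8:1:1"
```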

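For example, the line '1 2:4:2 2' means that the crime 'THEFT' (item 1) occurred twice and the crime 'OTHER OFFENSE' (item 2) occurred twice in the <month, area> represented by this transaction. The 4 in the middle is the transaction utility, that is, the total number of crimes in the transaction (2 + 2). Here are the definitions of the items: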

NAMES:
1: THEFT
2: OTHER OFFENSE
3: OFFENSE INVOLVING CHILDREN
4: CRIM SEXUAL ASSAULT
5: MOTOR VEHICLE THEFT
6: SEX OFFENSE
7: DECEPTIVE PRACTICE
8: BATTERY
9: BURGLARY
10: WEAPONS VIOLATION
11: PUBLIC PEACE VIOLATION
12: NARCOTICS
13: GAMBLING
14: PROSTITUTION
15: LIQUOR LAW VIOLATION
16: INTERFERENCE WITH PUBLIC OFFICER
17: CRIMINAL DAMAGE
18: ASSAULT
19: STALKING
20: ARSON
21: CRIMINAL TRESPASS
22: HOMICIDE
23: ROBBERY
24: OBSCENITY
25: KIDNAPPING
26: INTIMIDATION
27: RITUALISM
28: DOMESTIC VIOLENCE
29: OTHER NARCOTIC VIOLATION
30: PUBLIC INDECENCY
31: NON-CRIMINAL
32: HUMAN TRAFFICKING
33: CONCEALED CARRY LICENSE VIOLATION
34: NON - CRIMINAL
35: NON-CRIMINAL (SUBJECT SPECIFIED)
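A line of the high utility version can be decoded with the NAMES list above. The minimal Python sketch below splits a line into its three colon-separated parts and maps item IDs to crime names. Only a few names are copied into the dictionary for brevity, the function names are hypothetical, and the consistency check assumes (as in the example) that the middle value is the sum of the item utilities.

```python
from typing import Dict, List, Tuple

# Item names from the NAMES list above (only a few entries shown).
ITEM_NAMES: Dict[int, str] = {
    1: "THEFT",
    2: "OTHER OFFENSE",
    8: "BATTERY",
    12: "NARCOTICS",
}


def parse_utility_line(line: str) -> Tuple[List[int], int, List[int]]:
    """Split an SPMF high utility line 'items:TU:utilities' into parts."""
    items_part, tu_part, utils_part = line.strip().split(":")
    items = [int(x) for x in items_part.split()]
    utils = [int(x) for x in utils_part.split()]
    tu = int(tu_part)
    if len(items) != len(utils):
        raise ValueError("each item needs exactly one utility value")
    if tu != sum(utils):
        raise ValueError("transaction utility should equal the sum of item utilities")
    return items, tu, utils


def describe(line: str) -> Dict[str, int]:
    """Map each crime name to its count in the transaction."""
    items, _, utils = parse_utility_line(line)
    return {ITEM_NAMES.get(i, f"item {i}"): u for i, u in zip(items, utils)}


if __name__ == "__main__":
    print(describe("1 2:4:2 2"))  # {'THEFT': 2, 'OTHER OFFENSE': 2}
```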

For frequent itemset mining, the dataset is the same except that there are no utility values.
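Because the two versions differ only by the utility values, a line of the frequent itemset version should correspond to the part of a high utility line before the first colon. The sketch below does that conversion and then computes the support of an itemset (the number of transactions that contain it), the measure used in frequent itemset mining. It assumes the usual space-separated SPMF item format; the function names and the toy lines are hypothetical and not taken from the dataset.

```python
from typing import Iterable, Set


def to_fim_line(utility_line: str) -> str:
    """Drop the ':TU:utilities' part, keeping only the item IDs."""
    return utility_line.split(":", 1)[0].strip()


def support(itemset: Set[int], fim_lines: Iterable[str]) -> int:
    """Count the transactions that contain every item of the itemset."""
    count = 0
    for line in fim_lines:
        if not line.strip():
            continue  # skip blank lines
        items = {int(x) for x in line.split()}
        if itemset <= items:
            count += 1
    return count


if __name__ == "__main__":
    # Toy lines in the high utility format (not taken from the dataset).
    huim = ["1 2:4:2 2", "1 8:8:5 3", "8:1:1", "1 2 8:6:1 1 4"]
    fim = [to_fim_line(line) for line in huim]
    print(fim)                   # ['1 2', '1 8', '8', '1 2 8']
    print(support({1, 8}, fim))  # 2
```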