How to implement a Least Frequently Used (LFU) cache in Java?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21117636/
How to implement a Least Frequently Used (LFU) cache?
Asked by Snehal Masne
Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristic of this method is that the system keeps track of the number of times a block is referenced in memory. When the cache is full and requires more room, the system will purge the item with the lowest reference frequency.
What would be the best way to implement a most-recently-used cache of objects, say in Java?
I've already implemented one using LinkedHashMap (by maintaining the no. of times objects are accessed), but I'm curious whether any of the new concurrent collections would be better candidates.
Consider this case: suppose the cache is full and we need to make room for another entry. Say two objects in the cache have each been accessed only once. Which one should we remove if we learn that another object (which is not in the cache) is being accessed more than once?
Thanks!
Accepted answer by Rakmo
According to me, the best way to implement a most-recently-used cache of objects would be to include a new variable, 'latestTS', for each object. TS stands for timestamp.

// A static method that returns the current date and time as milliseconds since January 1st 1970
long latestTS = System.currentTimeMillis();

ConcurrentLinkedHashMap is not yet part of the standard concurrent Java collections (ref: the Java concurrent collections API). However, you can try using a ConcurrentHashMap together with a doubly linked list.

About the case to be considered: in such a case, as said above, you can declare the latestTS variable and, based on its value, remove an entry and add the new object. (Don't forget to update the frequency and latestTS of the newly added object.)
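The tie-breaking idea can be sketched with a comparator that prefers the lowest frequency and, among equals, the oldest latestTS. The names below (TimestampedEntry, EVICTION) are illustrative, not from the answer's code:

```java
import java.util.Comparator;

// Hypothetical entry type combining the answer's two bookkeeping fields.
class TimestampedEntry {
    String data;
    int frequency;   // number of accesses
    long latestTS;   // last-access time in millis

    TimestampedEntry(String data) {
        this.data = data;
        this.frequency = 0;
        this.latestTS = System.currentTimeMillis();
    }
}

public class EvictionOrder {
    // Evict the entry with the lowest frequency; break ties by oldest access.
    static final Comparator<TimestampedEntry> EVICTION =
            Comparator.<TimestampedEntry>comparingInt(e -> e.frequency)
                      .thenComparingLong(e -> e.latestTS);

    public static void main(String[] args) {
        TimestampedEntry a = new TimestampedEntry("a");
        TimestampedEntry b = new TimestampedEntry("b");
        b.frequency = 1;  // b was accessed once more than a
        // a sorts first, so a is the eviction candidate
        System.out.println(EVICTION.compare(a, b) < 0);
    }
}
```

Sorting (or heap-ordering) cache entries by this comparator makes the eviction choice deterministic in the tie case from the question.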
As you have mentioned, you can use LinkedHashMap, as it gives element access in O(1) and you also get order traversal. Please find the code for the LFU cache below. (PS: the code below answers the question in the title, i.e. "How to implement an LFU cache.")
import java.util.LinkedHashMap;
import java.util.Map;
public class LFUCache {
class CacheEntry
{
private String data;
private int frequency;
// default constructor
private CacheEntry()
{}
public String getData() {
return data;
}
public void setData(String data) {
this.data = data;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
private int initialCapacity = 10;
private LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
/* LinkedHashMap is used because it has features of both HashMap and LinkedList.
* Thus, we can get an entry in O(1) and also, we can iterate over it easily.
* */
public LFUCache(int initialCapacity)
{
this.initialCapacity = initialCapacity;
}
public void addCacheEntry(int key, String data)
{
if(!isFull())
{
CacheEntry temp = new CacheEntry();
temp.setData(data);
temp.setFrequency(0);
cacheMap.put(key, temp);
}
else
{
int entryKeyToBeRemoved = getLFUKey();
cacheMap.remove(entryKeyToBeRemoved);
CacheEntry temp = new CacheEntry();
temp.setData(data);
temp.setFrequency(0);
cacheMap.put(key, temp);
}
}
public int getLFUKey()
{
int key = 0;
int minFreq = Integer.MAX_VALUE;
for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
{
if(minFreq > entry.getValue().frequency)
{
key = entry.getKey();
minFreq = entry.getValue().frequency;
}
}
return key;
}
public String getCacheEntry(int key)
{
if(cacheMap.containsKey(key)) // cache hit
{
CacheEntry temp = cacheMap.get(key);
temp.frequency++;
cacheMap.put(key, temp);
return temp.data;
}
return null; // cache miss
}
public boolean isFull()
{
return cacheMap.size() == initialCapacity;
}
}
Answer by Łukasz Kidziński
How about a priority queue? You can keep the elements sorted there, with keys representing the frequency. Just update the object's position in the queue after visiting it. You can update only from time to time to optimize performance (at the cost of precision).
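A minimal Java sketch of this suggestion, using the JDK's PriorityQueue. Note that PriorityQueue.remove is O(n), so this favors simplicity over the logarithmic updates a dedicated heap would give; all class and method names here are mine, not from the answer:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class PqLfuCache<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        int freq;
        Entry(K key, V value) { this.key = key; this.value = value; this.freq = 1; }
    }

    private final int capacity;
    private final Map<K, Entry<K, V>> map = new HashMap<>();
    // The least frequently used entry sits at the head of the queue.
    private final PriorityQueue<Entry<K, V>> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.freq, b.freq));

    public PqLfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Entry<K, V> e = map.get(key);
        if (e == null) return null;
        pq.remove(e);  // O(n): take the entry out ...
        e.freq++;      // ... bump its frequency ...
        pq.add(e);     // ... and reinsert at its new position
        return e.value;
    }

    public void put(K key, V value) {
        Entry<K, V> e = map.get(key);
        if (e != null) { e.value = value; return; }
        if (map.size() == capacity) {
            Entry<K, V> lfu = pq.poll();  // evict the least frequently used
            map.remove(lfu.key);
        }
        Entry<K, V> fresh = new Entry<>(key, value);
        map.put(key, fresh);
        pq.add(fresh);
    }
}
```

Batching the re-insertions (updating the queue only every few accesses) would be the "update from time to time" optimization the answer mentions.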
Answer by heorhi
I think the LFU data structure must combine a priority queue (to maintain fast access to the LFU item) and a hash map (to provide fast access to any item by its key). I would suggest the following node definition for each object stored in the cache:
class Node<T> {
// access key
private int key;
// counter of accesses
private int numAccesses;
// current position in pq
private int currentPos;
// item itself
private T item;
//getters, setters, constructors go here
}
You need key to refer to an item, numAccesses as the key for the priority queue, and currentPos to quickly find an item's position in the priority queue by its key. Now you organize a hash map (key (Integer) -> node (Node<T>)) to quickly access items, and a min-heap-based priority queue using the number of accesses as the priority. Now you can perform all operations very quickly (access, add a new item, update the number of accesses, remove the LFU item). You need to write each operation carefully, so that it keeps all the nodes consistent (their number of accesses, their position in the priority queue, and their existence in the hash map). All operations will work with constant average time complexity, which is what you expect from a cache.
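The design described above might be sketched as follows: a hash map for key lookup plus a hand-rolled array-backed min heap whose swap routine keeps each node's currentPos in sync. This is an illustrative sketch under the answer's assumptions, not the answerer's code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeapLfuCache<T> {
    private static class Node<T> {
        final int key;
        int numAccesses = 1;
        int currentPos;  // index of this node inside the heap array
        T item;
        Node(int key, T item, int pos) { this.key = key; this.item = item; this.currentPos = pos; }
    }

    private final int capacity;
    private final Map<Integer, Node<T>> byKey = new HashMap<>();
    private final List<Node<T>> heap = new ArrayList<>();  // index 0 = LFU node

    public HeapLfuCache(int capacity) { this.capacity = capacity; }

    public T get(int key) {
        Node<T> n = byKey.get(key);
        if (n == null) return null;
        n.numAccesses++;
        sink(n.currentPos);  // frequency only grows, so the node can only move down
        return n.item;
    }

    public void put(int key, T item) {
        Node<T> existing = byKey.get(key);
        if (existing != null) { existing.item = item; get(key); return; }
        if (byKey.size() == capacity) {
            Node<T> lfu = heap.get(0);           // evict the root (least frequent)
            swap(0, heap.size() - 1);
            heap.remove(heap.size() - 1);
            byKey.remove(lfu.key);
            if (!heap.isEmpty()) sink(0);
        }
        Node<T> n = new Node<>(key, item, heap.size());
        heap.add(n);
        byKey.put(key, n);
        swim(n.currentPos);
    }

    private void swim(int i) {
        while (i > 0 && heap.get((i - 1) / 2).numAccesses > heap.get(i).numAccesses) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void sink(int i) {
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap.get(l).numAccesses < heap.get(smallest).numAccesses) smallest = l;
            if (r < heap.size() && heap.get(r).numAccesses < heap.get(smallest).numAccesses) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    // Every swap updates currentPos, keeping the hash map and heap consistent.
    private void swap(int i, int j) {
        Node<T> a = heap.get(i), b = heap.get(j);
        heap.set(i, b);
        heap.set(j, a);
        a.currentPos = j;
        b.currentPos = i;
    }
}
```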
Answer by Andrushenko Alexander
Many implementations I have seen have runtime complexity O(log(n)). This means that when the cache size is n, the time needed to insert/remove an element into/from the cache is logarithmic. Such implementations usually use a min heap to maintain the usage frequencies of the elements. The root of the heap contains the element with the lowest frequency, and it can be accessed in O(1) time. But to maintain the heap property we have to move an element inside the heap every time it is used (and its frequency is incremented), to place it into its proper position, and also when we have to insert a new element into the cache (and so put it into the heap).

But the runtime complexity can be reduced to O(1) when we maintain a hashmap (Java) or unordered_map (C++) with the element as key. Additionally, we need two sorts of lists: a frequency list and elements lists. The elements lists contain elements that have the same frequency, and the frequency list contains the elements lists.
frequency list
1    3    6    7
a    k    y    x
c    l         z
m    n
Here in the example we see a frequency list that has 4 elements lists. Elements list 1 contains the elements (a, c, m), elements list 3 contains the elements (k, l, n), etc.

Now, say we use element y: we have to increment its frequency and put it in the next list. Because the elements list with frequency 6 becomes empty, we delete it. The result is:
frequency list
1    3    7
a    k    y
c    l    x
m    n    z
We place the element y at the beginning of elements list 7. When we have to remove elements from the list later, we will start from the end (first z, then x and then y).

Now, say we use element n: we have to increment its frequency and put it into a new list with frequency 4:
frequency list
1    3    4    7
a    k    n    y
c    l         x
m              z
I hope the idea is clear. I now provide my C++ implementation of the LFU cache, and will add a Java implementation later.

The class has just 2 public methods, void set(key k, value v) and bool get(key k, value &v). In the get method, the value to retrieve is set via the reference when the element is found, and in this case the method returns true. When the element is not found, the method returns false.
#include<unordered_map>
#include<list>
using namespace std;
typedef unsigned uint;
template<typename K, typename V = K>
struct Entry
{
K key;
V value;
};
template<typename K, typename V = K>
class LFUCache
{
typedef list<Entry<K, V>> ElementList;
typedef list<pair<uint, ElementList>> FrequencyList;
private:
unordered_map <K, pair<typename FrequencyList::iterator, typename ElementList::iterator>> cacheMap;
FrequencyList elements;
uint maxSize;
uint curSize;
void incrementFrequency(pair<typename FrequencyList::iterator, typename ElementList::iterator> p) {
if (p.first == prev(elements.end())) {
// this entry is in the last (highest-frequency) list; append a new list with incremented frequency (p.first->first + 1)
elements.push_back({ p.first->first + 1, { {p.second->key, p.second->value} } });
// erase and insert the key with new iterator pair
cacheMap[p.second->key] = { prev(elements.end()), prev(elements.end())->second.begin() };
}
else {
// there exist element(s) with higher frequency
auto pos = next(p.first);
if (p.first->first + 1 == pos->first)
// same frequency in the next list, add the element in the begin
pos->second.push_front({ p.second->key, p.second->value });
else
// insert new list before next list
pos = elements.insert(pos, { p.first->first + 1 , {{p.second->key, p.second->value}} });
// update cacheMap iterators
cacheMap[p.second->key] = { pos, pos->second.begin() };
}
// if the element list with the old frequency contained only this single element, erase that list from the frequency list
if (p.first->second.size() == 1)
elements.erase(p.first);
else
// erase only the element with updated frequency from the old list
p.first->second.erase(p.second);
}
void eraseOldElement() {
if (elements.size() > 0) {
auto key = prev(elements.begin()->second.end())->key;
if (elements.begin()->second.size() < 2)
elements.erase(elements.begin());
else
elements.begin()->second.erase(prev(elements.begin()->second.end()));
cacheMap.erase(key);
curSize--;
}
}
public:
LFUCache(uint size) {
if (size > 0)
maxSize = size;
else
maxSize = 10;
curSize = 0;
}
void set(K key, V value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end()) {
if (curSize == maxSize)
eraseOldElement();
if (elements.begin() == elements.end()) {
elements.push_front({ 1, { {key, value} } });
}
else if (elements.begin()->first == 1) {
elements.begin()->second.push_front({ key,value });
}
else {
elements.push_front({ 1, { {key, value} } });
}
cacheMap.insert({ key, {elements.begin(), elements.begin()->second.begin()} });
curSize++;
}
else {
entry->second.second->value = value;
incrementFrequency(entry->second);
}
}
bool get(K key, V &value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end())
return false;
value = entry->second.second->value;
incrementFrequency(entry->second);
return true;
}
};
Here are examples of usage (add #include <cassert> and #include <string> to the includes above):
int main()
{
LFUCache<int> cache(3); // cache of size 3
bool rc;
int r;
cache.set(1, 1);
cache.set(2, 2);
cache.set(3, 3);
cache.set(2, 4);
rc = cache.get(1, r);
assert(rc);
assert(r == 1);
// evict old element, in this case 3
cache.set(4, 5);
rc = cache.get(3, r);
assert(!rc);
rc = cache.get(4, r);
assert(rc);
assert(r == 5);
LFUCache<int, string> cache2(2);
cache2.set(1, "one");
cache2.set(2, "two");
string val;
rc = cache2.get(1, val);
if (rc)
assert(val == "one");
else
assert(false);
cache2.set(3, "three"); // evict 2
rc = cache2.get(2, val);
assert(rc == false);
rc = cache2.get(3, val);
assert(rc);
assert(val == "three");
}
Answer by Pramod Sivaraju
Here's the O(1) implementation for LFU - http://dhruvbird.com/lfu.pdf
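The paper's O(1) scheme can be sketched in Java with frequency buckets: a hash map per key, plus an insertion-ordered set of keys per frequency so the oldest key in the lowest bucket is evicted first. The names below are mine, not the paper's:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

public class ConstantTimeLfu<K, V> {
    private final int capacity;
    private int minFreq = 0;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> freqOf = new HashMap<>();
    // freq -> keys with that freq, in insertion order (head = oldest, evicted first)
    private final Map<Integer, LinkedHashSet<K>> buckets = new HashMap<>();

    public ConstantTimeLfu(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        if (!values.containsKey(key)) return null;
        touch(key);
        return values.get(key);
    }

    public void put(K key, V value) {
        if (capacity <= 0) return;
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() == capacity) {
            // Evict the oldest key in the lowest-frequency bucket.
            K victim = buckets.get(minFreq).iterator().next();
            buckets.get(minFreq).remove(victim);
            values.remove(victim);
            freqOf.remove(victim);
        }
        values.put(key, value);
        freqOf.put(key, 1);
        buckets.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
        minFreq = 1;
    }

    // Move key from its current bucket to the next one up, in O(1).
    private void touch(K key) {
        int f = freqOf.get(key);
        buckets.get(f).remove(key);
        if (f == minFreq && buckets.get(f).isEmpty()) minFreq = f + 1;
        freqOf.put(key, f + 1);
        buckets.computeIfAbsent(f + 1, x -> new LinkedHashSet<>()).add(key);
    }
}
```

Every map and set operation here is amortized O(1), which is what distinguishes this layout from the heap-based designs above.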
Answer by siddhartha
I have tried to implement the LFU cache below, taking this LFU paper as a reference. My implementation is working nicely.

If anyone wants to provide any further suggestions to improve it, please let me know.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
public class LFUCache {
private Map<Integer, Node> cache = new HashMap<>();
private Map<Integer, Integer> counts = new HashMap<>();
private TreeMap<Integer, DoublyLinkedList> frequencies = new TreeMap<>();
private final int CAPACITY;
public LFUCache(int capacity) {
this.CAPACITY = capacity;
}
public int get(int key) {
if (!cache.containsKey(key)) {
return -1;
}
Node node = cache.get(key);
int frequency = counts.get(key);
frequencies.get(frequency).remove(new Node(node.key(), node.value()));
removeFreq(frequency);
frequencies.computeIfAbsent(frequency + 1, k -> new DoublyLinkedList()).add(new Node(node.key(), node.value()));
counts.put(key, frequency + 1);
return cache.get(key).value();
}
public void set(int key, int value) {
if (!cache.containsKey(key)) {
Node node = new Node(key, value);
if (cache.size() == CAPACITY) {
int l_count = frequencies.firstKey();
Node deleteThisNode = frequencies.get(l_count).head();
frequencies.get(l_count).remove(deleteThisNode);
int deleteThisKey = deleteThisNode.key();
removeFreq(l_count);
cache.remove(deleteThisKey);
counts.remove(deleteThisKey);
}
cache.put(key, node);
counts.put(key, 1);
frequencies.computeIfAbsent(1, k -> new DoublyLinkedList()).add(node);
}
}
private void removeFreq(int frequency) {
if (frequencies.get(frequency).size() == 0) {
frequencies.remove(frequency);
}
}
public Map<Integer, Node> getCache() {
return cache;
}
public Map<Integer, Integer> getCounts() {
return counts;
}
public TreeMap<Integer, DoublyLinkedList> getFrequencies() {
return frequencies;
}
}
class Node {
private int key;
private int value;
private Node next;
private Node prev;
public Node(int key, int value) {
this.key = key;
this.value = value;
}
public Node getNext() {
return next;
}
public void setNext(Node next) {
this.next = next;
}
public Node getPrev() {
return prev;
}
public void setPrev(Node prev) {
this.prev = prev;
}
public int key() {
return key;
}
public int value() {
return value;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Node)) return false;
Node node = (Node) o;
return key == node.key &&
value == node.value;
}
@Override
public int hashCode() {
return Objects.hash(key, value);
}
@Override
public String toString() {
return "Node{" +
"key=" + key +
", value=" + value +
'}';
}
}
class DoublyLinkedList {
private int size;
private Node head;
private Node tail;
public void add(Node node) {
if (null == head) {
head = node;
} else {
tail.setNext(node);
node.setPrev(tail);
}
tail = node;
size++;
}
public void remove(Node node) {
if(null == head || null == node) {
return;
}
if(this.size() == 1 && head.equals(node)) {
head = null;
tail = null;
} else if (head.equals(node)) {
head = node.getNext();
head.setPrev(null);
} else if (tail.equals(node)) {
Node prevToTail = tail.getPrev();
prevToTail.setNext(null);
tail = prevToTail;
} else {
Node current = head.getNext();
while(!current.equals(tail)) {
if(current.equals(node)) {
Node prevToCurrent = current.getPrev();
Node nextToCurrent = current.getNext();
prevToCurrent.setNext(nextToCurrent);
nextToCurrent.setPrev(prevToCurrent);
break;
}
current = current.getNext();
}
}
size--;
}
public Node head() {
return head;
}
public int size() {
return size;
}
}
Client code that uses the above cache implementation:
import java.util.Map;
public class Client {
public static void main(String[] args) {
Client client = new Client();
LFUCache cache = new LFUCache(4);
cache.set(11, function(11));
cache.set(12, function(12));
cache.set(13, function(13));
cache.set(14, function(14));
cache.set(15, function(15));
client.print(cache.getFrequencies());
cache.get(13);
cache.get(13);
cache.get(13);
cache.get(14);
cache.get(14);
cache.get(14);
cache.get(14);
client.print(cache.getCache());
client.print(cache.getCounts());
client.print(cache.getFrequencies());
}
public void print(Map<Integer, ? extends Object> map) {
for(Map.Entry<Integer, ? extends Object> entry : map.entrySet()) {
if(entry.getValue() instanceof Node) {
System.out.println("Cache Key => "+entry.getKey()+", Cache Value => "+((Node) entry.getValue()).toString());
} else if (entry.getValue() instanceof DoublyLinkedList) {
System.out.println("Frequency Key => "+entry.getKey()+" Frequency Values => [");
Node head = ((DoublyLinkedList) entry.getValue()).head();
while(null != head) {
System.out.println(head.toString());
head = head.getNext();
}
System.out.println(" ]");
} else {
System.out.println("Count Key => "+entry.getKey()+", Count Value => "+entry.getValue());
}
}
}
public static int function(int key) {
int prime = 31;
return key*prime;
}
}