PHP: User recognition without cookies or local storage
Source: http://stackoverflow.com/questions/15966812/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by slash197
I'm building an analytics tool, and I can currently get the user's IP address, plus their browser and operating system from the user agent.
I'm wondering whether it is possible to detect the same user without using cookies or local storage. I'm not expecting code examples here; just a simple hint of where to look further.
I forgot to mention that it would need to work across browsers on the same computer or device. Basically, I'm after device recognition rather than user recognition.
Answer by Baba
Introduction
If I understand you correctly, you need to identify a user for whom you don't have a Unique Identifier, so you want to figure out who they are by matching Random Data. You can't store the user's identity reliably because:
- Cookies can be deleted
- The IP address can change
- The browser can change
- The browser cache may be deleted
A Java Applet or COM Object would have been an easy solution using a hash of hardware information, but these days people are so security-aware that it would be difficult to get them to install these kinds of programs on their system. This leaves you stuck with using Cookies and other, similar tools.
Cookies and other, similar tools
You might consider building a Data Profile, then using Probability tests to identify a Probable User. A profile useful for this can be generated by some combination of the following:
- IP Address
  - Real IP Address
  - Proxy IP Address (users often use the same proxy repeatedly)
- Cookies
  - HTTP Cookies
  - Session Cookies
  - 3rd Party Cookies
  - Flash Cookies (most people don't know how to delete these)
- Web Bugs (less reliable because bugs get fixed, but still useful)
  - PDF Bug
  - Flash Bug
  - Java Bug
- Browsers
  - Click Tracking (many users visit the same series of pages on each visit)
  - Browser Fingerprint: Installed Plugins (people often have varied, somewhat unique sets of plugins)
  - Cached Images (people sometimes delete their cookies but leave cached images)
  - Using Blobs
  - URL(s) (browser history or cookies may contain unique user IDs in URLs, such as https://stackoverflow.com/users/1226894 or http://www.facebook.com/barackobama?fref=ts)
  - System Fonts Detection (this is a little-known but often unique key signature)
- HTML5 & JavaScript
  - HTML5 LocalStorage
  - HTML5 Geolocation API and Reverse Geocoding
  - Architecture, OS Language, System Time, Screen Resolution, etc.
  - Network Information API
  - Battery Status API
The items I listed are, of course, just a few possible ways a user can be identified uniquely. There are many more.
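As a rough illustration of the server-side part of such a profile, the sketch below collects a few signals that PHP exposes through `$_SERVER`. The function name `buildProfile` and the choice of fields are assumptions made for illustration, not part of the original answer; client-side signals such as fonts or plugins would have to be gathered with JavaScript and posted back.

```php
<?php
// Hypothetical helper: collect a few request-level signals into a profile array.
// The field selection is an assumption; extend it with whatever you can gather.
function buildProfile(array $server): array
{
    // X-Forwarded-For may list several hops; the first entry is the claimed client.
    $forwarded = $server['HTTP_X_FORWARDED_FOR'] ?? '';
    $claimedIp = $forwarded !== '' ? trim(explode(',', $forwarded)[0]) : null;

    return array(
        'remote_ip'  => $server['REMOTE_ADDR'] ?? null,        // direct peer (may be a proxy)
        'claimed_ip' => $claimedIp,                            // client-supplied, easy to spoof
        'user_agent' => $server['HTTP_USER_AGENT'] ?? null,
        'language'   => $server['HTTP_ACCEPT_LANGUAGE'] ?? null,
        'encoding'   => $server['HTTP_ACCEPT_ENCODING'] ?? null,
    );
}

// Example with a hand-made request array (a real script would pass $_SERVER).
$profile = buildProfile(array(
    'REMOTE_ADDR'          => '203.0.113.7',
    'HTTP_X_FORWARDED_FOR' => '198.51.100.23, 203.0.113.7',
    'HTTP_USER_AGENT'      => 'ExampleBrowser/1.0',
));
print_r($profile);
```

Missing headers simply become `null`, so profiles from different requests keep the same shape and can be compared field by field.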
With this set of Random Data elements to build a Data Profile from, what's next?
The next step is to develop some Fuzzy Logic, or, better yet, an Artificial Neural Network (which uses fuzzy logic). In either case, the idea is to train your system, and then combine its training with Bayesian Inference to increase the accuracy of your results.
The NeuralMesh library for PHP allows you to generate Artificial Neural Networks. To implement Bayesian Inference, check out the following links:
- Implement Bayesian inference using PHP, Part 1
- Implement Bayesian inference using PHP, Part 2
- Implement Bayesian inference using PHP, Part 3
At this point, you may be thinking:
Why so much Math and Logic for a seemingly simple task?
Basically, because it is not a simple task. What you are trying to achieve is, in fact, Pure Probability. For example, given the following known users:
User1 = A + B + C + D + G + K
User2 = C + D + I + J + K + F
When you receive the following data:
B + C + E + G + F + K
The question which you are essentially asking is:
What is the probability that the received data (B + C + E + G + F + K) is actually User1 or User2? And which of those two matches is most probable?
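To make the question concrete, here is a minimal sketch that scores the received data against each known user by set overlap (Jaccard similarity). This is a simple stand-in rather than a true probability, and the function name `overlapScore` is made up for this illustration.

```php
<?php
// Jaccard similarity: |intersection| / |union| of two feature sets.
function overlapScore(array $known, array $received): float
{
    $intersection = count(array_intersect($known, $received));
    $union = count(array_unique(array_merge($known, $received)));
    return $union === 0 ? 0.0 : $intersection / $union;
}

$users = array(
    'User1' => array('A', 'B', 'C', 'D', 'G', 'K'),
    'User2' => array('C', 'D', 'I', 'J', 'K', 'F'),
);
$received = array('B', 'C', 'E', 'G', 'F', 'K');

$scores = array();
foreach ($users as $name => $features) {
    $scores[$name] = overlapScore($features, $received);
}
arsort($scores); // best candidate first
foreach ($scores as $name => $score) {
    printf("%s: %.4f\n", $name, $score);
}
// Prints:
// User1: 0.5000
// User2: 0.3333
```

User1 shares four of eight distinct features with the received data, User2 three of nine, so User1 is the closer match by this measure.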
In order to effectively answer this question, you need to understand Frequency vs Probability Format and why Joint Probability might be a better approach. The details are too much to get into here (which is why I'm giving you links), but a good example would be a Medical Diagnosis Wizard Application, which uses a combination of symptoms to identify possible diseases.
Think for a moment of the series of data points which comprise your Data Profile (B + C + E + G + F + K in the example above) as Symptoms, and Unknown Users as Diseases. By identifying the disease, you can further identify an appropriate treatment (treat this user as User1).
Obviously, a Disease for which we have identified more than one Symptom is easier to identify. In fact, the more Symptoms we can identify, the easier and more accurate our diagnosis is almost certain to be.
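The symptom/disease analogy maps naturally onto Bayes' rule. The sketch below is a naive Bayes toy under loudly stated assumptions: features are independent, both users are equally likely beforehand, a user shows each of their known features with probability 0.9, and shows any other feature with probability 0.1. Those numbers and the function name `posterior` are invented for illustration; in practice you would estimate them from your own data.

```php
<?php
// Naive Bayes posterior over known users, given the set of observed features.
// $pHit:   assumed P(feature observed | feature belongs to the user)
// $pNoise: assumed P(feature observed | feature does not belong to the user)
function posterior(array $users, array $observed, array $universe, float $pHit = 0.9, float $pNoise = 0.1): array
{
    $likelihood = array();
    foreach ($users as $name => $known) {
        $l = 1.0;
        foreach ($universe as $feature) {
            $p = in_array($feature, $known, true) ? $pHit : $pNoise;
            $l *= in_array($feature, $observed, true) ? $p : (1 - $p);
        }
        $likelihood[$name] = $l; // uniform prior, so the prior cancels out
    }
    $total = array_sum($likelihood);
    return array_map(function ($l) use ($total) {
        return $l / $total;
    }, $likelihood);
}

$users = array(
    'User1' => array('A', 'B', 'C', 'D', 'G', 'K'),
    'User2' => array('C', 'D', 'I', 'J', 'K', 'F'),
);
$observed = array('B', 'C', 'E', 'G', 'F', 'K');
$universe = range('A', 'K'); // every feature we know how to test for

$result = posterior($users, $observed, $universe);
foreach ($result as $name => $p) {
    printf("%s: %.4f\n", $name, $p);
}
```

Under these assumed probabilities User1 comes out far more likely than User2 (roughly 0.99 against 0.01), because User1 explains four of the observed features and User2 only three.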
Are there any other alternatives?
Of course. As an alternative measure, you might create your own simple scoring algorithm, and base it on exact matches. This is not as efficient as probability, but may be simpler for you to implement.
As an example, consider this simple score chart:
+-------------------------+--------+------------+
| Property                | Weight | Importance |
+-------------------------+--------+------------+
| Real IP address         |     60 |          5 |
| Used proxy IP address   |     40 |          4 |
| HTTP Cookies            |     80 |          8 |
| Session Cookies         |     80 |          6 |
| 3rd Party Cookies       |     60 |          4 |
| Flash Cookies           |     90 |          7 |
| PDF Bug                 |     20 |          1 |
| Flash Bug               |     20 |          1 |
| Java Bug                |     20 |          1 |
| Frequent Pages          |     40 |          1 |
| Browsers Finger Print   |     35 |          2 |
| Installed Plugins       |     25 |          1 |
| Cached Images           |     40 |          3 |
| URL                     |     60 |          4 |
| System Fonts Detection  |     70 |          4 |
| Localstorage            |     90 |          8 |
| Geolocation             |     70 |          6 |
| AOLTR                   |     70 |          4 |
| Network Information API |     40 |          3 |
| Battery Status API      |     20 |          1 |
+-------------------------+--------+------------+
For each piece of information which you can gather on a given request, award the associated score, then use Importance to resolve conflicts when scores are the same.
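A minimal sketch of such a scoring routine, using a few rows from the chart above. The data layout, the function name `scoreCandidate`, and the choice to sum Importance over the matched properties as the tie-breaker are assumptions; the answer does not spell out exactly how ties are resolved.

```php
<?php
// A few rows of the score chart: property => [weight, importance].
$chart = array(
    'Real IP address' => array(60, 5),
    'Flash Cookies'   => array(90, 7),
    'Cached Images'   => array(40, 3),
    'Geolocation'     => array(70, 6),
);

// Sum weight and importance for every property whose value matches exactly.
function scoreCandidate(array $stored, array $observed, array $chart): array
{
    $score = 0;
    $importance = 0;
    foreach ($chart as $property => list($weight, $imp)) {
        if (isset($stored[$property], $observed[$property])
            && $stored[$property] === $observed[$property]) {
            $score += $weight;
            $importance += $imp;
        }
    }
    return array('score' => $score, 'importance' => $importance);
}

// Made-up sample data: what this request showed, and two stored profiles.
$observed = array(
    'Real IP address' => '203.0.113.7',
    'Flash Cookies'   => 'lso-42',
    'Cached Images'   => 'img-42',
    'Geolocation'     => 'Berlin',
);
$candidates = array(
    'alice' => array('Real IP address' => '198.51.100.9', 'Flash Cookies' => 'lso-42',
                     'Cached Images' => 'img-42', 'Geolocation' => 'Paris'),
    'bob'   => array('Real IP address' => '203.0.113.7', 'Flash Cookies' => 'lso-77',
                     'Cached Images' => 'img-77', 'Geolocation' => 'Berlin'),
);

$results = array();
foreach ($candidates as $name => $stored) {
    $results[$name] = scoreCandidate($stored, $observed, $chart);
}
uasort($results, function ($a, $b) {
    if ($a['score'] !== $b['score']) {
        return $b['score'] - $a['score'];       // higher score first
    }
    return $b['importance'] - $a['importance']; // tie: higher importance first
});
print_r($results);
```

Both sample candidates reach a score of 130 here, so the ordering is decided by Importance (11 for bob against 10 for alice).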
Proof of Concept
For a simple proof of concept, please take a look at Perceptron. Perceptron is an RNA Model (RNA appears to be the Portuguese abbreviation for Artificial Neural Network) that is generally used in pattern recognition applications. There is even an old PHP Class which implements it perfectly, but you would likely need to modify it for your purposes.
Despite being a great tool, Perceptron can still return multiple results (possible matches), so using a Score and Difference comparison is still useful to identify the best of those matches.
Assumptions
- Store all possible information about each user (IP, cookies, etc.)
- Where result is an exact match, increase score by 1
- Where result is not an exact match, decrease score by 1
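Read literally, the last two assumptions amount to a running tally. A minimal sketch, with the function name `tally` and the sample values invented for illustration (the proof-of-concept code in this answer weights each property instead of using a flat 1):

```php
<?php
// +1 for every property that matches exactly, -1 for every one that does not.
function tally(array $stored, array $observed): int
{
    $score = 0;
    foreach ($observed as $property => $value) {
        $matches = array_key_exists($property, $stored) && $stored[$property] === $value;
        $score += $matches ? 1 : -1;
    }
    return $score;
}

$stored   = array('ip' => '203.0.113.7', 'cookie' => 'abc', 'fonts' => 'f1');
$observed = array('ip' => '203.0.113.7', 'cookie' => 'xyz', 'fonts' => 'f1');
echo tally($stored, $observed), "\n"; // prints 1 (two matches, one mismatch)
```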
Expectation
- Generate RNA labels
- Generate random users emulating a database
- Generate a single Unknown user
- Generate Unknown user RNA and Values
- The system will merge RNA information and teach the Perceptron
- After training the Perceptron, the system will have a set of weightings
- You can now test the Unknown user's pattern and the Perceptron will produce a result set.
- Store all Positive matches
- Sort the matches first by Score, then by Difference (as described above)
- Output the two closest matches, or, if no matches are found, output empty results
Code for Proof of Concept
$features = array(
    'Real IP address' => .5,
    'Used proxy IP address' => .4,
    'HTTP Cookies' => .9,
    'Session Cookies' => .6,
    '3rd Party Cookies' => .6,
    'Flash Cookies' => .7,
    'PDF Bug' => .2,
    'Flash Bug' => .2,
    'Java Bug' => .2,
    'Frequent Pages' => .3,
    'Browsers Finger Print' => .3,
    'Installed Plugins' => .2,
    'URL' => .5,
    'Cached PNG' => .4,
    'System Fonts Detection' => .6,
    'Localstorage' => .8,
    'Geolocation' => .6,
    'AOLTR' => .4,
    'Network Information API' => .3,
    'Battery Status API' => .2
);
// Get RNA labels (short names x1..x20, one per feature)
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n++;
}
// Create users
$users = array();
for ($i = 0, $name = "A"; $i < 5; $i++, $name++) {
    $users[] = new Profile($name, $features);
}
// Generate the unknown user
$unknown = new Profile("Unknown", $features);
// Generate the unknown user's RNA: row 0 is the positive example, row 1 its negation
$unknownRNA = array(
    0 => array("o" => 1),
    1 => array("o" => -1)
);
// Create RNA values
foreach ($unknown->data as $item => $point) {
    $unknownRNA[0][$labels[$item]] = $point;
    $unknownRNA[1][$labels[$item]] = (-1 * $point);
}
// Start the Perceptron class
$perceptron = new Perceptron();
// Train results
$trainResult = $perceptron->train($unknownRNA, 1, 1);
// Find matches
$matchs = array(); // initialized so count() below is safe when nothing matches
foreach ($users as $profile) {
    // Use shorter labels
    $data = array_combine($labels, $profile->data);
    if ($perceptron->testCase($data, $trainResult) == true) {
        $score = $diff = 0;
        // Determine the score and difference
        foreach ($unknown->data as $item => $found) {
            if ($unknown->data[$item] === $profile->data[$item]) {
                if ($profile->data[$item] > 0) {
                    $score += $features[$item];
                } else {
                    $diff += $features[$item];
                }
            }
        }
        // Set score and diff
        $profile->setScore($score, $diff);
        $matchs[] = $profile;
    }
}
// Sort based on score and output (a single match is still a match)
if (count($matchs) > 0) {
    usort($matchs, function ($a, $b) {
        // If the score is the same, use the difference
        if ($a->score == $b->score) {
            // The lower the difference the better
            return $a->diff == $b->diff ? 0 : ($a->diff > $b->diff ? 1 : -1);
        }
        // The higher the score the better
        return $a->score > $b->score ? -1 : 1;
    });
    echo "<br />Possible Match ", implode(",", array_slice(array_map(function ($v) {
        return sprintf(" %s (%0.4f|%0.4f) ", $v->name, $v->score, $v->diff);
    }, $matchs), 0, 2));
} else {
    echo "<br />No match Found ";
}
Possible Match D (0.7416|0.16853),C (0.5393|0.2809)
Print_r of "D":
echo "<pre>";
print_r($matchs[0]);
Profile Object(
[name] => D
[data] => Array (
[Real IP address] => -1
[Used proxy IP address] => -1
[HTTP Cookies] => 1
[Session Cookies] => 1
[3rd Party Cookies] => 1
[Flash Cookies] => 1
[PDF Bug] => 1
[Flash Bug] => 1
[Java Bug] => -1
[Frequent Pages] => 1
[Browsers Finger Print] => -1
[Installed Plugins] => 1
[URL] => -1
[Cached PNG] => 1
[System Fonts Detection] => 1
[Localstorage] => -1
[Geolocation] => -1
[AOLTR] => 1
[Network Information API] => -1
[Battery Status API] => -1
)
[score] => 0.74157303370787
[diff] => 0.1685393258427
[base] => 8.9
)
If Debug = true you would be able to see Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights.
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| o | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | Bias | Yin | Y | deltaW1 | deltaW2 | deltaW3 | deltaW4 | deltaW5 | deltaW6 | deltaW7 | deltaW8 | deltaW9 | deltaW10 | deltaW11 | deltaW12 | deltaW13 | deltaW14 | deltaW15 | deltaW16 | deltaW17 | deltaW18 | deltaW19 | deltaW20 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 | deltaBias |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 0 | -1 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 | 1 | -19 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 | 1 | -19 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
x1 to x20 represent the features converted by the code.
// Get RNA Labels
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n++;
}
Here is an online demo
Class Used:
class Profile {
    public $name, $data = array(), $score, $diff, $base;
    function __construct($name, array $importance) {
        $values = array(-1, 1); // Perceptron values
        $this->name = $name;
        foreach ($importance as $item => $point) {
            // Generate a random true/false (1 / -1) value for each item
            $this->data[$item] = $values[mt_rand(0, 1)];
        }
        $this->base = array_sum($importance);
    }
    public function setScore($score, $diff) {
        $this->score = $score / $this->base;
        $this->diff = $diff / $this->base;
    }
}
Modified Perceptron Class
class Perceptron {
    private $w = array();
    private $dw = array();
    public $debug = false;
    private function initialize($colums) {
        // Initialize perceptron vars
        for ($i = 1; $i <= $colums; $i++) {
            // weighting vars
            $this->w[$i] = 0;
            $this->dw[$i] = 0;
        }
    }
    // Note: $alpha and $teta are accepted but not used by this implementation
    function train($input, $alpha, $teta) {
        $colums = count($input[0]) - 1;
        $weightCache = array_fill(1, $colums, 0);
        $checkpoints = array();
        $keepTrainning = true;
        // Initialize RNA vars
        $this->initialize(count($input[0]) - 1);
        $just_started = true;
        $totalRun = 0;
        $yin = 0;
        // Trains RNA until it gets stable
        while ($keepTrainning == true) {
            // Sweeps each row of the input subject
            foreach ($input as $row_counter => $row_data) {
                // Finds out the number of columns the input has
                $n_columns = count($row_data) - 1;
                // Calculates Yin
                $yin = 0;
                for ($i = 1; $i <= $n_columns; $i++) {
                    $yin += $row_data["x" . $i] * $weightCache[$i];
                }
                // Calculates Real Output
                $Y = ($yin <= 1) ? -1 : 1;
                // Sweeps columns ...
                $checkpoints[$row_counter] = 0;
                for ($i = 1; $i <= $n_columns; $i++) {
                    /** DELTAS **/
                    // Is it the first row?
                    if ($just_started == true) {
                        $this->dw[$i] = $weightCache[$i];
                        $just_started = false;
                    // Found desired output?
                    } elseif ($Y == $row_data["o"]) {
                        $this->dw[$i] = 0;
                    // Calculates Delta Ws
                    } else {
                        $this->dw[$i] = $row_data["x" . $i] * $row_data["o"];
                    }
                    /** WEIGHTS **/
                    // Calculate Weights
                    $this->w[$i] = $this->dw[$i] + $weightCache[$i];
                    $weightCache[$i] = $this->w[$i];
                    /** CHECK-POINT **/
                    $checkpoints[$row_counter] += $this->w[$i];
                } // END - for
                foreach ($this->w as $index => $w_item) {
                    $debug_w["W" . $index] = $w_item;
                    $debug_dw["deltaW" . $index] = $this->dw[$index];
                }
                // Special for script debugging
                $debug_vars[] = array_merge($row_data, array(
                    "Bias" => 1,
                    "Yin" => $yin,
                    "Y" => $Y
                ), $debug_dw, $debug_w, array(
                    "deltaBias" => 1
                ));
            } // END - foreach
            // Special for script debugging
            $empty_data_row = array();
            for ($i = 1; $i <= $n_columns; $i++) {
                $empty_data_row["x" . $i] = "--";
                $empty_data_row["W" . $i] = "--";
                $empty_data_row["deltaW" . $i] = "--";
            }
            $debug_vars[] = array_merge($empty_data_row, array(
                "o" => "--",
                "Bias" => "--",
                "Yin" => "--",
                "Y" => "--",
                "deltaBias" => "--"
            ));
            // Counts training times
            $totalRun++;
            // Now checks if the RNA is stable already
            $referer_value = end($checkpoints);
            // if all rows match the desired output ...
            $sum = array_sum($checkpoints);
            $n_rows = count($checkpoints);
            if ($totalRun > 1 && ($sum / $n_rows) == $referer_value) {
                $keepTrainning = false;
            }
        } // END - while
        // Prepares the final result
        $result = array();
        for ($i = 1; $i <= $n_columns; $i++) {
            $result["w" . $i] = $this->w[$i];
        }
        $this->debug($this->print_html_table($debug_vars));
        return $result;
    } // END - train
    function testCase($input, $results) {
        // Sweeps input columns
        $result = 0;
        $i = 1;
        foreach ($input as $column_value) {
            // Calculates test Y
            $result += $results["w" . $i] * $column_value;
            $i++;
        }
        // Checks which class the test case fits
        return ($result > 0) ? true : false;
    } // END - testCase
    // Returns the HTML code of an HTML table based on a hash array
    function print_html_table($array) {
        $html = "";
        $inner_html = "";
        $table_header_composed = false;
        $table_header = array();
        // Builds table contents
        foreach ($array as $array_item) {
            $inner_html .= "<tr>\n";
            foreach ($array_item as $array_col_label => $array_col) {
                $inner_html .= "<td>\n";
                $inner_html .= $array_col;
                $inner_html .= "</td>\n";
                if ($table_header_composed == false) {
                    $table_header[] = $array_col_label;
                }
            }
            $table_header_composed = true;
            $inner_html .= "</tr>\n";
        }
        // Builds full table
        $html = "<table border=1>\n";
        $html .= "<tr>\n";
        foreach ($table_header as $table_header_item) {
            $html .= "<td>\n";
            $html .= "<b>" . $table_header_item . "</b>";
            $html .= "</td>\n";
        }
        $html .= "</tr>\n";
        $html .= $inner_html . "</table>";
        return $html;
    } // END - print_html_table
    // Debug function
    function debug($message) {
        if ($this->debug == true) {
            echo "<b>DEBUG:</b> $message";
        }
    } // END - debug
} // END - class
Conclusion
Identifying a user without a Unique Identifier is not a straightforward or simple task. It depends on gathering a sufficient amount of Random Data, which you can collect from the user by a variety of methods.
Even if you choose not to use an Artificial Neural Network, I suggest at least using a Simple Probability Matrix with priorities and likelihoods, and I hope the code and examples provided above give you enough to go on.
Answer by pozs
This technique (detecting the same users without cookies, or even without the IP address) is called browser fingerprinting. Basically, you gather as much information about the browser as you can; better results can be achieved with JavaScript, Flash or Java (e.g. installed extensions, fonts, etc.). After that, you can store the results hashed, if you want.
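A minimal server-side sketch of that idea: serialize whatever attributes you collected and store only a hash of them. The attribute names and the function `fingerprint` are assumptions for illustration; richer signals such as fonts or extensions would come from client-side script.

```php
<?php
// Hash a set of collected attributes into a fixed-length fingerprint.
function fingerprint(array $attributes): string
{
    ksort($attributes); // stable key order, so the same data always hashes the same
    return hash('sha256', json_encode($attributes));
}

$fp = fingerprint(array(
    'user_agent' => 'ExampleBrowser/1.0',
    'language'   => 'en-US,en;q=0.9',
    'screen'     => '1920x1080x24',
    'timezone'   => 'UTC+1',
    'fonts'      => 'Arial,Helvetica,Verdana',
));
echo $fp, "\n"; // 64 hexadecimal characters
```

Note that an exact hash changes completely when any single attribute changes (a browser update, a new font), so it only supports exact matching; fuzzy matching needs the individual attributes.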
It's not infallible, but:
83.6% of the browsers seen had a unique fingerprint; among those with Flash or Java enabled, 94.2%. This does not include cookies!
More info:
Answer by Justin Alexander
The fingerprinting mentioned above works, but can still suffer collisions.
One way around this is to add a UID to the URL of each interaction with the user.
http://someplace.com/12899823/user/profile
Every link in the site is adapted with this modifier. It is similar to the way ASP.NET used to work, using FORM data between pages.
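A small sketch of the rewriting step, assuming the UID is always the first path segment and is purely numeric, as in the example URL. The helper names `withUid` and `extractUid` are hypothetical.

```php
<?php
// Prefix a site-relative path with the visitor's UID.
function withUid(string $uid, string $path): string
{
    return '/' . rawurlencode($uid) . '/' . ltrim($path, '/');
}

// Pull the UID back out of an incoming request path; null if none is present.
function extractUid(string $requestPath): ?string
{
    if (preg_match('#^/(\d+)(/|$)#', $requestPath, $m)) {
        return $m[1];
    }
    return null;
}

echo withUid('12899823', '/user/profile'), "\n";  // /12899823/user/profile
var_dump(extractUid('/12899823/user/profile'));    // string(8) "12899823"
var_dump(extractUid('/user/profile'));             // NULL
```

Every link the site emits would go through something like `withUid`, and the front controller would call something like `extractUid` before routing the rest of the path.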
Answer by Alexis Tyler
Have you looked into Evercookie? It may or may not work across browsers. Here is an extract from their site:
"If a user gets cookied on one browser and switches to another browser, as long as they still have the Local Shared Object cookie, the cookie will reproduce in both browsers."
Answer by hobberwickey
You could do this with a cached PNG. It would be somewhat unreliable (different browsers behave differently, and it'll fail if the user clears their cache), but it's an option.
1: Set up a database that stores a unique user ID as a hex string.
2: Create a genUser.php (or whatever language) file that generates a user ID, stores it in the DB, then creates a true color .png out of the values of that hex string (each pixel will be 4 bytes) and returns that to the browser. Be sure to set the content-type and cache headers.
3: In the HTML or JS, create an image like <img id='user_id' src='genUser.php' />
4: Draw that image to a canvas: ctx.drawImage(document.getElementById('user_id'), 0, 0);
5: Read the bytes of that image out using ctx.getImageData, and convert the integers to a hex string.
6: That is your unique user ID, which is now cached on your user's computer.
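The encoding in steps 2 and 5 can be sketched as a pair of pure functions that map the hex ID to pixel values and back. Two assumptions differ from the steps above and are worth flagging: the sketch packs 3 bytes (R, G, B) per pixel and leaves alpha fully opaque, because canvas reads of non-opaque pixels may not return the exact bytes written; and the rendering part needs the GD extension, so it only runs when GD is available and the script is serving a web request. All function names are made up for illustration.

```php
<?php
// Split a hex ID into [r, g, b] triples, one per pixel (padded with zeros).
function hexToPixels(string $hex): array
{
    $bytes = array_map('hexdec', str_split($hex, 2));
    while (count($bytes) % 3 !== 0) {
        $bytes[] = 0;
    }
    return array_chunk($bytes, 3);
}

// Inverse: what the client-side code would do with the getImageData() output.
function pixelsToHex(array $pixels, int $hexLength): string
{
    $hex = '';
    foreach ($pixels as $rgb) {
        foreach ($rgb as $byte) {
            $hex .= sprintf('%02x', $byte);
        }
    }
    return substr($hex, 0, $hexLength);
}

// Render the pixels as a 1-pixel-high true color PNG (requires the GD extension).
function renderPng(array $pixels): string
{
    $img = imagecreatetruecolor(count($pixels), 1);
    foreach ($pixels as $x => list($r, $g, $b)) {
        imagesetpixel($img, $x, 0, imagecolorallocate($img, $r, $g, $b));
    }
    ob_start();
    imagepng($img);
    return ob_get_clean();
}

$userId = bin2hex(random_bytes(8)); // 16 hex characters; store this in the database
$pixels = hexToPixels($userId);

if (PHP_SAPI !== 'cli' && function_exists('imagecreatetruecolor')) {
    header('Content-Type: image/png');
    header('Cache-Control: public, max-age=31536000'); // ask the browser to keep it
    echo renderPng($pixels);
}
```

A real genUser.php would also need to avoid minting a new ID when the browser revalidates the image, which this sketch leaves out.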
Answer by Mehdi Karamosly
Based on what you have said:
Basically I'm after device recognition not really the user
The best way to do it is to send the MAC address, which is the NIC ID.
You can take a look at this post: How can I get the MAC and the IP address of a connected client in PHP?
Answer by DanielDMO
You could potentially create a blob to store a device identifier...
The downside is that the user needs to download the blob (you can force the download), as the browser can't access the file system to save the file directly.
Reference:
https://www.inkling.com/read/javascript-definitive-guide-david-flanagan-6th/chapter-22/blobs
Answer by Brian McGinity
You can do it with ETags, although I am not sure whether this is legal, as a number of lawsuits have been filed over it.
If you properly warn your users, or if you have something like an intranet website, it might be OK.
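For reference, the mechanism can be sketched as follows: the server hands out an identifier as the `ETag` of a cacheable resource, and the browser echoes it back in `If-None-Match` on its next request. The function `etagResponse` and its return shape are assumptions, chosen so the logic can be shown without sending real headers. Whether doing this is lawful or appropriate for your site is a separate question, as noted above.

```php
<?php
// Decide how to answer a request for the tracking resource.
// Returns [status code, headers to send, identifier].
function etagResponse(array $server): array
{
    $sent = trim($server['HTTP_IF_NONE_MATCH'] ?? '', " \t\"");
    if ($sent !== '' && preg_match('/^[0-9a-f]{32}$/', $sent)) {
        // Returning visitor: the browser echoed our identifier back.
        return array(304, array('ETag: "' . $sent . '"'), $sent);
    }
    // First visit (or cache cleared): mint a new identifier.
    $id = bin2hex(random_bytes(16));
    $headers = array(
        'ETag: "' . $id . '"',
        'Cache-Control: private, max-age=0, must-revalidate', // force revalidation each time
    );
    return array(200, $headers, $id);
}

// Simulate a first visit, then a revisit that echoes the ETag back.
list($status, $headers, $id) = etagResponse(array());
list($status2, $headers2, $id2) = etagResponse(array('HTTP_IF_NONE_MATCH' => '"' . $id . '"'));
echo $status, ' ', $status2, ' ', ($id === $id2 ? 'same id' : 'different id'), "\n"; // 200 304 same id
```

In a real script you would pass `$_SERVER`, call `http_response_code()` with the status, and send each returned line with `header()`.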
Answer by rexposadas
An inefficient approach that may still give you the desired results is to poll an API on your side. Have a background process on the client side which sends user data at an interval. You will need a user identifier to send to your API. Once you have that, you can send along any information associated with that unique identifier.
This removes the need for cookies and localStorage.
Answer by Valentin Heinitz
I can't believe http://browserspy.dk still has not been mentioned here! The site describes many features (in the pattern recognition sense) which could be used to build a classifier.
And of course, for evaluating the features I'd suggest Support Vector Machines, and libsvm in particular.