Finding phone numbers in 50,000 HTML pages
Time: 2020-03-05 18:49:03 Source: igfitidea
How do we find the phone numbers in 50,000 HTML pages?
Jeff Atwood posted 5 Questions for programmers applying for jobs:

In an effort to make life simpler for phone screeners, I've put together this list of Five Essential Questions that you need to ask during an SDE screen. They won't guarantee that your candidate will be great, but they will help eliminate a huge number of candidates who are slipping through our process today.

1) Coding: The candidate has to write some simple code, with correct syntax, in C, C++, or Java.
2) OO design: The candidate has to define basic OO concepts, and come up with classes to model a simple problem.
3) Scripting and regexes: The candidate has to describe how to find the phone numbers in 50,000 HTML pages.
4) Data structures: The candidate has to demonstrate basic knowledge of the most common data structures.
5) Bits and bytes: The candidate has to answer simple questions about bits, bytes, and binary numbers.

Please understand: what I'm looking for here is a total vacuum in one of these areas. It's OK if they struggle a little and then figure it out. It's OK if they need some minor hints or prompting. I don't mind if they're rusty or slow. What you're looking for is candidates who are utterly clueless, or horribly confused, about the area in question.

>>> The Entirety of Jeff's Original Post <<<
Note: Steve Yegge originally posed these questions.
Solution
Answer
Perl solution
Posted by "MH" via codinghorror.com on September 5, 2008 at 7:29 AM
#!/usr/bin/perl
use strict;
use warnings;

# US-style numbers: optional parenthesised area code, then 3 + 4 digits.
my $phone = qr/\(?\d{3}\)?[ -]?\d{3}-?\d{4}/;
my @files;

for my $filename (glob '*.html') {
    # Read the whole file at once.
    open my $fh, '<', $filename or next;
    my $text = do { local $/; <$fh> };
    close $fh;

    # First pass: simple search of the raw markup.
    if ($text =~ $phone) {
        push @files, $filename;
        next;
    }

    # None found: strip HTML tags and line breaks, then check again.
    $text =~ s/<[^>]+>//g;
    $text =~ s/[\r\n]//g;
    push @files, $filename if $text =~ $phone;
}

# Print out result
print join("\n", @files), "\n";
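The second pass in the Perl answer matters when markup splits a number across tags. Below is a minimal Python sketch of the same two-pass idea. The helper name `contains_phone` and the sample strings are illustrative assumptions, and the tag stripper is the same naive `<[^>]+>` substitution used above, not a real HTML parser.

```python
import re

# Same US-style pattern as the Perl answer: optional parenthesised
# area code, then 3 digits and 4 digits with optional separators.
PHONE = re.compile(r"\(?\d{3}\)?[ -]?\d{3}-?\d{4}")

# Naive tag stripper (assumption: good enough for a sketch, not a parser).
TAG = re.compile(r"<[^>]+>")


def contains_phone(html: str) -> bool:
    """Hypothetical helper: True if the page appears to hold a phone number.

    First search the raw markup; if that fails, retry on the text with
    tags and line breaks removed.
    """
    if PHONE.search(html):
        return True
    stripped = TAG.sub("", html).replace("\r", "").replace("\n", "")
    return PHONE.search(stripped) is not None


if __name__ == "__main__":
    # The number is split by <b> tags, so only the second pass can find it.
    print(contains_phone("<p>(555) <b>123</b>-4567</p>"))
    print(contains_phone("<p>no digits here</p>"))
```

Stripping tags can also glue unrelated digit runs together, so the second pass may produce false positives; for a phone screen answer that trade-off is probably acceptable.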
Answer
Implemented in Java. The regex was borrowed from this forum.
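import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class PhoneFinder {
    public static void main(String[] args) throws IOException {
        // Backslashes are doubled because this is a Java string literal.
        final String regex = "[\\s](\\({0,1}\\d{3}\\){0,1}"
                + "[- \\.]\\d{3}[- \\.]\\d{4})|"
                + "(\\+\\d{2}-\\d{2,4}-\\d{3,4}-\\d{3,4})";
        final Pattern phonePattern = Pattern.compile(regex);

        /* The result set */
        Set<File> files = new HashSet<File>();

        File dir = new File("/initDirPath");
        if (!dir.isDirectory()) return;

        for (File file : dir.listFiles()) {
            if (file.isDirectory()) continue;

            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                String line;
                boolean found = false;
                // Stop reading a file as soon as one number is found.
                while (!found && (line = reader.readLine()) != null) {
                    found = phonePattern.matcher(line).find();
                }
                if (found) files.add(file);
            }
        }

        for (File file : files) {
            System.out.println(file.getAbsolutePath());
        }
    }
}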
I ran a few tests and everything went well! :)
Keep in mind that I'm not trying for the best design here; I just implemented the algorithm.
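To see what the borrowed pattern accepts, it can help to try it on a few sample lines. The sketch below is in Python rather than Java, and `has_phone` and the sample strings are made up for illustration. One thing it suggests: because the first alternative begins with `[\s]`, a domestic-style number at the very start of a line is not matched.

```python
import re

# The same pattern as in the Java answer, written as a Python raw string.
PHONE = re.compile(
    r"[\s](\({0,1}\d{3}\){0,1}[- \.]\d{3}[- \.]\d{4})|"
    r"(\+\d{2}-\d{2,4}-\d{3,4}-\d{3,4})"
)


def has_phone(line: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the line."""
    return PHONE.search(line) is not None


if __name__ == "__main__":
    samples = [
        "Call 555-123-4567 today",  # whitespace before the number
        "Fax: (555) 123.4567",      # parenthesised area code
        "+44-20-7946-0958",         # international alternative
        "555-123-4567",             # starts the line, no leading whitespace
    ]
    for sample in samples:
        print(repr(sample), has_phone(sample))
```

If start-of-line numbers matter, one possible adjustment is to replace the leading `[\s]` with a word boundary or a start-of-line alternative, at the cost of diverging from the borrowed regex.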
Answer
egrep '\(?[0-9]{3}\)?[-[:space:].]?[0-9]{3}[-.][0-9]{4}' *.html
Answer
egrep "(([0-9]{1,2}.)?[0-9]{3}.[0-9]{3}.[0-9]{4})" . -R --include='*.html'
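Where GNU grep's `-R` and `--include` options are not available, the recursive search can be approximated in a short script. The Python sketch below walks a directory tree, reads each `.html` file, and collects the distinct matches per file. `find_numbers` is a hypothetical helper name, and the pattern is an adaptation of the one in the command above with the separators narrowed to space, dot, or hyphen (an assumption, since the original `.` accepts any character).

```python
import os
import re
from typing import Dict, Set

# Assumption: separators narrowed to space, dot, or hyphen; the grep
# command above uses ".", which accepts any character.
PHONE = re.compile(r"(?:[0-9]{1,2}[ .-])?[0-9]{3}[ .-][0-9]{3}[ .-][0-9]{4}")


def find_numbers(root: str) -> Dict[str, Set[str]]:
    """Map each .html file under root to the set of numbers found in it."""
    results: Dict[str, Set[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" skips undecodable bytes instead of raising.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                found = set(PHONE.findall(handle.read()))
            if found:
                results[path] = found
    return results


if __name__ == "__main__":
    for path, numbers in sorted(find_numbers(".").items()):
        print(path, sorted(numbers))
```

Reading each file whole keeps the sketch short; for very large pages, scanning line by line as the Java and C# answers do would use less memory.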
Answer
I love doing these little problems; I can't help myself.
I'm not sure it was worth posting, since it's very similar to the Java answer.
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public class PhoneSearcher
{
    private readonly Regex phoneNumExp = new Regex(@"(\({0,1}\d{3}\){0,1}[- \.]\d{3}[- \.]\d{4})|(\+\d{2}-\d{2,4}-\d{3,4}-\d{3,4})");

    public HashSet<string> Search(string dir)
    {
        var numbers = new HashSet<string>();
        string[] files = Directory.GetFiles(dir, "*.html", SearchOption.AllDirectories);
        foreach (string file in files)
        {
            using (var sr = new StreamReader(file))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    // Matches (rather than Match) picks up every number on the line.
                    foreach (Match match in phoneNumExp.Matches(line))
                    {
                        numbers.Add(match.Value);
                    }
                }
            }
        }
        return numbers;
    }
}
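Answer
This is why coding questions don't work in phone interviews:
Phone screener: How do you find the phone numbers in 50,000 HTML pages?
Candidate: Hold on one second (covers the phone). Hey (roommate/friend/etc. who is really good at programming), how do you find the phone numbers in 50,000 HTML pages?
Save the coding questions for early in the interview itself, and make the phone-screen questions more personal, e.g. "I'd like to hear the details of the last time you solved a problem with code." That is a question you can follow up on in detail, and it is much harder to get someone else to answer it for you without sounding strange on the phone.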
Answer
Borrowing two things from sieben's C# answer, here is an F# snippet that should do the job. All it is missing is a way to call processDirectory, which was left out on purpose :)
open System
open System.IO
open System.Text.RegularExpressions

let rgx = Regex(@"(\({0,1}\d{3}\){0,1}[- \.]\d{3}[- \.]\d{4})|(\+\d{2}-\d{2,4}-\d{3,4}-\d{3,4})", RegexOptions.Compiled)

// Return every phone-number match found in one file's contents.
let processFile (contents: string) =
    rgx.Matches(contents)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)

// Scan all *.html files under path (recursively) and flatten the matches.
let processDirectory path =
    Directory.GetFiles(path, "*.html", SearchOption.AllDirectories)
    |> Seq.map (File.ReadAllText >> processFile)
    |> Seq.concat