Re: [registrars] Bounced message from Gandi
On Mon, Jan 21, 2002 at 10:37:22PM +0530, Bhavin Turakhia took time to write:
> no offence buddy :) - your scripts are awesome ( i still use them) - to be
> more specific than what i wrote below -
> 1. Perl by default is obfuscated (for most people anyways) :)
Ok, I agree it lets you be relaxed, which for me is a plus, but I
understand it can be a minus ;-)
More specifically for WhoisExtract, normally you just need to add new
Registrars, which is a matter of creating a new
Gandi::WhoisExtract::Something, and then you can use all others as
But when you want major changes made, or want to read the main module,
WhoisExtract.pm, yes, it is more difficult ;-)
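
To illustrate the "one module per Registrar" layout described above, here is a minimal sketch in Python rather than Perl. It is not the actual Gandi::WhoisExtract code: every name in it (EXTRACTORS, register, BaseExtractor, ExampleRegistrar) and the sample whois format are assumptions made up for the illustration.

```python
# Hypothetical Python analogue of a one-module-per-registrar layout.
# None of these names come from the real Gandi::WhoisExtract code.
import re

EXTRACTORS = {}  # lowercased registrar name -> extractor class


def register(name):
    """Class decorator that files an extractor under a registrar name."""
    def wrap(cls):
        EXTRACTORS[name.lower()] = cls
        return cls
    return wrap


class BaseExtractor:
    """Shared logic lives here; subclasses only supply patterns."""
    patterns = {}  # field name -> regex with exactly one capture group

    def extract(self, text):
        found = {}
        for field, pattern in self.patterns.items():
            m = re.search(pattern, text, re.MULTILINE)
            if m:
                found[field] = m.group(1).strip()
        return found


@register("Example Registrar")
class ExampleRegistrar(BaseExtractor):
    # Made-up output format, for illustration only.
    patterns = {
        "owner": r"^Registrant:\s*(.+)$",
        "email": r"^Admin Email:\s*(\S+)$",
    }


def extract(registrar, text):
    """Dispatch to the extractor registered for this registrar."""
    cls = EXTRACTORS.get(registrar.lower())
    if cls is None:
        raise KeyError("no extractor for registrar: " + registrar)
    return cls().extract(text)
```

Adding a Registrar then means adding one small class with its own patterns; the dispatch code and the other extractors stay untouched.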
> 2. perl runs as a cgi process and is not callable within a java program with
Ok, about that I do not know enough. I know that C and Perl mix
(fairly) well, but in other cases I do not know.
I understand that our work is close to useless when you want to fit
it into a pure Java framework.
I'm sure that we will get the same reaction when we release as free
software our whole suite of code
1) implementing RRP & EPP (strictly compliant with the draft)
2) implementing a Registry abstraction layer that lets you interface to
as many Registries and protocols as you need
(at least it should be so, we already use that for the 2 Registries
we work with and hope to extend it), so that the application part is
always coded the same way
3) implementing a TLS/TCP connection multiplexer with local unix
all in perl, as we use it now ;-)
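
The idea of a Registry abstraction layer in point 2 can be sketched as follows, in Python for illustration only. This is an assumption about the general shape, not our Perl code: the class names are hypothetical, the backend is an in-memory stand-in that does not speak real RRP or EPP, and the TLD-to-protocol mapping is only an example.

```python
from abc import ABC, abstractmethod


class Registry(ABC):
    """Common interface the application codes against."""

    @abstractmethod
    def check(self, domain):
        """Return True if the domain is available."""

    @abstractmethod
    def register(self, domain, years):
        """Register the domain; return True on success."""


class InMemoryRegistry(Registry):
    """Stand-in backend; a real one would speak RRP or EPP on the wire."""

    def __init__(self, protocol):
        self.protocol = protocol  # label only, no protocol logic here
        self.taken = set()

    def check(self, domain):
        return domain.lower() not in self.taken

    def register(self, domain, years):
        if not self.check(domain):
            return False
        self.taken.add(domain.lower())
        return True


# Illustrative mapping only; which Registry uses which protocol is
# configuration, not something the application code should know.
BACKENDS = {
    "com": InMemoryRegistry("RRP"),
    "info": InMemoryRegistry("EPP"),
}


def registry_for(domain):
    """Pick the backend from the domain's last label."""
    tld = domain.rsplit(".", 1)[-1].lower()
    return BACKENDS[tld]


def register_if_free(domain, years=1):
    """Application code: identical whatever the backend protocol is."""
    registry = registry_for(domain)
    if registry.check(domain):
        return registry.register(domain, years)
    return False
```

The point of the design is that register_if_free never mentions a protocol; supporting another Registry means adding one backend class and one entry in the mapping.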
> as for separating out information - it is quite easy to write regex for some
> of the stuff such as -
> * email addresses
> * dates
> Other aspects like postal codes etc are complex but can be treated as units
> (ie complete address block with zipcode) - though i have noticed with some
> registrars it is difficult to separate out between the addresses and phone
Yes, there are many issues like that.
Splitting state from city for example. Phone numbers from zip codes.
Addresses from names, etc...
One of my favourite examples:
Office MOON 5-15-27 Sekimachikita Haitsu188-202 Nerima-ku Tokyo 177-0051 Japan
(comes straight in one line)
Where is the address? The owner?
Ok, a human can infer that Tokyo is the city, 177-0051 the zip code, and Japan the country.
A human versed in Japanese/Japan (I am not, sadly) could probably
understand the whole thing.
I am not sure however how to safely give the same level of inference
to a program.
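
To make the limit concrete, here is a rough Python sketch of what a program can do with that line. It only peels off the tail (country, zip code, and a guessed city) and leaves the rest undivided. The function name, the tiny country list, and the rule "the token before the zip code is the city" are all assumptions for illustration; they would fail on many other formats.

```python
import re

# Tiny illustrative list; a real tool would need a full country table.
KNOWN_COUNTRIES = {"Japan", "France", "India"}
JP_ZIP = re.compile(r"^\d{3}-\d{4}$")  # shape of a Japanese postal code


def guess_tail(line):
    """Peel country, zip and city off the end of a one-line contact."""
    tokens = line.split()
    result = {"country": None, "zip": None, "city": None, "rest": None}
    if tokens and tokens[-1] in KNOWN_COUNTRIES:
        result["country"] = tokens.pop()
    if tokens and JP_ZIP.match(tokens[-1]):
        result["zip"] = tokens.pop()
        if tokens:
            # Assumption: the token right before the zip code is the city.
            result["city"] = tokens.pop()
    # Owner name and street address remain mixed together here.
    result["rest"] = " ".join(tokens)
    return result


line = ("Office MOON 5-15-27 Sekimachikita Haitsu188-202 "
        "Nerima-ku Tokyo 177-0051 Japan")
print(guess_tail(line))
```

On the example, this yields Japan, 177-0051 and Tokyo, but "Office MOON 5-15-27 Sekimachikita Haitsu188-202 Nerima-ku" stays in one piece: nothing in the text itself says where the owner ends and the address begins.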
This is why I think that a full parser will never exist.
Nor is it needed, at least for transfers (which is the only reason we
created our software: to grab email addresses to acquire
authorisation, and owner data, to populate our own whois after
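
For the transfer case, grabbing email addresses out of raw whois text, a minimal Python sketch could look like this. The pattern is deliberately loose and is an assumption, not what our modules use; it does not try to validate addresses against the mail RFCs, and the sample text is invented.

```python
import re

# Loose pattern: something@label.label[.label...]; not RFC-complete.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text):
    """Return the distinct addresses found, lowercased, in order seen."""
    seen = []
    for m in EMAIL.finditer(text):
        addr = m.group(0).lower()
        if addr not in seen:
            seen.append(addr)
    return seen


sample = (
    "Admin: Jane Doe (jane@example.org)\n"
    "Tech: hostmaster@example.net\n"
    "Billing: JANE@example.org.\n"
)
print(find_emails(sample))
```

This works without knowing which Registrar produced the output, which is why email addresses are the easy part; deciding which address belongs to which contact still needs per-Registrar knowledge.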
> however i am sure each registrar can do this for their own whois
> format, since they would know their own output better than we would
Yes, but we had a similar discussion on the Verisign Registrars list before. I am really not sure
that each Registrar will willingly give that to the community.
If you solve that problem (ie make sure that each Registrar helps,
and does their own work to enable anyone to use automated tools to
parse information), then I think that the pure technical part is
easy, in whatever language you do it ;-)
You may remember though that some Registrars were against that idea
(helping in whatever way to have automated tools).
We had a project, through open.gandi.net, to have an XML frontend to whois.
You query a whois server that uses our tools (or whatever) to give
back an XML parse of the whois. Should have been nice. And
However we left that behind, because
1) our work is not a total parser (see previous posts to know why)
2) without help from other registrars, it is close to impossible in my
eyes to keep up with
- Registrars changing their names (as appearing in Registry whois)
- new Registrars
- changes of output for a given Registrar.
Even notifying the community when some Registrar makes changes to its
whois would be great.
I know for example Bulk and Tucows changed their whois output a few
weeks ago. For one of them, I discovered it the hard way :-(
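
Short of being notified, one way to avoid discovering such changes the hard way would be to fingerprint the layout of each Registrar's output and compare it over time. The Python sketch below is an idea, not something our tools do: it assumes the output uses "Label: value" lines, and the function names are hypothetical.

```python
import hashlib
import re

# A label is a run of letters, spaces, "/" or "-" at line start,
# followed by a colon. Free-form outputs would need another approach.
LABEL = re.compile(r"^\s*([A-Za-z][A-Za-z /-]*):", re.MULTILINE)


def layout_fingerprint(text):
    """Hash the set of field labels, ignoring the values."""
    labels = sorted({m.group(1).strip().lower()
                     for m in LABEL.finditer(text)})
    return hashlib.sha256("\n".join(labels).encode("utf-8")).hexdigest()


def layout_changed(old_text, new_text):
    """True if the two outputs do not use the same set of labels."""
    return layout_fingerprint(old_text) != layout_fingerprint(new_text)
```

Storing one fingerprint per Registrar from a known-good reply and checking new replies against it would flag a renamed or added field before the extraction quietly returns nothing.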
At my last count, our modules parsed (in the limited manner we
needed) more than 70 different Registrars.
I also observed that few stick to NSI legacy format, or close to