[cddlm] In preparation for the CDDLM/ACS joint session at GGF14

Steve Loughran steve_loughran at hpl.hp.com
Mon Jul 4 07:41:43 CDT 2005


Keisuke Fukui wrote:
> Steve,
> 
> I snipped a part.
> 
> Steve Loughran wrote:
> 
>>
>>
>> As long as references to content can be supplied as URLs that 
>> the programs/JVMs on the hosts can resolve, then we could do (2) and 
>> (3) without a CDDLM implementation needing to know how those 
>> URLs are supported. That does imply HTTP/FTP/File, but both .NET 
>> and Java have a way of letting you plug in new URL handlers. If you 
>> had a new URL scheme, something like
>>
>> acs://app/124/component/12
>>
>> then we could handle it, though I would strongly advocate using HTTP 
>> as the way of retrieving things: not only do all apps support it out of 
>> the box, it is easier to debug in a web browser.
> 
> 
> Does this mean HTTP is your first recommendation for the components
> to pull the files?

if you are deploying to a fabric with a shared filesystem, my preference 
is for file://, because any client that looks for file: URLs will know 
it won't need to download and cache stuff.

otherwise, HTTP is good because
(a) most things know about it
(b) it is easy to debug behaviour by hand just by constructing the URL 
and tapping it into your browser.
(c) it goes through firewalls for remote download.
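To make the acs:// idea above concrete, here is a sketch of handling such a scheme by rewriting it to plain HTTP. The acs:// layout and the repository host are hypothetical; a fuller implementation could instead register a java.net.URLStreamHandlerFactory so that new URL("acs://...") resolves natively.

```java
// Sketch: rewrite a hypothetical acs:// URL to an ordinary HTTP URL,
// so that standard HTTP client code can do the actual retrieval.
public class AcsUrlMapper {

    /** Base HTTP URL of the asset store front end (an assumption). */
    private static final String HTTP_BASE = "http://repository.example.org/";

    /** Rewrite acs://app/124/component/12 to an HTTP URL. */
    public static String toHttp(String acsUrl) {
        final String scheme = "acs://";
        if (!acsUrl.startsWith(scheme)) {
            throw new IllegalArgumentException("Not an acs: URL: " + acsUrl);
        }
        return HTTP_BASE + acsUrl.substring(scheme.length());
    }

    public static void main(String[] args) {
        // resolves to http://repository.example.org/app/124/component/12
        System.out.println(toHttp("acs://app/124/component/12"));
    }
}
```

This keeps the CDDLM side ignorant of the store's internals: it only ever sees a URL it can fetch.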

smartfrog already uses the maven2 repository, initially at build time, 
but later on at runtime, where you can declare dependencies on different 
versions of files. In the build I declare the files I want, and the 
default versions:

   <target name="m2-files" depends="m2-tasks">
     <m2-libraries pathID="m2.classpath">
       <dependency groupID="org.ggf"
         artifactID="cddlm"
         version="${cddlm.version}"/>
       <dependency groupID="commons-lang"
         artifactID="commons-lang"
         version="${commons-lang.version}"/>
       <dependency groupID="commons-logging"
         artifactID="commons-logging-api"
         version="${commons-logging.version}"/>
       <dependency groupID="log4j"
         artifactID="log4j"
         version="${log4j.version}"/>
       <dependency groupID="org.smartfrog"
         artifactID="sf-xml"
         version="${Version}"/>
       <dependency groupID="xom"
         artifactID="xom"
         version="${xom.version}"/>
       <dependency groupID="xalan"
         artifactID="xalan"
         version="${xalan.version}"/>
     </m2-libraries>
   </target>

There is a properties file somewhere that sets the version of 
everything, but you can override this on a particular machine, which is 
good for a staged adoption of a new file version.
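For illustration, such a per-machine override could be a properties file like this (the file name and pinned versions here are hypothetical; the property names follow the build file above):

```
# machine-local override: hold xom at the older release during staged adoption
xom.version=1.1b2
# other *.version properties fall through to the shared defaults file
```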

At deploy-time, we can also parse a descriptor to build up components 
that are nothing but declarations of dependencies.


first I declare a library with the default settings (directory is 
${user.home}/.maven2/repository; layout policy is maven2)

     library extends Maven2Library {
     }

then I declare multiple artifacts that come from that repository, with 
version and checksums

     commons-logging extends JarArtifact {
         library LAZY PARENT:library;
         project "commons-logging";
         version "1.0.4";
         sha1 "f029a2aefe2b3e1517573c580f948caac31b1056";
         md5 "8a507817b28077e0478add944c64586a";
     }

     axis extends JarArtifact {
         library LAZY PARENT:library;
         project "axis";
         version "1.1";
         sha1 "edd84c96eac48d4167bca4f45e7d36dcf36cf871";
     }
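The sha1/md5 attributes above imply a verification step: digest the downloaded bytes and compare against the declared hex string. A minimal sketch (not SmartFrog's actual code):

```java
import java.security.MessageDigest;

// Sketch: compute the hex digest of downloaded bytes so it can be
// compared with the sha1/md5 value declared in the descriptor.
public class ChecksumCheck {

    /** Hex digest of data under the given algorithm ("SHA-1" or "MD5"). */
    public static String hexDigest(String algorithm, byte[] data)
            throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes("US-ASCII");
        // a deployment would reject the artifact on a mismatch
        System.out.println(hexDigest("SHA-1", data));
    }
}
```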


finally I can declare a component that uses them.

     tcpmonitor extends Java {
         classname "org.apache.axis.utils.tcpmon";
         classpath [
             LAZY axis:absolutePath,
             LAZY commons-logging:absolutePath];
     }

This particular repository caches stuff locally, and generates file:// 
references. Any program on the local system can share the same files, so 
there is a lot less downloading than otherwise, and better offline support.
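A sketch of how artifact coordinates might map to those cached local files under the ${user.home}/.maven2/repository layout mentioned above (the exact layout policy belongs to the repository, so treat this as illustrative):

```java
import java.io.File;

// Illustrative mapping from maven2 coordinates to a local cache path,
// which can then be handed out as a file:// reference.
public class LocalRepoPath {

    public static File artifactFile(File repoRoot, String group,
                                    String artifact, String version,
                                    String extension) {
        // dotted group ids become directories, e.g. org.ggf -> org/ggf
        String dir = group.replace('.', File.separatorChar);
        File versionDir =
                new File(new File(new File(repoRoot, dir), artifact), version);
        return new File(versionDir, artifact + "-" + version + "." + extension);
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("user.home"),
                ".maven2" + File.separator + "repository");
        File jar = artifactFile(root, "xom", "xom", "1.1b2", "jar");
        // any local program can share this file: URL instead of re-downloading
        System.out.println(jar.toURI());
    }
}
```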


Maven2 also does transitive dependencies, but I have disabled that in 
smartfrog, as I do not believe that the developers always know best. For 
example, Xom depends on Jaxen, and that depends on dom4j; with 
transitive dependencies I'd get stuff on my classpath that I do not 
want, namely dom4j.

There is another thing to think about, which is using WebDAV as a means 
of uploading stuff. I have mixed feelings about this, but note that it 
is standard in the content management system world, as so many tools are 
webdav aware, up to and including the winXP filesystem (which lets you 
mount webdav trees as drives with a
   NET USE x: \\repository.example.org\tree
)

> 
>>
>> There are two more use cases,
>>
>> -your asset store is used as the back end by an implementation of the 
>> CDDLM services. That is, someone uses <addFile> to add a file, and the 
>> request is forwarded to the ACS repository to add a new file to the 
>> application. Would that work?
> 
> 
> I think it's among the doable possibilities.
> We understood, however, that <addFile> is described as an interim solution
> which need not be used if an external asset store is used. The asset store
> can have its own repository interface other than <addFile>.
> 
> Your point is to keep <addFile> in common among the implementations, with the
> components using HTTP to pull the things from there. So you expect external
> repositories to implement <addFile> as a native interface.
> Is this a correct understanding?

No, I'd expect a CDDLM service to offload all repository work to 
an ACS asset store, using whatever operations they mutually agree on.


>>
>> -the results of a job are somehow placed into the asset store, for 
>> later retrieval by the submitter. This is out the scope of CDDLM; 
>> whatever you deploy needs to handle that submission process.
> 
> 
> We have discussed storing the "output" of the job, but that is
> pending, since the output can vary per execution. I personally
> doubt that this information is sufficiently persistent or stable to be
> worth storing in the repository.

> 
>>
>> Asset stores have been a trouble spot for me in the past; they have 
>> caused inordinate amounts of problems, at least when you are trying to 
>> use one built on MSSQL and classic IIS/ASP. Here are some things that I 
>> recall being troublesome:
>>  -many file systems cannot have > 65535 files in a single dir, so you 
>> had better not use too flat a filesystem
>>  -if the NAS filestore is set to the GMT tz and the server to PST, it 
>> doesn't make any difference whether or not the clocks themselves are 
>> synchronized; the auto-cleanup process is going to delete new files 
>> under the assumption that they are out of date.
>>  -it's very hard to secure stuff
>>  -any HTTP data provider must support HTTP/1.1, or at least 
>> content-length headers, so that the caller can determine whether the 
>> supplied content was incomplete
>>
>> As with most things, everything worked in development, it is only when 
>> you go to production, put the asset store 1200km away from the 
>> rendering service and keep the files on a different host from the 
>> database that things start to go wrong.
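On that content-length point: the caller can count the bytes it actually read and compare with the declared length. A sketch, with the stream source simulated rather than fetched over the network:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: detect a truncated download by draining the stream, counting
// the bytes, and comparing with the Content-Length the server declared.
public class CompletenessCheck {

    public static boolean isComplete(InputStream in, long declaredLength)
            throws IOException {
        long count = 0;
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            count += n;
        }
        return count == declaredLength;
    }

    public static void main(String[] args) throws IOException {
        // simulate a response that declared 20 bytes but delivered only 10
        InputStream truncated = new ByteArrayInputStream(new byte[10]);
        System.out.println(isComplete(truncated, 20));  // false: reject it
    }
}
```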
> 
> 
> Let us think about http a little more.

My experiences in the past are all well documented:
http://www.iseran.com/Steve/papers/when_web_services_go_bad.html

HTTP/1.1 download is more universal and useful than SOAP-based download. 
Upload is a more complex beast, and search even more troublesome, but 
URL-based retrieval is simple and effective. In that maven2 example above, 
we transform (project-name, artifact-name, version, artifact-extension) to 
something like

http://ibiblio.org/maven2/${project-name}/${artifact-name}/${version}/${artifact-name}-${version}.${artifact-extension}

for the artifact, and

http://ibiblio.org/maven2/${project-name}/${artifact-name}/${version}/${artifact-name}-${version}.pom

for the metadata, including dependency info. The rules are simple, and 
you can browse by hand to see what is there; for example, look under 
http://www.ibiblio.org/maven2/xom/xom/1.1b2/ to see the stuff for that 
version.
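Those rules can be sketched in code, taking ibiblio.org/maven2 as the base per the URLs above:

```java
// Sketch of the maven2 URL rules: build artifact and POM URLs from
// (project-name, artifact-name, version, artifact-extension).
public class Maven2Url {

    private static final String BASE = "http://ibiblio.org/maven2/";

    public static String artifactUrl(String project, String artifact,
                                     String version, String extension) {
        return BASE + project + "/" + artifact + "/" + version + "/"
                + artifact + "-" + version + "." + extension;
    }

    public static String pomUrl(String project, String artifact,
                                String version) {
        return artifactUrl(project, artifact, version, "pom");
    }

    public static void main(String[] args) {
        // the xom example from above
        System.out.println(artifactUrl("xom", "xom", "1.1b2", "jar"));
        System.out.println(pomUrl("xom", "xom", "1.1b2"));
    }
}
```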

-steve


More information about the cddlm-wg mailing list