Build stability: 1 out of the last 5 builds failed.

Changes

#399 (Oct 7, 2020, 2:27:32 PM)

  1. Added all content to subfolder for imminent merge (commit: 2a4233b) — Claus Stadler / githubweb

#398 (Oct 7, 2020, 2:20:34 PM)

  1. folder structure, minimal worksheet and object, nt file in ressource (commit: e4c7714) — carsten.draschner / githubweb
  2. copied rdf reader to see if the more complex techstack works, and it does after changing project to jdk 8 (commit: 6c2cccc) — carsten.draschner / githubweb
  3. Outsource readin of rdf file into own function from initial run method. make function verbose (commit: 56993cb) — carsten.draschner / githubweb
  4. naive first methods for semantic similarity estimation, not runable: need to be corrected (commit: b903364) — carsten.draschner / githubweb
  5. some code cleanup and getting at least random evaluation running to see if workflow works (commit: 7fbfe30) — carsten.draschner / githubweb
  6. start triple transformation such that we will get dataformat which fits spark implemented min hash for jaccard sim (commit: 38136e0) — carsten.draschner / githubweb
  7. start collecting features, tests with some print lines to ensure code methods do the right stuff (commit: 20d632c) — carsten.draschner / githubweb
  8. collect all features to later transfer them to indexes, this works and is tested by test calls and print lines this correlates to step 4 (commit: 4707589) — carsten.draschner / githubweb
  9. created feature map for features to int as step five in minhash pipeline (commit: 542d084) — carsten.draschner / githubweb
  10. val for total number features as step six, needed for feature representation (commit: 1585843) — carsten.draschner / githubweb
  11. baseline tryout scala script for minhash feature representation transformation process. needed to see how scala functions and data types can be used. works on strings instead of nodes with uri. many main ideas are working (commit: 4f4a39c) — carsten.draschner / githubweb
  12. transformation of rdf data to needed min hash data is done, some example code snippets from spark minHash run through. self written Vectorization used (commit: f1f8785) — carsten.draschner / githubweb
  13. dense scala functions for naive transformation of triple representation to feature map (commit: c309471) — carsten.draschner / githubweb
  14. new sample and started with more dense code for data transformation (commit: c231b2e) — carsten.draschner / githubweb
  15. created idea for nodeIndexer class for feature transformation (commit: 1c7bc95) — carsten.draschner / githubweb
  16. created node indexer as class to get from node to int and vice versa. tested in minhash tryout. structure inspired by spark vectorizers (commit: c178237) — carsten.draschner / githubweb
  17. use new nodeindexer class to transform key to int (commit: 1bdec41) — carsten.draschner / githubweb
  18. call and show result of node indexer (commit: 5f9a383) — carsten.draschner / githubweb
  19. remove test point class (commit: 95e206b) — carsten.draschner / githubweb
  20. main idea for a class to get from triples RDD[Triple] a Map[Node, Iterable[Seq[Node]]] which corresponds to a map from uri to features. currently a problem in transform because of non serializable but workaround is available for quick sessions (commit: 6d83709) — carsten.draschner / githubweb
  21. call node feature factory in a working mode (commit: 3a0a63f) — carsten.draschner / githubweb
  22. A complete pipeline working on text basis inspired by databricks tutorial. (commit: 602f610) — carsten.draschner / githubweb
  23. more comments and more verbose, also df show without truncation (commit: 07543d1) — carsten.draschner / githubweb
  24. created small sample nt file with movie idea for documentation and having very short uris (commit: 9e1b0ea) — carsten.draschner / githubweb
  25. Pipeline now working more verbose and call are more clean defined, stack is working (commit: 4ebae78) — carsten.draschner / githubweb
  26. some new imports, new read in, experiment results as triple output with write out. export strangely as folder (commit: 1b04138) — carsten.draschner / githubweb
  27. added all feature creation modes, added comments to explain some thoughts of development, changed save procedure, changed strings for final exported nt file (commit: c187350) — carsten.draschner / githubweb
  28. added outputs as comments like in stackoverflow and added better usage of sparse vector reusage in approxNearNeigh (commit: 0a755aa) — carsten.draschner / githubweb
  29. added comments for corresponding command line outputs. especially the dataframe representations so code explains more what's happening without need to run it (commit: bea5497) — carsten.draschner / githubweb
  30. implemented in pipeline a jaccard similarity calculation (commit: 90c125e) — carsten.draschner / githubweb
  31. added some more comments (commit: 4798645) — carsten.draschner / githubweb
  32. first idea for a module in a pipeline to generate from rdf rdd triples a df (commit: 17dafa8) — carsten.draschner / githubweb
  33. some new try outs and comments but main idea is in over text pipeline (commit: 3612031) — carsten.draschner / githubweb
  34. future idea for class RDF minhash (commit: 857ae92) — carsten.draschner / githubweb
  35. class that could become part of pipelined semantic similarity estimation (commit: 4050c3b) — carsten.draschner / githubweb
  36. comment because strange kind of test (commit: 953705e) — carsten.draschner / githubweb
  37. idea to create DF (commit: 68e0236) — carsten.draschner / githubweb
  38. we do not need this anymore. we wont create a central class but multiple. each for one approach (commit: 331a673) — carsten.draschner / githubweb
  39. we might create a node indexer in future but currently we reuse the nlp stack inspired pipeline (commit: 906fcbb) — carsten.draschner / githubweb
  40. in general this is now an old approach because we switched to nlp based pipeline (commit: dff713b) — carsten.draschner / githubweb
  41. added some input descriptions and added rodriguezegenhofer, batet and tversky but some questions to these formulas stored in todos (commit: 34096bc) — carsten.draschner / githubweb
  42. new modular feature extractor for similarity estimations. tested and working! (commit: c8d96af) — carsten.draschner / githubweb
  43. modular pipeline for minhash. spark session start, file readin and feature extraction added and working (commit: 0db5b8c) — carsten.draschner / githubweb
  44. ideas for further approaches similar to jaccard noted (commit: 9644606) — carsten.draschner / githubweb
  45. integrated count Vectorizer and made all important parameters a block and not hardcoded in constructor or method calls if they should be available later in cmd call (commit: bc731ec) — carsten.draschner / githubweb
  46. added minhash with reformatting columns for easier usability and clearer reading. tested and working (commit: 373ebd2) — carsten.draschner / githubweb
  47. older node based approach. deleted because will be replaced by RDF_Feature_Extractor (commit: fea7b85) — carsten.draschner / githubweb
  48. older alternative feature extraction approach. deleted because will be replaced by RDF_Feature_Extractor (commit: 296dfbd) — carsten.draschner / githubweb
  49. old dummy class. not used anymore (commit: a6fcbac) — carsten.draschner / githubweb
  50. module for metagraph factory. the central information for an experiment got created now for each pair are next up (commit: 625e080) — carsten.draschner / githubweb
  51. replaced this tryout workflow by over text pipeline and clean version will be in run minhash (commit: 51c9b74) — carsten.draschner / githubweb
  52. rearranged parameter, removed strings from central uri of experiment, and fixed bug RDD creation. now tested and working (commit: 756615c) — carsten.draschner / githubweb
  53. added line for metagraph creation. working and tested but not implemented the triples for each pair of elements with its similarity value (commit: cff6273) — carsten.draschner / githubweb
  54. added creation of triples also for small comparison elements, tested and done (commit: 9c89721) — carsten.draschner / githubweb
  55. store rdf representation of minhash similarity assessment. tested and working (commit: a71f4f0) — carsten.draschner / githubweb
  56. placeholder which is now replaced by minhash (commit: 20c0000) — carsten.draschner / githubweb
  57. added datetime to outputpath so no conflicting output paths should occur (commit: 07990dc) — carsten.draschner / githubweb
  58. noted todo that relation might be changed to something different in future (commit: 4cef2ee) — carsten.draschner / githubweb
  59. wrote first complete approach for a jaccard modular model corresponding to minhash from spark to make pipeline interchangeable (commit: 9baafb5) — carsten.draschner / githubweb
  60. fixed bug that lit cannot handle Vector but typedLit can. also wrong df was handed over because withColumn works not inplace. now everything is tested and working for jaccard (commit: 325f3db) — carsten.draschner / githubweb
  61. created complete running pipeline for jaccard with minor changes compared to Minhash. only changes needed were in some parameters, especially the minhash vs jaccard parameters and the call of jaccardmodel vs minhashmodel. tested and running. (commit: 5385cd3) — carsten.draschner / githubweb
  62. added missing parameter for number hash tables and changed name to more generic name so later superclass can easier handle differences in different similarity estimators (commit: c1fb902) — carsten.draschner / githubweb
  63. remove outcommented parts from minhash (commit: d1d39fa) — carsten.draschner / githubweb
  64. created very generic superclass for several similarity estimations so reusability is enlarged. tversky uses this already. tested and working! (commit: 5d18240) — carsten.draschner / githubweb
  65. tversky model implemented as current best example of reusing code for several Similarity estimations. tested and working over Tversky in run (commit: ba9be76) — carsten.draschner / githubweb
  66. run of tversky with new reusability. tested and working. next up making this pipeline also more reusable in code perspective and reusing generic code (commit: d5df096) — carsten.draschner / githubweb
  67. ideas for making pipeline more generic (commit: 59e1c98) — carsten.draschner / githubweb
  68. added keep column option to switch between behavior known from spark min hash and behavior which is needed for metagraph creation. also changed estimation udf to protected (commit: ca9c3eb) — carsten.draschner / githubweb
  69. changed complete jaccard to reusing generic similarity estimator code. tested and working. added keep column option to switch between behavior known from spark min hash and behavior which is needed for metagraph creation. also changed estimation udf to protected (commit: 3947af8) — carsten.draschner / githubweb
  70. changed jaccard pipeline to more generic one like tversky. tested and working (commit: 5631787) — carsten.draschner / githubweb
  71. changed all fitting elements to protected. so not callable from outside. also build in keep column for metagraph vs minhash behavior. also added options for filtering and ordering results as intended (commit: c96afaa) — carsten.draschner / githubweb
  72. created batet distance. everything what is needed is adjusted and now optimized compared to minhashOverpipeline because log is now inside udf. alternative still in comment (commit: ba1b6fb) — carsten.draschner / githubweb
  73. created braun blanquet model. pretty similar to jaccard (commit: 1706977) — carsten.draschner / githubweb
  74. removed non necessary code which was left in comment (commit: 60b8478) — carsten.draschner / githubweb
  75. created ochiai model can be used as jaccardModel etc (commit: 6ac10a5) — carsten.draschner / githubweb
  76. created Rodriguez egenhofer on basis of tversky... can be called similar but now you have only alpha but not beta (commit: 560916f) — carsten.draschner / githubweb
  77. created simpson model. similar to braun blanquet. only subsumer different (commit: 63430bd) — carsten.draschner / githubweb
  78. tversky pipeline now with keep column option used for now working storage also for nn estimation to meta rdf graph (commit: 75ccc0d) — carsten.draschner / githubweb
  79. added typed to parameters in the beginning and changed approxNearestNeigbors st. it has another column so metagraph creation can handle it (commit: 97964f0) — carsten.draschner / githubweb
  80. added types to parameters (commit: bb325ca) — carsten.draschner / githubweb
  81. removed betha which was left from tversky (commit: 60249af) — carsten.draschner / githubweb
  82. rename generic superclass so it is aligned with the _Model name domain (commit: 4b75c29) — carsten.draschner / githubweb
  83. new error handling and new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 336aa1d) — carsten.draschner / githubweb
  84. new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 14065e1) — carsten.draschner / githubweb
  85. new set opportunity. now over this so it is more aligned with the mllib behavior and different error handling (commit: fe41671) — carsten.draschner / githubweb
  86. new set opportunity. now over this so it is more aligned with the mllib behavior (commit: 34f27b9) — carsten.draschner / githubweb
  87. different call of set parameters and also some more output for further discussions (commit: bd9d7a6) — carsten.draschner / githubweb
  88. different call of set parameters (commit: d78fbf5) — carsten.draschner / githubweb
  89. different call of set parameters (commit: 62c2c93) — carsten.draschner / githubweb
  90. added dice model as close model to jaccard (commit: 7799390) — carsten.draschner / githubweb
  91. added checks for nan bug. also inserted default alpha and beta so divide by zero can not occur by default. (commit: 6a0e4df) — carsten.draschner / githubweb
  92. print lines of all major parts so run can be inspected more easily (commit: f8f5dba) — carsten.draschner / githubweb
  93. call of column name checks to look if default values fit if they are not set (commit: cf2447b) — carsten.draschner / githubweb
  94. object type specification (commit: efc8d73) — carsten.draschner / githubweb
  95. added column check method (commit: 5ba4483) — carsten.draschner / githubweb
  96. Feature extractor now working on basis of dataframe which is read in by spark.read.rdf (commit: 3c3493c) — carsten.draschner / githubweb
  97. rename to model (commit: 439d730) — carsten.draschner / githubweb
  98. major pipeline calling for all similarity experiments. started to optimize for sansa server execution. creates csv with important information (commit: a2b7e0f) — carsten.draschner / githubweb
  99. optimize featureExtractor for parallel computing (commit: 770d75b) — carsten.draschner / githubweb
  100. optimization to run on spark server over cmd line tools (commit: 7e30bcb) — carsten.draschner / githubweb
  101. mainly allow hyperparameter evaluation (commit: 1359951) — carsten.draschner / githubweb
  102. clearer structure, more aligned to scala camelCase and new handling of default values (commit: f53a3a8) — carsten.draschner / githubweb
  103. parameters are settable over config file. now with cmd calling only argument needed to specify path to config (commit: 2a1d3b4) — carsten.draschner / githubweb
  104. not needed anymore because the implementation in sansa which was reused here does not align with paper (commit: b6c5b8e) — carsten.draschner / githubweb
  105. some new try outs (commit: 4d079d3) — carsten.draschner / githubweb
  106. fixed bugs, aligned scala style and set default value handling (commit: 9a107f9) — carsten.draschner / githubweb
  107. started showcasing object for minimal calls needed for example for simpleML (commit: c1ab861) — carsten.draschner / githubweb
  108. first placeholder class for possible node indexer which can work as alternative to read in rdf into spark as df (commit: 6709bdb) — carsten.draschner / githubweb
  109. changed to use shade dependency which creates huge jar including all needed packages. needed for server export so all imports run as intended (commit: 8257357) — carsten.draschner / githubweb
  110. changed default value of features column (commit: b793bc5) — carsten.draschner / githubweb
  111. scala camelCase style started (commit: 038305b) — carsten.draschner / githubweb
  112. align model software code design (commit: 2c7822e) — carsten.draschner / githubweb
  113. collected all currently implemented similarity estimations as minimal calls (commit: 2c041b6) — carsten.draschner / githubweb
  114. added alpha and beta (commit: 15aded4) — carsten.draschner / githubweb
  115. camelcase for alpha and beta (commit: 0a39073) — carsten.draschner / githubweb
  116. typo in beta (commit: 0ace453) — carsten.draschner / githubweb
  117. changed main class to experiment call (commit: 0f4c19a) — carsten.draschner / githubweb
  118. config resolver created by farshad to handle dynamically the file path of local and hdfs paths (commit: a919c19) — carsten.draschner / githubweb
  119. move outputpath to config, readin changes by usage of config resolver so hdfs and local is usable, currently not working (commit: 6c60d80) — carsten.draschner / githubweb
  120. created object to test server usage with minimal complexity (commit: 6a8c228) — carsten.draschner / githubweb
  121. working evaluation pipeline (commit: 0519fb8) — carsten.draschner / githubweb
  122. file lister implemented by farshad so both local and hdfs files are listable (commit: 07147f7) — carsten.draschner / githubweb
  123. Fixed wrong method calls in similarity examples (commit: 5ffa145) — Patrick Westphal / githubweb
  124. Made FeatureExtractorModel inherit from Transformer (commit: eebee24) — Patrick Westphal / githubweb
  125. change to CamelCase (commit: 1070594) — carsten.draschner / githubweb
  126. added number of runs as parameter for more stable processing times calculation (commit: 7e6c7f9) — carsten.draschner / githubweb
  127. minor try outs (commit: 904f594) — carsten.draschner / githubweb
  128. changes for camelCase (commit: f2ec52b) — carsten.draschner / githubweb
  129. file was only test class (commit: 15dad37) — carsten.draschner / githubweb
  130. this was first attempt but is now resolved in more cleaner substructures (commit: 000a9f4) — carsten.draschner / githubweb
  131. better place now for certain try outs (commit: ca4d9f5) — carsten.draschner / githubweb
  132. tversky added and key retrieval aligned. (commit: 2027a67) — carsten.draschner / githubweb
  133. bug fixing if both feature vectors are empty (commit: 86896bf) — carsten.draschner / githubweb
  134. if feature vector union empty return 0 (commit: 6dea37e) — carsten.draschner / githubweb
  135. key generation once in the beginning and length of dataframes in prints (commit: 09bab6c) — carsten.draschner / githubweb
  136. added further examples in minimal calls like subset estimation or stacked approaches (commit: a1ad081) — carsten.draschner / githubweb
  137. tests with filter option for only considering movies (commit: 0558c53) — carsten.draschner / githubweb
  138. added stacked option and give the option to stop at specified points in pipeline to try out parts of pipeline (commit: 41f8653) — carsten.draschner / githubweb
  139. some new autoimports (commit: 04d048e) — carsten.draschner / githubweb
  140. remove unused comment (commit: bc089d4) — carsten.draschner / githubweb
  141. autoimports reorganize and limit dataframe show to 10 rows to test if this results in out of memory (commit: d2d9ef6) — carsten.draschner / githubweb
  142. removed some command line print typos and rearranged minor things (commit: aecc3a9) — carsten.draschner / githubweb
  143. not needed, merged into FeatureExtractorModel. we now have overloaded transform for Dataset/Dataframe and for RDD Triple Node (commit: 831bb80) — carsten.draschner / githubweb
  144. added comments aligned to scala doc and limit dataframe output in cmd print (commit: bdfa143) — carsten.draschner / githubweb
  145. created docstrings aligned to scala doc and merged overloaded transform to make use of RDD Triple Node based read in possible (commit: b0b715d) — carsten.draschner / githubweb
  146. created scala doc aligned docstrings (commit: dcd8248) — carsten.draschner / githubweb
  147. scala docstrings to describe shortly what this class is about (commit: 04eaa21) — carsten.draschner / githubweb
  148. refactor to camelCase (commit: bc3be7d) — carsten.draschner / githubweb
  149. changed the name for one relation (commit: d3a6f7c) — carsten.draschner / githubweb
  150. refactor name (commit: 1b10415) — carsten.draschner / githubweb
  151. refactor name (commit: 77739ee) — carsten.draschner / githubweb
  152. camelcase refactor (commit: ff273bd) — carsten.draschner / githubweb
  153. camelcase refactor (commit: f8621ec) — carsten.draschner / githubweb
  154. refactor camel case and switch to dataframe from rdd read in (commit: 7f6e75c) — carsten.draschner / githubweb
  155. move local[*] to conf and not in code, quick and dirty filtering for hands on try out. started with logging instead of printing (commit: 77c8060) — carsten.draschner / githubweb
  156. separated filter part for relevant uris (commit: 9de95e2) — carsten.draschner / githubweb
  157. prepared first ReadMe.md (commit: 5ec68f9) — carsten.draschner / githubweb
  158. remove of not needed code (commit: 21e59a5) — carsten.draschner / githubweb
  159. renamed package to more suited and camelcase name (commit: b8bdeea) — carsten.draschner / githubweb
  160. started with unit tests for semantic similarity estimation (commit: 2593ad9) — carsten.draschner / githubweb
  161. moved spark setup inside each run so overhead is present in each run (commit: a1b3f16) — carsten.draschner / githubweb
  162. optimized layout (commit: c7c434f) — carsten.draschner / githubweb
  163. tests changed from show to count because it is faster (commit: 4d96d04) — carsten.draschner / githubweb
  164. added information s.t. metagraph creator can easily fetch that all are similarity estimators (commit: e428ff4) — carsten.draschner / githubweb
  165. changed default column labeling from underscore to camel case (commit: 5f87664) — carsten.draschner / githubweb
  166. created minhash on basis of apache spark minhash lsh and added behavior as in other generic similarity estimators to allow better follow up handling of columns for e.g. metagraph creation (commit: d0cb5fd) — carsten.draschner / githubweb
  167. created novel method for metagraph creation using better literal creation etc and clearer parameters (commit: 2e3a137) — carsten.draschner / githubweb
  168. cleaned shape and tryouts for other datasets (commit: f5e6984) — carsten.draschner / githubweb
  169. minimal calls now with novel minHash and novel metagraph creation (commit: ad65179) — carsten.draschner / githubweb
  170. new default value for column labeling (commit: 6a9de1c) — carsten.draschner / githubweb
  171. removed because not needed anymore. we created a new file for full pipeline with all estimators instead having dedicated one for each model (commit: 34d6453) — carsten.draschner / githubweb
  172. new approach to have one concise pipeline for easy usage instead of needed construction of each pipeline (commit: ddf28c1) — carsten.draschner / githubweb
  173. min hash deprecated and now moved to similarityPipeline (commit: 87e7b6b) — carsten.draschner / githubweb
  174. changes for novel metagraph creation (commit: bf3c6d6) — carsten.draschner / githubweb
  175. replaces show by count to make tests faster (commit: f5958f2) — carsten.draschner / githubweb
  176. added comments and parameters, pipeline is working (commit: f328796) — carsten.draschner / githubweb
  177. added another alternative to start main (commit: c9588a5) — carsten.draschner / githubweb
  178. jaccard now merge in similarity Pipeline (commit: 9a025f1) — carsten.draschner / githubweb
  179. keeps track of experiment number when running over extensive hyperparameter grid and printing this in cmd line (commit: 8718283) — carsten.draschner / githubweb
  180. added num hash tables option (commit: 27a2e54) — carsten.draschner / githubweb
  181. extended tests and give each pipeline module its own test (commit: 7dd3931) — carsten.draschner / githubweb
  182. scala docs retest (commit: f084a7b) — carsten.draschner / githubweb
  183. Create index.md (commit: 76c9cc0) — GitHub / githubweb
  184. Create index.html (commit: bc630af) — GitHub / githubweb
  185. Update index.md (commit: 3ff4fff) — GitHub / githubweb
  186. Delete index.html (commit: 99eafc4) — GitHub / githubweb
  187. Update index.md (commit: db60062) — GitHub / githubweb
  188. Update index.md (commit: e18a5a7) — GitHub / githubweb
  189. Update index.md (commit: 644b19e) — GitHub / githubweb
  190. experiments still entry point (commit: 9616163) — carsten.draschner / githubweb
  191. added more models to experiments and fixed bug in counting total experiments (commit: caf057b) — carsten.draschner / githubweb
  192. moved movie file in subfolder (commit: b487432) — carsten.draschner / githubweb
  193. sample parameter setup for similarity evaluation (commit: 20275c2) — carsten.draschner / githubweb
  194. Update .travis.yml (commit: b9c1099) — GitHub / githubweb
  195. Update .travis.yml (commit: 78e6c43) — GitHub / githubweb
  196. Update .travis.yml (commit: 5a5f6fb) — GitHub / githubweb
  197. Create main.yml (commit: e413943) — GitHub / githubweb

#396 (Jul 10, 2020, 10:14:52 AM)

  1. prepare for Scala 2.12 support (commit: 1025a89) — Lorenz Buehmann / githubweb
  2. Update .travis.yml (commit: 2b7e231) — GitHub / githubweb
  3. Use Scala 2.12 prefix on the artifactId (commit: 50d6685) — Gezim Sejdiu / githubweb
  4. Bump scala version to 2.12.11 on travis-ci (commit: 04e1f78) — Gezim Sejdiu / githubweb

#395 (Jan 15, 2020, 10:36:49 PM)

  1. Bump SANSA version to 0.7.1-SNAPSHOT (commit: 717f22d) — Gezim Sejdiu / githubweb
  2. Update versions for release (commit: 3673370) — Gezim Sejdiu / githubweb
  3. Update for next development version (commit: 236d4ca) — Gezim Sejdiu / githubweb