
ATG

- Ensure that liveconfig has been enabled by looking for the string "LIVECONFIG=true" in the startup log. It's best to enable it when calling runAssembler (via the -liveconfig switch). Consider enabling liveconfig in all remote pre-production environments.
- The EAR should be deployed to all remote pre-production environments in standalone mode (look at server startup for "standalone=true").
- Verify that all of the data source components (for instance, /atg/dynamo/service/JTDataSource) have a class of atg.nucleus.JNDIReference or atg.service.jdbc.WatcherDataSource. If atg.service.jdbc.WatcherDataSource is used, logging should be disabled. The class atg.service.jdbc.MonitoredDataSource should never be used.
- Ensure that loggingDebug is disabled for all components. After a load test, search the logs for "**** debug".
- Check that SQLRepositoryEventServer starts up properly if distributed caching is being used. Look at the startup log.
- Ensure that selective cache invalidation is properly enabled. Test it thoroughly.
- Ensure that ServerLockManager is not running on an instance that also uses DAF.Deployment (for instance CA or Search Admin). To find out which modules are running, look at the "Running Applications" page on /dyn/admin. To see if ServerLockManager is running, grep the startup logs for "ServerLockManager".
- If locked caching is enabled, ensure that two ServerLockManagers are running per commerce cluster - a primary and a backup. All instances in the cluster should point to the same two ServerLockManagers.
- Verify that URL rewriting is being handled. All links (including document.location.href) should be appended with ;jsessionid=x for cookieless users - this is called URL rewriting. <dsp:a> tags automatically append the jsessionid; standard <a href> tags do not. If URL rewriting is not done properly for all links, cookieless users should be prevented from accessing the site. Otherwise, search engine bots, which do not use cookies, will end up creating hundreds of thousands of sessions with each crawl.
- Check that at least two global scenario server (GSS) instances are running per environment. The number of GSS instances is highly dependent on the number and profile of global scenarios. These GSS instances should ideally not be running on an instance that handles end-user sessions. Larger environments (> 100 instances) only need a handful of GSS instances.
- Ensure that all important/relevant patches, fixpacks and hotfixes have been applied. Log in to support.oracle.com, click the "Patches & Updates" tab, click the "Product or Family (Advanced)" sub-tab, enter "Oracle ATG Web Commerce" in the box, select your major product version (e.g. 10.x, 10.1.x, etc.) and hit the "Search" button.
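As a sketch, the startup-log checks above can be scripted. The log path below is a stand-in for your real startup log, and the stand-in content exists only to make the example runnable; point LOG at your actual log instead.

```shell
# Hypothetical startup-log scan; /tmp/atg-startup.log stands in for the real log.
LOG=/tmp/atg-startup.log
printf 'LIVECONFIG=true\nstandalone=true\n' > "$LOG"   # stand-in content for illustration

grep -c 'LIVECONFIG=true' "$LOG"            # expect 1 (liveconfig is enabled)
grep -c 'standalone=true' "$LOG"            # expect 1 (EAR deployed in standalone mode)
grep -c 'ServerLockManager' "$LOG" || true  # expect 0 on DAF.Deployment instances
```

The same pattern extends to the "**** debug" search after a load test.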

- Ensure that only one process editor server (PES) is running per cluster. If a second process editor starts up, it could replace all of the currently running scenarios with the new ones (if there are any - sometimes there aren't). This can cause major problems. Run the query "select distinct s.machine from v$session s where s.USERNAME = '<CORE_SCHEMA>'" against your Oracle DB to find all connected hosts. Run "select count(*) from v$session s where s.USERNAME = '<CORE_SCHEMA>' and MACHINE = '<HOST>';" for each result to count how many ATG instances are connected. On each host, run "ps -ef | grep java | grep atg" to find each ATG process. Go to /dyn/admin/nucleus/atg/scenario/ScenarioManager on each instance. The processEditorServer flag should be false on all but one instance.
- If repository items are imported, be sure to reset the seeds of the affected idspaces appropriately (das_id_generator.seed and das_secure_id_gen.seed). There should be no risk of the ID generators handing out an ID that's already in the database.
- Check through logs from load tests looking for poor transaction demarcation (grep the logs for "atg.dtm.TransactionDemarcationException"). If poor transaction demarcation is found, it needs to be fixed.
- Ensure that nobody changed the $class of /atg/dynamo/transaction/TransactionManager unless there was a good reason to do so. ATG sets this automatically and it should generally not be overridden.
- When using BigEars (where you provide -Datg.dynamo.modules=x to the JVM), ensure that all modules are started in order from more generic (left) to more specific (right). Verify using this JSP.
- Ensure that static 404, 500, etc. pages have been created and are working. Enter gibberish URLs (while connected through the load balancer) and see what happens.
- Verify that the default passwords have been changed for all accounts in das_account.
- Verify that the default passwords for all users in /atg/userprofiling/InternalProfileRepository have been changed.
- Check that $DYNAMO_HOME/localconfig and $DYNAMO_HOME/home/servers/*/localconfig are empty and non-writable to developers. Developers on a dev box can add seemingly innocuous settings here that end up causing major issues later, because these directories are rolled up into EARs that get deployed to all environments.
- Verify that startup logs are clean. There shouldn't be errors.
- Make sure that the ATG performance monitor has been enabled during load tests (and only during load tests) and its output studied.
- Ensure that the proper ATG repository cache modes are being used. See http://docs.oracle.com/cd/E23095_01/Platform.93/RepositoryGuide/html/s1003cachingmodes01.html. Avoid using locked caching where possible. Consider having someone from Oracle review the settings.
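The idspace seed reset mentioned above might look like the following SQL sketch. The idspace name, target value, and the das_id_generator column names are assumptions based on the standard ATG schema; verify them against your own tables before running anything.

```sql
-- Hypothetical: after importing profile items, push the 'user' idspace seed
-- past the highest ID already present in the database.
UPDATE das_id_generator
   SET seed = 1000000   -- choose a value safely above the max imported ID
 WHERE id_space_name = 'user';
```

The same reset applies to das_secure_id_gen for secure idspaces.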

- Ensure that ATG repository cache size tuning has been performed. If the caches are full, the heap should not be full. See this document for more information.
- Verify that the adminPort, drpPort, httpPort, rmiPort, siteHttpServerName, and siteHttpServerPort properties of /atg/dynamo/Configuration have been properly set on an instance-by-instance basis. Each JVM should have its own unique ATG server (-Datg.dynamo.server.name=X where X = /export/stor07/Oracle/Middleware/user_projects/domains/customer_domain/ATG-Data/servers/X).
- Ensure that the catalog maintenance service (CMS) is running on one instance in the commerce cluster. The unessential functions should be removed. See http://docs.oracle.com/cd/E23095_01/Platform.93/ATGCommProgGuide/html/s0502batchservices01.html.
- Ensure that all System.out.println statements are removed from the Java code. Be careful when removing them, as code could break (e.g. Iterator.next() calls).
- Check that the ScenarioManager, WorkflowProcessManager (in CA, CSC, Search Admin, etc.), and InternalScenarioManager (in CA, CSC, Search Admin, etc.) all start up properly. A successful startup message looks like "22:00:17,250 INFO [ScenarioManager] Initializing Process Editor Server X:20150." If you see a message like "Initializing individual process server X:8851. Current configured Process Editor Server is set to X:8851", ensure that the value configured in <servername> in scenarioManager.xml on all cluster instances is equal to the value of /atg/dynamo/service/ServerName.serverName on the designated PES/WPM/ISM instance, then restart.
- If price lists are being used and there is only one price list, consider setting /atg/commerce/pricing/priceLists/PriceListManager.usePriceCache=false.
- If custom catalogs are used, ensure that DCS.DynamicCustomCatalogs is not running on the agent side. Verify using the "Running Applications" page on /dyn/admin.
- If internationalized content is stored in repositories backed by Oracle, make sure that all such repositories have useSetUnicodeStream=true. Don't forget the _production and _staging repositories. Put the setting in GLOBAL.properties.
- If using SQL Server, ensure that useSetUnicodeStream=false for all repositories. Put the setting in GLOBAL.properties.
- Look through log files for sensitive data. For instance, make sure nobody printed out a user's credit card number, expiration date, and CVV2 in a logInfo(). There should be no personally identifiable information contained in logs, with the exception of session IDs.
- Verify that "<distributable>" has been added to the web.xml of each web app using session failover, or that -distributable is being passed to runAssembler.
- Ensure that the taglibs (specifically the DSP taglib) in the custom web apps match the patch level that ATG is at. When applying ATG patches, the taglibs in the custom web apps often have to be manually updated.
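For the useSetUnicodeStream items above, a GLOBAL.properties sketch might look like this. The localconfig location is illustrative; place the file wherever your configuration layering dictates.

```
# localconfig/GLOBAL.properties (hypothetical location)
# Oracle backing internationalized repositories:
useSetUnicodeStream=true
# (on SQL Server, this would instead be useSetUnicodeStream=false)
```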

- Verify that JavaScript libraries have the defer="defer" attribute where possible. If that attribute is not present, the entire JavaScript file must be downloaded (by itself; nothing else may be downloaded in parallel) and fully parsed before client-side downloading/rendering can continue. Where possible, the defer attribute should also be used for 3rd-party libraries. See http://www.schillmania.com/content/entries/2009/defer-script-loading/. There should always be a way to disable JavaScript includes quickly and at runtime.
- Be sure that there are no plans to use the ACC in production unless it is absolutely required.
- Make sure that PMDL caching is enabled. The maximumCacheEntries, maximumEntryLifetime, maximumEntrySize and maximumCacheSize properties of /atg/commerce/pricing/PMDLCache should all be set to a large value.
- Check that idspaces in das_secure_id_gen and das_id_generator don't have numeric prefixes or suffixes. You could run into primary key constraint violations otherwise (e.g. with a prefix of "1" and a seed of "001", you'll run into a conflict with ID "1001").
- Ensure that all of the batch sizes in das_secure_id_gen are > 997. The batch sizes can default to 7, which leads to great contention on specific records in das_secure_id_gen. That can lead to a cascading failure of a production environment.
- Be sure that all comments are JSP comments instead of HTML comments. HTML comments can give hackers valuable information about the construction of your site.
- Be sure that only forms containing non-sensitive data use HTTP GET. All sensitive data should be submitted using HTTP POST.
- Ensure that all sensitive data is always sent over HTTPS. With the exception of a user's HTTP session ID, nothing personally identifiable should be sent over HTTP.
- Ensure that all components (e.g. JavaScript, CSS, images) of pages sent over HTTPS are accessed over HTTPS.
- If transient users or orders are persisted, make sure the consequences are well understood.
- Consider using repository cache groups, which will cut down on the number of individual queries to the database. When cache groups are not used, the contents of each auxiliary table are retrieved using individual SQL queries, as opposed to one join.
- Consider changing request-handling thread names. This greatly aids with troubleshooting. See this document for more information.
- Ensure that the browsers used for accessing all internally-facing applications comply with Oracle's supported environments matrix. The browser versions do matter, especially with the liberal use of Flex, JavaScript and other front-end technology.
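A minimal markup sketch of the defer item above (the script path is hypothetical):

```html
<!-- Parsing/rendering continues; the script runs after the document is parsed. -->
<script src="/js/site.js" defer="defer"></script>
```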

- Consider using the transaction droplet if you see an excessive number of transactions in the database and/or high database CPU utilization. Be sure to thoroughly test this before deploying to production.
- Verify that the PageFilter in web.xml is bound to the right extension. Typically it's *.jsp, not /*. If it's bound to /*, all HTTP requests (including those for images, CSS, JavaScript, etc.) will pass through the servlet pipeline.
- Verify that the item-cache-size and query-cache-size item descriptor attributes set in repository definitions match what's reported in /dyn/admin. There are situations (e.g. certain super/subtype relationships) where the values set in the repository definition XML are not the values used at runtime.
- If you're using ATG's REST functionality, be sure that only the bare minimum of required functionality/data is exposed in restSecurityConfiguration.xml. Test that nobody from the outside can execute methods and retrieve/manipulate data by arbitrarily entering REST URLs.
- Check the code for ArrayLists, HashMaps, and other collections that are not thread-safe but are accessed in a multi-threaded environment (any component marked as "global" in scope). Synchronize them or switch to thread-safe equivalents. Problems resulting from this are notoriously difficult to troubleshoot and can lead to data inconsistency.
- Avoid using MD5 as the password hasher. Use SHA-256 with strong passwords. The default is now SHA-256, but it could still be MD5 in an old implementation. See /atg/dynamo/security/DigestPasswordHasher.algorithm.
- Make sure that all caches have bounds, particularly instantiations of atg.service.cache.Cache, like /atg/commerce/pricing/priceLists/PriceCache. Many caches have no bounds out of the box. Caches without bounds are effectively memory leaks.
- Verify that the key repositories are pointing to the proper ID generators. For example, the profile repository using the obfuscated ID generator would not be a good idea.
- Make sure that ATG is not actually installed in a production environment. The EAR should be fully self-contained (i.e. not in "development" mode, where it contains pointers to the ATG installation).
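The PageFilter binding above might look like the following web.xml sketch. The filter class shown is the standard ATG one, but verify it against your own deployment descriptor; the key point is the *.jsp url-pattern rather than /*.

```xml
<filter>
  <filter-name>PageFilter</filter-name>
  <filter-class>atg.filter.dspjsp.PageFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>PageFilter</filter-name>
  <url-pattern>*.jsp</url-pattern>
</filter-mapping>
```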

Source(s): ATG

ATG Content Administration (CA)


- If CA is not clustered, verify that ClientLockManager.useLockServer=false.

- Verify that /atg/deployment/DeploymentManager.maxThreads=20. Adjust up or down based on DAF deployment performance. Oracle services has code to dynamically optimize the number of deployment threads based on the number of assets being deployed.
- On the agent side, ensure that DCS.PublishingAgent is started instead of the standard PublishingAgent (this applies to both standard and custom catalogs).
- Verify that repositories that CA manages don't have tags like "<load-items item-descriptor="x" load-all-items="true" quiet="true" />" in their definitions.
- Ensure that /atg/dynamo/service/ClusterName.useClusterTable=false and that /atg/dynamo/service/ClusterName.clusterName is unique across environments.
- If custom catalogs are used, start DCS.DynamicCustomCatalogs and DCS.DynamicCustomCatalogs.Versioned.
- Ensure that secondary caches are being invalidated following deployments. Price caches, PMDL caches, and droplet caches are examples of secondary caches. They have classes like atg.service.cache.Cache or atg.droplet.Cache.
- Verify that the maximum number of database connections in the connection pools is adequate given the value of /atg/deployment/DeploymentManager.maxThreads.
- Check that all primary tables in the versioned schema have indexes on workspace_id and checkin_date per product documentation.
- Ensure that all custom tables in the versioned schema have foreign key constraints, unique constraints, and unique indexes removed per product documentation.
- Verify that the transaction timeout has been extended. The timeout should be around 1.5x the time it takes to perform a full deployment. A good starting point is 100,000 assets/hour for a DAF deployment.
- Ensure that unique constraints in the versioned schema all include the asset_version column. Documentation says to drop all unique constraints from the versioned schema, but that's a bad idea: dropping unique constraints on the versioned schema but not the catalog schemas could lead to deployment failures. Unique constraints should be in place, though they should be composite - column + asset_version.
- Be sure to make one side of all shared table relationships read-only in CA. See product documentation.
- Be sure to thoroughly test the impact that deployments have on the system, especially while the customer-facing site is under heavy load.
- Make sure there is a promotion approval process in place so that business users don't drive large amounts of traffic to the site without technical administrators having fair warning. Technical administrators should always know when business users are driving large volumes of traffic to the site so they have time to prepare if necessary. This secondary approval also helps prevent errors in promotion rules. For instance, a business user could say "100% off order" when he or she meant to say "10% off order".
- Be sure to have a process in place to validate the HTML of banner ads. Since they're generally free-form HTML, they could contain any number of issues that could impact performance, stability, or security. Any HTML that business users or 3rd parties enter on the site should be thoroughly validated by a technical administrator.
- Make sure there is an approval process for mass emails so that business users don't drive large amounts of traffic to the site without technical administrators having fair warning. Technical administrators should always know when business users are driving large volumes of traffic to the site so they have time to prepare if necessary.
- Ensure that there is a process in place to detect unwanted cyclical references. For example, if you set a given category as its own parent, that could be bad depending on how the site is coded.
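The composite unique constraint described above might be sketched as follows. The table and column names are hypothetical placeholders for your own custom versioned tables.

```sql
-- Hypothetical custom table in the versioned schema: keep the uniqueness
-- guarantee, but make it composite with asset_version so multiple asset
-- versions of the same row can coexist.
ALTER TABLE my_custom_tbl
  ADD CONSTRAINT my_custom_tbl_uq UNIQUE (natural_key, asset_version);
```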

Source(s): ATG

Database
- Ensure that the database(s) are supported per Oracle's supported environments matrix.
- Check that the tablespaces have plenty of available disk space. The tablespace behind the CA schema in particular needs plenty of extra space because the data is versioned.
- Ensure that there are database-level alerts in place to warn of problems, like blocking sessions.
- Verify that the schema owner/app user passwords have been changed. A password that's the same as the login (schema name) is not sufficient!
- Ensure that frequent backups are taken of all schemas. It's best to do continual backups to an off-site database.
- Ensure that all constraints are named. A constraint with a name of "SYS_C0025223" doesn't make for very fun debugging.
- Verify that the database can handle all of the connections that the application could create. Look at the max size of each connection pool.
- Ensure that no indexes are missing that could lead to deadlocks. Deadlocks can be caused by the absence of an index on a table that references another table. All databases require that a table have an index on any column that references another table by way of a foreign key.
- Make sure to clean up any remaining test data. For instance, test users/orders are often created during load testing.
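The constraint-naming and foreign-key-index items above can be sketched together. The table and column names are hypothetical.

```sql
-- Name the constraint (no SYS_C... mystery names) and index the FK column,
-- so operations on the parent table don't escalate locks on the child.
ALTER TABLE child_tbl
  ADD CONSTRAINT fk_child_parent FOREIGN KEY (parent_id) REFERENCES parent_tbl (id);
CREATE INDEX idx_child_parent_id ON child_tbl (parent_id);
```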

- Ensure that the character encoding is the same across all databases, preferably a flavor of UTF-8 (AL32UTF8 in the case of Oracle).
- Verify that failover testing has been performed - both within a single clustered database (e.g. killing specific nodes) and across databases (in the case of Data Guard or GoldenGate).
- If you're using MSSQL, verify that you are using read committed snapshot isolation. See product documentation. This is not just for performance - it's actually required to prevent deadlocks.
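Enabling read committed snapshot isolation on SQL Server is a one-line change; the database name below is a placeholder, and the statement needs exclusive access to the database to complete.

```sql
-- SQL Server: switch the application database (name is hypothetical)
-- to read committed snapshot isolation.
ALTER DATABASE atg_core SET READ_COMMITTED_SNAPSHOT ON;
```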

Source(s): ATG

Oracle Database
- Verify that NLS_LENGTH_SEMANTICS is 'CHAR' across all databases. If it's 'BYTE', you never know exactly how many characters a given column can handle; if it's 'CHAR', you do.
- Ensure that all ConText indexes in the catalog schemas are created in the following manner: CREATE INDEX SCHEMA.INDEX_NAME ON SCHEMA.TABLE (COLUMN) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('SYNC (every "sysdate+(1/24)")');. Note the SYNC parameter - it prevents these indexes from being rebuilt after every commit.
- If using logical standby in Oracle, make sure that the versioned schema is in its own database and that logical standby isn't being used on that database.
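The NLS_LENGTH_SEMANTICS check above can be run against the instance-level NLS view:

```sql
-- Expect CHAR, not BYTE:
SELECT value
  FROM nls_instance_parameters
 WHERE parameter = 'NLS_LENGTH_SEMANTICS';
```

Note that per-session overrides are possible, so also check how the schemas were actually created.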

Oracle RAC Database


- Consider using SCAN for the read-only data.
- Consider binding core schema connections to one RAC node as described in this article. It's always best to cut down on interconnect traffic.
- In the JDBC URL (aka "connection string"), ensure all nodes are listed or a properly configured SCAN address is in use.
- If using WebLogic, use GridLink data sources.
- Be very careful when setting up RAC Extended Distance Clusters. They require special expertise, a lot of time for testing, and patience. Try an active/passive database tier with Data Guard instead.

Source(s): ATG

Oracle JDBC Driver

- Ensure that the JDBC drivers are supported per Oracle's supported environments matrix. Manually open the manifest file of the JDBC driver JAR to verify the version.
- If deploying to Exalogic and Exadata, make sure to use the JDBC drivers that ship with WebLogic in order to use SDP.
- Verify that the proper class name is being used. The proper class name is oracle.jdbc.xa.client.OracleXADataSource.
- Be sure to set -Doracle.jdbc.V8Compatible=true when using Oracle 10g databases. This ensures that dates are mapped to timestamps in the database, and it helps ensure that OOTB indexes (like all of the ones on CHECKIN_DATE in CA) are actually used. See http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#08_01.

Source(s): ATG

JVM
- Ensure that the JDK is supported per Oracle's supported environments matrix. Take note of the FAQ: the major version (e.g. 1.6, 1.7) must match what's listed in the matrix, but the minor versions need not match.
- Don't mix and match JDKs between the build environment and the application environment. Ensure the vendor, version and system architecture are all identical.
- Ensure that thread dumps taken under load show no deadlocks or contention.
- Verify that -Xms and -Xmx are set to the same value and that the values follow the format "-Xms[n]g" or "-Xmx[n]m", NOT "-Xms[n]gb" or "-Xmx[n]mb".
- Verify that java.rmi.server.hostname is set. This is often required when the instance is to be accessed using the ACC or JRockit Mission Control.
- Verify that garbage collection patterns look healthy during load tests and soak tests. Restarting instances periodically to alleviate memory issues is not acceptable in a production environment.
- Verify that the objects in the heap look normal when the instance is under load.
- Perform an extended load test (12+ hrs) to look for memory leaks.
- Be sure to remove any duplicate JVM args (e.g. -Xms8g -Xmx8g -Datg.dynamo.liveconfig=on -Xms4g).
- Do not limit the number of lines in a stack trace via the -XX:MaxJavaStackTraceDepth JVM argument; this will make troubleshooting production issues much more difficult. --Georgeo 12:23, 1 August 2012 (EDT)

- Be sure to write a heap dump in the event of an OutOfMemoryError using -XX:+HeapDumpOnOutOfMemoryError. Know where heap dumps are written and how to download them.
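The duplicate-JVM-arg check above can be sketched as a small script. The argument line below reproduces the checklist's duplicate-args example; substitute your real startup arguments.

```shell
# Hypothetical: flag memory flags (-Xms/-Xmx) that appear more than once.
ARGS="-Xms8g -Xmx8g -Datg.dynamo.liveconfig=on -Xms4g"
: > /tmp/dup-jvm-args.txt
for flag in Xms Xmx; do
  count=$(printf '%s\n' $ARGS | grep -c "^-$flag")
  if [ "$count" -gt 1 ]; then
    echo "duplicate -$flag" >> /tmp/dup-jvm-args.txt
  fi
done
cat /tmp/dup-jvm-args.txt   # reports "duplicate -Xms" for the example line
```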

Source(s): ATG

JRockit JVM
- If deploying on Exalogic, make sure that R28 or higher is used.
- Be sure to use the latest version (unless deploying to Exalogic).
- Heap sizes should be around 8 GB.
- Each JVM should use 2-3 CPU cores.
- Disable management (-Xmanagement should not appear) on all but one or two instances in production. The overhead is low, but it's still there.
- If large pages are supported and enabled in the OS, enable them with the arguments -XX:+UseLargePagesForHeap and -XX:+ForceLargePagesForHeap. Do not use the argument -XX:+UseLargePagesForCode.
- Turn on verbose GC logging (even in production). For instance: "-Xverbose:gc,memory,gcreport -XXnoSystemGC -XverboseLog:/logs/MS_1gc.log".

Source(s): ATG

HotSpot JVM
- Ensure that -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -verbose:gc -Xloggc:<file> are used. These options should stay enabled even in production.
- Make sure that -XX:+DisableExplicitGC is used. This replaces sun.rmi.dgc.server.gcInterval.
- Ensure that -XX:ThreadStackSize=256 (or 128) is set. By default, a 64-bit JVM running on Linux allocates 1m per thread.
- If large pages are supported and enabled in the OS, enable them with the argument -XX:+UseLargePages.
- Make sure that -XX:+TraceClassUnloading is not being used. It's only for debugging and adds a lot of overhead.

- Make sure that -XX:+UseTLAB is being used. This has been found to lower CPU utilization by 2.44% with ATG.
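Pulling the HotSpot items above together with the heap-sizing and heap-dump guidance from the JVM section, a consolidated argument line might look like the following sketch (sizes and the GC log path are illustrative):

```
-Xms8g -Xmx8g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/logs/gc.log \
-XX:+DisableExplicitGC -XX:ThreadStackSize=256 -XX:+UseTLAB -XX:+HeapDumpOnOutOfMemoryError
```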

Source(s): ATG

Application Server
- Ensure that the app server and version being used comply with Oracle's supported environments matrix. Only use the versions explicitly called out in the support matrix. Do not deviate at all from the point versions specified. ATG is very sensitive to app server version.
- Verify that session replication (clustering) is enabled. All session-scoped components in ATG should be serializable per product documentation.
- Ensure that the session timeout has been set and is appropriate.
- Find out where logs are being written (ATG and the app server each write their own logs). Make sure there is a good log rotation strategy in place and that very old logs are automatically purged.
- Verify that whitespace is being stripped from JSP pages.
- Consider pre-compiling JSPs for better performance.
- Be sure to enable log file compression.
- Be sure to enable log file rotation/purging.
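The session-timeout item above is set in the web app's deployment descriptor; the 20-minute value below follows the 15-20 minute guidance from the WebLogic section and is only an example.

```xml
<!-- web.xml: session timeout in minutes -->
<session-config>
  <session-timeout>20</session-timeout>
</session-config>
```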

Source(s): ATG

WebLogic App Server


- Consider using GridLink data sources.
- Ensure that the number of connections to the database will not exceed the database's capacity and that there are enough connections.
- Put all of your managed servers in two or three clusters (for code deployment purposes).
- Make sure you use multicast.
- If there are fewer than four managed servers per cluster, use unicast.
- Be sure to enable whitespace stripping. Instructions may be found here, specifically the "compress-html-template" attribute.

- Increase the JTA Timeout value to 21600 seconds (6 hours, due to BCC deployments) or higher.
- Check the "Set XA Transaction Timeout" option for each of the created data sources, for all but the ATGPUB data source.
- Set the XA Transaction Timeout value to 600 seconds for each of the created data sources, for all but the ATGPUB data source.
- Increase the Accept Backlog setting by 25% in the Configuration > Tuning tab until the CONNECTION_REFUSED errors disappear or are significantly reduced in the WebLogic plug-in logging on the Apache servers.
- Increase the Login Timeout on the Configuration > Tuning tab for each of the created servers from 5000 to 10000.
- Increase the Complete Message Timeout on the Protocols > General tab for each of the created servers to 300.
- Increase the Duration on the Protocols > HTTP tab for each of the created servers to 200.
- Uncheck the Enable Keepalives option on the Protocols > HTTP tab for each of the created servers.
- Reduce the session timeout from the default of 30 minutes to 15-20 minutes in the web application's web.xml (unless a higher timeout is a client requirement).
- If using the WebLogic HTTP plug-in, make sure that the "WebLogic Plug-In Enabled" setting is checked in the advanced section of Configuration > General for the WebLogic cluster, or in each singleton WebLogic server.

Source(s): ATG

Load Balancer/Web Server


- Ensure that nobody can access /dyn/admin from the public internet. Instead of /dyn/admin, use NucleusBrowser; get it as part of an Oracle services engagement.
- Ensure that production SSL certificates have been installed, have the proper domain, and haven't expired.
- Ensure that redirects are in place to direct users from the old site to the new site. For instance, say the old site had a context path of /oldsite and the new site has a context path of /newsite. Users going to /oldsite or any page under /oldsite should not be given a 404.
- Ensure that favicon.ico is in place.
- Check that service/auxiliary instances (GSS/PES/SLM, etc.) are not receiving any user sessions.

- Verify that requests to http(s)://host get redirected to http(s)://host/contextpath (if there is one).
- Ensure that gzip compression is used for HTML/CSS/JavaScript.
- Verify that robots.txt is in place.
- Verify that directory listing is turned off.
- Ensure that sitemap.xml is in place.
- Ensure that the Expires header is set properly for all static media. The second request of a session should result in all static media being pulled from the user's browser cache. The browser shouldn't have to check with the web server to get an HTTP 304.
- Ensure that keep-alives are properly set for each application. See http://virtualthreads.blogspot.com/2006/01/tuning-apache-part-1.html.
- Consider adding the "X-Content-Type-Options: nosniff" HTTP header. See http://htaccess.wordpress.com/2009/09/22/x-content-type-options-nosniff-header/.
- Make sure that JkLogLevel is set to "error" in mod-jk.conf (or in httpd.conf, if mod_jk is configured there).
- Consider setting the "HttpOnly" attribute when placing cookies. Doing so effectively stops XSS attacks because the cookie cannot be retrieved via JavaScript. See product documentation.
- Consider blocking HTTP requests to embedded JSP fragments, such as header.jsp and footer.jsp. Customers should only be able to access container JSPs, such as index.jsp or registration.jsp.
- If you have redirects in place to your mobile site for mobile users, preserve the entire link upon redirect. For example, a search engine will index http://www.site.com. You may have http://m.site.com for mobile users. If a user on a mobile device clicks a link to http://www.site.com/products/productXYZ.jsp, redirect the user to http://m.site.com/products/productXYZ.jsp as opposed to http://m.site.com. Losing links is common and frustrating to mobile users.
- For Apache Web Server, make sure to use the Worker MPM. By default, Apache is configured to use the Prefork MPM, which is less efficient with application servers. Oracle HTTP Server has the Worker MPM configured as its default; Red Hat packages include the Worker MPM as well. To enable it, simply uncomment the line in /etc/sysconfig/httpd.
- For Apache Web Server, configure the Worker MPM to use a single worker thread pool. Example: StartServers 1, ServerLimit 1, ThreadLimit 2048, MaxClients 2048, ThreadsPerChild 2048, MaxRequestsPerChild 0.
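The single-thread-pool Worker MPM configuration above translates to an httpd.conf block like this sketch (values taken from the example in the checklist):

```apache
# Worker MPM: one worker process, 2048 threads (per the checklist's example values)
<IfModule worker.c>
    StartServers            1
    ServerLimit             1
    ThreadLimit          2048
    ThreadsPerChild      2048
    MaxClients           2048
    MaxRequestsPerChild     0
</IfModule>
```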

Source(s): ATG

Network
- Verify that the DNS TTL of the site ATG is replacing is < 5 minutes, at least at launch.
- Ensure that a strategy is in place to handle DDoS attacks. An edge-based defense (like Akamai's Web Application Firewall - WAF) is the preferred approach.
- Be sure that throttling of end-user traffic can technically be performed and has been tested in a non-production environment. Also make sure that the approval process and criteria for throttling are well known by all personnel ahead of time. Take a look at Akamai's Shopper Prioritization Application (SPA).
- If there is a firewall between the database and the application servers, and it is inspecting SQL*Net traffic, be sure that the firewall can keep up. During periods of heavy site traffic, the firewall CPU can be maxed out due to all of the SQL*Net traffic it has to inspect.

Source(s): ATG

Operating System
- Ensure that the operating systems that the app servers run on are supported per the supported environments matrix. The major versions generally must match (e.g. Linux 5.x, 6.x, etc.) but the point versions generally don't need to.
- Ensure that developers can log in over SSH or Telnet to view log files. Even a browser-based tool is fine.
- Ensure that monitoring is enabled and that the right people receive the right alerts.
- Ensure the LANG and LC_* settings are correct - usually en_US.UTF-8. To check this, try running "locale".
- On a Linux-based OS, make sure that SELINUX=disabled. If it is in permissive mode, it can add a 10-15% overhead on IO. This is especially harmful when using a VM.
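The SELinux check above can be scripted against /etc/selinux/config; the block below uses a stand-in file so the sketch is runnable anywhere, but in practice you would grep the real path.

```shell
# Stand-in for /etc/selinux/config (hypothetical path for illustration).
CFG=/tmp/selinux-config-example
printf 'SELINUX=disabled\nSELINUXTYPE=targeted\n' > "$CFG"

grep '^SELINUX=' "$CFG"   # expect SELINUX=disabled, not permissive/enforcing
```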

Source(s): ATG

Exalogic
- Be sure to run the Exalogic Health Check Utility (download here).
- Try for a ratio of one JVM per three cores with 8 GB heaps. That's what testing has found to work best.

- Make sure that JRockit is used as the JVM when Oracle Linux is used as the OS.
- Consider using the following JRockit JVM arguments, which have proven to work best on Exalogic (assuming four JVMs per compute node with an 8 GB heap each): -Xgc:pausetime -XXgcThreads=6 -XX:OptThreads=6 -XX:+UseCallProfiling -XXtlasize:min=16k,preferred=1m,wasteLimit=8k -XX:+UseLargePagesForHeap
- Make sure to use SDP for cluster replication per the product documentation.
- Enable Linux HugePages per the product documentation. As of Exalogic 2.0, the default number of hugepages is 10000 and the page size is 2 MB (the maximum for this Intel chip), for a total of 20 GB reserved for large-page use. This may not be enough for the JVMs (for instance, four JVMs using 8 GB heaps with LargePagesForHeap would need at least 32 GB of hugepages). Increase the number of hugepages using sysctl to a value large enough to fit the heaps, but not so large that it starves the compute node of the regular-page space needed for everything else (for instance, native process space outside the Java heap used by the JVM/WLS also comes from regular pages). 40 GB per compute node is a good starting point.
- Make sure to apply the latest PSU as the baseline, then upgrade everything (WebLogic, JRockit, EECS, etc.) as permitted by ATG's supported environments matrix.
- Make sure that NFS attribute caching is enabled (it is enabled by default unless explicitly disabled in the mount options).
- Make sure "Enable Exalogic Optimizations" is turned on for all WLS instances, even if SDP will not be used.
- Ensure that GridLink data sources are used.
- If Exadata is used, ensure that SDP is set up between Exalogic and Exadata per the product documentation.
- Ensure that Exalogic's WebLogic-related enhancements are enabled per the product documentation.
- If you see the error "Error: atg.search.routing.SearchEngineLaunchException: Can't find search engine binaries. Unknown OS-ARCHITECTURE Linux-x86_64", pass -Dos.arch=amd64 to startRemoteLauncher.sh and rename x86-linux32 to x86-linux64.
- Be sure that the ZFS projects have high enough disk quotas. The project that logs are written to should have > 1 TB of space.
- Run through the latest list of known issues.
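The hugepage sizing arithmetic above can be sketched as a small script. The values mirror the checklist's own example (2 MB pages, four 8 GB heaps, a 40 GB reserve per compute node); adjust them to your topology before applying anything.

```shell
#!/bin/sh
# Sketch: size vm.nr_hugepages for the JVM heaps on one compute node.
# Assumes 2 MB hugepages (the Exalogic default) and reserves headroom
# beyond the raw heap total, per the 40 GB starting point above.
HEAP_GB_PER_JVM=8
JVMS_PER_NODE=4
RESERVE_GB=40            # starting point from the checklist
PAGE_MB=2                # hugepage size on this Intel chip

heap_total_gb=$((HEAP_GB_PER_JVM * JVMS_PER_NODE))
pages=$((RESERVE_GB * 1024 / PAGE_MB))
echo "heap total: ${heap_total_gb} GB, hugepages to reserve: ${pages}"

# Apply as root and persist in /etc/sysctl.conf:
#   sysctl -w vm.nr_hugepages=${pages}
```

Note the reserve (40 GB) is deliberately larger than the heap total (32 GB), leaving large-page headroom without consuming the whole node.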

Source(s): ATG

Exadata
- Drop the order_lastmod_idx index from the dcspp_order table. Testing has shown that index to be a problem on Exadata.
- Enable write-back flash caching. It's off by default.
- Enable huge pages.
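The index drop can be scripted so the statement is reviewed before it is applied; a minimal sketch (the sqlplus connection string is a placeholder, and you should confirm the index name against your own schema first):

```shell
#!/bin/sh
# Sketch: emit the SQL for the Exadata index fix so it can be reviewed,
# then piped to sqlplus by a DBA.
drop_index_sql() {
  cat <<'SQL'
DROP INDEX order_lastmod_idx;
SQL
}

drop_index_sql
# To apply (placeholder credentials):
#   drop_index_sql | sqlplus commerce_user/password@commerce_db
```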

Source(s): ATG

Security
- If using a CDN as a reverse proxy (e.g. Akamai DSA), consider using its application attack prevention technology (e.g. Akamai's Web Application Firewall) to guard against XSS, SQL injection, etc. These services can guard much more accurately, and faster, than servlets or filters in the application.
- Make sure that session hijacking attacks are guarded against, specifically attacks from Firesheep. See http://www.informationweek.com/news/security/client/showArticle.jhtml?articleID=228000481&cid=RSSfeed_IWK_All. Note: Firesheep only has 25 sites pre-configured, but custom sites can be added.
- Ensure that black box testing has been performed. IBM's AppScan is a good choice.
- Ensure that a manual security audit by a third-party firm specializing in security audits has been performed before launch. Quarterly audits are recommended following launch.
- If using a CDN as a reverse proxy (e.g. Akamai DSA), ensure that your origin (your production environment) is hidden from the public internet and only accepts traffic from your CDN.
- Verify that all unnecessary default logins have been disabled or deleted.
- Verify that a security scanner (such as Nessus) has been run. This will help guard against attacks from both the inside and the outside.
- Ensure that all unnecessary services (e.g. FTP, SMTP, telnet) have been removed. Only services that are core to the OS or the application should be running.
- Ensure that all patches/updates have been applied and thoroughly tested prior to launch.
- Run a port scanner against each box to ensure that no unnecessary ports are listening.
- Ensure that all logins (failures and successes) are logged, archived, and available for audit.
- Consider using LDAP or similar for access management.
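The port check can also be run on-box as a complement to an external scan; a minimal sketch that compares listening TCP ports against an allowlist. The allowlist is an assumption (adjust it to your services), and the example feeds in sample port numbers rather than live `ss -tln` output.

```shell
#!/bin/sh
# Sketch: flag listening TCP ports that are not on the expected list.
# In production, feed it port numbers extracted from `ss -tln` or
# `netstat -tln` output on each server.
ALLOWED="22 80 443 8080"

check_ports() {
  # stdin: one port number per line; prints any port not in ALLOWED
  while read -r port; do
    case " $ALLOWED " in
      *" $port "*) ;;                      # expected service, ignore
      *) echo "unexpected listener: $port" ;;
    esac
  done
}

# Example: port 23 (telnet) should be flagged
printf '22\n443\n23\n' | check_ports
```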

- Be sure to perform an audit of all server logins. During development, accounts for developers and for sys admins who have since left are often forgotten.
- Make sure your developers understand that production heap dumps should be treated the same as production database dumps: heap dumps will contain credit card numbers and other personally identifiable information.
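A starting point for the login audit is simply listing every account with an interactive login shell; a minimal sketch that parses passwd-format input (the sample accounts below are illustrative — in production, run it against /etc/passwd on each server):

```shell
#!/bin/sh
# Sketch: print account names whose login shell is interactive
# (i.e. not nologin/false), as candidates for the login audit.
list_login_accounts() {
  awk -F: '$7 !~ /(nologin|false)$/ {print $1}'
}

# Illustrative passwd-format input: root and a leftover dev account
# have shells; the daemon account does not.
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1::/usr/sbin:/usr/sbin/nologin\nolddev:x:1001:1001::/home/olddev:/bin/bash\n' | list_login_accounts
```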

Source(s): ATG

What To Monitor
- Servers (database/web/app): disk, CPU, memory
- App servers: free database connections, active HTTP sessions, free request-handling threads
- Web servers: free request-handling threads
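As a concrete example of the disk check, a minimal sketch that flags filesystems above a usage threshold from `df -P`-style output. The threshold and sample data are illustrative; wire the real `df -P` command and an alerting mechanism into your monitoring.

```shell
#!/bin/sh
# Sketch: print mount points whose usage exceeds THRESHOLD percent.
THRESHOLD=90

check_disk() {
  # stdin: `df -P` output; skip the header, strip the % sign from the
  # capacity column, compare numerically against the threshold
  awk -v t="$THRESHOLD" 'NR > 1 { sub(/%/, "", $5); if ($5+0 > t) print $6 }'
}

# Illustrative df -P output: /logs is over threshold, /data is not
printf 'Filesystem 1024-blocks Used Available Capacity Mounted\n/dev/sda1 100 95 5 95%% /logs\n/dev/sda2 100 50 50 50%% /data\n' | check_disk
```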

Source(s): ATG

Miscellaneous
- Ensure that production builds and deployments are automated. Having people perform builds manually introduces too much risk and variability.
- Ensure that EARs can be rolled back quickly and in an automated fashion. The most recent EAR should always be on the server, ready to be used again.
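One way to keep the previous EAR on the server and ready for fast rollback is a symlink swap between versioned release directories; a minimal sketch (the releases/current layout is an assumption for illustration, not an ATG convention):

```shell
#!/bin/sh
# Sketch: roll back by repointing a "current" symlink at the previous
# release directory. Uses a temp dir to stand in for the deploy root.
set -e
base=$(mktemp -d)
mkdir -p "$base/releases/ear-1.0" "$base/releases/ear-1.1"
ln -s "$base/releases/ear-1.1" "$base/current"   # ear-1.1 is live

rollback() {
  prev=$1
  ln -sfn "$prev" "$base/current"   # -n: replace the symlink itself
  echo "rolled back to $(basename "$prev")"
}

rollback "$base/releases/ear-1.0"
# rm -rf "$base" when done experimenting
```

The app server would then be restarted (or redeployed) against the path behind the symlink; the previous EAR is never deleted, only de-linked.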

- Consider using a "Customer Experience Management" tool like Tealeaf or Coradiant. These tools can record end-user sessions and are incredibly useful for troubleshooting and recreating errors. You may want to modify ATG's logging to print the session ID with each entry to the log file.
- Ensure that an intelligent load balancing strategy is in place. Load balancing based solely on TCP pings is not acceptable, as an instance may be unusable for various reasons yet still responsive to TCP pings. A good approach is a "healthcheck.jsp" that checks a number of application-level health indicators and then prints "OK" or "FAIL". The load balancer (or Apache) can periodically poll healthcheck.jsp, grep for the strings "OK" and "FAIL", and take action appropriately.
- Make sure that CSS, JavaScript, and image files are re-retrieved from the server following a new code deployment. If you're not careful, these files can stay permanently cached on the client side. See http://stackoverflow.com/questions/206783/when-does-browser-automatically-clear-javascriptcache for a good approach.
- Ensure that a search engine will never index any URLs containing a rewritten URL (e.g. ";jsessionid"). If your site is live, search Google for "site:YourSite.com jsessionid" to see if any pages contain rewritten URLs.
- Verify that code is in place to programmatically invalidate HTTP sessions created by bots after each HTTP request. Search engines (should) crawl your site in a stateless fashion, meaning each HTTP request creates a new HTTP session. With thousands of HTTP requests per crawl and multiple search engines, the number of sessions, and the memory those sessions consume, can quickly get out of hand.
- Consider having a separate pool of instances that handles HTTP requests from bots. A layer 7-based load balancer can direct bot HTTP requests to that special pool. Bots can be aggressive, and handling them requires special code/configuration. To isolate any damage done by bots, it's a good idea to keep that traffic separate from everything else.
- Be sure to check for broken links. Use a link checker tool like Xenu - http://home.snafu.de/tilman/xenulink.html.
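Both of the bot-related items above hinge on recognizing bot traffic from the User-Agent header; a minimal sketch of that classification (the pattern list is an illustrative assumption, not exhaustive — real deployments maintain a much longer list or use the load balancer's built-in bot detection):

```shell
#!/bin/sh
# Sketch: classify a User-Agent string as bot or human traffic. The
# same test can drive routing to a separate bot pool, or the decision
# to invalidate the session after the request completes.
is_bot() {
  case "$1" in
    *[Gg]ooglebot*|*[Bb]ingbot*|*[Ss]lurp*|*crawler*|*spider*) return 0 ;;
    *) return 1 ;;
  esac
}

for ua in "Mozilla/5.0 (compatible; Googlebot/2.1)" \
          "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"; do
  if is_bot "$ua"; then echo "bot: $ua"; else echo "user: $ua"; fi
done
```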
